

#1




External hard drive advice please
Wolf K wrote:
> On 2018-06-13 14:56, mechanic wrote:
>> On Tue, 12 Jun 2018 20:21:53 -0400, Wolf K wrote:
>>> Probability of failure of all three at once will be 1/9th of probability of failure of any one of them, which IMO is low enough. :)
>> Eh?
> If I remember probability theory correctly, then if each device has the same probability of failure, then the probability that two will fail at the same time is (1/2)^2. If there are three, it will be (1/3)^2. And so on. This is the reason that a RAID system is more reliable than any of the drives in it.
> If I've misremembered the probability math, kindly correct it (and save me the work of checking it myself. :) )

It's not quite right. The probability of three independent disks all failing at the same time is the cube of the individual probability: P(A)^3

If the individual probability is (not at all realistically) 0.1 (10%), then the probability of all three failing is 0.001 (0.1%).

RAID is a bit complicated, as the probability of a failure is highly correlated. Firstly, the potential risk of failure is if *any* of the disks fails, which is bounded by the sum of the probabilities; for a 5 disk array that is 5 x 0.1 = 0.5 (50%). Again not realistic. The probability of failure increases with the age of the disk, so a 5-year-old drive is more likely to fail than a brand new one. This is where correlated failures occur, particularly with RAID5: when a disk fails, the array needs to be rebuilt, which puts a large strain on the remaining (likely old) disks and can cause another one of them to fail. The array is then dead and unrecoverable, which is one reason why RAID5 is not recommended.

In terms of raw disk failure probabilities, RAID arrays are no more reliable than separate disks; however, the redundancy and checksums allow for seamless recovery from failures. This is why a RAID is not a backup.
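A minimal Python sketch of the arithmetic in this exchange, assuming the disks fail independently and using the thread's deliberately unrealistic 10% per-disk figure. It shows that three independent disks all failing is p^3 (0.001, not 1/9 of p), and that the sum-of-probabilities figure for a 5-disk array (0.5) is an upper bound: the exact chance that at least one of five independent disks fails is 1 - 0.9^5, about 0.41. The function names are hypothetical, chosen for illustration.

```python
"""Sketch: failure probabilities for disks assumed to fail independently."""

def all_fail(p: float, n: int) -> float:
    """Probability that all n independent disks fail: p**n."""
    return p ** n

def any_fails_exact(p: float, n: int) -> float:
    """Probability that at least one of n independent disks fails: 1 - (1 - p)**n."""
    return 1.0 - (1.0 - p) ** n

def any_fails_union_bound(p: float, n: int) -> float:
    """Union bound: P(any fails) <= n * p, capped at 1."""
    return min(1.0, n * p)

if __name__ == "__main__":
    p = 0.1  # the thread's illustrative per-disk probability
    print(f"all 3 fail:        {all_fail(p, 3):.4f}")
    print(f"any of 5 (exact):  {any_fails_exact(p, 5):.4f}")
    print(f"any of 5 (bound):  {any_fails_union_bound(p, 5):.4f}")
```

Run as a script, this prints 0.0010, 0.4095 and 0.5000. The gap between the last two grows as p gets larger; for small per-disk probabilities the sum is a close approximation.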
#2




Chris wrote:
> Wolf K wrote:
>> On 2018-06-13 14:56, mechanic wrote:
>>> On Tue, 12 Jun 2018 20:21:53 -0400, Wolf K wrote:
>>>> Probability of failure of all three at once will be 1/9th of probability of failure of any one of them, which IMO is low enough. :)
>>> Eh?
>> If I remember probability theory correctly, then if each device has the same probability of failure, then the probability that two will fail at the same time is (1/2)^2. If there are three, it will be (1/3)^2. And so on. This is the reason that a RAID system is more reliable than any of the drives in it.
>> If I've misremembered the probability math, kindly correct it (and save me the work of checking it myself. :) )
> It's not quite right. The probability of three independent disks all failing at the same time is the cube of the individual probability: P(A)^3

In addition to the P cubed thing, there's also a slightly more refined analysis you can do. It takes into account the MTTR (Mean Time To Repair). The idea is, when one disk fails, you buy another and spend time transferring the data to the new disk. The system has a window of exposure while one disk is out of commission, and the longer it takes to resolve this, the more it contributes to the risk of a complete loss of service. Repairing (replacing) a single disk when it fails raises the availability of the system.

The result, then, is a state diagram or some other kind of analysis model that takes all the states and their probabilities into account. (These notes have some illustrations of how to carry out such an analysis:)

http://www.engr.usask.ca/classes/EE/...es/notes11.pdf

The duration of the MTTR is important too, as it affects the probability of complete failure. In the systems we developed at work, this value was set to 72 hours, representative of humans "taking the long weekend off" and not realizing something was broken.

In the real world, duplicated systems frequently have sufficient redundancy for "normal" kinds of requirements.
Going triplicated, you have to watch for unexpected factors that have a higher probability of happening than the narrow set of things you're studying. For example, with three disks, say the AC power goes off and you're denied access to the data (for a short time). That could be a significant event if you absolutely needed access to the data at all times (say it was a long list of family phone numbers or something).

I was hoping to find a worked example of a parallel triplicated system, but no such luck.

Paul
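The MTTR effect Paul describes can be sketched as a small state-diagram (Markov chain) model. This is a minimal Python sketch under stated assumptions, not the method from the linked notes: each disk fails independently at a constant rate 1/MTTF, failed disks are replaced one at a time at rate 1/MTTR, and data is lost only when every copy in an N-way mirror is down at once. The 100,000-hour MTTF is a hypothetical figure; 72 hours is Paul's weekend value. Independence also means the model ignores common-mode events such as the power outage mentioned above.

```python
"""Sketch: mean time to data loss (MTTDL) for an N-way mirror with repair.

Assumptions (not from the thread): independent disk failures at constant
rate 1/MTTF, one repair at a time at rate 1/MTTR, and data loss only when
all n copies have failed simultaneously.
"""
import math

def mttdl_mirror(n: int, mttf_h: float, mttr_h: float) -> float:
    """Expected hours until all n mirrored disks are down at the same time.

    State k is the number of failed disks. D[k] is the expected time to
    first reach state k+1 starting from state k:
        D[0] = 1 / (n * lam)
        D[k] = (1 + mu * D[k-1]) / ((n - k) * lam)   for k >= 1
    MTTDL is the sum of D[0] .. D[n-1].
    """
    lam = 1.0 / mttf_h  # per-disk failure rate
    mu = 1.0 / mttr_h   # repair rate (one disk repaired at a time)
    total = 0.0
    prev = 0.0
    for k in range(n):
        fail_rate = (n - k) * lam           # any working disk can fail
        repair_rate = mu if k > 0 else 0.0  # nothing to repair in state 0
        d_k = (1.0 + repair_rate * prev) / fail_rate
        total += d_k
        prev = d_k
    return total

def loss_probability(mttdl_h: float, mission_h: float) -> float:
    """Chance of data loss within mission_h, treating loss as exponential."""
    return 1.0 - math.exp(-mission_h / mttdl_h)

if __name__ == "__main__":
    MTTF = 100_000.0          # hypothetical per-disk MTTF in hours
    FIVE_YEARS = 5 * 8760.0   # hours
    for mttr in (24.0, 72.0, 720.0):  # a day, a long weekend, a month
        for n in (1, 2, 3):
            m = mttdl_mirror(n, MTTF, mttr)
            print(f"MTTR={mttr:>5.0f} h  n={n}  MTTDL={m:.3g} h  "
                  f"P(loss in 5 y)={loss_probability(m, FIVE_YEARS):.2e}")
```

For a two-disk mirror the recurrence reduces to (3λ + μ) / (2λ²), which is roughly MTTF² / (2 · MTTR) when repairs are fast, so halving the repair time roughly doubles the mean time to data loss.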
#3




On Thu, 14 Jun 2018 08:39:54 +0000 (UTC), Chris wrote:
> It's not quite right. The probability of three independent disks all failing at the same time is the cube of the individual probability: P(A)^3

Yes, that's what I had in mind, although the assumption of independence must be questionable; that's what did for Three Mile Island, as the enquiry revealed (if I'm remembering correctly).
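The independence caveat can be made concrete with a simplified form of the beta-factor model used in common-cause failure analysis. In this minimal Python sketch (an assumption, not something from the thread), a fraction beta of each disk's failure probability p comes from a single shared cause (same power supply, same outage, same firmware batch) that takes out every disk at once; the rest is independent per disk. With p = 0.1, a shared-cause fraction of just 1% already roughly doubles the 0.001 figure.

```python
"""Sketch: a shared cause swamps the p**3 figure (simplified beta-factor model).

Assumption: a fraction beta of each disk's failure probability p comes from
one cause shared by all three disks; the remaining (1 - beta) * p is
independent per disk.
"""

def all_three_fail(p: float, beta: float) -> float:
    """P(all three disks are lost) under the simplified beta-factor model."""
    common = beta * p                # one event takes out every disk
    independent = (1.0 - beta) * p  # per-disk, independent part
    p_all_independent = independent ** 3
    # Lost if the shared event happens, or all three fail independently.
    return 1.0 - (1.0 - common) * (1.0 - p_all_independent)

if __name__ == "__main__":
    p = 0.1  # the thread's illustrative per-disk probability
    for beta in (0.0, 0.01, 0.05, 0.10):
        print(f"beta={beta:.2f}  P(all three lost)={all_three_fail(p, beta):.5f}")
```

With beta = 0 this reduces to the p^3 result; as beta grows, the shared-cause term beta * p quickly dominates, which is why adding a third copy on the same power supply buys less than the cube suggests.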
#4




In article , Chris
wrote:
> RAID is a bit complicated, as the probability of a failure is highly correlated. Firstly, the potential risk of failure is if *any* of the disks fails, which is bounded by the sum of the probabilities; for a 5 disk array that is 5 x 0.1 = 0.5 (50%). Again not realistic.

it depends on the raid. with raid 0, *any* failure loses the array. raid 0 is for speed, not redundancy. with raid 6, *two* drives can fail at the same time and no data is lost.

> The probability of failure increases with the age of the disk, so a 5-year-old drive is more likely to fail than a brand new one. This is where correlated failures occur, particularly with RAID5: when a disk fails, the array needs to be rebuilt, which puts a large strain on the remaining (likely old) disks and can cause another one of them to fail. The array is then dead and unrecoverable, which is one reason why RAID5 is not recommended.

while that's true, the reason raid 5 is not recommended is the chance of an unrecoverable bit error, which, with the density of modern hard drives, is statistically a near-guarantee to occur. if that happens during a rebuild, the array is lost.

> In terms of raw disk failure probabilities, RAID arrays are no more reliable than separate disks; however, the redundancy and checksums allow for seamless recovery from failures.

a raid is significantly more reliable for all sorts of reasons, other than raid 0, which is used for speed, not reliability.

> This is why a RAID is not a backup.

no, a raid is not a backup because it's a single device. the advantage of a raid (other than raid 0) is uptime: if a drive fails, the raid stays running and users can continue to access their data. a business cannot afford downtime while a new drive is obtained and they restore from a backup. a home user probably can.
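Both points in this reply can be sketched in Python under illustrative assumptions that are not from the thread: disks fail independently with probability p, and the unrecoverable read error (URE) rate is 1 per 10^14 bits read, a figure commonly quoted on consumer drive spec sheets. The first part counts how many simultaneous failures each RAID level tolerates (0 for raid 0, 1 for raid 5, 2 for raid 6) and uses the binomial distribution; the second estimates the chance that a raid 5 rebuild reads every surviving disk without hitting a URE. Real-world URE rates are often better than the spec-sheet bound, and how a controller handles a URE mid-rebuild varies, so treat the numbers as a rough sketch.

```python
"""Sketch: RAID level vs. disk failures, plus the URE-during-rebuild risk.

Assumptions (illustrative): independent disk failures with probability p,
and a URE rate of 1 per 1e14 bits read (a common spec-sheet figure).
"""
import math

# Number of simultaneous disk failures each level survives.
TOLERATED_FAILURES = {"raid0": 0, "raid5": 1, "raid6": 2}

def array_loss_probability(level: str, n: int, p: float) -> float:
    """P(more disks fail than the level tolerates), binomial over n disks."""
    t = TOLERATED_FAILURES[level]
    survive = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1))
    return 1.0 - survive

def rebuild_read_ok(surviving_disks: int, disk_tb: float,
                    ure_per_bit: float = 1e-14) -> float:
    """P(reading every surviving disk in full hits no URE)."""
    bits = surviving_disks * disk_tb * 1e12 * 8  # decimal TB -> bits
    # (1 - r)**bits, computed stably via log1p
    return math.exp(bits * math.log1p(-ure_per_bit))

if __name__ == "__main__":
    for level in ("raid0", "raid5", "raid6"):
        print(f"{level}: P(array lost) = {array_loss_probability(level, 5, 0.1):.4f}")
    # Five-disk raid 5 of 4 TB drives: a rebuild must read the other four.
    print(f"P(clean rebuild) = {rebuild_read_ok(4, 4.0):.3f}")
```

With the thread's p = 0.1 and five disks, this prints about 0.4095, 0.0815 and 0.0086 for raid 0, 5 and 6. For the hypothetical five-disk array of 4 TB drives, reading the four survivors is 1.28 x 10^14 bits, so the clean-rebuild probability comes out near exp(-1.28), roughly 0.28, which is the kind of figure behind the "near-guarantee" remark.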

