  #2  
June 14th 18, 10:01 AM, posted to alt.comp.os.windows-10, alt.windows7.general, alt.comp.os.windows-8
Paul[_32_]
External hard drive advice please

Chris wrote:
Wolf K wrote:
On 2018-06-13 14:56, mechanic wrote:
On Tue, 12 Jun 2018 20:21:53 -0400, Wolf K wrote:

Probability of failure of all three at once will be 1/9th of
probability of failure of any one of them, which IMO is low
enough. :-)
Eh?

If I remember probability theory correctly, then if each device has the
same probability of failure, then the probability that two will fail at
the same time is (1/2)^2. If there are three, it will be (1/3)^2. And so
on. This is the reason that a RAID system is more reliable than any of
the drives in it.

If I've misremembered the probability math, kindly correct it (and save
me the work of checking it myself. :-) )


It's not quite right. The probability of three independent disks all
failing at the same time is the cube of the individual probability: P(A)^3
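
To put numbers on the cube rule, here is a minimal Python sketch. It
assumes the failures are independent, and the 3% per-disk failure
probability is a made-up figure for illustration, not a measured one.
The function name is hypothetical as well.

```python
def prob_all_fail(p_single, n_disks):
    """Probability that n independent disks all fail, each with probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    return p_single ** n_disks


# Assumed (hypothetical) chance that one disk fails in some fixed period.
p = 0.03
for n in (1, 2, 3):
    print(f"{n} disk(s): {prob_all_fail(p, n):.2e}")
```

Under these assumptions the three-disk figure is p times p times p,
which is far smaller than one ninth of p.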


In addition to the P cubed thing, there's also a
slightly more refined analysis you can do.

It takes into account the MTTR (Mean Time To Repair).

The idea is, when one disk fails, you buy another, and
spend time transferring the data to the new disk.
The system has a window of exposure while one disk
is out of commission. The longer it takes to resolve
this, the more it contributes to the risk of a
complete loss of service. Repairing (replacing) a single disk
when it fails raises the availability of the system.

The result, then, is a state diagram or other kind of
analysis model that takes all the states and their
probabilities into account.
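
As a rough sketch of that kind of state model, the Python below treats
the number of failed disks as the state and works out the mean time
until every disk is down. It assumes exponentially distributed failure
and repair times, independent disks, and only one repair in progress
at a time. None of that comes from the linked notes, and the 100,000
hour MTTF is an invented figure for illustration.

```python
def mean_hours_to_total_loss(n_disks, mttf_hours, mttr_hours):
    """Mean time until all n mirrored disks are failed at once.

    State k = number of failed disks. From state k the system moves
    up at rate (n - k) / MTTF (another disk fails) and, if k > 0,
    down at rate 1 / MTTR (a repair completes).
    """
    fail_rate_one = 1.0 / mttf_hours
    repair_rate = 1.0 / mttr_hours
    total = 0.0
    step_prev = 0.0  # mean time to get from state k-1 to state k
    for k in range(n_disks):
        up = (n_disks - k) * fail_rate_one
        down = repair_rate if k > 0 else 0.0
        # Birth-death recurrence for the mean time from state k to k+1.
        step = 1.0 / up + (down / up) * step_prev
        total += step
        step_prev = step
    return total


# Hypothetical MTTF, with the repair time set to 72 hours.
for n in (1, 2, 3):
    hours = mean_hours_to_total_loss(n, 100_000.0, 72.0)
    print(f"{n} disk(s): about {hours:.3g} hours to total loss")
```

For two disks this reduces to the familiar (3*lambda + mu) / (2*lambda^2),
with lambda = 1/MTTF and mu = 1/MTTR, so a shorter repair time
stretches the time to total loss.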

(The notes below have some illustrations of how to carry out such an analysis.)

http://www.engr.usask.ca/classes/EE/...es/notes11.pdf

The length of the MTTR is important too, as it affects the
probability of complete failure. In the systems we developed
at work, this value was set to 72 hours, representative
of humans "taking the long weekend off" and not realizing
something was broken.
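
To get a feel for why the repair window matters, here is a small
Python sketch of the chance that a single surviving disk dies before
the repair is finished. It assumes an exponential lifetime, and the
100,000 hour MTTF is a hypothetical figure, not one from the systems
mentioned above.

```python
import math


def prob_failure_during_repair(mttr_hours, mttf_hours):
    """Chance that one surviving disk fails inside the repair window."""
    # For an exponential lifetime, P(fail within t) = 1 - exp(-t / MTTF).
    # expm1 keeps precision when t / MTTF is very small.
    return -math.expm1(-mttr_hours / mttf_hours)


ASSUMED_MTTF_HOURS = 100_000.0  # hypothetical value for illustration
for mttr in (4.0, 24.0, 72.0):
    p = prob_failure_during_repair(mttr, ASSUMED_MTTF_HOURS)
    print(f"MTTR {mttr:>4.0f} h: {p:.2e}")
```

While the window is short compared with the MTTF, the exposure grows
roughly in proportion to the repair time.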

In the real world, duplicated systems frequently have
sufficient redundancy for "normal" kinds of requirements.
Going triplicated, you have to watch for unexpected factors
that have a higher probability of happening than the
narrow set of things you're studying. For example, with
three disks, say the AC power goes off and you're
denied access to the data (for a short time). That could
be an important event if you absolutely
needed access to the data at all times (say it was
a long list of family phone numbers or something).
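
A minimal sketch of that effect in Python, assuming one shared event
(such as a power cut) that is independent of the disk failures and
takes out access to all disks at once. Both probabilities are
invented for illustration, and the function name is hypothetical.

```python
def prob_data_unavailable(p_disk, n_disks, p_common):
    """Chance of losing access: either all disks fail or the shared event hits."""
    all_disks_fail = p_disk ** n_disks
    # Access survives only if the shared event does not happen
    # AND at least one disk is still working.
    return 1.0 - (1.0 - p_common) * (1.0 - all_disks_fail)


P_DISK = 0.03     # assumed per-disk failure probability
P_COMMON = 0.001  # assumed probability of the shared event
for n in (2, 3):
    total = prob_data_unavailable(P_DISK, n, P_COMMON)
    print(f"{n} disks: {total:.2e}")
```

However many disks are added, the total in this model never drops
below the chance of the shared event.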

I was hoping to find a worked case of a parallel triplicated
system, but no such luck.

Paul