Date: May 21st 18, 10:26 PM
Newsgroups: alt.comp.os.windows-10
From: Paul
Subject: USB thumb drives.

Jimmy Wilkinson Knife wrote:
> On Mon, 21 May 2018 13:07:45 +0100, Paul wrote:
>
>> They "fixed" this in a firmware update, by having the
>> drive re-write the cells after three months (which equals
>> degraded wear life and shortens the life of the drive).


> As long as someone doesn't try to use it as long term storage and
> doesn't plug it in for 6 months. Or does it stay put if switched off?


The leakage on that device was independent of powered state.
The idea is that all the cells leak. But the sectors that are
in use and holding "data at rest" slowly degrade with
time, requiring more microseconds of error correction
by the ARM processor per sector.


>> In fact, doing some math the other day, I figured out
>> it was costing me $1 to write a Flash drive from
>> one end to the other.


> That can't be right. Are you claiming a $100 drive can only be written
> completely 100 times?


That was the figure for the drive I bought.
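
If you want to check the arithmetic yourself, it's just the price divided
by the number of end-to-end writes the endurance rating allows. A little
sketch, with made-up round numbers rather than my drive's actual specs:

# Rough cost per full-drive write: drive price divided by the number of
# complete end-to-end writes the endurance rating allows.
# All three inputs are made-up round numbers, for illustration only.

price_usd   = 100.0     # what the drive cost
capacity_tb = 1.0       # usable capacity, in TB
rated_tbw   = 100.0     # vendor endurance rating, terabytes written

full_writes   = rated_tbw / capacity_tb     # end-to-end passes allowed
cost_per_pass = price_usd / full_writes     # dollars per full write

print(f"{full_writes:.0f} full writes allowed, "
      f"${cost_per_pass:.2f} per end-to-end write")
# With these numbers: 100 full writes, $1.00 per pass -- which is how
# a $100 drive can work out to about a dollar per complete fill.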


> Actually I've had terrible trouble with hard drives but never ever had a
> single SSD fail, apart from OCZ **** that I very quickly stopped using.
>
> The number of hard drives that either overheated or just started clicking.


I lost a couple of Maxtor 40GB drives, which went south very quickly.
(From clicking to dead takes a single day.)

I lost a Seagate 32550N 2GB when the head lock jammed
at startup, the arm tried to move anyway, and it ground
the heads into the platter like a cigarette butt. And
the most wonderful "clock spring" noise came out of
the drive. They don't make head locks like that any more
(a huge solenoid that looked out of place in the drive). There
was a gouge in the platter.


>> There are just a few flash drives that are huge while
>> the interface happens to be slow. There's a 30TB one
>> you can continuously write at the full rate, and it
>> is guaranteed to survive the warranty period :-)
>> So that would be an example of a drive that
>> a lab accident can't destroy, because it
>> can handle the wear life of writing continuously
>> at its full speed (maybe 300 to 400MB/sec).
>> If the 30TB drive were NVMe format, and ran at
>> 2500MB/sec, it might not be able to brag about
>> supporting continuous write for the entire warranty
>> period. You might have to stop writing it once in
>> a while :-)


> That would be a ****ing busy server to write that much data. And if you
> had such a server, you'd most likely need way more storage space, so
> each drive wouldn't be in continuous use.


I think that 30TB drive is a wonderful drive from a
"cannot be abused" perspective. I think it comes in a
5.25" form factor too, and holds 30TB. It's chock full
of chips. The average user isn't going to like the
speed though. Too many people have been spoiled by
NVMe speeds.
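
For the curious, here's the back-of-the-envelope version of that warranty
claim. The interface speed, warranty length, and drive-writes-per-day
figure below are guesses for illustration, not the real spec sheet:

# How much data a drive absorbs if written flat-out for the whole warranty,
# versus what an endurance rating allows. All figures are assumptions.

capacity_tb    = 30.0      # 30TB-class drive
seq_write_mbs  = 400.0     # sustained write speed, MB/sec (SATA-ish)
warranty_years = 5.0
dwpd           = 1.0       # assumed drive-writes-per-day rating

seconds      = warranty_years * 365 * 24 * 3600
written_tb   = seq_write_mbs * seconds / 1e6            # MB -> TB
endurance_tb = capacity_tb * dwpd * warranty_years * 365

print(f"continuous writing : {written_tb:,.0f} TB")
print(f"rated endurance    : {endurance_tb:,.0f} TB")
# With these guesses: ~63,000 TB written vs ~55,000 TB rated. A rating of
# a bit over 1 DWPD covers continuous writing at SATA-ish speed, while
# 2500 MB/sec would overshoot this rating by about seven times.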

*******

Back to your SMART table for a moment...

Apparently the SMART table definitions overlap. Obviously,
an SSD doesn't have a "data address mark". And while an HDD
does have a notion of "terabytes written" and a gross notion
of wear life, it isn't measured as such. I don't think
any HDD has a place to put that info in its SMART table.
The info is undoubtedly inside the drive somewhere, just
not something you'd find in HDD SMART.

202  Percentage Of The Rated Lifetime Used  === SSD param
202  Data Address Mark Errors               === HDD param

If your SMART tool is an older one, it will use the older
definition. HDTune 2.55 (the free version, now ten years old)
doesn't know anything about SSDs. This is why I recommended
using the SSD Toolbox software, which may be available
on your SSD manufacturer's site. The SSD Toolbox should be using
an SSD SMART table definition.

Data Address Mark errors, value 18, worst 18, warn 0, raw 000000000052

Consult the Toolbox for that SSD, and verify the lifetime used
is 52%. That means roughly half the wear life is exhausted
(which is independent of how many sectors are spared out).
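
If you'd rather pull that number from a script than from the vendor
Toolbox, something along these lines works. This is only a sketch: it
assumes smartmontools is installed, that the drive is /dev/sda, and that
you run it with admin/root privileges. And whether 202 means lifetime
used or data address mark errors still depends on which table the drive
follows.

# Sketch: pull SMART attribute 202 with smartmontools and show its fields.
# Assumes smartctl is installed, /dev/sda is the SSD, and root privileges.
import subprocess

out = subprocess.run(["smartctl", "-A", "/dev/sda"],
                     capture_output=True, text=True).stdout

for line in out.splitlines():
    fields = line.split()
    if fields and fields[0] == "202":       # attribute ID column
        # Typical columns: ID, name, flags, value, worst, thresh, ..., raw
        print("name :", fields[1])
        print("value:", fields[3], " worst:", fields[4])
        print("raw  :", fields[-1])
        break
else:
    print("Attribute 202 not reported by this drive.")

# Whether that raw value means "percent of rated life used" (SSD table)
# or "data address mark errors" (old HDD table) is up to the drive and
# the tool -- which is why the vendor's own Toolbox is the safer check.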

There is one brand where that parameter is very dangerous.
If you have an Intel drive, it stops responding when
the drive is worn out, as measured by Flash cell write
cycles. Other brands continue to run. In one case,
a drive was able, during a lifetime test, to exceed the
Health value many times over before the sparing eventually
exhausted the spares pool. When the cells wear out, more
sectors need to be spared, so the sparing rate
at some point accelerates. Sometimes it's
a power failure while in that state (lots of sparing)
that results in the drive being killed and no longer
responding. There might actually be some spares
left when one of those "way over the top" SSDs dies
on you.
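
You can watch for that acceleration yourself by logging attribute 5
(Reallocated_Sector_Ct) now and then and looking at how fast it climbs.
A toy sketch of the idea, using invented readings:

# Sketch: given dated readings of SMART attribute 5 (Reallocated_Sector_Ct),
# flag the point where the sparing rate starts to accelerate.
# The readings are invented numbers, purely for illustration.

readings = [                 # (day, sectors spared out so far)
    (0, 0), (30, 2), (60, 3), (90, 5),
    (120, 20), (150, 90), (180, 400),
]

prev_rate = None
for (d0, c0), (d1, c1) in zip(readings, readings[1:]):
    rate = (c1 - c0) / (d1 - d0)             # spared sectors per day
    if prev_rate and rate > 3 * prev_rate:   # arbitrary "accelerating" cutoff
        print(f"sparing rate jumped around day {d1}: "
              f"{prev_rate:.2f} -> {rate:.2f} sectors/day")
    prev_rate = rate
# With these numbers the jumps show up around days 120, 150 and 180 --
# exactly the stage where you want a current backup already in hand.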

But the Intel response is a "no mercy" response. Intel
wants you to back up your Intel SSD every day, so that
you can "laugh" when your SSD bricks itself. The "nice"
thing about such behavior is that you can't
even check the SMART table to see what happened :-/
Some drives signal their displeasure by reading but
not writing, and by remaining in a readable
state they leave it up to the user whether to actually
"trust" any recovered data. The ECC should be able
to indicate whether sectors are irrecoverably bad or
not, so reading in such a state really shouldn't
be a problem.

But the Intel policy sucks, especially when the
typical "couldn't care less" class of consumer isn't aware
of what the policy on Health is. I've only caught hints
of it in some SSD reviews.

*******

A great series of articles was the one where they kept
writing to a set of drives until they had all failed.
The article here also mentions in passing what some of
the end-of-life policies are. It's possible the
Corsair Neutron in this article was the MLC version,
while the one I bought was suspected to have TLC (it
disappeared from the market for several months and
then "magically reappeared").

https://techreport.com/review/27909/...heyre-all-dead

The TLC drive with the bad "data at rest" behavior
might have been a Samsung.

There's nothing wrong with charge draining off the cells,
as long as the engineering is there to include an ECC
method that ensures readable data for ten years after
the write operation. The issue wasn't a failure as such,
since the data was still perfectly readable - it was
the fact the drive was slow that ****ed people off. When
these companies use the newest generation of "bad" flash,
it's up to them to overprovision enough that the
user doesn't notice what a crock the cells have become.
You see, they're getting ready to release QLC,
which stores one bit more per cell than TLC. TLC
was bad enough. What adventures will QLC bring?
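
The reason each extra bit per cell hurts is simple arithmetic: the cell
has to hold twice as many distinguishable charge levels, so the margin
between adjacent levels (and the tolerance for charge leaking off)
shrinks. A toy illustration:

# Levels a flash cell must distinguish for each bit count, and the rough
# share of the cell's charge window left between adjacent levels.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    levels = 2 ** bits              # distinguishable charge states
    margin = 1.0 / (levels - 1)     # fraction of the window per step
    print(f"{name}: {bits} bit(s)/cell, {levels:2d} levels, "
          f"~{margin:.0%} of the charge window per step")
# SLC: 2 levels with the whole window between them ... QLC: 16 levels and
# only ~7% per step -- which is why retention and the "data at rest"
# problem get worse with every extra bit per cell.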

Paul