#16
New CU/Mobo, Same System: Any Hope ?
On Mon, 05 Dec 2016 13:59:31 -0500, "(PeteCresswell)" wrote:

As far as subsystems go, I am in the process of doing an HD Tune "Benchmark" test for each of the 15 drives. Only have 5 done so far, but I am pretty sure I have already found one problem drive:
http://tinyurl.com/jltzr3z
https://photos.google.com/album/AF1Q...3DbHoWhO-2-SyX

I get a 404.

--
Char Jackson
#17
Per Char Jackson:
As far as subsystems go, I am in the process of doing an HD Tune "Benchmark" test for each of the 15 drives. Only have 5 done so far, but I am pretty sure I have already found one problem drive:
http://tinyurl.com/jltzr3z
https://photos.google.com/album/AF1Q...3DbHoWhO-2-SyX

I get a 404.

Arrrgh.... I failed to set Permission/Album Sharing.

Just as well, though, because it has since dawned on me that DriveBender was running its Drive Balancing process - which has quite a large effect on the results.... trend lines look about the same, but huge dips in access speed....

Finally figured out how to turn off the balancing, waited for DB to quiesce, and am re-running all the benchmarks.

Stay tuned....

--
Pete Cresswell
#18
Per (PeteCresswell):
I failed to set Permission/Album Sharing.

Now it's set to Public.
http://tinyurl.com/hvvs7un
https://photos.google.com/album/AF1Q...zizgHxzYxdVL0g

Note Disk04: I re-ran a couple of times and found the results replicable.

To me, the SMART numbers for Disk04 and Disk05 (05 being 'Normal') do not look significantly different:
http://tinyurl.com/hgddrnq
https://photos.google.com/share/AF1Q...UUlFb0MtN XRB

Unless I'm missing something, it would seem that the HD Tune Benchmark cuts right to the chase - that is assuming my interpretation of HD Tune's Benchmark for Disk04 as being substandard is correct.... which might be a pretty big assumption.... -)

--
Pete Cresswell
#19
Per (PeteCresswell):
To me, the SMART numbers for Disk04 and Disk05 (05 being 'Normal') do not look significantly different:
http://tinyurl.com/hgddrnq
https://photos.google.com/share/AF1Q...UUlFb0MtN XRB

Which tells you that I don't know what I am looking at.... because DriveBender says "SMART Status = Fair" for that drive.... and "...Excellent" for all the others.

--
Pete Cresswell
#20
(PeteCresswell) wrote:
Per (PeteCresswell):

To me, the SMART numbers for Disk04 and Disk05 (05 being 'Normal') do not look significantly different:
http://tinyurl.com/hgddrnq
https://photos.google.com/share/AF1Q...UUlFb0MtN XRB

Which tells you that I don't know what I am looking at.... because DriveBender says "SMART Status = Fair" for that drive.... and "...Excellent" for all the others.

I can "see" your SMART photo, but still haven't laid eyes on your HDTune sequential tests. I'm getting a Google login prompt for the sequential picture.

That's a peculiar pattern, having 231 in Current Pending and nothing showing in Reallocated. The Current Pending are not "harvested" until a write attempt on the damaged sector is made.

If you have a "quiet" computer to test on, then the stuff you see in HDTune sequential transfer test could be "real".

To allow all the Current Pending to be processed, I would use "dd" to transfer all the sectors to a file. Then use "dd" to write them back. I don't know if there's a nice GUI utility to do that or not. I had to "perk up" a 2TB drive here a few weeks ago, and it took at least 10 hours. Tools like ddrescue can sometimes get the job done a tiny bit faster, if your copy of dd simply cannot run the drive at full speed.

You have to make a selection of block size (for "dd"), and as a purist, I like to make sure the block size divides evenly into the drive size in bytes. A value of 8192 works in most cases. On older drives, I might get a better result with 8192*27 = 221184 byte block size.

What should happen after "re-writing" the disk, is the Current Pending should drop to zero, and some portion of them could end up in Reallocated. Current Pending goes "up and down" because it's a queue of work to do, whereas Reallocated only goes "up".

The HDTune transfer curve will "look better" after perking up the drive, if the new reallocations are within the same track as the bad sectors they replace. And that's why it "Perks Up" in some cases. If a spare sector was on a different cylinder, the transfer curve would remain dreadful.

The count of 231 does show "trouble" is brewing, and if your HDTune shows a bad spot, then that could be affecting your read bandwidth, or, require the disk to take as long as 15 seconds to read a sector in that area.

Only after perking the drive up, and taking a look at both the benchmark and the SMART again, would I make a final judgment on whether it's replacement time.

Macrium has an Exact Copy option. You could back up the 2TB drive to a larger drive, and store the output as an MRIMG. Then turn around and use the Exact Copy to put all the sectors back.

But as I've noted before when it comes to Backup/Restore utilities, many of them lie about what they're doing. When Acronis did an Exact Copy and it only took ten minutes, I knew right away, it wasn't reading all the sectors. And consequently was not an Exact Copy. Only if it takes hours to run (as many hours as it takes HDTune to do a bad block scan), would I be satisfied that it "might be Exact".

And this is why I use "dd" for this stuff, because it never lies about the only thing it does (well).

http://www.chrysocome.net/dd
http://www.chrysocome.net/downloads/dd-0.6beta3.zip

Utilities like "dd" need an Administrator Command Prompt window. The Windows "dd" sometimes has permissions problems for individual partitions. But your operation in this case, is an entire disk, signified by using "Partition0". The Partition0 identifier means "the whole disk starting at Sector 0". And you should have access to that.

   dd if=\\?\Device\Harddisk3\Partition0 of=T:\WORK\2TB_sized_file.dd bs=8192

Five hours later...

   dd if=T:\WORK\2TB_sized_file.dd of=\\?\Device\Harddisk3\Partition0 bs=8192

If the first command stopped on a read error, then you'd be looking at ddrescue, or looking at your backup collection, and so on. My assumption in all this, is the drive is still healthy enough to pass a read scan with no errors.

   Paul
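The block-size point in the post above (pick a "bs" that divides evenly into the drive size in bytes) can be checked with a few lines of arithmetic before starting a multi-hour copy. This is only a sketch: the drive size used here is a hypothetical byte count for a nominal 2TB drive, not a figure from the thread, and the function names are made up for illustration. Substitute the byte count your own tools report for the disk.

```python
# Sketch: check whether a candidate dd block size divides a drive's size evenly.
# EXAMPLE_DRIVE_BYTES is an assumed, illustrative value (3,907,029,168 sectors
# of 512 bytes); it is not taken from the thread.
EXAMPLE_DRIVE_BYTES = 3_907_029_168 * 512


def divides_evenly(drive_bytes: int, block_size: int) -> bool:
    """True if the drive is a whole number of blocks of this size."""
    return drive_bytes % block_size == 0


def block_report(drive_bytes: int, block_size: int) -> tuple[int, int]:
    """Return (whole blocks, leftover bytes) for this block size."""
    return divmod(drive_bytes, block_size)


if __name__ == "__main__":
    # 8192 and 8192*27 are the two sizes mentioned in the post; 1 MiB is an
    # extra candidate added here for comparison.
    for bs in (8192, 8192 * 27, 1024 * 1024):
        blocks, leftover = block_report(EXAMPLE_DRIVE_BYTES, bs)
        print(f"bs={bs}: {blocks} blocks, {leftover} bytes left over")
```

A leftover of zero means the last block is a full one, which is the "purist" condition described in the post. A nonzero leftover only means the final transfer is a partial block.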
#21
Per Paul:
but still haven't laid eyes on your HDTune sequential tests. I'm getting a Google login prompt for the sequential picture.

Try this one: http://tinyurl.com/hz8ubf4

Moved to a different album - which others have been able to see.

--
Pete Cresswell
#22
(PeteCresswell) wrote:
Per Paul:

but still haven't laid eyes on your HDTune sequential tests. I'm getting a Google login prompt for the sequential picture.

Try this one: http://tinyurl.com/hz8ubf4

Moved to a different album - which others have been able to see.

Based on how quiet the other drives are (no evidence of interference from background processes), I'd say Disk_04 has issues.

You could swap in a drive and rebuild. Then take the flaky disk to another computer and work on it.

If you do it that way (rebuild with a new drive), you can use the "diskpart" "Clean All" command while the old disk is selected, and write zeros from end to end. That would be an example of a way to get the Current Pending processed and disposed of one way or another. Then try the HDTune benchmark as well as examine the SMART tab and see how it looks.

Under normal circumstances you would expect most of those to be turned into Reallocated sectors. And with any luck, there will be enough that the Health will drop from 100% to 98%, and you can do the math and figure out how many spares are left. Sometimes, you can have a few Reallocated showing, the Health remains 100%, so you can't figure out that there are around 5000 remaining.

*******

Rewriting the disk using the data contents already on it, would take longer as a revival technique, unless you can find a utility meant for the job. Doing it with "dd" is clumsy but doesn't require writing any code.

   Paul
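The "do the math" step in the post above can be sketched as follows. The big assumption, which the thread does not confirm, is that the Health percentage falls linearly with the Reallocated count, so that a 2% drop means 2% of the spares are used. The function name and the sample numbers are hypothetical.

```python
# Sketch: estimate spare sectors from Reallocated count and Health percentage.
# ASSUMPTION (not confirmed in the thread): Health drops in proportion to the
# fraction of spare sectors consumed.

def estimate_spares(reallocated: int, health_percent: int) -> tuple[int, int]:
    """Return (estimated total spares, estimated spares remaining)."""
    used_percent = 100 - health_percent
    if used_percent <= 0:
        # Matches the case described in the post: a few Reallocated sectors
        # but Health still at 100%, so there is nothing to divide by.
        raise ValueError("Health is still 100%; cannot estimate the spare pool")
    total = reallocated * 100 // used_percent
    return total, total - reallocated


if __name__ == "__main__":
    # Hypothetical reading: 100 Reallocated sectors with Health at 98%.
    total, remaining = estimate_spares(100, 98)
    print(f"about {total} spares in total, about {remaining} remaining")
```

Treat the result as a rough order-of-magnitude figure. Each vendor computes its Health number in its own way, so the linear model may simply not hold for a given drive.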
#23
Per Paul:
You could swap in a drive and rebuild. Then take the flaky disk to another computer and work on it.

Pulled the disk.

Re-formatted it on my Windows-7 PC - without "Quick Format".

Took a couple hours, but now SMART seems to be saying there are no problematic sectors and Hard Disk Sentinel says "The hard disk status is PERFECT. Problematic or weak sectors were not found and there are no spin up or data transfer errors."

--
Pete Cresswell
#24
(PeteCresswell) wrote:
Per Paul:

You could swap in a drive and rebuild. Then take the flaky disk to another computer and work on it.

Pulled the disk. Re-formatted it on my Windows-7 PC - without "Quick Format". Took a couple hours, but now SMART seems to be saying there are no problematic sectors and Hard Disk Sentinel says "The hard disk status is PERFECT. Problematic or weak sectors were not found and there are no spin up or data transfer errors."

All I can tell you, is the standard recipe.

1) Check SMART for Current Pending or Reallocated.
2) Benchmark the drive again. Are the spikes gone ?

If both pass, you can put the disk back into service.

I've had this happen here a couple of times, and re-writing the disk seems to perk things up. But you know that the automatic repair system inside the disk has no "undo" for repairs, so it will eventually use up the spare sectors.

Only SCSI allows removing Grown defects, starting over with the Factory defect list, and letting the drive redetect the bad sectors. That was a pretty good scheme, but wouldn't be all that practical with 4TB sized disks. (The list would be huge.)

   Paul
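The two-step recipe in the post above can be written down as a small checklist function. This is a sketch under stated assumptions: the post does not say exactly what counts as a "pass" for step 1, so the code assumes it means Current Pending has returned to zero, and it reports the Reallocated count for information only. All names are hypothetical.

```python
# Sketch of the two-step "standard recipe" as a checklist.
# ASSUMPTION: step 1 "passes" when Current Pending is zero. Reallocated is
# reported but does not fail the check, since some reallocation is the
# expected outcome of a full rewrite.

def recipe_verdict(current_pending: int, reallocated: int, spikes_gone: bool) -> str:
    """Return a one-line verdict for a drive after it has been rewritten."""
    if current_pending > 0:
        return f"not yet: {current_pending} sectors still in Current Pending"
    if not spikes_gone:
        return "not yet: benchmark still shows spikes, re-test on a quiet machine"
    return f"back into service (Reallocated = {reallocated}, keep watching it)"


if __name__ == "__main__":
    # Hypothetical readings, not values from the thread.
    print(recipe_verdict(current_pending=0, reallocated=12, spikes_gone=True))
    print(recipe_verdict(current_pending=231, reallocated=0, spikes_gone=False))
```

Recording the Reallocated count at each check makes it easier to see later whether the drive is steadily using up its spare sectors.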
#25
Per Paul:
2) Benchmark the drive again. Are the spikes gone ?

If both pass, you can put the disk back into service. I've had this happen here a couple of times, and re-writing the disk seems to perk things up. But you know that the automatic repair system inside the disk, it has no "undo" for repairs, so it will eventually use up the spare sectors. Only SCSI allows removing Grown defects and

Yes... spikes are gone.

I put it back into service - but am ready for a future failure when the extra sectors are used up.

--
Pete Cresswell
#26
Per (PeteCresswell):
Yes... spikes are gone. I put it back into service -

Oops. Looks like I lied about the spikes.

They are much less heinous, but some are still there and the pattern can be replicated: http://tinyurl.com/h3sm3ak

Just to make sure I have this right: are the sectors of the disk read in the same sequence for each iteration of the test?

Seems like formatting helped somewhat - since the spikes were all over the tests before and only towards the ends now... ??

--
Pete Cresswell
#27
(PeteCresswell) wrote:
Per (PeteCresswell):

Yes... spikes are gone. I put it back into service -

Oops. Looks like I lied about the spikes. They are much less heinous, but some are still there and the pattern can be replicated: http://tinyurl.com/h3sm3ak

Just to make sure I have this right: are the sectors of the disk read in the same sequence for each iteration of the test? Seems like formatting helped somewhat - since the spikes were all over the tests before and only towards the ends now... ??

Was that test spoiled by some OS activity ?

Re-run the test.

If those spikes are for real, that disk is "retired"... At least at my house it would be.

I have days here, where I have a hell of a time getting a "quiet" test done, so the OS doesn't screw it up. I've even had days where the OS scared the **** out of me, by showing me test results that looked like I was in immediate danger of data loss. And moving the drive to another machine, different OS, and it's all back to normal.

Don't toss the drive until you're thoroughly convinced the results are real. If it really does look like that, I'd retire it. Those spikes are a little too wide.

Your software array might have redundancy, but you can't run an array like that with all the disks being spike-monsters. As more than one drive could drop out on you when you aren't watching it.

The bad part would be, if frequently accessed info happens to live at the bottom of one of those spikes :-)

   Paul
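One way to tell "spoiled by OS activity" from "spikes are for real" is to compare several runs and keep only the dips that land in the same place every time. The sketch below assumes each run has been reduced to a list of transfer rates sampled at the same positions across the disk. That is an assumption about the data you have on hand, not something HD Tune is known to export in this form, and the threshold of half the median is an arbitrary illustrative choice.

```python
# Sketch: find benchmark dips that repeat at the same position in every run.
# ASSUMPTIONS: each run is a list of MB/s samples taken at the same disk
# positions, and a "dip" is any sample below threshold * median of that run.
from statistics import median


def dip_positions(samples: list[float], threshold: float = 0.5) -> set[int]:
    """Indices where the rate falls below threshold * the run's median."""
    cutoff = threshold * median(samples)
    return {i for i, rate in enumerate(samples) if rate < cutoff}


def repeatable_dips(runs: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Indices that are dips in every run, in ascending order."""
    if not runs:
        return []
    per_run = [dip_positions(run, threshold) for run in runs]
    return sorted(set.intersection(*per_run))


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    runs = [
        [100, 100, 20, 100, 95],
        [100, 30, 25, 100, 98],   # extra dip at index 1: likely background noise
        [99, 100, 10, 100, 97],
    ]
    print(repeatable_dips(runs))
```

A dip that shows up in only one run points toward background activity. A dip that survives across runs, and ideally across machines, is the kind worth retiring a drive over.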
#28
Per Paul:
Was that test spoiled by some OS activity ?

That's what I was hoping, but managed to replicate it 3x with spikes in the same place (by memory...not screen shots).

Re-run the test.

I'll give it a few more tries and save a screen snap just to make sure.

I've even had days where the OS scared the **** out of me, by showing me test results that looked like I was in immediate danger of data loss. And moving the drive to another machine, different OS, and it's all back to normal.

My NAS box just did that to me yesterday: declared a 3-TB drive unfit for duty and took it out of the RAID-6 array. Formatted it on my 24-7 PC, could not find anything wrong... so I replaced one of the 2-TB RecordedTV drives on the 24-7 PC with it.

Don't toss the drive until you're thoroughly convinced the results are real. If it really does look like that, I'd retire it. Those spikes are a little too wide. Your software array might have redundancy, but you can't run an array like that with all the disks being spike-monsters. As more than one drive could drop out on you when you aren't watching it. The bad part would be, if frequently accessed info happens to live at the bottom of one of those spikes :-)

--
Pete Cresswell
#29
Per Paul:
Was that test spoiled by some OS activity ?

I've got one drive that *seems* slightly different from the rest.

"The Rest" have benchmarks where the graph is much tighter as in http://tinyurl.com/hz8ubf4

The suspect drive's graphs aren't all that out of whack, but they are noticeably different: http://tinyurl.com/h43jkcs

Most of those were done more-or-less back-to-back on the same machine.

Assuming it is not system activity, are they telling you anything?

--
Pete Cresswell
#30
(PeteCresswell) wrote:
Per Paul:

Was that test spoiled by some OS activity ?

I've got one drive that *seems* slightly different from the rest. "The Rest" have benchmarks where the graph is much tighter as in http://tinyurl.com/hz8ubf4

The suspect drive's graphs aren't all that out of whack, but they are noticeably different: http://tinyurl.com/h43jkcs

Most of those were done more-or-less back-to-back on the same machine. Assuming it is not system activity, are they telling you anything?

Ouch!

I especially liked the drive that reported a "burst rate of 5MB/sec". Um, OK. Of all the things that should work on a disk, the cache RAM should work. Which is what the burst rate test is supposed to be testing.

I would say there's a rather severe interference there. Look at the "snow shower" of seek times on some of those plots.

Maybe you're minting Bitcoins in the background ? :-) (That's a reference to malware that does Bitcoin mining using your CPU as a host.)

I think the nicest looking plots I've got here, were collected on a Core2 Duo running on a VIA chipset, with Win2K as the OS. But it's been some time since I've seen HDTune results as bad as yours in my computer room here.

One thing that can interrupt a computer, is the BIOS SMM. That's System Management Mode. The BIOS can deliver an interrupt (SMI), and the OS:

1) Is incapable of logging it.
2) Cannot detect that it happened.

It has a higher priority than the OS. The Asus iPanel tray display used to hook up to an actual SMI header plug on the motherboard, and the BIOS would run code to make the iPanel work. But on more modern platforms, the companies are more content to run TPU or EPU using SMM. So if the number of phases enabled on VCore needs to be adjusted, the BIOS could be doing that 30 times a second. If the SMM time is "more than microseconds", it can make a motherboard unusable as an audio recording workstation.

The magnitude of your problem is many times that of an SMM problem. So we would have to cast around for reasonable excuses for your results. Runaway hardware ? Interrupt storm ?

There are various indirect ways of detecting some of these things. For example, with older OSes, we could run "DPCLat" to measure DPC latency, and detect that SMM was screwing up the timely servicing of Deferred Procedure Calls.

If this happened in my computer room, I'd sooner move the disk to a machine I trust, than try and figure out the root cause. It might not be that easy to figure out.

   Paul