#1
What happens when a RAID gets out of sync?
I have two drives in a RAID1 configuration using Windows software RAID.
After the problems I had with my external drive last week, I have taken to checking Disk Monitor more frequently, and today it told me that the two drives had failed sync. What happens with writes when the drives are out of sync? Are the writes still going to both drives?
#2
Tim wrote:
> I have two drives in a RAID1 configuration using Windows software RAID. After the problems I had with my external drive last week, I have taken to checking Disk Monitor more frequently, and today it told me that the two drives had failed sync. What happens with writes when the drives are out of sync? Are the writes still going to both drives?

When an array is degraded, the writes only go to one disk.

I tried my hand at experimenting with this last year, removing a drive and pretending to replace it with a "new" drive. I think I had to mirror the partitions one at a time, which is not encouraging, as it suggests the MBR on one of the disks doesn't have boot code. I don't know whether Windows is clever enough to clone the extra materials when dealing with boot OSes. Maybe I got lucky when I did this, and should have damaged the other disk instead as part of the test.

http://al.howardknight.net/msgid.cgi...nt-email.me%3E

You could do a backup of "the part that is booting right now", as a representation. But if you did that with Macrium, it probably would not be smart enough to back that up as an ordinary drive, and would instead also back up the dynamic disk metadata. You could do a backup, then restore the backup to a new drive partition by partition, then use the Macrium boot repair (from the CD) before booting that hard drive. That should allow for changing from software RAID1 back to regular single disk operation.

In any case, at the current time, the OS should have some notion of the "generation" of each disk. It should know "one disk is a loser" and "one disk is a winner". The tricky part is for you, the user, to identify by serial number which disk is which. You need this info when preparing suitable "hints" for yourself while working.
For example, when pulling a drive and "pretending it's broken", I like to boot the Win10 installer DVD, open the Command Prompt, run "diskpart", select the disk needing cleaning ("select disk 1"), then issue a "clean all", as that erases a drive from end to end and is suitable for software RAID.

For hardware RAID erasure, you take the disk drive to a computer that has a different chipset, which ignores the metadata, and then you deal with erasing the disk on that system. If you have two storage controllers, one hardware RAID capable and the other a different brand, you can move the disk over to that port and do the erasure step there.

I like to put an erased disk back in the machine, as it makes it easier to see which drive is resyncing to which other drive later. Since the above article was done with virtualized disks, I decided to proceed "head-long" into the mess and just plow through it without artificially leaving breadcrumbs for myself, and Windows seems to have left enough hints that I recovered.

When you are this clever, you should own a couple of spare hard drives of sufficient size for triage. At some point, you will scare yourself enough to do a backup, and if something nasty happens (it shouldn't), you still have files.

If you want, you can compare the dates on the files on the disks. Now, what environment will you use to do that? The partition table is likely to have a single entry of type 0x42 (dynamic disk), so the partition isn't "marked" for easy perusal (no 0x07 like a regular disk). Over in Linux, you measure the offset in sectors to the beginning of the partition in question, convert it to bytes, then do an NTFS loopback mount and apply an offset= option to it (the option takes bytes), and it will then bring up the NTFS so you can look at it. You'd want to use -o ro,... to make the mount read-only so that no damage can happen while poking around. I can guarantee you a "good time".
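The offset step is where it is easiest to slip, so here is a minimal sketch of the arithmetic and a sanity check, not a definitive tool. The function names `sector_to_offset` and `has_ntfs_signature` are made up for illustration, and 512-byte sectors are an assumption unless you pass a different size. The check relies on an NTFS boot sector carrying the OEM ID "NTFS" followed by four spaces, 3 bytes in from its start.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not from any tool.

# Convert a starting sector to the byte offset that mount's offset= expects.
# Assumes 512-byte sectors unless a second argument gives another size.
sector_to_offset() {
    local start_sector=$1
    local sector_size=${2:-512}
    echo $(( start_sector * sector_size ))
}

# Succeed if the 8 bytes found 3 bytes past the given offset read "NTFS    ",
# which is the OEM ID stored near the start of an NTFS boot sector.
# $1 is a device or image file, $2 is the byte offset of the partition.
has_ntfs_signature() {
    local device=$1
    local offset=$2
    local oem_id
    # Read 8 bytes; strip NULs so blank regions compare as an empty string.
    oem_id=$(dd if="$device" bs=1 skip=$(( offset + 3 )) count=8 2>/dev/null | tr -d '\0')
    [ "$oem_id" = "NTFS    " ]
}
```

For a partition starting at sector 2048, `sector_to_offset 2048` gives 1048576, the 1MB figure. Running `has_ntfs_signature /dev/sda 1048576` (with sudo on a real device) before attempting the mount tells you whether the offset was worked out properly, and it only reads from the disk.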
Now, how would I know about this, unless I've had some good times my own self :-/

One of my chief regrets on that mission was not writing down the stupid command for future usage. One thing I discovered is that the offset supports 64 bit numbers and can be in hex format like 0x1000000000000000. The following example is in decimal.

    sudo mkdir /mnt/somepartition

    # Mount a C: partition I know is using a 1MB offset from the origin...
    # If you worked out the offset properly, you see "NTFS" in your hex
    # editor at that location.

    sudo mount -t ntfs -o loop,ro,offset=1048576 /dev/sda /mnt/somepartition
    cd /mnt/somepartition
    ls -algtR                         # list the entire disk, have a look around

    cd /                              # step out of the mount point, or umount reports it busy
    sudo umount /mnt/somepartition    # cleanup steps
    sudo rmdir /mnt/somepartition     # you're doing manual mount point management
                                      # because no automounter would ever find such
                                      # a partition. No clicky clicky on this one.

HTH,
Paul
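Once each half of the mirror is mounted read-only this way, the "compare the dates on the files" idea can be roughed out as below. This is only a sketch under assumptions: the function names are hypothetical, the two mount points are whatever you created, and it relies on GNU find's `-printf`. The newest modification time is a hint about which disk kept receiving writes, not proof.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative.

# Print "<mtime in seconds since epoch> <path>" for the most recently
# modified regular file under a directory tree (GNU find required).
newest_file() {
    local root=$1
    find "$root" -type f -printf '%T@ %p\n' 2>/dev/null | sort -n | tail -n 1
}

# Print whichever of two trees holds the more recently modified file,
# or "same" if the newest timestamps match to the second.
newer_tree() {
    local a=$1 b=$2
    local ta tb
    ta=$(newest_file "$a" | cut -d' ' -f1)
    tb=$(newest_file "$b" | cut -d' ' -f1)
    # Drop the fractional seconds so integer comparison works.
    ta=${ta%%.*}
    tb=${tb%%.*}
    if [ "${ta:-0}" -gt "${tb:-0}" ]; then
        echo "$a"
    elif [ "${tb:-0}" -gt "${ta:-0}" ]; then
        echo "$b"
    else
        echo "same"
    fi
}
```

With the two halves mounted at, say, /mnt/disk_a and /mnt/disk_b (assumed names), `newer_tree /mnt/disk_a /mnt/disk_b` prints the mount point that saw the later write.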
#3
Tim wrote:
> I have two drives in a RAID1 configuration using Windows software RAID. After the problems I had with my external drive last week, I have taken to checking Disk Monitor more frequently, and today it told me that the two drives had failed sync. What happens with writes when the drives are out of sync? Are the writes still going to both drives?

This happens to be a data-only software RAID, drive D:\. Right now I selected 'Reactivate Disk', and Disk Manager tells me they are resyncing.

I guess my question is how Windows knows which version of a file to resync to the other drive. Is it as simple as Last Date Modified, or more complex?

As for my simplex C:\ drive, I image that about once a week, more often if I add programs, and after every Microsoft update.
#4
Tim wrote:
> Tim wrote:
>> I have two drives in a RAID1 configuration using Windows software RAID. After the problems I had with my external drive last week, I have taken to checking Disk Monitor more frequently, and today it told me that the two drives had failed sync. What happens with writes when the drives are out of sync? Are the writes still going to both drives?
>
> This happens to be a data-only software RAID, drive D:\. Right now I selected 'Reactivate Disk', and Disk Manager tells me they are resyncing.
>
> I guess my question is how Windows knows which version of a file to resync to the other drive. Is it as simple as Last Date Modified, or more complex?
>
> As for my simplex C:\ drive, I image that about once a week, more often if I add programs, and after every Microsoft update.

The RAID driver has a chance to observe which device failed. It can mark the working device as an "orphan", and at the same time keep track of the "failed" partner that goes with it. If the "failed" partner comes back with different information, something like a time stamp on the metadata keeps track of that.

You should be able to split an array, run the driver against each disk individually, and convince each disk "it is the boss". All this will achieve is two "degraded arrays" having different drive letters or so, and then the human will have to resolve the issue by torching one disk and syncing back from the good disk.

But for actual, honest to God disk failures, the software should do a good job of keeping the functional copy and its files up to date. It's only if you craft an artificial trap with some care that you can create a "puzzle" for yourself.

And yes, there is such a failure mode as a RAID mirror with "good status" that hasn't actually been mirroring for the last three months. One thing that strikes me is the lack of audits. There should be scrubbing going on at the driver level, searching for sync loss. Even if such an audit contributed nothing to actual file recovery, at least it could give a prompt, warning of an "impossible" failure.

Best guess,
Paul
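Lacking a driver-level scrub, a crude offline audit can be approximated by hand. The sketch below is not how Windows does anything; it just compares the same byte range on two mirror members (or image files taken from them) using GNU cmp's `-i` (skip) and `-n` (limit) options. The function names are hypothetical, and it assumes the disks are not being written to while it runs. The range arguments are there because per-disk metadata on each member is expected to differ, so only the data extent is worth comparing.

```shell
#!/usr/bin/env bash
# Hypothetical audit helpers; the names are illustrative.

# Succeed if LENGTH bytes starting at SKIP are identical on both members.
# Usage: mirror_range_matches MEMBER_A MEMBER_B SKIP_BYTES LENGTH_BYTES
mirror_range_matches() {
    local member_a=$1 member_b=$2 skip=$3 length=$4
    cmp -s -i "$skip" -n "$length" "$member_a" "$member_b"
}

# Print how many bytes differ within the same range (0 means in sync).
# cmp -l emits one line per differing byte, so this can be slow and
# noisy on members that have drifted far apart.
count_mismatched_bytes() {
    local member_a=$1 member_b=$2 skip=$3 length=$4
    cmp -l -i "$skip" -n "$length" "$member_a" "$member_b" 2>/dev/null | wc -l
}
```

A non-zero count on a mirror that reports "good status" would be exactly the kind of prompt described above, warning that something "impossible" has happened.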