Curse Intel and the RAID they rode in on!

mike123abc

WARNING: Sleep deprived rant follows:


I had a hard drive go out. The computer went into a funny state and corrupted my RAID.

My computer had C: as a normal disk, then I had D: as a 3 disk RAID5.

My C: disk went out partially. I assume it was a head crash over part of the disk, because it can read part of the disk and the rest it cannot. I tried it in another machine, same problem. It is readable enough to start Windows but not to fully boot up.

Well windows gets partially up and you know how windows is, it loves to write on every disk to say hello I am here, I am windows, and you love me. I did not boot enough to load the Intel raid driver, but I see you raw empty spacious disk, let me mark you so I know you next time I see you... You are mine and you love me!

My machine was probably stuck in a boot blue screen cycle while I was gone for a couple of days (bye bye tons of FAH points). Along the way it corrupted the RAID5. Now 2 of the 3 disks are marked as non-RAID. Intel has no way to recover from this situation. The disks are intact, but they cannot be put back into the array. Probably 1 sector is messed up; the other 750GB on each disk is intact.

I got a recovery program off the net and tested it out, and it worked. It took all of 10 seconds to recover the array, but of course it was a demo, so it would not actually recover anything unless you pay. So I had to buy it to get the full version.

I guess the moral of this story is get an external standalone raid box so that windows cannot get in there and muck with it and hope it is more robust than the ZERO recovery options of the Intel raid system.

I should have known the "free" raid setup on motherboards was useless. Yes it works, but at the first sign of trouble you are left holding the bag.
 
There's a reason it's called "fakeraid". It's only really useful to run mirrored drives. (raid1)
 
How much is Intel's fault, and how much Windows'? Not sure there's just one bad guy here.
 
90% Intel. They make real RAID hardware for server-level boards and then, with a BIOS trick (fakeraid), allow the software (a Windows DLL) to think it's talking to a real controller on some entry-level MB.

It's much better when the software knows there's nothing behind the mask, as with Linux software RAID.

Intel® RAID Controllers
 
How much is Intel's fault, and how much Windows'? Not sure there's just one bad guy here.

They really are both at fault. Intel should never have allowed a random write to the device outside of the raid context. MS really does not need to put its stamp of approval on any storage device it happens to find; it really should ask.

As an update, 5 hours later about 50% of the data has been copied off the raid. It is just a process. I hope to have it completely recovered by tonight.

I am now considering an external raid box like Promise's NS4600: Intel's Tolapai Enables Better Network Performance : Promise Updates Its NAS Platform - Review Tom's Hardware

Yes it has an Intel controller in it, but it hopefully is isolated from windows and the BIOS. It should do well with the eSATA connection, probably better since it has hardware XOR parity calculations.
 
Hardware redundancy is not the same as having a backup, as you have discovered. The ability to rebuild your RAID from the parity drive and remaining data drives should be thought of as a feature to reduce system downtime, not as a substitute for a backup.
 
I had a backup, and the only thing I might have lost was a couple recent iTunes songs. What bugged me so much was that the data was just sitting there undisturbed and the Intel/MS combo had rendered it invisible.

I got a utility off the net, which quite frankly should be something that Intel provides, to read the drives as the array and copy off all the files. I did not lose any files; it was just a long hassle of trying out 5 demo programs off the net and finally figuring out which one could actually do the job, and I must say it does it quite well. I paid for the program and recovered all the raid contents.

Again, it is just frustrating that Intel, which is not a minor player in the raid market, has a complete lack of recovery options in its products. The only recovery it seems to have is the ability to rebuild a RAID5 system if one disk fails and the others are still in the array. If the drives are sitting there intact but are not marked as members of the array, Intel will not even consider looking at them.
 
Well, I ordered the NS4600. Hopefully it will keep the RAID away from Windows and the lame Intel Matrix Storage Manager. I will use the drives in a raid 5 plus spare drive configuration. I considered the raid 6 solutions, but none seem to combine a reasonable price with the built-in servers I want (e.g. iTunes, DLNA, etc.). I have the raid completely restored on a spare disk and have removed the raid drives from the system pending the delivery of the raid box.

BTW the system C: drive was replaced by a Patriot Memory 256 GB Torqx SSD. So far I am loving the performance increase. It replaced a 150GB Raptor drive (which is the one listed above as the partial failure).

While I am at it, it seems like a good time to back up the restored system to a USB HD. Luckily it still fits on a 1TB external HD, but it is getting close to needing more. The moral of this saga is not to trust Intel raid on the motherboard to keep your data.
 
Nothing wrong with Intel hardware. I've got SCSI systems running MegaRAID (i960) cards with Intel controllers that have been online 24/7 for 10+ years.
 
There are lots of people who swear that the Intel Matrix RAID (the kind you find built in the bios of various motherboards) is crap.

I've found it can be quite reliable, when paired with server-quality RAID certified drives such as the WD RE2 or RE3 series. Even quite reliable with the not quite server-quality but not quite cheap-crap-quality either drives such as the WD "Caviar Black" series, when enabled for Time Limited Error Recovery using the wdtler utility you can find. Don't risk it with the cheaper "Caviar Blue" or "Green" drives which just aren't good enough for reliable RAID and/or are too slow. (side note: the Matrix RAID tends to drop a drive from the array because it takes too long to recover from an error condition and the controller thinks it's gone AWOL. The TLER is an attempt to work around this by reducing the amount of time the drive can spend trying to fix a write error. Instead of spending 12 seconds trying to find an alternate sector to write something to, in the case of RAID it is more desirable to fail the write entirely, so the controller can handle it.)

Case in point, I've got a home server running Win2008 Hyper-V with several virtual machines. All of this runs on an ECS P45T-A board which has Intel's ICH10R. I have two mirror sets: a set of WD2500YD, which are the Raid Edition drives, for the OS and a few VMs that need super-fast disk; and a set of WD1001FALS "Caviar Black", which I ran wdtler on, for data storage and less disk-intensive VMs. Not a bit of trouble from them in almost a year of runtime so far.

Over the same period, I've seen a couple of machines with the older ICH7R, and various drives like the WD5000KS, also wdtler modified, which continue to drop drives out of the array and have to be rebuilt. Thankfully since I only use mirrors there has always been one of the two that was bootable and it could automatically rebuild the other from it.

By the way I learned the hard way a couple years ago never to trust Matrix RAID with a RAID-5 set. At the time, I did the math of MTBF vs array rebuild time, and determined that it was inevitable: using less than the best server-quality RAID drives in a RAID-5 set of three or more drives virtually guarantees you will sooner or later have a situation where an array that's rebuilding because one of the drives failed has a second drive failure before the rebuild can complete, which means you lose the entire array (extremely frustrating with Matrix RAID because we knew the data was all still there as it was not a drive failure but a drive dropped from the array because the controller got tired of waiting, but still it could not be recovered since the Intel ROM has no "force member disk online" option like a real RAID controller has). This was before I learned the bit about TLER, so RAID-5 might not be so dangerous, but still not something I have any desire to experiment with.
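
As a rough illustration of the MTBF-versus-rebuild-time math mentioned above, here is a minimal Python sketch. It assumes drives fail (or drop out) independently at a constant rate, which is a simplification, and every number in it is a hypothetical placeholder rather than a figure from this thread. The function name `second_failure_probability` is made up for the example.

```python
import math


def second_failure_probability(remaining_drives, rebuild_hours, mean_hours_between_failures):
    """Chance that at least one remaining drive fails or drops out before the rebuild ends.

    Assumes independent drives with a constant failure rate (exponential model).
    """
    combined_rate = remaining_drives / mean_hours_between_failures
    return 1.0 - math.exp(-combined_rate * rebuild_hours)


if __name__ == "__main__":
    # Hypothetical inputs: a 3-disk RAID5 has 2 survivors; the rebuild is assumed to take 24 hours.
    scenarios = [
        ("datasheet-style MTBF", 500_000),
        ("drive that drops out often", 2_000),
    ]
    for label, hours in scenarios:
        p = second_failure_probability(2, 24, hours)
        print(f"{label}: {p:.4%} per rebuild")
```

With a datasheet-style MTBF the per-rebuild risk comes out small. It grows quickly when the effective time between drop-outs is much shorter, which is the situation described above, where drives get kicked out of the array on timeouts rather than actually dying.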
 
I had WD RE3 drives in the array. I copied the contents of the array to another hard drive and have that in the system now until my raid NAS arrives. My C: drive was a raptor and it is toast. Essentially parts of the drive can be read, if other parts are attempted it hangs the system a long time before failing. It keeps the drive light on during this which leads me to suspect it tied up the system so long that the MATRIX thought the other drives had failed.

At least there was software that let me reconstruct the array and read all the information off of it. I could have lived with a month old backup, but still it is nice to have everything back. I had to settle for the old backup of the C: drive. Only one program had to be reinstalled. The issue of course is that it was an "activated" program so now I have to go through the hassle of reactivating it with the company tomorrow since of course it says that it is already activated.
 
Well, I now have the dedicated Promise NS4600 up and running. I have raid 5 across 3 WD RE3 disks and an automatic spare 4th disk set up, so if any of the 3 fail it automatically starts rebuilding the array.

It has a bunch of built in servers including DLNA and iTunes.
 
With 4 identical disks why not just two RAID1 setups?
If you need faster read/writes, why not RAID01 with two pairs?

Both those will offer same amount of storage and better data protection: with your setup if two drives fail, everything is gone.

Diogen.
 
It does offer raid 10. I got the drives from different vendors and at different times so I hope that it does not mean they will fail about the same time. I was torn between raid 5 with automatic drive replacement and raid 10 with the ability to recover from 1 maybe 2 disks going out. I picked raid 5 with backup so that I would know if one drive failed and was able to get an automatic recovery.

Another thought was paying 2x as much for one that offered raid 6. I decided I would just back this one up every now and then.

Nothing seems to be an ideal situation at a low cost. I would like to see a raid 6 with spare solution at the low end. Just like it is so hard to find a 10Gb Ethernet solution that is inexpensive.
 
Nothing seems to be an ideal situation at a low cost.
You seem not to be interested in low cost too much - the raid edition WD drives you use are at least 50% more per GB than non-RE drives...:)

Regardless, with RAID it's just statistics (and what you are after).
All the configurations mentioned can survive 1 drive failure without any losses. Let's consider failure of two.

From combinatorics, with 4 drives there are in total 6 possible combinations of two: 4!/2!/(4-2)! = 24/4 = 6

Let's say the drives are A, B, C, D. The D is the hot spare in your case. Possible combinations of 2: AB, AC, AD, BC, BD, CD.
Three failures of two drives - AB, AC, BC - kill all the information you have. That makes the "fatal rate" 50% (3 out of 6 possible combinations).

Consider RAID01: AB and CD are each RAID0 by itself and RAID1 of each other (A=C, B=D).
Only two failures of two drives - AC and BD - kill all the information. That makes the "fatal rate" 33% (2 out of 6 possible combinations).

RAID6 would obviously be the best: with dual parity, a 4-drive RAID6 survives the failure of any two drives, so none of the 6 pairs is fatal.

The 2 independent RAID1 setups are in a different league: you never lose everything unless all 4 drives die. Two of the six pairs (the two mirrors themselves) each take out half the content if they fail. The rest are harmless. The "half-fatal rate" is 33%.
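
The pair counting above can be checked by brute force. The following is a minimal Python sketch, not anything from the NS4600 itself: it assumes both drives fail at the same time (so the hot spare has not finished rebuilding), that a mirrored layout stays readable as long as one copy of each half survives, and that the two independent RAID1 sets are A+B and C+D. All function names are made up for the example.

```python
from itertools import combinations

DRIVES = "ABCD"


def raid5_with_spare_lost(failed):
    # A, B, C form the RAID5 set; D is the hot spare and holds no data yet.
    return len(failed & {"A", "B", "C"}) >= 2


def mirrored_pairs_lost(failed):
    # RAID01 as described above: A=C and B=D. Data is gone once both copies of one half fail.
    return {"A", "C"} <= failed or {"B", "D"} <= failed


def two_raid1_half_lost(failed):
    # Two independent mirrors, assumed to be A+B and C+D. Losing one whole mirror loses half the content.
    return {"A", "B"} <= failed or {"C", "D"} <= failed


def bad_pairs(is_lost):
    """Return every two-drive failure for which the given layout loses data."""
    return ["".join(pair) for pair in combinations(DRIVES, 2) if is_lost(set(pair))]


if __name__ == "__main__":
    total = len(list(combinations(DRIVES, 2)))
    layouts = [
        ("RAID5 + spare (fatal)", raid5_with_spare_lost),
        ("RAID01 (fatal)", mirrored_pairs_lost),
        ("2 x RAID1 (half lost)", two_raid1_half_lost),
    ]
    for name, check in layouts:
        pairs = bad_pairs(check)
        print(f"{name}: {pairs} -> {len(pairs)} of {total}")
```

This only counts which pairs are bad. It treats every pair as equally likely and says nothing about how likely a double failure is in the first place.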

As a general observation: in my experience RAID5 is used most often with 5 to 8 drives. With a higher drive count RAID6 is essentially a must.

Diogen.
 
Well, I now have the dedicated Promise NS4600 up and running. I have raid 5 across 3 WD RE3 disks and an automatic spare 4th disk set up, so if any of the 3 fail it automatically starts rebuilding the array.

It has a bunch of built in servers including DLNA and iTunes.

Took a few books of green stamps? :p

I'm just jealous.
 
You seem not to be interested in low cost too much - the raid edition WD drives you use are at least 50% more per GB than non-RE drives...:)

I guess costs are relative. This box cost $443 plus the cost of the drives. To get a box that could do raid 6 across more drives suddenly jumps in costs to the multi thousands. I would love to have a raid 6 over 5-6 drives.

I guess an alternative is build my own PC with a raid card and have it dedicated to serving the raid.

For backup of important data I use Amazon S3 service plus USB HDs. I just want a fairly reliable solution for the house that I do not have to worry about.

Essentially my solution is to have an SSD for the Windows drive, then use a hard drive in the PC for all my stuff (mostly media files, pictures, movies, etc.) and mirror it over to the array. And back up every now and then to a USB drive. Mainly the USB drive is for fast recovery of Windows and all the installed software.

My next worry is that my old video tapes are not going to be readable very much longer, so having them all digitized on the PC is good. The next problem is trying to prevent the loss of that data. Yes, I can still recover the data from tape, but for how long? Plus it is a real time affair; I do not like taking a week to read in all the movies and months scanning in all the negatives.
 
... Plus it is a real time affair, I do not like taking a week to read in all the movies and months scanning in all the negatives.
Well, you know this has to be done and there is no way around it.
Spending some time planning such projects and using the right equipment would certainly pay off.

RE: data loss. One picture doesn't need more than 10MB. A collection of 10,000 will be 100GB.
For $100 you can have 10 copies of your collection. That's cheap. And will get only cheaper.

I don't believe in the value proposition of RAID setups at home.
Get the "value" priced drives (WD Green) and keep everything (archived!) in at least 2 copies. Upgrade with every second generation.

Diogen.