Lots of Database errors

Status
Not open for further replies.

yaz96

Baby, It's Cold Outside
Original poster
Dec 22, 2005
12,829
1
Front Range, Colorado
When trying to switch between pages, I'm getting a lot of database errors today.

And the site could not be found a few times over the past couple of days.

Thought I'd let you know.
 
The database errors were due to our ISP accidentally unplugging us from the database server when they were looking at something for me.

Since late Thursday night we have been having a problem with our web server. At times the site would just die and the screen would fill with page fault errors.

Larry suspected something was wrong with one of our drives early on, but our RAID cards reported no problems... until this morning, when one reported that one of the drives had bad sectors on it. Now I know which drive is going bad. I ordered a new drive and am having it sent overnight to Dallas, and it will be replaced tomorrow.

Hopefully this fixes the issues we have been seeing. This is the second drive to fail on it this year. So much for server grade drives. :)
 
Yup, the database errors started at 5 and were fixed as soon as I was notified of the issue. :)

The other issue, as I said, has been going on since Thursday night.
 
No, only images are sent from the cloud servers.

The issue is not the RAID server; the issue is the hard drives failing. They are mechanical, so they will fail. Got to love these "Server Grade" drives. :)
 
They are not big enough yet.

Besides, I have had 2 SSD drives fail on me due to controller board failure... They died faster than a standard drive.
 
Yup, that's true, Hall... (I use SSD drives on the computer in my truck, since the vibration of the truck quickly killed my regular hard drives.)

With a big RAID 10 array like we have, we did not know which drive was bad until the RAID card reported that one of the drives was starting to have sector errors.

Until then we couldn't do anything, as we did not know which drive to replace in the array.
 
What kind of RAID card? Some allow you to check the predictive failure and SMART variables of the drives. I have some nice Nagios plugins that throw warnings when drives believe they are going to fail. When drives start failing or fail completely, the plugins throw a critical.
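For anyone curious what such a check looks like, here is a rough sketch in Python. It is not one of my actual plugins. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention, but the `check_drives` function, the drive names, and the `predicted_failure` / `failed` keys are made up for illustration. How you read those values from a particular RAID card is left out on purpose.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_drives(drives):
    """Map per-drive health flags to a Nagios status and message.

    `drives` is a hypothetical dict: drive name -> dict with the boolean
    keys 'predicted_failure' and 'failed'. Collecting these values from
    the controller is outside the scope of this sketch.
    """
    if not drives:
        return UNKNOWN, "UNKNOWN - no drive data"

    failed = sorted(name for name, d in drives.items() if d.get("failed"))
    predicted = sorted(
        name
        for name, d in drives.items()
        if d.get("predicted_failure") and not d.get("failed")
    )

    # A failed or failing drive outranks a drive that only predicts failure.
    if failed:
        return CRITICAL, "CRITICAL - failed: " + ", ".join(failed)
    if predicted:
        return WARNING, "WARNING - predicted failure: " + ", ".join(predicted)
    return OK, "OK - %d drives healthy" % len(drives)


# Example with made-up drive names; a real plugin would print the message
# and then call sys.exit(status) so Nagios picks up the state.
sample = {
    "p0": {"predicted_failure": False, "failed": False},
    "p1": {"predicted_failure": True, "failed": False},
}
status, message = check_drives(sample)
print(message)
```

The point of splitting WARNING from CRITICAL is that a predicted failure gives you time to order a replacement before the array is actually degraded.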
 
They are 3WARE cards. (9500s I believe)

When the problem first started happening, LER suspected a drive or RAID issue because of all the page fault errors we were getting. The problem was that we didn't know what was wrong until a few days later, when the RAID controller noticed the drive had bad sectors and was trying to fix it.

That is when we finally knew exactly what the problem was and what drive to replace. :)
 
Why RAID 10? I assume the RAID controller is striping mirrored sets of drives, because the other way (mirroring stripe sets) is more likely to fail (I should know!). If you have enough room and money, you can set up a hot-swap spare that can be used to replace a drive that starts throwing errors. Whatever the situation, one failing drive in a RAID array should never cause the system to "see" the error at the file or application level. Losing one drive in a RAID array (except for RAID 0) should still leave the system in a usable, albeit vulnerable, state.
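To put rough numbers on the "more likely to fail" point, here is a small Python sketch that counts which two-drive failures kill each layout. The drive numbering and function names are my own, and it assumes a simplified model: a mirrored stripe set is treated as lost as soon as any one of its member drives fails, and every pair of failed drives is equally likely. Real controllers and real failure patterns can differ.

```python
from fractions import Fraction
from itertools import combinations


def raid10_survives(failed, n_pairs):
    """Stripe of mirrors: drives 2k and 2k+1 form mirror pair k.
    The array survives unless both drives of some pair have failed."""
    return all(not ({2 * k, 2 * k + 1} <= failed) for k in range(n_pairs))


def raid01_survives(failed, n_pairs):
    """Mirror of stripes: drives 0..n-1 are stripe set A, n..2n-1 are set B.
    Assumption: a stripe set is lost once any member drive fails, so the
    array survives only if at least one whole set is untouched."""
    a_intact = not any(d < n_pairs for d in failed)
    b_intact = not any(d >= n_pairs for d in failed)
    return a_intact or b_intact


def two_failure_loss_probability(survives, n_pairs):
    """Fraction of all two-drive failure combinations that lose the array."""
    combos = list(combinations(range(2 * n_pairs), 2))
    lost = sum(1 for c in combos if not survives(set(c), n_pairs))
    return Fraction(lost, len(combos))


# Eight drives (four mirror pairs, or two four-drive stripe sets).
print(two_failure_loss_probability(raid10_survives, 4))  # 1/7
print(two_failure_loss_probability(raid01_survives, 4))  # 4/7
```

Under those assumptions, a second failure in an eight-drive array is four times as likely to be fatal when mirroring stripe sets as when striping mirrored sets, and the gap widens as the array gets bigger.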
 