Congrats to the TT Techs!

LVWolfman

Active SatelliteGuys Member
Original poster
Dec 17, 2005
Las Vegas, Nevada
I know that you had a harrowing day... been there, done that. Not much is worse than having a server or three go down that directly affects the customers. Meanwhile you have (usually) bosses in one ear and customers calling into the other all wanting to know if it's fixed yet and demanding status reports.

Sort of reminds me of car trips with the kids in the back whining "Are we there yet?" every 30 seconds.

No one likes outages. Yet it seems like whatever is done to try to prepare for them, the evil gremlins of chaos eventually find a way through our defenses.

So congrats on getting the systems back up and running and for keeping your collective cool.
 

You beat me to it....
 
Thanks for the kind words.

It is nice to hear something nice from the people we do our best to support.

I will pass the message on to the rest of the team.


Derek
 
Thanks for the positive comments - it really is appreciated.

As for the outage, several things combined to create the havoc that caused our downtime. The good news is that we've learned some lessons and determined what works and, obviously, what doesn't.

So, the great news is that we are working day and night right now to bring additional redundant datacenters online, literally all over the country and all over the world. We could accomplish a lot by changing some infrastructure things on the server side, but we are also going to go a step further and make some modifications on the app side.

Basically, we are going to start using round-robin DNS, with secondaries and tertiaries at alternate sites. This in itself should counteract any problems at any one or even two centers. BUT, that's not good enough for us. We are also going to make Recast smarter about talking to the servers. It can already reference more than one registration server, but we are now going to have it attempt connections through a list of registration servers, with a timeout telling it to move on to the next one.
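
For the curious, the client-side part of that (walk a list of registration servers, give each one a timeout, move on to the next) could look roughly like the sketch below. It is only an illustration in Python, not Recast's actual code: the function name, host names, port and timeout value are all made up for the example.

```python
import socket
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]


def connect_first_available(
    servers: Iterable[Address],
    timeout: float = 5.0,
    connect: Callable[..., object] = socket.create_connection,
):
    """Try each server in order and return (address, connection)
    for the first one that accepts a connection within `timeout` seconds."""
    errors = []
    for address in servers:
        try:
            conn = connect(address, timeout=timeout)
            return address, conn
        except OSError as exc:
            # OSError covers timeouts, refused connections and DNS failures,
            # so any of these simply means "move on to the next server".
            errors.append((address, exc))
    raise ConnectionError(f"no registration server reachable: {errors}")


# Hypothetical server list: primary first, then secondary and tertiary sites.
REGISTRATION_SERVERS = [
    ("reg1.example.invalid", 443),
    ("reg2.example.invalid", 443),
    ("reg3.example.invalid", 443),
]
```

A caller would run connect_first_available(REGISTRATION_SERVERS, timeout=5.0) and only give up if every server in the list fails. The `connect` parameter is there just so the logic can be exercised without a real network.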

Finally, with upcoming changes to the registration verification process, even in a worst-case scenario where all of our servers are down or offline, everyone will have plenty of time to continue going about their business and using TimeTrax products without being affected directly.

The only negative I foresee is that, since we are going to be replicating data on a large scale to several different sites, we could potentially have a slight lag in getting registration updates live on all the available registration servers. I think at our size it will still be minimal, but it will be slower than the instantaneous response to changes that we get now. Maybe the techies can resolve this one too.

David K.
Time Trax