I am a software designer/developer, so I suppose that makes me an expert.
If a large system works in testing but crashes when it goes live, the most likely cause is one of the following:
1. Race condition (multiple threads / processes trying to access the same thing at the same time or...
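To make the race condition case concrete, here is a minimal Python sketch of a lost update on a shared counter. Everything in it is made up for illustration (the function name run_counter, the thread counts, the sleep used to widen the timing window); it is not taken from any real system. With one thread, as in a light test run, the unlocked code gives the right answer. With several threads, as in production, updates can get lost unless a lock is held around the read-modify-write.

```python
import threading
import time


def run_counter(n_threads, use_lock):
    """Have each thread add 1 to a shared counter; return the final value."""
    state = {"count": 0}
    lock = threading.Lock()
    # Hypothetical setup: the barrier releases all threads at the same moment
    # so they contend for the counter, like concurrent requests on a live system.
    barrier = threading.Barrier(n_threads)

    def increment():
        current = state["count"]       # read
        time.sleep(0.01)               # artificially widen the race window
        state["count"] = current + 1   # write back a possibly stale value

    def worker():
        barrier.wait()
        if use_lock:
            with lock:                 # only one thread at a time in here
                increment()
        else:
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print("1 thread, no lock: ", run_counter(1, use_lock=False))
    print("8 threads, no lock:", run_counter(8, use_lock=False))
    print("8 threads, lock:   ", run_counter(8, use_lock=True))
```

The unlocked run with 8 threads will usually report far fewer than 8, but the exact number depends on scheduling, which is exactly why this kind of bug tends to slip through testing.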