AFAIK the mining exercise involves computing the SHA-256 hash (applied twice) of a block header supplied by the network rather than by any central organization. A new base header comes along roughly every 10 minutes on average. A candidate "wins" if its 256-bit hash, read as a number, is less than a designated target value. The last time I looked, it appeared that the upper 59 or 60 bits had to be 0. Since SHA-256 is supposed to make every output bit look uniformly random, the chance that any single attempt wins is about 1/(2^59) or so.
But here is where sharing comes in. If the first attempt fails, a counter (the nonce) in the base is incremented and the hash is recomputed, and I think this process continues until the next base comes up about 10 minutes later. This means that the fastest machines get to try more candidates in that window, but it also means that the range of counter values can be partitioned, so a bunch of computers can each work on a different range for the same base.
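To make the search-and-partition idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not the real Bitcoin format: the names `hash_value`, `search`, and `partition` are made up for illustration, the counter is simply appended to the base as 8 big-endian bytes, and the required number of zero bits is kept tiny so it finishes quickly.

```python
import hashlib


def hash_value(base: bytes, nonce: int) -> int:
    """Double SHA-256 of base + counter, read as a 256-bit integer.

    Assumption: the counter is appended as 8 big-endian bytes. The real
    block header layout and byte order are different.
    """
    data = base + nonce.to_bytes(8, "big")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")


def search(base: bytes, zero_bits: int, start: int, stop: int):
    """Return the first counter in [start, stop) whose hash wins, else None."""
    # Requiring the top zero_bits bits to be 0 is the same as
    # requiring hash < 2^(256 - zero_bits).
    target = 1 << (256 - zero_bits)
    for nonce in range(start, stop):
        if hash_value(base, nonce) < target:
            return nonce
    return None


def partition(total: int, workers: int):
    """Split [0, total) into contiguous ranges, one per worker."""
    size = -(-total // workers)  # ceiling division
    return [
        (min(i * size, total), min((i + 1) * size, total))
        for i in range(workers)
    ]


if __name__ == "__main__":
    base = b"toy base"   # hypothetical stand-in for the real header
    zero_bits = 8        # each attempt wins with probability 1/2^8
    # Each (start, stop) range could be handed to a different machine.
    for start, stop in partition(1 << 16, 4):
        print(start, stop, search(base, zero_bits, start, stop))
```

With `zero_bits` at 59 or 60 instead of 8, each attempt wins with probability of about 1/(2^59), which is why splitting the counter range across many fast machines matters.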
I used to implement CRC-32 for Ethernet, which is a loosely similar kind of computation, and believe me, SHA-256 takes far more work to compute. As noted, there are companies using special-purpose logic circuits (ASICs) to do this, including one I am aware of in Iceland, where they can run faster.
I'm not a Bitcoin expert by any means, and I haven't seen many simple-to-understand explanations of what is involved, but this is my understanding of it.