I'm a hobby programmer currently studying actuarial science, so this seems like a logical question for me to ask! We all know that CPUs are highly accurate and have some degree of error correction for internal analogue signals (i.e. currents between transistors) and even redundant systems, but what is the probability that those will fail? While this is not a pertinent question for the everyday programmer (the failure rate of a CPU is probably negligible for everyday purposes), I find it interesting nonetheless. Let's base the question on an example:
Suppose we have a binary operation:
bool a = true;
bool b = false;
bool test = a == b;
What is the probability of test being assigned a value of true? That is to say, what is the probability that the CPU will assign the wrong value or otherwise get the boolean comparison wrong? I'm looking for an answer in terms of a sigma level or a probability (ie. On average, how many times do I have to run this program before it gets it wrong?).
I do remember seeing, when the whole faster-than-light neutrino thing was going on, that they were looking at possible interference in chips from cosmic rays as a cause (at least in some of the literature I read). This is a definite way to cause errors in a chip. But how often does a cosmic ray really affect a calculation?
IBM suggests that for memory the answer is 'one cosmic-ray-induced error per 256 megabytes of RAM per month' (http://www.scientificamerican.com/article.cfm?id=solar-storms-fast-facts). Memory is generally more susceptible to errors than the CPU and is larger than a CPU. A cosmic ray would also have to hit the part of the CPU actually doing the calculation, at the moment of the calculation. So I would expect the rate of CPU errors from this cause to be significantly lower.
> (ie. On average, how many times do I have to run this program before it gets it wrong?).
On a modern processor architecture, you are unlikely ever to see it get it wrong (the probability of that happening is infinitesimal). If it is a soft error, the processor will detect it and apply error correction, typically via retries; if it is a hard error, the system will halt with a machine check exception (MCE): a kernel panic on Unix or a KeBugCheck on Windows.
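To put a rough number on "infinitesimal", here is a back-of-the-envelope sketch in C++ that starts from the IBM memory figure quoted in the question. It rests on several assumptions that are mine rather than IBM's: a 30-day month, upsets spread uniformly over all bits and arriving as a Poisson process, and CPU bits being exactly as vulnerable as RAM bits (the question itself expects them to be less so). The function names are made up for illustration, and the model ignores the hardware error detection described above, so treat the result as a pessimistic order-of-magnitude guess and not a measured CPU failure rate.

```cpp
#include <cassert>
#include <cmath>

// Figure quoted in the question: one cosmic-ray-induced error
// per 256 MB of RAM per month.
constexpr double kBitsPer256MB = 256.0 * 1024.0 * 1024.0 * 8.0;  // 2^31 bits
constexpr double kSecondsPerMonth = 30.0 * 24.0 * 3600.0;        // ASSUMPTION: 30-day month

// Upset rate for a single bit, in errors per second,
// ASSUMING the errors are spread uniformly over all bits.
constexpr double per_bit_rate_per_second() {
    return 1.0 / (kBitsPer256MB * kSecondsPerMonth);
}

// Probability that at least one of `bits` bits is upset within `seconds`,
// modelling upsets as a Poisson process: p = 1 - exp(-rate * bits * seconds).
double upset_probability(double bits, double seconds) {
    assert(bits >= 0.0 && seconds >= 0.0);
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -std::expm1(-per_bit_rate_per_second() * bits * seconds);
}

// Mean number of independent runs up to and including the first failure
// (the mean of a geometric distribution with per-run failure probability p).
double expected_runs_until_failure(double p) {
    assert(p > 0.0 && p <= 1.0);
    return 1.0 / p;
}
```

As a hypothetical exposure, say the three booleans occupy three bits for about one nanosecond: `upset_probability(3.0, 1e-9)` comes out at roughly 5e-25, and `expected_runs_until_failure` of that value is on the order of 10^24 runs. Both the three bits and the one nanosecond are illustrative guesses; the real exposure depends on the compiler and the hardware, and the compiler may well fold the whole comparison into a constant.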