Chris and Fred discuss how management teams across many institutions continue to misuse the MTBF while believing they are fluent in the field of reliability (when they clearly aren’t)!

Key Points

Join Chris and Fred as they discuss perhaps the most common … and commonly used reliability ‘metric’ – mean time between failures (MTBF). When many of the US military standards regarding reliability were written, electronic calculators (let alone laptops with spreadsheets) didn’t exist. So the statistical inference had to be simple and ‘doable by hand.’ This necessitated the broad assumption of a constant hazard rate – meaning that reliability had to be measured in terms of MTBF. But for some reason, even though the majority of these handbooks have been withdrawn from circulation, we still see them almost enshrined in our reliability thinking.

Knowing whether something is failing due to infant mortality, wear-out or at a constant rate is vitally important for improving reliability. If we see infant mortality, then we suspect that manufacturing defects are to blame. If we see wear-out, then we know our system has failure mechanisms based on the accumulation of damage, and we can build more margin into the design. If we see a constant hazard rate, we can reasonably suspect that external environmental stresses (such as voltage spikes) are causing failure.
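One way to make these three patterns concrete is a minimal sketch that assumes failure times follow a Weibull distribution. That is an assumption made here for illustration, not something stated in the episode. Under it, the shape parameter `beta` sets the hazard trend: below 1 the hazard rate falls over time (infant mortality), at exactly 1 it is constant, and above 1 it rises (wear-out). The function names and the scale parameter `eta` are hypothetical.

```python
def weibull_hazard(t: float, beta: float, eta: float = 1.0) -> float:
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def hazard_trend(beta: float) -> str:
    """Map an assumed Weibull shape parameter to the failure pattern it implies."""
    if beta < 1.0:
        return "decreasing hazard (infant mortality)"
    if beta > 1.0:
        return "increasing hazard (wear-out)"
    return "constant hazard"


if __name__ == "__main__":
    # Compare the hazard rate early (t = 0.5) and late (t = 2.0) in life.
    for beta in (0.5, 1.0, 3.0):
        early = weibull_hazard(0.5, beta)
        late = weibull_hazard(2.0, beta)
        print(f"beta={beta}: h(0.5)={early:.3f}, h(2.0)={late:.3f} -> {hazard_trend(beta)}")
```

A single MTBF value cannot separate these cases: all three shapes can be scaled to share the same mean while behaving very differently over time.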

Chris showed Fred a short video of a probability density function (PDF) transitioning from a wear-out mechanism (with its characteristic bell curve) to a dominant infant mortality mechanism. As the PDF slowly morphed from one extreme to the other, the video tracked the percentage of systems we would expect to fail by the MTBF. It started at 50 per cent for wear-out mechanisms, moved through 63 per cent for constant hazard rates, and then rapidly increased to almost 100 per cent for infant mortality mechanisms. That’s right – it is statistically possible for virtually all systems to fail by the MTBF if there is extreme infant mortality.
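These percentages can be roughly reproduced with a short sketch, again assuming a Weibull distribution. The episode does not say which distribution the video used, so this is an illustration rather than a reconstruction of it. For a Weibull with shape `beta` and scale `eta`, the mean is `eta * Gamma(1 + 1/beta)`, so the fraction failed by the mean is `1 - exp(-Gamma(1 + 1/beta) ** beta)` and the scale cancels out. The function name is hypothetical.

```python
import math


def fraction_failed_by_mean(beta: float) -> float:
    """Fraction of units failed by the mean life, for an assumed Weibull shape beta.

    Computed as 1 - exp(-Gamma(1 + 1/beta) ** beta). The log-gamma function is
    used so that small beta values do not overflow.
    """
    log_gamma = math.lgamma(1.0 + 1.0 / beta)
    return 1.0 - math.exp(-math.exp(beta * log_gamma))


if __name__ == "__main__":
    # Sweep from wear-out (beta > 1) through constant hazard (beta = 1)
    # to strong infant mortality (beta << 1).
    for beta in (3.5, 2.0, 1.0, 0.5, 0.2, 0.1):
        print(f"beta={beta}: {fraction_failed_by_mean(beta):.1%} failed by the mean")
```

Under this assumption a shape near 3.5 gives about 50 per cent, a shape of 1 gives 1 - 1/e (about 63 per cent), and the fraction climbs towards 100 per cent as the shape drops towards zero.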

How many management teams know this?

Topics include:

  • The historical underpinnings of the MTBF and how it is a relic of a bygone era where it was assumed no one had access to electronic computational power.
  • What the MTBF actually means – and, perhaps more importantly, what it doesn’t mean – when we are trying to understand reliability characteristics.
  • How dangerous the MTBF can be if it is assumed to ‘be reliability.’ MTBF is a parameter of the random failure process – not a reliability metric.
  • How you need metrics beyond the MTBF to inform reliability-based decisions in both the business and engineering elements of product development.

