Reliability is typically defined as the probability that a device will perform its required function for a specified period of time. This all seems rather simple, and it can be, to a degree, given a large enough sample size and a long enough period of time. The main issue with deriving such figures is that they are required at a product’s release, not at the end of its lifetime when its actual reliability can be determined.
Calculating, in retrospect, the reliability a component or device delivered over its lifetime is fairly rudimentary math: total time/total failures. This is all well and good for characterizing near-obsolete products, and potentially useful for showing the reliability of a typical product, but new integrators want to know how reliable this product is, not its previous incarnation.
Increasingly often, beyond a general acceptance of the estimated lifetime of industrial electronics, reliability is specified upfront at the earliest specification stage. Whilst this is more than logical – for instance, one must decide a warranty period based on an estimated lifetime – we now need to quantify that reliability. This is where our old friend, or increasingly, enemy, Mean Time Between Failure (MTBF) comes in. MTBF and “asset life” increasingly go hand in hand, but how accurate are any of these figures and what do they actually tell us?
It’s also worth pointing out MTBF’s less common cousin, Mean Time To Failure (MTTF). MTTF is generally used for non-repairable items, so it is applied more often to atomic components than to an assembled product. MTTF is calculated as total time to failure/number of units, with every unit in the sample run to failure.
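The two rudimentary estimators can be sketched in a few lines. This is a minimal illustration, assuming times in hours, MTTF data from non-repairable units that have all been run to failure, and MTBF data as cumulative operating time across a repairable population; the function names and the numbers are hypothetical.

```python
# Minimal sketch of the two point estimators (hypothetical names and data).
# Assumptions: times in hours; MTTF data comes from non-repairable units that
# were ALL run to failure; MTBF data is cumulative operating time for a
# repairable population.

def mttf_estimate(times_to_failure: list[float]) -> float:
    """MTTF: sum of individual times to failure divided by the number of units."""
    if not times_to_failure:
        raise ValueError("need at least one observed failure time")
    return sum(times_to_failure) / len(times_to_failure)


def mtbf_estimate(total_operating_time: float, failures: int) -> float:
    """MTBF: total cumulative operating time divided by total failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined with zero failures")
    return total_operating_time / failures


# Hypothetical data: five components, each run until it failed.
print(mttf_estimate([1200.0, 1500.0, 900.0, 2100.0, 1300.0]))
# Hypothetical fleet: 40,000 cumulative operating hours with 8 failures.
print(mtbf_estimate(40_000.0, 8))
```

Note that dividing by the number of units only matches dividing by the number of failures when every unit has failed; with units still running, the MTTF formula as written would not apply.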
Both have gaping holes in their accuracy; the reliability of any individual unit is a hugely complex calculation. To give an example of this minefield, a client recently asked whether their bespoke product, which we manufacture, is suitable for a 10-year asset life. With this query, they wanted us to provide evidence of a 10-year MTBF.
Interestingly, what would seem the most logical way to calculate MTBF gave the most bizarre result! Because the product has been in manufacture and deployed for more than 5 years, we had, unlike with a new product, the gift of substantial historical data. Unfortunately, that data (approximately 5,000 units deployed for an average of 3 years, with around 14 failures) gives an MTBF of more than 1,000 years!
As much as I’d like to gloat about our bespoke product’s reliability, and my figures entirely support this as a true MTBF figure, no one could realistically believe that even the materials the product is constructed from will survive this long, though that could well be true of the plastic enclosure!
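The fleet arithmetic behind that figure is short enough to show in full. This sketch uses the rounded figures quoted above; converting to hours assumes, for illustration only, that every unit runs continuously (8,760 hours per year).

```python
# Worked fleet arithmetic using the rounded figures quoted above.
# Assumption (for the hours conversion only): units run 24/7, 8,760 h per year.

HOURS_PER_YEAR = 8760

units = 5_000
avg_years_deployed = 3.0
failures = 14

unit_years = units * avg_years_deployed      # cumulative operating time
mtbf_years = unit_years / failures           # total time / total failures
mtbf_hours = mtbf_years * HOURS_PER_YEAR

print(f"{unit_years:,.0f} unit-years / {failures} failures")
print(f"MTBF ~ {mtbf_years:,.0f} years (~ {mtbf_hours:,.0f} hours)")
```

The result, 15,000 unit-years over 14 failures, is roughly 1,071 years; the calculation is correct, which is exactly why the number is so unsettling.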
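One way to see why a 1,000-year MTBF is not a 1,000-year life: a minimal sketch assuming a constant failure rate (the exponential model under which MTBF = 1/λ, as in one of the definitions quoted later), where the probability a unit survives to time t is R(t) = exp(-t/MTBF). The 10-year horizon comes from the client’s question; assuming each of the 5,000 units runs the full 10 years is illustrative.

```python
import math

# Assumption: constant failure rate (exponential model), R(t) = exp(-t / MTBF).
# This model has no wear-out term, which is how it can produce "1,000-year" figures.

def reliability(t: float, mtbf: float) -> float:
    """Probability that a single unit survives to time t (same units as mtbf)."""
    return math.exp(-t / mtbf)

mtbf_years = 15_000 / 14      # fleet estimate from the field data above
asset_life_years = 10.0

r10 = reliability(asset_life_years, mtbf_years)
print(f"P(unit survives 10 years) ~ {r10:.3f}")
# Illustrative assumption: all 5,000 units deployed for the full 10 years.
print(f"Expected failures per 5,000 units over 10 years ~ {5_000 * (1 - r10):.0f}")
# Under this model, only about 37% of units survive to t = MTBF.
print(f"P(unit survives to t = MTBF) ~ {reliability(mtbf_years, mtbf_years):.3f}")
```

Under these assumptions about 99.1% of units survive 10 years (roughly 46 failures per 5,000 units), yet the same model says nothing about whether the materials themselves last 10 years, because wear-out is simply not part of it.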
The second, perhaps more realistic, method considers only one component: the weakest link. It’s perfectly logical that, by definition, the weakest link is the most likely to fail, and thus the most likely to fail first. So should no calculation be done at all, with this figure simply passed through to the final product?
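Passing the weakest-link figure straight through tends to overstate product MTBF. A minimal sketch, assuming every component is in series (any single failure fails the product), failures are independent, and each component has a constant failure rate, so the rates add and system MTBF = 1/Σ(1/MTBFᵢ). The component figures are made up for illustration.

```python
# Assumptions (hypothetical figures): all components in series, independent
# failures, constant failure rates, so lambda_sys = sum(1 / MTBF_i).

def series_mtbf(component_mtbfs_hours: list[float]) -> float:
    """System MTBF for independent, constant-rate components in series."""
    total_rate = sum(1.0 / m for m in component_mtbfs_hours)  # failures per hour
    return 1.0 / total_rate

# Hypothetical bill of materials: power supply, processor board, display, connector.
parts = [100_000.0, 250_000.0, 400_000.0, 2_000_000.0]

weakest_link = min(parts)
system = series_mtbf(parts)
print(f"Weakest link MTBF: {weakest_link:,.0f} h")
print(f"Series system MTBF: {system:,.0f} h")
```

For these made-up figures the series MTBF comes out at about 58,800 hours, well under the 100,000-hour weakest link: under these assumptions every additional part can only pull the product figure down.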
I liken the way MTBF is presented to how automobile manufacturers declare fuel consumption figures. Never in history has the real-world MPG achieved by a vehicle matched the extravagant claims of the manufacturer, as that figure was obtained in a far from real-world test with vents sealed, no wind, and so on. Likewise, a component manufacturer’s MTBF is unlikely to encompass all, or any, of the extraneous factors that will affect it, be that humidity, temperature, vibration, or shock. The conditions during testing are almost never documented, so any particular MTBF figure is rarely comparable to the next. Unfortunately, this problem carries through to the final product; MTBF simply doesn’t reflect the expected usage conditions or what the product lifetime should be.
The calculation of reliability and likelihood of failure has been studied in depth. Observable phenomena such as the “bathtub” curve are well documented but very difficult to reduce to a single “hours” figure. Weibull analysis, which determines where a population of product currently lies on the bathtub curve, is well worth researching further, alongside Accelerated Life Testing, which tries to compress the passage of an individual unit’s lifetime into a shorter test, though not quite a millennium’s worth!
An increasingly popular website, www.nomtbf.com, is well worth a read; it is leading a backlash against this age-old quantification method. The reality, though, is that there is nothing even close to a “right answer” for truly calculating reliability.
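As a taste of Weibull analysis, here is a sketch of median-rank regression, one common way to fit a two-parameter Weibull distribution to complete (uncensored) failure data. The fitted shape parameter β indicates the bathtub region: β &lt; 1 suggests infant mortality, β ≈ 1 random failures (the constant-rate assumption behind MTBF), and β &gt; 1 wear-out. The failure times are hypothetical, and real data with still-running units needs censored-data methods this sketch does not handle.

```python
import math

# Median-rank regression for a two-parameter Weibull (complete data only).
# beta < 1: decreasing hazard (infant mortality)
# beta ~ 1: roughly constant hazard (random failures, the MTBF assumption)
# beta > 1: increasing hazard (wear-out)

def fit_weibull_mrr(failure_times: list[float]) -> tuple[float, float]:
    """Return (beta, eta) from least squares of ln(-ln(1 - F)) on ln(t)."""
    t = sorted(failure_times)
    n = len(t)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)              # Bernard's median-rank approximation
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - f)))
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope of the Weibull plot
    intercept = y_bar - beta * x_bar           # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta

def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

# Hypothetical failure times (hours) for ten units run to failure.
times = [410.0, 620.0, 780.0, 905.0, 1010.0, 1150.0, 1280.0, 1430.0, 1620.0, 1900.0]
beta, eta = fit_weibull_mrr(times)
print(f"beta ~ {beta:.2f}, eta ~ {eta:.0f} h")
print("hazard rising (wear-out)" if beta > 1 else "hazard flat or falling")
```

A β well above 1, as in this made-up sample, describes a population heading up the wear-out wall of the bathtub, something a single constant-rate “hours” figure cannot express.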
Recently Glenn S. asked if I had a reference for clear definitions of MTBF and MTTF. After a bit of searching I sent him a definition or two, while he gathered a few more.
They are all basically the same, with some slight differences. What is interesting to me is the amount of variability in the interpretation and understanding.
Here’s the list Glenn collected:
This is what I’ve compiled thus far on definitions for MTTF and MTBF. There seem to be some variations in the definitions.
From a 1974 paper on DTIC.
Mean Time Between Failures (MTBF): The average operating time expected between failures in a population of identical components. This measure has meaning only when we are discussing a population where there is repair or replacement.
Mean Time To Failure (MTTF): The average operating time expected before failure of a component which is not repaired or replaced. This is simply the average time to failure of “n” units, i.e., the sum of “n” individual unit times to failure divided by “n” units.
MIL-HDBK-721 and MIL-STD-109 (Both Rescinded)
MEAN-TIME-TO-FAILURE (MTTF): A basic measure of reliability for non-repairable items: The total number of life units of an item divided by the total number of failures within that population, during a particular measurement interval under stated conditions.
MEAN-TIME-BETWEEN-FAILURE (MTBF): A basic measure of reliability for repairable items: The mean number of life units during which all parts of the item perform within their specified limits, during a particular measurement interval under stated conditions.
Internet PDF File of definitions
Mean Time to Failure. MTTF is the expected value (mean) of an item’s failure-free operating time. It is obtained from the reliability function $R(t)$ as $\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt$, with $T_L$ as the upper limit of the integral if the lifetime is limited to $T_L$ ($R(t) = 0$ for $t > T_L$). MTTF applies to both non-repairable and repairable items if one assumes that after a repair the item is as-good-as-new. If this is not the case, a new MTTF ($\mathrm{MTTF}_{S_i}$ starting from state $Z_i$) can be considered (Table 6.2). An unbiased (empirical) estimate for MTTF is $\widehat{\mathrm{MTTF}} = (t_1 + \dots + t_n)/n$, where $t_1, \dots, t_n$ are observed failure-free operating times of statistically identical, independent items.
$\mathrm{MTBF} = 1/\lambda$. MTBF should be reserved for items with constant failure rate $\lambda$. In this case, $\mathrm{MTBF} = 1/\lambda$ is the expected value (mean) of the exponentially distributed item’s failure-free operating time, as expressed by Eqs. (1.9) and (A6.84). The definition given here agrees with the statistical methods generally used to estimate or demonstrate an MTBF. In particular $\widehat{\mathrm{MTBF}} = T/k$, where $T$ is the given, fixed cumulative operating time (cumulated over an arbitrary number of statistically identical and independent items) and $k$ the total number of failures (failed items) during $T$. The use of MTBF for mean operating time between failures (or, as formerly, for mean time between failures) has caused misuses (see the remarks on pp. 7, 318, 327, 416) and should be dropped. The distinction often made between repairable and non-repairable items should also be avoided (see MTTF).
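The integral definition of MTTF can be checked numerically. A sketch assuming, purely for illustration, a Weibull reliability function R(t) = exp(-(t/η)^β): for that model the integral has the closed form η·Γ(1 + 1/β), and with β = 1 (constant failure rate λ = 1/η) it reduces to MTTF = 1/λ. The trapezoidal rule, step count, and cut-off point are arbitrary choices for this sketch.

```python
import math

# Numerical check of MTTF = integral of R(t) dt from 0 to infinity, using an
# assumed Weibull reliability function R(t) = exp(-(t / eta) ** beta).

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    return math.exp(-((t / eta) ** beta))

def mttf_by_integration(beta: float, eta: float, steps: int = 100_000) -> float:
    """Trapezoidal integral of R(t), cut off where R(t) is negligible."""
    upper = eta * 40.0 ** (1.0 / beta)      # R(upper) = exp(-40), effectively zero
    h = upper / steps
    total = 0.5 * (weibull_reliability(0.0, beta, eta) + weibull_reliability(upper, beta, eta))
    for k in range(1, steps):
        total += weibull_reliability(k * h, beta, eta)
    return total * h

for beta in (0.5, 1.0, 3.0):
    numeric = mttf_by_integration(beta, eta=1000.0)
    closed = 1000.0 * math.gamma(1.0 + 1.0 / beta)
    print(f"beta={beta}: integral ~ {numeric:.1f}, eta*Gamma(1+1/beta) = {closed:.1f}")
```

The β = 1 row is the constant-failure-rate case, where the integral definition of MTTF and the 1/λ definition of MTBF coincide; for other shapes the mean still exists but no longer equals the reciprocal of a single rate.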
Mean Operating Time Between Failures (MTBF): the expectation of the operating time between failures.
Mean Time to Failure (MTTF): the expectation of the time to failure.
If you know other references with clear definitions I sure would like to know about them.
About Fred Schenkelberg
I am an experienced reliability engineering and management consultant with my firm FMS Reliability. My passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs.