The Reliability Challenge
This article describes some of the challenges faced when using failure rate databases for reliability assessment and prediction in the development of new products. Applicable to any industry. Copyright 2022, Saegert Solutions Inc.
I’m often engaged by clients to help make the products that they are developing more reliable. The traditional approach to reliability engineering, using established RAMS industry techniques, can be difficult to complete when working with new technologies.
One reliability technique leverages a system that has already had some degree of design work completed. The reliability analyst consults a reliability database and looks up the part numbers from the bill of materials to locate failure rate data for the various components in the system. Some components, such as those used in electronic devices, are relatively well characterized, and searching by part number usually locates the correct component and data about its failure rates. In some cases, base figures for component failure rates are given alongside factors that modify the failure rates based on the operating environment, stress levels, etc.
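As a rough illustration of how a base failure rate might be adjusted for operating conditions, the sketch below multiplies a base rate by a set of modifier factors. The multiplicative form, the function name `adjusted_failure_rate`, the factor names, and all numeric values are assumptions made for illustration; the actual form and values depend on the database or handbook being used.

```python
from math import prod

def adjusted_failure_rate(base_rate, factors):
    """Scale a base failure rate by multiplicative modifier factors.

    base_rate: base failure rate, e.g. in failures per million hours (FPMH).
    factors:   mapping of modifier name -> multiplier (assumed multiplicative).
    """
    if base_rate < 0 or any(f < 0 for f in factors.values()):
        raise ValueError("rates and factors must be non-negative")
    return base_rate * prod(factors.values())

# Hypothetical values, not taken from any real database.
resistor_rate = adjusted_failure_rate(
    base_rate=0.02,
    factors={"environment": 4.0, "stress": 1.5},
)
print(f"Adjusted failure rate: {resistor_rate:.3f} FPMH")
```

A harsher environment or higher stress level would be represented by a larger multiplier, raising the predicted failure rate for the same part.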
Using these techniques, generating a reliability prediction is generally a matter of determining failure rates for all of the components in the system, adding component failure rates together to estimate failure rates for subsystems, and adding subsystem failure rates to come up with a failure rate estimation for the system as a whole.
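The roll-up described above can be sketched in a few lines. This is a minimal example, assuming a series system (any component failure fails the system) and constant failure rates; the subsystem names, part names, and rates are hypothetical and are not drawn from any database.

```python
import math

# Hypothetical failure rates in failures per million hours (FPMH).
SUBSYSTEMS = {
    "controller": {"microcontroller": 0.05, "capacitor": 0.02, "connector": 0.10},
    "pump_assembly": {"motor": 2.0, "pump": 5.0},
}

def subsystem_rate(parts):
    """Sum component failure rates to estimate the subsystem failure rate."""
    return sum(parts.values())

def system_rate(subsystems):
    """Sum subsystem failure rates to estimate the system failure rate."""
    return sum(subsystem_rate(parts) for parts in subsystems.values())

def reliability(rate_fpmh, hours):
    """Probability of surviving `hours`, R(t) = exp(-lambda * t), constant rate assumed."""
    return math.exp(-rate_fpmh * 1e-6 * hours)

total = system_rate(SUBSYSTEMS)
mtbf_hours = 1e6 / total  # mean time between failures under the same assumptions

for name, parts in SUBSYSTEMS.items():
    print(f"{name}: {subsystem_rate(parts):.2f} FPMH")
print(f"system: {total:.2f} FPMH, MTBF ~ {mtbf_hours:,.0f} h, "
      f"R(1000 h) = {reliability(total, 1000):.4f}")
```

In this made-up example the pump assembly dominates the total, which is the kind of insight that makes the method useful for setting subsystem reliability targets.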
This approach is easy to understand, and can be an effective tool for benchmarking and for establishing reliability requirements for the system, as well as for its subsystems and components.
But there are limitations to this technique:
Limited selection of components and applications: electrical and integrated circuit components are well represented in reliability databases, but new and unique devices are usually absent. A company developing a system around a hydrogen fuel cell may find database values for 80% of the hardware components in the system, but in-house testing will likely be required to determine failure rates for highly customized components. The greater the proportion of highly customized components in a system, the less likely it is that a parts-count reliability analysis will produce a useful reliability model.
Limited information for some components: parts-count failure rate databases may have a lot of specific information on some components (particularly electronics), but databases for mechanical components may use only generic terms, with multiple entries for components like ‘pump, hydraulic’ or ‘motor, brushless, DC’, a wide range of failure rate values, and often little additional information to indicate the best choice for the application. Choosing the higher failure rate is more conservative, but actual failure rates for components and systems should be verified through reliability testing.
Assumption of constant failure rates: failure rate data in parts databases is usually presented with the assumption that failure rates are constant, and do not increase or decrease over time. These are mean values, and their use may be valid for reliability estimation, particularly when large numbers of units are in service and units are replaced before they wear out. In reality, failure rates can increase or decrease over time. For example, the failure rates of mechanical components increase as they wear out. These changing failure rates are depicted in the well-known ‘bathtub’ curve.
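To illustrate how a failure rate can change over time, the sketch below uses the Weibull hazard function, h(t) = (beta / eta) * (t / eta)^(beta - 1). The Weibull model is not named above; it is brought in here as one common way to represent decreasing, constant, and increasing failure rates. The shape values and the characteristic life `ETA` are hypothetical.

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1).

    beta < 1: decreasing rate; beta == 1: constant rate; beta > 1: increasing rate.
    """
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)

ETA = 10_000.0  # hypothetical characteristic life, in hours

for label, beta in [("early-life (decreasing)", 0.5),
                    ("constant", 1.0),
                    ("wear-out (increasing)", 3.0)]:
    rates = [weibull_hazard(t, beta, ETA) for t in (1_000.0, 5_000.0, 20_000.0)]
    print(f"{label:26s}", [f"{r:.2e}" for r in rates])
```

Only the middle case matches the constant-rate assumption built into most database values; the other two correspond to the falling and rising ends of the bathtub curve.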
No information on specific failure modes: most failure data in component databases reflect the total failure rates of the components, based on test and field data. Times to failure are recorded, but often failure modes, the ways in which the components fail, are not. Components usually have more than one failure mode. For example, an electrical component may fail as a short circuit (one failure mode) or as an open circuit (a different failure mode), and the rate of occurrence of one failure mode is likely different from the rate of occurrence of another. The failure rates in databases effectively sum the failure rates for all failure modes into one number, and present the total failure rate for the component.
Failure modes play a prominent role in reliability planning and growth: different failure modes can have different failure effects, so reliability tools such as FMEA and Fault Tree Analysis (FTA) prioritize failure modes, and relying on total failure rates from databases for occurrence rates may be over-conservative. Most reliability growth testing, particularly tests based on ‘physics of failure’, leverages failure modes and failure causes for test planning.
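One way to see why a total failure rate can overstate the occurrence of a single failure mode is to apportion the total across modes. The sketch below assumes the split between modes is known as a set of ratios that sum to 1; the function name `mode_failure_rates`, the ratios, and the rate are hypothetical, and in practice the ratios would have to come from test data, field data, or a failure mode distribution source.

```python
def mode_failure_rates(total_rate, mode_ratios):
    """Split a component's total failure rate across its failure modes.

    total_rate:  total failure rate from the database (e.g. FPMH).
    mode_ratios: mapping of failure mode -> fraction of failures (must sum to 1).
    """
    if abs(sum(mode_ratios.values()) - 1.0) > 1e-9:
        raise ValueError("mode ratios must sum to 1")
    return {mode: total_rate * ratio for mode, ratio in mode_ratios.items()}

# Hypothetical component: total rate and mode split are made up for illustration.
capacitor_modes = mode_failure_rates(
    total_rate=0.10,
    mode_ratios={"short circuit": 0.3, "open circuit": 0.7},
)
for mode, rate in capacitor_modes.items():
    print(f"{mode}: {rate:.3f} FPMH")
```

With these assumed numbers, a fault tree event caused only by the short-circuit mode would use about 0.03 FPMH rather than the full 0.10 FPMH listed for the component.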
Failure rates are more than the sum of parts: parts-count methods can overlook failures that come from integrating components into systems. System failures related to assembly are usually attributed to poor quality in the manufacturing process, but without planning for those types of failures, it can be difficult to prevent them. Software is another ‘component’ of systems that is usually overlooked in parts-count reliability analyses; because software is highly customized, its reliability is usually confirmed in development and testing, but failures related to how the software interacts with the components in the system can still occur, and may not be represented by parts-count reliability predictions.
Failure rate databases can be a useful tool for benchmarking and targeting goals for reliability for new and existing products, but a disciplined analysis by a knowledgeable professional can help fill in the gaps and avoid potentially costly errors in a company’s reliability plan.
About the author: Alex Saegert is founder and principal consultant of Saegert Solutions. He is an ASQ certified reliability engineer (CRE) and supplier quality professional (CSQP), with a specialized ASQ credential in risk management. He has over 20 years of experience engineering quality, reliability and safety into products from such diverse fields as medical devices, hydrogen fuel cells, alternative energy powertrains for cars, trucks and locomotives, and the nuclear power industry. He is licensed as a professional engineer in the provinces of Alberta and British Columbia, Canada.