RCFA (Root Cause Failure Analysis) is the process of investigating the failure of a product, process, or piece of equipment and using the information to implement a change. It’s also commonly referred to as Root Cause Analysis or RCA.
IDCON is convinced that the process of analyzing the root cause of failures and acting to eliminate these causes is one of the most powerful tools in improving plant reliability and performance. But what is a “root cause” in RCFA? One definition is “The cause of a problem which, if adequately addressed, will prevent a recurrence of that problem.”
Let’s look at an example.
An RCFA investigation uncovers more than one cause
Imagine that a bearing has failed, and that an investigation shows that it had not been lubricated.
Asking the question “why had it not been lubricated?” may lead to the discovery that the grease point for the bearing had been missed during a lubrication survey and it was not on the lubrication mechanic’s route sheet.
Using the above definition of “root cause”, this problem can be prevented from happening again by simply adding this grease point to the lubrication route sheet.
But suppose the definition of “root cause” is changed slightly to “The cause of a problem which, if adequately addressed, will prevent recurrence of that problem or similar problems”.
This raises the bar to a higher level. The next question then is “Why had the grease point been missed from the lubrication route?”.
The answer may be that lubrication routes were set up by a single person, with no checks or confirmation that the routes were complete.
This may lead to an action to change the procedure for the development of lubrication routes, which will ensure that there are no other missed lubrication points in the plant, nor will there be in the future.
By asking the question “why” a few more times, the root cause of a problem is often identified as a procedural, or management, shortcoming. Here’s a great article about which process to implement first, PM or root cause?
Addressing these root causes often requires a change of thinking and some pain and effort, but the results will be much longer-lasting and higher-value than correcting individual failures. If you don’t have a formal RCFA process in place, you may want to start with training the frontline in troubleshooting.
Looking for help with RCFA? Contact us to get your Root Cause process started.
Root Cause Failure Analysis (RCFA)
RCFA (Root Cause Failure Analysis) is the process of investigating how an equipment failure, process problem, quality problem, safety incident, environmental incident, or other plant problem happened. RCFA is also commonly referred to as Root Cause Analysis or RCA.
IDCON is convinced that the process of analyzing and eliminating the root causes of failures is a powerful tool for improving plant reliability. So, instead of the more common term Root Cause Failure Analysis we use the term Root Cause Problem Elimination. We find this better defines the outcome that the plant wants.
But what is RCFA? How is it used? And what are some practical implementation tips?
What problems can you use RCFA (Root Cause Failure Analysis) for?
The bottom line is that it can be used to solve many types of problems, from equipment problems to safety and environmental incidents. This type of problem solving can and should be used across the organization.
What are the most common methods and names for RCFA?
There are many names and methods today, so let’s organize the methods, tools, and companies to give you a guide for navigating the RCFA concept.
The main pieces of RCFA are the documentation tool, the work process, and the thinking & behaviors. Learn more about the thinking & behaviors in RCFA by watching our video series “Root Cause and Thinking”
- Methods, software, and names for RCFA. Company names, documentation tools, and software names are often used to name an RCFA approach. While each service provider tries to have their own style, the basic tools and processes are very similar.
- Other names: Root Cause Analysis, Root Cause Problem Elimination (IDCON’s preferred name).
- Software: Apollo, TapRooT, Sologic, just to name a few.
- Documentation tools: Fishbone, 5-why, Cause and effect, Pareto, fault tree, How-Can Diagram, etc.
- Documentation Tools for RCFA. The tools for documenting are very similar. Most use some type of cause-and-effect diagram. It may be drawn as a tree (top to bottom) or as a flow (left to right), and different shapes and colors may be used, but in essence they are all the same. Some are based on a paper system; some are managed in software.
Documentation vs. solving problems
Unfortunately, most RCFA training classes focus on the documentation tool. It is common for 90% of the classroom time to be spent on learning the software or how to use a diagram. The time is better spent on learning how to solve problems, with the documentation tool and software secondary in the teaching process.
The work process for RCFA
The work process can be quite detailed, and although the names differ between methods, the basic components are similar.
1. The RCFA trigger
The first part of the process is the trigger. A company typically decides on several triggers for when a root cause investigation should take place. Common triggers are:
- Breakdown/production interruption of more than X hours (4 hours is common)
- Repetitive failure (3rd failure is common)
- Safety incident
- Environmental incident
The different triggers may seem straightforward, but it is not always clear when a trigger is met. It can be tricky to analyze the data in the Computerized Maintenance Management System (CMMS) to see when a trigger is met, especially for repetitive failures. Improvement projects and other non-repair maintenance often clutter the data. But a well-organized CMMS database can help identify repetitive problems that add up to a large cost for the company over time.
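The downtime and repetitive-failure triggers can be checked against work-order history along the lines of the sketch below. The `WorkOrder` fields and the `rcfa_triggers` function are hypothetical names chosen for illustration, not a real CMMS schema, and the sample records are made up. Safety and environmental triggers are left out because they normally come from incident reports rather than work orders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WorkOrder:
    """One work order; field names are hypothetical, not a real CMMS schema."""
    equipment_id: str
    failure_mode: str
    downtime_hours: float
    is_repair: bool = True  # False for improvement projects and other non-repair work


def rcfa_triggers(orders, downtime_limit=4.0, repeat_limit=3):
    """Return (equipment_id, reason) pairs for every trigger that is met."""
    # Drop non-repair work first so it does not clutter the counts.
    repairs = [o for o in orders if o.is_repair]
    hits = []
    # Trigger 1: a single breakdown longer than the downtime limit.
    for o in repairs:
        if o.downtime_hours > downtime_limit:
            hits.append((o.equipment_id,
                         f"downtime {o.downtime_hours} h exceeds {downtime_limit} h"))
    # Trigger 2: the same failure mode repeated on the same equipment.
    counts = Counter((o.equipment_id, o.failure_mode) for o in repairs)
    for (equipment_id, mode), n in counts.items():
        if n >= repeat_limit:
            hits.append((equipment_id, f"{mode} repeated {n} times"))
    return hits


# Made-up sample data for illustration only.
orders = [
    WorkOrder("P-101", "bearing failure", 1.5),
    WorkOrder("P-101", "bearing failure", 2.0),
    WorkOrder("P-101", "bearing failure", 1.0),
    WorkOrder("M-7", "winding fault", 6.0),
    WorkOrder("P-101", "guard upgrade", 8.0, is_repair=False),
]
for equipment_id, reason in rcfa_triggers(orders):
    print(equipment_id, "->", reason)
```

Note that the 8-hour guard upgrade does not trip the downtime trigger because it is flagged as non-repair work. The repeat trigger only works if failure modes are coded consistently, which is the data-quality point made above.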
2. The Problem Statement in RCFA
The problem statement has two key features:
- The RCFA problem statement must be a fact
- It is best if the RCFA problem statement has one object and one problem
The problem statement must be a fact. This may seem obvious, but it is very common that a group interprets an observed symptom and writes down a problem statement that differs from the observed symptom. For example, let’s say we observe that there is no flow reading at a flow valve and write the problem statement “No flow in pipe”. Here we have a problem statement that may not be a fact (flow could be fine while the flowmeter is broken). Or let’s say we find a bearing that is at 250°F (120°C) and write “bearing has no lubricant”. The bearing may instead have too much lubricant, or the problem may be due to electrical fluting. In some cases, a problem statement may be a fact, but we still need to direct the RCFA toward the problem we want to solve. For example, “I can’t get into my house” versus “I have lost my keys” will probably lead to different investigations.
The RCFA problem statement has one object and one problem. This rule will improve an RCFA investigation greatly. It is common to see the problem statement written as a whole list of facts and information. If there are multiple objects and multiple symptoms in a problem statement, it becomes very unclear what problem to solve. Make sure to have one problem and one object; all the other information can be used as evidence in the RCFA cause-and-effect chain.
3. Collect Facts and Information
Collecting facts and information is represented in this article as one step in the process, but facts and information are collected throughout the whole process. We need information to understand that there is a problem, to write a problem statement, and to investigate the problem. But the central part of information gathering is used to develop potential causes and verify the most likely causes. We need facts to prove that one event causes another event in the root cause investigation. A very common mistake in RCFA investigations is to develop a cause-and-effect diagram without using facts to verify what is true and what is not. The investigation becomes an unproven hypothesis instead of an investigation.
It is important to collect and store broken equipment and broken parts in equipment failure RCFAs. Most plants don’t have a system or a place to store broken parts, so most investigations become a guessing game. Make sure to have a designated place for broken pieces and equipment. Also, ensure that plant personnel are trained in how to bag and tag parts, take proper photos and video, and catalogue the parts.
4. Identify Possible Causes in RCFA
The main ingredients for developing possible causes are:
- Creative thinking
- Facts and information
The first level of possible causes is developed by looking at the problem statement and the facts and information collected. Ask the question “How can [insert problem statement] happen?”. Using creative thinking while comparing the possibilities that come to mind with the information collected, possible causes can be identified. The process continues as each possible cause is developed. For example, if the problem statement is “bearing failed”, the possible causes are developed by asking “How can the bearing fail?”. Looking at the facts at hand, we may develop possible causes such as “dynamic overload”, “lack of lubrication”, etc. The first possible causes directly connected to the problem statement are called the 1st level. The investigation continues by picking the most likely possible cause and asking the same question again. For example, “How can lack of lubrication happen?”.
As a rule of thumb, it is good to list as many possible causes as you can in level 1 but limit them to where the evidence points in level 2 and beyond. It is very important not to disregard the comparison with collected facts and information. If we list all possible causes in all levels of the RCFA process, it will end up becoming an FMEA (Failure Mode and Effects Analysis), which includes everything that could happen instead of what happened in this case.
While identification of possible causes should include creativity and an open mind, make sure to follow the path where the evidence points. If that doesn’t work out, go back to level 1 and rethink other possibilities.
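The level-by-level questioning described above can be sketched as a small tree in which each node holds a possible cause and the facts that support it. The `Cause` class, the `how_can` method, and the `causes_to_pursue` helper are hypothetical names for illustration, and the evidence strings are invented examples, not findings from a real investigation.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """A node in a cause-and-effect tree (hypothetical structure for illustration)."""
    description: str
    evidence: list = field(default_factory=list)  # facts that support this cause
    children: list = field(default_factory=list)  # answers to "How can <description> happen?"

    def how_can(self, description, evidence=()):
        """Record one possible cause of this node and return it."""
        child = Cause(description, list(evidence))
        self.children.append(child)
        return child


def causes_to_pursue(node):
    """Possible causes under `node` that have at least one supporting fact."""
    return [c for c in node.children if c.evidence]


problem = Cause("Bearing failed")

# Level 1: list broadly, then note which facts support each possibility.
problem.how_can("Dynamic overload")
lack_lube = problem.how_can("Lack of lubrication", ["Races dry and discolored"])
problem.how_can("Contamination")

# Level 2 and beyond: only ask the next question where the evidence points.
for cause in causes_to_pursue(problem):
    print(f"How can '{cause.description}' happen?")

lack_lube.how_can("Grease point missing from route sheet",
                  ["Route sheet has no entry for this bearing"])
```

Keeping the unsupported level 1 causes in the tree, rather than deleting them, makes it easy to go back and rethink them if the chosen path does not work out.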
5. Select Most Likely Causes in RCFA
Identifying the true causes of the problem is done by checking each cause-and-effect relationship against convincing evidence. For example, if the cause is “excessive axial force on the bearing” and the effect is “damaged bearing”, we need to look for evidence that verifies this “hypothesis”. Perhaps we take the bearing apart, inspect the bearing races, and notice that the wear pattern proves that there was excessive axial load. The cause-and-effect chain should be verified at each step of the way.
All RCFA investigations have three levels of root causes:
- Technical root cause (what physically happened)
- A human cause (someone did something, or missed doing something, to trigger the technical cause)
- A systematic cause (the human did something because a management system wasn’t in place to direct or manage the human(s))
In the example above, the technical root cause may be excessive axial force on the bearing. Let’s say the bearing is mounted on a roll. The human cause could then be that the floating bearing (on one side of the roll) was tightened down by someone who didn’t know it was supposed to be floating. The systematic root cause may be poor training of craftspeople and/or poor skills testing for new employees. The systematic root causes are almost always much tougher to correct than the technical ones.
Remember, there are layers of root causes, not just a technical cause.
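One way to keep an investigation honest about both points, evidence for every link and all three layers covered, is a simple review check like the sketch below. The `Link` record and `review_chain` function are hypothetical names for illustration, and the chain is deliberately left incomplete to show what the check reports.

```python
from dataclasses import dataclass

LAYERS = ("technical", "human", "systematic")


@dataclass
class Link:
    """One cause-and-effect link (hypothetical structure for illustration)."""
    layer: str
    cause: str
    effect: str
    evidence: str = ""  # an empty string means the link is still a hypothesis


def review_chain(chain):
    """Return a list of gaps: unverified links and layers with no cause at all."""
    gaps = [f"unverified: {link.cause} -> {link.effect}"
            for link in chain if not link.evidence]
    covered = {link.layer for link in chain}
    gaps += [f"missing layer: {layer}" for layer in LAYERS if layer not in covered]
    return gaps


# Deliberately incomplete chain based on the roll-bearing example above.
chain = [
    Link("technical", "excessive axial force on the bearing", "damaged bearing",
         evidence="wear pattern on the bearing races"),
    Link("human", "floating bearing was tightened down",
         "excessive axial force on the bearing"),
]
for gap in review_chain(chain):
    print(gap)
```

Here the check flags the human cause as unverified and reports that no systematic cause has been identified yet, so the investigation is not finished.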
6. Identify Solutions
Identifying solutions to root causes may seem like a very basic step, but there are many factors to consider. First, make sure to use creative thinking in your solutions investigation to list several alternatives, not only the first that comes to mind. Second, evaluate the proposed solutions in terms of cost, functionality, acceptance by the team etc. Typically, it is a good idea to propose the solutions with a cost-benefit analysis.
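As a rough illustration of the cost-benefit comparison, the sketch below ranks alternative solutions by simple payback time. The `Solution` class, the option names, and all cost and benefit figures are invented for illustration. Functionality and acceptance by the team are not captured here and still need to be judged separately.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """A proposed solution; all figures used below are made up."""
    name: str
    cost: float            # one-time implementation cost
    annual_benefit: float  # estimated yearly savings from avoided failures

    @property
    def payback_years(self):
        """Simple payback: years until the savings cover the cost."""
        return self.cost / self.annual_benefit


def rank_by_payback(solutions):
    """Shortest payback first."""
    return sorted(solutions, key=lambda s: s.payback_years)


options = [
    Solution("Install automatic lubricators", cost=40_000, annual_benefit=20_000),
    Solution("Add grease point to route sheet", cost=200, annual_benefit=5_000),
    Solution("Resurvey all lubrication routes", cost=15_000, annual_benefit=60_000),
]
for s in rank_by_payback(options):
    print(f"{s.name}: payback {s.payback_years:.2f} years")
```

Simple payback is only one possible measure. A plant may prefer net savings over a fixed period or another metric its finance group already uses.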
The biggest problem in the RCFA process may be that solutions are rarely implemented. It is quite interesting and comfortable to do an RCFA investigation: lots of coffee, cake, and interesting technical discussions. But it is hard work and sometimes costly to implement solutions. For example, think of the root cause above where the mechanics are poorly trained. It is quite an undertaking to fix that if there are 100 mechanics on site.
7. Eliminate Root Causes
As mentioned above, the hardest part is to implement the identified solutions. The whole point of a root cause failure analysis investigation is NOT to identify the root causes, it is to eliminate root causes. This is the main reason IDCON calls its root cause process “Root Cause Problem Elimination (RCPE)”. Management of a company should make sure there is a process in place to evaluate, select, and eliminate root causes from investigations.
Root Cause Analysis Examples
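Let’s say the technical cause of a bearing failure is that the bearing was misaligned.
However, there are two more layers to solve. How can a bearing be misaligned? In this example, not enough time was allowed to align the bearings (the human cause).
How can not enough time be allowed to align the bearings? The shutdown was poorly planned (the systematic cause).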
In this case, the technical root (bearings misaligned) was corrected: the bearing was changed, and the shafts were then aligned.
However, the poorly planned shutdown that led to a lack of time for aligning wasn’t addressed. The human and systematic roots take more people and time to solve, and if this happened in a reactive organization, it is unlikely that anyone will take the time to address the situation. The organization is too busy fixing day-to-day emergencies.
The Root Cause Problem Elimination Thinking Process (the investigation) Explained
Most organizations have no trouble defining the day-to-day work processes; they are straightforward. As mentioned earlier, the hardest part in the work process is to implement solutions from the investigations, especially the human and systematic solutions.
The documentation tools are also usually easy to understand. A 5-minute YouTube video can often explain how to draw a cause-and-effect diagram or a histogram. The problem lies in the fact that most individuals and plants struggle with the thinking process. Many common RCPE tasks are missed or not done correctly. Below are the top 8 mistakes.
- Not correctly translating a situation to a comprehensive problem statement. Watch my video about problem statements
- Letting opinions and hearsay, instead of facts, drive the investigation
- Not collecting information and evidence. Watch my video about how to collect information and evidence
- Not storing and cataloguing broken parts found during the investigation. Watch my video with tips about this
- Not using cause, effect, and evidence as the holy trinity in driving the RCFA investigation
- Not having patience and attention to detail based on evidence. This can lead to jumping to conclusions
- Group thinking instead of independent thinking
- Inability to jump between creative thinking and critical thinking
To wrap up this article, let’s look at two more RCFA examples.
RCFA Examples: An RCFA Investigation that uncovered more than one cause
A bearing failed at a Canadian paper mill, and it was decided to conduct an RCFA investigation. A basic problem statement was written: “Bearing Failed”. The motor where the bearing was mounted was opened and the bearing was taken apart. It was clear that the bearing had not been lubricated.
Asking the question “why had it not been lubricated?” led to the discovery that the grease point for the bearing had been missed during a lubrication survey and it was not part of the lubrication mechanic’s route sheet. This is an exceptionally basic root cause from a technical standpoint because the technical RCFA was very easy. However, when digging into the human and systematic causes the problem becomes much more complicated.
The technical problem can be fixed simply by adding this specific grease point to the lubrication route sheet. But if we look at the human and systematic roots, the problem was a bit more complex.
The human root cause was determined to be that the lubrication technician missed the lubrication point because it wasn’t on the route.
The systematic factor (work process) showed that many other points were missing or had the wrong grease or an incorrect amount of grease because no route survey had been done in many years. In turn, this triggered them to investigate the lubrication program, including handling, storing, and application.
They discovered all aspects of the lubrication program were very poor. Here is where the investigation stopped, but it would be possible to continue and look at the whole PM program and I’m sure there would be issues there too.
Another RCFA Example of Systematic and Human factors
Another client had a problem with too many bearing failures on their AC motors. So, with the group, we picked one of the motors to investigate (collect evidence and history). The group chose to pursue a problem statement of “Inboard motor bearing failed” (one problem/one object). We did the investigation and found the problem quickly since we were able to open the motor (see the pictures below).
Figure 1: RCFA examples. Here you see that the motor is over-greased, causing it to overheat and fail.
So, the technical issue was easy to solve: over-greasing caused the failure. But the real question was “How can” the bearing be so over-greased? We dug in and here’s what we learned:
The E/I team gave the lubrication tasks to the mechanics because they didn’t want to do it. The mechanics were offended that they were given the task for something they believed should be done by electrical, so they handed it off to summer workers.
And because the summer workers were there only in the summer, they were told “Grease it good”, which they did, as you can see.
One of the most important jobs was given to the least experienced people. The human factor (inexperienced workers) and the systematic factor (only greasing in the summer) ultimately caused the technical problem.
Addressing these root causes often requires a change of thinking and some pain and effort, but the results will be much longer-lasting and higher-value than correcting individual failures. If you don’t have a formal RCFA process in place, you may want to start with training the frontline in troubleshooting.
Can you implement RCFA in reactive maintenance environments?
This is a good question, and Christer Idhammar has written a great article about which process to implement first, PM or root cause?
We firmly believe that you need to have your basics in place before fully implementing an RCPE process, because what you’ll find is that in many investigations the system that people work in is the root of many of the problems you encounter, from inadequate training to poor preventive maintenance processes to planning and scheduling.
If you need help with improving your overall reliability at your organization, we recommend beginning with an assessment of your current program. Contact us to learn more about how common-sense consulting and training can benefit you.