When we see an issue, we typically think "this is a problem". In general, however, this is not quite correct. What we typically see are symptoms. A symptom is a sign that something is out of order or not functioning correctly. Sometimes symptoms are signs of certain diseases. In general, the symptoms have to be analyzed by a trained specialist to find the underlying problem.
We, however, keep it simple: you cut your finger and spill blood on the floor. This is an issue. To limit the problem, we apply a band-aid as a containment action. If you did not do too much harm to your finger, the band-aid will soak up the blood, and after some time the bleeding will stop as your body starts to repair itself. Issue fixed? Well, no. Not really. The band-aid is just a containment action that keeps you from spilling more blood on the floor. It does not keep you from cutting your finger again tomorrow.
You may want to analyze why you cut your finger in that situation in the first place. As long as we do not know the specific situation, we cannot understand the technical root cause. The root cause of the problem is whatever made you cut your finger. Get the picture? We need to fix that root cause.
To find the technical root cause, we have to observe how you happened to cut your finger. Go and see (genchi genbutsu). I cannot simply rely on your answer if I ask you how you cut your finger, right? The question is potentially embarrassing, so you may tell me some story. I would. Therefore we have to invest some time and watch what you are doing to really understand your workflows. Go, look and see. We need to see everything: the typical steps in your workflow, and also the things you do during maintenance or when something exceptional happens. It will be a bit time-consuming. That is OK.
When we think we have found the technical root cause, we need to verify that we really found it. When it comes to cutting fingers, we would rather spare ourselves that step. But remember: we have our containment action, the band-aid, in place, so we do no further harm. So we switch the suspected technical root cause on and off and verify that the issue appears and disappears accordingly. This requires some bravery. You need to trust your containment action.
At this point we have a verified technical root cause. Unfortunately, this is of only limited value: it ensures that we do not make the same mistake again. Behind each technical root cause there is a system root cause. This means that somewhere in our system there is an issue that allowed us to hurt ourselves. Before we get there, let us take our "lean lesson" first.
When we are sure we have found the root cause and everything is safe now, we can remove the containment action. What was the containment action in our case? Not the band-aid; it is the self-healing body. OK, let us keep that one for a while. In engineering or administration environments, we can often observe that the containment action is kept in place, costing extra effort and wasting a lot of money. Remember all those checklists? If we are sure we have found the root cause of the problem, we can safely remove the containment action. Right?
Now we can concentrate on the system root cause. We never planned to cut our finger as part of the workflow. Why was it possible for this to happen? Which oversight was made? How can we prevent this from happening again in similar situations? That is the systemic root cause. Fix that as well.
For reference, check the Eight Disciplines of problem solving (8D), which trace back to the Ford Motor Company. There are many other structured ways to drill down to the root cause of a problem; another popular one is the 5-why method. Whatever you use, just understand the difference between symptom, problem and root cause.