Thinking of Swiss Cheese: Reason’s Theory of Active and Latent Failures

The Theory of Active and Latent Failures was proposed by James Reason in his book, Human Error. Reason stated that accidents within most complex systems, such as health care, are caused by a breakdown or absence of safety barriers across four levels within a system. These levels are best described as Unsafe Acts, Preconditions for Unsafe Acts, Supervisory Factors, and Organizational Influences. Reason used the term “active failures” to describe factors at the Unsafe Acts level, and “latent failures” to describe unsafe conditions higher up in the system.

This theory is represented visually as the Swiss Cheese model, which has become very popular in root cause analysis and risk management circles and is now widely applied beyond the safety world.

Swiss Cheese Model

In the Swiss Cheese model, the holes in the cheese depict the failure or absence of barriers within a system. Such occurrences represent failures that threaten the overall integrity of the system. If such failures never occurred within a system (i.e., if the system were perfect), then there would not be any holes in the cheese. We would have a nice Engelberg cheddar.

Not every hole that exists in a system will lead to an error. Sometimes holes may be inconsequential. Other times, holes in the cheese may be detected and corrected before something bad happens. This process of detecting and correcting errors occurs all the time.

The holes in the cheese are dynamic, not static. They open and close over time due to many factors, allowing the system to function appropriately without catastrophe. This is what human factors engineers call “resilience.” A resilient system is one that can adapt and adjust to changes or disturbances.

Holes in the cheese open and close at different rates. The rate at which holes pop up or disappear is determined by the type of failure the hole represents.

  1. Holes that occur at the Unsafe Acts level, and even some at the Preconditions level, represent active failures. Active failures usually occur during the performance of work and are directly linked to the bad outcome. They open and close over time as people make errors, catch their errors, and correct them.
  2. Latent failures occur higher up in the system, above the Unsafe Acts level, at the Organizational, Supervisory, and Preconditions levels. These failures are referred to as “latent” because when they occur, or open, they often go undetected. They can lie “dormant” or “latent” in the system for an extended period of time before they are recognized. Unlike active failures, latent failures do not close or disappear quickly (the sketch that follows illustrates the difference).
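
To make the dynamics concrete, here is a minimal sketch in Python. The level names come from the model; the open/close probabilities are purely illustrative assumptions. It shows active holes opening and closing quickly while latent holes persist, and an event occurring only when holes line up across every level at once.

```python
import random

# Barrier levels, ordered from the organization down to the sharp end.
LEVELS = ["Organizational", "Supervisory", "Preconditions", "Unsafe Acts"]

# Assumed (illustrative) chances that a hole opens or closes in a given period.
# Active failures (Unsafe Acts) open and close quickly; latent failures persist.
OPEN_RATE = {"Organizational": 0.02, "Supervisory": 0.05,
             "Preconditions": 0.10, "Unsafe Acts": 0.30}
CLOSE_RATE = {"Organizational": 0.01, "Supervisory": 0.02,
              "Preconditions": 0.10, "Unsafe Acts": 0.60}

def simulate(periods=100, seed=1):
    random.seed(seed)
    holes = {level: False for level in LEVELS}  # True = a hole is open at that level
    events = 0
    for _ in range(periods):
        for level in LEVELS:
            if holes[level]:
                # An open hole may be detected and corrected (closed).
                if random.random() < CLOSE_RATE[level]:
                    holes[level] = False
            else:
                # A new failure may open a hole.
                if random.random() < OPEN_RATE[level]:
                    holes[level] = True
        # Harm only occurs when holes line up across every level at once.
        if all(holes.values()):
            events += 1
    return events

print(simulate())  # count of periods in which the holes aligned
```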

Most events (harms) are associated with multiple active and latent failures. Unlike the typical Swiss Cheese diagram above, which shows an arrow flying through one hole at each level of the system, there can be a variety of failures at each level that interact to produce an event. In other words, several failures at the Organizational, Supervisory, Preconditions, and Unsafe Acts levels can all contribute to harm. Holes associated with events are most frequent at the Unsafe Acts and Preconditions levels and (usually) become fewer as one moves up through the Supervisory and Organizational levels.

Given the frequency and dynamic nature of day-to-day activities, there are more opportunities for holes to open up at the Unsafe Acts and Preconditions levels, and more holes are often identified at these levels during root cause investigations and risk assessments.

The way the holes in the cheese interact across levels is important:

  • One-to-many mapping of causal factors occurs when a hole at a higher level (e.g., Preconditions) results in several holes at a lower level (e.g., Unsafe Acts).
  • Many-to-one mapping of causal factors occurs when multiple holes at a higher level (e.g., Preconditions) interact to produce a single hole at a lower level (e.g., Unsafe Acts), as sketched below.
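
A small sketch of how these mappings might be recorded during an investigation; the factor descriptions are hypothetical, and nested dictionaries are just one convenient way to capture the links.

```python
# One-to-many: a single Preconditions hole producing several Unsafe Acts holes.
one_to_many = {
    "Chronic understaffing on night shift": [
        "Skipped double-check on dose calculation",
        "Delayed response to an alarm",
        "Label misread under time pressure",
    ],
}

# Many-to-one: several Preconditions holes combining to produce a single Unsafe Act.
many_to_one = {
    "Wrong reagent added to batch": [
        "Look-alike container labels",
        "Poor lighting in the staging area",
        "Interruption during setup",
    ],
}
```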

By understanding the Swiss Cheese model, and Reason’s wider work on active and latent failures, we can strengthen our approach to problem-solving.

Plus cheese is cool.


Call a Band-Aid a Band-Aid: Corrections and Problem-Solving

A common mistake made in problem-solving, especially within the deviation process, is not giving enough forethought to band-aids. As I discussed in the post “Treating All Investigations the Same,” it is important to be able to determine which problems need deep root cause analysis and which should be more catch and release.

For catch and release, you usually correct, document, and close. In these cases the problem is inherently small enough, and the experience suggesting a possible course of action – the correction – sound enough, that you can proceed without root cause analysis and a formal solution. If those problems persist, and experience- and intuition-driven solutions prove ineffective, then we might decide to engage in structured problem-solving for a more effective solution and outcome.

In the post “When troubleshooting causes trouble” I laid out the 4Cs: Concern, Cause, Countermeasure, Check Results. It is during the Countermeasure step that we determine what immediate or temporary countermeasures can be taken to reduce or eliminate the problem; this is where we apply correction and immediate action.
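
As a minimal sketch (the class and field names are my own, not a prescribed format), a 4Cs record might be captured like this, with the countermeasures field holding the immediate corrections:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FourCsRecord:
    concern: str             # what was observed (the problem statement)
    cause: str               # best current understanding of why it happened
    countermeasures: List[str] = field(default_factory=list)  # immediate/temporary actions
    check_results: str = ""  # evidence that the countermeasures worked

# Hypothetical example.
record = FourCsRecord(
    concern="Batch record missing a verification signature",
    cause="Second verifier pulled away mid-step",  # preliminary, not a root cause
    countermeasures=["Complete verification before release",
                     "Flag the step during batch record review"],
)
record.check_results = "Signature completed and verified before batch disposition"
```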

It helps to agree on what a correction is, especially as it relates to corrective actions. Folks often get confused here. A Correction addresses the problem itself; it does not address the cause.

Fixing a tire, rebooting a computer, doing the dishes. These are all corrections.

As I discussed in “Design Problem Solving into the Process,” good process design involves thinking through as many problems as could occur, identifying the ways to notice those problems, and having clear escalation paths. For low-risk issues, that is often just fix, record, move on. I talk a lot more about this in the post “Managing Events Systematically.”

A good problem-solving system is built to help people decide when to apply these band-aids and when to engage in more structured problem-solving. Building this kind of situational awareness into the organization is key.

Design Problem Solving into the Process

Good processes and systems have ways designed into them to identify when a problem occurs, and ensure it gets the right rigor of problem-solving. A model like Art Smalley’s can be helpful here.

Each and every process should go through the following steps:

  1. Define which problems should be escalated and which should not. Everyone working in a process should share the same definition of what counts as a problem. Oftentimes we end up with a hierarchy: issues that are solved within the process (Level 1) and those that go to a root cause process such as deviation/CAPA (Level 2).
  2. Identify the ways to notice a problem. Make the work as visual as possible so it is easier to detect the problem.
  3. Define the escalation method. There should be one clear way to surface a problem. There are many ways to create a signal, but it should be simple, timely, and very clear.

These three elements make up the request for help.

The next two steps make up the response to that request.

  4. Who is the right person to respond? Supervisor? Area management? Process Owner? Quality?
  5. How does the individual respond, and most importantly when? This should be standardized so the other end of that help chain is not wondering whether, when, and in what form that help is going to arrive.
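
A minimal sketch of how that request/response pairing could be standardized; the level names, responders, response times, and signals are assumptions for illustration, not a prescribed model.

```python
from dataclasses import dataclass
from enum import Enum

class ProblemLevel(Enum):
    LEVEL_1 = "Solved within the process (fix, record, move on)"
    LEVEL_2 = "Escalated to a root cause process (deviation/CAPA)"

@dataclass
class EscalationRule:
    level: ProblemLevel
    responder: str            # who answers the request for help
    response_time_hours: int  # when help is expected to arrive
    signal: str               # the one clear way the problem is surfaced

# Hypothetical rules for a single process.
RULES = {
    ProblemLevel.LEVEL_1: EscalationRule(ProblemLevel.LEVEL_1, "Area supervisor", 1, "Andon board"),
    ProblemLevel.LEVEL_2: EscalationRule(ProblemLevel.LEVEL_2, "Quality", 24, "Deviation record"),
}

def escalate(description: str, level: ProblemLevel) -> str:
    rule = RULES[level]
    return (f"'{description}' raised via {rule.signal}; "
            f"{rule.responder} responds within {rule.response_time_hours}h")

print(escalate("Line clearance checklist item missed", ProblemLevel.LEVEL_2))
```

Keeping the responder and the expected response time in one place is what keeps the other end of the help chain from guessing whether, when, and in what form help will arrive.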

In order for this to work, it is important to identify clear ownership of the problem. There must always be one person clearly accountable, even if others are responsible for pieces of the work, so that person can push the problem forward.

It is easy for problem-solving to stall. So make sure progress is transparent. Knowing what is being worked on, and what is not, is critical.

Prioritization is key. Not every problem needs solving, so have a mechanism to ensure the right problems are being solved in the process.

Problem solving within a process

Defining Values, with Speaking Out as an example

Which espoused values and desired behaviors will best enable an organization to live its quality purpose? There has been a lot of writing and thought on this, and for this post I am going to start with ISO 10018:2020, “Quality management — Guidance for people engagement,” and develop an example of a value to build in your organization.

ISO 10018:2020 gives six areas:

  • Context of the organization and quality culture
  • Leadership
  • Planning and Strategy
  • Knowledge and Awareness
  • Competence
  • Improvement

This list aligns pretty well with other models, including the Malcolm Baldrige Excellence Framework (NIST), the EFQM Excellence Model, the SIQ Model for Performance Excellence, and tools such as the PDA Culture of Quality Assessment.

A concept that we find in ISO 10018:2020 (and everywhere else) is the handling of errors, mistakes, everyday problems and ‘niggles’, near misses, critical incidents, and failures, ensuring they are reported and recorded honestly and transparently. Time is taken for these to be discussed openly and candidly. They are viewed as opportunities for learning how to prevent recurrence by improving systems, and also as potentially protective against larger and more consequential failures or errors. The team takes the time and effort to engage in ‘second order’ problem-solving. ‘First order’ problem-solving is the quick fixing of issues as they appear, so that they stop disrupting normal workflow. ‘Second order’ problem-solving involves identifying the root causes of problems and taking action to address those causes rather than their signs and symptoms. The team takes ownership of mistakes instead of blaming, accusing, or scapegoating individual team members. The team proactively seeks to identify errors and problems it may have missed in its processes or outputs by seeking feedback and asking for help from external stakeholders (e.g., colleagues in other teams, and customers), and also by engaging in frequent experimentation and testing.

We can tackle this in two ways. The first is to define all the points above as a value. The second would be to look at themes for this and the other aspects of robust quality culture and come up with a set of standard values, for example:

  • Accountable
  • Ownership
  • Action Oriented
  • Speak up

Don’t be afraid to take a couple of approaches to get values that really sing in your organization.

Values can be easily written in the following format:

  1. Value: A one or two-word title for each value
  2. Definition: A two or three sentence description that clearly states what this value means in your organization
  3. Desired Behaviors: “I statement” behaviors that simply state activities. The behaviors we choose reinforce the values’ definitions by describing exactly how you want members of the organization to interact.
    • Is this an observable behavior? Can we assess someone’s demonstration of this behavior by watching and/or listening to their interactions? By seeing results?
    • Is this behavior measurable? Can we reliably “score” this behavior? Can we rank how an individual models or demonstrates this behavior?

For the rest of this post, I am going to focus on how you would write a value statement for Speak Up.

First, ask two questions:

  • Specific to your organization’s work environment, how would you define “Speak Up”?
  • What phrase or sentences describe what you mean by “Speak Up”?

Then broaden by considering how fellow leaders and team members would act to demonstrate “Speak Up”, as you defined it.

  • How would leaders and team members act so that, when you observe them, you would see a demonstration of Speaking Up? Note three or four behaviors that would clearly demonstrate your definition.

Next, answer these questions exclusively from your team members’ perspective:

  • How would employees define Speaking Out?
  • How would their definition differ from yours? Why?
  • What behaviors would employees feel they must model to demonstrate Speaking Out properly?
  • How would their modeled behaviors differ from yours? Why?

This process allows us to create common alignment based on a shared purpose.

By going through this process we may end up with a Value that looks like this:

  1. Value: Speaking Out
  2. Definition: Problems are reported and recorded honestly and transparently. Employees are not afraid to speak up, identify quality issues, or challenge the status quo for improved quality; they believe management will act on their suggestions. 
  3. Desired Behaviors:
    • I hold myself accountable for raising problems and issues to my team promptly.
    • I attack process and problems, not people.
    • I work to anticipate and fend off the possibility of failures occurring.
    • I approach admissions of errors and lack of knowledge/skill with support.
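
As a minimal sketch (purely illustrative; the class and field names are my own, not a prescribed template), the format above and this example could be captured as a small structure, with the observable/measurable questions carried as flags:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Behavior:
    statement: str    # the "I statement"
    observable: bool  # can we see or hear it demonstrated in interactions or results?
    measurable: bool  # can we reliably score or rank how someone models it?

@dataclass
class Value:
    name: str          # one or two-word title
    definition: str    # two or three sentences on what it means in this organization
    behaviors: List[Behavior]

# The flags below are illustrative judgments, not settled answers.
speaking_out = Value(
    name="Speaking Out",
    definition=(
        "Problems are reported and recorded honestly and transparently. Employees are "
        "not afraid to speak up, identify quality issues, or challenge the status quo "
        "for improved quality; they believe management will act on their suggestions."
    ),
    behaviors=[
        Behavior("I hold myself accountable for raising problems and issues to my team promptly.", True, True),
        Behavior("I attack process and problems, not people.", True, True),
        Behavior("I work to anticipate and fend off the possibility of failures occurring.", True, False),
        Behavior("I approach admissions of errors and lack of knowledge/skill with support.", True, True),
    ],
)
```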

Avoiding Logical Pitfalls

When documenting a root cause analysis, a risk assessment, or any of the myriad other technical reports we write, we are making a logical argument. In this post, I want to evaluate six common pitfalls to avoid in your writing.

Claiming to follow logically: Non Sequiturs and Genetic Fallacies

Non-sequiturs and genetic fallacies involve statements that are offered in a way that suggests they follow logically one from the other, when in fact no such link exists.

Non-sequiturs (meaning ‘that which does not follow’) often happen when we make connective explanations without justification. Genetic fallacies occur when we draw conclusions about something by tracing its origins back, even though no necessary link can be made between the present situation and the claimed original one.

This is a very common mistake and usually stems from poor use of causal thinking. The best way to address it in an organization is to keep building discipline in thought processes and to document what is connected and why.

Making Assumptions: Begging the Question

Begging the question (assuming the very point at issue) happens a lot in investigations. One of the best ways to avoid it is to ensure a proper problem statement.

Restricting the Options to Two: ‘Black and White’ Thinking

In black and white thinking or the false dichotomy, the arguer gives only two options when other alternatives are possible.

Being Unclear: Equivocation and Ambiguity

Equivocation and ambiguity occur when a term or phrase can be read in more than one way. Ambiguity takes three forms:

  • Lexical: Refers to individual words with more than one meaning
  • Referential: Occurs when the context is unclear
  • Syntactical: Results from grammatical confusion

Just think of all the various meanings of validation and you can understand this problem.

Thinking Wishfully

Good problem-solving will drive down the tendency to assume conclusions, but wishful thinking probably exists in every organization.

Detecting the Whiff of Red Herrings

Human error is the biggest red herring of them all.

Six logical fallacies