Thinking of Swiss Cheese: Reason’s Theory of Active and Latent Failures

The Theory of Active and Latent Failures was proposed by James Reason in his book, Human Error. Reason stated that accidents within most complex systems, such as health care, are caused by a breakdown or absence of safety barriers across four levels within a system. These levels can best be described as Unsafe Acts, Preconditions for Unsafe Acts, Supervisory Factors, and Organizational Influences. Reason used the term “active failures” to describe factors at the Unsafe Acts level, whereas “latent failures” was used to describe unsafe conditions higher up in the system.

This theory is represented by the Swiss Cheese model, which has become very popular in root cause analysis and risk management circles and is widely applied beyond the safety world.

Swiss Cheese Model

In the Swiss Cheese model, the holes in the cheese depict the failure or absence of barriers within a system. Such occurrences represent failures that threaten the overall integrity of the system. If such failures never occurred within a system (i.e., if the system were perfect), then there would not be any holes in the cheese. We would have a nice Engelberg cheddar.

Not every hole that exists in a system will lead to an error. Sometimes holes may be inconsequential. Other times, holes in the cheese may be detected and corrected before something bad happens. This process of detecting and correcting errors occurs all the time.

The holes in the cheese are dynamic, not static. They open and close over time due to many factors, allowing the system to function appropriately without catastrophe. This is what human factors engineers call “resilience.” A resilient system is one that can adapt and adjust to changes or disturbances.

Holes in the cheese open and close at different rates. The rate at which holes pop up or disappear is determined by the type of failure the hole represents.

  1. Holes that occur at the Unsafe Acts level, and even some at the Preconditions level, represent active failures. Active failures usually occur during the activity of work and are directly linked to the bad outcome. Active failures change as work is performed, opening and closing over time as people make errors, catch their errors, and correct them.
  2. Latent failures occur higher up in the system, above the Unsafe Acts level: the Organizational, Supervisory, and Preconditions levels. These failures are referred to as “latent” because when they occur or open, they often go undetected. They can lie “dormant” or “latent” in the system for an extended period of time before they are recognized. Unlike active failures, latent failures do not close or disappear quickly.
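The idea of holes that open and close at different rates can be made concrete with a toy simulation. This is a minimal sketch that assumes each layer has a single hole that is either open or closed at each time step. The opening and closing probabilities are hypothetical numbers chosen only to mimic the text: latent holes open rarely but stay open, and active holes open often but are usually caught and closed. An event is counted only when every layer is open at the same moment.

```python
import random

# Toy sketch (assumption, not Reason's own formulation): each layer of cheese has
# one hole that is either open (True) or closed (False) at each time step.
LAYERS = ["Organizational", "Supervisory", "Preconditions", "Unsafe Acts"]

# Hypothetical per-step rates. Latent layers open rarely but close slowly;
# active layers open often but are usually caught and closed quickly.
P_OPEN = {"Organizational": 0.02, "Supervisory": 0.03, "Preconditions": 0.10, "Unsafe Acts": 0.20}
P_CLOSE = {"Organizational": 0.01, "Supervisory": 0.02, "Preconditions": 0.30, "Unsafe Acts": 0.60}


def step(state, rng):
    """Advance every layer one time step: open holes may close, closed holes may open."""
    new_state = {}
    for layer in LAYERS:
        if state[layer]:
            new_state[layer] = rng.random() >= P_CLOSE[layer]  # stays open unless it closes
        else:
            new_state[layer] = rng.random() < P_OPEN[layer]    # opens with probability P_OPEN
    return new_state


def event_occurs(state):
    """An event (harm) happens only when the holes in every layer line up at once."""
    return all(state[layer] for layer in LAYERS)


def simulate(steps, seed=0):
    """Count the time steps on which all holes were aligned; seeded for reproducibility."""
    rng = random.Random(seed)
    state = {layer: False for layer in LAYERS}
    events = 0
    for _ in range(steps):
        state = step(state, rng)
        if event_occurs(state):
            events += 1
    return events


print(simulate(100_000))
```

Closing a single fast-moving active hole is enough to stop an event. The long-lived latent holes are what keep the trajectory available for the next active failure.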

Most events (harms) are associated with multiple active and latent failures. The typical Swiss Cheese diagram shows an arrow flying through one hole at each level of the system. In practice, there can be a variety of failures at each level that interact to produce an event. In other words, there can be several failures at the Organizational, Supervisory, Preconditions, and Unsafe Acts levels that all lead to harm. Holes associated with events are more numerous at the Unsafe Acts and Preconditions levels and usually become fewer as one progresses upward through the Supervisory and Organizational levels.

Given the frequency and dynamic nature of activities at these levels, there are more opportunities for holes to open up at the Unsafe Acts and Preconditions levels. More holes are often identified at these levels during root cause investigations and risk assessments.

The way the holes in the cheese interact across levels is important:

  • One-to-many mapping of causal factors is when a hole at a higher level (e.g., Preconditions) may result in several holes at a lower level (e.g., Unsafe Acts).
  • Many-to-one mapping of causal factors is when multiple holes at a higher level (e.g., Preconditions) might interact to produce a single hole at a lower level (e.g., Unsafe Acts).
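The two mapping patterns above can be detected mechanically once an investigation records its causal links. This is a minimal sketch that assumes each link is stored as a (higher-level hole, lower-level hole) pair. The factor names are hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Hypothetical causal links from an investigation: (higher-level hole, lower-level hole).
links = [
    ("Fatigue (Preconditions)", "Skipped verification step (Unsafe Acts)"),
    ("Fatigue (Preconditions)", "Wrong label applied (Unsafe Acts)"),
    ("Poor lighting (Preconditions)", "Misread batch number (Unsafe Acts)"),
    ("Look-alike labels (Preconditions)", "Misread batch number (Unsafe Acts)"),
]


def mapping_patterns(links):
    """Return (one_to_many, many_to_one) dictionaries built from the causal links."""
    children = defaultdict(set)  # higher-level hole -> lower-level holes it produced
    parents = defaultdict(set)   # lower-level hole -> higher-level holes that fed it
    for higher, lower in links:
        children[higher].add(lower)
        parents[lower].add(higher)
    one_to_many = {h: sorted(lows) for h, lows in children.items() if len(lows) > 1}
    many_to_one = {low: sorted(highs) for low, highs in parents.items() if len(highs) > 1}
    return one_to_many, many_to_one


one_to_many, many_to_one = mapping_patterns(links)
print(one_to_many)  # fatigue fans out to two unsafe acts
print(many_to_one)  # two preconditions converge on one misread
```

Keeping the links as explicit pairs also lets you count holes per level. This makes it easy to check whether your investigations show the expected pattern of more holes at the lower levels.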

By understanding the Swiss Cheese model, and Reason’s wider work on active and latent failures, we can strengthen our approach to problem-solving.

Plus cheese is cool.


Similac Recall is a Systematic Failure in our Food/Drug Safety

There has been a lot of press lately about the Abbott Nutrition recall of infant formula. Fundamentally, this is a colossal failure of our regulatory program, another failure in a long string of failures, and confirmation that the time is now for radical changes in the agency.

Consumer Reports, in the article “How the FDA Bungled the Powdered Infant Formula Recall,” does a good job covering the important points, so please read that article.

The optimist in me hopes that this calamity will drive needed change, since calamity has, unfortunately, historically been what drives regulatory change in this country. I’m just not sure I have enough confidence in Congress to get the job done.

The Failure Space of Clinical Trials – Protocol Deviations and Events

Let us apply our failure space model, and levels of problems, to deviations in a clinical trial. This is one of those areas that regulations and tribal practice have complicated, perhaps needlessly. It is also complicated by the different players involved: clinical sites, the sponsor, and, usually these days, a number of Contract Research Organizations (CROs).

What is a Protocol Deviation?

Protocol deviation is any change, divergence, or departure from the study design or procedures defined in the approved protocol.

Protocol deviations may include unplanned instances of protocol noncompliance. For example, situations in which the clinical investigator failed to perform tests or examinations as required by the protocol or failures on the part of subjects to complete scheduled visits as required by the protocol, would be considered protocol deviations.

In the case of deviations which are planned exceptions to the protocol such deviations should be reviewed and approved by the IRB, the sponsor, and by the FDA for medical devices, prior to implementation, unless the change is necessary to eliminate apparent immediate hazards to the human subjects (21 CFR 312.66), or to protect the life or physical well-being of the subject (21 CFR 812.150(a)(4)).

Source: FDA, Compliance Program Guidance Manual for Clinical Investigator Inspections (7348.811), July 2020.

In assessing protocol deviations/violations, the FDA instructs field staff to determine whether changes to the protocol were: (1) documented by an amendment, dated, and maintained with the protocol; (2) reported to the sponsor (when initiated by the clinical investigator); and (3) approved by the IRB and FDA (if applicable) before implementation (except when necessary to eliminate apparent immediate hazard(s) to human subjects).
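The three questions FDA field staff ask can be expressed as a simple checklist. This is a minimal sketch. The `ProtocolChange` record and its field names are hypothetical, and the sketch assumes, following the wording above, that the immediate-hazard exception applies only to the prior-approval question.

```python
from dataclasses import dataclass


@dataclass
class ProtocolChange:
    # Hypothetical fields mirroring the three questions in the compliance program.
    documented_by_amendment: bool    # amendment dated and maintained with the protocol
    initiated_by_investigator: bool
    reported_to_sponsor: bool
    irb_approved_before: bool
    fda_approval_required: bool
    fda_approved_before: bool
    immediate_hazard: bool           # made to eliminate an apparent immediate hazard


def inspection_findings(change: ProtocolChange) -> list:
    """Return the checklist questions this protocol change would fail."""
    findings = []
    if not change.documented_by_amendment:
        findings.append("not documented by a dated amendment kept with the protocol")
    if change.initiated_by_investigator and not change.reported_to_sponsor:
        findings.append("investigator-initiated change not reported to sponsor")
    # Prior approval is not expected when the change eliminated an apparent immediate hazard.
    if not change.immediate_hazard:
        if not change.irb_approved_before:
            findings.append("implemented before IRB approval")
        if change.fda_approval_required and not change.fda_approved_before:
            findings.append("implemented before FDA approval")
    return findings


late_irb = ProtocolChange(documented_by_amendment=True, initiated_by_investigator=False,
                          reported_to_sponsor=False, irb_approved_before=False,
                          fda_approval_required=False, fda_approved_before=False,
                          immediate_hazard=False)
print(inspection_findings(late_irb))  # ['implemented before IRB approval']
```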

A few key regulations and guidances (not meant to be a comprehensive list):

  • ICH E6(R2), Sections 4.5.1 to 4.5.4
      4.5.1 “trial should be conducted in compliance with the protocol agreed to by the sponsor and, if required by the regulatory authorities…”
      4.5.2 The investigator should not implement any deviation from, or changes of, the protocol without agreement by the sponsor and prior review and documented approval/favorable opinion from the IRB/IEC of an amendment, except where necessary to eliminate an immediate hazard(s) to trial subjects, or when the change(s) involves only logistical or administrative aspects of the trial (e.g., change in monitor(s), change of telephone number(s)).
      4.5.3 The investigator, or person designated by the investigator, should document and explain any deviation from the approved protocol.
      4.5.4 The investigator may implement a deviation from, or a change in, the protocol to eliminate an immediate hazard(s) to trial subjects without prior IRB/IEC approval/favorable opinion.
  • ICH E3, Section 9.6: The sponsor should describe the quality management approach implemented in the trial and summarize important deviations from the predefined quality tolerance limits and remedial actions taken in the clinical study report.
  • 21 CFR 312.53(vi)(a): investigators selected “Will conduct the study(ies) in accordance with the relevant, current protocol(s) and will only make changes in a protocol after notifying the sponsor, except when necessary to protect the safety, the rights, or welfare of subjects.”
  • 21 CFR 56.108(a): IRB shall….ensur[e] that changes in approved research….may not be initiated without IRB review and approval except where necessary to eliminate apparent immediate hazards to the human subjects.
  • 21 CFR 56.108(b): “IRB shall….follow written procedures for ensuring prompt reporting to the IRB, appropriate institutional officials, and the Food and Drug Administration of… any unanticipated problems involving risks to human subjects or others…[or] any instance of serious or continuing noncompliance with these regulations or the requirements or determinations of the IRB.”
  • 45 CFR 46.103(b)(5): Assurances applicable to federally supported or conducted research shall at a minimum include….written procedures for ensuring prompt reporting to the IRB….[of] any unanticipated problems involving risks to subjects or others or any serious or continuing noncompliance with this policy or the requirements or determinations of the IRB.
  • FDA Form 1572 (Section 9): lists the commitments the investigator is undertaking in signing the 1572, wherein the clinical investigator agrees “to conduct the study(ies) in accordance with the relevant, current protocol(s) and will only make changes in a protocol after notifying the sponsor, except when necessary to protect the safety, the rights, or welfare of subjects… [and] not to make any changes in the research without IRB approval, except where necessary to eliminate apparent immediate hazards to the human subjects.”

How Protocol Deviations are Implemented

Many companies build a failure scale into their process based on severity. Some use a minor, major, and even critical scale to denote differences in severity. The axis for severity here is the degree to which the event affects the subject’s rights, safety, or welfare, and/or the integrity of the resultant data (i.e., the sponsor’s ability to use the data in support of the drug).

Other companies divide events into protocol deviations and violations:

  • Protocol Deviation: A protocol deviation occurs when, without significant consequences, the activities on a study diverge from the IRB-approved protocol, e.g., missing a visit window because the subject is traveling. Not as serious as a protocol violation.
  • Protocol Violation: A divergence from the protocol that materially (a) reduces the quality or completeness of the data, (b) makes the ICF inaccurate, or (c) impacts a subject’s safety, rights or welfare. Examples of protocol violations may include: inadequate or delinquent informed consent; inclusion/exclusion criteria not met; unreported SAEs; improper breaking of the blind; use of prohibited medication; incorrect or missing tests; mishandled samples; multiple visits missed or outside permissible windows; materially inadequate record-keeping; intentional deviation from protocol, GCP or regulations by study personnel; and repeated subject noncompliance with study requirements.
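The deviation/violation split comes down to a three-part "materially (a), (b), or (c)" test. This is a minimal sketch of that test. The `StudyEvent` fields are hypothetical, and deciding whether an effect is *material* remains a human judgment that happens before the flags are set.

```python
from dataclasses import dataclass


@dataclass
class StudyEvent:
    description: str
    reduces_data_quality: bool = False  # (a) quality or completeness of the data
    makes_icf_inaccurate: bool = False  # (b) informed consent form no longer accurate
    impacts_subject: bool = False       # (c) subject safety, rights or welfare


def classify(event: StudyEvent) -> str:
    """Apply the deviation vs. violation definitions: any material (a), (b) or (c) effect is a violation."""
    if event.reduces_data_quality or event.makes_icf_inaccurate or event.impacts_subject:
        return "Protocol Violation"
    return "Protocol Deviation"


print(classify(StudyEvent("Visit window missed; subject traveling")))  # Protocol Deviation
print(classify(StudyEvent("Exclusion criterion not met", impacts_subject=True)))  # Protocol Violation
```

A minor/major/critical scheme would replace the two-way return with a graded one. The inputs are the same, however: the effect on subjects and on data integrity.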

This is probably a place where nomenclature gets in the way rather than providing benefit. The EMA says pretty much the same in “ICH guideline E3 – questions and answers (R1).”

Principles of Events in Clinical Practice

  1. Severity of the event is based on the degree to which it affects the subject’s rights, safety, or welfare, and/or the integrity of the resultant data
  2. Events (problems, deviations, etc.) will happen at all levels of a clinical practice (Sponsor, CRO, Site, etc.)
  3. Events happen beyond the Protocol. These need to be managed appropriately as well.
  4. The event needs to be categorized, evaluated and trended by the sponsor

Severity of the Event

Starting in the study planning stage, ICH E6(R2) GCP requires sponsors to identify risks to critical study processes and study data and to evaluate these risks based on likelihood, detectability and impact on subject safety and data integrity.
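One common way to combine the three evaluation factors is to rate each on a small scale and multiply the ratings, as an FMEA risk priority number does. ICH E6(R2) asks for likelihood, detectability, and impact but does not prescribe this formula. The 1 to 5 scales and the product are assumptions made for illustration, as are the example risks.

```python
from dataclasses import dataclass


@dataclass
class StudyRisk:
    name: str
    likelihood: int     # 1 = rare ... 5 = almost certain
    detectability: int  # 1 = almost always detected ... 5 = very hard to detect
    impact: int         # 1 = negligible ... 5 = severe for subject safety or data integrity

    def __post_init__(self):
        for rating in (self.likelihood, self.detectability, self.impact):
            if not 1 <= rating <= 5:
                raise ValueError(f"{self.name}: ratings must be between 1 and 5")

    def score(self) -> int:
        # FMEA-style product of the three ratings (an assumed convention, not an ICH formula).
        return self.likelihood * self.detectability * self.impact


risks = [
    StudyRisk("Visit outside window", likelihood=4, detectability=1, impact=2),
    StudyRisk("Eligibility criteria misapplied", likelihood=3, detectability=3, impact=5),
]
for risk in sorted(risks, key=StudyRisk.score, reverse=True):
    print(risk.name, risk.score())
# Eligibility criteria misapplied 45
# Visit outside window 8
```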

Sponsors then establish key quality indicators (KQIs) and quality tolerance limits (QTLs). A KQI is really just a key risk indicator and should be treated similarly.

Study events that exceed the risk threshold should trigger an evaluation to determine if action is needed. In this way, sponsors can proactively manage risk and address protocol noncompliance.
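The trigger logic is simple once a KQI is defined. This is a minimal sketch that assumes a hypothetical site-level KQI (important deviations per enrolled subject) and an illustrative tolerance of 0.10. Neither the metric nor the value comes from ICH or from the text above. Sites above the limit are returned for evaluation, not automatically escalated.

```python
# Hypothetical KQI: important protocol deviations per enrolled subject, per site.
# The 0.10 tolerance is an illustrative assumption, not a regulatory value.
TOLERANCE = 0.10


def sites_needing_evaluation(deviations_by_site, subjects_by_site, limit=TOLERANCE):
    """Return {site: rate} for every site whose KQI exceeds the tolerance limit."""
    flagged = {}
    for site, subjects in subjects_by_site.items():
        if subjects == 0:
            continue  # nothing enrolled yet, so no rate to compute
        rate = deviations_by_site.get(site, 0) / subjects
        if rate > limit:
            flagged[site] = round(rate, 3)
    return flagged


print(sites_needing_evaluation({"Site 101": 2, "Site 102": 9},
                               {"Site 101": 40, "Site 102": 30}))  # {'Site 102': 0.3}
```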

The best practice here is to have a living risk assessment for each study. Evaluate across studies to understand your overall organizational risk, and look for opportunities for wide-scale mitigations. Feed the results up into your risk register.

Event Classification for Clinical Protocols and GCPs

Where the Event Happens

Deviations in the clinical space are a great example of supplier event management. At the end of the day, there is little difference between managing supplier events under GMP, GLP, or GCP. The individual requirements might differ, but the principles and the process are the same.

Each entity in the trial organization should have its own deviation system in which it investigates deviations, performs root cause investigation, and enacts CAPAs.

This is where it starts to get tricky. First of all, not all sites have the infrastructure to do this well. Second, the nature of reporting, usually through the Electronic Data Capture (EDC) system, can lead to balkanization at the site. Sites need strong compliance programs that compile deviation details into a single sitewide system. Such a system lets the site trend deviations across studies while also meeting sponsor reporting requirements.

Unfortunately, too many sites rely on the sponsor’s program. Sponsors need to evaluate the strength of each site’s program during site selection and through auditing.

Events Happen

Consistent Event Reporting is Critical

Deviations should be raised against all processes, procedures, and plans, not just the protocol.

Categorizing deviations is usually a pain point and an area where more consistency needs to be driven. I recommend first having a good standard set of categorizations. The industry would benefit from adopting a standard, and I think Norman Goldfarb’s proposal is still the best.

Once you have categories, and understand how they relate to your KQIs and other aspects, you need to make sure categorization is done consistently. The key mechanisms for this are:

  1. Training
  2. Monitoring (in all its funny permutations)
  3. Periodic evaluations and Trending

Deviations should be trended, at a minimum, in several ways:

  1. Per site per study
  2. Per site all activities
  3. All sites per study
  4. All sites all activities
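The four minimum trending views are aggregations of one deviation log at different levels. This is a minimal sketch that assumes a hypothetical log of (site, study, category) records. It treats "all activities" as everything in the log for that site. A real log would also capture non-study activities and many more fields.

```python
from collections import Counter

# Hypothetical deviation log: (site, study, category).
log = [
    ("Site A", "Study 1", "Visit window"),
    ("Site A", "Study 1", "Informed consent"),
    ("Site A", "Study 2", "Visit window"),
    ("Site B", "Study 1", "Visit window"),
]


def trend(log):
    """Count deviations at the four minimum levels of aggregation."""
    return {
        "per site per study": Counter((site, study) for site, study, _ in log),
        "per site, all activities": Counter(site for site, _, _ in log),
        "all sites per study": Counter(study for _, study, _ in log),
        "all sites, all activities": len(log),
    }


for view, counts in trend(log).items():
    print(view, counts)
```

Adding the category to each key, for example `(site, study, category)`, gives the same four views broken down by deviation type. That breakdown is where consistent categorization pays off.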

And remember, trending doesn’t count if you do not analyze the problem and take appropriate CAPAs.

This allows trends to be identified and appropriate corrective and preventive actions to be taken to systematically improve.

Share your stories

As we move through our careers, we all have endless incidents that can either be denied and suppressed or acknowledged and framed as “falls,” “failures,” or “mistakes.” These so-called falls all enhance our professional growth. By focusing on the process of falling, and then rising back up, we gain a greater understanding of the choices we have made and the consequences of those choices.

Sharing and bearing witness to stories of failure from our professional and personal lives provides opportunities for us to explore and get closer to the underlying meaning of our work. It also brings us closer to our questions about what it is we are trying to accomplish as quality professionals. Our missteps help us identify paths we needed to take, and they allow new stories and new pathways to emerge within the context of our work. As we share stories of tension, struggle, and falling down, we realize how important these experiences are in the process of learning, of crafting one’s presence as a human being among human beings, of becoming a quality professional.

We may not have asked for a journey of struggle when we decided to become quality professionals, but the process of becoming one tacitly involves struggle and difficulty. Individuals who demonstrate the ability to rise strong from pain and adversity share a clear pattern: they are able to describe their experiences and make meaning of them.

Simply recognizing and affirming struggle, or acknowledging that something is not going as it should, does not necessarily lead to productive change. To make a change and work towards a culture of excellence, we must recognize that emotions and feelings are in play. Learning to lead is an emotionally laden process, and early-stage professionals feel exceptionally vulnerable within it. This field requires early-stage professionals to hone their interpersonal, technical, and organizational skills. At the same time, they must turn their gaze inward to understand how their position in the organization can be used to drive change. Novice professionals often struggle with communicating ideas orally or in writing, managing multiple tasks at once, staying on top of their technical content, or even thinking critically about who they are in the broader world. Early-stage professionals are always on the brink of vulnerability.

Share your stories. Help others share theirs.

I’m organizing a PechaKucha/Ignite event as part of the ASQ’s Team and Workplace Excellence Forum to sharpen our stories. More details coming soon. Start thinking of your stories to share!


Barriers and root cause analysis

Barriers, or controls, are one of the (not-at-all) secret sauces of root cause analysis.

By understanding barriers, we can understand both why a problem happened and how it can be prevented in the future. Evaluating current process controls as part of root cause analysis helps determine whether all the current barriers pertaining to the problem you are investigating were present and effective, regardless of whether they worked in this instance.

At its simplest it is just a three-part brainstorm:

Three questions of barrier analysis:

  • Barriers that failed: The barrier was in place and operational at the time of the accident, but it failed to prevent the accident.
  • Barriers that were not used: The barrier was available, but workers chose not to use it.
  • Barriers that did not exist: The barrier did not exist at the time of the event. These are a source of potential corrective and preventive actions (depending on what they are).

The key to this brainstorming session is to try to find all of the failed, unused, or nonexistent barriers. Do not be concerned if you are not certain which category they belong in.
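The output of the brainstorm is just a list of barriers, each tagged with one of the three outcomes. This is a minimal sketch that assumes a hypothetical labeling mix-up as the event. The barrier names are illustrative, not taken from a real investigation. Grouping by status makes the nonexistent barriers, the candidate CAPAs, easy to pull out.

```python
from enum import Enum


class BarrierStatus(Enum):
    FAILED = "in place and operational, but failed to prevent the event"
    NOT_USED = "available, but not used"
    DID_NOT_EXIST = "did not exist at the time of the event"


# Hypothetical brainstorm output for an illustrative labeling mix-up.
brainstorm = {
    "Barcode scan at line clearance": BarrierStatus.FAILED,
    "Second-person label verification": BarrierStatus.NOT_USED,
    "Dedicated label printer per line": BarrierStatus.DID_NOT_EXIST,
}


def group_by_status(barriers):
    """Group barrier names under each of the three barrier analysis questions."""
    groups = {status: [] for status in BarrierStatus}
    for barrier, status in barriers.items():
        groups[status].append(barrier)
    return groups


groups = group_by_status(brainstorm)
# Barriers that did not exist are a source of potential CAPAs.
capa_candidates = groups[BarrierStatus.DID_NOT_EXIST]
print(capa_candidates)  # ['Dedicated label printer per line']
```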

Most forms of barrier analysis look at two types, technical and administrative, and we can further break down administrative into “human” and “organization.”

  • Choose Technical if a technical or engineering control exists. Examples: separation among manufacturing or packaging lines; emergency power supply; dedicated equipment; barcoding; keypad-controlled doors; separated storage for components; software that prevents a workflow from going further if a field is not completed; redundant designs.
  • Choose Human if the control relies on a human reviewer or operator. Examples: training and certifications; use of checklists; verification of a critical task by a second person.
  • Choose Organization if the control involves a transfer of responsibility, for example a document reviewed by both manufacturing and quality. Examples: clear procedures and policies; adequate supervision; adequate workload; periodic process audits.
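The technical/human/organization rules amount to a short decision procedure. This is a minimal sketch with hypothetical boolean inputs. The order of the checks is an assumption: a transfer of responsibility is checked before reliance on a person, because a document reviewed by both manufacturing and quality also relies on people.

```python
from enum import Enum


class BarrierType(Enum):
    TECHNICAL = "technical"
    HUMAN = "human"
    ORGANIZATION = "organization"


def barrier_type(engineering_control: bool, relies_on_person: bool,
                 transfers_responsibility: bool) -> BarrierType:
    """Apply the 'choose ... if' rules in an assumed order of precedence."""
    if engineering_control:
        return BarrierType.TECHNICAL
    if transfers_responsibility:
        return BarrierType.ORGANIZATION  # e.g., manufacturing and quality both review
    if relies_on_person:
        return BarrierType.HUMAN
    raise ValueError("barrier matches no category; revisit how it is described")


print(barrier_type(engineering_control=True, relies_on_person=False,
                   transfers_responsibility=False))  # BarrierType.TECHNICAL
```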

These barriers are the same as the current controls in a risk assessment, which are key to a wide variety of risk assessment tools.