Barriers and Root Cause Analysis: A Comprehensive Framework

Barriers, or controls, are one of the fundamental elements of root cause analysis. By understanding barriers—including their types and functions—we can understand both why a problem happened and how it can be prevented in the future. An evaluation of current process controls as part of root cause analysis can help determine whether all the current barriers pertaining to the problem you are investigating were present and effective.

Understanding Barrier Analysis

At its simplest, barrier analysis is a three-part brainstorm that examines the status and effectiveness of safety measures:

Barrier Analysis:

  • Barriers that failed
  • Barriers that were not used
  • Barriers that did not exist

The key to this brainstorming session is to try to find all of the failed, unused, or nonexistent barriers. Do not be concerned if you are not certain which category they belong in initially.

Types of Barriers: Technical, Human, and Organizational

Most forms of barrier analysis examine two primary types: technical and administrative. Administrative barriers can be further broken down into “human” and “organizational” categories.

  • Technical: Choose if a technical or engineering control exists. Examples: separation among manufacturing or packaging lines; emergency power supply; dedicated equipment; barcoding; keypad-controlled doors; separated storage for components; software that prevents a workflow from going further if a field is not completed; redundant designs.
  • Human: Choose if the control relies on a human reviewer or operator. Examples: training and certifications; use of a checklist; verification of a critical task by a second person.
  • Organizational: Choose if the control involves a transfer of responsibility, for example a document reviewed by both manufacturing and quality. Examples: clear procedures and policies; adequate supervision; adequate workload; periodic process audits.

Preventive vs. Mitigative Barriers: A Critical Distinction

A fundamental aspect of barrier analysis involves understanding the difference between preventive and mitigative barriers. This distinction is crucial for comprehensive risk management and aligns with widely used frameworks such as bow-tie analysis.

Preventive Barriers

Preventive barriers are measures designed to prevent the top event from occurring. These barriers:

  • Focus on stopping incidents before they happen
  • Act as the first line of defense against threats
  • Aim to reduce the likelihood that a risk will materialize
  • Are proactive in nature, addressing potential causes before they can lead to unwanted events

Examples of preventive barriers include:

  • Regular equipment maintenance programs
  • Training and certification programs
  • Access controls and authentication systems
  • Equipment qualification protocols (IQ/OQ/PQ) validating proper installation and operation

Mitigative Barriers

Mitigative barriers are designed to reduce the impact and severity of consequences after the top event has occurred. These barriers:

  • Focus on damage control rather than prevention
  • Act to minimize harm when preventive measures have failed
  • Reduce the severity or substantially decrease the likelihood of consequences occurring
  • Are reactive in nature, coming into play after a risk has materialized

Examples of mitigative barriers include:

  • Alarm systems and response procedures
  • Containment measures for hazards
  • Emergency response teams and protocols
  • Backup power systems for critical operations

Timeline and Implementation Differences

The timing of barrier implementation and failure differs significantly between preventive and mitigative barriers:

  • Preventive barriers often fail over days, weeks, or years before the top event occurs, providing more opportunities for identification and intervention
  • Mitigative barriers often fail over minutes or hours after the top event occurs, requiring higher reliability and immediate effectiveness
  • This timing difference leads to higher reliance on mitigative barriers working correctly the first time

Enhanced Barrier Analysis Framework

Building on the traditional three-part analysis, organizations should incorporate the preventive vs. mitigative distinction into their barrier evaluation:

Enhanced Barrier Analysis:

  • Preventive barriers that failed
  • Preventive barriers that were not used
  • Preventive barriers that did not exist
  • Mitigative barriers that failed
  • Mitigative barriers that were not used
  • Mitigative barriers that did not exist

Integration with Risk Assessment

These barriers are the same as current controls in risk assessment, which is key in a wide variety of risk assessment tools. The optimal approach involves balancing both preventive and mitigative barriers without placing reliance on just one type. Some companies may favor prevention by placing high confidence in their systems and practices, while others may emphasize mitigation through reactive policies, but neither approach alone is advisable as they each result in over-reliance on one type of barrier.

Practical Application

When conducting barrier analysis as part of root cause investigation:

  1. Identify all relevant barriers that were supposed to protect against the incident
  2. Classify each barrier as preventive or mitigative based on its intended function
  3. Determine the barrier type: technical, human, or organizational
  4. Assess barrier status: failed, not used, or did not exist
  5. Evaluate the balance between preventive and mitigative measures
  6. Develop corrective actions that address gaps in both preventive and mitigative barriers

This comprehensive approach to barrier analysis provides a more nuanced understanding of how incidents occur and how they can be prevented or their consequences minimized in the future. By understanding both the preventive and mitigative functions of barriers, organizations can develop more robust risk management strategies that address threats at multiple points in the incident timeline.
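The classification steps above can be sketched as a small data structure. This is an illustrative sketch only: the class names, the `tally` and `balance` helpers, and the sample findings are assumptions made for the example, not part of any standard barrier analysis tool.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Function(Enum):
    PREVENTIVE = "preventive"
    MITIGATIVE = "mitigative"


class BarrierType(Enum):
    TECHNICAL = "technical"
    HUMAN = "human"
    ORGANIZATIONAL = "organizational"


class Status(Enum):
    FAILED = "failed"
    NOT_USED = "not used"
    DID_NOT_EXIST = "did not exist"


@dataclass(frozen=True)
class Barrier:
    name: str
    function: Function
    barrier_type: BarrierType
    status: Status


def tally(barriers):
    """Count barriers in each (function, status) cell of the enhanced analysis."""
    return Counter((b.function, b.status) for b in barriers)


def balance(barriers):
    """Count barriers by function to show the preventive/mitigative balance."""
    return Counter(b.function for b in barriers)


# Hypothetical findings from an investigation, for illustration only.
findings = [
    Barrier("Second-person verification", Function.PREVENTIVE,
            BarrierType.HUMAN, Status.NOT_USED),
    Barrier("Barcoding of components", Function.PREVENTIVE,
            BarrierType.TECHNICAL, Status.DID_NOT_EXIST),
    Barrier("Alarm response procedure", Function.MITIGATIVE,
            BarrierType.ORGANIZATIONAL, Status.FAILED),
]

cells = tally(findings)
by_function = balance(findings)
```

Here `cells` fills the six-cell enhanced analysis, and `by_function` gives a quick view of whether the findings lean toward preventive or mitigative barriers, which can guide where corrective actions are needed.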

Risk-Based Thinking

Risk-based thinking is a crucial component of modern quality management systems and consists of four key aspects: anticipate, monitor, respond, and learn. Each aspect ensures an organization can effectively manage and mitigate risks, enhancing overall performance and reliability.

Anticipate

Anticipating risks involves proactively identifying and analyzing potential risks that could impact the organization’s operations or objectives. This step is about foreseeing problems before they occur and planning how to address them. It requires a thorough understanding of the organization’s processes, the external and internal factors that could affect these processes, and the potential consequences of various risks. By anticipating risks, organizations can prepare more effectively and prevent many issues from occurring.

Monitor

Monitoring involves continuously observing and tracking the operational environment to detect risk indicators early. This ongoing process helps catch deviations from expected outcomes or standards, which could indicate the emergence of a risk. Effective monitoring relies on establishing metrics that help to quickly and accurately identify when things are starting to veer off course. This real-time data collection is crucial for enabling timely responses to potential threats.

Respond

Responding to risks is about taking appropriate actions to manage or mitigate identified risks based on their severity and potential impact. This step involves implementing the planned risk responses that were developed during the anticipation phase. The effectiveness of these responses often depends on the speed and decisiveness of the actions taken. Responses can include adjusting processes, reallocating resources, or activating contingency plans. The goal is to minimize the negative impact on the organization and its stakeholders.

Learn

Learning from the management of risks is a critical component that closes the loop of risk-based thinking. This aspect involves analyzing the outcomes of risk responses and understanding what worked well and what did not. Learning from these experiences is essential for continuous improvement. It helps organizations refine risk management processes, improve response strategies, and better prepare for future risks. This iterative learning process ensures that risk management efforts are increasingly effective over time.

The four aspects of risk-based thinking—anticipate, monitor, respond, and learn—form a continuous cycle that helps organizations manage uncertainties proactively. This approach protects the organization from potential downsides and enables it to seize opportunities that arise from a well-understood risk landscape. Organizations can enhance their resilience and adaptability by embedding these practices into everyday operations.

Implementing Risk-Based Thinking

1. Understand the Concept of Risk-Based Thinking

Risk-based thinking involves a proactive approach to identifying, analyzing, and addressing risks. This mindset should be ingrained in the organization’s culture and used as a basis for decision-making.

2. Identify Risks and Opportunities

Identify potential risks and opportunities. This can be achieved through various methods such as SWOT analysis, brainstorming sessions, and process mapping. It’s crucial to involve people at all levels of the organization since they can provide diverse perspectives on potential risks and opportunities.

3. Analyze and Prioritize Risks

Once risks and opportunities are identified, they should be analyzed to understand their potential impact and likelihood. This analysis will help prioritize which risks need immediate attention and which opportunities should be pursued.
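One common way to turn impact and likelihood into a priority order is a simple scoring matrix. The sketch below assumes 1-to-5 scales and a score equal to likelihood multiplied by impact; both the scales and the multiplication rule are assumptions for illustration, and the register entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritize(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical register entries, for illustration only.
register = [
    Risk("Supplier delivers late", likelihood=4, impact=2),
    Risk("Label mix-up on packaging line", likelihood=2, impact=5),
    Risk("Training record out of date", likelihood=3, impact=1),
]

ranked = prioritize(register)
```

A multiplicative score is easy to explain but hides the difference between frequent low-impact risks and rare severe ones, so many teams review the two dimensions side by side as well.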

4. Plan and Implement Responses

After prioritizing, develop strategies to address these risks and opportunities. Plans should include preventive measures for risks and proactive steps to seize opportunities. Integrating these plans into the organization’s overall strategy and daily operations is important to ensure they are effective.

5. Monitor and Review

Implementing risk-based thinking is not a one-time activity but an ongoing process. Regular monitoring and reviewing of risks, opportunities, and the effectiveness of responses are crucial. This can be done through regular audits, performance evaluations, and feedback mechanisms. Adjustments should be made based on these reviews to improve the risk management process.

6. Learn and Improve

Organizations should learn from their experiences in managing risks and opportunities. This involves analyzing what worked well and what didn’t and using this information to improve future risk management efforts. Continuous improvement should be a key goal, aligning with the Plan-Do-Check-Act (PDCA) cycle.

7. Documentation and Compliance

Maintaining proper documentation is essential for tracking and managing risk-based thinking activities. Documents such as risk registers, action plans, and review reports should be updated and readily available.

8. Training and Culture

Training and cultural adaptation are necessary to implement risk-based thinking effectively. All employees should be trained on the principles of risk-based thinking and how to apply them in their roles. Creating a culture encouraging open communication about risks and supporting risk-taking within defined limits is also vital.

Evaluating Controls as Part of Risk Management

When I teach an introductory risk management class, I usually use an icebreaker: “What is the riskiest activity you can think of doing?” Inevitably you will get some version of skydiving, swimming with sharks, or jumping off bridges. This activity is great because it starts conversations about likelihood and severity. At heart, the question brings out the concept of risk important activities and the nature of controls.

The things people think of, such as skydiving, are great examples of activities that are surrounded by other activities that control risk. The very activity is based on reducing risk to as low a level as possible and then proceeding along the safest possible pathway. These risk important activities are the mechanisms just before a critical step that:

  1. Ensure the appropriate transfer of information and skill
  2. Ensure the appropriate number of actions to reduce risk
  3. Influence the presence or effectiveness of barriers
  4. Influence the ability to maintain positive control of the moderation of hazards

Risk important activities are a concept central to safety thinking and sit at the center of many human error reduction tools and practices. Risk important activities are all about thinking through the right set of controls, building them into the procedure, and successfully executing them before reaching the critical step of no return. Checklists are a great example of this mindset at work, but there are many other ways of building these activities in.

Hospitals use a great thought process, the “Five Rights of Safe Medication Practices”: 1) right patient, 2) right drug, 3) right dose, 4) right route, and 5) right time. Next time you are getting medication in the doctor’s office or hospital, watch what your caregiver is doing and how it fits into that process. Those are examples of risk important activities.

Assessing controls during risk assessment

Risk is affected by the overall effectiveness of any controls that are in place.

The key aspects of controls are:

  • the mechanism by which the controls are intended to modify risk
  • whether the controls are in place, are capable of operating as intended, and are achieving the expected results
  • whether there are shortcomings in the design of controls or the way they are applied
  • whether there are gaps in controls
  • whether controls function independently, or if they need to function collectively to be effective
  • whether there are factors, conditions, vulnerabilities or circumstances that can reduce or eliminate control effectiveness including common cause failures
  • whether controls themselves introduce additional risks.

A risk can have more than one control and controls can affect more than one risk.

We always want to distinguish between controls that change likelihood, consequences, or both, and controls that change how the burden of risk is shared between stakeholders.

Any assumptions made during risk analysis about the actual effect and reliability of controls should be validated where possible, with a particular emphasis on individual or combinations of controls that are assumed to have a substantial modifying effect. This should take into account information gained through routine monitoring and review of controls.

Risk Important Activities, Critical Steps and Process

Critical steps are the way we meet our critical-to-quality requirements: they are the activities that ensure our product or service meets the needs of the organization.

These critical steps are the points of no-return, the point where the work-product is transformed into something else. Risk important activities are what we do to remove the danger of executing that critical step.

Beyond that critical step, you have rejection or rework. When I am cooking, there is a lot of prep work, which includes a mixture of critical steps from which there is no return. If I break the egg wrong and get eggshells in my batter, a degree of rework is necessary. This is true for all our processes.

The risk-based approach to the process is to understand the critical steps and the controls that mitigate their risks.

We are thinking through the following:

  • Critical Step: The action that triggers irreversibility. Think in terms of critical-to-quality attributes.
  • Input: What came before in the process
  • Output: The desired result (positive) or the possible difficulty (negative)
  • Preconditions: Technical conditions that must exist before the critical step
  • Resources: What is needed for the critical step to be completed
  • Local factors: Things that could influence the critical step. When human beings are involved, this is usually what can influence the performer’s thinking and actions before and during the critical step
  • Defenses: Controls, barriers and safeguards
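The elements above can be captured as a simple record so that each critical step is thought through the same way. This is a minimal sketch: the `CriticalStep` class, its field names, and the readiness rule (all preconditions confirmed and at least one defense in place) are assumptions for illustration, and the cooking example borrows the egg scenario described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class CriticalStep:
    action: str                      # the action that triggers irreversibility
    inputs: list[str]                # what came before in the process
    desired_output: str              # the positive result we want
    preconditions: dict[str, bool]   # condition -> has it been confirmed?
    resources: list[str]             # what is needed to complete the step
    local_factors: list[str]         # influences on the performer
    defenses: list[str] = field(default_factory=list)

    def unmet_preconditions(self):
        """List the preconditions that have not been confirmed."""
        return [name for name, ok in self.preconditions.items() if not ok]

    def ready(self):
        """Assumed rule: every precondition confirmed and a defense exists."""
        return not self.unmet_preconditions() and bool(self.defenses)


# Hypothetical example based on the cooking scenario.
step = CriticalStep(
    action="Crack egg into batter",
    inputs=["Mixed dry ingredients"],
    desired_output="Batter with no eggshell",
    preconditions={
        "Egg inspected for cracks": True,
        "Separate bowl available": False,
    },
    resources=["Eggs", "Mixing bowl"],
    local_factors=["Rushing to finish prep"],
    defenses=["Crack egg into a separate bowl first"],
)

can_proceed = step.ready()
missing = step.unmet_preconditions()
```

In this example `can_proceed` is false because one precondition is unconfirmed, which is the kind of check a risk important activity performs before the point of no return.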

Risk Management Mindset

Good risk management requires a mindset that includes the following attributes:

  • Expect to be surprised: Our processes are usually underspecified and there is a lot of hidden knowledge. Risk management serves to interrogate the unknowns
  • Possess a chronic sense of unease: There is no such thing as perfect processes, procedures, training, design, planning. Past performance is not a guarantee of future success.
  • Bend, not break: Everything is dynamic, especially risk. Quality comes from adaptability.
  • Learn: Learn from what goes well and from mistakes; build a learning culture
  • Embrace humility: No one knows everything, bring those in who know what you do not.
  • Acknowledge differences between work-as-imagined and work-as-done: Work to reduce the differences.
  • Value collaboration: Diversity of input
  • Drive out subjectivity: Understand how opinions are formed and decisions are made.
  • Systems Thinking: Performance emerges from complex, interconnected and interdependent systems and their components

The Role of Monitoring

One cannot control risk, or even successfully identify it, unless a system is able to flexibly monitor both its own performance (what happens inside the system’s boundary) and what happens in the environment (outside the system’s boundary). Monitoring improves the ability to cope with possible risks.

When performing the risk assessment, challenge existing monitoring and ensure that the right indicators are in place. But remember, monitoring itself is a low-effectiveness control.

Ensure that there are leading indicators, which can be used as valid precursors for changes and events that are about to happen.

For each monitoring control, ask yourself the following:

  • Indicator: How have the indicators been defined? (By analysis, by tradition, by industry consensus, by the regulator, by international standards, etc.)
  • Relevance: When was the list created? How often is it revised? On which basis is it revised? Who is responsible for maintaining the list?
  • Type: How many of the indicators are of the ‘leading’ type and how many are of the ‘lagging’ type? Do indicators refer to single or aggregated measurements?
  • Validity: How is the validity of an indicator established (regardless of whether it is leading or lagging)? Do indicators refer to an articulated process model, or just to ‘common sense’?
  • Delay: For lagging indicators, how long is the typical lag? Is it acceptable?
  • Measurement type: What is the nature of the measurements? Qualitative or quantitative? (If quantitative, what kind of scaling is used?)
  • Measurement frequency: How often are the measurements made? (Continuously, regularly, every now and then?)
  • Analysis: What is the delay between measurement and analysis/interpretation? How many of the measurements are directly meaningful and how many require analysis of some kind? How are the results communicated and used?
  • Stability: Are the measured effects transient or permanent?
  • Organization support: Is there a regular inspection scheme or schedule? Is it properly resourced? Where does this measurement fit into the management review?

Key risk indicators come into play here.

Hierarchy of Controls

Not every control is equally effective. This principle applies both to evaluating current controls and to planning future ones.

Build Key Risk Indicators

We perform risk assessments, execute risk mitigations, and end up with four types of inherent risks (the opportunity equivalents are in parentheses) in our risk register:

  1. Mitigated (or enhanced)
  2. Avoided (or exploited)
  3. Transferred (or shared)
  4. Accepted

We’ve built a set of risk response plans to ensure we are continuing to treat these risks. And now we need to monitor the effectiveness of our risk plan and to ensure that the risks are behaving in the manner anticipated during risk treatment.

The living risk assessment is designed to conduct reassessment of risks after treatment and continuously throughout the life cycle. However, not all systems and risks need to be reassessed continually, and the organization should prioritize which systems should be reassessed based on a schedule.

Identify indicators that inform the organization about the status of the risk without having to conduct a full risk assessment every time. The trending status of these indicators can act as a flag for investigations, which may result in complete risk assessments.

This risk indicator is then a metric that indicates the state of the level of risk. It is important to note that not all indicators show the exact level of risk exposure, instead providing a trend of drivers, causes or intermediary effects of risk.
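As a rough illustration of how a trending indicator can act as a flag, the sketch below raises a flag when an indicator has risen for several consecutive periods. The `rising_run` function, the run length of three, and the sample counts are all assumptions for the example; a real program would choose its own trend rule, such as control-chart run rules.

```python
def rising_run(values, run_length=3):
    """Return True if the last `run_length` steps were all increases."""
    if len(values) < run_length + 1:
        return False  # not enough history to judge a trend
    recent = values[-(run_length + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical monthly counts for a risk indicator.
trending_up = [4, 5, 3, 4, 6, 7]
stable = [4, 5, 3, 4, 3, 4]

needs_investigation = rising_run(trending_up)
no_flag = rising_run(stable)
```

The flag only says that an investigation may be warranted; whether a full risk assessment follows is still a judgment call.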

The most important risks can be categorized as key risks and the indicators for these key risks are known as key risk indicators (KRIs) which can be defined as: A metric that provides a leading or lagging indicator of the current state of risk exposure on key objectives. KRIs can be used to continually assess current and predict potential risk exposures.

These KRIs need to have a strong relationship with the key performance indicators of the organization.

KRIs are monitored through Quality Management Review.

A good rule of thumb: as you identify the key performance indicators to assess the performance of a specific process, product, system, or function, also identify the risks and the KRIs for that objective.

Strive to have leading indicators that measure the elements that influence the risk performance. Lagging indicators will measure the actual performance of the risk controls.

These KRIs qualitatively or quantitatively present the risk exposure by having a strong relationship with the risk, its intermediate output, or its drivers.

Let’s think in terms of a pharmaceutical supply chain. We’ve done our risk assessments and end up with a top level view like this:

For the risk column we should have some good probabilities and impacts and mitigations in place. We can then choose some KRIs to monitor, such as:

  1. Nonconformance rate
  2. Supplier score card
  3. Lab error rate
  4. Product Complaints

As we develop, our KRIs can get more specific and focused. A good KRI is:

  • Quantifiable
  • Measurable (accurately and precisely) 
  • Can be validated (have a high level of confidence) 
  • Relevant (measuring the right thing associated with decisions) 

In developing a KRI to serve as a leading indicator for potential future occurrences of a risk, it can be helpful to think through the chain of events that led to the event so that management can uncover the ultimate driver (i.e., root cause(s)) of the risk event. When KRIs for root cause events and intermediate events are monitored, we are in an enviable position to identify early mitigation strategies that can begin to reduce or eliminate the impact associated with an emerging risk event.

These KRIs will help us monitor and quantify our risk exposure. They help our organizations compare business objectives and strategy to actual performance to isolate changes, measure the effectiveness of processes or projects, and demonstrate changes in the frequency or impact of a specific risk event.

Effective KRIs can provide value to the organization in a variety of ways. Potential value may be derived from each of the following contributions:

  • Risk Appetite – KRIs require the determination of appropriate thresholds for action at different levels within the organization. By mapping KRI measures to identified risk appetite and tolerance levels, KRIs can be a useful tool for better articulating the risk appetite that best represents the organizational mindset.
  • Risk and Opportunity Identification – KRIs can be designed to alert management to trends that may adversely affect the achievement of organizational objectives or may indicate the presence of new opportunities.
  • Risk Treatment – KRIs can initiate action to mitigate developing risks by serving as triggering mechanisms. KRIs can serve as controls by defining limits to certain actions.
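The idea of KRI thresholds acting as triggers can be sketched as follows. This is a minimal illustration: the `KRI` class, the two-level thresholds, the status labels, and the numeric values are assumptions, and the sketch assumes that higher values mean more risk.

```python
from dataclasses import dataclass


@dataclass
class KRI:
    name: str
    warning: float  # assumed tolerance threshold: start investigating
    action: float   # assumed appetite limit: escalate for treatment

    def status(self, value: float) -> str:
        """Map a measured value to a response level (higher is worse)."""
        if value >= self.action:
            return "escalate"
        if value >= self.warning:
            return "investigate"
        return "within appetite"


# Hypothetical thresholds for one of the KRIs named earlier.
lab_error_rate = KRI("Lab error rate (%)", warning=2.0, action=5.0)

readings = [1.2, 3.0, 5.0]
responses = [lab_error_rate.status(r) for r in readings]
```

Setting the `warning` and `action` values forces the organization to state its tolerance and appetite explicitly, which is the risk appetite contribution described above.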