Barriers and Root Cause Analysis: A Comprehensive Framework

Barriers, or controls, are one of the fundamental elements of root cause analysis. By understanding barriers—including their types and functions—we can understand both why a problem happened and how it can be prevented in the future. An evaluation of current process controls as part of root cause analysis can help determine whether all the current barriers pertaining to the problem you are investigating were present and effective.

Understanding Barrier Analysis

At its simplest, barrier analysis is a three-part brainstorm that examines the status and effectiveness of safety measures:

Barrier Analysis
Barriers that failed
Barriers that were not used
Barriers that did not exist

The key to this brainstorming session is to try to find all of the failed, unused, or nonexistent barriers. Do not be concerned if you are not certain which category they belong in initially.

Types of Barriers: Technical, Human, and Organizational

Most forms of barrier analysis examine two primary types: technical and administrative. Administrative barriers can be further broken down into “human” and “organizational” categories.

Choose	Technical	Human	Organizational
If	A technical or engineering control exists	The control relies on a human reviewer or operator	The control involves a transfer of responsibility. For example, a document reviewed by both manufacturing and quality.
Examples	Separation among manufacturing or packaging lines; Emergency power supply; Dedicated equipment; Barcoding; Keypad controlled doors; Separated storage for components; Software that prevents a workflow from going further if a field is not completed; Redundant designs	Training and certifications; Use of checklist; Verification of critical task by a second person	Clear procedures and policies; Adequate supervision; Adequate load of work; Periodic process audits

Preventive vs. Mitigative Barriers: A Critical Distinction

A fundamental aspect of barrier analysis involves understanding the difference between preventive and mitigative barriers. This distinction is crucial for comprehensive risk management and aligns with widely used frameworks such as bow-tie analysis.

Preventive Barriers

Preventive barriers are measures designed to prevent the top event from occurring. These barriers:

Focus on stopping incidents before they happen
Act as the first line of defense against threats
Aim to reduce the likelihood that a risk will materialize
Are proactive in nature, addressing potential causes before they can lead to unwanted events

Examples of preventive barriers include:

Regular equipment maintenance programs
Training and certification programs
Access controls and authentication systems
Equipment qualification protocols (IQ/OQ/PQ) validating proper installation and operation

Mitigative Barriers

Mitigative barriers are designed to reduce the impact and severity of consequences after the top event has occurred. These barriers:

Focus on damage control rather than prevention
Act to minimize harm when preventive measures have failed
Reduce the severity or substantially decrease the likelihood of consequences occurring
Are reactive in nature, coming into play after a risk has materialized

Examples of mitigative barriers include:

Alarm systems and response procedures
Containment measures for hazards
Emergency response teams and protocols
Backup power systems for critical operations

Timeline and Implementation Differences

The timing of barrier implementation and failure differs significantly between preventive and mitigative barriers:

Preventive barriers often fail over days, weeks, or years before the top event occurs, providing more opportunities for identification and intervention
Mitigative barriers often fail over minutes or hours after the top event occurs, requiring higher reliability and immediate effectiveness
This timing difference leads to higher reliance on mitigative barriers working correctly the first time

Enhanced Barrier Analysis Framework

Building on the traditional three-part analysis, organizations should incorporate the preventive vs. mitigative distinction into their barrier evaluation:

Enhanced Barrier Analysis
Preventive barriers that failed
Preventive barriers that were not used
Preventive barriers that did not exist
Mitigative barriers that failed
Mitigative barriers that were not used
Mitigative barriers that did not exist

Integration with Risk Assessment

These barriers are the same as the current controls in a risk assessment, and current controls are key in a wide variety of risk assessment tools. The optimal approach balances preventive and mitigative barriers without relying on just one type. Some companies may favor prevention by placing high confidence in their systems and practices, while others may emphasize mitigation through reactive policies, but neither approach alone is advisable because each results in over-reliance on one type of barrier.

Practical Application

When conducting barrier analysis as part of root cause investigation:

Identify all relevant barriers that were supposed to protect against the incident
Classify each barrier as preventive or mitigative based on its intended function
Determine the barrier type: technical, human, or organizational
Assess barrier status: failed, not used, or did not exist
Evaluate the balance between preventive and mitigative measures
Develop corrective actions that address gaps in both preventive and mitigative barriers
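The classification steps above can be sketched as a small data model. This is a minimal illustration in Python, not a prescribed tool: the class names, the `group_barriers` and `missing_functions` helpers, and the sample findings are all hypothetical and exist only to show how each barrier ends up in one of the six enhanced categories.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Function(Enum):
    PREVENTIVE = "preventive"
    MITIGATIVE = "mitigative"


class BarrierType(Enum):
    TECHNICAL = "technical"
    HUMAN = "human"
    ORGANIZATIONAL = "organizational"


class Status(Enum):
    FAILED = "failed"
    NOT_USED = "not used"
    DID_NOT_EXIST = "did not exist"


@dataclass(frozen=True)
class Barrier:
    name: str
    function: Function
    barrier_type: BarrierType
    status: Status


def group_barriers(barriers):
    """Group barrier names by (function, status): the six enhanced categories."""
    groups = defaultdict(list)
    for barrier in barriers:
        groups[(barrier.function, barrier.status)].append(barrier.name)
    return dict(groups)


def missing_functions(barriers):
    """Return the functions (preventive, mitigative) with no barrier identified."""
    seen = {barrier.function for barrier in barriers}
    return [function for function in Function if function not in seen]


# Hypothetical findings from an investigation (illustrative only).
findings = [
    Barrier("Second-person verification", Function.PREVENTIVE,
            BarrierType.HUMAN, Status.NOT_USED),
    Barrier("Barcoding", Function.PREVENTIVE,
            BarrierType.TECHNICAL, Status.DID_NOT_EXIST),
    Barrier("Alarm response procedure", Function.MITIGATIVE,
            BarrierType.ORGANIZATIONAL, Status.FAILED),
]

grouped = group_barriers(findings)
for (function, status), names in grouped.items():
    print(f"{function.value} / {status.value}: {', '.join(names)}")
```

An empty result from `missing_functions` only says that both functions were considered during the brainstorm; judging whether the balance between them is adequate remains a matter for the investigation team.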

This comprehensive approach to barrier analysis provides a more nuanced understanding of how incidents occur and how they can be prevented or their consequences minimized in the future. By understanding both the preventive and mitigative functions of barriers, organizations can develop more robust risk management strategies that address threats at multiple points in the incident timeline.

Evaluating Controls as Part of Risk Management

When I teach an introductory risk management class, I usually use an icebreaker: “What is the riskiest activity you can think of doing?” Inevitably you will get some version of skydiving, swimming with sharks, or jumping off bridges. This activity is great because it starts conversations around likelihood and severity. At heart, the question brings out the concept of risk important activities and the nature of controls.

The things people think of, such as skydiving, are great examples of activities that are surrounded by activities that control risk. The very activity is based on reducing risk as low as possible and then proceeding along the safest possible pathway. These risk important activities are the mechanisms just before a critical step that:

Ensure the appropriate transfer of information and skill
Ensure the appropriate number of actions to reduce risk
Influence the presence or effectiveness of barriers
Influence the ability to maintain positive control of the moderation of hazards

Risk important activities are a concept central to safety thinking and sit at the center of many human error reduction tools and practices. Risk important activities are all about thinking through the right set of controls, building them into the procedure, and successfully executing them before reaching the critical step of no return. Checklists are a great example of this mindset at work, but there are many other ways to apply it.

Hospitals use a great thought process, the “Five Rights of Safe Medication Practices”: 1) right patient, 2) right drug, 3) right dose, 4) right route, and 5) right time. Next time you are getting medication in a doctor’s office or hospital, evaluate just what your caregiver is doing and how it fits into that process. Those are examples of risk important activities.

Assessing controls during risk assessment

Risk is affected by the overall effectiveness of any controls that are in place.

The key aspects of controls are:

the mechanism by which the controls are intended to modify risk
whether the controls are in place, are capable of operating as intended, and are achieving the expected results
whether there are shortcomings in the design of controls or the way they are applied
whether there are gaps in controls
whether controls function independently, or if they need to function collectively to be effective
whether there are factors, conditions, vulnerabilities or circumstances that can reduce or eliminate control effectiveness including common cause failures
whether controls themselves introduce additional risks.

A risk can have more than one control and controls can affect more than one risk.
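The many-to-many relationship between risks and controls can be made explicit with a simple mapping. The sketch below is illustrative only: the control names, risk names, and helper functions are hypothetical. It shows one way to surface gaps in controls (risks with no control) and controls whose failure would affect several risks at once.

```python
from collections import defaultdict

# Hypothetical mapping of each control to the risks it modifies.
control_to_risks = {
    "Barcoding": {"Component mix-up"},
    "Second-person verification": {"Component mix-up", "Data entry error"},
    "Emergency power supply": {"Loss of storage temperature"},
}

# Hypothetical list of risks identified in the assessment.
known_risks = {
    "Component mix-up",
    "Data entry error",
    "Loss of storage temperature",
    "Label misprint",
}


def controls_by_risk(mapping):
    """Invert control -> risks into risk -> controls."""
    result = defaultdict(set)
    for control, risks in mapping.items():
        for risk in risks:
            result[risk].add(control)
    return dict(result)


def uncontrolled_risks(mapping, risks):
    """Risks that no listed control addresses (gaps in controls)."""
    covered = controls_by_risk(mapping)
    return sorted(risk for risk in risks if risk not in covered)


def shared_controls(mapping):
    """Controls that affect more than one risk."""
    return sorted(c for c, risks in mapping.items() if len(risks) > 1)


print(uncontrolled_risks(control_to_risks, known_risks))
print(shared_controls(control_to_risks))
```

A control that appears in `shared_controls` deserves extra scrutiny, since its failure would weaken the protection for every risk it is mapped to.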

We always want to distinguish between controls that change likelihood, consequences, or both, and controls that change how the burden of risk is shared between stakeholders.

Any assumptions made during risk analysis about the actual effect and reliability of controls should be validated where possible, with a particular emphasis on individual or combinations of controls that are assumed to have a substantial modifying effect. This should take into account information gained through routine monitoring and review of controls.

Risk Important Activities, Critical Steps and Process

Critical steps are the way we meet our critical-to-quality requirements: the activities that ensure our product or service meets the needs of the organization.

These critical steps are the points of no return, the points where the work product is transformed into something else. Risk important activities are what we do to remove the danger of executing that critical step.

Beyond that critical step, you have rejection or rework. When I am cooking, there is a lot of prep work, which includes a mixture of critical steps from which there is no return. If I break the egg wrong and get eggshells in my batter, there is a degree of rework necessary. This is true for all our processes.

The risk-based approach to the process is to understand the critical steps and the mitigating controls.

We are thinking through the following:

Critical Step: The action that triggers irreversibility. Think in terms of critical-to-quality attributes.
Input: What came before in the process
Output: The desired result (positive) or the possible difficulty (negative)
Preconditions: Technical conditions that must exist before the critical step
Resources: What is needed for the critical step to be completed
Local factors: Things that could influence the critical step. When human beings are involved, this is usually what can influence the performer’s thinking and actions before and during the critical step
Defenses: Controls, barriers and safeguards

Risk Management Mindset

Good risk management requires a mindset that includes the following attributes:

Expect to be surprised: Our processes are usually underspecified and there is a lot of hidden knowledge. Risk management serves to interrogate the unknowns.
Possess a chronic sense of unease: There is no such thing as perfect processes, procedures, training, design, planning. Past performance is not a guarantee of future success.
Bend, not break: Everything is dynamic, especially risk. Quality comes from adaptability.
Learn: Learn from what goes well and from mistakes; build a learning culture.
Embrace humility: No one knows everything; bring in those who know what you do not.
Acknowledge differences between work-as-imagined and work-as-done: Work to reduce the differences.
Value collaboration: Seek diversity of input.
Drive out subjectivity: Understand how opinions are formed and decisions are made.
Systems Thinking: Performance emerges from complex, interconnected and interdependent systems and their components.

The Role of Monitoring

One cannot control risk, or even successfully identify it, unless a system is able to flexibly monitor both its own performance (what happens inside the system’s boundary) and what happens in the environment (outside the system’s boundary). Monitoring improves the ability to cope with possible risks.

When performing the risk assessment, challenge existing monitoring and ensure that the right indicators are in place. But remember, monitoring itself is a low-effectiveness control.

Ensure that there are leading indicators, which can be used as valid precursors for changes and events that are about to happen.

For each monitoring control, ask yourself the following:

Indicator	How have the indicators been defined? (By analysis, by tradition, by industry consensus, by the regulator, by international standards, etc.)
Relevance	When was the list created? How often is it revised? On which basis is it revised? Who is responsible for maintaining the list?
Type	How many of the indicators are of the ‘leading’ type and how many are of the ‘lagging’ type? Do indicators refer to single or aggregated measurements?
Validity	How is the validity of an indicator established (regardless of whether it is leading or lagging)? Do indicators refer to an articulated process model, or just to ‘common sense’?
Delay	For lagging indicators, how long is the typical lag? Is it acceptable?
Measurement type	What is the nature of the measurements? Qualitative or quantitative? (If quantitative, what kind of scaling is used?)
Measurement frequency	How often are the measurements made? (Continuously, regularly, every now and then?)
Analysis	What is the delay between measurement and analysis/interpretation? How many of the measurements are directly meaningful and how many require analysis of some kind? How are the results communicated and used?
Stability	Are the measured effects transient or permanent?
Organization Support	Is there a regular inspection scheme or schedule? Is it properly resourced? Where does this measurement fit into the management review?

Key risk indicators come into play here.

Hierarchy of Controls

Not every control is the same. This principle applies both to current controls and to planning future controls.

Human Performance and Data Integrity

Gilbert’s Behavior Engineering Model (BEM) presents a concise way to consider both the environmental and the individual influences on a person’s behavior. The model suggests that a person’s environment supports impact to one’s behavior through information, instrumentation, and motivation. Examples include feedback, tools, and financial incentives (respectively), to name a few. The model also suggests that an individual’s behavior is influenced by their knowledge, capacity, and motives. Examples include training/education, physical or emotional limitations, and what drives them (respectively), to name a few. Let’s look at some further examples to better understand the variability of individual behavioral influences to see how they may negatively impact data integrity.
Kip Wolf, “People: The Most Persistent Risk To Data Integrity”

Good article in Pharmaceutical Online last week. It cannot be stated enough, and it is good that folks like Kip keep saying it: to understand data integrity we need to understand behavior (what people do and say) and realize it is a means to an end. It is very easy to focus on behaviors, which are observable acts that can be seen and heard by management, auditors, and other stakeholders, but it is more critical to design systems that drive the behaviors we want. We need to recognize that behavior and its causes are extremely valuable as signals for improvement efforts to anticipate, prevent, catch, or recover from errors.

Error-provoking aspects of design, procedures, processes, and human nature exist throughout our organizations, and people cannot perform better than the organization supporting them.

Design Consideration

Human Error Considerations

Manage Controls

Define the Scope of Work

· Identify the critical steps

· Consider the possible errors associated with each critical step and the likely consequences.

· Ponder the "worst that could happen."

· Consider the appropriate human performance tool(s) to use.

· Identify other controls, contingencies, and relevant operating experience.

When tasks are identified and prioritized, and resources are properly allocated (e.g., supervision, tools, equipment, work control, engineering support, training), human performance can flourish.

These organizational factors create a unique array of job-site conditions – a good work environment – that sets people up for success. Human error increases when expectations are not set, tasks are not clearly identified, and resources are not available to carry out the job.

The error precursors – conditions that provoke error – are reduced. This includes things such as:

· Unexpected conditions

· Workarounds

· Departures from the routine

· Unclear standards

· Need to interpret requirements

Properly managing controls is dependent on the elimination of error precursors that challenge the integrity of controls and allow human error to become consequential.

Apply proactive Risk Management

When risk is properly analyzed, we can take appropriate action to mitigate the risks. Include the following criteria in risk assessments:

· Adverse environmental conditions (e.g., impact of gowning, noise, temperature)

· Unclear roles/responsibilities

· Time pressures

· High workload

· Confusing displays or controls

Addressing risk through engineering and administrative controls is a cornerstone of a quality system.

Strong administrative and cultural controls can withstand human error. Controls are weakened when conditions are present that provoke error.

Eliminating error precursors in the workplace reduces the incidences of active errors.

Perform Work

Use error reduction tools as part of all work. Examples include:

· Self-checking

· Questioning attitude

· Stop when unsure

· Effective communication

· Procedure use and adherence

· Peer-checking

· Second-person verifications

· Turnovers

Engineering controls can often take the place of some of these; for example, second-person verifications can be replaced by automation.

Appropriate processes and tools must be in place so that organizational processes and values adequately support performance.

Because people err and make mistakes, it is all the more important that controls are implemented and properly maintained.

Feedback and Improvement

Continuous improvement is critical. Topics should include:

· Surprises or unexpected outcomes

· Usability and quality of work documents

· Knowledge and skill shortcomings

· Minor errors during the activity

· Unanticipated workplace conditions

· Adequacy of tools and resources

· Quality of work planning/scheduling

· Adequacy of supervision

Errors during work are inevitable. If we strive to understand and address even inconsequential acts we can strengthen controls and make future performance better.

Vulnerabilities with controls can be found and corrected when management decides it is important enough to devote resources to the effort.

The fundamental aim of oversight is to improve resilience to significant events triggered by active errors in the workplace—that is, to minimize the severity of events.

Oversight controls provide opportunities to see what is happening, to identify specific vulnerabilities or performance gaps, to take action to address those vulnerabilities and performance gaps, and to verify that they have been resolved.

Risk Based Data Integrity Assessment

A quick overview. The risk-based approach uses three factors: data criticality, existing controls, and level of detection.

When assessing current controls, technical controls (properly implemented) are stronger than operational or organizational controls as they can eliminate the potential for data falsification or human error rather than simply reducing/detecting it.

For criticality, it helps to build a table based on what the data is used for. For example:

For controls, use a table like the one below. Rank each column and then multiply the numbers together to get a final control ranking. For example, if a process has Esign (1), no access control (3), and paper archival (2) then the control ranking would be 6 (1 x 3 x 2).
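The multiplication in this example can be sketched in a few lines of Python. Only the three rank values (1, 3, and 2) come from the example; the column names and the `control_ranking` helper are placeholders, since the real columns and scores are defined by the ranking table.

```python
from math import prod


def control_ranking(column_ranks):
    """Multiply the rank chosen in each column to get the control ranking."""
    return prod(column_ranks.values())


# Placeholder column names; the values are taken from the example in the text.
ranks = {
    "signature": 1,       # Esign
    "access control": 3,  # no access control
    "archival": 2,        # paper archival
}

print(control_ranking(ranks))  # 6
```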

Determine detectability using the table below: rank each column and then multiply the numbers together to get a final detectability ranking.

Another way to look at these scores:

Multiply the rankings above to determine a risk ranking and move ahead with mitigations. Mitigations should drive risk as low as possible, though the following table can be used to help determine priority.

Risk Rating	Action	Mitigation
>25	High Risk-Potential Impact to Patient Safety or Product Quality	Mandatory
12-25	Moderate Risk-No Impact to Patient Safety or Product Quality but Potential Regulatory Risk	Recommended
<12	Negligible DI Risk	Not Required
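The thresholds in the table above can be expressed as a small lookup. This is a sketch under one stated assumption: that the risk rating is the product of the criticality score, the control ranking, and the detectability ranking. The function names and sample scores are hypothetical; only the thresholds and labels come from the table.

```python
def risk_rating(criticality, control_ranking, detectability_ranking):
    """Assumed formula: the product of the three factor scores."""
    return criticality * control_ranking * detectability_ranking


def mitigation_priority(rating):
    """Map a risk rating to the (risk level, mitigation) pair from the table."""
    if rating > 25:
        return ("High Risk", "Mandatory")
    if rating >= 12:
        return ("Moderate Risk", "Recommended")
    return ("Negligible DI Risk", "Not Required")


# Hypothetical scores: criticality 2, control ranking 6, detectability ranking 2.
rating = risk_rating(2, 6, 2)
print(rating, mitigation_priority(rating))
```

Note that a rating of exactly 25 falls in the moderate band, because the table reserves the high band for ratings greater than 25.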

In the case of long-term risk remediation actions, risk-reducing short-term actions shall be implemented to provide an acceptable level of governance until the long-term remediation actions are completed.

Relevant site procedures (e.g., change control, validation policy) should outline the scope of additional testing through the change management process.

Reassessment of the system may be completed following the completion of remediation activities. The reassessment may be done at any time during the remediation process to document the impact of the remediation actions.

Once final remediation is complete, a reassessment of the equipment/system should be completed to demonstrate that the risk rating has been mitigated by the remediation actions taken. Think living risk assessment.

Barriers and root cause analysis

Barriers, or controls, are one of the (not-at-all) secret sauces of root cause analysis.

By understanding barriers, we can understand both why a problem happened and how it can be prevented in the future. An evaluation of current process controls as part of root cause analysis can help determine whether all the current barriers pertaining to the problem you are investigating were present and effective (whether or not they worked).

At its simplest it is just a three-part brainstorm:

Barrier Analysis
Barriers that failed	The barrier was in place and operational at the time of the accident, but it failed to prevent the accident.
Barriers that were not used	The barrier was available, but workers chose not to use it.
Barriers that did not exist	The barrier did not exist at the time of the event. A source of potential corrective and preventive actions (depending on what they are)

Three questions of barrier analysis

The key to this brainstorming session is to try to find all of the failed, unused, or nonexistent barriers. Do not be concerned if you are not certain which category they belong in.

Most forms of barrier analysis look at two types, technical and administrative, and we can further break down administrative into “human” and “organization.”

Choose	Technical	Human	Organization
If	A technical or engineering control exists	The control relies on a human reviewer or operator	The control involves a transfer of responsibility. For example, a document reviewed by both manufacturing and quality.
Examples	Separation among manufacturing or packaging lines; Emergency power supply; Dedicated equipment; Barcoding; Keypad controlled doors; Separated storage for components; Software that prevents a workflow from going further if a field is not completed; Redundant designs	Training and certifications; Use of checklist; Verification of critical task by a second person	Clear procedures and policies; Adequate supervision; Adequate load of work; Periodic process audits

These barriers are the same as the current controls in a risk assessment, which are key in a wide variety of risk assessment tools.