Computer Software Assurance Draft

On 13-Sep-2022 the FDA published the long-awaited draft of the guidance “Computer Software Assurance for Production and Quality System Software,” and, based on all the emails and postings, you may be wondering just how radical a change this is.

It’s not. This guidance is just one big “calm down people” letter from the agency. They publish these sorts of guidance every now and then because we as an industry can sometimes learn the wrong lessons.

This guidance states:

  1. Determine intended use
  2. Perform a risk assessment
  3. Perform activities to the required level

I wrote about this approach in “Risk Based Data Integrity Assessment,” and it has existed in GAMP5 and other approaches for years.

So read the guidance, but don’t panic. You are either following it already or you just need to spend some time getting better at risk assessments and creating some matrix approaches.
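One way to picture a matrix approach is a small lookup from intended use and risk to a level of assurance effort. The sketch below is hypothetical: the categories, risk labels, and activity descriptions are illustrative placeholders, not wording taken from the guidance.

```python
# Hypothetical matrix: (intended use, risk level) -> assurance activity.
# All labels are illustrative assumptions, not wording from the FDA guidance.
ASSURANCE_MATRIX = {
    ("direct", "high"): "scripted testing with documented evidence",
    ("direct", "low"): "unscripted testing with a summary record",
    ("supporting", "high"): "unscripted testing with a summary record",
    ("supporting", "low"): "vendor assurance and a brief record",
}


def required_activity(intended_use: str, risk: str) -> str:
    """Look up the assurance activity for an intended use and risk level."""
    try:
        return ASSURANCE_MATRIX[(intended_use, risk)]
    except KeyError:
        raise ValueError(f"No matrix entry for {intended_use!r}, {risk!r}") from None


print(required_activity("direct", "high"))
```

Keeping the matrix as data rather than nested if-statements makes it easy to review and revise as your risk assessments mature.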

Thinking of Swiss Cheese: Reason’s Theory of Active and Latent Failures

The Theory of Active and Latent Failures was proposed by James Reason in his book, Human Error. Reason stated that accidents within most complex systems, such as health care, are caused by a breakdown or absence of safety barriers across four levels within a system. These levels can best be described as Unsafe Acts, Preconditions for Unsafe Acts, Supervisory Factors, and Organizational Influences. Reason used the term “active failures” to describe factors at the Unsafe Acts level, whereas “latent failures” was used to describe unsafe conditions higher up in the system.

This is represented as the Swiss Cheese model, which has become very popular in root cause analysis and risk management circles and is widely applied beyond the safety world.

Swiss Cheese Model

In the Swiss Cheese model, the holes in the cheese depict the failure or absence of barriers within a system. Such occurrences represent failures that threaten the overall integrity of the system. If such failures never occurred within a system (i.e., if the system were perfect), then there would not be any holes in the cheese. We would have a nice Engelberg cheddar.

Not every hole that exists in a system will lead to an error. Sometimes holes may be inconsequential. Other times, holes in the cheese may be detected and corrected before something bad happens. This process of detecting and correcting errors occurs all the time.

The holes in the cheese are dynamic, not static. They open and close over time due to many factors, allowing the system to function appropriately without catastrophe. This is what human factors engineers call “resilience.” A resilient system is one that can adapt and adjust to changes or disturbances.

Holes in the cheese open and close at different rates. The rate at which holes pop up or disappear is determined by the type of failure the hole represents.

  1. Holes that occur at the Unsafe Acts level, and even some at the Preconditions level, represent active failures. Active failures usually occur during the activity of work and are directly linked to the bad outcome. They change while the work is being performed, opening and closing over time as people make errors, catch their errors, and correct them.
  2. Latent failures occur higher up in the system, above the Unsafe Acts level — the Organizational, Supervisory, and Preconditions levels. These failures are referred to as “latent” because when they occur or open, they often go undetected. They can lie “dormant” or “latent” in the system for an extended period of time before they are recognized. Unlike active failures, latent failures do not close or disappear quickly.

Most events (harms) are associated with multiple active and latent failures. Unlike the typical Swiss Cheese diagram above, which shows an arrow flying through one hole at each level of the system, there can be a variety of failures at each level that interact to produce an event. In other words, there can be several failures at the Organizational, Supervisory, Preconditions, and Unsafe Acts levels that all lead to harm. The holes in the cheese associated with events are most numerous at the Unsafe Acts and Preconditions levels and (usually) become fewer as one progresses upward through the Supervisory and Organizational levels.

Given the frequency and dynamic nature of activities, there are more opportunities for holes to open up at the Unsafe Acts and Preconditions levels, and more holes are often identified at these levels during root cause investigations and risk assessments.

The way the holes in the cheese interact across levels is important:

  • One-to-many mapping of causal factors is when a hole at a higher level (e.g., Preconditions) results in several holes at a lower level (e.g., Unsafe Acts).
  • Many-to-one mapping of causal factors is when multiple holes at a higher level (e.g., Preconditions) interact to produce a single hole at a lower level (e.g., Unsafe Acts).
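These two mappings can be sketched as a small data structure. The example below is a minimal illustration in Python, and the causal factors named in it are invented for the purpose; in an investigation they would come from your own findings.

```python
from collections import defaultdict

# Hypothetical causal links: (higher-level hole, lower-level hole).
# The factor names are invented for illustration only.
CAUSAL_LINKS = [
    ("Preconditions: fatigue", "Unsafe Acts: skipped verification step"),
    ("Preconditions: fatigue", "Unsafe Acts: misread label"),
    ("Preconditions: poor lighting", "Unsafe Acts: misread label"),
]


def build_maps(links):
    """Index the links in both directions."""
    effects = defaultdict(set)  # higher-level hole -> lower-level holes
    causes = defaultdict(set)   # lower-level hole -> higher-level holes
    for upper, lower in links:
        effects[upper].add(lower)
        causes[lower].add(upper)
    return effects, causes


effects, causes = build_maps(CAUSAL_LINKS)

# One-to-many: a single higher-level hole linked to several lower-level holes.
one_to_many = {
    upper: sorted(lowers) for upper, lowers in effects.items() if len(lowers) > 1
}
# Many-to-one: several higher-level holes linked to a single lower-level hole.
many_to_one = {
    lower: sorted(uppers) for lower, uppers in causes.items() if len(uppers) > 1
}

print(one_to_many)
print(many_to_one)
```

Recording links in both directions makes it easy to spot a single precondition that feeds many unsafe acts, which is often where a corrective action pays off most.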

By understanding the Swiss Cheese model, and Reason’s wider work on active and latent failures, we can strengthen our approach to problem-solving.

Plus cheese is cool.

Swiss Cheese on a cheese board with knife

The Risk Question

The risk question establishes the purpose and scope (the context) of the risk assessment. This step is critical since it sets the risk assessment’s direction, tone, and expectations. From this risk question stem the risk team; the degree, extent, or rigor of the assessment; the risk assessment methodologies; the risk criteria; and the levels of acceptable risk.

The risk problem needs to be clear, concise, and well understood by all stakeholders. Every successful risk assessment needs a tightly defined beginning and end, so the assessment team can set good boundaries for the assessment with internal (resources, knowledge, culture, values, etc.) and external (technology, legal, regulatory, economy, perceptions of external stakeholders, etc.) parameters in mind.

To ensure the risk team focuses on the correct elements, the risk question should clearly explain what is expected. For example:

  • For a risk assessment of potential emergencies/disasters, should the assessment be limited to emergencies/disasters at facility sites or include events off-site? Should it include natural, manmade, or technological emergencies/disasters, or all of them?
  • If the hazards associated with the job of repairing a porch are to be assessed, would the assessment cover just the actual porch repair, or would it include hazards like setting up the space, bringing materials on site, and the hazards associated with use/non-use of the porch?
  • If the risk assessment covers getting a new family dog, does it include just the risks associated with the dog, or does it include changes to the schedule or even next year’s vacation?

Setting the scope of the risk question too narrow might prevent a hazard and the resulting risk from being identified and assessed, while making it too broad could prevent the risk assessment from getting to its real purpose.

Risk questions can be broken down in a tree structure into more narrowly defined scopes, which can help drive effective teams.

For example, if we are doing a risk assessment on changing the family’s diet, it might look like this:
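A minimal sketch of such a tree in Python follows. The sub-questions are hypothetical, chosen only to illustrate how a broad question breaks into narrower scopes that can each be handed to a focused team.

```python
from dataclasses import dataclass, field


@dataclass
class RiskQuestion:
    """A node in a risk question tree."""

    text: str
    children: list["RiskQuestion"] = field(default_factory=list)

    def leaves(self):
        """Return the most narrowly scoped questions under this node."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found

    def render(self, depth=0):
        """Return the tree as indented lines of text."""
        lines = ["  " * depth + self.text]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Hypothetical breakdown, for illustration only.
root = RiskQuestion(
    "What are the risks of changing the family's diet?",
    [
        RiskQuestion(
            "What are the health risks?",
            [
                RiskQuestion("What are the risks from food allergies?"),
                RiskQuestion("What are the risks to nutrition?"),
            ],
        ),
        RiskQuestion("What are the risks to the household budget?"),
        RiskQuestion("What are the risks to the family schedule?"),
    ],
)

print("\n".join(root.render()))
```

Each leaf is a candidate scope for its own assessment, and the parent questions show how the pieces roll back up to the original risk question.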

The current draft of ICH Q9 places a lot of importance on the risk question, rightfully so. As a tool it helps focus and define the risk assessment, producing better results.

Preliminary Hazard Analysis

The Preliminary Hazard Analysis (PHA) is a risk tool used during initial design and development, thus the name “preliminary.” It identifies systematic hazards that affect the intended function of the design, providing an opportunity to modify requirements in ways that help avoid issues in the design.

Like a fair number of the tools used in risk, the PHA was created by the US Army. ANSI/ASSP Z590.3, “Prevention through Design, Guidelines for Addressing Occupational Hazards and Risks in Design and Redesign Processes,” makes this one of the eight risk assessment tools everyone should know.

Taking the time to perform a PHA early on in the design will speed up the design process and avoid costly mistakes. Any identified hazards that cannot be avoided or eliminated are then controlled so that the risk is reduced to an acceptable level.

PHAs can also be used to examine existing systems, prioritize risk levels, and select those systems requiring further study. The use of a single PHA may also be appropriate for simple, less complex systems.

Main steps of PHA

A. Identify Hazards

Like a Structured What-If, the Preliminary Hazard Analysis benefits from an established list of general categories:

  • by the source of risk: raw materials, environmental, equipment, usability and human factors, safety hazards, etc.
  • by consequence, aspects or dimensions of objectives or performance

Based on the established list, a preliminary hazard list is identified which lists the potential, significant hazards associated with a design. The purpose of the preliminary hazard list is to initially identify the most evident or worst-credible hazards that could occur in the system being designed. Such hazards may be inherent to the design or created by the interaction with other systems/environment/etc.

A team should be involved in collecting and reviewing the hazards.

B. Sequence of Events

Once the hazards are identified, the sequence of events that leads from each hazard to various hazardous situations is identified.

C. Hazardous Situation

For each sequence of events, we identify one or more hazardous situations.

D. Impact

For each hazardous situation, we identify one or more outcomes (or harms).

E. Severity and occurrence of the impact

Based on the identified outcomes/harms the severity is determined. An occurrence or probability is determined for each sequence of events that leads from the hazard to the hazardous situation to the outcome.

Based on severity and likelihood of occurrence a risk level is determined.

From hazard to a variety of harms
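The chain from hazard to sequence of events to hazardous situation to harm can be sketched as nested records that flatten into worksheet rows. This is a minimal illustration in Python; the class names and the example hazard are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HazardousSituation:
    description: str
    harms: list[str]


@dataclass
class SequenceOfEvents:
    description: str
    situations: list[HazardousSituation] = field(default_factory=list)


@dataclass
class Hazard:
    name: str
    sequences: list[SequenceOfEvents] = field(default_factory=list)


def worksheet_rows(hazard):
    """Flatten one hazard into (hazard, sequence, situation, harm) rows."""
    rows = []
    for seq in hazard.sequences:
        for situation in seq.situations:
            for harm in situation.harms:
                rows.append(
                    (hazard.name, seq.description, situation.description, harm)
                )
    return rows


# Hypothetical example, for illustration only.
hazard = Hazard(
    name="Hot surface on equipment",
    sequences=[
        SequenceOfEvents(
            "Guard removed for maintenance and not replaced",
            [
                HazardousSituation(
                    "Operator works next to exposed surface",
                    ["Minor burn", "Severe burn"],
                )
            ],
        ),
        SequenceOfEvents(
            "Insulation degrades over time",
            [HazardousSituation("Nearby material is heated", ["Damaged material"])],
        ),
    ],
)

rows = worksheet_rows(hazard)
for row in rows:
    print(" -> ".join(row))
```

One hazard here yields three rows, which is the point: each row is a distinct path to a harm and gets its own severity and occurrence rating.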

I tend to favor a 5×5 matrix for a PHA, though some use 3×3, and I’ve even seen 4×5.

Intended outcomes

Severity Rating (rows) by Likelihood of Occurrence (columns):

| Severity Rating | Impact to failure scale | 1 Very unlikely | 2 Unlikely | 3 Possible | 4 Likely | 5 Very Likely |
|---|---|---|---|---|---|---|
| 5 | Complete failure | 5 | 10 | 15 | 20 | 25 |
| 4 | Maximum tolerable failure | 4 | 8 | 12 | 16 | 20 |
| 3 | Maximum anticipated failure | 3 | 6 | 9 | 12 | 15 |
| 2 | Minimum anticipated failure | 2 | 4 | 6 | 8 | 10 |
| 1 | Negligible | 1 | 2 | 3 | 4 | 5 |

Very high risk: 15 or greater; High risk: 9-14; Medium risk: 5-8; Low risk: 1-4
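
The matrix and risk bands can be sketched as two small functions. This assumes the score is the severity rating multiplied by the likelihood of occurrence, which is what the cell values show; the function names are illustrative.

```python
def risk_score(severity: int, likelihood: int) -> int:
    """Multiply severity rating by likelihood of occurrence (both 1 to 5)."""
    for name, value in (("severity", severity), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return severity * likelihood


def risk_level(score: int) -> str:
    """Map a score to the risk bands listed with the matrix."""
    if score >= 15:
        return "Very high"
    if score >= 9:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"


# Rebuild the 5x5 grid, highest severity first, as in the matrix.
for severity in range(5, 0, -1):
    cells = [risk_score(severity, likelihood) for likelihood in range(1, 6)]
    print(severity, cells)

print(risk_level(risk_score(4, 3)))  # 12 falls in the 9-14 band
```

If you prefer a 3×3 or 4×5 matrix, only the range check and the band thresholds would need to change.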

 

F. Risk Control Measures

Based on the risk level, risk controls are developed and applied. These risk controls will help the design team create new requirements that will drive the design.

Ongoing risks should be evaluated for the risk register.

Risk Assessment for Environmental Monitoring

Maybe you’ve been there too: you need to take a risk-based approach to determine environmental monitoring, so you turn to a HACCP or FMEA and realize those tools just do not provide the information you need to decide how to distribute monitoring to best verify that processes are operating under control.

What you want to do is build a heat map showing the relative probability of contamination in a defined area or room, covering six areas:

  1. Amenability of equipment and surfaces to cleaning and sanitization
  2. Personnel presence and flow
  3. Material flow
  4. Proximity to open product or exposed direct product-contact material
  5. Interventions/operations by personnel and their complexity
  6. Frequency of interventions/process operations.
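
As a minimal sketch of how the six areas could feed a heat map, the example below scores each location from 1 to 5 on every factor and scales the totals against the highest one. The 1-to-5 scale, the simple sum, and the location names are all assumptions for illustration; the weighting you actually use should come from your own risk assessment.

```python
# Factor keys are hypothetical short names for the six areas listed above.
FACTORS = [
    "cleanability",             # 1. amenability to cleaning and sanitization
    "personnel_flow",           # 2. personnel presence and flow
    "material_flow",            # 3. material flow
    "proximity_to_product",     # 4. proximity to open product or contact material
    "intervention_complexity",  # 5. interventions by personnel and their complexity
    "intervention_frequency",   # 6. frequency of interventions/process operations
]


def relative_scores(locations):
    """Sum the six factor scores per location and scale to the highest total."""
    totals = {}
    for name, scores in locations.items():
        missing = [factor for factor in FACTORS if factor not in scores]
        if missing:
            raise ValueError(f"{name} is missing scores for {missing}")
        totals[name] = sum(scores[factor] for factor in FACTORS)
    highest = max(totals.values())
    return {name: total / highest for name, total in totals.items()}


# Hypothetical scores: 1 = lowest contribution to contamination, 5 = highest.
locations = {
    "zone_a": dict(zip(FACTORS, [3, 4, 3, 5, 4, 5])),  # total 24
    "zone_b": dict(zip(FACTORS, [2, 2, 2, 1, 2, 3])),  # total 12
}

heat = relative_scores(locations)
for name, value in sorted(heat.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.2f}")
```

The relative values can then be mapped to colors on a room layout to suggest where monitoring locations are best concentrated.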

This approach builds off of the design activities and is part of a set of living risk assessments that inform the environmental monitoring part of your contamination control strategy.

Hope to see you in Bethesda to discuss more!