The Risks of Nonspecificity in Work-As-Prescribed

There are a lot of ways to discuss uncertainty and to narrow in on vagueness and nonspecificity, following Smithson's model of ignorance.

Figure: Different Kinds of Unknowns. Source: Smithson (1989, p. 9); also in Bammer et al. (2008, p. 294).

An alternative way to look at uncertainty is offered by Klir, whose framework adds discord to the mix.
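Klir's framework is not purely qualitative; nonspecificity, for instance, has a standard measure. For a finite set of alternatives A, the Hartley measure N(A) = log2|A| counts, in bits, how many readings remain open. Here is a minimal sketch, with invented procedure steps, of what scoring the nonspecificity of prescribed instructions could look like:

```python
import math

def hartley_nonspecificity(alternatives: set) -> float:
    """Hartley measure N(A) = log2|A|: the nonspecificity of a finite set
    of alternatives. Zero when exactly one reading remains."""
    if not alternatives:
        raise ValueError("need at least one alternative")
    return math.log2(len(alternatives))

# Hypothetical example: how many ways could an operator reasonably read
# each instruction in a prescribed procedure?
procedure_steps = {
    "Clean equipment until visually clean": {
        "wipe once", "wipe until no visible residue", "full teardown and rinse"},
    "Agitate for 30 minutes at 200 rpm": {
        "agitate 30 min at 200 rpm"},
}

for step, readings in procedure_steps.items():
    print(f"{hartley_nonspecificity(readings):.2f} bits  <- {step}")
```

The vaguer the instruction, the more bits of nonspecificity it carries, and the more room actual practice has to drift from what was prescribed.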

Work-As-Prescribed can be a real avenue for all three of these uncertainties. But by using risk management to examine the possibilities of these uncertainties, we can truly interrogate our work-as-prescribed. This is one of the things we mean when we say risk management and knowledge management are joined at the hip as enablers.

To do this we need to make sure that:

  • There is management of information quality. Information quality is crucial in risk management because uncertainty, a state in which we lack information, is prevalent. Uncertainty analysis should therefore play an integral part in risk management, to ensure that the uncertainty in the risk management process is kept at a feasible level.
  • There is explicit management of knowledge: both the existing knowledge that can be applied to improve the quality of the analyses, and the knowledge acquired in the process that can be used in follow-up. Knowledge management is pivotal to an effective risk management process because it provides context and learning possibilities. In essence, risk management is not just about managing risks: the entire context surrounding the risks must be understood and managed effectively. A minimal sketch of what this could look like in a risk record follows this list.
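Here is that sketch: a risk record that carries its own uncertainty and knowledge context. The fields and scales are my own illustration, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class RiskRecord:
    """A risk entry that carries its own uncertainty and knowledge context,
    so reviewers can judge the quality of the assessment, not just its output."""
    hazard: str
    severity: int                  # illustrative scale: 1 (negligible) to 5 (critical)
    probability: int               # illustrative scale: 1 (rare) to 5 (frequent)
    uncertainty: str               # "low" / "medium" / "high": strength of evidence
    knowledge_sources: list = field(default_factory=list)  # data behind the scores
    knowledge_gaps: list = field(default_factory=list)     # what we still don't know

record = RiskRecord(
    hazard="Cross-contamination on shared reactor train",
    severity=5,
    probability=2,
    uncertainty="high",  # scores rest on assumption, not data: flag for follow-up
    knowledge_sources=["2023 cleaning validation report"],
    knowledge_gaps=["no analytical data yet for the newly detected impurity"],
)
```

The point is not the schema but the habit: every risk score travels with the evidence behind it and the gaps that remain, so the follow-up process knows what knowledge to build.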

Resilience

In the current world scenario, marked by high volatility, uncertainty, complexity, and ambiguity (VUCA), threats are increasingly unforeseen. As organizations, we strive for this concept of resilience.

Resilience is one of those hot words, and like many hot business terms it can mean a few different things depending on who is using it, and that can lead to confusion. I tend to see the following uses, which are similar in theme.

| Where used | Meaning |
| --- | --- |
| Physics | The property of a material to absorb energy when deformed without fracturing or breaking; in other words, the material's elasticity. |
| Ecology | The capacity of an ecosystem to absorb and respond to disturbances without permanent damage to the relationships between species. |
| Psychology | An individual's coping mechanisms and strategies. |
| Organizational and management studies | The ability to maintain an acceptable level of service in the face of periodic or catastrophic systemic and singular faults and disruptions (e.g., natural disasters, cyber or terrorist attacks, supply chain disturbances). |

For our purposes, resilience can be viewed as the ability of an organization to maintain quality over time, in the face of faults and disruptions. Given we live in a time of disruption, resilience is obviously of great interest to us.

In my post “Principles behind a good system” I lay out eight principles for good system development. Resilience is not a principle; it is an outcome. It is through applying our principles that we gain resilience. However, like any outcome, we need to design for it deliberately.

We gain resilience in the organization through levers that can be lumped together as operational and organizational.

The attributes that give resilience are the same ones that we build as part of our quality culture.

On the operational side, we have a set of activities that we engage in: processes to drive risk management, business continuity, and issue management.

Like many activities, the key is to think of these as holistic endeavors, proactively building resilience into the organization.

Risk Assessments Do Not Replace Technical Knowledge

The US Food and Drug Administration (FDA) last month warned Indian generic drugmaker Lupin Limited over three good manufacturing practice (GMP) violations at its facility in Maharashtra, India. The violations identified issues with the company’s written procedures for equipment cleaning, its written procedures for monitoring and controlling the performance of processing steps, and the “failure to investigate all critical deviations.”

The FDA said the company “performed multiple risk assessments with the purpose to verify whether existing cleaning procedures and practices eliminate or reduce genotoxic impurities … generated through the manufacture of [redacted] drugs after you detected [redacted] impurities in your [active pharmaceutical ingredient] API.” The company also performed risk assessments to determine whether its cleaning procedures reduced the risk of cross-contamination of intermediates and API. However, the FDA said the risk assessments “lacked data to support that existing equipment cleaning procedures are effective in removing [redacted] along with residual API from each respective piece of equipment to acceptable levels.”

“The identification of genotoxic impurities in quantities near their established limits suggests excursions are possible. All intermediates and API manufactured on non-dedicated equipment used to manufacture [redacted] drugs should be subject to validated sampling and analytical testing to ensure they are not contaminated with unacceptable levels of genotoxic impurities,” the FDA said.

At heart, this warning letter shows a major weakness in many companies’ risk management approach: they use the risk assessment to replace technical inquiry, instead of as a tool to determine the appropriateness of technical understanding and as a way to manage the uncertainty around technical knowledge.

A significant point in the current Q9 draft is dealing with this issue, which we see happen again and again. Risk management cannot tell you whether your cleaning procedures are effective or not; only a validated testing scheme can. Risk management looks at the aggregate and evaluates possibilities.
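To see the contrast concretely, consider what the data side looks like. Cleaning limits are typically calculated, for example with the classic dose-based maximum allowable carryover (MACO) formula from industry cleaning validation guidance, and conformance is then demonstrated by validated sampling and testing. A worked sketch with invented numbers (genotoxic impurities would in practice get far tighter toxicology-based limits):

```python
def maco_dose_based(std_dose_prev_mg: float,
                    min_batch_next_kg: float,
                    max_daily_dose_next_mg: float,
                    safety_factor: float = 1000.0) -> float:
    """Dose-based Maximum Allowable Carryover:
    MACO = (STD_prev * MBS_next) / (SF * TDD_next), returned in mg.
    This is the limit; whether cleaning actually meets it is a question
    for validated sampling and analytical testing, not a risk assessment."""
    min_batch_next_mg = min_batch_next_kg * 1_000_000  # kg -> mg
    return (std_dose_prev_mg * min_batch_next_mg) / (
        safety_factor * max_daily_dose_next_mg)

# Invented products: 10 mg dose of product A; next product has a 50 kg
# minimum batch and a 500 mg maximum daily dose; 1000x safety factor.
print(f"MACO = {maco_dose_based(10, 50, 500):.0f} mg")  # -> MACO = 1000 mg
```

The risk assessment’s job is to ask whether this limit, the sampling points, and the analytical method are appropriate, and how uncertain we are about each; it cannot substitute for the measurement itself.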

Computer Software Assurance Draft

The FDA published on 13-Sep-2022 the long-awaited draft of the guidance “Computer Software Assurance for Production and Quality System Software,” and you may, based on all the emails and postings, be wondering just how radical a change this is.

It’s not. This guidance is just one big “calm down people” letter from the agency. They publish these sorts of guidance every now and then because we as an industry can sometimes learn the wrong lessons.

This guidance states:

  1. Determine intended use
  2. Perform a risk assessment
  3. Perform activities to the required level

I wrote about this approach in “Risk Based Data Integrity Assessment,” and it has existed in GAMP5 and other approaches for years.

So read the guidance, but don’t panic. You are either following it already or you just need to spend some time getting better at risk assessments and creating some matrix approaches.
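As a sketch of what such a matrix approach could look like, here is a toy mapping from intended use and risk to assurance activities. The risk questions and activity lists are illustrative, not taken from the guidance:

```python
# A minimal risk-to-assurance matrix: intended use drives a risk class,
# and the risk class drives how much assurance work you do.
ASSURANCE_MATRIX = {
    "high":   ["scripted testing with documented evidence",
               "traceable requirements", "formal review and approval"],
    "medium": ["unscripted/exploratory testing with a summary record",
               "review of vendor documentation"],
    "low":    ["vendor assurance", "record of intended use and rationale"],
}

def plan_assurance(intended_use: str, impacts_product_quality: bool,
                   has_downstream_check: bool) -> list:
    """Steps 1-3 in miniature: intended use -> risk class -> activities."""
    if impacts_product_quality and not has_downstream_check:
        risk = "high"
    elif impacts_product_quality:
        risk = "medium"  # a later control would catch a failure
    else:
        risk = "low"
    print(f"{intended_use}: {risk} risk")
    return ASSURANCE_MATRIX[risk]

for activity in plan_assurance("label printing system",
                               impacts_product_quality=True,
                               has_downstream_check=False):
    print(" -", activity)
```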

Thinking of Swiss Cheese: Reason’s Theory of Active and Latent Failures

The Theory of Active and Latent Failures was proposed by James Reason in his book Human Error. Reason stated that accidents within most complex systems, such as health care, are caused by a breakdown or absence of safety barriers across four levels within a system. These levels can best be described as Unsafe Acts, Preconditions for Unsafe Acts, Supervisory Factors, and Organizational Influences. Reason used the term “active failures” to describe factors at the Unsafe Acts level, whereas “latent failures” describes unsafe conditions higher up in the system.

This is represented as the Swiss Cheese model, which has become very popular in root cause analysis and risk management circles and is widely applied beyond the safety world.

Swiss Cheese Model

In the Swiss Cheese model, the holes in the cheese depict the failure or absence of barriers within a system. Such occurrences represent failures that threaten the overall integrity of the system. If such failures never occurred within a system (i.e., if the system were perfect), then there would not be any holes in the cheese. We would have a nice Engelberg cheddar.

Not every hole that exists in a system will lead to an error. Sometimes holes may be inconsequential. Other times, holes in the cheese may be detected and corrected before something bad happens. This process of detecting and correcting errors occurs all the time.

The holes in the cheese are dynamic, not static. They open and close over time due to many factors, allowing the system to function appropriately without catastrophe. This is what human factors engineers call “resilience.” A resilient system is one that can adapt and adjust to changes or disturbances.

Holes in the cheese open and close at different rates. The rate at which holes pop up or disappear is determined by the type of failure the hole represents.

  1. Holes that occur at the Unsafe Acts level, and even some at the Preconditions level, represent active failures. Active failures usually occur during the activity of work and are directly linked to the bad outcome. They open and close over time as people make errors, catch their errors, and correct them.
  2. Latent failures occur higher up in the system, above the Unsafe Acts level — the Organizational, Supervisory, and Preconditions levels. These failures are referred to as “latent” because when they occur or open, they often go undetected. They can lie “dormant” or “latent” in the system for an extended period of time before they are recognized. Unlike active failures, latent failures do not close or disappear quickly.

Most events (harms) are associated with multiple active and latent failures. Unlike the typical Swiss Cheese diagram above, which shows an arrow flying through one hole at each level of the system, there can be a variety of failures at each level that interact to produce an event. In other words, there can be several failures at the Organizational, Supervisory, Preconditions, and Unsafe Acts levels that all lead to harm. Holes associated with events are most numerous at the Unsafe Acts and Preconditions levels, and (usually) become fewer as one progresses upward through the Supervisory and Organizational levels.

Given the frequency and dynamic nature of activities, there are more opportunities for holes to open up at the Unsafe Acts and Preconditions levels, and more holes are often identified at these levels during root cause investigations and risk assessments.
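To see why these dynamics matter, here is a toy Monte Carlo sketch of the model. All the probabilities are invented; the point is the behavior, not the numbers: active-failure holes open often but get caught quickly, latent holes open rarely but linger, and an event requires holes to line up across every layer at once.

```python
import random

random.seed(42)  # reproducible run

# Each barrier layer: probability a hole opens in a given step, and
# probability an open hole is noticed and closed. Invented numbers.
LAYERS = {
    "Organizational": {"p_open": 0.01, "p_close": 0.02},  # latent: rare, lingers
    "Supervisory":    {"p_open": 0.02, "p_close": 0.05},
    "Preconditions":  {"p_open": 0.05, "p_close": 0.20},
    "Unsafe Acts":    {"p_open": 0.20, "p_close": 0.80},  # active: frequent, caught fast
}

def simulate(steps: int = 100_000) -> int:
    """Count steps where a hole is open in every layer at once."""
    open_hole = {layer: False for layer in LAYERS}
    events = 0
    for _ in range(steps):
        for layer, p in LAYERS.items():
            if open_hole[layer]:
                open_hole[layer] = random.random() >= p["p_close"]  # may close
            else:
                open_hole[layer] = random.random() < p["p_open"]    # may open
        if all(open_hole.values()):
            events += 1  # all defenses aligned: an accident trajectory exists
    return events

print(f"aligned-hole steps: {simulate()} of 100,000")
```

In a run like this, the catastrophic alignments cluster in the stretches where a long-lived latent hole is already open and a fast-moving active failure passes through it.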

The way the holes in the cheese interact across levels is important (a small sketch in code follows the list):

  • One-to-many mapping of causal factors: a hole at a higher level (e.g., Preconditions) may result in several holes at a lower level (e.g., Unsafe Acts).
  • Many-to-one mapping of causal factors: multiple holes at a higher level (e.g., Preconditions) might interact to produce a single hole at a lower level (e.g., Unsafe Acts).
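Here is that sketch: the two mappings as data, with hypothetical investigation findings purely for illustration.

```python
# Causal-factor mappings across levels as adjacency lists.
# The findings below are hypothetical, purely for illustration.

# One-to-many: a single Preconditions hole drives several Unsafe Acts.
one_to_many = {
    "chronic understaffing on night shift": [
        "skipped line clearance",
        "rushed batch record entries",
        "unverified calculation",
    ],
}

# Many-to-one: several Preconditions holes combine into one Unsafe Act.
many_to_one = {
    ("ambiguous SOP step", "alarm fatigue", "operator new to the line"):
        "wrong valve opened",
}

for cause, effects in one_to_many.items():
    for effect in effects:
        print(f"{cause} -> {effect}")
for causes, effect in many_to_one.items():
    print(f"{' + '.join(causes)} -> {effect}")
```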

By understanding the Swiss Cheese model, and Reason’s wider work on active and latent failures, we can strengthen our approach to problem-solving.

Plus cheese is cool.
