The Kafkaesque Quality System: Escaping the Bureaucratic Trap

On the morning of his thirtieth birthday, Josef K. is arrested. He doesn’t know what crime he’s accused of committing. The arresting officers can’t tell him. His neighbors assure him the authorities must have good reasons, though they don’t know what those reasons are. When he seeks answers, he’s directed to a court that meets in tenement attics, staffed by officials whose actions are never explained but always assumed to be justified. The bureaucracy processing his case is described as “flawless,” yet K. later witnesses a servant destroying paperwork because he can’t determine who the recipient should be.

Franz Kafka wrote The Trial in 1914, but he could have been describing a pharmaceutical deviation investigation in 2026.

Consider: A batch is placed on hold. The deviation report cites “failure to follow approved procedure.” Investigators interview operators, review batch records, and examine environmental monitoring data. The investigation concludes that training was inadequate, procedures were unclear, and the change control process should have flagged this risk. Corrective actions are assigned: retraining all operators, revising the SOP, and implementing a new review checkpoint in change control. The CAPA effectiveness check, conducted six months later, confirms that all actions have been completed. The quality system has functioned flawlessly.

Yet if you ask the operator what actually happened—what really happened, in the moment when the deviation occurred—you get a different story. The procedure said to verify equipment settings before starting, but the equipment interface doesn’t display the parameters the SOP references. It hasn’t for the past three software updates. So operators developed a workaround: check the parameters through a different screen, document in the batch record that verification occurred, and continue. Everyone knows this. Supervisors know it. The quality oversight person stationed on the manufacturing floor knows it. It’s been working fine for months.

Until this batch, when the workaround didn’t work, and suddenly everyone had to pretend they didn’t know about the workaround that everyone knew about.

This is what I call the Kafkaesque quality system. Not because it’s absurd—though it often is. But because it exhibits the same structural features Kafka identified in bureaucratic systems: officials whose actions are never explained, contradictory rationalizations praised as features rather than bugs, the claim of flawlessness maintained even as paperwork literally gets destroyed because nobody knows what to do with it, and above all, the systemic production of gaps between how things are supposed to work and how they actually work—gaps that everyone must pretend don’t exist.

Pharmaceutical quality systems are not designed to be Kafkaesque. They’re designed to ensure that medicines are safe, effective, and consistently manufactured to specification. They emerge from legitimate regulatory requirements grounded in decades of experience about what can go wrong when quality oversight is inadequate. ICH Q10, the FDA’s Quality Systems Guidance, EU GMP—these frameworks represent hard-won knowledge about the critical control points that prevent contamination, mix-ups, degradation, and the thousand other ways pharmaceutical manufacturing can fail.

But somewhere between the legitimate need for control and the actual functioning of quality systems, something goes wrong. The system designed to ensure quality becomes a system designed to ensure compliance. The compliance designed to demonstrate quality becomes compliance designed to satisfy inspections. The investigations designed to understand problems become investigations designed to document that all required investigation steps were completed. And gradually, imperceptibly, we build the Castle—an elaborate bureaucracy that everyone assumes is functioning properly, that generates enormous amounts of documentation proving it functions properly, and that may or may not actually be ensuring the quality it was built to ensure.

Legibility and Control

Regulatory authorities, corporate management, and any entity trying to govern complex systems—need legibility. They need to be able to “read” what’s happening in the systems they regulate. For pharmaceutical regulators, this means being able to understand, from batch records and validation documentation and investigation reports, whether a manufacturer is consistently producing medicines of acceptable quality.

Legibility requires simplification. The actual complexity of pharmaceutical manufacturing—with its tacit knowledge, operator expertise, equipment quirks, material variability, and environmental influences—cannot be fully captured in documents. So we create simplified representations. Batch records that reduce manufacturing to a series of checkboxes. Validation protocols that demonstrate method performance under controlled conditions. Investigation reports that fit problems into categories like “inadequate training” or “equipment malfunction”.

This simplification serves a legitimate purpose. Without it, regulatory oversight would be impossible. How could an inspector evaluate whether a manufacturer maintains adequate control if they had to understand every nuance of every process, every piece of tacit knowledge held by every operator, every local adaptation that makes the documented procedures actually work?

But we can often mistake the simplified, legible representation for the reality it represents. We fall prey to the fallacy that if we can fully document a system, we can fully control it. If we specify every step in SOPs, operators will perform those steps. If we validate analytical methods, those methods will continue performing as validated. If we investigate deviations and implement CAPAs, similar deviations won’t recur.

The assumption is seductive because it’s partly true. Documentation does facilitate control. Validation does improve analytical reliability. CAPA does prevent recurrence—sometimes. But the simplified, legible version of pharmaceutical manufacturing is always a reduction of the actual complexity. And our quality systems can forget that the map is not the territory.

What happens when the gap between the legible representation and the actual reality grows too large? Our Pharmaceutical quality systems fail quietly, in the gap between work-as-imagined and work-as-done. In procedures that nobody can actually follow. In validated methods that don’t work under routine conditions. In investigations that document everything except what actually happened. In quality metrics that measure compliance with quality processes rather than actual product quality.

Metis: The Knowledge Bureaucracies Cannot See

We can contrast this formal, systematic, documented knowledge with metis: practical wisdom gained through experience, local knowledge that adapts to specific contexts, the know-how that cannot be fully codified.

Greek mythology personified metis as cunning intelligence, adaptive resourcefulness, the ability to navigate complex situations where formal rules don’t apply. Scott uses the term to describe the local, practical knowledge that makes complex systems actually work despite their formal structures.

In pharmaceutical manufacturing, metis is the operator who knows that the tablet press runs better when you start it up slowly, even though the SOP doesn’t mention this. It’s the analytical chemist who can tell from the peak shape that something’s wrong with the HPLC column before it fails system suitability. It’s the quality reviewer who recognizes patterns in deviations that indicate an underlying equipment issue nobody has formally identified yet.

This knowledge is typically tacit—difficult to articulate, learned through experience rather than training, tied to specific contexts. Studies suggest tacit knowledge comprises 90% of organizational knowledge, yet it’s rarely documented because it can’t easily be reduced to procedural steps. When operators leave or transfer, their metis goes with them.

High-modernist quality systems struggle with metis because they can’t see it. It doesn’t appear in batch records. It can’t be validated. It doesn’t fit into investigation templates. From the regulator’s-eye view, or the quality management’s-eye view—it’s invisible.

So we try to eliminate it. We write more detailed SOPs that specify exactly how to operate equipment, leaving no room for operator discretion. We implement lockout systems that prevent deviation from prescribed parameters. We design quality oversight that verifies operators follow procedures exactly as written.

This creates a dilemma that Sidney Dekker identifies as central to bureaucratic safety systems: the gap between work-as-imagined and work-as-done.

Work-as-imagined is how quality management, procedure writers, and regulators believe manufacturing happens. It’s documented in SOPs, taught in training, and represented in batch records. Work-as-done is what actually happens on the manufacturing floor when real operators encounter real equipment under real conditions.

In ultra-adaptive environments—which pharmaceutical manufacturing surely is, with its material variability, equipment drift, environmental factors, and human elements—work cannot be fully prescribed in advance. Operators must adapt, improvise, apply judgment. They must use metis.

But adaptation and improvisation look like “deviation from approved procedures” in a high-modernist quality system. So operators learn to document work-as-imagined in batch records while performing work-as-done on the floor. The batch record says they “verified equipment settings per SOP section 7.3.2” when what they actually did was apply the metis they’ve learned through experience to determine whether the equipment is really ready to run.

This isn’t dishonesty—or rather, it’s the kind of necessary dishonesty that bureaucratic systems force on the people operating within them. Kafka understood this. The villagers in The Castle provide contradictory explanations for the officials’ actions, and everyone praises this ambiguity as a feature of the system rather than recognizing it as a dysfunction. Everyone knows the official story and the actual story don’t match, but admitting that would undermine the entire bureaucratic structure.

Metis, Expertise, and the Architecture of Knowledge

Understanding why pharmaceutical quality systems struggle to preserve and utilize operator knowledge requires examining how knowledge actually exists and develops in organizations. Three frameworks illuminate different facets of this challenge: James C. Scott’s concept of metis, W. Edwards Deming’s System of Profound Knowledge, and the research on expertise development and knowledge management pioneered by Ikujiro Nonaka and Anders Ericsson.

These frameworks aren’t merely academic concepts. They reveal why quality systems that look comprehensive on paper fail in practice, why experienced operators leave and take critical capability with them, and why organizations keep making the same mistakes despite extensive documentation of lessons learned.

The Architecture of Knowledge: Tacit and Explicit

Management scholar Ikujiro Nonaka distinguishes between two fundamental types of knowledge that coexist in all organizations. Explicit knowledge is codifiable—it can be expressed in words, numbers, formulas, documented procedures. It’s the content of SOPs, validation protocols, batch records, training materials. It’s what we can write down and transfer through formal documentation.

Tacit knowledge is subjective, experience-based, and context-specific. It includes cognitive skills like beliefs, mental models, and intuition, as well as technical skills like craft and know-how. Tacit knowledge is notoriously difficult to articulate. When an experienced analytical chemist looks at a chromatogram and says “something’s not right with that peak shape,” they’re drawing on tacit knowledge built through years of observing normal and abnormal results.

Nonaka’s insight is that these two types of knowledge exist in continuous interaction through what he calls the SECI model—four modes of knowledge conversion that form a spiral of organizational learning:

Socialization (tacit to tacit): Tacit knowledge transfers between individuals through shared experience and direct interaction. An operator training a new hire doesn’t just explain the procedure; they demonstrate the subtle adjustments, the feel of properly functioning equipment, the signs that something’s going wrong. This is experiential learning, the acquisition of skills and mental models through observation and practice.
Externalization (tacit to explicit): The difficult process of making tacit knowledge explicit through articulation. This happens through dialogue, metaphor, and reflection-on-action—stepping back from practice to describe what you’re doing and why. When investigation teams interview operators about what actually happened during a deviation, they’re attempting externalization. But externalization requires psychological safety; operators won’t articulate their tacit knowledge if doing so will reveal deviations from approved procedures.
Combination (explicit to explicit): Documented knowledge combined into new forms. This is what happens when validation teams synthesize development data, platform knowledge, and method-specific studies into validation strategies. It’s the easiest mode because it works entirely with already-codified knowledge.
Internalization (explicit to tacit): The process of embodying explicit knowledge through practice until it becomes “sticky” individual knowledge—operational capability. When operators internalize procedures through repeated execution, they’re converting the explicit knowledge in SOPs into tacit capability. Over time, with reflection and deliberate practice, they develop expertise that goes beyond what the SOP specifies.

Metis is the tacit knowledge that resists externalization. It’s context-specific, adaptive, often non-verbal. It’s what operators know about equipment quirks, material variability, and process subtleties—knowledge gained through direct engagement with complex, variable systems.

High-modernist quality systems, in their drive for legibility and control, attempt to externalize all tacit knowledge into explicit procedures. But some knowledge fundamentally resists codification. The operator’s ability to hear when equipment isn’t running properly, the analyst’s judgment about whether a result is credible despite passing specification, the quality reviewer’s pattern recognition that connects apparently unrelated deviations—this metis cannot be fully proceduralized.

Worse, the attempt to externalize all knowledge into procedures creates what Nonaka would recognize as a broken learning spiral. Organizations that demand perfect procedural compliance prevent socialization—operators can’t openly share their tacit knowledge because it would reveal that work-as-done doesn’t match work-as-imagined. Externalization becomes impossible because articulating tacit knowledge is seen as confession of deviation. The knowledge spiral collapses, and organizations lose their capacity for learning.

Deming’s Theory of Knowledge: Prediction and Learning

W. Edwards Deming’s System of Profound Knowledge provides a complementary lens on why quality systems struggle with knowledge. One of its four interrelated elements—Theory of Knowledge—addresses how we actually learn and improve systems.

Deming’s central insight: there is no knowledge without theory. Knowledge doesn’t come from merely accumulating experience or documenting procedures. It comes from making predictions based on theory and testing whether those predictions hold. This is what makes knowledge falsifiable—it can be proven wrong through empirical observation.

Consider analytical method validation through this lens. Traditional validation documents that a method performed acceptably under specified conditions; this is a description of past events, not theory. Lifecycle validation, properly understood, makes a theoretical prediction: “This method will continue generating results of acceptable quality when operated within the defined control strategy”. That prediction can be tested through Stage 3 ongoing verification. When the prediction fails—when the method doesn’t perform as validation claimed—we gain knowledge about the gap between our theory (the validation claim) and reality.

This connects directly to metis. Operators with metis have internalized theories about how systems behave. When an experienced operator says “We need to start the tablet press slowly today because it’s cold in here and the tooling needs to warm up gradually,” they’re articulating a theory based on their tacit understanding of equipment behavior. The theory makes a prediction: starting slowly will prevent the coating defects we see when we rush on cold days.

But hierarchical, procedure-driven quality systems don’t recognize operator theories as legitimate knowledge. They demand compliance with documented procedures regardless of operator predictions about outcomes. So the operator follows the SOP, the coating defects occur, a deviation is written, and the investigation concludes that “procedure was followed correctly” without capturing the operator’s theoretical knowledge that could have prevented the problem.

Deming’s other element—Knowledge of Variation—is equally crucial. He distinguished between common cause variation (inherent to the system, management’s responsibility to address through system redesign) and special cause variation (abnormalities requiring investigation). His research across multiple industries suggested that 94% of problems are common cause—they reflect system design issues, not individual failures.

Bureaucratic quality systems systematically misattribute variation. When operators struggle to follow procedures, the system treats this as special cause (operator error, inadequate training) rather than common cause (the procedures don’t match operational reality, the system design is flawed). This misattribution prevents system improvement and destroys operator metis by treating adaptive responses as deviations.

From Deming’s perspective, metis is how operators manage system variation when procedures don’t account for the full range of conditions they encounter. Eliminating metis through rigid procedural compliance doesn’t eliminate variation—it eliminates the adaptive capacity that was compensating for system design flaws.

Ericsson and the Development of Expertise

Psychologist Anders Ericsson’s research on expertise development reveals another dimension of how knowledge works in organizations. His studies across fields from chess to music to medicine dismantled the myth that expert performers have unusual innate talents. Instead, expertise is the result of what he calls deliberate practice—individualized training activities specifically designed to improve particular aspects of performance through repetition, feedback, and successive refinement.

Deliberate practice has specific characteristics:

It involves tasks initially outside the current realm of reliable performance but masterable within hours through focused concentration
It requires immediate feedback on performance
It includes reflection between practice sessions to guide subsequent improvement
It continues for extended periods—Ericsson found it takes a minimum of ten years of full-time deliberate practice to reach high levels of expertise even in well-structured domains

Critically, experience alone does not create expertise. Studies show only a weak correlation between years of professional experience and actual performance quality. Merely repeating activities leads to automaticity and arrested development—practice makes permanent, but only deliberate practice improves performance.

This has profound implications for pharmaceutical quality systems. When we document procedures and require operators to follow them exactly, we’re eliminating the deliberate practice conditions that develop expertise. Operators execute the same steps repeatedly without feedback on the quality of performance (only on compliance with procedure), without reflection on how to improve, and without tackling progressively more challenging aspects of the work.

Worse, the compliance focus actively prevents expertise development. Ericsson emphasizes that experts continually try to improve beyond their current level of performance. But quality systems that demand perfect procedural compliance punish the very experimentation and adaptation that characterizes deliberate practice. Operators who develop metis through deliberate engagement with operational challenges must conceal that knowledge because it reveals they adapted procedures rather than following them exactly.

The expertise literature also reveals how knowledge transfers—or fails to transfer—in organizations. Research identifies multiple knowledge transfer mechanisms: social networks, organizational routines, personnel mobility, organizational design, and active search. But effective transfer depends critically on the type of knowledge involved.

Tacit knowledge transfers primarily through mentoring, coaching, and peer-to-peer interaction—what Nonaka calls socialization. When experienced operators leave, this tacit knowledge vanishes if it hasn’t been transferred through direct working relationships. No amount of documentation captures it because tacit knowledge is experience-based and context-specific.

Explicit knowledge transfers through documentation, formal training, and digital platforms. This is what quality systems are designed for: capturing knowledge in SOPs, specifications, validation protocols. But organizations often mistake documentation for knowledge transfer. Creating comprehensive procedures doesn’t ensure that people learn from them. Without internalization—the conversion of explicit knowledge back into tacit operational capability through practice and reflection—documented knowledge remains inert.

Knowledge Management Failures in Pharmaceutical Quality

These three frameworks—Nonaka’s knowledge conversion spiral, Deming’s theory of knowledge and variation, Ericsson’s deliberate practice—reveal systematic failures in how pharmaceutical quality systems handle knowledge:

Broken socialization: Quality systems that punish deviation prevent operators from openly sharing tacit knowledge about work-as-done. New operators learn the documented procedures but not the metis that makes those procedures actually work.
Failed externalization: Investigation processes that focus on compliance rather than understanding don’t capture operator theories about causation. The tacit knowledge that could prevent recurrence remains tacit—and often punishable if revealed.
Meaningless combination: Organizations generate elaborate CAPA documentation by combining explicit knowledge about what should happen without incorporating tacit knowledge about what actually happens. The resulting “knowledge” doesn’t reflect operational reality.
Superficial internalization: Training programs that emphasize procedure memorization rather than capability development don’t convert explicit knowledge into genuine operational expertise. Operators learn to document compliance without developing the metis needed for quality work.
Misattribution of variation: Systems treat operator adaptation as special cause (individual failure) rather than recognizing it as response to common cause system design issues. This prevents learning because the organization never addresses the system flaws that necessitate adaptation.
Prevention of deliberate practice: Rigid procedural compliance eliminates the conditions for expertise development—challenging tasks, immediate feedback on quality (not just compliance), reflection, and progressive improvement. Organizations lose expertise development capacity.
Knowledge transfer theater: Extensive documentation of lessons learned and best practices without the mentoring relationships and communities of practice that enable actual tacit knowledge transfer. Knowledge “management” that manages documents rather than enabling organizational learning.

The consequence is what Nonaka would call organizational knowledge destruction rather than creation. Each layer of bureaucracy, each procedure demanding rigid compliance, each investigation that treats adaptation as deviation, breaks another link in the knowledge spiral. The organization becomes progressively more ignorant about its own operations even as it generates more and more documentation claiming to capture knowledge.

Building Systems That Preserve and Develop Metis

If metis is essential for quality, if expertise develops through deliberate practice, if knowledge exists in continuous interaction between tacit and explicit forms, how do we design quality systems that work with these realities rather than against them?

Enable genuine socialization: Create legitimate spaces for experienced operators to work directly with less experienced ones in conditions where tacit knowledge can be openly shared. This means job shadowing, mentoring relationships, and communities of practice where work-as-done can be discussed without fear of punishment for revealing that it differs from work-as-imagined.

Design for externalization: Investigation processes should aim to capture operator theories about causation, not just document procedural compliance. Use dialogue, ask operators for metaphors and analogies that help articulate tacit understanding, create reflection opportunities where people can step back from action to describe what they know. But this requires just culture—operators won’t externalize knowledge if doing so triggers blame.

Support deliberate practice: Instead of demanding perfect procedural compliance, create conditions for expertise development. This means progressively challenging work assignments, immediate feedback on quality of outcomes (not just compliance), reflection time between executions, and explicit permission to adapt within understood boundaries. Document decision rules rather than rigid procedures, so operators develop judgment rather than just following steps.

Apply Deming’s knowledge theory: Make quality system elements falsifiable by articulating explicit predictions that can be tested. Validated methods should predict ongoing performance, CAPAs should predict reduction in deviation frequency, training should predict capability improvement. Then test those predictions systematically and learn when they fail.

Correctly attribute variation: When operators struggle with procedures or adapt them, ask whether this is special cause (unusual circumstances) or common cause (system design doesn’t match operational reality). If it’s common cause—which Deming suggests is 94% of the time—management must redesign the system rather than demanding better compliance.

Build knowledge transfer mechanisms: Recognize that different knowledge types require different transfer approaches. Tacit knowledge needs mentoring and communities of practice, not just documentation. Explicit knowledge needs accessible documentation and effective training, not just comprehensive procedure libraries. Knowledge transfer is a property of organizational systems and culture, not just techniques.

Measure knowledge outcomes, not documentation volume: Success isn’t demonstrated by comprehensive procedures or extensive training records. It’s demonstrated by whether people can actually perform quality work, whether they have the tacit knowledge and expertise that come from deliberate practice and genuine organizational learning. Measure investigation quality by whether investigations capture knowledge that prevents recurrence, measure CAPA effectiveness by whether problems actually decrease, measure training effectiveness by whether capability improves.

The fundamental insight across all three frameworks is that knowledge is not documentation. Knowledge exists in the dynamic interaction between explicit and tacit forms, between theory and practice, between individual expertise and organizational capability. Quality systems designed around documentation—assuming that if we write comprehensive procedures and require people to follow them, quality will result—are systems designed in ignorance of how knowledge actually works.

Metis is not an obstacle to be eliminated through standardization. It is an essential organizational capability that develops through deliberate practice and transfers through socialization. Deming’s profound knowledge isn’t just theory—it’s the lens that reveals why bureaucratic systems systematically destroy the very knowledge they need to function effectively.

Building quality systems that preserve and develop metis means building systems for organizational learning, not organizational documentation. It means recognizing operator expertise as legitimate knowledge rather than deviation from procedures. It means creating conditions for deliberate practice rather than demanding perfect compliance. It means enabling knowledge conversion spirals rather than breaking them through blame and rigid control.

This is the escape from the Kafkaesque quality system. Not through more procedures, more documentation, more oversight—but through quality systems designed around how humans actually learn, how expertise actually develops, how knowledge actually exists in organizations.

The Pathologies of Bureaucracy

Sociologist Robert K. Merton studied how bureaucracies develop characteristic dysfunctions even when staffed by competent, well-intentioned people. He identified what he called “bureaucratic pathologies”—systematic problems that emerge from the structure of bureaucratic organizations rather than from individual failures.

The primary pathology is what Merton called “displacement of goals”. Bureaucracies establish rules and procedures as means to achieve organizational objectives. But over time, following the rules becomes an end in itself. Officials focus on “doing things by the book” rather than on whether the book is achieving its intended purpose.

Does this sound familiar to pharmaceutical quality professionals?

How many deviation investigations focus primarily on demonstrating that investigation procedures were followed—impact assessment completed, timeline met, all required signatures obtained—with less attention to whether the investigation actually understood what happened and why? How many CAPA effectiveness checks verify that corrective actions were implemented but don’t rigorously test whether they solved the underlying problem? How many validation studies are designed to satisfy validation protocol requirements rather than to genuinely establish method fitness for purpose?

Merton identified another pathology: bureaucratic officials are discouraged from showing initiative because they lack the authority to deviate from procedures. When problems arise that don’t fit prescribed categories, officials “pass the buck” to the next level of hierarchy. Meanwhile, the rigid adherence to rules and the impersonal attitude this generates are interpreted by those subject to the bureaucracy as arrogance or indifference.

Quality professionals will recognize this pattern. The quality oversight person on the manufacturing floor sees a problem but can’t address it without a deviation report. The deviation report triggers an investigation that can’t conclude without identifying root cause according to approved categories. The investigation assigns CAPA that requires multiple levels of approval before implementation. By the time the CAPA is implemented, the original problem may have been forgotten, or operators may have already developed their own workaround that will remain invisible to the formal system.

Dekker argues that bureaucratization creates “structural secrecy”—not active concealment, but systematic conditions under which information cannot flow. Bureaucratic accountability determines who owns data “up to where and from where on”. Once the quality staff member presents a deviation report to management, their bureaucratic accountability is complete. What happens to that information afterward is someone else’s problem.

Meanwhile, operators know things that quality staff don’t know, quality staff know things that management doesn’t know, and management knows things that regulators don’t know. Not because anyone is deliberately hiding information, but because the bureaucratic structure creates boundaries across which information doesn’t naturally flow.

This is structural secrecy, and it’s lethal to quality systems because quality depends on information about what’s actually happening. When the formal system cannot see work-as-done, cannot access operator metis, cannot flow information across bureaucratic boundaries, it’s managing an imaginary factory rather than the real one.

Compliance Theater: The Performance of Quality

If bureaucratic quality systems manage imaginary factories, they require imaginary proof that quality is maintained. Enter compliance theater—the systematic creation of documentation and monitoring that prioritizes visible adherence to requirements over substantive achievement of quality objectives.

Compliance theater has several characteristic features:

Surface-level implementation: Organizations develop extensive documentation, training programs, and monitoring systems that create the appearance of comprehensive quality control while lacking the depth necessary to actually ensure quality.
Metrics gaming: Success is measured through easily manipulable indicators—training completion rates, deviation closure timeliness, CAPA on-time implementation—rather than outcomes reflecting actual quality performance.
Resource misallocation: Significant resources devoted to compliance performance rather than substantive quality improvement, creating opportunity costs that impede genuine progress.
Temporal patterns: Activity spikes before inspections or audits rather than continuous vigilance.

Consider CAPA effectiveness checks. In principle, these verify that corrective actions actually solved the underlying problem. But how many CAPA effectiveness checks truly test this? The typical approach: verify that the planned actions were implemented (revised SOP distributed, training completed, new equipment qualified), wait for some period during which no similar deviation occurs, declare the CAPA effective.

This is ritualistic compliance, not genuine verification. If the deviation was caused by operator metis being inadequate for the actual demands of the task, and the corrective action was “revise SOP to clarify requirements and retrain operators,” the effectiveness check should test whether operators now have the knowledge and capability to handle the task. But we don’t typically test capability. We verify that training attendance was documented and that no deviations of the exact same type have been reported in the past six months.

No deviations reported is not the same as no deviations occurring. It might mean operators developed better workarounds that don’t trigger quality system alerts. It might mean supervisors are managing issues informally rather than generating deviation reports. It might mean we got lucky.

But the paperwork says “CAPA verified effective,” and the compliance theater continues.

Analytical method validation presents another arena for compliance theater. Traditional validation treats validation as an event: conduct studies demonstrating acceptable performance, generate a validation report, file with regulatory authorities, and consider the method “validated”. The implicit assumption is that a method that passed validation will continue performing acceptably forever, as long as we check system suitability.

But methods validated under controlled conditions with expert analysts and fresh materials often perform differently under routine conditions with typical analysts and aged reagents. The validation represented work-as-imagined. What happens during routine testing is work-as-done.

If we took lifecycle validation seriously, we would treat validation as predicting future performance and continuously test those predictions through Stage 3 ongoing verification. We would monitor not just system suitability pass/fail but trends suggesting performance drift. We would investigate anomalous results as potential signals of method inadequacy.

But Stage 3 verification is underdeveloped in regulatory guidance and practice. So validated methods continue being used until they fail spectacularly, at which point we investigate the failure, implement CAPA, revalidate, and resume the cycle.

The validation documentation proves the method is validated. Whether the method actually works is a separate question.

The Bureaucratic Trap: How Good Systems Go Bad

I need to emphasize: pharmaceutical quality systems did not become bureaucratic because quality professionals are incompetent or indifferent. The bureaucratization happens through the interaction of legitimate pressures that push systems toward forms that are legible, auditable, and defensible but increasingly disconnected from the complex reality they’re meant to govern.

Regulatory pressure: Inspectors need evidence that quality is controlled. The most auditable evidence is documentation showing compliance with established procedures. Over time, quality systems optimize for auditability rather than effectiveness.
Liability pressure: When quality failures occur, organizations face regulatory action, litigation, and reputational damage. The best defense is demonstrating that all required procedures were followed. This incentivizes comprehensive documentation even when that documentation doesn’t enhance actual quality.
Complexity: Pharmaceutical manufacturing is genuinely complex, with thousands of variables affecting product quality. Reducing this complexity to manageable procedures requires simplification. The simplification is necessary, but organizations forget that it’s a reduction rather than the full reality.
Scale: As organizations grow, quality systems must work across multiple sites, products, and regulatory jurisdictions. Standardization is necessary for consistency, but standardization requires abstracting away local context—precisely the domain where metis operates.
Knowledge loss: When experienced operators leave, their tacit knowledge goes with them. Organizations try to capture this knowledge in ever-more-detailed procedures, but metis cannot be fully proceduralized. The detailed procedures give the illusion of captured knowledge while the actual knowledge has vanished.
Management distance: Quality executives are increasingly distant from manufacturing operations. They manage through metrics, dashboards, and reports rather than direct observation. These tools require legibility—quantitative measures, standardized reports, formatted data. The gap between management’s understanding and operational reality grows.
Inspection trauma: After regulatory inspections that identify deficiencies, organizations often respond by adding more procedures, more documentation, more oversight. The response to bureaucratic dysfunction is more bureaucracy.

Each of these pressures is individually rational. Taken together, they create what the conditions for failure: administrative ordering of complex systems, confidence in formal procedures and documentation, authority willing to enforce compliance, and increasingly, a weakened operational environment that can’t effectively resist.

What we get is the Kafkaesque quality system: elaborate, well-documented, apparently flawless, generating enormous amounts of evidence that it’s functioning properly, and potentially failing to ensure the quality it was designed to ensure.

The Consequences: When Bureaucracy Defeats Quality

The most insidious aspect of bureaucratic quality systems is that they can fail quietly. Unlike catastrophic contamination events or major product recalls, bureaucratic dysfunction produces gradual degradation that may go unnoticed because all the quality metrics say everything is fine.

Investigation without learning: Investigations that focus on completing investigation procedures rather than understanding causal mechanisms don’t generate knowledge that prevents recurrence. Organizations keep investigating the same types of problems, implementing CAPAs that check compliance boxes without addressing underlying issues, and declaring investigations “closed” when the paperwork is complete.

Research on incident investigation culture reveals what investigators call “new blame”—a dysfunction where investigators avoid examining human factors for fear of seeming accusatory, instead quickly attributing problems to “unclear procedures” or “inadequate training” without probing what actually happened. This appears to be blame-free but actually prevents learning by refusing to engage with the complexity of how humans interact with systems.

Analytical unreliability: Methods that “passed validation” may be silently failing under routine conditions, generating subtly inaccurate results that don’t trigger obvious failures but gradually degrade understanding of product quality. Nobody knows because Stage 3 verification isn’t rigorous enough to detect drift.

Operator disengagement: When operators know that the formal procedures don’t match operational reality, when they’re required to document work-as-imagined while performing work-as-done, when they see problems but reporting them triggers bureaucratic responses that don’t fix anything, they disengage. They stop reporting. They develop workarounds. They focus on satisfying the visible compliance requirements rather than ensuring genuine quality.

This is exactly what Merton predicted: bureaucratic structures that punish initiative and reward procedural compliance create officials who follow rules rather than thinking about purpose.

Resource misallocation: Organizations spend enormous resources on compliance activities that satisfy audit requirements without enhancing quality. Documentation of training that doesn’t transfer knowledge. CAPA systems that process hundreds of actions of marginal effectiveness. Validation studies that prove compliance with validation requirements without establishing genuine fitness for purpose.

Structural secrecy: Critical information that front-line operators possess about equipment quirks, material variability, and process issues doesn’t flow to quality management because bureaucratic boundaries prevent information transfer. Management makes decisions based on formal reports that reflect work-as-imagined while work-as-done remains invisible.

Loss of resilience: Organizations that depend on rigid procedures and standardized responses become brittle. When unexpected situations arise—novel contamination sources, unusual material properties, equipment failures that don’t fit prescribed categories—the organization can’t adapt because it has systematically eliminated the metis that enables adaptive response.

This last point deserves emphasis. Quality systems should make organizations more resilient—better able to maintain quality despite disturbances and variability. But bureaucratic quality systems can do the opposite. By requiring that everything be prescribed in advance, they eliminate the adaptive capacity that enables resilience.

The Alternative: High Reliability Organizations

So how do we escape the bureaucratic trap? The answer emerges from studying what researchers Karl Weick and Kathleen Sutcliffe call “High Reliability Organizations”—organizations that operate in complex, hazardous environments yet maintain exceptional safety records.

Nuclear aircraft carriers. Air traffic control systems. Wildland firefighting teams. These organizations can’t afford the luxury of bureaucratic dysfunction because failure means catastrophic consequences. Yet they operate in environments at least as complex as pharmaceutical manufacturing.

Weick and Sutcliffe identified five principles that characterize HROs:

Preoccupation with failure: HROs treat any anomaly as a potential symptom of deeper problems. They don’t wait for catastrophic failures. They investigate near-misses rigorously. They encourage reporting of even minor issues.

This is the opposite of compliance-focused quality systems that measure success by absence of major deviations and treat minor issues as acceptable noise.

Reluctance to simplify: HROs resist the temptation to reduce complex situations to simple categories. They maintain multiple interpretations of what’s happening rather than prematurely converging on a single explanation.

This challenges the bureaucratic need for legibility. It’s harder to manage systems that resist simple categorization. But it’s more effective than managing simplified representations that don’t reflect reality.

Sensitivity to operations: HROs maintain ongoing awareness of what’s happening at the sharp end where work is actually done. Leaders stay connected to operational reality rather than managing through dashboards and metrics.

This requires bridging the gap between work-as-imagined and work-as-done. It requires seeing metis rather than trying to eliminate it.

Commitment to resilience: HROs invest in adaptive capacity—the ability to respond effectively when unexpected situations arise. They practice scenario-based training. They maintain reserves of expertise. They design systems that can accommodate surprises.

This is different from bureaucratic systems that try to prevent all surprises through comprehensive procedures.

Deference to expertise: In HROs, authority migrates to whoever has relevant expertise regardless of hierarchical rank. During anomalous situations, the person with the best understanding of what’s happening makes decisions, even if that’s a junior operator rather than a senior manager.

Weick describes this as valuing “greasy hands knowledge”—the practical, experiential understanding of people directly involved in operations. This is metis by another name.

These principles directly challenge bureaucratic pathologies. Where bureaucracies focus on following established procedures, HROs focus on constant vigilance for signs that procedures aren’t working. Where bureaucracies demand hierarchical approval, HROs defer to frontline expertise. Where bureaucracies simplify for legibility, HROs maintain complexity.

Can pharmaceutical quality systems adopt HRO principles? Not easily, because the regulatory environment demands legibility and auditability. But neither can pharmaceutical quality systems afford continued bureaucratic dysfunction as complexity increases and the gap between work-as-imagined and work-as-done widens.

Building Falsifiable Quality Systems

Throughout this blog I’ve advocated for what I call falsifiable quality systems—systems designed to make testable predictions that could be proven wrong through empirical observation.

Traditional quality systems make unfalsifiable claims: “This method was validated according to ICH Q2 requirements.” “Procedures are followed.” “CAPA prevents recurrence.” These are statements about activities that occurred in the past, not predictions about future performance.

Falsifiable quality systems make explicit predictions: “This analytical method will generate reportable results within ±5% of true value under normal operating conditions.” “When operated within the defined control strategy, this process will consistently produce product meeting specifications.” “The corrective action implemented will reduce this deviation type by at least 50% over the next six months”.

These predictions can be tested. If ongoing data shows the method isn’t achieving ±5% accuracy, the prediction is falsified—the method isn’t performing as validation claimed. If deviations haven’t decreased after CAPA implementation, the prediction is falsified—the corrective action didn’t work.

Falsifiable systems create accountability for effectiveness rather than compliance. They force honest engagement with whether quality systems are actually ensuring quality.

This connects directly to HRO principles. Preoccupation with failure means treating falsification seriously—when predictions fail, investigating why. Reluctance to simplify means acknowledging the complexity that makes some predictions uncertain. Sensitivity to operations means using operational data to test predictions continuously. Commitment to resilience means building systems that can recognize and respond when predictions fail.

It also requires what researchers call “just culture”—systems that distinguish between honest errors, at-risk behaviors, and reckless violations. Bureaucratic blame cultures punish all failures, driving problems underground. “No-blame” cultures avoid examining human factors, preventing learning. Just cultures examine what happened honestly, including human decisions and actions, while focusing on system improvement rather than individual punishment.

In just culture, when a prediction is falsified—when a validated method fails, when CAPA doesn’t prevent recurrence, when operators can’t follow procedures—the response isn’t to blame individuals or to paper over the gap with more documentation. The response is to examine why the prediction was wrong and redesign the system to make it correct.

This requires the intellectual honesty to acknowledge when quality systems aren’t working. It requires willingness to look at work-as-done rather than only work-as-imagined. It requires recognizing operator metis as legitimate knowledge rather than deviation from procedures. It requires valuing learning over legibility.

Practical Steps: Escaping the Castle

How do pharmaceutical quality organizations actually implement these principles? How do we escape Kafka’s Castle once we’ve built it?

I won’t pretend this is easy. The pressures toward bureaucratization are real and powerful. Regulatory requirements demand legibility. Corporate management requires standardization. Inspection findings trigger defensive responses. The path of least resistance is always more procedures, more documentation, more oversight.

But some concrete steps can bend the trajectory away from bureaucratic dysfunction toward genuine effectiveness:

Make quality systems falsifiable: For every major quality commitment—validated analytical methods, qualified processes, implemented CAPAs—articulate explicit, testable predictions about future performance. Then systematically test those predictions through ongoing monitoring. When predictions fail, investigate why and redesign systems rather than rationalizing the failure away.

Close the WAI/WAD gap: Create safe mechanisms for understanding work-as-done. Don’t punish operators for revealing that procedures don’t match reality. Instead, use this information to improve procedures or acknowledge that some adaptation is necessary and train operators in effective adaptation rather than pretending perfect procedural compliance is possible.

Value metis: Recognize that operator expertise, analytical judgment, and troubleshooting capability are not obstacles to standardization but essential elements of quality systems. Document not just procedures but decision rules for when to adapt. Create mechanisms for transferring tacit knowledge. Include experienced operators in investigation and CAPA design.

Practice just culture: Distinguish between system-induced errors, at-risk behaviors under production pressure, and genuinely reckless violations. Focus investigations on understanding causal factors rather than assigning blame or avoiding blame. Hold people accountable for reporting problems and learning from them, not for making the inevitable errors that complex systems generate.

Implement genuine Stage 3 verification: Treat validation as predicting ongoing performance rather than certifying past performance. Monitor analytical methods, processes, and quality system elements for signs that their performance is drifting from predictions. Detect and address degradation early rather than waiting for catastrophic failure.

Bridge bureaucratic boundaries: Create information flows that cross organizational boundaries so that what operators know reaches quality management, what quality management knows reaches site leadership, and what site leadership knows shapes corporate quality strategy. This requires fighting against structural secrecy, perhaps through regular gemba walks, operator inclusion in quality councils, and bottom-up reporting mechanisms that protect operators who surface uncomfortable truths.

Test CAPA effectiveness honestly: Don’t just verify that corrective actions were implemented. Test whether they solved the problem. If a deviation was caused by inadequate operator capability, test whether capability improved. If it was caused by equipment limitation, test whether the limitation was eliminated. If the problem hasn’t recurred but you haven’t tested whether your corrective action was responsible, you don’t know if the CAPA worked—you know you got lucky.

Question metrics that measure activity rather than outcomes: Training completion rates don’t tell you whether people learned anything. Deviation closure timeliness doesn’t tell you whether investigations found root causes. CAPA implementation rates don’t tell you whether CAPAs were effective. Replace these with metrics that test quality system predictions: analytical result accuracy, process capability indices, deviation recurrence rates after CAPA, investigation quality assessed by independent review.

Embrace productive failure: When quality system elements fail—when validated methods prove unreliable, when procedures can’t be followed, when CAPAs don’t prevent recurrence—treat these as opportunities to improve systems rather than problems to be concealed or rationalized. HRO preoccupation with failure means seeing small failures as gifts that reveal system weaknesses before they cause catastrophic problems.

Continuous improvement, genuinely practiced: Implement PDCA (Plan-Do-Check-Act) or PDSA (Plan-Do-Study-Act) cycles not as compliance requirements but as systematic methods for testing changes before full implementation. Use small-scale experiments to determine whether proposed improvements actually improve rather than deploying changes enterprise-wide based on assumption.

Reduce the burden of irrelevant documentation: Much compliance documentation serves no quality purpose—it exists to satisfy audit requirements or regulatory expectations that may themselves be bureaucratic artifacts. Distinguish between documentation that genuinely supports quality (specifications, test results, deviation investigations that find root causes) and documentation that exists to demonstrate compliance (training attendance rosters for content people already know, CAPA effectiveness checks that verify nothing). Fight to eliminate the latter, or at least prevent it from crowding out the former.

The Politics of De-Bureaucratization

Here’s the uncomfortable truth: escaping the Kafkaesque quality system requires political will at the highest levels of organizations.

Quality professionals can implement some improvements within their spheres of influence—better investigation practices, more rigorous CAPA effectiveness checks, enhanced Stage 3 verification. But truly escaping the bureaucratic trap requires challenging structures that powerful constituencies benefit from.

Regulatory authorities benefit from legibility—it makes inspection and oversight possible. Corporate management benefits from standardization and quantitative metrics—they enable governance at scale. Quality bureaucracies themselves benefit from complexity and documentation—they justify resources and headcount.

Operators and production management often bear the costs of bureaucratization—additional documentation burden, inability to adapt to reality, blame when gaps between procedures and practice are revealed. But they’re typically the least powerful constituencies in pharmaceutical organizations.

Changing this dynamic requires quality leaders who understand that their role is ensuring genuine quality rather than managing compliance theater. It requires site leaders who recognize that bureaucratic dysfunction threatens product quality even when all audit checkboxes are green. It requires regulatory relationships mature enough to discuss work-as-done openly rather than pretending work-as-imagined is reality.

Scott argues that successful resistance to high-modernist schemes depends on civil society’s capacity to push back. In pharmaceutical organizations, this means empowering operational voices—the people with metis, with greasy-hands knowledge, with direct experience of the gap between procedures and reality. It means creating forums where they can speak without fear of retaliation. It means quality leaders who listen to operational expertise even when it reveals uncomfortable truths about quality system dysfunction.

This is threatening to bureaucratic structures precisely because it challenges their premise—that quality can be ensured through comprehensive documented procedures enforced by hierarchical oversight. If we acknowledge that operator metis is essential, that adaptation is necessary, that work-as-done will never perfectly match work-as-imagined, we’re admitting that the Castle isn’t really flawless.

But the Castle never was flawless. Kafka knew that. The servant destroying paperwork because he couldn’t figure out the recipient wasn’t an aberration—it was a glimpse of reality. The question is whether we continue pretending the bureaucracy works perfectly while it fails quietly, or whether we build quality systems honest enough to acknowledge their limitations and resilient enough to function despite them.

The Quality System We Need

Pharmaceutical quality systems exist in genuine tension. They must be rigorous enough to prevent failures that harm patients. They must be documented well enough to satisfy regulatory scrutiny. They must be standardized enough to work across global operations. These are not trivial requirements, and they cannot be dismissed as mere bureaucratic impositions.

But they must also be realistic enough to accommodate the complexity of manufacturing, flexible enough to incorporate operator metis, honest enough to acknowledge the gap between procedures and practice, and resilient enough to detect and correct performance drift before catastrophic failures occur.

We will not achieve this by adding more procedures, more documentation, more oversight. We’ve been trying that approach for decades, and the result is the bureaucratic trap we’re in. Every new procedure adds another layer to the Castle, another barrier between quality management and operational reality, another opportunity for the gap between work-as-imagined and work-as-done to widen.

Instead, we need quality systems designed around falsifiable predictions tested through ongoing verification. Systems that value learning over legibility. Systems that bridge bureaucratic boundaries to incorporate greasy-hands knowledge. Systems that distinguish between productive compliance and compliance theater. Systems that acknowledge complexity rather than reducing it to manageable simplifications that don’t reflect reality.

We need, in short, to stop building the Castle and start building systems for humans doing real work under real conditions.

Kafka never finished The Castle. The manuscript breaks off mid-sentence. Whether K. ever reaches the Castle, whether the officials ever explain themselves, whether the flawless bureaucracy ever acknowledges its contradictions—we’ll never know.

But pharmaceutical quality professionals don’t have the luxury of leaving the story unfinished. We’re living in it. Every day we choose whether to add another procedure to the Castle or to build something different. Every deviation investigation either perpetuates compliance theater or pursues genuine learning. Every CAPA either checks boxes or solves problems. Every validation either creates falsifiable predictions or generates documentation that satisfies audits without ensuring quality.

The bureaucratic trap is powerful precisely because each individual choice seems reasonable. Each procedure addresses a real gap. Each documentation requirement responds to an audit finding. Each oversight layer prevents a potential problem. And gradually, imperceptibly, we build a system that looks comprehensive and rigorous and “flawless” but may or may not be ensuring the quality it exists to ensure.

Escaping the trap requires intellectual honesty about whether our quality systems are working. It requires organizational courage to acknowledge gaps between procedures and practice. It requires regulatory maturity to discuss work-as-done rather than pretending work-as-imagined is reality. It requires quality leadership that values effectiveness over auditability.

Most of all, it requires remembering why we built quality systems in the first place: not to satisfy inspections, not to generate documentation, not to create employment for quality professionals, but to ensure that medicines reaching patients are safe, effective, and consistently manufactured to specification.

That goal is not served by Kafkaesque bureaucracy. It’s not served by the Castle, with its mysterious officials and contradictory explanations and flawless procedures that somehow involve destroying paperwork when nobody knows what to do with it.

It’s served by systems designed for humans, systems that acknowledge complexity, systems that incorporate the metis of people who actually do the work, systems that make falsifiable predictions and honestly evaluate whether those predictions hold.

It’s served by escaping the bureaucratic trap.

The question is whether pharmaceutical quality leadership has the courage to leave the Castle.

Sidney Dekker: The Safety Scientist Who Influences How I Think About Quality

Over the past decades, as I’ve grown and now led quality organizations in biotechnology, I’ve encountered many thinkers who’ve shaped my approach to investigation and risk management. But few have fundamentally altered my perspective like Sidney Dekker. His work didn’t just add to my toolkit—it forced me to question some of my most basic assumptions about human error, system failure, and what it means to create genuinely effective quality systems.

Dekker’s challenge to move beyond “safety theater” toward authentic learning resonates deeply with my own frustrations about quality systems that look impressive on paper but fail when tested by real-world complexity.

Why Dekker Matters for Quality Leaders

Professor Sidney Dekker brings a unique combination of academic rigor and operational experience to safety science. As both a commercial airline pilot and the Director of the Safety Science Innovation Lab at Griffith University, he understands the gap between how work is supposed to happen and how it actually gets done. This dual perspective—practitioner and scholar—gives his critiques of traditional safety approaches unusual credibility.

But what initially drew me to Dekker’s work wasn’t his credentials. It was his ability to articulate something I’d been experiencing but couldn’t quite name: the growing disconnect between our increasingly sophisticated compliance systems and our actual ability to prevent quality problems. His concept of “drift into failure” provided a framework for understanding why organizations with excellent procedures and well-trained personnel still experience systemic breakdowns.

The “New View” Revolution

Dekker’s most fundamental contribution is what he calls the “new view” of human error—a complete reframing of how we understand system failures. Having spent years investigating deviations and CAPAs, I can attest to how transformative this shift in perspective can be.

The Traditional Approach I Used to Take:

Human error causes problems
People are unreliable; systems need protection from human variability
Solutions focus on better training, clearer procedures, more controls

Dekker’s New View That Changed My Practice:

Human error is a symptom of deeper systemic issues
People are the primary source of system reliability, not the threat to it
Variability and adaptation are what make complex systems work

This isn’t just academic theory—it has practical implications for every investigation I lead. When I encounter “operator error” in a deviation investigation, Dekker’s framework pushes me to ask different questions: What made this action reasonable to the operator at the time? What system conditions shaped their decision-making? How did our procedures and training actually perform under real-world conditions?

This shift aligns perfectly with the causal reasoning approaches I’ve been developing on this blog. Instead of stopping at “failure to follow procedure,” we dig into the specific mechanisms that drove the event—exactly what Dekker’s view demands.

Drift Into Failure: Why Good Organizations Go Bad

Perhaps Dekker’s most powerful concept for quality leaders is “drift into failure”—the idea that organizations gradually migrate toward disaster through seemingly rational local decisions. This isn’t sudden catastrophic failure; it’s incremental erosion of safety margins through competitive pressure, resource constraints, and normalized deviance.

I’ve seen this pattern repeatedly. For example, a cleaning validation program starts with robust protocols, but over time, small shortcuts accumulate: sampling points that are “difficult to access” get moved, hold times get shortened when production pressure increases, acceptance criteria get “clarified” in ways that gradually expand limits.

Each individual decision seems reasonable in isolation. But collectively, they represent drift—a gradual migration away from the original safety margins toward conditions that enable failure. The contamination events and data integrity issues that plague our industry often represent the endpoint of these drift processes, not sudden breakdowns in otherwise reliable systems.

Beyond Root Cause: Understanding Contributing Conditions

Traditional root cause analysis seeks the single factor that “caused” an event, but complex system failures emerge from multiple interacting conditions. The take-the-best heuristic I’ve been exploring on this blog—focusing on the most causally powerful factor—builds directly on Dekker’s insight that we need to understand mechanisms, not hunt for someone to blame.

When I investigate a failure now, I’m not looking for THE root cause. I’m trying to understand how various factors combined to create conditions for failure. What pressures were operators experiencing? How did procedures perform under actual conditions? What information was available to decision-makers? What made their actions reasonable given their understanding of the situation?

This approach generates investigations that actually help prevent recurrence rather than just satisfying regulatory expectations for “complete” investigations.

Just Culture: Moving Beyond Blame

Dekker’s evolution of just culture thinking has been particularly influential in my leadership approach. His latest work moves beyond simple “blame-free” environments toward restorative justice principles—asking not “who broke the rule” but “who was hurt and how can we address underlying needs.”

This shift has practical implications for how I handle deviations and quality events. Instead of focusing on disciplinary action, I’m asking: What systemic conditions contributed to this outcome? What support do people need to succeed? How can we address the underlying vulnerabilities this event revealed?

This doesn’t mean eliminating accountability—it means creating accountability systems that actually improve performance rather than just satisfying our need to assign blame.

Safety Theater: The Problem with Compliance Performance

Dekker’s most recent work on “safety theater” hits particularly close to home in our regulated environment. He defines safety theater as the performance of compliance when under surveillance that retreats to actual work practices when supervision disappears.

I’ve watched organizations prepare for inspections by creating impressive documentation packages that bear little resemblance to how work actually gets done. Procedures get rewritten to sound more rigorous, training records get updated, and everyone rehearses the “right” answers for auditors. But once the inspection ends, work reverts to the adaptive practices that actually make operations function.

This theater emerges from our desire for perfect, controllable systems, but it paradoxically undermines genuine safety by creating inauthenticity. People learn to perform compliance rather than create genuine safety and quality outcomes.

The falsifiable quality systems I’ve been advocating on this blog represent one response to this problem—creating systems that can be tested and potentially proven wrong rather than just demonstrated as compliant.

Six Practical Takeaways for Quality Leaders

After years of applying Dekker’s insights in biotechnology manufacturing, here are the six most practical lessons for quality professionals:

1. Treat “Human Error” as the Beginning of Investigation, Not the End

When investigations conclude with “human error,” they’ve barely started. This should prompt deeper questions: Why did this action make sense? What system conditions shaped this decision? What can we learn about how our procedures and training actually perform under pressure?

2. Understand Work-as-Done, Not Just Work-as-Imagined

There’s always a gap between procedures (work-as-imagined) and actual practice (work-as-done). Understanding this gap and why it exists is more valuable than trying to force compliance with unrealistic procedures. Some of the most important quality improvements I’ve implemented came from understanding how operators actually solve problems under real conditions.

3. Measure Positive Capacities, Not Just Negative Events

Traditional quality metrics focus on what didn’t happen—no deviations, no complaints, no failures. I’ve started developing metrics around investigation quality, learning effectiveness, and adaptive capacity rather than just counting problems. How quickly do we identify and respond to emerging issues? How effectively do we share learning across sites? How well do our people handle unexpected situations?

4. Create Psychological Safety for Learning

Fear and punishment shut down the flow of safety-critical information. Organizations that want to learn from failures must create conditions where people can report problems, admit mistakes, and share concerns without fear of retribution. This is particularly challenging in our regulated environment, but it’s essential for moving beyond compliance theater toward genuine learning.

5. Focus on Contributing Conditions, Not Root Causes

Complex failures emerge from multiple interacting factors, not single root causes. The take-the-best approach I’ve been developing helps identify the most causally powerful factor while avoiding the trap of seeking THE cause. Understanding mechanisms is more valuable than finding someone to blame.

6. Embrace Adaptive Capacity Instead of Fighting Variability

People’s ability to adapt and respond to unexpected conditions is what makes complex systems work, not a threat to be controlled. Rather than trying to eliminate human variability through ever-more-prescriptive procedures, we should understand how that variability creates resilience and design systems that support rather than constrain adaptive problem-solving.

Connection to Investigation Excellence

Dekker’s work provides the theoretical foundation for many approaches I’ve been exploring on this blog. His emphasis on testable hypotheses rather than compliance theater directly supports falsifiable quality systems. His new view framework underlies the causal reasoning methods I’ve been developing. His focus on understanding normal work, not just failures, informs my approach to risk management.

Most importantly, his insistence on moving beyond negative reasoning (“what didn’t happen”) to positive causal statements (“what actually happened and why”) has transformed how I approach investigations. Instead of documenting failures to follow procedures, we’re understanding the specific mechanisms that drove events—and that makes all the difference in preventing recurrence.

Essential Reading for Quality Leaders

If you’re leading quality organizations in today’s complex regulatory environment, these Dekker works are essential:

Start Here:

The Field Guide to Understanding Human Error (3rd Edition) – Foundation for the new view approach
Just Culture: Restorative Justice and Accountability (4th Edition) – Practical approach to accountability without blame

For Investigation Excellence:

Behind Human Error (with Woods, Cook, et al.) – Comprehensive framework for moving beyond blame
Drift into Failure – Understanding how good organizations gradually deteriorate

For Current Challenges:

Safety Theater – Critique of compliance-focused approaches particularly relevant to regulated industries
Patient Safety: A Human Factors Approach – Healthcare applications that translate well to pharmaceutical quality

The Leadership Challenge

Dekker’s work challenges us as quality leaders to move beyond the comfortable certainty of compliance-focused approaches toward the more demanding work of creating genuine learning systems. This requires admitting that our procedures and training might not work as intended. It means supporting people when they make mistakes rather than just punishing them. It demands that we measure our success by how well we learn and adapt, not just how well we document compliance.

This isn’t easy work. It requires the kind of organizational humility that Amy Edmondson and other leadership researchers emphasize—the willingness to be proven wrong in service of getting better. But in my experience, organizations that embrace this challenge develop more robust quality systems and, ultimately, better outcomes for patients.

The question isn’t whether Sidney Dekker is right about everything—it’s whether we’re willing to test his ideas and learn from the results. That’s exactly the kind of falsifiable approach that both his work and effective quality systems demand.

Causal Reasoning: A Transformative Approach to Root Cause Analysis

Energy Safety Canada recently published a white paper on causal reasoning that offers valuable insights for quality professionals across industries. As someone who has spent decades examining how we investigate deviations and perform root cause analysis, I found their framework refreshing and remarkably aligned with the challenges we face in pharmaceutical quality. The paper proposes a fundamental shift in how we approach investigations, moving from what they call “negative reasoning” to “causal reasoning” that could significantly improve our ability to prevent recurring issues and drive meaningful improvement.

The Problem with Traditional Root Cause Analysis

Many of us in quality have experienced the frustration of seeing the same types of deviations recur despite thorough investigations and seemingly robust CAPAs. The Energy Safety Canada white paper offers a compelling explanation for this phenomenon: our investigations often focus on what did not happen rather than what actually occurred.

This approach, which the authors term “negative reasoning,” leads investigators to identify counterfactuals-things that did not occur, such as “operators not following procedures” or “personnel not stopping work when they should have”. The problem is fundamental: what was not happening cannot create the outcomes we experienced. As the authors aptly state, these counterfactuals “exist only in retrospection and never actually influenced events,” yet they dominate many of our investigation conclusions.

This insight resonates strongly with what I’ve observed in pharmaceutical quality. Six years ago the MHRA’s 2019 citation of 210 companies for inadequate root cause analysis and CAPA development – including 6 critical findings – takes on renewed significance in light of Sanofi’s 2025 FDA warning letter. While most cited organizations likely believed their investigation processes were robust (as Sanofi presumably did before their contamination failures surfaced), these parallel cases across regulatory bodies and years expose a persistent industry-wide disconnect between perceived and actual investigation effectiveness. These continued failures exemplify how superficial root cause analysis creates dangerous illusions of control – precisely the systemic flaw the MHRA data highlighted six years prior.

Negative Reasoning vs. Causal Reasoning: A Critical Distinction

The white paper makes a distinction that I find particularly valuable: negative reasoning seeks to explain outcomes based on what was missing from the system, while causal reasoning looks for what was actually present or what happened. This difference may seem subtle, but it fundamentally changes the nature and outcomes of our investigations.

When we use negative reasoning, we create what the white paper calls “an illusion of cause without being causal”. We identify things like “failure to follow procedures” or “inadequate risk assessment,” which may feel satisfying but don’t explain why those conditions existed in the first place. These conclusions often lead to generic corrective actions that fail to address underlying issues.

In contrast, causal reasoning requires statements that have time, place, and magnitude. It focuses on what was necessary and sufficient to create the effect, building a logically tight cause-and-effect diagram. This approach helps reveal how work is actually done rather than comparing reality to an imagined ideal.

This distinction parallels the gap between “work-as-imagined” (the black line) and “work-as-done” (the blue line). Too often, our investigations focus only on deviations from work-as-imagined without trying to understand why work-as-done developed differently.

A Tale of Two Analyses: The Power of Causal Reasoning

The white paper presents a compelling case study involving a propane release and operator injury that illustrates the difference between these two approaches. When initially analyzed through negative reasoning, investigators concluded the operator:

Used an improper tool
Deviated from good practice
Failed to recognize hazards
Failed to learn from past experiences

These conclusions placed blame squarely on the individual and led leadership to consider terminating the operator.

However, when the same incident was examined through causal reasoning, a different picture emerged:

The operator used the pipe wrench because it was available at the pump specifically for this purpose
The pipe wrench had been deliberately left at that location because operators knew the valve was hard to close
The operator acted quickly because he perceived a risk to the plant and colleagues
Leadership had actually endorsed this workaround four years earlier during a turnaround

This causally reasoned analysis revealed that what appeared to be an individual failure was actually a system-level issue that had been normalized over time. Rather than punishing the operator, leadership recognized their own role in creating the conditions for the incident and implemented systemic improvements.

This example reminded me of our discussions on barrier analysis, where we examine barriers that failed, weren’t used, or didn’t exist. But causal reasoning takes this further by exploring why those conditions existed in the first place, creating a much richer understanding of how work actually happens.

First 24 Hours: Where Causal Reasoning Meets The Golden Day

In my recent post on “The Golden Start to a Deviation Investigation,” I emphasized how critical the first 24 hours are after discovering a deviation. This initial window represents our best opportunity to capture accurate information and set the stage for a successful investigation. The Energy Safety Canada white paper complements this concept perfectly by providing guidance on how to use those critical hours effectively.

When we apply causal reasoning during these early stages, we focus on collecting specific, factual information about what actually occurred rather than immediately jumping to what should have happened. This means documenting events with specificity (time, place, magnitude) and avoiding premature judgments about deviations from procedures or expectations.

As I’ve previously noted, clear and precise problem definition forms the foundation of any effective investigation. Causal reasoning enhances this process by ensuring we document using specific, factual language that describes what occurred rather than what didn’t happen. This creates a much stronger foundation for the entire investigation.

Beyond Human Error: System Thinking and Leadership’s Role

One of the most persistent challenges in our field is the tendency to attribute events to “human error.” As I’ve discussed before, when human error is suspected or identified as the cause, this should be justified only after ensuring that process, procedural, or system-based errors have not been overlooked. The white paper reinforces this point, noting that human actions and decisions are influenced by the system in which people work.

In fact, the paper presents a hierarchy of causes that resonates strongly with systems thinking principles I’ve advocated for previously. Outcomes arise from physical mechanisms influenced by human actions and decisions, which are in turn governed by systemic factors. If we only address physical mechanisms or human behaviors without changing the system, performance will eventually migrate back to where it has always been.

This connects directly to what I’ve written about quality culture being fundamental to providing quality. The white paper emphasizes that leadership involvement is directly correlated with performance improvement. When leaders engage to set conditions and provide resources, they create an environment where investigations can reveal systemic issues rather than just identify procedural deviations or human errors.

Implementing Causal Reasoning in Pharmaceutical Quality

For pharmaceutical quality professionals looking to implement causal reasoning in their investigation processes, I recommend starting with these practical steps:

1. Develop Investigator Competencies

As I’ve discussed in my analysis of Sanofi’s FDA warning letter, having competent investigators is crucial. Organizations should:

Define required competencies for investigators
Provide comprehensive training on causal reasoning techniques
Implement mentoring programs for new investigators
Regularly assess and refresh investigator skills

2. Shift from Counterfactuals to Causal Statements

Review your recent investigations and look for counterfactual statements like “operators did not follow the procedure.” Replace these with causal statements that describe what actually happened and why it made sense to the people involved at the time.

3. Implement a Sponsor-Driven Approach

The white paper emphasizes the importance of investigation sponsors (otherwise known as Area Managers) who set clear conditions and expectations. This aligns perfectly with my belief that quality culture requires alignment between top management behavior and quality system philosophy. Sponsors should:

Clearly define the purpose and intent of investigations
Specify that a causal reasoning orientation should be used
Provide resources and access needed to find data and translate it into causes
Remain engaged throughout the investigation process

Infographic capturing the 4 things a sponsor should do above

4. Use Structured Causal Analysis Tools

While the M-based frameworks I’ve discussed previously (4M, 5M, 6M) remain valuable for organizing contributing factors, they should be complemented with tools that support causal reasoning. The Cause-Consequence Analysis (CCA) I described in a recent post offers one such approach, combining elements of fault tree analysis and event tree analysis to provide a holistic view of risk scenarios.

From Understanding to Improvement

The Energy Safety Canada white paper’s emphasis on causal reasoning represents a valuable contribution to how we think about investigations across industries. For pharmaceutical quality professionals, this approach offers a way to move beyond compliance-focused investigations to truly understand how our systems operate and how to improve them.

As the authors note, “The capacity for an investigation to improve performance is dependent on the type of reasoning used by investigators”. By adopting causal reasoning, we can build investigations that reveal how work actually happens rather than simply identifying deviations from how we imagine it should happen.

This approach aligns perfectly with my long-standing belief that without a strong quality culture, people will not be ready to commit and involve themselves fully in building and supporting a robust quality management system. Causal reasoning creates the transparency and learning that form the foundation of such a culture.

I encourage quality professionals to download and read the full white paper, reflect on their current investigation practices, and consider how causal reasoning might enhance their approach to understanding and preventing deviations. The most important questions to consider are:

Do your investigation conclusions focus on what didn’t happen rather than what did?
How often do you identify “human error” without exploring the system conditions that made that error likely?
Are your leaders engaged as sponsors who set conditions for successful investigations?
What barriers exist in your organization that prevent learning from events?

As we continue to evolve our understanding of quality and safety, approaches like causal reasoning offer valuable tools for creating the transparency needed to navigate complexity and drive meaningful improvement.

Quality Review

What does a quality reviewer do?

Maintaining high-quality products is paramount, and a critical component of ensuring quality is implementing a robust review of work by a second or third person, a peer review, and/or quality review—also known as a work product review process. Like many tools, it can be underutilized. It also gets to the heart of the question of Quality Unit oversight.

Introduction to Work Product Review

Work product review systematically evaluates the output from various processes or tasks to ensure they meet predefined quality standards. This review is crucial in environments where the quality of the final product directly impacts safety and efficacy, such as in pharmaceutical manufacturing. Work product review aims to identify any deviations or defects early in the process, allowing for timely corrections and minimizing the risk of non-compliance with regulatory requirements.

Criteria for Work Product Review

To ensure that work product reviews are effective, several key criteria should be established:

Integration with Quality Management Systems: Integrate risk-based thinking into the quality management system to ensure that work product reviews are aligned with overall quality objectives. This involves regularly reviewing and updating risk assessments to reflect changes in processes or new information.
Clear Objectives: The review should have well-defined objectives that align with the process they exist within and regulatory requirements. For instance, in pharmaceutical manufacturing, these objectives might include ensuring that all documentation is accurate and complete and that manufacturing processes adhere to GMP standards.
Risk-Based: Apply work product reviews to areas identified as high-risk during the risk assessment. This ensures that resources are allocated efficiently, focusing on processes that have the greatest potential impact on quality.
Standardized Procedures: Standardized procedures should be established for conducting the review. These procedures should outline the steps involved, the reviewers’ roles and responsibilities, and the criteria for accepting or rejecting the work product.
Trained Reviewers: Reviewers should be adequately trained and competent in the subject matter. This means understanding not just the deliverable being reviewed but the regulatory framework it sits within and how it applies to the specific work products being reviewed in a GMP environment.
Documentation: All reviews should be thoroughly documented. This documentation should include the review’s results, any findings or issues identified, and actions taken to address these issues.
Feedback Loop: There should be a mechanism for feedback from the review process to improve future work products. This could involve revising procedures or providing additional training to personnel.

Bridging the Gap Between Work-as-Imagined, Work-as-Prescribed, and Work-as-Done

Work product review is a systematic process that evaluates the output from various tasks to ensure they meet predefined quality standards connecting to work-as-imagined, work-as-prescribed, and work-as-done. Work product review serves as a bridge between these concepts by systematically evaluating the output of work processes. Here’s how it connects:

Alignment with Work-as-Prescribed: Work product review ensures that outputs comply with established standards and procedures (work-as-prescribed), helping to maintain regulatory compliance and quality standards.
Insight into Work-as-Done: Through the review process, organizations gain insight into how work is actually being performed (work-as-done). This helps identify any deviations from prescribed procedures and allows for adjustments to improve alignment between work-as-prescribed and work-as-done.
Closing the Gap with Work-as-Imagined: By documenting and addressing discrepancies between work-as-imagined and work-as-done, work product review facilitates communication and feedback that can refine policies and procedures. This helps to bring work-as-imagined closer to the realities of work-as-done, improving the effectiveness of quality oversight.

Work product review is essential for ensuring that the quality of work outputs aligns with both prescribed standards and the realities of how work is actually performed. By bridging the gaps between work-as-imagined, work-as-prescribed, and work-as-done, organizations can enhance their quality management systems and maintain high standards of quality, safety and efficacy.

Aligning to the Role of Quality Unit Oversight

While work product review does not guarantee Quality Unit Oversight, it is a potential control to ensure this oversight.

In the pharmaceutical industry, the Quality Unit plays a pivotal role in ensuring drug products’ safety, efficacy, and quality. It oversees all quality-related aspects, from raw material selection to final product release. However, the Quality Unit must be enabled appropriately and structured within the organization to effectively exercise its authority and fulfill its responsibilities. This blog post explores what it means for a Quality Unit to have the necessary authority and how insufficient implementation of its responsibilities can impact pharmaceutical manufacturing.

Responsibilities of the Quality Unit

Establishing and Maintaining the Quality System: The Quality Unit must set up and continuously update the quality management system to ensure compliance with GxPs and industry best practices.

Auditing and Compliance: Conduct internal audits to ensure adherence to policies and procedures, and report quality system performance metrics.

Approving and Rejecting Components and Products: The Quality Unit has the authority to approve or reject components, drug products, and packaging materials based on quality standards.

Investigating Nonconformities: Ensuring thorough investigations into production errors, discrepancies, and complaints related to product quality.

Keeping Management Informed: Reporting on product, process, and system risks, as well as outcomes of regulatory inspections.

What It Means for a Quality Unit to Be Enabled

For a Quality Unit to be effectively enabled, it must have:

Independence: The Quality Unit should operate independently of production units to avoid conflicts of interest and ensure unbiased decision-making.
Authority: It must have the authority to approve or reject the work product without undue influence from other departments.
Resources: Adequate personnel are essential for conducting the quality unit functions.
Documentation and Procedures: Clear, documented procedures outlining responsibilities and processes are crucial for maintaining consistency and compliance.

Insufficient Implementation of Responsibilities

When a Quality Unit insufficiently implements its responsibilities, it can lead to significant issues, including:

Regulatory Noncompliance: Failure to adhere to GxPs and regulatory standards can result in regulatory action.
Product Quality Issues: Inadequate oversight can lead to the release of substandard products, posing risks to patient safety and public health.
Lack of Continuous Improvement: Without effective quality systems in place, opportunities for process improvements and innovation may be missed.

The Quality Unit is the backbone of pharmaceutical manufacturing, ensuring that products meet the highest standards of quality and safety. By understanding the Quality Unit’s responsibilities and ensuring it has the necessary authority and resources, pharmaceutical companies can maintain compliance, protect public health, and foster a culture of continuous improvement. Inadequate implementation of these responsibilities can have severe consequences, emphasizing the importance of a well-structured and empowered Quality Unit.

By understanding these responsibilities, we can take a risk-based approach to applying quality review.

When to Apply Quality Review as Work Product Review

Work product review by Quality should be applied at critical stages to guarantee critical-to-quality attributes, including adherence to the regulations. This should be a risk-based approach. As such, it should be identified as controls in a living risks assessment and adjusted (add more, remove where unnecessary) as appropriate.

Closely scrutinize the responsibilities of the Quality Unit in the regulations to ensure all are met.

Best Practices in Quality Review

Rubrics are a great way to standardize quality reviews. If it is important enough to require a work review, it is important enough to standardize. The process owner should develop and maintain these rubrics with an appropriate group of stakeholder custodians. This is a key part of knowledge management. Having this cross-functional perspective on the output and what quality looks like is critical. This rubric should include:

Definition of prescribed work and the intended output that is being reviewed
Potential outcomes related to critical attributes, including definitions of technical accuracy
Methods and techniques used to generate the outcome
Operating experience and lessons learned
Risks, hazards, and user-centered design considerations
Requirements, standards, and code compliance
Planning, oversight, and acceptance testing
Input data and sources
Assumptions
Documentation required
Reviews and approvals required
Program or procedural obstacles to desired performance
Surprise situations, for example, unanticipated risk factors, schedule or scope changes, and organizational issues
Engineering human performance tool(s) applicable to activities being reviewed.

The rubric should have an assessment component, and that assessment should feed back into the originator’s qualified state.

Work product reviews must be early enough to allow feedback into the normal work for repetitive tasks. This should lead to gates in processes, quality-on-the-floor, or better-trained supervisors performing better and more effective reviews. This feedback should always be to the responsible person – the originator—and should be, wherever possible, face-to-face feedback to resolve the particular issues identified. This dialogue is critical.

Conclusion

Work product review is a powerful tool for enhancing quality oversight. By aligning this process with the responsibilities of the Quality Unit and implementing best practices such as standardized rubrics and a risk-based approach, companies can ensure that their products meet the highest standards of quality and safety. Effective work product review not only supports regulatory compliance but also fosters a culture of continuous improvement, which is essential for maintaining excellence in the pharmaceutical industry.

Types of Work, an Explainer

The concepts of work-as-imagined, work-as-prescribed, work-as-done, work-as-disclosed, and work-as-reported have been discussed and developed primarily within the field of human factors and ergonomics. These concepts have been elaborated by various experts, including Steven Shorrock, who has written extensively on the topic and I cannot recommend enough.

Work-as-Imagined: This concept refers to how people think work should be done or imagine it is done. It is often used by policymakers, regulators, and managers who design work processes without direct involvement in the actual work.
Work-as-Prescribed: This involves the formalization of work through rules, procedures, and guidelines. It is how work is officially supposed to be done, often documented in organizational standards.
Work-as-Done: This represents the reality of how work is actually performed in practice, including the adaptations and adjustments made by workers to meet real-world demands.
Work-as-Disclosed: Also known as work-as-reported or work-as-explained, this is how people describe or report their work, which may differ from both work-as-prescribed and work-as-done due to various factors, including safety and organizational culture[3][4].
Work-as-Reported: This term is often used interchangeably with work-as-disclosed and refers to the accounts of work provided by workers, which may be influenced by what they believe should be communicated to others.
Work-as-Measured: The quantifiable aspects of work that are tracked and assessed, often focusing on performance metrics and outcomes

Aspect	Work-as-Done	Work-as-Imagined	Work-as-Instructed	Work-as-Prescribed	Work-as-Reported	Work-as-Measured
Definition	Actual activities performed in the workplace.	How work is thought to be done, based on assumptions and expectation.	Direct instructions given to workers on task performance.	Formalized work according to rules, policies, and procedures.	Description of work as shared verbally or in writing.	Quantitative assessment of work performance.
Purpose	Achieve objectives in real-world conditions, adapting as necessary.	Conceptual understanding and planning of work.	Ensure tasks are performed correctly and efficiently.	Standardize and control work for compliance and safety.	Communicate work processes and outcomes.	Evaluate work efficiency and effectiveness.
Characteristics	Adaptive, context-dependent, often involves improvisation.	Based on assumptions, may not align with reality.	Clear, direct, and often specific to tasks.	Detailed, formal, assumed to be the correct way to work.	May not fully reflect reality, influenced by audience and context.	Objective, based on metrics and data.

Aspect	Work-as-Measured	Work-as-Judged
Definition	Quantification or classification of aspects of work.	Evaluation or assessment of work based on criteria or standards.
Purpose	To assess, understand, and evaluate work performance using metrics and data.	To form opinions or make decisions about work quality or effectiveness.
Characteristics	Objective and subjective measures, often numerical; can lack stability and validity.	Subjective, influenced by personal biases, experiences, and expectations.
Agency	Conducted by supervisors, managers, or specialists in various fields.	Performed by individuals or groups with authority to evaluate work performance.
Granularity	Can range from coarse (e.g., overall productivity) to fine (e.g., specific actions).	Typically broader, considering overall performance rather than specific details.
Influence	Affected by technological, social, and regulatory contexts.	Affected by preconceived notions and potential biases.