The Hidden Contamination Hazards: What the Catalent Warning Letter Reveals About Systemic Aseptic Processing Failures

The November 2025 FDA Warning Letter to Catalent Indiana, LLC reads like an autopsy report: a detailed dissection of how contamination hazards aren’t discovered but rather engineered into aseptic operations. They arise from a constellation of decisions that individually appear defensible yet collectively create what I’ve previously termed the “zemblanity field” in pharmaceutical quality. Section 2, addressing failures under 21 CFR 211.113(b), exposes contamination hazards that emerged not from random misfortune but from deliberate choices about decontamination strategies, sampling methodologies, intervention protocols, and investigation rigor.

What makes this warning letter particularly instructive isn’t the presence of contamination events—every aseptic facility battles microbial ingress—but rather the systematic architectural failures that allowed contamination hazards to persist unrecognized, uninvestigated, and unmitigated despite multiple warning signals spanning more than 20 deviations and customer complaints. The FDA’s critique centers on three interconnected contamination hazard categories: VHP decontamination failures involving occluded surfaces, inadequate environmental monitoring methods that substituted convenience for detection capability, and intervention risk assessments that ignored documented contamination routes.

For those of us responsible for contamination control in aseptic manufacturing, this warning letter demands we ask uncomfortable questions: How many of our VHP cycles are validated against surfaces that remain functionally occluded? How often have we chosen contact plates over swabs because they’re faster, not because they’re more effective? When was the last time we terminated a media fill and treated it with the investigative rigor of a batch contamination event?

The Occluded Surface Problem: When Decontamination Becomes Theatre

The FDA’s identification of occluded surfaces as contamination sources during VHP decontamination represents a failure mode I’ve observed with troubling frequency across aseptic facilities. The fundamental physics are unambiguous: vaporized hydrogen peroxide achieves sporicidal efficacy through direct surface contact at validated concentration-time profiles. Any surface the vapor doesn’t contact—or contacts at insufficient concentration—remains a potential contamination reservoir regardless of cycle completion indicators showing “successful” decontamination.​

The Catalent situation involved two distinct occluded surface scenarios, each revealing different architectural failures in contamination hazard assessment. The first involved equipment surfaces that were occluded during VHP decontamination and subsequently became contamination sources during atypical interventions involving equipment changes. The FDA noted that “the most probable root cause” of an environmental monitoring failure was equipment surfaces occluded during VHP decontamination, with contamination occurring during execution of an atypical intervention involving changes to components integral to stopper seating.

This finding exposes a conceptual error I frequently encounter: treating VHP decontamination as a universal solution that overcomes design deficiencies rather than as a validated process with specific performance boundaries. The Catalent facility’s own risk assessments advised against interventions that could disturb potentially occluded surfaces, yet these interventions continued—creating the precise contamination pathway their risk assessments identified as unacceptable.​

The second occluded surface scenario involved wrapped components within the filling line where insufficient VHP exposure allowed potential contamination. The FDA cited “occluded surfaces on wrapped [components] within the [equipment] as the potential cause of contamination”. This represents a validation failure: if wrapping materials prevent adequate VHP penetration, either the wrapping must be eliminated, the decontamination method must change, or these surfaces must be treated through alternative validated processes.​

The literature on VHP decontamination is explicit about occluded surface risks. As Sandle notes, surfaces must be “designed and installed so that operations, maintenance, and repairs can be performed outside the cleanroom” and where unavoidable, “all surfaces needing decontaminated” must be explicitly identified. The PIC/S guidance is similarly unambiguous: “Continuously occluded surfaces do not qualify for such trials as they cannot be exposed to the process and should have been eliminated”. Yet facilities continue to validate VHP cycles that demonstrate biological indicator kill on readily accessible flat coupons while ignoring the complex geometries, wrapped items, and recessed surfaces actually present in their filling environments.

What does a robust approach to occluded surface assessment look like? Based on the regulatory expectations and technical literature, facilities should:

Conduct comprehensive occluded surface mapping during design qualification. Every component introduced into VHP-decontaminated spaces must undergo geometric analysis to identify surfaces that may not receive adequate vapor exposure. This includes crevices, threaded connections, wrapped items, hollow spaces, and any surface shadowed by another object. The mapping should document not just that surfaces exist but their accessibility to vapor flow based on the specific VHP distribution characteristics of the equipment.​

Validate VHP distribution using chemical and biological indicators placed on identified occluded surfaces. Flat coupon placement on readily accessible horizontal surfaces tells you nothing about vapor penetration into wrapped components or recessed geometries. Biological indicators should be positioned specifically where vapor exposure is questionable—inside wrapped items, within threaded connections, under equipment flanges, in dead-legs of transfer lines. If biological indicators in these locations don’t achieve the validated log reduction, the surfaces are occluded and require design modification or alternative decontamination methods.​

Establish clear intervention protocols that distinguish between “sterile-to-sterile” and “potentially contaminated” surface contact. The Catalent finding reveals that atypical interventions involving equipment changes exposed the Grade A environment to surfaces not reliably exposed to VHP. Intervention risk assessments must explicitly categorize whether the intervention involves only VHP-validated surfaces or introduces components from potentially occluded areas. The latter category demands heightened controls: localized Grade A air protection, pre-intervention surface swabbing and disinfection, real-time environmental monitoring during the intervention, and post-intervention investigation if environmental monitoring shows any deviation.​

Implement post-decontamination surface monitoring that targets historically occluded locations. If your facility has identified occluded surfaces that cannot be designed out, these become critical sampling locations for post-VHP environmental monitoring. Trending of these specific locations provides early detection of decontamination effectiveness degradation before contamination reaches product-contact surfaces.
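The mapping and validation steps above can be sketched as a simple classification exercise. This is a minimal illustration under stated assumptions, not a validated method: the location names, the three-replicate requirement, and the growth/no-growth results are all hypothetical, and a real program would define its own acceptance criteria.

```python
from dataclasses import dataclass


@dataclass
class BiResult:
    location: str  # hypothetical name of a mapped surface
    growth: bool   # True if the biological indicator grew after the cycle


def classify_locations(results, required_replicates=3):
    """Label each location from its biological indicator (BI) results.

    Assumed rule: a location is "reliably exposed" only when it has at least
    `required_replicates` BI results and none showed growth. Everything else
    is treated as "occluded" until proven otherwise.
    """
    growth_by_location = {}
    for result in results:
        growth_by_location.setdefault(result.location, []).append(result.growth)

    status = {}
    for location, growths in growth_by_location.items():
        enough_data = len(growths) >= required_replicates
        no_survivors = not any(growths)
        status[location] = (
            "reliably exposed" if enough_data and no_survivors else "occluded"
        )
    return status


# Hypothetical data from three validation cycles.
results = [
    BiResult("isolator wall", False),
    BiResult("isolator wall", False),
    BiResult("isolator wall", False),
    BiResult("inside wrapped component", False),
    BiResult("inside wrapped component", True),
    BiResult("inside wrapped component", False),
    BiResult("under equipment flange", False),
    BiResult("under equipment flange", False),  # only two replicates placed
]

status = classify_locations(results)
for location, label in status.items():
    print(f"{location}: {label}")
```

The design choice worth noting is the default: a location with incomplete data is treated as occluded rather than assumed acceptable, which mirrors the position that a surface is only "decontaminated" once exposure has been demonstrated.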

The FDA’s remediation demand is appropriately comprehensive: “a review of VHP exposure to decontamination methods as well as permitted interventions, including a retrospective historical review of routine interventions and atypical interventions to determine their risks, a comprehensive identification of locations that are not reliably exposed to VHP decontamination (i.e., occluded surfaces), your plan to reduce occluded surfaces where feasible, review of currently permitted interventions and elimination of high-risk interventions entailing equipment manipulations during production campaigns that expose the ISO 5 environment to surfaces not exposed to a validated decontamination process, and redesign of any intervention that poses an unacceptable contamination risk”.​

This remediation framework represents best practice for any aseptic facility using VHP decontamination. The occluded surface problem isn’t limited to Catalent—it’s an industry-wide vulnerability wherever VHP validation focuses on demonstrating sporicidal activity under ideal conditions rather than confirming adequate vapor contact across all surfaces within the validated space.

Contact Plates Versus Swabs: The Detection Capability Trade-Off

The FDA’s critique of Catalent’s environmental monitoring methodology exposes a decision I’ve challenged repeatedly throughout my career: the use of contact plates for sampling irregular, product-contact surfaces in Grade A environments. The technical limitations are well-established, yet contact plates persist because they’re faster and operationally simpler—prioritizing workflow convenience over contamination detection capability.

The specific Catalent deficiency involved sampling filling line components using “contact plate, sampling [surfaces] with one sweeping sampling motion.” The FDA identified two fundamental inadequacies: “With this method, you are unable to attribute contamination events to specific [locations]” and “your firm’s use of contact plates is not as effective as using swab methods”. These limitations aren’t novel discoveries—they’re inherent to contact plate methodology and have been documented in the microbiological literature for decades.​

Contact plates—rigid agar surfaces pressed against the area to be sampled—were designed for flat, smooth surfaces where complete agar-to-surface contact can be achieved with uniform pressure. They perform adequately on stainless steel benchtops, isolator walls, and other horizontal surfaces. But filling line components—particularly those identified in the warning letter—present complex geometries: curved surfaces, corners, recesses, and irregular topographies where rigid agar cannot conform to achieve complete surface contact.

The microbial recovery implications are significant. When a contact plate fails to achieve complete surface contact, microorganisms in uncontacted areas remain unsampled. The result is a false-negative environmental monitoring reading that suggests contamination control while actual contamination persists undetected. Worse, the “sweeping sampling motion” described in the warning letter—moving a single contact plate across multiple locations—creates the additional problem the FDA identified: inability to attribute any recovered contamination to a specific surface. Was the contamination on the first component contacted? The third? Somewhere in between? This sampling approach provides data too imprecise for meaningful contamination source investigation.

The alternative—swab sampling—addresses both deficiencies. Swabs conform to irregular surfaces, accessing corners, recesses, and curved topographies that contact plates cannot reach. Swabs can be applied to specific, discrete locations, enabling precise attribution of any contamination recovered to a particular surface. The trade-off is operational: swab sampling requires more time, involves additional manipulative steps within Grade A environments, and demands different operator technique validation.​

Yet the Catalent warning letter makes clear that this operational inconvenience doesn’t justify compromised detection capability for critical product-contact surfaces. The FDA’s expectation—acknowledged in Catalent’s response—is swab sampling “to replace use of contact plates to sample irregular surfaces”. This represents a fundamental shift from convenience-optimized to detection-optimized environmental monitoring.​

What should a risk-based surface sampling strategy look like? The differentiation should be based on surface geometry and criticality:

Contact plates remain appropriate for flat, smooth, readily accessible surfaces where complete agar contact can be verified and where contamination risk is lower (Grade B floors, isolator walls, equipment external surfaces). The speed and simplicity advantages of contact plates justify their continued use in these applications.

Swab sampling should be mandatory for product-contact surfaces, irregular geometries, recessed areas, and any location where contact plate conformity is questionable. This includes filling needles, stopper bowls, vial transport mechanisms, crimping heads, and the specific equipment components cited in the Catalent letter. The additional time required for swab sampling is trivial compared to the contamination risk from inadequate monitoring.

Surface sampling protocols must specify the exact location sampled, not general equipment categories. Rather than “sample stopper bowl,” protocols should identify “internal rim of stopper bowl,” “external base of stopper bowl,” “stopper agitation mechanism interior surfaces.” This specificity enables contamination source attribution during investigations and ensures sampling actually reaches the highest-risk surfaces.

Swab technique must be validated to ensure consistent recovery from target surfaces. Simply switching from contact plates to swabs doesn’t guarantee improved detection unless swab technique—pressure applied, surface area contacted, swab saturation, transfer to growth media—is standardized and demonstrated to achieve adequate microbial recovery from the specific materials and geometries being sampled.​
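The geometry-and-criticality rule described above can be written down as a small decision function. The sketch below is illustrative only: the `Surface` fields and the example locations are hypothetical, and the rule is a simplified reading of the criteria in this section rather than a regulatory requirement.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    name: str              # discrete location, not a general equipment category
    flat_and_smooth: bool  # can a rigid agar plate make complete contact?
    product_contact: bool  # does the surface touch product or closures?
    recessed: bool = False


def choose_sampling_method(surface: Surface) -> str:
    """Return "swab" or "contact plate" for one discrete location.

    Assumed rule: anything product-contact, recessed, or not flat and
    smooth gets a swab; contact plates are kept for the remainder.
    """
    if surface.product_contact or surface.recessed or not surface.flat_and_smooth:
        return "swab"
    return "contact plate"


# Hypothetical sampling plan with one entry per discrete location.
plan = [
    Surface("isolator wall, left panel", flat_and_smooth=True, product_contact=False),
    Surface("internal rim of stopper bowl", flat_and_smooth=False, product_contact=True),
    Surface("external base of stopper bowl", flat_and_smooth=True, product_contact=False,
            recessed=True),
]

methods = {s.name: choose_sampling_method(s) for s in plan}
for name, method in methods.items():
    print(f"{name}: {method}")
```

Because each entry names one discrete location, any recovery can be attributed to a specific surface, which is the attribution the sweeping contact plate method could not provide.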

The EU GMP Annex 1 and FDA guidance documents emphasize detection capability over convenience in environmental monitoring. The expectation isn’t perfect contamination prevention—that’s impossible in aseptic processing—but rather monitoring systems sensitive enough to detect contamination events when they occur, enabling investigation and corrective action before product impact. Contact plates on irregular surfaces fail this standard by design, not because of operator error or inadequate validation but because the fundamental methodology cannot access the surfaces requiring monitoring.​

The Intervention Paradox: When Risk Assessments Identify Hazards But Operations Ignore Them

Perhaps the most troubling element of the Catalent contamination hazards section isn’t the presence of occluded surfaces or inadequate sampling methods but rather the intervention management failure that reveals a disconnect between risk assessment and operational decision-making. Catalent’s risk assessments explicitly “advised against interventions that can disturb potentially occluded surfaces,” yet these high-risk interventions continued during production campaigns.​

This represents what I’ve termed “investigation theatre” in previous posts—creating the superficial appearance of risk-based decision-making while actual operations proceed according to production convenience rather than contamination risk mitigation. The risk assessment identified the hazard. The environmental monitoring data confirmed the hazard when contamination occurred during the intervention. Yet the intervention continued as an accepted operational practice.​

The specific intervention involved equipment changes to components “integral to stopper seating in the [filling line]”. These components operate at the critical interface between the sterile stopper and the vial—precisely the location where any contamination poses direct product impact risk. The intervention occurred during production campaigns rather than between campaigns when comprehensive decontamination and validation could occur. The intervention involved surfaces potentially occluded during VHP decontamination, meaning their microbiological state was unknown when introduced into the Grade A filling environment.​

Every element of this scenario screams “unacceptable contamination risk,” yet it persisted as accepted practice until FDA inspection. How does this happen? Based on my experience across multiple aseptic facilities, the failure mode follows a predictable pattern:

Production scheduling drives intervention timing rather than contamination risk assessment. Stopping a campaign for equipment maintenance creates schedule disruption, yield loss, and capacity constraints. The pressure to maintain campaign continuity overwhelms contamination risk considerations that appear theoretical compared to the immediate, quantifiable production impact.

Risk assessments become compliance artifacts disconnected from operational decision-making. The quality unit conducts a risk assessment, documents that certain interventions pose unacceptable contamination risk, and files the assessment. But when production encounters the situation requiring that intervention, the actual decision-making process references production need, equipment availability, and batch schedules—not the risk assessment that identified the intervention as high-risk.

Interventions become “normalized deviance”: accepted operational practices despite documented risks. After a high-risk intervention has been performed successfully (meaning without detected contamination) multiple times, it transitions from “high-risk intervention requiring exceptional controls” to “routine intervention” in operational thinking. The fact that exceptional controls kept contamination from occurring gets inverted into evidence that the intervention isn’t actually high-risk.

Environmental monitoring provides false assurance when contamination goes undetected. If a high-risk intervention occurs and subsequent environmental monitoring shows no contamination, operations interprets this as validation that the intervention is acceptable. But as discussed in the contact plate section, inadequate sampling methodology may fail to detect contamination that actually occurred. The absence of detected contamination becomes “proof” that contamination didn’t occur, reinforcing the normalization of high-risk interventions.

The EU GMP Annex 1 requirements for intervention management represent regulatory recognition of these failure modes. Annex 1 Section 8.16 requires “the list of interventions evaluated via risk analysis” and Section 9.36 requires that aseptic process simulations include “interventions and associated risks”. The framework is explicit: identify interventions, assess their contamination risk, validate that operators can perform them aseptically through media fills, and eliminate interventions that cannot be performed without unacceptable contamination risk.​

What does robust intervention risk management look like in practice?

Categorize interventions by contamination risk based on specific, documented criteria. The categorization should consider: surfaces contacted (sterile-to-sterile vs. potentially contaminated), duration of exposure, proximity to open product, operator actions required, first air protection feasibility, and frequency. This creates a risk hierarchy that enables differentiated control strategies rather than treating all interventions equivalently.​

Establish clear decision authorities for different intervention risk levels. Routine interventions (low contamination risk, validated through media fills, performed regularly) can proceed under operator judgment following standard procedures. High-risk interventions (those involving occluded surfaces, extended exposure, or proximity to open product) should require quality unit pre-approval including documented risk assessment and enhanced controls specification. Interventions identified as posing unacceptable risk should be prohibited until equipment redesign or process modification eliminates the contamination hazard.​

Validate intervention execution through media fills that specifically simulate the intervention’s contamination challenges. Generic media fills demonstrating overall aseptic processing capability don’t validate specific high-risk interventions. If your risk assessment identifies a particular intervention as posing contamination risk, your media fill program must include that intervention, performed by the operators who will execute it, under the conditions (campaign timing, equipment state, environmental conditions) where it will actually occur.​

Implement intervention-specific environmental monitoring that targets the contamination pathways identified in risk assessments. If the risk assessment identifies that an intervention may expose product to surfaces not reliably decontaminated, environmental monitoring immediately following that intervention should specifically sample those surfaces and adjacent areas. Trending this intervention-specific monitoring data separately from routine environmental monitoring enables detection of intervention-associated contamination patterns.​

Conduct post-intervention investigations when environmental monitoring shows any deviation. The Catalent warning letter describes an environmental monitoring failure whose “most probable root cause” was an atypical intervention involving equipment changes. This temporal association between intervention and contamination should trigger automatic investigation even if environmental monitoring results remain within action levels. The investigation should assess whether intervention protocols require modification or whether the intervention should be eliminated.​
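The categorization and decision-authority steps above can be sketched as a lookup from intervention attributes to a risk level and an approver. This is a simplified illustration: the field names, the three risk levels, and the specific rules are assumptions drawn loosely from the criteria in this section and from the FDA's remediation language, not a complete risk model.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    ROUTINE = "routine"
    HIGH = "high"
    UNACCEPTABLE = "unacceptable"


@dataclass
class Intervention:
    name: str
    touches_unvalidated_surface: bool  # surface not reliably exposed to VHP
    during_campaign: bool
    near_open_product: bool
    qualified_in_media_fill: bool


def categorize(i: Intervention) -> Risk:
    """Assign a risk level using assumed, simplified criteria."""
    # Exposing the Grade A zone to non-decontaminated surfaces mid-campaign
    # is the scenario the FDA asked to have eliminated.
    if i.touches_unvalidated_surface and i.during_campaign:
        return Risk.UNACCEPTABLE
    if (i.touches_unvalidated_surface or i.near_open_product
            or not i.qualified_in_media_fill):
        return Risk.HIGH
    return Risk.ROUTINE


DECISION_AUTHORITY = {
    Risk.ROUTINE: "operator, following the standard procedure",
    Risk.HIGH: "quality unit pre-approval with enhanced controls",
    Risk.UNACCEPTABLE: "prohibited until the equipment or process is redesigned",
}

# Hypothetical example modeled on the scenario described in the letter.
change_part = Intervention(
    name="change of stopper seating component",
    touches_unvalidated_surface=True,
    during_campaign=True,
    near_open_product=True,
    qualified_in_media_fill=False,
)
risk = categorize(change_part)
print(f"{change_part.name}: {risk.value} -> {DECISION_AUTHORITY[risk]}")
```

Encoding the decision authority next to the risk level is one way to keep the risk assessment from becoming a filed artifact: the same table that rates the intervention also states who may authorize it.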

The FDA’s remediation demand addresses this gap directly: “review of currently permitted interventions and elimination of high-risk interventions entailing equipment manipulations during production campaigns that expose the ISO 5 environment to surfaces not exposed to a validated decontamination process”. This requirement forces facilities to confront the intervention paradox: if your risk assessment identifies an intervention as high-risk, you cannot simultaneously permit it as routine operational practice. Either modify the intervention to reduce risk, validate enhanced controls that mitigate the risk, or eliminate the intervention entirely.​

Media Fill Terminations: When Failures Become Invisible

The Catalent warning letter’s discussion of media fill terminations exposes an investigation failure mode that reveals deeper quality system inadequacies. Since November 2023, Catalent terminated more than five media fill batches representing the filling line. Following two terminations for stoppering issues and extrinsic particle contamination, the facility “failed to open a deviation or an investigation at the time of each failure, as required by your SOPs”.​

Read that again. Media fills—the fundamental aseptic processing validation tool, the simulation specifically designed to challenge contamination control—were terminated due to failures, and no deviation was opened, no investigation initiated. The failures simply disappeared from the quality system, becoming invisible until FDA inspection revealed their existence.

The rationalization is predictable: “there was no impact to the SISPQ (Safety, Identity, Strength, Purity, Quality) of the terminated media batches or to any customer batches” because “these media fills were re-executed successfully with passing results”. This reasoning exposes a fundamental misunderstanding of media fill purpose that I’ve encountered with troubling frequency across the industry.​

A media fill is not a “test” that you pass or fail with product consequences. It is a simulation: a deliberate challenge to your aseptic processing capability that uses growth medium instead of product, specifically so that contamination risks can be identified without product impact. When a media fill is terminated due to a processing failure, that termination is itself the critical finding. The termination reveals that your process is vulnerable to exactly the failure mode that caused termination: stoppering problems that could occur during commercial filling, extrinsic particles that could contaminate product.

The FDA’s response is appropriately uncompromising: “You do not provide the investigations with a root cause that justifies aborting and re-executing the media fills, nor do you provide the corrective actions taken for each terminated media fill to ensure effective CAPAs were promptly initiated”. The regulatory expectation is clear: media fill terminations require investigation identical in rigor to commercial batch failures. Why did the stoppering issue occur? What equipment, material, or operator factors contributed? How do we prevent recurrence? What commercial batches may have experienced similar failures that went undetected?​

The re-execution logic is particularly insidious. By immediately re-running the media fill and achieving passing results, Catalent created the appearance of successful validation while ignoring the process vulnerability revealed by the termination. The successful re-execution proved only that under ideal conditions—now with heightened operator awareness following the initial failure—the process could be executed successfully. It provided no assurance that commercial operations, without that heightened awareness and under the same conditions that caused the initial termination, wouldn’t experience identical failures.

What should media fill termination management look like?

Treat every media fill termination as a critical deviation requiring immediate investigation initiation. The investigation should identify the root cause of the termination, assess whether the failure mode could occur during commercial manufacturing, evaluate whether previous commercial batches may have experienced similar failures, and establish corrective actions that prevent recurrence. This investigation must occur before re-execution; re-execution is not a substitute for it.

Require quality unit approval before media fill re-execution. The approval should be based on documented investigation findings demonstrating that the termination cause is understood, corrective actions are implemented, and re-execution will validate process capability under conditions that include the corrective actions. Re-execution without investigation approval perpetuates the “keep running until we get a pass” mentality that defeats media fill purpose.​

Implement media fill termination trending as a critical quality indicator. A facility that has terminated “more than five media fill batches” on one filling line, as Catalent did after November 2023, should recognize this as a signal of fundamental process capability problems, not as a series of unrelated events requiring re-execution. Trending should identify common factors: specific operators, equipment states, intervention types, campaign timing.

Ensure deviation tracking systems cannot exclude media fill terminations. The Catalent situation arose partly because “you failed to initiate a deviation record to capture the lack of an investigation for each of the terminated media fills, resulting in an undercounting of the deviations”. Quality metrics that exclude media fill terminations from deviation totals create perverse incentives to avoid formal deviation documentation, rendering media fill findings invisible to quality system oversight.​
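The trending and deviation-tracking points above lend themselves to a small review routine. The sketch below is illustrative only: the record fields, batch identifiers, causes, and the signal threshold of three terminations are all hypothetical assumptions, not values from the warning letter or any guidance.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Termination:
    batch_id: str                # hypothetical media fill batch identifier
    cause: str                   # stated reason for aborting the run
    deviation_id: Optional[str]  # None means no deviation was opened


def review_terminations(terminations, signal_threshold=3):
    """Summarize terminated media fills for quality review.

    `signal_threshold` is an assumed trigger for escalating from
    individual events to a process capability review.
    """
    uninvestigated = [t.batch_id for t in terminations if t.deviation_id is None]
    cause_counts = Counter(t.cause for t in terminations)
    return {
        "uninvestigated": uninvestigated,
        "cause_counts": dict(cause_counts),
        "capability_signal": len(terminations) >= signal_threshold,
    }


# Hypothetical records.
history = [
    Termination("MF-001", "stoppering issue", None),
    Termination("MF-002", "extrinsic particle", None),
    Termination("MF-003", "stoppering issue", "DEV-1042"),
]

summary = review_terminations(history)
print(summary)
```

Counting terminations without a linked deviation makes the gap itself visible, so a missing investigation shows up as a finding instead of disappearing from the metrics.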

The broader issue extends beyond media fill terminations to how aseptic processing validation integrates with quality systems. Media fills should function as early warning indicators—detecting aseptic processing vulnerabilities before product impact occurs. But this detection value requires that findings from media fills drive investigations, corrective actions, and process improvements with the same rigor as commercial batch deviations. When media fill failures can be erased through re-execution without investigation, the entire validation framework becomes performative rather than protective.

The Stopper Supplier Qualification Failure: Accepting Contamination at the Source

The stopper contamination issues discussed throughout the warning letter—mammalian hair found in or around stopper regions of vials from nearly 20 batches across multiple products—reveal a supplier qualification and incoming inspection failure that compounds the contamination hazards already discussed. The FDA’s critique focuses on Catalent’s “inappropriate reliance on pre-shipment samples (tailgate samples)” and failure to implement “enhanced or comparative sampling of stoppers from your other suppliers”.​

The pre-shipment or “tailgate” sample approach represents a fundamental violation of GMP sampling principles. Under this approach, the stopper supplier—not Catalent—collected samples from lots prior to shipment and sent these samples directly to Catalent for quality testing. Catalent then made accept/reject decisions for incoming stopper lots based on testing of supplier-selected samples that never passed through Catalent’s receiving or storage processes.​

Why does this matter? Because representative sampling requires that samples be selected from the material population actually received by the facility, stored under facility conditions, and handled through facility processes. Supplier-selected pre-shipment samples bypass every opportunity to detect contamination introduced during shipping, storage transitions, or handling. They enable a supplier to selectively sample from cleaner portions of production lots while shipping potentially contaminated material in the same lot to the customer.

The FDA guidance on this issue is explicit and has been for decades: samples for quality attribute testing “are to be taken at your facility from containers after receipt to ensure they are representative of the components in question”. This isn’t a new expectation emerging from enhanced regulatory scrutiny—it’s a baseline GMP requirement that Catalent systematically violated through reliance on tailgate samples.​

But the tailgate sample issue represents only one element of broader supplier qualification failures. The warning letter notes that “while stoppers from [one supplier] were the primary source of extrinsic particles, they were not the only source of foreign matter.” Yet Catalent implemented a “limited, enhanced sampling strategy for one of your suppliers” while failing to “increase sampling oversight” for other suppliers. This selective enhancement, which focused remediation on the most problematic supplier while ignoring systemic contamination risks across the stopper supply base, predictably failed to resolve ongoing contamination issues.

What should stopper supplier qualification and incoming inspection look like for aseptic filling operations?

Eliminate pre-shipment or tailgate sampling entirely. All quality testing must be conducted on samples taken from received lots, stored in facility conditions, and selected using documented random sampling procedures. If suppliers require pre-shipment testing for their internal quality release, that’s their process requirement—it doesn’t substitute for the purchaser’s independent incoming inspection using facility-sampled material.​

Implement risk-based incoming inspection that intensifies sampling when contamination history indicates elevated risk. The warning letter notes that Catalent recognized stoppers as “a possible contributing factor for contamination with mammalian hairs” in July 2024 but didn’t implement enhanced sampling until May 2025—a ten-month delay. The inspection enhancement should be automatic and immediate when contamination events implicate incoming materials. The sampling intensity should remain elevated until trending data demonstrates sustained contamination reduction across multiple lots.​

Apply visual inspection with reject criteria specific to the defect types that create product contamination risk. Generic visual inspection looking for general “defects” fails to detect the specific contamination types—embedded hair, extrinsic particles, material fragments—that create sterile product risks. Inspection protocols must specify mammalian hair, fiber contamination, and particulate matter as reject criteria with sensitivity adequate to detect single-particle contamination in sampled stoppers.​

Require supplier process changes—not just enhanced sampling—when contamination trends indicate process capability problems. The warning letter acknowledges Catalent “worked with your suppliers to reduce the likelihood of mammalian hair contamination events” but notes that despite these efforts, “you continued to receive complaints from customers who observed mammalian hair contamination in drug products they received from you”. Enhanced sampling detects contamination; it doesn’t prevent it. Suppliers demonstrating persistent contamination require process audits, environmental control improvements, and validated contamination reduction demonstrated through process capability studies—not just promises to improve quality.​

Implement finished product visual inspection with heightened sensitivity for products using stoppers from suppliers with contamination history. The FDA notes that Catalent indicated “future batches found during visual inspection of finished drug products would undergo a re-inspection followed by tightened acceptable quality limit to ensure defective units would be removed” but didn’t provide the re-inspection procedure. This two-stage inspection approach—initial inspection followed by re-inspection with enhanced criteria for lots from high-risk suppliers—provides additional contamination detection but must be validated to demonstrate adequate defect removal.​

The broader lesson extends beyond stoppers to supplier qualification for any component used in sterile manufacturing. Components introduce contamination risks—microbial bioburden, particulate matter, chemical residues—that cannot be fully mitigated through end-product testing. Supplier qualification must function as a contamination prevention tool, ensuring that materials entering aseptic operations meet microbiological and particulate quality standards appropriate for their role in maintaining sterility. Reliance on tailgate samples, delayed sampling enhancement, and acceptance of persistent supplier contamination all represent failures to recognize suppliers as critical contamination control points requiring rigorous qualification and oversight.

The Systemic Pattern: From Contamination Hazards to Quality System Architecture

Stepping back from individual contamination hazards—occluded surfaces, inadequate sampling, high-risk interventions, media fill terminations, supplier qualification failures—a systemic pattern emerges that connects this warning letter to the broader zemblanity framework I’ve explored in previous posts. These aren’t independent, unrelated deficiencies that coincidentally occurred at the same facility. They represent interconnected architectural failures in how the quality system approaches contamination control.​

The pattern reveals itself through three consistent characteristics:

Detection systems optimized for convenience rather than capability. Contact plates instead of swabs for irregular surfaces. Pre-shipment samples instead of facility-based incoming inspection. Generic visual inspection instead of defect-specific contamination screening. Each choice prioritizes operational ease and workflow efficiency over contamination detection sensitivity. The result is a quality system that generates reassuring data—passing environmental monitoring, acceptable incoming inspection results, successful visual inspection—while actual contamination persists undetected.

Risk assessments that identify hazards without preventing their occurrence. Catalent’s risk assessments advised against interventions disturbing potentially occluded surfaces, yet these interventions continued. The facility recognized stoppers as contamination sources in July 2024 but delayed enhanced sampling until May 2025. Media fill terminations revealed aseptic processing vulnerabilities but triggered re-execution rather than investigation. Risk identification became separated from risk mitigation—the assessment process functioned as compliance theatre rather than decision-making input.​

Investigation systems that erase failures rather than learn from them. Media fill terminations occurred without deviation initiation. Mammalian hair contamination events were investigated individually without recognizing the trend across 20+ deviations. Root cause investigations concluded “no product impact” based on passing sterility tests rather than addressing the contamination source enabling future events. The investigation framework optimized for batch release justification rather than contamination prevention.​
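
The trending gap in this third pattern is the simplest of the three to illustrate. The sketch below assumes deviations are available as records with a category and an opening date; the field names, the threshold of three, and the sample records are all hypothetical and are not data from the warning letter.

```python
from collections import Counter
from datetime import date


def recurring_categories(deviations, start, end, threshold=3):
    """Return {category: count} for categories opened at least
    `threshold` times between `start` and `end` (inclusive)."""
    counts = Counter(
        d["category"] for d in deviations if start <= d["opened"] <= end
    )
    return {category: n for category, n in counts.items() if n >= threshold}


# Illustrative records only.
deviations = [
    {"id": "DEV-001", "category": "mammalian hair", "opened": date(2024, 7, 2)},
    {"id": "DEV-002", "category": "glass particle", "opened": date(2024, 9, 15)},
    {"id": "DEV-003", "category": "mammalian hair", "opened": date(2024, 11, 20)},
    {"id": "DEV-004", "category": "mammalian hair", "opened": date(2025, 2, 8)},
    {"id": "DEV-005", "category": "mammalian hair", "opened": date(2023, 1, 5)},
]

flagged = recurring_categories(deviations, date(2024, 7, 1), date(2025, 6, 30))
```

With these records, flagged contains only the "mammalian hair" category with a count of 3. A review that closes each deviation individually never computes that number, which is how a trend across many events stays invisible.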

These patterns don’t emerge from incompetent quality professionals or inadequate resource allocation. They emerge from quality system design choices that prioritize production efficiency, workflow continuity, and batch release over contamination detection, investigation rigor, and source elimination. The system delivers what it was designed to deliver: maximum throughput with minimum disruption. It fails to deliver what patients require: contamination control capable of detecting and eliminating sterility risks before product impact.

Recommendations: Building Contamination Hazard Detection Into System Architecture

What does effective contamination hazard management look like at the quality system architecture level? Based on the Catalent failures and broader industry patterns, several principles should guide aseptic operations:

Design decontamination validation around worst-case geometries, not ideal conditions. VHP validation using flat coupons on horizontal surfaces tells you nothing about vapor penetration into the complex geometries, wrapped components, and recessed surfaces actually present in your filling line. Biological indicator placement should target occluded surfaces specifically—if you can’t achieve validated kill on these locations, they’re contamination hazards requiring design modification or alternative decontamination methods.

Select environmental monitoring methods based on detection capability for the surfaces and conditions actually requiring monitoring. Contact plates are adequate for flat, smooth surfaces. They’re inadequate for irregular product-contact surfaces, recessed areas, and complex geometries. Swab sampling takes more time but provides contamination detection capability that contact plates cannot match. The operational convenience sacrifice is trivial compared to the contamination risk from monitoring methods incapable of detecting contamination when it occurs.​

Establish intervention risk classification with decision authorities proportional to contamination risk. Routine low-risk interventions validated through media fills can proceed under operator judgment. High-risk interventions—those involving occluded surfaces, extended exposure, or proximity to open product—require quality unit pre-approval with documented enhanced controls. Interventions identified as posing unacceptable risk should be prohibited pending equipment redesign.​
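
One way to make this classification explicit is a decision function that maps intervention attributes to the required authority. The sketch below is illustrative: the attribute names and the rule that unvalidated interventions default to quality review are assumptions added for the example, not language from the warning letter.

```python
from dataclasses import dataclass
from enum import Enum


class Authority(Enum):
    OPERATOR = "operator judgment"
    QUALITY_PREAPPROVAL = "quality unit pre-approval with enhanced controls"
    PROHIBITED = "prohibited pending equipment redesign"


@dataclass(frozen=True)
class Intervention:
    name: str
    validated_in_media_fill: bool
    disturbs_occluded_surface: bool
    extended_exposure: bool
    near_open_product: bool
    risk_unacceptable: bool = False  # outcome of the site's own risk assessment


def decision_authority(i: Intervention) -> Authority:
    """Return who must authorize the intervention before it is performed."""
    if i.risk_unacceptable:
        return Authority.PROHIBITED
    if i.disturbs_occluded_surface or i.extended_exposure or i.near_open_product:
        return Authority.QUALITY_PREAPPROVAL
    if i.validated_in_media_fill:
        return Authority.OPERATOR
    # Assumption: anything not validated in a media fill goes to quality review.
    return Authority.QUALITY_PREAPPROVAL
```

Encoding the rule this way forces the risk assessment output to be a direct input to the floor-level decision, rather than a document that sits beside it.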

Treat media fill terminations as critical deviations requiring investigation before re-execution. The termination reveals process vulnerability—the investigation must identify root cause, assess commercial batch risk, and establish corrective actions before validation continues. Re-execution without investigation perpetuates the failures that caused termination.​
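
The same gating idea can be written as a simple pre-condition check. In this hedged sketch, the record fields and blocker messages are hypothetical names chosen for illustration; a real implementation would live in the site's deviation or validation management system.

```python
from dataclasses import dataclass


@dataclass
class MediaFillTermination:
    run_id: str
    deviation_opened: bool = False
    root_cause_documented: bool = False
    commercial_batch_risk_assessed: bool = False
    corrective_actions_defined: bool = False


def blockers_to_re_execution(t: MediaFillTermination) -> list:
    """List the open items that must be closed before the media fill is re-run."""
    checks = {
        "deviation not opened": t.deviation_opened,
        "root cause not documented": t.root_cause_documented,
        "commercial batch risk not assessed": t.commercial_batch_risk_assessed,
        "corrective actions not defined": t.corrective_actions_defined,
    }
    return [reason for reason, done in checks.items() if not done]
```

Re-execution is permitted only when the returned list is empty, so a terminated run cannot be quietly repeated without an investigation record behind it.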

Implement supplier qualification with facility-based sampling, contamination-specific inspection criteria, and automatic sampling enhancement when contamination trends emerge. Tailgate samples cannot provide representative material assessment. Visual inspection must target the specific contamination types—mammalian hair, particulate matter, material fragments—that create product risks. Enhanced sampling should be automatic and sustained when contamination history indicates elevated risk.​

Build investigation systems that learn from contamination events rather than erasing them through re-execution or “no product impact” conclusions. Contamination events represent failures in contamination control regardless of whether subsequent testing shows product remains within specification. The investigation purpose is preventing recurrence, not justifying release.​

The FDA’s comprehensive remediation demands represent what quality system architecture should look like: independent assessment of investigation capability, CAPA effectiveness evaluation, contamination hazard risk assessment covering material flows and equipment placement, detailed remediation with specific improvements, and ongoing management oversight throughout the manufacturing lifecycle.​

The Contamination Control Strategy as Living System

The Catalent warning letter’s contamination hazards section serves as a case study in how quality systems can simultaneously maintain surface-level compliance while allowing fundamental contamination control failures to persist. The facility conducted VHP decontamination cycles, performed environmental monitoring, executed media fills, and inspected incoming materials—checking every compliance box. Yet contamination hazards proliferated because these activities optimized for operational convenience and batch release justification rather than contamination detection and source elimination.

The EU GMP Annex 1 Contamination Control Strategy requirement represents regulatory recognition that contamination control cannot be achieved through isolated compliance activities. It requires integrated systems where facility design, decontamination processes, environmental monitoring, intervention protocols, material qualification, and investigation practices function cohesively to detect, investigate, and eliminate contamination sources. The Catalent failures reveal what happens when these elements remain disconnected: decontamination cycles that don’t reach occluded surfaces, monitoring that can’t detect contamination on irregular geometries, interventions that proceed despite identified risks, investigations that erase failures through re-execution.

For those of us responsible for contamination control in aseptic manufacturing, the question isn’t whether our facilities face similar vulnerabilities—they do. The question is whether our quality systems are architected to detect these vulnerabilities before regulators discover them. Are your VHP validations addressing actual occluded surfaces or ideal flat coupons? Are you using contact plates because they detect contamination effectively or because they’re operationally convenient? Do your intervention protocols prevent the high-risk activities your risk assessments identify? When media fills terminate, do investigations occur before re-execution?

The Catalent warning letter provides a diagnostic framework for assessing contamination hazard management. Use it. Map your own decontamination validation against the occluded surface criteria. Evaluate your environmental monitoring method selection against detection capability requirements. Review intervention protocols for alignment with risk assessments. Examine media fill termination handling for investigation rigor. Assess supplier qualification for facility-based sampling and contamination-specific inspection.

The contamination hazards are already present in your aseptic operations. The question is whether your quality system architecture can detect them.

Sidney Dekker: The Safety Scientist Who Influences How I Think About Quality

Over the past decades, as I’ve grown into leading quality organizations in biotechnology, I’ve encountered many thinkers who’ve shaped my approach to investigation and risk management. But few have fundamentally altered my perspective like Sidney Dekker. His work didn’t just add to my toolkit; it forced me to question some of my most basic assumptions about human error, system failure, and what it means to create genuinely effective quality systems.

Dekker’s challenge to move beyond “safety theater” toward authentic learning resonates deeply with my own frustrations about quality systems that look impressive on paper but fail when tested by real-world complexity.

Why Dekker Matters for Quality Leaders

Professor Sidney Dekker brings a unique combination of academic rigor and operational experience to safety science. As both a commercial airline pilot and the Director of the Safety Science Innovation Lab at Griffith University, he understands the gap between how work is supposed to happen and how it actually gets done. This dual perspective—practitioner and scholar—gives his critiques of traditional safety approaches unusual credibility.

But what initially drew me to Dekker’s work wasn’t his credentials. It was his ability to articulate something I’d been experiencing but couldn’t quite name: the growing disconnect between our increasingly sophisticated compliance systems and our actual ability to prevent quality problems. His concept of “drift into failure” provided a framework for understanding why organizations with excellent procedures and well-trained personnel still experience systemic breakdowns.

The “New View” Revolution

Dekker’s most fundamental contribution is what he calls the “new view” of human error—a complete reframing of how we understand system failures. Having spent years investigating deviations and CAPAs, I can attest to how transformative this shift in perspective can be.

The Traditional Approach I Used to Take:

  • Human error causes problems
  • People are unreliable; systems need protection from human variability
  • Solutions focus on better training, clearer procedures, more controls

Dekker’s New View That Changed My Practice:

  • Human error is a symptom of deeper systemic issues
  • People are the primary source of system reliability, not the threat to it
  • Variability and adaptation are what make complex systems work

This isn’t just academic theory—it has practical implications for every investigation I lead. When I encounter “operator error” in a deviation investigation, Dekker’s framework pushes me to ask different questions: What made this action reasonable to the operator at the time? What system conditions shaped their decision-making? How did our procedures and training actually perform under real-world conditions?

This shift aligns perfectly with the causal reasoning approaches I’ve been developing on this blog. Instead of stopping at “failure to follow procedure,” we dig into the specific mechanisms that drove the event—exactly what Dekker’s view demands.

Drift Into Failure: Why Good Organizations Go Bad

Perhaps Dekker’s most powerful concept for quality leaders is “drift into failure”—the idea that organizations gradually migrate toward disaster through seemingly rational local decisions. This isn’t sudden catastrophic failure; it’s incremental erosion of safety margins through competitive pressure, resource constraints, and normalized deviance.

I’ve seen this pattern repeatedly. For example, a cleaning validation program starts with robust protocols, but over time, small shortcuts accumulate: sampling points that are “difficult to access” get moved, hold times get shortened when production pressure increases, acceptance criteria get “clarified” in ways that gradually expand limits.

Each individual decision seems reasonable in isolation. But collectively, they represent drift—a gradual migration away from the original safety margins toward conditions that enable failure. The contamination events and data integrity issues that plague our industry often represent the endpoint of these drift processes, not sudden breakdowns in otherwise reliable systems.

Beyond Root Cause: Understanding Contributing Conditions

Traditional root cause analysis seeks the single factor that “caused” an event, but complex system failures emerge from multiple interacting conditions. The take-the-best heuristic I’ve been exploring on this blog—focusing on the most causally powerful factor—builds directly on Dekker’s insight that we need to understand mechanisms, not hunt for someone to blame.

When I investigate a failure now, I’m not looking for THE root cause. I’m trying to understand how various factors combined to create conditions for failure. What pressures were operators experiencing? How did procedures perform under actual conditions? What information was available to decision-makers? What made their actions reasonable given their understanding of the situation?

This approach generates investigations that actually help prevent recurrence rather than just satisfying regulatory expectations for “complete” investigations.

Just Culture: Moving Beyond Blame

Dekker’s evolution of just culture thinking has been particularly influential in my leadership approach. His latest work moves beyond simple “blame-free” environments toward restorative justice principles—asking not “who broke the rule” but “who was hurt and how can we address underlying needs.”

This shift has practical implications for how I handle deviations and quality events. Instead of focusing on disciplinary action, I’m asking: What systemic conditions contributed to this outcome? What support do people need to succeed? How can we address the underlying vulnerabilities this event revealed?

This doesn’t mean eliminating accountability—it means creating accountability systems that actually improve performance rather than just satisfying our need to assign blame.

Safety Theater: The Problem with Compliance Performance

Dekker’s most recent work on “safety theater” hits particularly close to home in our regulated environment. He defines safety theater as compliance performed while under surveillance, which gives way to actual work practices once supervision disappears.

I’ve watched organizations prepare for inspections by creating impressive documentation packages that bear little resemblance to how work actually gets done. Procedures get rewritten to sound more rigorous, training records get updated, and everyone rehearses the “right” answers for auditors. But once the inspection ends, work reverts to the adaptive practices that actually make operations function.

This theater emerges from our desire for perfect, controllable systems, but it paradoxically undermines genuine safety by creating inauthenticity. People learn to perform compliance rather than create genuine safety and quality outcomes.

The falsifiable quality systems I’ve been advocating on this blog represent one response to this problem—creating systems that can be tested and potentially proven wrong rather than just demonstrated as compliant.

Six Practical Takeaways for Quality Leaders

After years of applying Dekker’s insights in biotechnology manufacturing, here are the six most practical lessons for quality professionals:

1. Treat “Human Error” as the Beginning of Investigation, Not the End

When investigations conclude with “human error,” they’ve barely started. This should prompt deeper questions: Why did this action make sense? What system conditions shaped this decision? What can we learn about how our procedures and training actually perform under pressure?

2. Understand Work-as-Done, Not Just Work-as-Imagined

There’s always a gap between procedures (work-as-imagined) and actual practice (work-as-done). Understanding this gap and why it exists is more valuable than trying to force compliance with unrealistic procedures. Some of the most important quality improvements I’ve implemented came from understanding how operators actually solve problems under real conditions.

3. Measure Positive Capacities, Not Just Negative Events

Traditional quality metrics focus on what didn’t happen—no deviations, no complaints, no failures. I’ve started developing metrics around investigation quality, learning effectiveness, and adaptive capacity rather than just counting problems. How quickly do we identify and respond to emerging issues? How effectively do we share learning across sites? How well do our people handle unexpected situations?
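
As one concrete example of such a metric, the question "how quickly do we identify and respond" can be reduced to two medians over event timestamps. This is a sketch under stated assumptions: the field names (occurred, detected, contained) and the sample dates are hypothetical, and a real program would pull them from the deviation system.

```python
from datetime import date
from statistics import median


def median_days(events, start_key, end_key):
    """Median number of days between two timestamps across events."""
    return median((e[end_key] - e[start_key]).days for e in events)


# Illustrative events only.
events = [
    {"occurred": date(2025, 3, 1), "detected": date(2025, 3, 3), "contained": date(2025, 3, 4)},
    {"occurred": date(2025, 4, 10), "detected": date(2025, 4, 11), "contained": date(2025, 4, 16)},
    {"occurred": date(2025, 5, 5), "detected": date(2025, 5, 12), "contained": date(2025, 5, 14)},
]

time_to_detect = median_days(events, "occurred", "detected")
time_to_respond = median_days(events, "detected", "contained")
```

Tracking these two numbers over time says something about learning and response capacity that a simple count of deviations cannot.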

4. Create Psychological Safety for Learning

Fear and punishment shut down the flow of safety-critical information. Organizations that want to learn from failures must create conditions where people can report problems, admit mistakes, and share concerns without fear of retribution. This is particularly challenging in our regulated environment, but it’s essential for moving beyond compliance theater toward genuine learning.

5. Focus on Contributing Conditions, Not Root Causes

Complex failures emerge from multiple interacting factors, not single root causes. The take-the-best approach I’ve been developing helps identify the most causally powerful factor while avoiding the trap of seeking THE cause. Understanding mechanisms is more valuable than finding someone to blame.

6. Embrace Adaptive Capacity Instead of Fighting Variability

People’s ability to adapt and respond to unexpected conditions is what makes complex systems work, not a threat to be controlled. Rather than trying to eliminate human variability through ever-more-prescriptive procedures, we should understand how that variability creates resilience and design systems that support rather than constrain adaptive problem-solving.

Connection to Investigation Excellence

Dekker’s work provides the theoretical foundation for many approaches I’ve been exploring on this blog. His emphasis on testable hypotheses rather than compliance theater directly supports falsifiable quality systems. His new view framework underlies the causal reasoning methods I’ve been developing. His focus on understanding normal work, not just failures, informs my approach to risk management.

Most importantly, his insistence on moving beyond negative reasoning (“what didn’t happen”) to positive causal statements (“what actually happened and why”) has transformed how I approach investigations. Instead of documenting failures to follow procedures, we’re understanding the specific mechanisms that drove events—and that makes all the difference in preventing recurrence.

Essential Reading for Quality Leaders

If you’re leading quality organizations in today’s complex regulatory environment, these Dekker works are essential:

For Investigation Excellence:

  • Behind Human Error (with Woods, Cook, et al.) – Comprehensive framework for moving beyond blame
  • Drift into Failure – Understanding how good organizations gradually deteriorate

The Leadership Challenge

Dekker’s work challenges us as quality leaders to move beyond the comfortable certainty of compliance-focused approaches toward the more demanding work of creating genuine learning systems. This requires admitting that our procedures and training might not work as intended. It means supporting people when they make mistakes rather than just punishing them. It demands that we measure our success by how well we learn and adapt, not just how well we document compliance.

This isn’t easy work. It requires the kind of organizational humility that Amy Edmondson and other leadership researchers emphasize—the willingness to be proven wrong in service of getting better. But in my experience, organizations that embrace this challenge develop more robust quality systems and, ultimately, better outcomes for patients.

The question isn’t whether Sidney Dekker is right about everything—it’s whether we’re willing to test his ideas and learn from the results. That’s exactly the kind of falsifiable approach that both his work and effective quality systems demand.

Beyond Malfunction Mindset: Normal Work, Adaptive Quality, and the Future of Pharmaceutical Problem-Solving

Beyond the Shadow of Failure

Problem-solving is too often shaped by the assumption that the system is perfectly understood and fully specified. If something goes wrong—a deviation, a batch out-of-spec, or a contamination event—our approach is to dissect what “failed” and fix that flaw, believing this will restore order. This way of thinking, which I call the malfunction mindset, is as ingrained as it is incomplete. It assumes that successful outcomes are the default, that work always happens as written in SOPs, and that only failure deserves our scrutiny.

But here’s the paradox: most of the time, our highly complex manufacturing environments actually succeed—often under imperfect, shifting, and not fully understood conditions. If we only study what failed, and never question how our systems achieve their many daily successes, we miss the real nature of pharmaceutical quality: it is not the absence of failure, but the presence of robust, adaptive work. Taking this broader, more nuanced perspective is not just an academic exercise—it’s essential for building resilient operations that truly protect patients, products, and our organizations.

Drawing from my thinking through zemblanity (the predictable but often overlooked negative outcomes of well-intentioned quality fixes), the effectiveness paradox (why “nothing bad happened” isn’t proof your quality system works), and the persistent gap between work-as-imagined and work-as-done, this post explores why the malfunction mindset persists, how it distorts investigations, and what future-ready quality management should look like.

The Allure—and Limits—of the Failure Model

Why do we reflexively look for broken parts and single points of failure? It is, as Sidney Dekker has argued, both comforting and defensible. When something goes wrong, you can always point to a failed sensor, a missed checklist, or an operator error. This approach—introducing another level of documentation, another check, another layer of review—offers a sense of closure and regulatory safety. After all, as long as you can demonstrate that you “fixed” something tangible, you’ve fulfilled investigational due diligence.

Yet this fails to account for how quality is actually produced—or lost—in the real world. The malfunction model treats systems like complicated machines: fix the broken gear, oil the creaky hinge, and the machine runs smoothly again. But, as Dekker reminds us in Drift Into Failure, such linear thinking ignores the drift, adaptation, and emergent complexity that characterize real manufacturing environments. The truth is, in complex adaptive systems like pharmaceutical manufacturing, it often takes more than one “error” for failure to manifest. The system absorbs small deviations continuously, adapting and flexing until, sometimes, a boundary is crossed and a problem surfaces.

W. Edwards Deming’s wisdom rings truer than ever: “Most problems result from the system itself, not from individual faults.” A sustainable approach to quality is one that designs for success—and that means understanding the system-wide properties enabling robust performance, not just eliminating isolated malfunctions.

Procedural Fundamentalism: The Work-as-Imagined Trap

One of the least examined, yet most impactful, contributors to the malfunction mindset is procedural fundamentalism—the belief that the written procedure is both a complete specification and an accurate description of work. This feels rigorous and provides compliance comfort, but it is a profound misreading of how work actually happens in pharmaceutical manufacturing.

Work-as-imagined, as elucidated by Erik Hollnagel and others, represents an abstraction: it is how distant architects of SOPs visualize the “correct” execution of a process. Yet, real-world conditions—resource shortages, unexpected interruptions, mismatched raw materials, shifting priorities—force adaptation. Operators, supervisors, and Quality professionals do not simply “follow the recipe”: they interpret, improvise, and—crucially—adjust on the fly.

When we treat procedures as authoritative descriptions of reality, we create the proxy problem: our investigations compare real operations against an imagined baseline that never fully existed. Deviations become automatically framed as problem points, and success is redefined as rigid adherence, regardless of context or outcome.

Complexity, Performance Variability, and Real Success

So, how do pharmaceutical operations succeed so reliably despite the ever-present complexity and variability of daily work?

The answer lies in embracing performance variability as a feature of robust systems, not a flaw. In high-reliability environments—from aviation to medicine to pharmaceutical manufacturing—success is routinely achieved not by demanding strict compliance, but by cultivating adaptive capacity.

Consider environmental monitoring in a sterile suite: The procedure may specify precise times and locations, but a seasoned operator, noticing shifts in people flow or equipment usage, might proactively sample a high-risk area more frequently. This adaptation—not captured in work-as-imagined—actually strengthens data integrity. Yet, traditional metrics would treat this as a procedural deviation.

This is the paradox of the malfunction mindset: in seeking to eliminate all performance variability, we risk undermining precisely those adaptive behaviors that produce reliable quality under uncertainty.

Why the Malfunction Mindset Persists: Cognitive Comfort and Regulatory Reinforcement

Why do organizations continue to privilege the malfunction mindset, even as evidence accumulates of its limits? The answer is both psychological and cultural.

Component breakdown thinking is psychologically satisfying—it offers a clear problem, a specific cause, and a direct fix. For regulatory agencies, it is easy to measure and audit: did the deviation investigation determine the root cause, did the CAPA address it, does the documentation support this narrative? Anything that doesn’t fit this model is hard to defend in audits or inspections.

Yet this approach offers, at best, a partial diagnosis and, at worst, the illusion of control. It encourages organizations to catalog deviations while blindly accepting a much broader universe of unexamined daily adaptations that actually determine system robustness.

Complexity Science and the Art of Organizational Success

To move toward a more accurate—and ultimately more effective—model of quality, pharmaceutical leaders must integrate the insights of complexity science. Drawing from the work of Stuart Kauffman and others at the Santa Fe Institute, we understand that the highest-performing systems operate not at the edge of rigid order, but at the “edge of chaos,” where structure is balanced with adaptability.

In these systems, success and failure both arise from emergent properties—the patterns of interaction between people, procedures, equipment, and environment. The most meaningful interventions, therefore, address how the parts interact, not just how each part functions in isolation.

This explains why traditional root cause analysis, focused on the parts, often fails to produce lasting improvements; it cannot account for outcomes that emerge only from the collective dynamics of the system as a whole.

Investigating for Learning: The Take-the-Best Heuristic

A key innovation needed in pharmaceutical investigations is a shift to what Hollnagel calls Safety-II thinking: focusing on how things go right as well as why they occasionally go wrong.

Here, the take-the-best heuristic becomes crucial. Instead of compiling lists of all deviations, ask: Among all contributing factors, which one, if addressed, would have the most powerful positive impact on future outcomes, while preserving adaptive capacity? This approach ensures investigations generate actionable, meaningful learning, rather than feeding the endless paper chase of “compliance theater.”
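
The selection question above can be sketched as a filter followed by a single choice. This follows the way the term is used in this post; the impact scores are the investigation team's judgment on a hypothetical 0.0 to 1.0 scale, and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ContributingFactor:
    description: str
    estimated_impact: float            # team judgment, hypothetical 0.0 to 1.0 scale
    preserves_adaptive_capacity: bool  # would fixing it leave room for smart adaptation?


def take_the_best(factors: Iterable[ContributingFactor]) -> Optional[ContributingFactor]:
    """Pick the highest-impact factor among those whose fix preserves adaptive capacity."""
    candidates = [f for f in factors if f.preserves_adaptive_capacity]
    return max(candidates, key=lambda f: f.estimated_impact, default=None)


factors = [
    ContributingFactor("add a second-person check to every step", 0.8, False),
    ContributingFactor("redesign fixture so the sampling point is reachable", 0.7, True),
    ContributingFactor("retrain operators on the SOP", 0.2, True),
]
best = take_the_best(factors)
```

Here best is the fixture redesign: the second-person check scores higher on raw impact but is filtered out because it constrains the adaptive capacity the investigation is meant to protect.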

Building Systems That Support Adaptive Capability

Taking complexity and adaptive performance seriously requires practical changes to how we design procedures, train, oversee, and measure quality.

  • Procedure Design: Make explicit the distinction between objectives and methods. Procedures should articulate clear quality goals, specify necessary constraints, but deliberately enable workers to choose methods within those boundaries when faced with new conditions.
  • Training: Move beyond procedural compliance. Develop adaptive expertise in your staff, so they can interpret and adjust sensibly—understanding not just “what” to do, but “why” it matters in the bigger system.
  • Oversight and Monitoring: Audit for adaptive capacity. Don’t just track “compliance” but also whether workers have the resources and knowledge to adapt safely and intelligently. Positive performance variability (smart adaptations) should be recognized and studied.
  • Quality System Design: Build systematic learning from both success and failure. Examine ordinary operations to discern how adaptive mechanisms work, and protect these capabilities rather than squashing them in the name of “control.”

Leadership and Systems Thinking

Realizing this vision depends on a transformation in leadership mindset—from one seeking control to one enabling adaptive capacity. Deming’s profound knowledge and the principles of complexity leadership remind us that what matters is not enforcing ever-stricter compliance, but cultivating an organizational context where smart adaptation and genuine learning become standard.

Leadership must:

  • Distinguish between complicated and complex: Apply detailed procedures to the former (e.g., calibration), but support flexible, principles-based management for the latter.
  • Tolerate appropriate uncertainty: Not every problem has a clear, single answer. Creating psychological safety is essential for learning and adaptation during ambiguity.
  • Develop learning organizations: Invest in deep understanding of operations, foster regular study of work-as-done, and celebrate insights from both expected and unexpected sources.

Practical Strategies for Implementation

Turning these insights into institutional practice involves a systematic, research-inspired approach:

  • Start procedure development with observation of real work before specifying methods. Small-scale and mock exercises are critical.
  • Employ cognitive apprenticeship models in training, so that experience, reasoning under uncertainty, and systems thinking become core competencies.
  • Begin investigations with appreciative inquiry—map out how the system usually works, not just how it trips up.
  • Measure leading indicators (capacity, information flow, adaptability) not just lagging ones (failures, deviations).
  • Create closed feedback loops for corrective actions—insisting every intervention be evaluated for impact on both compliance and adaptive capacity.

Scientific Quality Management and Adaptive Systems: No Contradiction

The tension between rigorous scientific quality management (QbD, process validation, risk management frameworks) and support for adaptation is a false dilemma. Indeed, genuine scientific quality management starts with humility: the recognition that our understanding of complex systems is always partial, our controls imperfect, and our frameworks provisional.

A falsifiable quality framework embeds learning and adaptation at its core—treating deviations as opportunities to test and refine models, rather than simply checkboxes to complete.

The best organizations are not those that experience the fewest deviations, but those that learn fastest from both expected and unexpected events, and apply this knowledge to strengthen both system structure and adaptive capacity.

Embracing Normal Work: Closing the Gap

Normal pharmaceutical manufacturing is not the story of perfect procedural compliance; it’s the story of people, working together to achieve quality goals under diverse, unpredictable, and evolving conditions. This is both more challenging—and more rewarding—than any plan prescribed solely by SOPs.

To truly move the needle on pharmaceutical quality, organizations must:

  • Embrace performance variability as evidence of adaptive capacity, not just risk.
  • Investigate for learning, not blame; study success, not just failure.
  • Design systems to support both structure and flexible adaptation—never sacrificing one entirely for the other.
  • Cultivate leadership that values humility, systems thinking, and experimental learning, creating a culture comfortable with complexity.

This approach will not be easy. It means questioning decades of compliance custom, organizational habit, and intellectual ease. But the payoff is immense: more resilient operations, fewer catastrophic surprises, and, above all, improved safety and efficacy for the patients who depend on our products.

The challenge—and the opportunity—facing pharmaceutical quality management is to evolve beyond compliance theater and malfunction thinking into a new era of resilience and organizational learning. Success lies not in the illusory comfort of perfectly executed procedures, but in the everyday adaptations, intelligent improvisation, and system-level capabilities that make those successes possible.

The call to action is clear: Investigate not just to explain what failed, but to understand how, and why, things so often go right. Protect, nurture, and enhance the adaptive capacities of your organization. In doing so, pharmaceutical quality can finally become more than an after-the-fact audit; it will become the creative, resilient capability that patients, regulators, and organizations genuinely want to hire.

Take-the-Best Heuristic for Causal Investigation

The integration of Gigerenzer’s take-the-best heuristic with a causal reasoning framework creates a powerful approach to root cause analysis that addresses one of the most persistent problems in quality investigations: the tendency to generate exhaustive lists of contributing factors without identifying the causal mechanisms that actually drove the event.

Traditional root cause analysis often suffers from what we might call “factor proliferation”—the systematic identification of every possible contributing element without distinguishing between those that were causally necessary for the outcome and those that merely provide context. This comprehensive approach feels thorough but often obscures the most important causal relationships by giving equal weight to diagnostic and non-diagnostic factors.

The take-the-best heuristic offers an elegant solution by focusing investigative effort on identifying the single most causally powerful factor—the factor that, if changed, would have been most likely to prevent the event from occurring. This approach aligns perfectly with causal reasoning’s emphasis on identifying what was actually present and necessary for the outcome, rather than cataloging everything that might have been relevant.
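For readers who have not met the heuristic in its original form: Gigerenzer and Goldstein describe take-the-best as a pairwise choice rule. Cues are checked in descending order of validity, and the first cue that discriminates between the two options decides; everything after it is ignored. The sketch below is a minimal illustration of that rule, not an investigation tool. The cue names and the two candidate causes are hypothetical, and the cue ordering is an assumption a real team would have to justify.

```python
from typing import Optional, Sequence


def take_the_best(option_a: dict, option_b: dict,
                  cues_by_validity: Sequence[str]) -> Optional[str]:
    """Pick 'a' or 'b' using the first cue that discriminates.

    Cue values are booleans (True = cue is positive for that option).
    Returns None when no cue discriminates, in which case the caller
    has to fall back on some other rule (the original heuristic guesses).
    """
    for cue in cues_by_validity:
        a, b = option_a.get(cue), option_b.get(cue)
        if a is None or b is None:
            continue  # unknown value: this cue cannot discriminate
        if a != b:
            return "a" if a else "b"  # stop at the first discriminating cue
    return None


# Hypothetical candidate causes and cues, for illustration only.
dead_zone = {"mechanism_reproduced": True, "changed_before_event": True}
training_gap = {"mechanism_reproduced": False, "changed_before_event": True}
cue_order = ["mechanism_reproduced", "changed_before_event"]

print(take_the_best(dead_zone, training_gap, cue_order))  # a
```

The design point is the early exit: once the most valid cue separates the candidates, the less valid cues never get a vote.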

From Counterfactuals to Causal Mechanisms

The most significant advantage of applying take-the-best to causal investigation is its natural resistance to the negative reasoning trap that dominates traditional root cause analysis. When investigators ask “What single factor was most causally responsible for this outcome?” they’re forced to identify positive causal mechanisms rather than falling back on counterfactuals like “failure to follow procedure” or “inadequate training.”

Consider a typical pharmaceutical deviation where a batch fails specification due to contamination. Traditional analysis might identify multiple contributing factors: inadequate cleaning validation, operator error, environmental monitoring gaps, supplier material variability, and equipment maintenance issues. Each factor receives roughly equal attention in the investigation report, leading to broad but shallow corrective actions.

A take-the-best causal approach would ask: “Which single factor, if it had been different, would most likely have prevented this contamination?” The investigation might reveal that the cleaning validation was adequate under normal conditions, but a specific equipment configuration created dead zones that weren’t addressed in the original validation. This equipment configuration becomes the take-the-best factor because changing it would have directly prevented the contamination, regardless of other contributing elements.

This focus on the most causally powerful factor doesn’t ignore other contributing elements—it prioritizes them based on their causal necessity rather than their mere presence during the event.

The Diagnostic Power of Singular Focus

One of Gigerenzer’s key insights about take-the-best is that focusing on the single most diagnostic factor can actually improve decision accuracy compared to complex multivariate approaches. In causal investigation, this translates to identifying the factor that had the greatest causal influence on the outcome—the factor that represents the strongest link in the causal chain.

This approach forces investigators to move beyond correlation and association toward genuine causal understanding. Instead of asking “What factors were present during this event?” the investigation asks “What factor was most necessary and sufficient for this specific outcome to occur?” This question naturally leads to the kind of specific, testable causal statements that causal reasoning demands.

For example, rather than concluding that “multiple factors contributed to the deviation including inadequate procedures, training gaps, and environmental conditions,” a take-the-best causal analysis might conclude that “the deviation occurred because the procedure specified a 30-minute hold time that was insufficient for complete mixing under the actual environmental conditions present during manufacturing, leading to stratification that caused the observed variability.” This statement identifies the specific causal mechanism (insufficient hold time leading to incomplete mixing) while providing the time, place, and magnitude specificity that causal reasoning demands.

Preventing the Generic CAPA Trap

The take-the-best approach to causal investigation naturally prevents one of the most common failures in pharmaceutical quality: the generation of generic, unfocused corrective actions that address symptoms rather than causes. When investigators identify multiple contributing factors without clear causal prioritization, the resulting CAPAs often become diffuse efforts to “improve” everything without addressing the specific mechanisms that drove the event.

By focusing on the single most causally powerful factor, take-the-best investigations generate targeted corrective actions that address the specific mechanism identified as most necessary for the outcome. This creates more effective prevention strategies while avoiding the resource dilution that often accompanies broad-based improvement efforts.

The causal reasoning framework enhances this focus by requiring that the identified factor be described in terms of what actually happened rather than what failed to happen. Instead of “failure to follow cleaning procedures,” the investigation might identify “use of abbreviated cleaning cycle during shift change because operators prioritized production schedule over cleaning thoroughness.” This causal statement directly leads to specific corrective actions: modify shift change procedures, clarify prioritization guidance, or redesign cleaning cycles to be robust against time pressure.

Systematic Application

Implementing take-the-best causal investigation in pharmaceutical quality requires systematic attention to identifying and testing causal hypotheses rather than simply cataloging potential contributing factors. This process follows a structured approach:

Step 1: Event Reconstruction with Causal Focus – Document what actually happened during the event, emphasizing the sequence of causal mechanisms rather than deviations from expected procedure. Focus on understanding why actions made sense to the people involved at the time they occurred.

Step 2: Causal Hypothesis Generation – Develop specific hypotheses about which single factor was most necessary and sufficient for the observed outcome. These hypotheses should make testable predictions about system behavior under different conditions.

Step 3: Diagnostic Testing – Systematically test each causal hypothesis to determine which factor had the greatest influence on the outcome. This might involve data analysis, controlled experiments, or systematic comparison with similar events.

Step 4: Take-the-Best Selection – Identify the single factor that testing reveals to be most causally powerful—the factor that, if changed, would be most likely to prevent recurrence of the specific event.

Step 5: Mechanistic CAPA Development – Design corrective actions that specifically address the identified causal mechanism rather than implementing broad-based improvements across all potential contributing factors.
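
One way to picture Steps 2 through 5 is as a small data structure and a selection rule. This is a sketch under stated assumptions, not a prescribed method: the `CausalHypothesis` fields, the `preventive_effect` score (an estimate from Step 3 of how much changing the factor would reduce recurrence, on a 0 to 1 scale), and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CausalHypothesis:
    factor: str               # what was actually present (Step 2)
    mechanism: str            # how it produced the outcome
    preventive_effect: float  # Step 3 estimate, 0..1 (hypothetical score)
    tested: bool = False      # only tested hypotheses are eligible


def select_take_the_best(
    hypotheses: List[CausalHypothesis],
) -> Tuple[CausalHypothesis, List[CausalHypothesis]]:
    """Step 4: return the most causally powerful tested factor,
    plus the remaining tested factors ranked as context."""
    tested = [h for h in hypotheses if h.tested]
    if not tested:
        raise ValueError("no hypothesis has been tested yet (Step 3)")
    ranked = sorted(tested, key=lambda h: h.preventive_effect, reverse=True)
    return ranked[0], ranked[1:]


# Illustrative values only.
candidates = [
    CausalHypothesis("equipment configuration",
                     "dead zone not covered by cleaning validation",
                     preventive_effect=0.8, tested=True),
    CausalHypothesis("operator technique",
                     "glove contact during intervention",
                     preventive_effect=0.2, tested=True),
    CausalHypothesis("supplier material variability",
                     "bioburden in incoming lot",
                     preventive_effect=0.5, tested=False),
]

best, context = select_take_the_best(candidates)
print(best.factor)  # equipment configuration
```

Untested hypotheses are excluded rather than ranked, which mirrors the point of Step 3: a factor earns its place through diagnostic testing, not through how plausible it sounds in the meeting room.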

Integration with Falsifiable Quality Systems

The take-the-best approach to causal investigation creates naturally falsifiable hypotheses that can be tested and validated over time. When an investigation concludes that a specific factor was most causally responsible for an event, this conclusion makes testable predictions about system behavior that can be validated through subsequent experience.

For example, if a contamination investigation identifies equipment configuration as the take-the-best causal factor, this conclusion predicts that similar contamination events will be prevented by addressing equipment configuration issues, regardless of training improvements or procedural changes. This prediction can be tested systematically as the organization gains experience with similar situations.

This integration with falsifiable quality systems creates a learning loop where investigation conclusions are continuously refined based on their predictive accuracy. Investigations that correctly identify the most causally powerful factors will generate effective prevention strategies, while investigations that miss the key causal mechanisms will be revealed through continued problems despite implemented corrective actions.
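
The learning loop can be made concrete by writing the prediction down at CAPA closure and scoring it later. The sketch below assumes a simple rate-based prediction; the `CapaPrediction` fields, the threshold values, and the batch counts are hypothetical, and with small batch counts a real evaluation would need a proper statistical test rather than a bare comparison.

```python
from dataclasses import dataclass


@dataclass
class CapaPrediction:
    causal_factor: str
    baseline_rate: float       # events per batch before the CAPA
    predicted_max_rate: float  # rate expected if the causal claim is right


def evaluate_prediction(pred: CapaPrediction,
                        batches_after: int, events_after: int) -> dict:
    """Compare the observed post-CAPA event rate with the prediction."""
    if batches_after <= 0:
        raise ValueError("need at least one post-CAPA batch")
    observed = events_after / batches_after
    return {
        "observed_rate": observed,
        "prediction_held": observed <= pred.predicted_max_rate,
        "improved_vs_baseline": observed < pred.baseline_rate,
    }


# Illustrative numbers only.
pred = CapaPrediction("equipment configuration",
                      baseline_rate=0.05, predicted_max_rate=0.01)

held = evaluate_prediction(pred, batches_after=200, events_after=1)
missed = evaluate_prediction(pred, batches_after=200, events_after=8)
print(held["prediction_held"], missed["prediction_held"])  # True False
```

The second case is the interesting one: the rate improved against baseline, yet the prediction failed, which is the signal that the investigation may have missed the key causal mechanism.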

The Leadership and Cultural Implications

Implementing take-the-best causal investigation requires leadership commitment to genuine learning rather than blame assignment. This approach often reveals system-level factors that leadership helped create or maintain, requiring the kind of organizational humility that the Energy Safety Canada framework emphasizes.

The cultural shift from comprehensive factor identification to focused causal analysis can be challenging for organizations accustomed to demonstrating thoroughness through exhaustive documentation. Leaders must support investigators in making causal judgments and prioritizing factors based on their diagnostic power rather than their visibility or political sensitivity.

This cultural change aligns with the broader shift toward scientific quality management that both the adaptive toolbox and falsifiable quality frameworks require. Organizations must develop comfort with making specific causal claims that can be tested and potentially proven wrong, rather than maintaining the false safety of comprehensive but non-specific factor lists.

The take-the-best approach to causal investigation represents a practical synthesis of rigorous scientific thinking and adaptive decision-making. By focusing on the single most causally powerful factor while maintaining the specific, testable language that causal reasoning demands, this approach generates investigations that are both scientifically valid and operationally useful—exactly what pharmaceutical quality management needs to move beyond the recurring problems that plague traditional root cause analysis.

Why ‘First-Time Right’ is a Dangerous Myth in Continuous Manufacturing

In manufacturing circles, “First-Time Right” (FTR) has become something of a sacred cow: a philosophy so universally accepted that questioning it feels almost heretical. Yet as continuous manufacturing processes increasingly replace traditional batch production, we need to critically examine whether this cherished doctrine serves us well or creates dangerous blind spots in our quality assurance frameworks.

The Seductive Promise of First-Time Right

Let’s start by acknowledging the compelling appeal of FTR. As commonly defined, First-Time Right is both a manufacturing principle and KPI that denotes the percentage of end-products leaving production without quality defects. The concept promises a manufacturing utopia: zero waste, minimal costs, maximum efficiency, and delighted customers receiving perfect products every time.

The math seems straightforward. If you produce 1,000 units and 920 are defect-free, your FTR is 92%. Continuous improvement efforts should steadily drive that percentage upward, reducing the resources wasted on imperfect units.
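
As a quick sketch of that arithmetic (the function name is mine, not a standard):

```python
def first_time_right(total_units: int, defect_free_units: int) -> float:
    """FTR as a percentage of units leaving production defect-free."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= defect_free_units <= total_units:
        raise ValueError("defect_free_units must be between 0 and total_units")
    return 100.0 * defect_free_units / total_units


print(f"{first_time_right(1000, 920):.1f}%")  # 92.0%
```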

This principle finds its intellectual foundation in Six Sigma methodology, which tends to lend it an air of scientific inevitability. Yet even Six Sigma acknowledges that perfection remains elusive. This subtle but crucial nuance often gets lost when organizations embrace FTR as an absolute expectation rather than an aspiration.

First-Time Right in biologics drug substance manufacturing refers to the principle and performance metric of producing a biological drug substance that meets all predefined quality attributes and regulatory requirements on the first attempt, without the need for rework, reprocessing, or batch rejection. In this context, FTR emphasizes executing each step of the complex, multi-stage biologics manufacturing process correctly from the outset, starting with cell line development, through upstream (cell culture/fermentation) and downstream (purification, formulation) operations, to the final drug substance release.

Achieving FTR is especially challenging in biologics because these products are made from living systems and are highly sensitive to variations in raw materials, process parameters, and environmental conditions. Even minor deviations can lead to significant quality issues such as contamination, loss of potency, or batch failure, often requiring the entire batch to be discarded.

In biologics manufacturing, FTR is not just about minimizing waste and cost; it is critical for patient safety, regulatory compliance, and maintaining supply reliability. However, due to the inherent variability and complexity of biologics, FTR is best viewed as a continuous improvement goal rather than an absolute expectation. The focus is on designing and controlling processes to consistently deliver drug substances that meet all critical quality attributes, while recognizing that, despite best efforts, some level of process variation and deviation is inevitable in biologics production.

The Unique Complexities of Continuous Manufacturing

Traditional batch processing creates natural boundaries: discrete points where production pauses, quality can be assessed, and decisions about proceeding can be made. In contrast, continuous manufacturing operates without these convenient checkpoints, as raw materials are continuously fed into the manufacturing system, and finished products are continuously extracted, without interruption over the life of the production run.

This fundamental difference requires a complete rethinking of quality assurance approaches. In continuous environments:

  • Quality must be monitored and controlled in real-time, without stopping production
  • Deviations must be detected and addressed while the process continues running
  • The interconnected nature of production steps means issues can propagate rapidly through the system
  • Traceability becomes vastly more complex

Regulatory agencies recognize these unique challenges, acknowledging that understanding and managing risks is central to any decision to greenlight continuous manufacturing (CM) in a production-ready environment. When manufacturing processes never stop, quality assurance cannot rely on the same methodologies that worked for discrete batches.

The Dangerous Complacency of Perfect-First-Time Thinking

The most insidious danger of treating FTR as an achievable absolute is the complacency it breeds. When leadership becomes fixated on achieving perfect FTR scores, several dangerous patterns emerge:

Overconfidence in Automation

While automation can significantly improve quality, it is important to recognize the irreplaceable value of human oversight. Automated systems, no matter how advanced, are ultimately limited by their programming, design, and maintenance. Human operators bring critical thinking, intuition, and the ability to spot subtle anomalies that machines may overlook. A vigilant human presence can catch emerging defects or process deviations before they escalate, providing a layer of judgment and adaptability that automation alone cannot replicate. Relying solely on automation creates a dangerous blind spot, one where the absence of human insight can allow issues to go undetected until they become major problems. True quality excellence comes from the synergy of advanced technology and engaged, knowledgeable people working together.

Underinvestment in Deviation Management

If perfection is expected, why invest in systems to handle imperfections? Yet robust deviation management (the processes used to identify, document, investigate, and correct deviations) becomes even more critical in continuous environments where problems can cascade rapidly. Organizations pursuing FTR often underinvest in the very systems that would help them identify and address the inevitable deviations.

False Sense of Process Robustness

Process robustness refers to the ability of a manufacturing process to tolerate the variability of raw materials, process equipment, operating conditions, environmental conditions and human factors. An obsession with FTR can mask underlying fragility in processes that appear to be performing well under normal conditions. When we pretend our processes are infallible, we stop asking critical questions about their resilience under stress.

Quality Culture Deterioration

When FTR becomes dogma, teams may become reluctant to report or escalate potential issues, fearing they’ll be seen as failures. This creates a culture of silence around deviations, precisely the opposite of what’s needed for effective quality management in continuous manufacturing. When perfection is the only acceptable outcome, people hide imperfections rather than address them.

Magical Thinking in Quality Management

The belief that we can eliminate all errors in complex manufacturing processes amounts to what organizational psychologists call “magical thinking”: the delusional belief that one can do the impossible. In manufacturing, this often manifests as pretending that doing more tasks with fewer resources will not hurt the quality of the work.

This is a pattern I’ve observed repeatedly in my investigations of quality failures. When leadership subscribes to the myth that perfection is not just desirable but achievable, they create the conditions for quality disasters. Teams stop preparing for how to handle deviations and start pretending deviations won’t occur.

The irony is that this approach actually undermines the very goal of FTR. By acknowledging the possibility of failure and building systems to detect and learn from it quickly, we actually increase the likelihood of getting things right.

Building a Healthier Quality Culture for Continuous Manufacturing

Rather than chasing the mirage of perfect FTR, organizations should focus on creating systems and cultures that:

  1. Detect deviations rapidly: Continuous monitoring through advanced process control systems becomes essential for monitoring and regulating critical parameters throughout the production process. The question isn’t whether deviations will occur but how quickly you’ll know about them.
  2. Investigate transparently: When issues occur, the focus should be on understanding root causes rather than assigning blame. The culture must prioritize learning over blame.
  3. Implement robust corrective actions: Each deviation should be thoroughly documented, including when and where it occurred, who identified it, a detailed description of the nonconformance, initial actions taken, results of the investigation into the cause, actions taken to correct and prevent recurrence, and a final evaluation of the effectiveness of these actions.
  4. Learn systematically: Each deviation represents a valuable opportunity to strengthen processes and prevent similar issues in the future. The organization that learns fastest wins, not the one that pretends to be perfect.
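
To make the first item concrete, here is a minimal sketch of a Shewhart-style check: control limits are derived from a baseline period, and the first reading outside them is flagged. The three-sigma multiplier is the conventional default; the pH values are made up for illustration, and a production system would rely on a validated process control platform rather than a script like this.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def control_limits(baseline: Sequence[float],
                   k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper limits as baseline mean +/- k sample std devs."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread


def first_excursion(readings: Sequence[float],
                    lower: float, upper: float) -> Optional[int]:
    """Index of the first reading outside the limits, or None."""
    for i, value in enumerate(readings):
        if value < lower or value > upper:
            return i
    return None


# Hypothetical in-line pH readings.
baseline = [7.00, 7.02, 6.98, 7.01, 6.99, 7.00, 7.03, 6.97]
lower, upper = control_limits(baseline)  # roughly 6.94 and 7.06

live = [7.01, 6.99, 7.04, 7.09, 7.12]
print(first_excursion(live, lower, upper))  # 3
```

The index returned is the point at which the organization could first have known about the deviation, which is the quantity the first item asks us to care about.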

Breaking the Groupthink Cycle

The FTR myth thrives in environments characterized by groupthink, where challenging the prevailing wisdom is discouraged. When leaders obsess over FTR metrics while punishing those who report deviations, they create the perfect conditions for quality disasters.

This connects to a theme I’ve explored repeatedly on this blog: the dangers of losing institutional memory and critical thinking in quality organizations. When we forget that imperfection is inevitable, we stop building the systems and cultures needed to manage it effectively.

Embracing Humility, Vigilance, and Continuous Learning

True quality excellence comes not from pretending that errors don’t occur, but from embracing a more nuanced reality:

  • Perfection is a worthy aspiration but an impossible standard
  • Systems must be designed not just to prevent errors but to detect and address them
  • A healthy quality culture prizes transparency and learning over the appearance of perfection
  • Continuous improvement comes from acknowledging and understanding imperfections, not denying them

The path forward requires humility to recognize the limitations of our processes, vigilance to catch deviations quickly when they occur, and an unwavering commitment to learning and improving from each experience.

In the end, the most dangerous quality issues aren’t the ones we detect and address. They’re the ones our systems and culture allow to remain hidden because we’re too invested in the myth that they shouldn’t exist at all. First-Time Right should remain an aspiration that drives improvement, not a dogma that blinds us to reality.

From Perfect to Perpetually Improving

As continuous manufacturing becomes the norm rather than the exception, we need to move beyond the simplistic FTR myth toward a more sophisticated understanding of quality. Rather than asking, “Did we get it perfect the first time?” we should be asking:

  • How quickly do we detect when things go wrong?
  • How effectively do we contain and remediate issues?
  • How systematically do we learn from each deviation?
  • How resilient are our processes to the variations they inevitably encounter?

These questions acknowledge the reality of manufacturing (that imperfection is inevitable) while focusing our efforts on what truly matters: building systems and cultures capable of detecting, addressing, and learning from deviations to drive continuous improvement.
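
The first two questions translate directly into measurable lags. The sketch below assumes each deviation record carries three timestamps; the `Deviation` fields and the example times are hypothetical, and most quality systems would need the occurrence time reconstructed during investigation rather than read from a log.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List


@dataclass
class Deviation:
    occurred: datetime   # when the process actually went off target
    detected: datetime   # when someone or something noticed
    contained: datetime  # when affected material was isolated


def response_metrics(deviations: List[Deviation]) -> Dict[str, float]:
    """Mean hours from occurrence to detection and detection to containment."""
    detect = [(d.detected - d.occurred).total_seconds() / 3600
              for d in deviations]
    contain = [(d.contained - d.detected).total_seconds() / 3600
               for d in deviations]
    return {
        "mean_hours_to_detect": mean(detect),
        "mean_hours_to_contain": mean(contain),
    }


# Illustrative records only.
records = [
    Deviation(datetime(2025, 3, 1, 8), datetime(2025, 3, 1, 9),
              datetime(2025, 3, 1, 13)),
    Deviation(datetime(2025, 3, 5, 14), datetime(2025, 3, 5, 17),
              datetime(2025, 3, 5, 19)),
]

print(response_metrics(records))
# {'mean_hours_to_detect': 2.0, 'mean_hours_to_contain': 3.0}
```

Tracked over time, these two numbers say more about resilience than an FTR percentage does, because they reward finding problems rather than not reporting them.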

The companies that thrive in the continuous manufacturing future won’t be those with the most impressive FTR metrics on paper. They’ll be those with the humility to acknowledge imperfection, the systems to detect and address it quickly, and the learning cultures that turn each deviation into an opportunity for improvement.