I fully agree with this excellent post and its closing line “The public should therefore not need to request such materials from the agency, but should have easy, online access to them at any time.”
All 483s, complete response letters (CRL), and other FDA decisions should be easily accessible. This would be a net positive gain for our profession. I know I’ve reached out to my congress critters about this as the FDA is going through budgeting (and Congress continues to not fund the agency enough).
I’ve seen my fair share of risk assessments listing data quality or bias as hazards. I tend to think that is pretty sloppy. I especially see this a lot in conversations around AI/ML. Data quality is not a risk. It is a causal factor in the failure or severity.
Data Quality and Data Bias
Data Quality
Data quality refers to how well a dataset meets certain criteria that make it fit for its intended use. The key dimensions of data quality include:
Accuracy – The data correctly represents the real-world entities or events it’s supposed to describe.
Completeness – The dataset contains all the necessary information without missing values.
Consistency – The data is uniform and coherent across different systems or datasets.
Timeliness – The data is up-to-date and available when needed.
Validity – The data conforms to defined business rules and parameters.
Uniqueness – There are no duplicate records in the dataset.
High-quality data is crucial for making informed quality decisions, conducting accurate analyses, and developing reliable AI/ML models. Poor data quality can lead to operational issues, inaccurate insights, and flawed strategies.
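Several of these dimensions can be measured directly. The following is a minimal sketch, not a validated tool: the records, the field names (`batch_id`, `assay_pct`, `tested_on`), the 90 to 110 validity range, and the 90-day freshness window are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical batch records; field names and values are illustrative only.
records = [
    {"batch_id": "B001", "assay_pct": 99.1, "tested_on": date(2024, 5, 1)},
    {"batch_id": "B002", "assay_pct": None, "tested_on": date(2024, 5, 2)},
    {"batch_id": "B002", "assay_pct": 101.4, "tested_on": date(2024, 5, 2)},
    {"batch_id": "B003", "assay_pct": 140.0, "tested_on": date(2023, 1, 15)},
]

def completeness(rows, field):
    """Fraction of rows where the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, key):
    """Number of distinct key values divided by the number of rows."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, low, high):
    """Fraction of non-missing values inside an assumed rule range."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(low <= v <= high for v in values) / len(values)

def timeliness(rows, field, as_of, max_age_days):
    """Fraction of rows no older than max_age_days relative to as_of."""
    return sum((as_of - r[field]).days <= max_age_days for r in rows) / len(rows)

report = {
    "completeness": completeness(records, "assay_pct"),
    "uniqueness": uniqueness(records, "batch_id"),
    "validity": validity(records, "assay_pct", 90.0, 110.0),  # assumed rule
    "timeliness": timeliness(records, "tested_on", date(2024, 5, 31), 90),
}
print(report)
```

Accuracy and consistency are left out on purpose: both need a trusted reference (the real-world value, or a second system) to compare against, which a single dataset cannot supply.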
Data Bias
Data bias refers to systematic errors or prejudices present in the data that can lead to inaccurate or unfair outcomes, especially in machine learning and AI applications. Some common types of data bias include:
Sampling bias – When the data sample doesn’t accurately represent the entire population.
Selection bias – When certain groups are over- or under-represented in the dataset.
Reporting bias – When the frequency of events in the data doesn’t reflect real-world frequencies.
Measurement bias – When the data collection method systematically skews the results.
Algorithmic bias – When the algorithms or models introduce biases in the results.
Data bias can lead to discriminatory outcomes and produce inaccurate predictions or classifications.
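One simple screen for sampling or selection bias is to compare each group's share of the dataset with its known share of the population. This is a sketch under stated assumptions: the site names, the population shares, and the 10-percentage-point threshold are hypothetical, and a real assessment would justify its own reference population and tolerance.

```python
from collections import Counter

def representation_gaps(sample_groups, population_shares):
    """Return (sample share - population share) for each population group."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }

# Hypothetical training sample: one label per record, naming its source site.
sample = ["site_A"] * 70 + ["site_B"] * 25 + ["site_C"] * 5

# Assumed true share of records produced by each site.
population = {"site_A": 0.5, "site_B": 0.3, "site_C": 0.2}

gaps = representation_gaps(sample, population)

# Flag groups whose share is off by more than an assumed tolerance of 0.10.
flagged = sorted(g for g, gap in gaps.items() if abs(gap) > 0.10)
print(flagged)
```

A check like this only catches representation problems. It says nothing about measurement or reporting bias, which live in how the values were collected rather than in how many records each group has.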
Relationship between Data Quality and Bias
While data quality and bias are distinct concepts, they are closely related:
Poor data quality can introduce or exacerbate biases. For example, incomplete or inaccurate data may disproportionately affect certain groups.
High-quality data doesn’t necessarily mean unbiased data. A dataset can be accurate, complete, and consistent but still contain inherent biases.
Addressing data bias often involves improving certain aspects of data quality, such as completeness and representativeness.
Organizations must implement robust data governance practices to ensure high-quality and unbiased data, regularly assess their data for quality issues and potential biases, and use techniques like data cleansing, resampling, and algorithmic debiasing.
Identifying the Hazards and the Risks
It is critical to remember the difference between a hazard and a risk. Data quality is a causal factor in the hazard, not a harm.
Think of it like a fever. An open wound is a causal factor for the fever, which has a root cause of poor wound hygiene. I can have the factor (the wound), but without the presence of the root cause (poor wound hygiene), the event (fever) would not develop (okay, there may be other root causes in play as well; remember there is never really just one root cause).
Potential Issues of Poor Data Quality and Inadequate Data Governance
The risks associated with poor data quality and inadequate data governance can significantly impact organizations. Here are the key areas where risks can develop:
Decreased Data Quality
Inaccurate, incomplete, or inconsistent data leads to flawed decision-making
Errors in customer information, product details, or financial data can cause operational issues
Poor quality data hinders effective analysis and forecasting
Compliance Failures
Non-compliance with regulations can result in regulatory actions
Legal complications and reputational damage from failing to meet regulatory requirements
Increased scrutiny from regulatory bodies
Security Breaches
Inadequate data protection increases vulnerability to cyberattacks and data breaches
Financial costs associated with breach remediation, legal fees, and potential fines
Loss of customer trust and long-term reputational damage
Operational Inefficiencies
Time wasted on manual data cleaning and correction
Reduced productivity due to employees working with unreliable data
Inefficient processes resulting from poor data integration or inconsistent data formats
Missed Opportunities
Failure to identify market trends or customer insights due to unreliable data
Missed sales leads or potential customers because of inaccurate contact information
Inability to capitalize on business opportunities due to lack of trustworthy data
Poor Decision-Making
Decisions based on inaccurate or incomplete data leading to suboptimal outcomes, including deviations and product/study impact
Misallocation of resources due to flawed insights from poor quality data
Inability to effectively measure and improve performance
Potential Issues of Data Bias
Data bias presents significant risks across various domains, particularly when integrated into machine learning (ML) and artificial intelligence (AI) systems. These risks can manifest in several ways, impacting both individuals and organizations.
Discrimination and Inequality
Data bias can lead to discriminatory outcomes, systematically disadvantaging certain groups based on race, gender, age, or socioeconomic status. For example:
Judicial Systems: Biased algorithms used in risk assessments for bail and sentencing can result in harsher penalties for people of color compared to their white counterparts, even when controlling for similar circumstances.
Healthcare: AI systems trained on biased medical data may provide suboptimal care recommendations for minority groups, potentially exacerbating health disparities.
Erosion of Trust and Reputation
Organizations that rely on biased data for decision-making risk losing the trust of their customers and stakeholders. This can have severe reputational consequences:
Customer Trust: If customers perceive that an organization’s AI systems are biased, they may lose trust in the brand, leading to a decline in customer loyalty and revenue.
Reputation Damage: High-profile cases of AI bias, such as discriminatory hiring practices or unfair loan approvals, can attract negative media attention and public backlash.
Legal and Regulatory Risks
There are significant legal and regulatory risks associated with data bias:
Compliance Issues: Organizations may face legal challenges and fines if their AI systems violate anti-discrimination laws.
Regulatory Scrutiny: Increasing awareness of AI bias has led to calls for stricter regulations to ensure fairness and accountability in AI systems.
Poor Decision-Making
Biased data can lead to erroneous decisions that negatively impact business operations:
Operational Inefficiencies: AI models trained on biased data may make poor predictions, leading to inefficient resource allocation and operational mishaps.
Financial Losses: Incorrect decisions based on biased data can result in financial losses, such as extending credit to high-risk individuals or mismanaging inventory.
Amplification of Existing Biases
AI systems can perpetuate and even amplify existing biases if not properly managed:
Feedback Loops: Biased AI systems can create feedback loops where biased outcomes reinforce the biased data, leading to increasingly skewed results over time.
Entrenched Inequities: Over time, biased AI systems can entrench societal inequities, making it harder to address underlying issues of discrimination and inequality.
Ethical and Moral Implications
The ethical implications of data bias are profound:
Fairness and Justice: Biased AI systems challenge the principles of fairness and justice, raising moral questions about using such technologies in critical decision-making processes.
Human Rights: There are concerns that biased AI systems could infringe on human rights, particularly in areas like surveillance, law enforcement, and social services.
Perform the Risk Assessment
ICH Q9(R1) Risk Management Process
Risk Management happens at the system/process level, where an AI/ML solution will be used. As appropriate, it drills down to the technology level. Never start with the technology level.
Hazard Identification
It is important to identify product quality hazards that may ultimately lead to patient harm. Ask: What is the hazard of that bad decision? What is the hazard of bad quality data? The bad decision and the bad data are not themselves hazards; they are causes.
Hazard identification, the first step of a risk assessment, begins with a well-defined question stating why the risk assessment is being performed. That question helps define the system and the appropriate scope of what will be studied. Hazard identification addresses the “What might go wrong?” question, including identifying the possible consequences of hazards. Its output is the set of possibilities (i.e., hazards) through which the risk event (e.g., impact to product quality) could happen.
The risk question takes the form of “What is the risk of using an AI/ML solution for <Process/System> to <purpose of AI/ML solution>?” For example, “What is the risk of using AI/ML to identify deviation recurrence and help prioritize CAPAs?” or “What is the risk of using AI/ML to monitor real-time continuous manufacturing to determine the need to evaluate for a potential diversion?”
We can now identify the specific failure modes associated with AI/ML. This may involve deep-dive risk assessments. A failure mode is the specific way a failure occurs; in this case, the specific way that bad data or bad decision-making can happen. Multiple failure modes can, and usually do, lead to the same hazardous situation.
Make sure you drill down on failure causes. If more than 5 potential causes can be identified for a proposed failure mode, it is too broad and was probably written at too high a level in the process or item being risk assessed. It should be broken down into several more specific, more manageable failure modes, each with fewer potential causes.
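The more-than-5-causes rule of thumb, and the point that several failure modes can share one hazardous situation, can be sketched as a small data structure. Everything here is hypothetical: the `FailureMode` class, its field names, and the example entries are illustrations, not a prescribed risk assessment format.

```python
from collections import defaultdict
from dataclasses import dataclass, field

MAX_CAUSES = 5  # rule-of-thumb threshold taken from the text above

@dataclass
class FailureMode:
    description: str
    hazardous_situation: str
    causes: list = field(default_factory=list)

    def too_broad(self):
        """True when the failure mode has more than MAX_CAUSES causes."""
        return len(self.causes) > MAX_CAUSES

def group_by_hazardous_situation(modes):
    """Map each hazardous situation to the failure modes that lead to it."""
    grouped = defaultdict(list)
    for mode in modes:
        grouped[mode.hazardous_situation].append(mode.description)
    return dict(grouped)

# Hypothetical entries for the deviation-recurrence risk question.
modes = [
    FailureMode(
        description="Model misses a recurring deviation",
        hazardous_situation="Recurring deviation is not escalated to a CAPA",
        causes=["Inconsistent deviation categorization in training data",
                "Free-text descriptions missing for older records"],
    ),
    FailureMode(
        description="Model ranks CAPAs incorrectly",
        hazardous_situation="Recurring deviation is not escalated to a CAPA",
        causes=["Training data over-represents one site"],
    ),
    FailureMode(
        description="AI/ML output is wrong",  # deliberately vague
        hazardous_situation="Unspecified",
        causes=["Bad data", "Model drift", "Wrong model", "Poor training",
                "Interface error", "User misreads output"],
    ),
]

too_broad = [m.description for m in modes if m.too_broad()]
grouped = group_by_hazardous_situation(modes)
print(too_broad)
print(grouped)
```

Note where data quality and bias land in this structure: in the `causes` list, not as hazards. The vague third entry is flagged because it has six causes, and it is the one to break down further.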
Start with an outline of how the process works and a description of the AI/ML (special technology) used in the process. Then, interrogate the following for potential failure modes:
The steps in the process or item under study in which AI/ML interventions occur;
The process/procedure documentation, for example, master batch records, SOPs, protocols, etc.;
Current and proposed process/procedure in sufficient detail to facilitate failure mode identification.
A causal factor is a significant contributor to an incident, event, or problem that, if eliminated or addressed, would have prevented the occurrence or reduced its severity or frequency. Here are the key points to understand about causal factors:
Definition: A causal factor is a major unplanned, unintended contributor to an incident (a negative event or undesirable condition) that, if eliminated, would have either prevented the occurrence of the incident or reduced its severity or frequency.
Distinction from root cause: While a causal factor contributes to an incident, it is not necessarily the primary driver. The root cause, on the other hand, is the fundamental reason for the occurrence of a problem or event. (Pay attention to the deficiencies of the model)
Multiple contributors: An incident may have multiple causal factors, and eliminating one causal factor might not prevent the incident entirely but could reduce its likelihood or impact (see the Swiss cheese model).
Identification methods: Causal factors can be identified through various techniques, including root cause analysis (with tools such as fishbone (Ishikawa) diagrams or the Why-Why technique), Causal Learning Cycle (CLC) analysis, and causal factor charting.
Importance in problem-solving: Identifying causal factors is crucial for developing effective preventive measures and improving safety, quality, and efficiency.
Characteristics: Causal factors must be mistakes, errors, or failures that directly lead to an incident or fail to mitigate its consequences. They should not contain other causal factors within them.
Distinction from root causes: It’s important to note that root causes are not causal factors but rather lead to causal factors. Examples of root causes often mistaken for causal factors include inadequate procedures, improper training, or poor work culture.
Human Factors are not always Causal Factors, but can be!
Human factors and human error are related concepts but are not the same. A human error is always a causal factor, and human factors explain why human errors can happen.
Human Error
Human error refers to an unintentional action or decision that fails to achieve the intended outcome. It encompasses mistakes, slips, lapses, and violations that can lead to accidents or incidents. There are two types:
Unintentional Errors include slips (attentional failures) and lapses (memory failures) caused by distractions, interruptions, fatigue, or stress.
Intentional Errors are violations in which an individual knowingly deviates from safe practices, procedures, or regulations. They are often categorized into routine, situational, or exceptional violations.
Human Factors
Human factors is a broader field that studies how humans interact with various system elements, including tools, machines, environments, and processes. It aims to optimize human well-being and overall system performance by understanding human capabilities, limitations, behaviors, and characteristics.
Physical Ergonomics focuses on human anatomical, anthropometric, physiological, and biomechanical characteristics.
Cognitive Ergonomics deals with mental processes such as perception, memory, reasoning, and motor response.
Organizational Ergonomics involves optimizing organizational structures, policies, and processes to improve overall system performance and worker well-being.
Relationship Between Human Factors and Human Error
Causal Relationship: Human factors delve into the underlying reasons why human errors occur. They consider the conditions and systems that contribute to errors, such as poor design, inadequate training, high workload, and environmental factors.
Error Prevention: By addressing human factors, organizations can design systems and processes that minimize the likelihood of human errors. This includes implementing error-proofing solutions, improving ergonomics, and enhancing training and supervision.
Key Differences
Focus:
Human Error: Focuses on the outcome of an action or decision that fails to achieve the intended result.
Human Factors: Focuses on the broader context and conditions that influence human performance and behavior.
Approach:
Human Error: Often addressed through training, disciplinary actions, and procedural changes.
Human Factors: Involves a multidisciplinary approach to design systems, environments, and processes that support optimal human performance and reduce the risk of errors.
Quality (Q) Guidelines focus on the chemical, pharmaceutical, and biological quality standards, including stability testing protocols to ensure the longevity and consistency of drug products.
Safety (S) Guidelines address non-clinical and preclinical safety evaluations, guiding the toxicological assessments necessary to protect patients’ health.
Efficacy (E) Guidelines cover the clinical aspects of pharmaceutical development, providing standards for designing, conducting, and analyzing clinical trials to ensure therapeutic benefits.
Multidisciplinary (M) Guidelines encompass guidelines that do not fit neatly into the other categories, dealing with genomics, terminologies, and technical aspects of drug registration.
Any Q document is instantly and rightly viewed as a GMP guideline. This includes the quality trio, which, while they have a good philosophy, are still written specifically for GMP purposes. So, if you write your paper, good practice guide, standard, article, or what-have-you and refer heavily to the Q trio, you are either writing a GMP-centered piece or losing most of your audience.
The frustrating thing is that quality-by-design (Q8), risk management (Q9), and quality system management (Q10) are core concepts that apply across the pharmaceutical lifecycle, and there are best practices across all three that can and should be universal. This is especially true of Q9(R1), which can better define risk management as used in E6, and Q10, which can shore up parts of E8.
What I would love to see the ICH do is write a technical reference document on risk management. Then, E6 and Q9 would have specific implementation aspects related to their focus. Put all the shared approaches in one place and build on them. The amusing thing is that they are already doing that. For example, Q13 applies the Q trio to continuous manufacturing, and Q14 applies it to the analytical lifecycle.
But for now, if you are writing and just referring to Q9 and Q10, don’t be surprised when all your clinical and safety colleagues tune you out.
This year’s rant is triggered by reading a good practices guide designed to be pan-GxP and getting frustrated by its utter GMP focus. I knew I was in trouble when it specifically discussed “Product and Process Understanding” as a critical factor and then referenced ICH Q10. Use those terms with ICH Q10, and you just announced to the entire world that this is a GMP book. It is important to use a wider term and then reference product/process understanding as one subcategory or way of meeting it.
I rather like the approach of ICH E6 and E8 here, which is to use the wider term “Critical to Quality,” which in the broader sense can be expanded to mean the key factors that must be controlled or monitored to ensure the quality, safety, and efficacy of pharmaceutical products from development to clinical studies to manufacturing and distribution and beyond. It’s a risk-based approach focused on what matters most for patient safety and reliable results.