Data Quality, Data Bias, and the Risk Assessment

I’ve seen my fair share of risk assessments listing data quality or bias as hazards. I tend to think that is pretty sloppy. I see this especially often in conversations around AI/ML. Data quality is not a risk; it is a causal factor that shapes whether a failure occurs and how severe it is.

Data Quality and Data Bias

Data Quality

Data quality refers to how well a dataset meets certain criteria that make it fit for its intended use. The key dimensions of data quality include:

  1. Accuracy – The data correctly represents the real-world entities or events it’s supposed to describe.
  2. Completeness – The dataset contains all the necessary information without missing values.
  3. Consistency – The data is uniform and coherent across different systems or datasets.
  4. Timeliness – The data is up-to-date and available when needed.
  5. Validity – The data conforms to defined business rules and parameters.
  6. Uniqueness – There are no duplicate records in the dataset.

High-quality data is crucial for making informed quality decisions, conducting accurate analyses, and developing reliable AI/ML models. Poor data quality can lead to operational issues, inaccurate insights, and flawed strategies.
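To make these dimensions concrete, here is a minimal sketch of how a few of them might be checked programmatically. It assumes pandas, and the dataset, column names, and pH specification are invented for illustration; real checks would come from your data governance rules.

```python
import pandas as pd

# Hypothetical records; the columns and the 6.5-7.5 pH spec are invented.
df = pd.DataFrame({
    "batch_id": ["B001", "B002", "B002", "B004"],
    "ph": [6.8, 7.1, 7.1, 9.9],
    "recorded_at": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-03", None]),
})

# Completeness: share of non-missing values per column.
completeness = 1.0 - df.isna().mean()

# Uniqueness: duplicate records usually point to an ingestion problem.
duplicates = int(df.duplicated().sum())

# Validity: values must conform to a defined business rule (assumed spec here).
ph_in_spec = df["ph"].between(6.5, 7.5).mean()

# Timeliness: how stale is the newest record?
staleness = pd.Timestamp.now() - df["recorded_at"].max()

print(completeness, duplicates, ph_in_spec, staleness, sep="\n")
```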

Data Bias

Data bias refers to systematic errors or prejudices present in the data that can lead to inaccurate or unfair outcomes, especially in machine learning and AI applications. Some common types of data bias include:

  1. Sampling bias – When the data sample doesn’t accurately represent the entire population.
  2. Selection bias – When certain groups are over- or under-represented in the dataset.
  3. Reporting bias – When the frequency of events in the data doesn’t reflect real-world frequencies.
  4. Measurement bias – When the data collection method systematically skews the results.
  5. Algorithmic bias – When the algorithms or models introduce biases in the results.

Data bias can lead to discriminatory outcomes and produce inaccurate predictions or classifications.
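As a concrete, deliberately toy illustration of sampling bias, the sketch below compares group proportions in a training sample against known population proportions. The groups, counts, and the five-percentage-point drift threshold are all assumptions.

```python
# Toy sampling-bias check: all figures and the threshold are invented.
population_share = {"group_a": 0.60, "group_b": 0.30, "group_c": 0.10}
sample_counts = {"group_a": 720, "group_b": 210, "group_c": 70}

total = sum(sample_counts.values())
for group, pop in population_share.items():
    samp = sample_counts[group] / total
    drift = samp - pop
    status = ("UNDER-represented" if drift < -0.05
              else "OVER-represented" if drift > 0.05 else "ok")
    print(f"{group}: population {pop:.0%}, sample {samp:.0%} -> {status}")
```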

Relationship between Data Quality and Bias

While data quality and bias are distinct concepts, they are closely related:

  • Poor data quality can introduce or exacerbate biases. For example, incomplete or inaccurate data may disproportionately affect certain groups.
  • High-quality data doesn’t necessarily mean unbiased data. A dataset can be accurate, complete, and consistent but still contain inherent biases.
  • Addressing data bias often involves improving certain aspects of data quality, such as completeness and representativeness.

Organizations must implement robust data governance practices to ensure high-quality and unbiased data, regularly assess their data for quality issues and potential biases, and use techniques like data cleansing, resampling, and algorithmic debiasing.
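Of the techniques just named, resampling is the simplest to sketch. Below is a minimal oversampling example using pandas; the column name and the target balance are assumptions, and in practice you would weigh oversampling against alternatives such as reweighting or collecting more representative data.

```python
import pandas as pd

# Hypothetical imbalanced training set; "group" is an illustrative column.
df = pd.DataFrame({"group": ["a"] * 720 + ["b"] * 210 + ["c"] * 70})

# Naive oversampling: draw each group (with replacement) up to the largest
# group's size so all groups end up equally represented.
target = df["group"].value_counts().max()
balanced = pd.concat(
    g.sample(target, replace=True, random_state=0)
    for _, g in df.groupby("group")
)
print(balanced["group"].value_counts())
```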

Identifying the Hazards and the Risks

It is critical to remember the difference between a hazard and a risk. Poor data quality is a causal factor that can give rise to a hazard; it is not itself the hazard or the harm.

Hazard Identification

Think of it like a fever. An open wound is a causal factor for the fever; the root cause is poor wound hygiene. I can have the factor (the wound), but without the root cause (poor wound hygiene), the event (the fever) would not develop. (Okay, other root causes may be in play as well; remember, there is never really just one root cause.)

Potential Issues of Poor Data Quality and Inadequate Data Governance

The risks associated with poor data quality and inadequate data governance can significantly impact organizations. Here are the key areas where risks can develop:

Decreased Data Quality

  • Inaccurate, incomplete, or inconsistent data leads to flawed decision-making
  • Errors in customer information, product details, or financial data can cause operational issues
  • Poor quality data hinders effective analysis and forecasting

Compliance Failures

  • Non-compliance with regulations can result in regulatory actions
  • Legal complications and reputational damage from failing to meet regulatory requirements
  • Increased scrutiny from regulatory bodies

Security Breaches

  • Inadequate data protection increases vulnerability to cyberattacks and data breaches
  • Financial costs associated with breach remediation, legal fees, and potential fines
  • Loss of customer trust and long-term reputational damage

Operational Inefficiencies

  • Time wasted on manual data cleaning and correction
  • Reduced productivity due to employees working with unreliable data
  • Inefficient processes resulting from poor data integration or inconsistent data formats

Missed Opportunities

  • Failure to identify market trends or customer insights due to unreliable data
  • Missed sales leads or potential customers because of inaccurate contact information
  • Inability to capitalize on business opportunities due to lack of trustworthy data

Poor Decision-Making

  • Decisions based on inaccurate or incomplete data leading to suboptimal outcomes, including deviations and product/study impact
  • Misallocation of resources due to flawed insights from poor quality data
  • Inability to effectively measure and improve performance

Potential Issues of Data Bias

Data bias presents significant risks across various domains, particularly when integrated into machine learning (ML) and artificial intelligence (AI) systems. These risks can manifest in several ways, impacting both individuals and organizations.

Discrimination and Inequality

Data bias can lead to discriminatory outcomes, systematically disadvantaging certain groups based on race, gender, age, or socioeconomic status. For example:

  • Judicial Systems: Biased algorithms used in risk assessments for bail and sentencing can result in harsher penalties for people of color compared to their white counterparts, even when controlling for similar circumstances.
  • Healthcare: AI systems trained on biased medical data may provide suboptimal care recommendations for minority groups, potentially exacerbating health disparities.

Erosion of Trust and Reputation

Organizations that rely on biased data for decision-making risk losing the trust of their customers and stakeholders. This can have severe reputational consequences:

  • Customer Trust: If customers perceive that an organization’s AI systems are biased, they may lose trust in the brand, leading to a decline in customer loyalty and revenue.
  • Reputation Damage: High-profile cases of AI bias, such as discriminatory hiring practices or unfair loan approvals, can attract negative media attention and public backlash.

Legal and Regulatory Risks

There are significant legal and regulatory risks associated with data bias:

  • Compliance Issues: Organizations may face legal challenges and fines if their AI systems violate anti-discrimination laws.
  • Regulatory Scrutiny: Increasing awareness of AI bias has led to calls for stricter regulations to ensure fairness and accountability in AI systems.

Poor Decision-Making

Biased data can lead to erroneous decisions that negatively impact business operations:

  • Operational Inefficiencies: AI models trained on biased data may make poor predictions, leading to inefficient resource allocation and operational mishaps.
  • Financial Losses: Incorrect decisions based on biased data can result in financial losses, such as extending credit to high-risk individuals or mismanaging inventory.

Amplification of Existing Biases

AI systems can perpetuate and even amplify existing biases if not properly managed:

  • Feedback Loops: Biased AI systems can create feedback loops where biased outcomes reinforce the biased data, leading to increasingly skewed results over time (a toy simulation follows this list).
  • Entrenched Inequities: Over time, biased AI systems can entrench societal inequities, making it harder to address underlying issues of discrimination and inequality.
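To make the feedback-loop mechanism concrete, here is a deliberately simplified simulation. All numbers are invented: two districts have the same true incident rate, but observation effort starts slightly skewed, and because each round's allocation follows the observed (biased) data, the skew persists and can lock in rather than self-correct.

```python
import random

random.seed(0)

TRUE_RATE = 0.1        # both districts have the SAME true incident rate
TOTAL_PATROLS = 100
patrols = {"district_a": 60, "district_b": 40}  # slightly skewed starting point

for round_no in range(6):
    # Observations scale with how hard you look, not with any true difference.
    observed = {
        d: sum(random.random() < TRUE_RATE for _ in range(p * 10))
        for d, p in patrols.items()
    }
    total = sum(observed.values()) or 1
    # "Retraining": the next allocation follows the observed (biased) data,
    # so the initial skew is reinforced instead of corrected.
    patrols = {d: round(TOTAL_PATROLS * n / total) for d, n in observed.items()}
    print(f"round {round_no}: patrols={patrols}, observed={observed}")
```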

Ethical and Moral Implications

The ethical implications of data bias are profound:

  • Fairness and Justice: Biased AI systems challenge the principles of fairness and justice, raising moral questions about using such technologies in critical decision-making processes.
  • Human Rights: There are concerns that biased AI systems could infringe on human rights, particularly in areas like surveillance, law enforcement, and social services.

Perform the Risk Assessment

ICH Q9 (r1) Risk Management Process

Risk management happens at the level of the system or process where an AI/ML solution will be used. As appropriate, it drills down to the technology level. Never start at the technology level.

Hazard Identification

It is important to identify product quality hazards that may ultimately lead to patient harm. What is the hazard created by that bad decision? What is the hazard created by bad-quality data? The bad decision and the bad data are not the hazards; they are causes.

Hazard identification, the first step of a risk assessment, begins with a well-defined risk question that states why the risk assessment is being performed. It helps define the system and the appropriate scope of what will be studied. It addresses the “What might go wrong?” question, including identifying the possible consequences of hazards. The output of the hazard identification step is a set of possibilities (i.e., hazards) through which the risk event (e.g., an impact to product quality) could happen.

The risk question takes the form “What is the risk of using an AI/ML solution for <Process/System> to <purpose of AI/ML solution>?” For example, “What is the risk of using AI/ML to identify deviation recurrence and help prioritize CAPAs?” or “What is the risk of using AI/ML to monitor real-time continuous manufacturing to determine the need to evaluate for a potential diversion?”

Process maps, data maps, and knowledge maps are critical here.

We can now identify the specific failure modes associated with AI/ML. This may involve deep-dive risk assessments. A failure mode is the specific way a failure occurs; in this case, the specific way that bad data or bad decision-making can happen. Multiple failure modes can, and usually do, lead to the same hazardous situation.

Make sure you drill down on failure causes. If more than 5 potential causes can be identified for a proposed failure mode, it is too broad and probably written at too high a level for the process or item being risk assessed. It should be broken down into several more specific failure modes, each with fewer potential causes, so they are more manageable.
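As a minimal sketch of that rule of thumb, a failure mode can be captured with its candidate causes and flagged when it is written too broadly. The data structure and example entries are assumptions; only the more-than-five-causes threshold comes from the text.

```python
from dataclasses import dataclass, field

@dataclass
class FailureMode:
    description: str
    potential_causes: list = field(default_factory=list)

    def too_broad(self, max_causes: int = 5) -> bool:
        # Rule of thumb from the text: more than 5 candidate causes means
        # the mode is written at too high a level and should be split.
        return len(self.potential_causes) > max_causes

# Hypothetical example: this mode should be decomposed into specific modes.
fm = FailureMode(
    "Model produces an unreliable recurrence prediction",
    ["stale training data", "sampling bias", "label noise",
     "concept drift", "wrong feature encoding", "threshold misconfiguration"],
)
print(fm.too_broad())  # True -> break into several narrower failure modes
```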

Start with an outline of how the process works and a description of the AI/ML (special technology) used in the process. Then, interrogate the following for potential failure modes:

  • The steps in the process or item under study in which AI/ML interventions occur;
  • The process/procedure documentation, for example master batch records, SOPs, protocols, etc.:
    • Current and proposed process/procedure in sufficient detail to facilitate failure mode identification;
  • Critical Process Controls

Color and Risk Evaluation

These experiments show some preliminary evidence that the color assignment in risk matrices might influence people’s perception of risk gravity, and therefore their decision-making with regard to risk mitigation. We found that individuals might be tempted to cross color boundaries when reducing risks even if this option is not advantageous (i.e., the boundary-crossing effect). However, this effect was not consistently found when we included exploratory analyses of risk mitigations at different impact levels.

Pending future research replicating these results, the cautious recommendation is that the potential biasing effects of color should be considered alongside the goal of communication. If the purpose of communication is informing individuals in an unbiased way, these findings suggest it might be worth eliminating colors from risk matrices in order to reduce the risk of the boundary-crossing effect. On the other hand, if the goal of communication is to persuade individuals to implement certain risk mitigation actions, it might be that assigning colors so as to elicit the boundary-crossing effect would facilitate this. This could be the case, for example, when designing risk matrices that communicate action standards (i.e., severity level at which risk mitigation should be implemented) (Keller et al., 2009). This advice might be particularly relevant in the case of semiqualitative risk matrices, where color assignment might be arbitrary due to the absence of clear numeric cut-off points separating risk severity categories, and to situations where the users of the risk matrix are expected to be of higher numeracy and not have prior training in the design and use of risk matrices.

Proto, R., Recchia, G., Dryhurst, S., & Freeman, A. L. J. (2023). Do colored cells in risk matrices affect decision-making and risk perception? Insights from randomized controlled studies. Risk Analysis, 43, 2114–2128. https://doi.org/10.1111/risa.14091

Well, that is thought-provoking. I guess I need to start evaluating the removal of a lot of color from SOPs, work instructions, and templates.

The Mistake I See in Most Quality Risk Management SOPs

I have a little trick when reviewing a Quality Risk Management SOP. I go to the process/procedure map section, and if I see only the illustration from ICH Q9, I know I am looking at an organization that hasn’t actually thought about risk management.

A risk management process needs more than the methodology behind individual risk management (assess, control, review). It needs to include the following:

  1. Risk Plan: How do you manage risk management holistically? Which systems/processes have living risk assessments? What are your planned reviews? What significant initiatives around quality risk management are included?
  2. Risk Register: How do you manage your entire portfolio of risks? Link to quality management review. (A minimal entry sketch follows this list.)
  3. Selection of tools, and even more importantly, development of tools.
  4. Mechanisms and tools for risk treatment
  5. Improvement strategy for the quality risk management program. How do we know if the program is working as intended?
  6. How to define, select, and train risk owners
  7. How to engage the appropriate stakeholders in the risk process

Too many quality risk management SOPs do not read like process or procedure. They read like a regurgitation of ICH Q9 or the ISO31000 documents. Neither is a good thing. You must go deeper and create an executable process to govern the system.

Why the Shift to Hazard Identification in ICH Q9(r1) Matters

The revised ICH Q9 (R1) guideline shifts from “Risk Identification” to “Hazard Identification” to reflect a more precise approach to identifying potential sources of harm (hazards) rather than broadly identifying risks.

  1. Alignment with Risk Assessment Definition: The term “Hazard Identification” is more consistent with the established definition of Risk Assessment, which involves identifying hazards and analyzing and evaluating the associated risks.
  2. Clarity and Precision: By focusing on hazards, the guideline aims to improve the clarity and precision of the risk management process. This helps better understand and assess the potential harms associated with identified hazards, leading to more effective risk management.
  3. Improved Perception and Assessment: The change is expected to enhance how hazards are perceived and assessed, making the risk management process more robust and scientifically grounded. This is particularly important for ensuring patient safety and product quality.
  4. Consistency in Terminology: The revision aims to standardize the terminology used in quality risk management, reducing confusion and ensuring all stakeholders understand the terms and processes involved.

ICH Q9(R1) Figure 1: Overview of a typical quality risk management process

This small change in terminology can lead to better risk-based decisions by highlighting the need to identify hazards, not risks, during the first step of the risk assessment process, removing any distractions about risks that might interfere with the hazard identification activity. When a risk assessment team focuses only on identifying hazards, they do not have to think about probabilities of occurrence; they only have to consider the potential hazards relevant to the risk question under consideration. The same holds for severity of harm during hazard identification: there is no need to estimate the severity of the harm a hazard may present, because that comes later, after the hazards have been identified.

Living Risk in the Validation Lifecycle

Risk management plays a pivotal role in validation by enabling a risk-based approach to defining validation strategies, ensuring regulatory compliance, mitigating product quality and safety risks, facilitating continuous improvement, and promoting cross-functional collaboration. Integrating risk management principles into the validation lifecycle is essential for maintaining control and consistently producing high-quality products in regulated industries such as biotech and medical devices.

We will conduct various risk assessments in our process lifecycle—many ad hoc (static) and a few living (dynamic). Understanding how they fit together in a larger activity set is crucial.

In the Facility, Utilities, Systems, and Equipment (FUSE) space, we take the process understanding, translate it into a design, and then perform Design Qualification (DQ) to verify that the critical aspects (CAs) and critical design elements (CDEs) necessary to control the risks identified during the quality risk assessment (QRA) are present in the design. This helps mitigate risks to product quality and patient safety. To do this, we need to properly understand the process. Unfortunately, we often start with design before understanding the process and then need to go back and perform rework. Too often I see the dFMEA ignored, or treated as a one-way input to the pFMEA, instead of the two working together in a full risk management cycle.

The Preliminary Hazard Analysis (PHA) supports a pFMEA, which supports a dFMEA, which in turn supports the pFMEA (which at this stage also benefits from a HACCP). Tools fit together to provide the approach. Tools do not become the approach.

Design and Process FMEAs

DFMEA (Design Failure Mode and Effects Analysis) and PFMEA (Process Failure Mode and Effects Analysis) are both methodologies within the broader FMEA framework for identifying and mitigating potential failures, but they focus on different aspects of development and manufacturing.

| | DFMEA | PFMEA |
| --- | --- | --- |
| Scope and focus | Primarily scrutinizes the design to preempt flaws. | Focuses on processes to ensure effectiveness, efficiency, and reliability. |
| Stakeholder involvement | Engages design-oriented teams such as engineering, quality engineers, and reliability engineers. | Involves operations-centric personnel such as manufacturing, quality control, quality operations, and process engineers. |
| Inputs and outputs | Relies on design requirements, product specs, and component interactions to craft a robust product. | Utilizes process steps, equipment capabilities, and parameters to design a stable operational process. |
| Stage in lifecycle | Conducted early in development, concurrent with the design phase; aids early issue detection and minimizes design impact. | Executed in production planning, after the design is finalized, ensuring optimized operations prior to full-scale production. |
| Updated when | Design changes and under annual review. | Process changes and under annual review. |

Risk Analysis in the Design Phase

The design qualification phase is especially suitable for determining risks to products and patients stemming from the equipment or machine. These risks should be identified during the design qualification and reflected in appropriate measures in the draft design so that the operator can effectively eliminate, adequately control, and monitor or observe them. To identify defects in the design (mechanical) or in the creation of systems (electronics) early, and to eliminate them at low cost, it is advisable to perform the following risk analysis activities for systems, equipment, or processes:

  • Categorize the GMP criticality and identify the critical quality attributes and process parameters;
  • Categorize the requirements regarding the patient impact and product impact (for example, in the form of a trace matrix; a minimal sketch follows this list);
  • Identify critical functions and system elements (e.g., the definition of a calibration concept and preventive maintenance);
  • Investigate functions for defect recognition. This includes checking alarms and fault indications, operator error, etc. The result of this risk analysis may be the definition of further maintenance activities, a different assessment of a measurement point, or the identification of topics to include in the operating manuals or procedures.
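One way to picture the trace matrix mentioned in the list above: map each requirement to its impact categorization and to the critical design element that controls it, plus where the control is verified. All entries below are invented.

```python
# Minimal trace-matrix sketch (all entries invented): requirement -> impact
# categorization -> critical design element (CDE) -> where it is verified.
trace_matrix = [
    {"requirement": "URS-012: pH maintained at 6.5-7.5",
     "impact": "product",
     "cde": "inline pH probe with control loop",
     "verified_in": "DQ section 4.2"},
    {"requirement": "URS-031: audit trail on all CPP changes",
     "impact": "patient",
     "cde": "system audit trail configuration",
     "verified_in": "DQ section 5.1"},
]

# The matrix answers the categorization question directly: which requirements
# are patient-impacting, and where is each one verified in the design?
patient_impacting = [r for r in trace_matrix if r["impact"] == "patient"]
```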

Additional risk analyses for verifying the design may include usability studies using equipment mock-ups or preliminary production trials (engineering studies) regarding selected topics to prove the feasibility of specific design aspects (e.g., interaction between machine and materials).

Too often, we misunderstand risk assessments and start doing them at the most granular level. The approach described here lets us right-size our risk assessments and look holistically at the entire lifecycle.