Pillars of Good Data

One thing we should all agree on is that we need reliable, accurate, and trustworthy data. That is why we strive for the principles of data governance, data quality, and data integrity: three interconnected concepts that work together to create a robust data management framework.

Overarching Framework: Data Governance

Data governance serves as the overarching framework that establishes the policies, procedures, and standards for managing data within an organization. It provides the structure and guidance necessary for effective data management, including:

  • Defining roles and responsibilities for data management
  • Establishing data policies and standards
  • Creating processes for data handling and decision-making
  • Ensuring compliance with regulations and internal policies

Data governance sets the stage for both data quality and data integrity initiatives by providing the necessary organizational structure and guidelines.

Data Quality: Ensuring Fitness for Purpose

Within the data governance framework, data quality focuses on ensuring that data is fit for its intended use. This involves:

  • Assessing data against specific quality dimensions (e.g., accuracy, completeness, consistency, validity, timeliness)
  • Implementing data cleansing and standardization processes
  • Monitoring and measuring data quality metrics
  • Continuously improving data quality through feedback loops and corrective actions

Data quality initiatives are guided by the policies and standards set forth in the data governance framework, ensuring that quality efforts align with organizational goals and requirements.

Data Integrity: Maintaining Trustworthiness

Data integrity works in tandem with data quality to ensure that data remains accurate, complete, consistent, and reliable throughout its lifecycle. The ALCOA+ principles, widely used in regulated industries, provide a comprehensive framework for ensuring data integrity.

ALCOA+ Principles

Attributable: Ensuring that data can be traced back to its origin and the individual responsible for its creation or modification.

Legible: Maintaining data in a clear, readable format that is easily understandable.

Contemporaneous: Recording data at the time of the event or observation to ensure accuracy and prevent reliance on memory.

Original: Preserving the original record or a certified true copy to maintain data authenticity.

Accurate: Ensuring data correctness and freedom from errors.

Complete: Capturing all necessary information without omissions.

Consistent: Maintaining data coherence across different systems and over time.

Enduring: Preserving data for the required retention period in a format that remains accessible.

Available: Ensuring data is readily accessible when needed for review or inspection.

Additional Data Integrity Measures

Security Measures: Implementing robust security protocols to protect data from unauthorized access, modification, or deletion.

Data Lineage Tracking: Establishing systems to monitor and document data transformations and origins throughout its lifecycle.

Auditability: Ensuring data changes are traceable through comprehensive logging and change management processes.

Data Consistency: Maintaining uniformity of data across various systems and databases.

Data integrity measures are often defined and enforced through data governance policies, while also supporting data quality objectives by preserving the accuracy and reliability of data. By adhering to the ALCOA+ principles and implementing additional integrity measures, organizations can ensure their data remains trustworthy and compliant with regulatory requirements.
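
To make the Attributable, Contemporaneous, and Auditability ideas concrete, here is a minimal sketch of a tamper-evident audit trail. It is an illustration only, not a qualified implementation: the field names, the `append_entry` and `verify_trail` helpers, and the choice of SHA-256 hash chaining are all my assumptions, not something ALCOA+ prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first entry (assumption)

def append_entry(trail, user, action, record_id, new_value):
    """Append an attributable, timestamped entry chained to the previous one."""
    entry = {
        "user": user,                                         # Attributable
        "timestamp": datetime.now(timezone.utc).isoformat(),  # Contemporaneous
        "action": action,
        "record_id": record_id,
        "new_value": new_value,
        "prev_hash": trail[-1]["hash"] if trail else GENESIS,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    trail.append(entry)
    return entry

def verify_trail(trail):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

trail = []
append_entry(trail, "asmith", "create", "BR-1001", {"ph": 7.1})
append_entry(trail, "bjones", "modify", "BR-1001", {"ph": 7.2})
print(verify_trail(trail))   # intact chain

trail[0]["new_value"] = {"ph": 6.0}  # simulate an undocumented change
print(verify_trail(trail))   # chain no longer verifies
```

A real system would also need access controls, time synchronization, and validated storage; the chain only shows how a change can be made detectable.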

Synergy in Action

The collaboration between these three elements can be illustrated through a practical example:

  1. Data Governance Framework: An organization establishes a data governance committee that defines policies for GxP data management, including data quality standards and security requirements.
  2. Data Quality Initiative: Based on the governance policies, the organization implements data quality checks to ensure GxP information is accurate, complete, and up-to-date. This includes:
    • Regular data profiling to identify quality issues
    • Data cleansing processes to correct errors
    • Validation rules to prevent the entry of incorrect data
  3. Data Integrity Measures: To maintain the trustworthiness of GxP data, the organization:
    • Implements access controls to prevent unauthorized modifications
    • Qualifies systems to meet ALCOA+ requirements
    • Establishes audit trails to track changes to GxP records

By working together, these elements ensure that:

  • GxP data meets quality standards (data quality)
  • The data retains a secure and unaltered lineage (data integrity)
  • All processes align with organizational policies and regulatory requirements (data governance)

Continuous Improvement Cycle

The relationship between data governance, quality, and integrity is not static but forms a continuous improvement cycle:

  1. Data governance policies inform data quality and integrity standards.
  2. Data quality assessments and integrity checks provide feedback on the effectiveness of governance policies.
  3. This feedback is used to refine and improve governance policies, which in turn enhance data quality and integrity practices.

This ongoing cycle ensures that an organization’s data management practices evolve to meet changing business needs and technological advancements.

Data governance, data quality, and data integrity work together as a cohesive system to ensure that an organization’s data is not only accurate and reliable but also properly managed, protected, and utilized in alignment with business objectives and regulatory requirements. This integrated approach is essential for organizations seeking to maximize the value of their data assets while minimizing risks associated with poor data management.

A GMP Application based on ISA S88.01

A great example of data governance is applying ISA S88.01 to enhance batch control processes and improve overall manufacturing operations.

Data Standardization and Structure

ISA S88.01 provides a standardized framework for batch control, including models and terminology that define the physical, procedural, and recipe aspects of batch manufacturing. This standardization directly supports data governance efforts by:

  • Establishing a common language for batch processes across the organization
  • Defining consistent data structures and hierarchies
  • Facilitating clear communication between different departments and systems
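
As a rough illustration of what "consistent data structures and hierarchies" can look like, the sketch below models the S88 procedural hierarchy (procedure, unit procedure, operation, phase) as plain Python data classes. The class layout, the `phase_paths` helper, and the buffer recipe are hypothetical examples of mine, not part of the standard.

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    name: str
    parameters: dict = field(default_factory=dict)

@dataclass
class Operation:
    name: str
    phases: list = field(default_factory=list)

@dataclass
class UnitProcedure:
    name: str
    unit: str  # equipment unit the unit procedure runs on
    operations: list = field(default_factory=list)

@dataclass
class Procedure:
    name: str
    unit_procedures: list = field(default_factory=list)

    def phase_paths(self):
        """Yield a fully qualified path for every phase in the procedure."""
        for up in self.unit_procedures:
            for op in up.operations:
                for ph in op.phases:
                    yield f"{self.name}/{up.name}/{op.name}/{ph.name}"

# Hypothetical recipe used only to exercise the structure.
recipe = Procedure("Make Buffer", [
    UnitProcedure("Prepare", unit="Mix Tank 1", operations=[
        Operation("Charge", [
            Phase("Add WFI", {"volume_l": 500}),
            Phase("Add Salt", {"mass_kg": 4.5}),
        ]),
        Operation("Mix", [Phase("Agitate", {"rpm": 120, "minutes": 30})]),
    ]),
])

for path in recipe.phase_paths():
    print(path)
```

Because every phase has one unambiguous path, batch data tagged with that path can be compared across units or sites without a translation layer.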

Improved Data Quality

By following the ISA S88.01 standard, organizations can ensure higher data quality throughout the batch manufacturing process:

  • Consistent Data Collection: The standard defines specific data points to be collected at each stage of the batch process, ensuring comprehensive and uniform data capture.
  • Traceability: ISA S88.01 enables detailed tracking of each phase of the batch process, including raw materials used, process parameters, and quality data.
  • Data Integrity: The structured approach helps maintain data integrity by clearly defining data sources, formats, and relationships.

Enhanced Data Management

The ISA S88.01 model supports effective data management practices:

  • Modular Approach: The standard’s modular structure allows for easier management of data related to specific equipment, procedures, or recipes.
  • Scalability: As processes or equipment change, the modular nature of ISA S88.01 facilitates easier updates to data structures and governance policies.
  • Data Lifecycle Management: The standard’s clear delineation of process stages aids in managing data throughout its lifecycle, from creation to archival.

Regulatory Compliance

ISA S88.01 supports data governance efforts related to regulatory compliance:

  • Audit Trails: The standard’s emphasis on traceability aligns with regulatory requirements for maintaining detailed records of batch processes.
  • Consistent Documentation: Standardized terminology and structures facilitate the creation of consistent, compliant documentation.

Decision Support and Analytics

The structured data approach of ISA S88.01 enhances data governance initiatives aimed at improving decision-making:

  • Data Integration: The standard facilitates easier integration of batch data with other enterprise systems, supporting comprehensive analytics.
  • Performance Monitoring: Standardized data structures enable more effective monitoring and comparison of batch processes across different units or sites.

Continuous Improvement

Both data governance and ISA S88.01 support continuous improvement efforts:

  • Process Optimization: The structured data from ISA S88.01 compliant systems can be more easily analyzed to identify areas for process improvement.
  • Knowledge Management: The standard terminology and models facilitate better knowledge sharing and retention within the organization.

By leveraging ISA S88.01 in conjunction with robust data governance practices, organizations can create a powerful framework for managing batch processes, ensuring data quality, and driving operational excellence in manufacturing environments.

Signature Logs

A colleague asks “In the era of digitalization and electronic signatures, do you believe in continuing to collect wet ink signature as part of employee training file? Can Part 11 electronic signature be used as an attestation that electronic signature is legally binding as handwritten signature?”

Great question. Collecting wet signatures is a real pain. Transitioning to digital practices can also significantly streamline our processes. It seems like a win-win. What could go wrong?

First, let’s ask, “Just how digital are you?” It is essential to inventory your various practices and determine what is what. I think there are several categories here:

  1. Starts as paper, retained as paper.
  2. Starts as paper, retained as electronic. For example, you might print a form, fill it out, and route it through DocuSign or your eDMS for approval.
  3. Starts as electronic, retained as paper.
  4. Starts as electronic, retained as electronic. The entire lifecycle is electronic.

Most pharmaceutical companies are in a weird situation where a lot of work starts on paper, gets scanned, and is then approved. This is especially true at virtual companies, where a lot of the action happens at a CxO.

Do that inventory because you probably have more paper than you think—lots of paper. Plus, having an inventory will allow you to decide on future steps.

Before we get to the solution, let’s look at the regulatory requirements.

A is for Attributable (that’s good enough for me)

First Principle: Records should be signed and dated using a unique identifier attributable to the author. (PIC/S Data Integrity Guidance 8.6.1 Expectation 4.)

The guidance then goes on to say, “Check that there are signature and initials logs that are controlled and current and that demonstrate the use of unique examples, not just standardized printed letters.”

Second Principle: Persons using electronic signatures shall, prior to or at the time of such use, certify to the agency that the electronic signatures in their system, used on or after August 20, 1997, are intended to be the legally binding equivalent of traditional handwritten signatures. (21 CFR 11.100(c))

To comply with 21 CFR 11.100(c), organizations must:

  1. Prepare a Certification Letter: Draft a letter to the FDA certifying that the electronic signatures used in their system are legally binding.
  2. Submit the Certification: Send the certification letter to the FDA.
  3. Maintain Records: For future reference, keep a copy of the certification letter in the organization’s regulatory information management system (RIM) or quality management system (QMS) records.
  4. Keep Individual Records: Everyone should affirm that the electronic signature used across systems is binding.
  5. Be Prepared for Requests: Be ready to provide additional certification or testimony if the FDA requests it, for example during an inspection.

This regulation ensures that electronic signatures are treated with the same level of trust and legal standing as traditional handwritten signatures, thereby supporting the integrity and reliability of electronic records in FDA-regulated industries.

Third Principle: The FDA lives within a constellation of other laws

Individual employees generally do not need to provide a wet signature attesting to the legally binding nature of an electronic signature. However, there are some important considerations:

  1. Legal validity: Electronic signatures are legally binding in the United States under the ESIGN Act and UETA, provided certain conditions are met.
  2. Intent and consent: Two critical elements for a legally binding electronic signature are:
    • Intent to sign
    • Consent to do business electronically
  3. Best practices for employers:
    • Implement a uniform policy on how employees sign agreements and onboarding documents.
    • Consider using two-factor verification for electronic signatures to provide additional proof of authenticity.
    • Ensure clear labeling of buttons and boxes for electronic signatures.
    • Include a consent clause for electronic transactions.
    • Provide an opt-out option for those unable to sign electronically.

While employees generally don’t need to provide a wet signature attesting to the legally binding nature of an electronic signature, employers should ensure their electronic signature process demonstrates intent and consent.

What to do

If your inventory showed everything is electronic, great. Get that attestation from the user as part of new hire orientation, and you are good to go. That attestation can be electronic. It just needs to be quickly retrievable so you can produce it during an inspection.

If the inventory showed any paper, then yes, keep collecting those signature/initial logs.
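
The decision rule above is simple enough to sketch in a few lines. This is only an illustration of the logic: the `RecordType` class, the helper names, and the sample inventory are hypothetical, and a real inventory would carry far more detail.

```python
from dataclasses import dataclass

@dataclass
class RecordType:
    name: str
    starts_as: str    # "paper" or "electronic"
    retained_as: str  # "paper" or "electronic"

def paper_touchpoints(inventory):
    """Return the record types that involve paper at any point in their lifecycle."""
    return [r.name for r in inventory if "paper" in (r.starts_as, r.retained_as)]

def needs_signature_log(inventory):
    """Any paper in the inventory means the signature/initial logs stay."""
    return bool(paper_touchpoints(inventory))

# Hypothetical inventory entries.
inventory = [
    RecordType("Training attestation", "electronic", "electronic"),
    RecordType("Logbook entry", "paper", "paper"),
    RecordType("Scanned batch record", "paper", "electronic"),
]

print(paper_touchpoints(inventory))
print(needs_signature_log(inventory))
```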

Data Quality, Data Bias, and the Risk Assessment

I’ve seen my fair share of risk assessments listing data quality or bias as hazards. I tend to think that is pretty sloppy, and I see it especially often in conversations around AI/ML. Data quality is not a risk. It is a causal factor in a failure or in its severity.

Data Quality and Data Bias

Data Quality

Data quality refers to how well a dataset meets certain criteria that make it fit for its intended use. The key dimensions of data quality include:

  1. Accuracy – The data correctly represents the real-world entities or events it’s supposed to describe.
  2. Completeness – The dataset contains all the necessary information without missing values.
  3. Consistency – The data is uniform and coherent across different systems or datasets.
  4. Timeliness – The data is up-to-date and available when needed.
  5. Validity – The data conforms to defined business rules and parameters.
  6. Uniqueness – There are no duplicate records in the dataset.

High-quality data is crucial for making informed quality decisions, conducting accurate analyses, and developing reliable AI/ML models. Poor data quality can lead to operational issues, inaccurate insights, and flawed strategies.
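
Several of these dimensions can be measured directly. The sketch below computes completeness, uniqueness, and validity for a handful of made-up training records. The field names, the sample data, and the 0 to 100 score rule are assumptions for illustration, and a real program would define its own metrics and thresholds through governance.

```python
from datetime import date

# Hypothetical training records; field names and values are illustrative only.
records = [
    {"id": "T-001", "employee": "asmith", "completed": date(2024, 3, 1), "score": 92},
    {"id": "T-002", "employee": "bjones", "completed": None, "score": 88},
    {"id": "T-002", "employee": "bjones", "completed": None, "score": 88},   # duplicate
    {"id": "T-003", "employee": "clee", "completed": date(2024, 3, 4), "score": 140},  # out of range
]

def completeness(rows, field):
    """Fraction of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, key):
    """Fraction of rows whose key value is distinct."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, rule):
    """Fraction of populated values that satisfy a business rule."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(rule(v) for v in values) / len(values)

metrics = {
    "completeness(completed)": completeness(records, "completed"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(score 0-100)": validity(records, "score", lambda s: 0 <= s <= 100),
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```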

Data Bias

Data bias refers to systematic errors or prejudices present in the data that can lead to inaccurate or unfair outcomes, especially in machine learning and AI applications. Some common types of data bias include:

  1. Sampling bias – When the data sample doesn’t accurately represent the entire population.
  2. Selection bias – When certain groups are over- or under-represented in the dataset.
  3. Reporting bias – When the frequency of events in the data doesn’t reflect real-world frequencies.
  4. Measurement bias – When the data collection method systematically skews the results.
  5. Algorithmic bias – When the algorithms or models introduce biases in the results.

Data bias can lead to discriminatory outcomes and produce inaccurate predictions or classifications.
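
One simple way to look for sampling or selection bias is to compare how groups are represented in a dataset against a reference distribution. The sketch below does that for deviation records by site. The site names, the counts, the reference shares, and the 5-point tolerance are all hypothetical; choosing the right reference population is the hard part and is not shown here.

```python
def representation_gaps(sample_counts, reference_shares):
    """Share of each group in the sample minus its share in the reference population."""
    total = sum(sample_counts.values())
    return {
        group: sample_counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }

def flag_groups(gaps, tolerance):
    """Groups whose over- or under-representation exceeds the tolerance."""
    return [group for group, gap in gaps.items() if abs(gap) > tolerance]

# Hypothetical numbers: deviation records used for training vs. share of batches made.
sample_counts = {"site_a": 60, "site_b": 32, "site_c": 8}
reference_shares = {"site_a": 0.50, "site_b": 0.30, "site_c": 0.20}

gaps = representation_gaps(sample_counts, reference_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
print(flag_groups(gaps, tolerance=0.05))
```

A check like this only surfaces representation gaps. It says nothing about measurement or algorithmic bias, which need their own methods.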

Relationship between Data Quality and Bias

While data quality and bias are distinct concepts, they are closely related:

  • Poor data quality can introduce or exacerbate biases. For example, incomplete or inaccurate data may disproportionately affect certain groups.
  • High-quality data doesn’t necessarily mean unbiased data. A dataset can be accurate, complete, and consistent but still contain inherent biases.
  • Addressing data bias often involves improving certain aspects of data quality, such as completeness and representativeness.

Organizations must implement robust data governance practices to ensure high-quality and unbiased data, regularly assess their data for quality issues and potential biases, and use techniques like data cleansing, resampling, and algorithmic debiasing.

Identifying the Hazards and the Risks

It is critical to remember the difference between a hazard and a risk. Data quality is a causal factor in the hazard, not a harm.

Hazard Identification

Think of it like a fever. An open wound is a causal factor for the fever, which has a root cause of poor wound hygiene. I can have the factor (the wound), but without the presence of the root cause (poor wound hygiene), the event (fever) would not develop (okay, there may be other root causes in play as well; remember there is never really just one root cause).

Potential Issues of Poor Data Quality and Inadequate Data Governance

The risks associated with poor data quality and inadequate data governance can significantly impact organizations. Here are the key areas where risks can develop:

Decreased Data Quality

  • Inaccurate, incomplete, or inconsistent data leads to flawed decision-making
  • Errors in customer information, product details, or financial data can cause operational issues
  • Poor quality data hinders effective analysis and forecasting

Compliance Failures

  • Non-compliance with regulations can result in regulatory actions
  • Legal complications and reputational damage from failing to meet regulatory requirements
  • Increased scrutiny from regulatory bodies

Security Breaches

  • Inadequate data protection increases vulnerability to cyberattacks and data breaches
  • Financial costs associated with breach remediation, legal fees, and potential fines
  • Loss of customer trust and long-term reputational damage

Operational Inefficiencies

  • Time wasted on manual data cleaning and correction
  • Reduced productivity due to employees working with unreliable data
  • Inefficient processes resulting from poor data integration or inconsistent data formats

Missed Opportunities

  • Failure to identify market trends or customer insights due to unreliable data
  • Missed sales leads or potential customers because of inaccurate contact information
  • Inability to capitalize on business opportunities due to lack of trustworthy data

Poor Decision-Making

  • Decisions based on inaccurate or incomplete data lead to suboptimal outcomes, including deviations and product/study impact
  • Misallocation of resources due to flawed insights from poor quality data
  • Inability to effectively measure and improve performance

Potential Issues of Data Bias

Data bias presents significant risks across various domains, particularly when integrated into machine learning (ML) and artificial intelligence (AI) systems. These risks can manifest in several ways, impacting both individuals and organizations.

Discrimination and Inequality

Data bias can lead to discriminatory outcomes, systematically disadvantaging certain groups based on race, gender, age, or socioeconomic status. For example:

  • Judicial Systems: Biased algorithms used in risk assessments for bail and sentencing can result in harsher penalties for people of color compared to their white counterparts, even when controlling for similar circumstances.
  • Healthcare: AI systems trained on biased medical data may provide suboptimal care recommendations for minority groups, potentially exacerbating health disparities.

Erosion of Trust and Reputation

Organizations that rely on biased data for decision-making risk losing the trust of their customers and stakeholders. This can have severe reputational consequences:

  • Customer Trust: If customers perceive that an organization’s AI systems are biased, they may lose trust in the brand, leading to a decline in customer loyalty and revenue.
  • Reputation Damage: High-profile cases of AI bias, such as discriminatory hiring practices or unfair loan approvals, can attract negative media attention and public backlash.

Legal and Regulatory Risks

There are significant legal and regulatory risks associated with data bias:

  • Compliance Issues: Organizations may face legal challenges and fines if their AI systems violate anti-discrimination laws.
  • Regulatory Scrutiny: Increasing awareness of AI bias has led to calls for stricter regulations to ensure fairness and accountability in AI systems.

Poor Decision-Making

Biased data can lead to erroneous decisions that negatively impact business operations:

  • Operational Inefficiencies: AI models trained on biased data may make poor predictions, leading to inefficient resource allocation and operational mishaps.
  • Financial Losses: Incorrect decisions based on biased data can result in financial losses, such as extending credit to high-risk individuals or mismanaging inventory.

Amplification of Existing Biases

AI systems can perpetuate and even amplify existing biases if not properly managed:

  • Feedback Loops: Biased AI systems can create feedback loops where biased outcomes reinforce the biased data, leading to increasingly skewed results over time.
  • Entrenched Inequities: Over time, biased AI systems can entrench societal inequities, making it harder to address underlying issues of discrimination and inequality.

Ethical and Moral Implications

The ethical implications of data bias are profound:

  • Fairness and Justice: Biased AI systems challenge the principles of fairness and justice, raising moral questions about using such technologies in critical decision-making processes.
  • Human Rights: There are concerns that biased AI systems could infringe on human rights, particularly in areas like surveillance, law enforcement, and social services.

Perform the Risk Assessment

ICH Q9(R1) Risk Management Process

Risk Management happens at the system/process level, where an AI/ML solution will be used. As appropriate, it drills down to the technology level. Never start with the technology level.

Hazard Identification

It is important to identify product quality hazards that may ultimately lead to patient harm. What is the hazard of that bad decision? What is the hazard of bad quality data? Those are not hazards; they are causes.

Hazard identification, the first step of a risk assessment, begins with a well-defined question stating why the risk assessment is being performed. It helps define the system and the appropriate scope of what will be studied. It addresses the “What might go wrong?” question, including identifying the possible consequences of hazards. The output of the hazard identification step is a list of the possibilities (i.e., hazards) by which the risk event (e.g., impact to product quality) could happen.

The risk question takes the form of “What is the risk of using an AI/ML solution for <Process/System> to <purpose of AI/ML solution>?” For example, “What is the risk of using AI/ML to identify deviation recurrence and help prioritize CAPAs?” or “What is the risk of using AI/ML to monitor real-time continuous manufacturing to determine the need to evaluate for a potential diversion?”

Process maps, data maps, and knowledge maps are critical here.

We can now identify the specific failure modes associated with AI/ML. This may involve deep-dive risk assessments. A failure mode is the specific way a failure occurs. In this case, that means the specific ways that bad data or bad decision-making can happen. Multiple failure modes can, and usually do, lead to the same hazardous situation.

Make sure you drill down on failure causes. If more than 5 potential causes can be identified for a proposed failure mode, it is too broad and was probably written at too high a level for the process or item being risk assessed. Break it down into several more specific failure modes, each with fewer potential causes, so that they are more manageable.
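
The five-cause rule of thumb is easy to apply mechanically once failure modes and causes are written down. The sketch below is one way to do that. The failure modes and causes listed are invented for a deviation-trending example, and the `too_broad` helper is a hypothetical name.

```python
MAX_CAUSES = 5  # threshold taken from the rule of thumb above

# Hypothetical failure modes and causes for a deviation-trending model.
failure_modes = {
    "Recurring deviation classified as a new event": [
        "Incomplete deviation history in training data",
        "Inconsistent root cause coding across sites",
        "Model not retrained after process change",
    ],
    "AI/ML gives a wrong answer": [
        "Poor data quality",
        "Biased training data",
        "Wrong model selected",
        "Model drift",
        "User misreads output",
        "Interface sends stale data",
    ],
}

def too_broad(modes, max_causes=MAX_CAUSES):
    """Return failure modes with more causes than the threshold allows."""
    return [name for name, causes in modes.items() if len(causes) > max_causes]

for name in too_broad(failure_modes):
    print(f"Break down further: {name} ({len(failure_modes[name])} causes)")
```

Note where data quality and bias sit in this structure: they appear as causes under a failure mode, not as hazards in their own right.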

Start with an outline of how the process works and a description of the AI/ML (special technology) used in the process. Then, interrogate the following for potential failure modes:

  • The steps in the process or item under study in which AI/ML interventions occur
  • The process/procedure documentation, for example, master batch records, SOPs, and protocols
    • Current and proposed process/procedure in sufficient detail to facilitate failure mode identification
  • Critical process controls

Backup and Recovery Testing

Backup and recovery testing is critical to ensuring data integrity and business continuity for critical computerized systems. It is also a hard regulatory requirement in our computer system lifecycle.

Part 11 (21 CFR 11.10 and 11.30) is commonly read as expecting that, for computerized systems supporting critical processes, provisions are made to ensure continuity of the systems in the event of an incident or system failure. This includes implementing adequate backup and recovery measures, as well as providing sufficient system redundancy and failover mechanisms.

By the same reading, the backup and recovery processes must be validated to ensure that they operate in an effective and reliable manner.

Similarly, Annex 11 requires that backup and recovery processes be validated to ensure they operate reliably and effectively. Annex 11 also requires that the validation process be documented and includes a risk assessment of the system’s critical processes.

Similar requirements can be found across the GxP data integrity requirements.

In short, regulators expect backup and recovery processes to be validated so that they can reliably recover the system after an incident or failure. This validation must be documented and must include a risk assessment of the system’s critical processes.

Backup and recovery testing:

  1. Verifies Backup Integrity: Testing backups lets you verify that the backup data is complete, accurate, and not corrupted. It ensures that the backed-up data can be reliably restored when needed, maintaining the integrity of the original data.
  2. Validates Recovery Procedures: Regularly testing the recovery process helps identify and resolve any issues or gaps in the recovery procedures. This ensures that the data can be restored wholly and correctly, preserving its integrity during recovery.
  3. Identifies Data Corruption: Testing can reveal data corruption that may have gone unnoticed. By restoring backups and comparing them with the original data, you can detect and address any data integrity issues before they become critical.
  4. Improves Disaster Preparedness: Regular backup and recovery testing helps organizations identify and address potential issues before a disaster strikes. This improves the organization’s preparedness and ability to recover data with integrity in a disaster or data loss incident.
  5. Maintains Business Continuity: Backup and recovery testing helps maintain business continuity by ensuring that backups are reliable and recovery procedures are adequate. Organizations can minimize downtime and data loss, ensuring the integrity of critical business data and operations.
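
Points 1 and 3 above come down to restoring a backup and comparing it with the source. A minimal sketch of that comparison, using SHA-256 checksums from the Python standard library, might look like this. The directory layout, the file names, and the `compare_trees` helper are assumptions for illustration; a validated tool would also need to handle metadata, databases, and files that legitimately changed after the backup.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(source_dir, restored_dir):
    """Return relative paths that are missing or differ after a restore."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = {}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems[str(rel)] = "missing"
        elif sha256_of(src) != sha256_of(restored):
            problems[str(rel)] = "checksum mismatch"
    return problems

# Simulated restore test in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source, restored = Path(tmp, "source"), Path(tmp, "restored")
    source.mkdir()
    restored.mkdir()
    (source / "batch_001.csv").write_text("lot,ph\nA1,7.1\n")
    (source / "batch_002.csv").write_text("lot,ph\nA2,7.3\n")
    (restored / "batch_001.csv").write_text("lot,ph\nA1,7.1\n")
    (restored / "batch_002.csv").write_text("lot,ph\nA2,7.8\n")  # simulated corruption
    result = compare_trees(source, restored)

print(result)
```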

To maintain data integrity, it is recommended that backup and recovery testing be performed regularly. This should follow industry best practices and adhere to the organization’s recovery time objectives (RTOs) and recovery point objectives (RPOs). Testing should cover various scenarios, including full system restores, partial data restores, and data validation checks.
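
Checking a recovery test against RTO and RPO is basic date arithmetic. The sketch below assumes the RPO is measured as the gap between the last good backup and the incident, and the RTO as the time from incident to restored service. The function name, timestamps, and objectives are hypothetical.

```python
from datetime import datetime, timedelta

def evaluate_recovery_test(last_backup_at, incident_at, restored_at, rpo, rto):
    """Compare one recovery test against its recovery point and time objectives."""
    data_loss_window = incident_at - last_backup_at  # work at risk of being lost
    downtime = restored_at - incident_at             # time until service was back
    return {
        "data_loss_window": data_loss_window,
        "meets_rpo": data_loss_window <= rpo,
        "downtime": downtime,
        "meets_rto": downtime <= rto,
    }

# Hypothetical test: nightly backup at 02:00, simulated failure at 09:30.
outcome = evaluate_recovery_test(
    last_backup_at=datetime(2024, 5, 1, 2, 0),
    incident_at=datetime(2024, 5, 1, 9, 30),
    restored_at=datetime(2024, 5, 1, 12, 15),
    rpo=timedelta(hours=8),
    rto=timedelta(hours=2),
)
for key, value in outcome.items():
    print(f"{key}: {value}")
```

In this made-up run the data loss window is inside the RPO but the restore overruns the RTO, which is exactly the kind of gap a recovery test is meant to expose before a real incident does.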

| Level | Description | Key Activities | Frequency |
|---|---|---|---|
| Backup Tests | Ensures data is backed up correctly and consistently. | Check backup infrastructure health; verify data consistency; ensure all critical data is covered; check security settings | Regularly (daily, weekly, monthly) |
| Recovery Tests | Ensures data can be restored effectively and within required timeframes. | Test recovery time and point objectives (RTO and RPO); define and test various recovery scopes; schedule tests to avoid business disruption; document all tests and results | Regularly (quarterly, biannually, annually) |
| Disaster Recovery Tests | Ensures the disaster recovery plan is effective and feasible. | Perform disaster recovery scenarios; test failover and failback operations; coordinate with all relevant teams and stakeholders | Less frequent (once or twice a year) |

By incorporating backup and recovery testing into the data lifecycle, organizations can have confidence in their ability to recover data with integrity, minimizing the risk of data loss or corruption and ensuring business continuity in the face of disasters or data loss incidents.

| Aspect | Backup Tests | Recovery Tests |
|---|---|---|
| Objective | Verify data integrity and backup processes | Ensure data and systems can be successfully restored |
| Focus | Data backup and storage | Comprehensive recovery of data, applications, and infrastructure |
| Processes | Data copy verification, consistency checks, storage verification | Full system restore, spot-checking, disaster simulation |
| Scope | Data-focused | Broader scope including systems and infrastructure |
| Frequency | Regular intervals (daily, weekly, monthly) | Less frequent but more thorough |
| Testing Areas | Backup scheduling, data transfer, storage capacity | Recovery time objectives (RTO), recovery point objectives (RPO), failover/failback |
| Validation | Backup data is complete and accessible | Restored data and systems are fully functional |