Our goal is to ensure that the data associated with drug manufacturing are complete, consistent, and accurate, and therefore reliable.
— FDA press announcement, www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm628244.htm
And with those algorithms comes a whole host of questions about how to validate them and how to ensure they work properly over time. The FDA has indicated that "we want to get an understanding of your general idea for model maintenance." The agency also wants to know the "trigger" for updating the model, the criteria for recalibration, and the level of validation of the model.
Kate Crawford at Microsoft speaks about "data fundamentalism" – the notion that massive datasets are repositories that yield reliable and objective truths, if only we can extract them using machine learning tools. It shouldn't take much to see why this trap can produce some very bad decision making. Our algorithms have biases, just as human beings have biases. They are dependent on the data and the models used to build and refine them.
Based on reported FDA thinking, and given where European regulators are in other areas, it is very clear we need to be able to explain and justify our algorithmic decisions. Machine learning is here now and will only grow more important.
Ask an Interesting Question
The first step is to be very clear on why there is a need for this system and what problem it is trying to solve. Alignment across all stakeholders is key to ensuring the entire team is working toward the same purpose. This is where we start building a framework.
Get the Data
The solution will only be as good as the data it learns from. As the saying goes, "garbage in, garbage out": the problem is rarely the machine learning tool itself; it lies in how the tool has been trained and what data it is learning from.
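As a very small illustration of what "garbage in" checks might look like before any training happens, here is a sketch using pandas; the column names, ranges, and thresholds are hypothetical placeholders, not a prescribed standard:

```python
import pandas as pd

def basic_training_data_checks(df: pd.DataFrame) -> dict:
    """Run a few simple fitness-for-use checks on a candidate training set.

    Column names ('batch_id', 'assay_result') and the 0-110 range are
    hypothetical placeholders for whatever your data actually contains.
    """
    issues = {}
    issues["duplicate_rows"] = int(df.duplicated().sum())
    issues["missing_values"] = df.isna().sum().to_dict()
    if "assay_result" in df.columns:
        # Flag implausible values (e.g. a percent-of-label-claim result)
        out_of_range = df[(df["assay_result"] < 0) | (df["assay_result"] > 110)]
        issues["assay_out_of_range"] = len(out_of_range)
    return issues

if __name__ == "__main__":
    df = pd.DataFrame({
        "batch_id": ["A1", "A2", "A2", "A3"],
        "assay_result": [98.5, 101.2, 101.2, 250.0],  # 250.0 is clearly implausible
    })
    print(basic_training_data_checks(df))
```

Checks like these do not make the data trustworthy on their own, but they surface the obvious problems before the model ever sees them.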
Explore the Data
Look at the raw data. Look at data summaries. Visualize the data. Do it all again a different way. Notice things. Do it again. Probably get more data. Design experiments with the data.
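A minimal sketch of that explore loop with pandas and matplotlib; the file name and column names are hypothetical placeholders:

```python
import pandas as pd
import matplotlib.pyplot as plt

# 'process_data.csv' and its columns are placeholders for your own data set.
df = pd.read_csv("process_data.csv", parse_dates=["timestamp"])

print(df.head(10))       # look at the raw data
print(df.describe())     # look at a data summary
print(df.dtypes)         # check that types match expectations

# Visualize the same data more than one way
df.hist(figsize=(10, 6))
df.plot(x="timestamp", y="assay_result", kind="line")
plt.show()
```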
Model the Data
The only true way to validate a model is to observe, iterate, and audit. If we take a traditional computer system validation (CSV) model to machine learning, we are in for a lot of hurt. We need to take the framework we built and validate against it, and ensure there are mechanisms to observe against this framework and to audit performance over time.
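To make the "observe and audit over time" idea concrete, here is a rough, non-authoritative sketch of a recalibration trigger that compares live predictions against confirmed results. The baseline error and tolerance are hypothetical; the real criteria belong in your validation framework and would need to be justified there.

```python
import numpy as np

def recalibration_trigger(predicted: np.ndarray,
                          observed: np.ndarray,
                          baseline_mae: float,
                          tolerance: float = 1.5) -> bool:
    """Return True if current error has drifted past the pre-defined trigger.

    baseline_mae is the mean absolute error accepted at validation time;
    tolerance is the multiple of that baseline that triggers review.
    Both values are illustrative, not recommendations.
    """
    current_mae = float(np.mean(np.abs(predicted - observed)))
    return current_mae > tolerance * baseline_mae

# Example: periodic audit of model predictions against confirmed lab results
predicted = np.array([99.1, 100.4, 98.7, 101.0])
observed = np.array([98.9, 100.1, 95.2, 100.8])
if recalibration_trigger(predicted, observed, baseline_mae=0.4):
    print("Trigger met: route to change control / model review")
```

The point is not the arithmetic; it is that the trigger, the data feeding it, and the response to it are all defined, documented, and auditable before the model goes live.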
On 30-Nov-2018 PIC/S published the third draft of guidance PI 041-1, "Good Practices for Data Management and Data Integrity in regulated GMP/GDP Environments". The first draft was published back in 2016, and the third draft is subject to a focused stakeholder consultation seeking substantive comments from trade and professional associations on specific questions relating to the proportionality, clarity and implementation of the guidance requirements. In parallel to this stakeholder consultation, the new draft is being applied by PIC/S Participating Authorities on a trial basis during a three-month implementation trial period.
In short, you can expect inspectors to have reviewed this guidance and to be inspecting against it. Do your gap analysis now and have plans in place to address the gaps. Yes, it will be a little while before the guidance is finally published, but at this point it triangulates neatly with the other data integrity guidances, and we can expect most of it to survive into the final version.
This document is a great place to start and can be used to develop whole sections of the quality management system. I find it very actionable; section 9.5, "Data capture/entry for computerised systems", is a good example.
Below is a compilation of regulatory requirements and definitions relating to data and records across the major pharmaceutical frameworks.

| Agency | Document | Requirement / definition |
| --- | --- | --- |
| EMA | EMA Guideline on GCP compliance in relation to trial master file | A certified copy is a paper or electronic copy of the original record that has been verified (e.g. by a dated signature) or has been generated through a validated process to produce a copy having the exact content and meaning of the original. |
| EMA/CHMP/ICH Tripartite GCP | Guideline for good clinical practice E6(R2) | Source Data: All information in original records and certified copies of original records of clinical findings, observations, or other activities in a clinical trial necessary for the reconstruction and evaluation of the trial. Source data are contained in source documents (original records or certified copies). |
| EMA/CHMP/ICH Tripartite GCP | Guideline for good clinical practice E6(R2) | Source Documents: Original documents, data, and records (e.g. hospital records, clinical and office charts, laboratory notes, memoranda, subjects' diaries or evaluation checklists, pharmacy dispensing records, recorded data from automated instruments, copies or transcriptions certified after verification as being accurate copies, microfiches, photographic negatives, microfilm or magnetic media, x-rays, subject files, and records kept at the pharmacy, at the laboratories and at medico-technical departments involved in the clinical trial).<br>Certified Copy: A copy (irrespective of the type of media used) of the original record that has been verified (i.e., by a dated signature or by generation through a validated process) to have the same information, including data that describe the context, content, and structure, as the original.<br>When a copy is used to replace an original document (e.g., source documents, CRF), the copy should fulfill the requirements for certified copies. |
| EudraLex | Annex 11 | Audit Trails: Consideration should be given, based on a risk assessment, to building into the system the creation of a record of all GMP-relevant changes and deletions (a system generated "audit trail"). For change or deletion of GMP-relevant data the reason should be documented. Audit trails need to be available and convertible to a generally intelligible form and regularly reviewed. |
| EudraLex | Chapter 4 | Generation and Control of Documentation: All types of document should be defined and adhered to. The requirements apply equally to all forms of document media types. Complex systems need to be understood, well documented, validated, and adequate controls should be in place. Many documents (instructions and/or records) may exist in hybrid forms, i.e. some elements as electronic and others as paper based. |
| EudraLex | Chapter 4 | Records: Provide evidence of various actions taken to demonstrate compliance with instructions, e.g. activities, events, investigations, and in the case of manufactured batches a history of each batch of product, including its distribution. Records include the raw data which is used to generate other records. For electronic records regulated users should define which data are to be used as raw data. At least, all data on which quality decisions are based should be defined as raw data. |
| OECD | GLP No 1 | Section 2.3, item 7: Raw data means all original test facility records and documentation, or verified copies thereof, which are the result of the original observations and activities in a study. Raw data also may include, for example, photographs, microfilm or microfiche copies, computer readable media, dictated observations, recorded data from automated instruments, or any other data storage medium that has been recognized as capable of providing secure storage of information for a time period as stated in section 10, below. (Section 10 not reproduced here.) |
| OECD | GLP No 17 | Data (raw data): Data (raw data) may be defined as measurable or descriptive attribute of a physical entity, process or event. The GLP Principles define raw data as all laboratory records and documentation, including data directly entered into a computer through an automatic instrument interface, which are the results of primary observations and activities in a study and which are necessary for the reconstruction and evaluation of the report of that study.<br>Data (derived data): Derived data depend on raw data and can be reconstructed from raw data (e.g., final concentrations as calculated by a spreadsheet relying on raw data, result tables as summarized by a LIMS, etc.). |
| OECD | GLP No 17 | 3.4 Audit trails: An audit trail provides documentary evidence of activities that have affected the content or meaning of a record at a specific time point. Audit trails need to be available and convertible to a human readable form. Depending on the system, log files may be considered (or may be considered in addition to an audit trailing system) to meet this requirement. Any change to electronic records must not obscure the original entry and be time and date stamped and traceable to the person who made the change.<br>Audit trail for a computerised system should be enabled, appropriately configured and reflect the roles and responsibilities of study personnel. The ability to make modifications to the audit trail settings should be restricted to authorised personnel. Any personnel involved in a study (e.g. study directors, heads of analytical departments, analysts, etc.) should not be authorised to change audit trail settings. |
| PIC/S | PI 041-1 (Draft 3) | Complete: All information that would be critical to recreating an event is important when trying to understand the event. The level of detail required for an information set to be considered complete would depend on the criticality of the information… A complete record of data generated electronically includes relevant metadata. |
| PIC/S | PI 041-1 (Draft 3) | Many electronic records are important to retain in their dynamic (electronic) format, to enable interaction with the data. Data must be retained in a dynamic form where this is critical to its integrity or later verification. |
| PIC/S | PI 041-1 (Draft 3) | The original record can be described as the first-capture of information, whether recorded on paper (static) or electronically (usually dynamic, depending on the complexity of the system). Information that is originally captured in a dynamic state should remain available in that state. |
| UK MHRA | 'GXP' Data Integrity Guidance and Definitions | 6.2 Raw data (synonymous with "source data", which is defined in ICH GCP): Raw data is defined as the original record (data) which can be described as the first-capture of information, whether recorded on paper or electronically. Information that is originally captured in a dynamic state should remain available in that state.<br>Raw data must permit full reconstruction of the activities. Where this has been captured in a dynamic state and generated electronically, paper copies cannot be considered as 'raw data'… In all definitions, the term 'data' includes raw data. |
| UK MHRA | 'GXP' Data Integrity Guidance and Definitions | A static record format, such as a paper or electronic record, is one that is fixed and allows little or no interaction between the user and the record content. For example, once printed or converted to static electronic format chromatography records lose the capability of being reprocessed or enabling more detailed viewing of baselines.<br>Records in dynamic format, such as electronic records, allow an interactive relationship between the user and the record content. For example, electronic records in database formats allow the user to track, trend and query data; chromatography records maintained as electronic records allow the user or reviewer (with appropriate access permissions) to reprocess the data and expand the baseline to view the integration more clearly.<br>Where it is not practical or feasibly possible to retain the original copy of source data (e.g. MRI scans, where the source machine is not under the study sponsor's control and the operator can only provide summary statistics), the risks and mitigation should be documented. |
| UK MHRA | 'GXP' Data Integrity Guidance and Definitions | 6.11.1 Original record: The first or source capture of data or information, e.g. original paper record of manual observation or electronic raw data file from a computerised system, and all subsequent data required to fully reconstruct the conduct of the GXP activity. Original records can be static or dynamic.<br>6.11.2 True copy: A copy (irrespective of the type of media used) of the original record that has been verified (i.e. by a dated signature or by generation through a validated process) to have the same information, including data that describe the context, content, and structure, as the original.<br>A true copy may be stored in a different electronic file format to the original record if required, but must retain the metadata and audit trail required to ensure that the full meaning of the data are kept and its history may be reconstructed.<br>Original records and true copies must preserve the integrity of the record. True copies of original records may be retained in place of the original record (e.g. scan of a paper record), if a documented system is in place to verify and record the integrity of the copy. Organisations should consider any risk associated with the destruction of original records.<br>It should be possible to create a true copy of electronic data, including relevant metadata, for the purposes of review, backup and archival. Accurate and complete copies for certification of the copy should include the meaning of the data (e.g. date formats, context, layout, electronic signatures and authorisations) and the full GXP audit trail. Consideration should be given to the dynamic functionality of a 'true copy' throughout the retention period (see 'archive').<br>Data must be retained in a dynamic form where this is critical to its integrity or later verification. If the computerised system cannot be maintained, e.g. if it is no longer supported, then records should be archived according to a documented archiving strategy prior to decommissioning the computerised system. It is conceivable for some data generated by electronic means to be retained in an acceptable paper or electronic format, where it can be justified that a static record maintains the integrity of the original data. However, the data retention process must be shown to include verified copies of all raw data, metadata, relevant audit trail and result files, any variable software/system configuration settings specific to each record, and all data processing runs (including methods and audit trails) necessary for reconstruction of a given raw data set. It would also require a documented means to verify that the printed records were an accurate representation. To enable a GXP compliant record this approach is likely to be demanding in its administration. |
| UK MHRA | 'GXP' Data Integrity Guidance and Definitions | 4.3 Hybrid: Where hybrid systems are used, it should be clearly documented what constitutes the whole data set and all records that are defined by the data set should be reviewed and retained. Hybrid systems should be designed to ensure they meet the desired objective. |
| UK MHRA | 'GXP' Data Integrity Guidance and Definitions | The audit trail is a form of metadata containing information associated with actions that relate to the creation, modification or deletion of GXP records. An audit trail provides for secure recording of life-cycle details such as creation, additions, deletions or alterations of information in a record, either paper or electronic, without obscuring or overwriting the original record. An audit trail facilitates the reconstruction of the history of such events relating to the record regardless of its medium, including the "who, what, when and why" of the action. |
| US FDA | 21 CFR Part 211.194(a) | Laboratory records shall include complete data derived from all tests necessary to assure compliance with established specifications and standards, including examinations and assays. |
| US FDA | 21 CFR Part 211.188 | Batch production and control records shall be prepared for each batch of drug product produced and shall include complete information relating to the production and control of each batch. |
| US FDA | 21 CFR Part 211.68(b) | Hard copy or alternative systems, such as duplicates, tapes, or microfilm, designed to assure that backup data are exact and complete and that it is secure from alteration, inadvertent erasures, or loss shall be maintained. |
| US FDA | 21 CFR Part 58.3(k) | Raw data means any laboratory worksheets, records, memoranda, notes, or exact copies thereof, that are the result of original observations and activities of a nonclinical laboratory study and are necessary for the reconstruction and evaluation of the report of that study. |
| US FDA | 21 CFR Part 58 (proposed rule) | Raw data means all original nonclinical laboratory study records and documentation or exact copies that maintain the original intent and meaning and are made according to the person's certified copy procedures.<br>Raw data includes any laboratory worksheets, correspondence, notes, and other documentation (regardless of capture medium) that are the result of original observations and activities of a nonclinical laboratory study and are necessary for the reconstruction and evaluation of the report of that study.<br>Raw data also includes the signed and dated pathology report. |
| US FDA | 21 CFR Part 211.180(d) | Records required under this part may be retained either as original records or as true copies such as photocopies, microfilm, microfiche, or other accurate reproductions of the original records. Where reduction techniques, such as microfilming, are used, suitable reader and photocopying equipment shall be readily available. |
| US FDA | Data Integrity and Compliance with CGMP Guidance for Industry | Electronic copies can be used as true copies of paper or electronic records, provided the copies preserve the content and meaning of the original data, which includes associated metadata and the static or dynamic nature of the original records.<br>True copies of dynamic electronic records may be made and maintained in the format of the original records or in a compatible format, provided that the content and meaning of the original records are preserved and that a suitable reader and copying equipment (for example, software and hardware, including media readers) are readily available. |
| US FDA | Data Integrity and Compliance with CGMP Guidance for Industry | What is an "audit trail"? For purposes of this guidance, audit trail means a secure, computer-generated, time-stamped electronic record that allows for reconstruction of the course of events relating to the creation, modification, or deletion of an electronic record. An audit trail is a chronology of the "who, what, when, and why" of a record.<br>For example, the audit trail for a high performance liquid chromatography (HPLC) run could include the user name, date/time of the run, the integration parameters used, and details of a reprocessing, if any, including change justification for the reprocessing.<br>Electronic audit trails include those that track creation, modification, or deletion of data (such as processing parameters and results) and those that track actions at the record or system level (such as attempts to access the system or rename or delete a file).<br>CGMP-compliant record-keeping practices prevent data from being lost or obscured (see §§ 211.160(a), 211.194, and 212.110(b)). Electronic record-keeping systems, which include audit trails, can fulfill these CGMP requirements. |
| US FDA | Data Integrity and Compliance with CGMP Guidance for Industry | For the purposes of this guidance, static is used to indicate a fixed-data document such as a paper record or an electronic image, and dynamic means that the record format allows interaction between the user and the record content. For example, a dynamic chromatographic record may allow the user to change the baseline and reprocess chromatographic data so that the resulting peaks may appear smaller or larger. It also may allow the user to modify formulas or entries in a spreadsheet used to compute test results or other information such as calculated yield. |
| WHO | TRS No. 996, Annex 5 | Data means all original records and true copies of original records, including source data and metadata and all subsequent transformations and reports of these data, which are generated or recorded at the time of the GXP activity and allow full and complete reconstruction and evaluation of the GXP activity. Data should be accurately recorded by permanent means at the time of the activity. Data may be contained in paper records (such as worksheets and logbooks), electronic records and audit trails, photographs, microfilm or microfiche, audio- or video-files or any other media whereby information related to GXP activities is recorded. |
| WHO | TRS No. 996, Annex 5 | Dynamic record format: Records in dynamic format, such as electronic records, that allow for an interactive relationship between the user and the record content. For example, electronic records in database formats allow the user to track, trend and query data; chromatography records maintained as electronic records allow the user (with proper access permissions) to reprocess the data and expand the baseline to view the integration more clearly.<br>Static record format: A static record format, such as a paper or pdf record, is one that is fixed and allows little or no interaction between the user and the record content. For example, once printed or converted to static pdfs, chromatography records lose the capability of being reprocessed or enabling more detailed viewing of baselines. |
| WHO | TRS No. 996, Annex 5 | The use of hybrid systems is discouraged, but where legacy systems are awaiting replacement, mitigating controls should be in place…<br>A hybrid approach might exceptionally be used to sign electronic records when the system lacks features for electronic signatures, provided adequate security can be maintained…<br>Replacement of hybrid systems should be a priority. |
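The audit trail definitions above all converge on the same core: a secure, time-stamped, append-only record of who did what to which record, when, and why, without obscuring the original entry. As a rough illustration of that idea only (not any particular system's implementation), an audit trail entry might carry fields like these:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEntry:
    """One immutable 'who, what, when, why' entry; all fields are illustrative."""
    who: str           # authenticated user identity
    what: str          # e.g. "create", "modify", "delete"
    record_id: str     # the GXP record affected
    old_value: str     # the original entry is preserved, never overwritten
    new_value: str
    why: str           # documented reason for the change
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# The trail is append-only: corrections add new entries, they never edit old ones.
audit_trail: list[AuditEntry] = []
audit_trail.append(AuditEntry(
    who="jsmith",
    what="modify",
    record_id="HPLC-RUN-0421/integration",
    old_value="auto-integration v1",
    new_value="manual reintegration v2",
    why="Baseline drift; reprocessed per SOP-123 (hypothetical reference)",
))
```

A real system also needs access controls, protected audit trail settings, and review of the trail itself, as the OECD and MHRA entries above make explicit.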
A critical skill of a quality professional (of any professional), and a fundamental part of Quality 4.0, is managing data — knowing how to acquire good data, analyze it properly, follow the clues those analyses offer, explore the implications, and present results in a fair, compelling way.
As we build systems, validate computer systems, and create processes, we need to ensure the quality of our data. Think about the data you generate, and continually work to make it better.
I am a big fan of tools like Thomas Redman's Friday Afternoon Measurement to determine where data has problems.
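For readers unfamiliar with it, the measurement is simple: pull the last hundred or so records, mark every record with an obvious error in its critical attributes, and report the fraction that are error-free. A minimal sketch, with hypothetical columns and error rules standing in for the attributes a real team would review by hand:

```python
import pandas as pd

def friday_afternoon_measurement(df: pd.DataFrame, n: int = 100) -> float:
    """Return the fraction of the last n records with no obvious errors.

    The rules below are hypothetical stand-ins for the 10-15 critical
    attributes a real team would assess.
    """
    recent = df.tail(n)
    has_error = (
        recent["batch_id"].isna()
        | recent["result"].isna()
        | (recent["result"] < 0)
        | recent.duplicated(subset=["batch_id"], keep=False)
    )
    return 1.0 - has_error.mean()

df = pd.DataFrame({
    "batch_id": ["B1", "B2", None, "B4", "B4"],
    "result": [99.2, -1.0, 100.1, 98.7, 98.7],
})
print(f"Error-free records: {friday_afternoon_measurement(df):.0%}")
```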
Have the tools to decide what data stands out: control charts and regression analysis will help you understand the data. "Looks Good To Me: Visualizations as Sanity Checks" by Michael Correll is a great overview of how data visualization can help us decide whether the data we are gathering makes sense.
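Control limits are easy to compute once you have a stable baseline. A simplified individuals-chart sketch with illustrative data, estimating sigma from the average moving range as is conventional for I-charts:

```python
import numpy as np

# Baseline data from a period believed to be in control (illustrative values)
baseline = np.array([10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.9])

# Estimate sigma from the average moving range (d2 = 1.128 for subgroups of 2)
moving_range = np.abs(np.diff(baseline))
sigma = moving_range.mean() / 1.128
center = baseline.mean()
ucl, lcl = center + 3 * sigma, center - 3 * sigma

# Screen new observations against the limits
new_values = np.array([10.0, 10.3, 14.5])
flagged = np.where((new_values > ucl) | (new_values < lcl))[0]

print(f"CL={center:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
print(f"Out-of-control points at indices: {flagged}")
```

A flagged point is a prompt to investigate, which is where root cause analysis picks up.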
Then root cause analysis (another core capability) allows us to determine what is truly going wrong with our data.
Throughout all your engagements with data, understand statistical significance: how to quantify whether a result is likely due to chance or to the factors you were measuring.
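One way to build that intuition is a permutation test, which asks directly: if the two groups really came from the same process, how often would random relabeling produce a difference at least this large? A minimal sketch with made-up assay values:

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel at random and recompute the difference
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        if diff >= observed:
            count += 1
    return count / n_permutations

# Hypothetical assay results from two lots
lot_1 = [99.1, 100.2, 98.8, 99.5, 100.0]
lot_2 = [101.2, 100.9, 101.8, 100.7, 101.5]
print(f"p-value: {permutation_p_value(lot_1, lot_2):.4f}")
```

A small p-value says the difference would rarely arise by chance alone; it does not, by itself, say the difference matters or what caused it.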
In the past it was enough to understand a Pareto chart, a histogram, and maybe a basic control chart. Those days are long gone. What quality professionals need to bring to the table today is a deeper understanding of data and how to gather it, analyze it, and determine relevance. Data integrity is a key concept, and to have integrity, you need to understand data.
As we all try to figure out just what Industry 4.0 and Quality 4.0 mean, it is not an exaggeration to say that data is your most valuable asset. Yet we all struggle to actually get a benefit from this data, and data integrity is an area of intense regulatory concern.
To truly have value our data needs to be properly defined, relevant to the tasks at hand, structured such that it is easy to find and understand, and of high-enough quality that it can be trusted. Without that we just have noise.
Understand why data matters, how to pick the right metrics, and how to ask the right questions of data. Understanding correlation vs. causation, so you can decide when to act on an analysis and when not to, is critical.
In the 2013 Harvard Business Review article "Keep Up with Your Quants," Thomas Davenport lists six questions that should be asked to evaluate conclusions obtained from data:
1. What was the source of your data?
2. How well do the sample data represent the population?
3. Does your data distribution include outliers? How did they affect the results?
4. What assumptions are behind your analysis? Might certain conditions render your assumptions and your model invalid?
5. Why did you decide on that particular analytical approach? What alternatives did you consider?
6. How likely is it that the independent variables are actually causing the changes in the dependent variable? Might other analyses establish causality more clearly?
Framing data, being able to ask the right questions, is critical to being able to use that data and make decisions. In the past it was adequate for a quality professional to have a familiarity with a few basic tools. Today it is critical to understand basic statistics. As Nate Silver advises in an interview with HBR: "The best training is almost always going to be hands-on training," he says. "Getting your hands dirty with the data set is, I think, far and away better than spending too much time doing reading and so forth."
Appropriate controls shall be exercised over computer or related systems to assure that changes in master production and control records or other records are instituted only by authorized personnel. Input to and output from the computer or related system of formulas or other records or data shall be checked for accuracy. The degree and frequency of input/output verification shall be based on the complexity and reliability of the computer or related system. A backup file of data entered into the computer or related system shall be maintained except where certain data, such as calculations performed in connection with laboratory analysis, are eliminated by computerization or other automated processes. In such instances a written record of the program shall be maintained along with appropriate validation data. Hard copy or alternative systems, such as duplicates, tapes, or microfilm, designed to assure that backup data are exact and complete and that it is secure from alteration, inadvertent erasures, or loss shall be maintained.
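The "exact and complete" expectation for backup data is something you can check mechanically. A minimal, non-authoritative sketch that verifies a backup file is a bit-for-bit match of the original record; the paths are placeholders, and this covers only exactness, not the wider backup, retention, and restore controls the regulation expects:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large records need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_is_exact(original: Path, backup: Path) -> bool:
    """True when the backup is a bit-for-bit match of the original record."""
    return sha256_of(original) == sha256_of(backup)

if __name__ == "__main__":
    # Hypothetical file locations
    ok = backup_is_exact(Path("data/batch_0421.csv"), Path("backup/batch_0421.csv"))
    print("exact and complete" if ok else "mismatch: investigate before relying on backup")
```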
Kris Kelly over at Advantu got me thinking about GAMP5 today. As a result I went to the FDA's Inspection Observations page and was quickly reminded that in 2017 one of the top ten most frequent citations was against 211.68(b), most often worded as "Appropriate controls are not exercised over computers or related systems to assure that changes in master production and control records or other records are instituted only by authorized personnel."
Similar requirements are found throughout the regulations of all major markets (for example EU 5.25) and data integrity is a big piece of this pie.
When building your change management system, remember that a change is both a change to a validated system and a change to a process, and needs to go through the appropriate rigor on both ends. Companies continue to get in a lot of trouble on this, especially when you add in the impact of master data.
Make sure your IT organization is fully aligned. There is a tendency at many companies (including mine) to build walls between an ITIL-oriented change process and process changes. This needs to be driven by a risk-based approach that finds opportunities to tear down those walls. I'm spending a lot of my time finding ways to do this and, to be honest, I worry that there aren't enough folks on the IT side of the fence willing to help tear it down.
So yes, GAMP5 is a great tool. Maybe one of the best frameworks we have available.