Clinical studies are usually very straightforward. We want to know whether a new drug works and, if it does, how well. We give the drug to a few thousand people and see how they react. We record their feedback, check a dozen health indicators, analyze a ton of tables with the recorded information, and get our answer. All of these steps rely heavily on one important thing: data, and how well it’s recorded, collected, compiled, and analyzed.
For most of history, humanity did a pretty solid job of working with data. But the more we grow, the bigger our needs become, and two of today's core essentials in clinical research are data sharing and accessibility. Previously we looked at the main sources of drug data APIs. Now we examine CDISC standards and how they impact the scientific community.
What are CDISC standards?
CDISC, or the Clinical Data Interchange Standards Consortium, is a standards development organization that supports data exchange in medical research. CDISC standards are free to use, international, and universal. Here’s how they benefit the medical community.
For researchers - clear and simple data management at all stages of clinical research, which means breakthroughs happen faster and study results are more easily shared and understood by the community.
For pharma companies - a simpler and faster submission and review process, since CDISC standards are required for study data submitted to the FDA and Japan’s PMDA.
For technology vendors - the ability to build the solutions the research community needs and to use standardized datasets for machine learning in pharma.
Overall, standards simplify the processing of tons of data, from questionnaires submitted by patients to information about laboratory samples.
There’s a whole family of CDISC standards to learn about, so let’s talk about them.
The map of CDISC standards
Basically, CDISC standards are divided into four groups:
Foundational or data content standards (SEND, PRM, CDASH, SDTM, ADaM). These form the basis for many other CDISC standards and support data collection, management, analysis, and reporting at all stages of the clinical study process.
Data exchange standards (SDM-XML, ODM-XML, Define-XML, CTR-XML). These standards use XML as the common format for exchanging data and metadata, with separate specifications for specific industry tasks.
Therapeutic areas. These standards extend the foundational standards to specific disease areas. For example, there is a user guide explaining how to collect data and conduct trials relevant to COVID-19.
Controlled terminology and glossary. This is a dataset containing a list of terms that should be used with CDISC standards, ensuring that data is collected and recorded consistently.
Below, we review all standards in the foundational and data exchange groups.
Foundational standards
The foundational or data content standards help represent information collected in clinical studies in a universal way that both keeps organizations consistent and organized and shortens submission to the FDA or PMDA. Still, labs and research organizations mostly choose to use one or two standards. Here we describe four foundational standards that follow the common data flow of many clinical studies: research planning, data collection, data tabulation, and analysis.
PRM - capturing protocol content
Protocols are the basic documents in clinical studies that describe how and why the study is conducted. The Protocol Representation Model, or PRM, captures information in a protocol as data instead of text. It also helps create case report forms (CRFs) and study outlines.
PRM describes the most commonly used items found in a protocol and applies a structure that makes them machine-readable. The model uses Unified Modeling Language (UML) to explain the hierarchy and relationships between its elements, called classes. Classes like StudySite or StudySubject have attributes, for example “StudySubjectType: Rat.”
Example of a basic relationship between two classes from the PRM
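To make the idea of classes and attributes more concrete, here is a minimal Python sketch of two PRM-style classes and the association between them. It is not the PRM itself: apart from StudySite, StudySubject, and the StudySubjectType attribute mentioned above, the attribute names and the enroll() helper are hypothetical, and real PRM classes carry many more attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StudySubject:
    # Hypothetical attribute names, except StudySubjectType from the PRM example.
    subject_id: str
    study_subject_type: str  # e.g. "Rat" in a nonclinical study


@dataclass
class StudySite:
    site_id: str
    name: str
    # One site can have many subjects (UML multiplicity 0..*).
    subjects: List[StudySubject] = field(default_factory=list)

    def enroll(self, subject: StudySubject) -> None:
        """Hypothetical helper that records the site-subject association."""
        self.subjects.append(subject)


site = StudySite(site_id="SITE01", name="Example Lab")
site.enroll(StudySubject(subject_id="S-001", study_subject_type="Rat"))
site.enroll(StudySubject(subject_id="S-002", study_subject_type="Rat"))

for s in site.subjects:
    print(site.site_id, s.subject_id, s.study_subject_type)
```

Because the relationship lives in the data structure rather than in prose, software can walk from a site to its subjects without parsing protocol text, which is the point of making protocol content machine-readable.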
PRM is one of the most recent standards. Because it’s not required by regulators, its adoption is very slow.
CDASH - standardizing data collection
As we said, during clinical trials information is recorded in case report forms (CRFs), which are usually designed specifically for each study. The Clinical Data Acquisition Standards Harmonization (CDASH) model provides recommendations for designing CRFs, including common fields used universally across these forms.
CDASH-compliant CRF created in Oracle Health Sciences InForm Source: CDISC
For example, to describe the date of a patient’s visit, one study group may use Date of Visit, while another uses DT. CDASH specifies exact field names along with recommendations on what they mean, how they should be filled in, and any additional information for sponsors. These include identifiers (SUBJID for subjects, SITEID/SITENO for sites, INVID for investigators, and SPID for sponsor-defined identifiers), timing (VISDAT and VISTIM for the date and time of a visit), and adverse events (AEYN for whether any adverse events occurred, yes or no). Each field is also classified as highly recommended (date of visit), conditional (start time of an adverse event), or optional (time of visit).
SDTM/SEND & SDTMIG/SENDIG - data compilation
After the clinical data is collected, it’s organized into tables for analysis. The Study Data Tabulation Model (SDTM) defines how these tables should be created. SDTM groups data into domains (for example, DM for demographics, AE for adverse events, and VS for vital signs), and each domain is a table in which every row is one observation and every column is a variable. Similar to CDASH, SDTM describes variables, their names, formats, the codes used for each piece of data, and any other information that can be obtained during a trial. SDTM is the most used CDISC standard for regulatory submissions and one of the first to be created.
Example of variables described in SDTM Source: CDISC
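The row-per-observation layout is easiest to see when reshaping collected data. The sketch below assumes one vital-signs record collected with CDASH-style identifiers and a VSDAT date, and turns it into SDTM-style rows for the VS domain. Storing each test as its own column in the collected record, the USUBJID construction, and the units table are assumptions for illustration; a real mapping follows the SDTMIG and the study’s own mapping specification.

```python
from typing import Dict, List

# One collected vital-signs record using CDASH-style identifiers and VSDAT.
# Storing each test as its own column (SYSBP, DIABP, PULSE) is an assumption.
collected = {
    "STUDYID": "ABC-123",
    "SITEID": "01",
    "SUBJID": "001",
    "VSDAT": "2023-05-14",
    "SYSBP": 128,
    "DIABP": 82,
    "PULSE": 71,
}

# Test short name -> (test name, original unit); hypothetical lookup table.
VS_TESTS = {
    "SYSBP": ("Systolic Blood Pressure", "mmHg"),
    "DIABP": ("Diastolic Blood Pressure", "mmHg"),
    "PULSE": ("Pulse Rate", "beats/min"),
}


def to_sdtm_vs(rec: Dict) -> List[Dict]:
    """Reshape one wide record into SDTM-style VS rows, one row per observation."""
    # Assumed convention: USUBJID = STUDYID-SITEID-SUBJID.
    usubjid = f"{rec['STUDYID']}-{rec['SITEID']}-{rec['SUBJID']}"
    rows = []
    for seq, (testcd, (test, unit)) in enumerate(VS_TESTS.items(), start=1):
        rows.append({
            "STUDYID": rec["STUDYID"],
            "DOMAIN": "VS",
            "USUBJID": usubjid,
            "VSSEQ": seq,
            "VSTESTCD": testcd,
            "VSTEST": test,
            "VSORRES": str(rec[testcd]),  # original result, stored as text
            "VSORRESU": unit,
            "VSDTC": rec["VSDAT"],  # already ISO 8601 in this example
        })
    return rows


for row in to_sdtm_vs(collected):
    print(row["USUBJID"], row["VSTESTCD"], row["VSORRES"], row["VSORRESU"])
```

One collected form becomes three rows, so adding a new vital sign later means adding rows, not changing the table’s columns.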
SEND, or the Standard for Exchange of Nonclinical Data, is the same standard applied to nonclinical (animal) studies.
SDTMIG and SENDIG are the SDTM and SEND Implementation Guides: companion documents that spell out how datasets should be compiled according to SDTM or SEND.
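Implementation guides are where rules like these live. As a rough illustration, the sketch below checks two conventions SDTM submissions commonly follow: variable names of at most eight uppercase characters (a limit inherited from the SAS V5 transport format) and --DTC date variables written in ISO 8601. The regular expressions are simplified assumptions, not the official conformance rules; real submissions are checked with dedicated validation tools.

```python
import re
from typing import Dict, List

# Simplified: uppercase letter first, then up to 7 more letters or digits.
NAME_RE = re.compile(r"^[A-Z][A-Z0-9]{0,7}$")
# Simplified ISO 8601: right-truncated partial dates and datetimes only
# (e.g. "2023", "2023-05", "2023-05-14T08:30"); checks shape, not calendar validity.
ISO8601_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2})?)?)?)?$")


def check_rows(rows: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems found in a list of SDTM-style rows."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for name, value in row.items():
            if not NAME_RE.match(name):
                problems.append(f"row {i}: variable name {name!r} breaks the naming rule")
            if name.endswith("DTC") and value and not ISO8601_RE.match(value):
                problems.append(f"row {i}: {name}={value!r} is not ISO 8601")
    return problems


rows = [
    {"USUBJID": "ABC-123-01-001", "VSDTC": "2023-05-14"},
    {"USUBJID": "ABC-123-01-002", "VSDTC": "14/05/2023"},  # wrong date format
    {"USUBJID": "ABC-123-01-003", "VISITDATE": "2023-05"},  # name longer than 8
]
for problem in check_rows(rows):
    print(problem)
```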
ADaM - defining data for analysis
When SDTM datasets are ready, they are used for statistical analysis. The Analysis Data Model (ADaM) restructures them for each specific type of analysis while making sure that the data is correctly transferred, replicated, and traced along the whole data journey. It does that by:
- describing the main principles for creating analysis datasets and metadata
- describing steps for dataset creation and results generation
- describing different types of metadata
- describing dataset content, naming, order of variables, etc.
Analysis results compiled using ADaM principles Source: CDISC
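As a hedged illustration of restructuring for analysis with traceability, the sketch below derives analysis rows from SDTM-style vital-signs records: one row per subject, parameter, and visit, with the analysis value (AVAL), baseline value (BASE), change from baseline (CHG), and SRCDOM/SRCVAR/SRCSEQ pointing back to the source SDTM record. Taking baseline from the record flagged VSBLFL = "Y" is a simplifying assumption; real ADaM specifications define baseline explicitly, and many sponsors leave CHG empty on the baseline row.

```python
from typing import Dict, List, Optional

# SDTM-style VS rows for one subject; VSBLFL = "Y" marks the baseline record.
vs = [
    {"USUBJID": "ABC-123-01-001", "VSSEQ": 1, "VSTESTCD": "SYSBP",
     "VSSTRESN": 140.0, "VISIT": "BASELINE", "VSBLFL": "Y"},
    {"USUBJID": "ABC-123-01-001", "VSSEQ": 2, "VSTESTCD": "SYSBP",
     "VSSTRESN": 131.0, "VISIT": "WEEK 4", "VSBLFL": ""},
    {"USUBJID": "ABC-123-01-001", "VSSEQ": 3, "VSTESTCD": "SYSBP",
     "VSSTRESN": 126.0, "VISIT": "WEEK 8", "VSBLFL": ""},
]


def derive_bds(rows: List[Dict]) -> List[Dict]:
    """Build BDS-style rows (subject x parameter x visit) with BASE, CHG and traceability."""
    # Baseline value per (subject, parameter), taken from the flagged record.
    baseline: Dict[tuple, float] = {
        (r["USUBJID"], r["VSTESTCD"]): r["VSSTRESN"] for r in rows if r["VSBLFL"] == "Y"
    }
    out = []
    for r in rows:
        base: Optional[float] = baseline.get((r["USUBJID"], r["VSTESTCD"]))
        aval = r["VSSTRESN"]
        out.append({
            "USUBJID": r["USUBJID"],
            "PARAMCD": r["VSTESTCD"],
            "AVISIT": r["VISIT"],
            "AVAL": aval,
            "BASE": base,
            "CHG": None if base is None else aval - base,
            # Traceability: which SDTM domain, variable and record the value came from.
            "SRCDOM": "VS",
            "SRCVAR": "VSSTRESN",
            "SRCSEQ": r["VSSEQ"],
        })
    return out


for row in derive_bds(vs):
    print(row["AVISIT"], row["AVAL"], row["BASE"], row["CHG"], row["SRCSEQ"])
```

Keeping the source sequence number on every derived row is what lets a reviewer trace any analysis value back to the exact tabulation record it came from.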
ADaM also defines two types of data structures that are used for dataset submissions and support all main analysis methods: the Subject-Level Analysis Dataset (ADSL) and the Basic Data Structure (BDS).
Data exchange standards
The fundamental standard for data exchange in the clinical trial field is ODM, the Operational Data Model. The ODM framework uses XML to exchange data across the whole healthcare ecosystem: electronic data capture (EDC), electronic health records (EHR), data collection, tabulation, analysis, and archival. This data includes case report forms (CRFs) and datasets in SDTM format.
Like all CDISC data exchange standards, ODM has an XML specification, so it can easily be transformed into PDF, Word, HTML, and other formats. It is the main standard for moving data and metadata between different stages of clinical data management, and all other CDISC data exchange standards are extensions of ODM.
Structure of CDISC ODM-XML Source: Design and generation of Linked Clinical Data Cubes, Laurent Lefort and Hugo Leroux
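To show the nesting in practice, here is a minimal sketch that builds an ODM 1.3 ClinicalData fragment with Python’s standard xml.etree.ElementTree module, following the SubjectData, StudyEventData, FormData, ItemGroupData, ItemData hierarchy. All OIDs, the subject key, and the values are hypothetical, and a complete file would also include a Study element whose MetaDataVersion defines those OIDs.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
ET.register_namespace("", ODM_NS)  # serialize ODM as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def build_odm(subject_key: str, items: dict) -> ET.Element:
    """Build a minimal ODM document holding one form's item values for one subject."""
    odm = ET.Element(q("ODM"), {
        "FileOID": "EXAMPLE.FILE.001",  # hypothetical identifiers throughout
        "FileType": "Snapshot",
        "CreationDateTime": "2023-05-14T10:00:00",
        "ODMVersion": "1.3.2",
    })
    clinical = ET.SubElement(odm, q("ClinicalData"),
                             {"StudyOID": "ST.ABC123", "MetaDataVersionOID": "MDV.1"})
    subject = ET.SubElement(clinical, q("SubjectData"), {"SubjectKey": subject_key})
    event = ET.SubElement(subject, q("StudyEventData"), {"StudyEventOID": "SE.VISIT1"})
    form = ET.SubElement(event, q("FormData"), {"FormOID": "F.VS"})
    group = ET.SubElement(form, q("ItemGroupData"), {"ItemGroupOID": "IG.VS"})
    for item_oid, value in items.items():
        ET.SubElement(group, q("ItemData"), {"ItemOID": item_oid, "Value": str(value)})
    return odm


doc = build_odm("001", {"IT.VS.SYSBP": 128, "IT.VS.DIABP": 82})
print(ET.tostring(doc, encoding="unicode"))
```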
SDM-XML for sharing the design of a study. The Study Design Model is a standard for sharing a study’s design, namely its structure, workflow, and timing, which are usually described in the protocol.
Define-XML for sharing metadata. Usually, when sharing clinical data, companies provide an accompanying metadata document, define.pdf or define.doc. Define-XML represents this metadata in a more convenient, machine-readable form and uses links to other documents and information.
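For a rough idea of what such metadata looks like, the sketch below generates a Define-XML-style fragment for a VS dataset: an ItemGroupDef listing its variables in order, and one ItemDef per variable with a data type, length, and label. The OIDs, lengths, and mandatory flags are made up, and the fragment deliberately omits many required attributes and elements, so it would not validate against the Define-XML 2.0 schema.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", ODM_NS)
ET.register_namespace("def", DEF_NS)

# Hypothetical variable metadata: (name, label, data type, length, mandatory)
VS_VARIABLES = [
    ("USUBJID", "Unique Subject Identifier", "text", 20, "Yes"),
    ("VSTESTCD", "Vital Signs Test Short Name", "text", 8, "Yes"),
    ("VSORRES", "Result or Finding in Original Units", "text", 20, "No"),
]


def odm(tag: str) -> str:
    """Qualify a tag with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def describe(parent: ET.Element, text: str) -> None:
    """Attach a Description/TranslatedText child holding a human-readable label."""
    desc = ET.SubElement(parent, odm("Description"))
    ET.SubElement(desc, odm("TranslatedText"), {XML_LANG: "en"}).text = text


def define_fragment(domain: str, description: str, variables: list) -> ET.Element:
    mdv = ET.Element(odm("MetaDataVersion"), {"OID": "MDV.EXAMPLE", "Name": "Example"})
    group = ET.SubElement(mdv, odm("ItemGroupDef"), {
        "OID": f"IG.{domain}", "Name": domain, "Domain": domain, "Repeating": "Yes",
        f"{{{DEF_NS}}}Structure":
            "One record per vital sign measurement per time point per visit per subject",
    })
    describe(group, description)
    # ItemRefs list the dataset's variables in order.
    for order, (name, _label, _dtype, _length, mandatory) in enumerate(variables, start=1):
        ET.SubElement(group, odm("ItemRef"), {
            "ItemOID": f"IT.{domain}.{name}", "OrderNumber": str(order), "Mandatory": mandatory,
        })
    # ItemDefs carry the variable-level metadata.
    for name, label, dtype, length, _mandatory in variables:
        item = ET.SubElement(mdv, odm("ItemDef"), {
            "OID": f"IT.{domain}.{name}", "Name": name, "DataType": dtype, "Length": str(length),
        })
        describe(item, label)
    return mdv


print(ET.tostring(define_fragment("VS", "Vital Signs", VS_VARIABLES), encoding="unicode"))
```

Unlike a define.pdf, a fragment like this can be read by software, for example to check that every variable in a dataset has a matching ItemDef.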
CTR-XML for submitting clinical trials to registries. Clinical Trial Registry XML is used to generate messages for registry submissions, for example to the WHO and EMA clinical trial registries. A single XML document holding all the information can be used to create submissions to several different registries.
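The write-once, submit-many idea can be sketched without any XML at all. In the sketch below, one source record is rendered into one payload per registry through a field mapping; the registry names and all field names are hypothetical stand-ins, not real CTR-XML elements or any registry’s actual schema.

```python
from typing import Dict

# One source record for the trial (hypothetical field names and values).
trial = {
    "protocol_id": "ABC-123",
    "title": "A Study of Drug X in Adults with Hypertension",
    "phase": "Phase 2",
    "enrollment": 240,
}

# Hypothetical mappings: registry field name -> source field name.
REGISTRY_MAPPINGS: Dict[str, Dict[str, str]] = {
    "registry_a": {"TrialID": "protocol_id", "PublicTitle": "title", "Phase": "phase"},
    "registry_b": {"protocol": "protocol_id", "official_title": "title",
                   "target_size": "enrollment"},
}


def render_submissions(source: Dict, mappings: Dict[str, Dict[str, str]]) -> Dict[str, Dict]:
    """Build one payload per registry from the same source record."""
    return {
        registry: {target: source[src] for target, src in fields.items()}
        for registry, fields in mappings.items()
    }


for registry, payload in render_submissions(trial, REGISTRY_MAPPINGS).items():
    print(registry, payload)
```

Because every payload is derived from the same record, a correction made once reaches every registry.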
How to adopt CDISC standards
Below are five maturity levels of data standards capability, as defined by Accenture. Each level has its own set of capabilities, from data management knowledge to available human and technical resources. Find where your organization fits in this model and see the steps that will take it to the next level.
Level 1: Limited use of CDISC standards
Here, standards are only being explored and the organization is not ready for compliance. Only a handful of people are familiar with CDISC and data management processes. The company relies on spreadsheets and decentralized data storage, and lacks task-specific tools.
Goal: Start the adoption of CDASH and SDTM in your therapeutic area
- Conduct CDISC training
- Use CDISC PRM in designing CRFs
- Implement Controlled Terminology
- Start managing metadata in any format
- Perform data mapping according to SDTM
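Implementing Controlled Terminology often starts with a mapping step like the one sketched below, which normalizes free-text values from legacy CRFs to CDISC submission values and flags anything it cannot map for manual review. It assumes the M, F, and U terms from the CDISC Sex codelist; the synonym table is a hypothetical example of what a team might build from its own data.

```python
from typing import Dict, List, Optional, Tuple

# Subset of CDISC Controlled Terminology submission values for SEX.
SEX_CT = {"M", "F", "U"}

# Hypothetical synonyms found on legacy CRFs -> CT submission value.
SEX_SYNONYMS: Dict[str, str] = {
    "male": "M", "m": "M", "man": "M",
    "female": "F", "f": "F", "woman": "F",
    "unknown": "U", "u": "U",
}


def map_sex(raw_values: List[str]) -> Tuple[List[Optional[str]], List[str]]:
    """Map raw values to CT terms; unmapped values become None and are listed for review."""
    mapped = [SEX_SYNONYMS.get(v.strip().lower()) for v in raw_values]
    needs_review = [raw for raw, term in zip(raw_values, mapped) if term not in SEX_CT]
    return mapped, needs_review


mapped, needs_review = map_sex(["Male", " f ", "N/A", "Woman"])
print(mapped)        # ['M', 'F', None, 'F']
print(needs_review)  # ['N/A']
```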
Level 2: CDISC standards for regulatory submissions
This is where most pharmaceutical organizations are. They have implemented one or a few standards to comply with regulators, but they have no specialized data management staff, record metadata manually, and remain siloed.
Goal: Clinical data management practice
- Define and assign data management roles
- Invest in an electronic data capture (EDC) tool with CDISC support, or implement a custom CDISC-compliant solution tailored to your needs
- Design or build a relational database for eCRFs
- Adopt Define-XML
- Implement quality control
Level 3: Operational efficiency enhanced by centralized metadata repositories
These organizations use CDISC standards for more than submissions, have a standards governance team, and manage metadata using standards.
Goal: Full clinical data management
- Implement validation checks in your EDC
- Invest in CDM training or invite data management experts
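Validation (edit) checks in an EDC typically run as data is entered and raise queries back to the site. The sketch below shows two common kinds, a required-field check and a date-consistency check, on an adverse-event form that uses CDASH-style field names (AETERM, AESTDAT, AEENDAT). Which fields count as required here is an assumption, and a production EDC would express such rules in its own configuration rather than in ad hoc code.

```python
from datetime import date
from typing import Dict, List

# Hypothetical rule set: fields this example AE form treats as required.
REQUIRED_FIELDS = ("SUBJID", "AETERM", "AESTDAT")


def edit_checks(form: Dict[str, str]) -> List[str]:
    """Return query messages for one adverse-event form entry (empty list = clean)."""
    queries = [f"{name} is missing" for name in REQUIRED_FIELDS if not form.get(name)]

    start, end = form.get("AESTDAT"), form.get("AEENDAT")
    if start and end:
        try:
            # Consistency check: an adverse event cannot end before it starts.
            if date.fromisoformat(end) < date.fromisoformat(start):
                queries.append("AEENDAT is before AESTDAT")
        except ValueError:
            queries.append("AESTDAT/AEENDAT must be YYYY-MM-DD dates")
    return queries


entry = {"SUBJID": "001", "AETERM": "Headache", "AESTDAT": "2023-05-10", "AEENDAT": "2023-05-08"}
print(edit_checks(entry))
```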
Level 4: Standards-based, Metadata-driven, End-to-End Automated Clinical Data Lifecycle
Here, standards are implemented along the whole clinical study process, metadata collection is automated, and the organization has a dedicated data management team with established workflows.
Goal: Enable semantic technology
- Adopt metrics for commonly requested data
- Push for enterprise-wide data management
Level 5: Data-driven Business Process enabled by Semantic Technology
This organization uses standards to interact with other systems, such as EHRs, and with other standards. Although it is the ultimate goal, this level is rarely pursued: most organizations decide to stay at levels 2 and 3.
Beyond standardization
...Which is understandable.
If the FDA hadn’t started requiring CDISC support in 2016, many organizations would still be using legacy systems and inconsistent data. You, reading this, might not be interested in regulatory compliance in the first place. But adopting new standards means changing infrastructure, workflows, and process governance, and that is rarely done on a company’s own initiative.
What can make this process easier is using tools that already follow CDISC principles, or building a tool of your own. Either way, adopting even a few crucial standards will bring researchers closer to innovation and to the wider research community.