Clinical Data Management: Roles, Steps, and Software Tools

About 3.6 million data points are generated during Phase III clinical trials, which is three times more than 15 years ago. The Phase III stage has been conducted the same way for decades: It happens before approving the medication or therapy approach, involves up to 3,000 participants, and can last for several years. So, what has caused such a dramatic rise in data volume?

Some key contributors to this growth are extensive drug development projects targeting rare diseases, the use of biomarkers and genetic data, and the growing number of patient data sources, from web-based questionnaires to wearables. No matter the cause, clinical researchers need to harness this information and make the most of it. Clinical data management emerged to address this challenge.

What is clinical data management

Clinical data management (CDM) is a set of practices that handle information generated during medical research. It aims to ensure data quality, integrity, and compliance with internal protocols and state regulations.

The CDM process also helps keep key clinical trial stakeholders on the same page:

1. Sponsors: pharmaceutical companies, institutions, and other organizations that initiate, monitor, and finance the trial.

2. CROs (contract research organizations): research entities hired by the sponsor to plan and run the investigation.

3. Sites: centers coordinating data collection from trial participants.

CDM plays a crucial role in evaluating the safety and effectiveness of drugs, diets, medical devices, digital therapeutics tools, and other types of treatment, diagnostics, or methods to prevent health problems. If properly handled, it significantly reduces the time required for a new medical product launch.

Clinical data management process: roles and responsibilities of team members

CDM activities start early in the clinical trial process once the trial protocol, describing the study objectives and methodology, is designed. As a rule, data-related responsibilities are allocated across

a clinical data manager who supervises the entire CDM process;
a database programmer or designer;
data entry associates;
a medical coder who translates diagnoses, procedures, adverse events, and other health data into industry-specific codes; and
a quality control associate.

Now, let’s see how data management unfolds and who does what at each stage.

Stages and roles in clinical data management

Data management plan design

Experts in charge: data manager, database designer

A data management plan or DMP is a document detailing all procedures, tasks, milestones, and deliverables throughout the CDM lifecycle. It gives a roadmap on how to work with information and handle possible risks. Another important function is to clearly communicate what happens in the course of the trial to each stakeholder.

The DMP typically describes the following aspects:

data to be gathered from trial participants,
existing data that can be integrated,
data formats,
metadata and its standards,
storage and backup methods,
security measures to protect confidential information,
data quality procedures,
responsibility assignments across team members,
access and sharing mechanisms and limitations,
long-term archiving and preservation procedures,
the cost of data preparation and archiving, and
compliance with relevant regulations and requirements.

The DMP must be ready at the trial design stage, before the first participant is enrolled. This ensures that data is collected in the correct format and properly organized. However, the plan is not immutable: It has to be updated throughout the trial to capture any changes that influence data management.

eCRF or electronic case report form design

Experts in charge: data manager, database designer

The case report form is a printed or electronic questionnaire for collecting data from study participants and reporting it to trial sponsors. The document is created specifically for each research project in accordance with

the trial protocol, and
recommendations of the Clinical Data Acquisition Standards Harmonization (CDASH) standard, developed by the Clinical Data Interchange Standards Consortium (CDISC) to streamline industry-wide data exchange. For example, CDASH recommends collecting dates in the unambiguous DD-MON-YYYY format. (Read our article on CDISC standards to learn more.)

Since the early 1990s, electronic CRFs (eCRFs) have been gradually replacing their paper-based counterparts, which leads to faster data collection and better data quality. As of 2020, 84 percent of sites, sponsors, and CROs had either gone paperless or planned to make the transition soon.

An example of an eCRF structure

Well-designed case report forms collect only data necessary for the particular study, avoiding any redundancy. The fields to be filled in may include

demographics (age, gender),
basic measurements (height, weight),
vital signs (blood pressure, temperature, etc.) captured at various time points,
lab exams,
medical history,
adverse events, and
more, based on the research requirements.

Data managers create data entry screens and eCRF layouts in collaboration with a database programmer. The design usually goes through several review cycles before finalization.

Clinical trial database design

Experts in charge: database designer, data manager

A clinical trial database is a set of data collected during the study and organized in rows and columns. It’s designed with the CRF structure in mind. In other words, the database incorporates a questionnaire schema of the case report forms.

Some questionnaire data can be coded into meaningful categories to save storage space. In this case, database specialists document in detail how to decode the values, that is, how to map each code back to an element of the CRF.

Combined coding approach. Source: Applied Clinical Trials
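To make the idea of coded categories concrete, here is a minimal sketch of a codelist and its decoding rule. The codelist name, its values, and the helper functions are hypothetical and are not taken from any particular CDMS.

```python
# Hypothetical codelist: integer codes stored in the database,
# mapped to the answer options shown on the CRF.
SEVERITY_CODELIST = {1: "Mild", 2: "Moderate", 3: "Severe"}


def encode(label: str, codelist: dict[int, str]) -> int:
    """Return the stored code for a CRF answer label."""
    reverse = {text: code for code, text in codelist.items()}
    return reverse[label]


def decode(code: int, codelist: dict[int, str]) -> str:
    """Map a stored code back to the CRF answer label."""
    return codelist[code]


stored = encode("Moderate", SEVERITY_CODELIST)
print(stored, decode(stored, SEVERITY_CODELIST))
```

Keeping the codelist in one place means the same mapping serves both data entry and later decoding, which is the purpose of the decoding documentation mentioned above.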

Before launching into the production (research) environment, the database is tested with dummy data in a secure, non-study site.

Electronic data capture in clinical trials

Experts in charge: clinicians, data manager, data entry associate, medical coder

As mentioned above, CRFs are the main instrument for data capture in clinical trials. The information for report forms is traditionally gathered by clinicians or data entry associates from participants when they visit medical facilities. Yet, in recent years, medical sites have stopped being the primary point of data capturing. Nowadays, details for trials are also extracted from

Electronic Health Records (EHRs),
medical devices (blood pressure monitors, ECG recording machines, and others),
a laboratory information management system (LIMS), and
patient-reported outcomes (PROs) or any descriptions of health conditions that come directly from patients without mediation and interpretation from medical experts.

With paper-based questionnaires, the data is manually entered into printed forms and then transcribed into the database. In the case of eCRFs, information is captured electronically from the start, and some fields can be populated automatically, with data transmitted from medical devices or EHRs by robotic process automation (RPA) tools.

While transcribing data into electronic databases, medical terms from the CRF should be translated into standardized codes. Medical coding ensures data consistency and prevents duplication. For instance, investigators might describe the same health issue or medication using various terms, but medical coding consolidates them into a set of letters and numbers. Commonly, the Medical Dictionary for Regulatory Activities (MedDRA) is used for coding illnesses, and the World Health Organization–Drug Dictionary Enhanced (WHO-DDE) is used for coding medications.
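The consolidation of different verbatim terms into one code can be sketched as a simple lookup. The synonym table and the codes below are invented for illustration and are not real MedDRA or WHO-DDE entries; licensed dictionaries and coding tools work with far larger term hierarchies.

```python
from typing import Optional, Tuple

# Hypothetical synonym table: verbatim terms mapped to one preferred term
# and a made-up code. These are NOT real MedDRA or WHO-DDE codes.
SYNONYMS = {
    "headache": ("Headache", "X0001"),
    "head pain": ("Headache", "X0001"),
    "cephalalgia": ("Headache", "X0001"),
    "nausea": ("Nausea", "X0002"),
    "feeling sick": ("Nausea", "X0002"),
}


def code_term(verbatim: str) -> Optional[Tuple[str, str]]:
    """Return (preferred_term, code), or None when manual coding is needed."""
    # Normalize case and whitespace before the lookup.
    key = " ".join(verbatim.lower().split())
    return SYNONYMS.get(key)


for term in ["Head pain", "CEPHALALGIA", "dizzy spells"]:
    print(term, "->", code_term(term))
```

Terms that return None would be routed to a medical coder for manual review in this sketch.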

All information that made it to the forms and then to the clinical database must go through the data validation process.

Data validation

Experts in charge: data manager, database designer, quality control associate

Clinical data validation is a series of quality tests to ensure the accuracy, consistency, legibility, and integrity of information. This includes the following steps.

Electronic edit checks. Edit checks are created by a database designer and embedded into eCRFs to automatically compare inputs against numerical and logical criteria. This prevents implausible values from appearing in the document. Say, a check for the body temperature field may flag all entries lower than 95 or higher than 105 if the system expects measurements in Fahrenheit.
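The body temperature example could look roughly like this in code. The function name and message format are assumptions made for illustration; real EDC systems usually define such checks in their own configuration language.

```python
from typing import Optional

# Plausible range in Fahrenheit, taken from the example above.
TEMP_F_MIN, TEMP_F_MAX = 95.0, 105.0


def check_temperature_f(value: float) -> Optional[str]:
    """Return a warning message for out-of-range values, or None if the value passes."""
    if value < TEMP_F_MIN or value > TEMP_F_MAX:
        return f"Temperature {value} F is outside the expected range {TEMP_F_MIN}-{TEMP_F_MAX} F"
    return None


print(check_temperature_f(98.6))  # passes, so no message
print(check_temperature_f(37.0))  # likely entered in Celsius, so it is flagged
```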

Discrepancy management. It involves reviewing and investigating data discrepancies, resolving them with documentary proof, or declaring them irresolvable. This process cleans the data and gathers evidence for deviations. Almost all clinical data management systems (CDMS) have a discrepancy database. Inconsistencies and errors are flagged to the investigator for clarification.

Part of discrepancy management is generating queries for data discrepancies and communicating these queries to site staff or data collectors. For instance, if a patient's recorded visit date is after their recorded discharge date, a query is generated to resolve this inconsistency. The site staff is supposed to correct the dates or provide an explanation. All checking activities must be properly logged.
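As a rough sketch of automated query generation, the date example above can be expressed as a rule that produces a query object for site staff. The Query class and its fields are hypothetical and do not reflect the data model of any specific CDMS.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Query:
    """A hypothetical query record sent to site staff for clarification."""
    subject_id: str
    message: str
    status: str = "open"


def check_visit_before_discharge(
    subject_id: str, visit_date: date, discharge_date: date
) -> Optional[Query]:
    """Raise a query when the visit date falls after the discharge date."""
    if visit_date > discharge_date:
        return Query(
            subject_id,
            f"Visit date {visit_date.isoformat()} is after discharge date "
            f"{discharge_date.isoformat()}; please correct or explain.",
        )
    return None


query = check_visit_before_discharge("001", date(2024, 3, 12), date(2024, 3, 10))
print(query)
```

In this sketch the query stays open until the site corrects the dates or explains the deviation, and each status change would be logged.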

Source data verification (SDV). SDV is a process of checking CRF entries against original medical records and other source files. The aim of this step is to confirm that an eCRF contains all relevant information and truly represents a participant’s profile.
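In practice, SDV is largely a manual review by monitors, but the underlying comparison can be illustrated with a short sketch. The field names and record layout here are assumptions for the example.

```python
_MISSING = object()  # sentinel for fields absent from the eCRF


def verify_against_source(crf: dict, source: dict) -> list[str]:
    """Return the source fields that are missing from the eCRF or differ from it."""
    return sorted(
        field
        for field, value in source.items()
        if crf.get(field, _MISSING) != value
    )


crf_entry = {"weight_kg": 70, "height_cm": 175}
source_record = {"weight_kg": 72, "height_cm": 175, "sex": "F"}
print(verify_against_source(crf_entry, source_record))
```

Each mismatch reported this way would typically become a discrepancy for the site to resolve.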

Consistency checks. Ensuring data consistency across different data points and forms.

Database lock and data archiving

Experts in charge: data manager, database designer

After the study is completed, the database is locked so that no changes can be made to the information. After that, clean data is submitted to stakeholders for statistical analysis, reporting, and, finally, publication of the results. However, all these steps are beyond the clinical data management workflow.
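Conceptually, a database lock is a write guard: once it is set, reads still work but every change is rejected. The class below is a toy in-memory sketch under that assumption, not a description of how any commercial CDMS implements locking.

```python
class DatabaseLockedError(RuntimeError):
    """Raised when a write is attempted after the database lock."""


class TrialDatabase:
    """A toy in-memory store that rejects writes once it is locked."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}
        self._locked = False

    def save(self, subject_id: str, record: dict) -> None:
        if self._locked:
            raise DatabaseLockedError("Database is locked; no changes allowed")
        self._records[subject_id] = dict(record)

    def get(self, subject_id: str) -> dict:
        # Return a copy so callers cannot modify stored data in place.
        return dict(self._records[subject_id])

    def lock(self) -> None:
        self._locked = True


db = TrialDatabase()
db.save("001", {"weight_kg": 70})
db.lock()
try:
    db.save("001", {"weight_kg": 75})
except DatabaseLockedError as error:
    print(error)
```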

All essential documents and trial supplies must be archived for at least three years. This enables post-trial monitoring and evaluation as well as trial reconstruction to facilitate further research.

Clinical data protection and anonymization

Clinical data contains highly sensitive information about the health of study participants and is therefore subject to strict privacy regulations. In the US, these are primarily the Health Insurance Portability and Accountability Act (HIPAA) and FDA 21 CFR Part 11.

The former is a regulation protecting patient health information through privacy and security measures. Title 21 of the CFR (the Code of Federal Regulations) regulates food and drugs produced or consumed in the US. Specifically, Part 11 sets rules for ensuring the security and confidentiality of electronic records used by companies subject to oversight by the FDA (the Food and Drug Administration).

Following these documents, all data collected for clinical trials involving patients must be anonymized. This means removing or changing all elements of protected health information (PHI) that can link the document to a particular person.

Anonymization may include the following steps.

Removing direct identifiers. This means deleting or masking names, social security numbers, and contact information. For example, “John Doe, born on 05/15/1980, living at 123 Main St.” becomes “Patient A, born in May 1980, living in ZIP code 12345.”
De-identification. These methods include generalizing data elements, like replacing age with age range, and employing statistical techniques like suppressing outlier values or using grouped averages.
Randomization. This step involves altering data to maintain statistical validity while preventing traceability back to individuals. For instance, birth dates can be changed to random but plausible dates within the same month.
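The three steps above can be sketched with a few small helpers. The field names, the ten-year age bands, and the helper functions are assumptions chosen for illustration; this is not a HIPAA-compliant de-identification procedure.

```python
import calendar
import random
from datetime import date

# Hypothetical set of field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "street_address"}


def remove_direct_identifiers(record: dict) -> dict:
    """Drop fields that identify a person directly."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def age_range(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def randomize_day_within_month(d: date, rng: random.Random) -> date:
    """Replace the day with a random valid day of the same month and year."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=rng.randint(1, last_day))


record = {"name": "John Doe", "ssn": "000-00-0000", "age": 44, "zip": "12345"}
clean = remove_direct_identifiers(record)
clean["age"] = age_range(clean["age"])
print(clean)
print(randomize_day_within_month(date(1980, 5, 15), random.Random()))
```

Passing the random generator in as a parameter makes the date shift reproducible in tests when a fixed seed is used.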

HIPAA permits different levels of anonymization based on the sensitivity of the clinical dataset. The more sensitive the data, the more restricted access is. Fully anonymized datasets can be made publicly available, while more sensitive data is accessible only to a limited number of researchers under a Data Use Agreement required by HIPAA.

Clinical data management software and tools

Spreadsheets and common office programs are not enough to address all the challenges of clinical data management. You need software that can handle large volumes of documents and is customized for medical studies: a clinical data management system (CDMS), also called an electronic data capture (EDC) system.

CDMSs can be specifically tailored for psychiatry studies, medical device clinical trials, drug development, or other cases. However, they all have common features covering basic data management operations.

21 CFR Part 11 compliance. Compliance with this regulation means that the technology has built-in mechanisms for data security and traceability, namely

access controls ensuring that only authorized users, under unique IDs (electronic signatures), can enter the system and work with data;
forced periodic password resets; and
audit trails, or keeping chronological records of all operations and changes to the database.

All these precautions are taken to guarantee that the data produced by the system can be trusted.
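To illustrate what an audit trail records, here is a minimal sketch of an append-only change log. The class names and the fields captured are assumptions; a Part 11-compliant system would also cover authentication, electronic signatures, and tamper protection, which this example does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record of who changed what, and when."""
    timestamp: datetime
    user_id: str
    field: str
    old_value: Any
    new_value: Any


class AuditedRecord:
    """A record whose every update is appended to a chronological trail."""

    def __init__(self, values: dict) -> None:
        self._values = dict(values)
        self._trail: list[AuditEntry] = []

    def update(self, user_id: str, field: str, new_value: Any) -> None:
        old_value = self._values.get(field)
        self._values[field] = new_value
        self._trail.append(
            AuditEntry(datetime.now(timezone.utc), user_id, field, old_value, new_value)
        )

    @property
    def trail(self) -> tuple:
        # Expose a read-only snapshot of the trail.
        return tuple(self._trail)


rec = AuditedRecord({"weight_kg": 70})
rec.update("user-17", "weight_kg", 72)
for entry in rec.trail:
    print(entry)
```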

eCRF designer. This module provides a set of templates and drag-and-drop functionality to construct eCRF layouts and data entry fields in accordance with standards. Custom forms are then saved for reuse in future projects. The designer also allows for programming edit checks.

Query management. As mentioned above, a query is a request for clarification from trial sponsors to researchers for resolving errors and inconsistencies. The query management feature facilitates communication among data managers, sponsors, and other stakeholders. It helps resolve all questions faster.

Monitoring. Monitoring capabilities include but are not limited to scheduling, adverse event tracking, and automatic notifications to sponsors.

Medical coding support. The component automates code search, mapping across coding systems, and error checking.

Data import and export. Some systems are equipped with powerful data integration tools that facilitate multisite studies. The data export feature, in turn, automatically transforms information from the trial database into formats applicable for statistical analysis and required by trial sponsors and regulatory authorities.
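A very simple form of export is writing selected columns of the trial data to CSV for a statistics package. The sketch below assumes records are plain dictionaries with hypothetical field names; regulatory submissions generally require standardized CDISC formats, which this example does not attempt to produce.

```python
import csv
import io


def export_csv(records: list[dict], columns: list[str]) -> str:
    """Serialize the chosen columns of each record as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip fields that are not being exported
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [
    {"subject_id": "001", "weight_kg": 70, "internal_note": "rechecked"},
    {"subject_id": "002", "weight_kg": 82, "internal_note": ""},
]
print(export_csv(rows, ["subject_id", "weight_kg"]))
```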

Below, we’ll review the most widely used CDMSs accommodating all the above-mentioned features.

Clinical data management software suites, compared

Merative Zelta: a reliable platform for decentralized studies

Used by 4000+ studies

Pricing model: individual plan

Pros: reliability, medical coding with Merative AI (formerly IBM Watson)

Cons: archaic UI, high price

Zelta (formerly IBM Clinical Development) is an end-to-end cloud-based system that captures data from various sources and targets large-scale, decentralized studies. Its capabilities extend beyond the trial data management cycle and cover pre-built or custom analytical reports.

Zelta's strengths include reliability, a large library of pre-built forms, ease of use, and a flexible, modular structure. You can choose and pay for only those features that you really need. The medical coding part is supported by Merative's AI capabilities (formerly IBM Watson).

Keep in mind, though, that you will need a programmer to set up a trial. Among other cons noted by users are the archaic UI, slow customer support, and the high cost of ownership, which puts the system out of reach for startup projects.

Oracle Clinical Research Suite: the best support for paper-based studies

Used by 2000+ studies

Pricing model: individual plan

Pros: stability, integrations with Oracle products

Cons: slow data entry, high price

One of the oldest CDMSs on the market, Oracle Clinical Research is a combination of three integrated subsystems:

Oracle Clinical, which automates trial protocol design, data validation, and reporting;
Oracle Remote Data Capture, an EDC solution with a layout editor to generate collection forms; and
Thesaurus Management System, which standardizes clinical terminology.

Running on top of the Oracle database, the suite demonstrates great stability and is recognized as one of the best systems for paper-based data collection. It seems to be a natural choice for companies already using other Oracle products.

At the same time, its data entry part involves repetitive and tedious manual operations. Also, Oracle isn’t meant for small companies with limited budgets.

Castor EDC: an affordable way to speed up trial builds

Used by 14000+ studies

Pricing model: based on study needs

Pros: ease of setup and use, quick support, affordability

Cons: limited functionality

Castor EDC became popular across 90+ countries due to its proven ability to significantly shorten the duration of a clinical study build. Its intuitive UI simplifies creating eCRFs, assigning study roles, and onboarding new users. Data can be smoothly integrated from various sources, including EHR systems, medical devices, and wearables. Should any problems arise, you’ll get help from an expert in no more than 30 minutes.

The relatively low price makes the platform affordable for small companies. As might be expected, it lacks some features and export formats that larger and more expensive systems typically have.

TrialKit: an intuitive CRF designer for virtual studies

Used by 8000+ studies

Pricing model: based on the number of features

Pros: ease of use, smooth integrations with wearables

Cons: steep learning curve, limited functionality

Aimed at decentralized virtual studies, TrialKit comes in two versions—web-based and mobile. With a drag-and-drop CRF builder and a library of ready-to-use templates, companies can create eCRFs with no programming skills and launch trials in days instead of weeks. In addition to EHRs, the tool easily integrates data from wearables like Fitbit, Apple Watch, etc.

As for cons, learning how to work with the platform takes time. Another major complaint relates to functionality limitations.

How to choose and adopt a CDMS: best practices

Here are some additional tips on how to select and implement a CDMS so that it will contribute to trial success.

Check compatibility and integration options. Make sure that the intended CDMS smoothly links to other platforms you’ll use during the trial, such as an EHR system and a LIMS. In case of compatibility issues, you’ll need to find a tech partner with system integration expertise.

Learn more about the level of support offered by a vendor. This includes training programs, the quality of user documentation provided, and the speed of response in case of any issues. Also, check if the provider will help you with the system deployment, integrations, and development of new features once you need them.

Make sure that all parties are satisfied. Before finalizing your choice and implementing a new platform, try to get feedback from all intended users — CROs, sponsors, staff, and technical support at the research site. Take advantage of a free trial or demo version so that everybody can test the user interface and share their experience.

Start with the core functionality. Redundant features not only inflate your budget but also create additional complexity for staff who have to get familiar with a new system. Later, you can expand functionality as your project grows by adding new modules from the same provider, integrating third-party tools, or using custom development.

Test and validate the system against the eCRF. Once electronic forms are designed, you need to run user acceptance testing (UAT). End users, such as data entry associates, clinicians, and researchers, should determine whether they feel comfortable with the eCRF structure and whether the form contains all the needed fields.

Take your time to set up a database. Quite naturally, trial sponsors want to get the system running in no time. However, it's important to balance speed and risk reduction. Take your time to thoroughly design and test a database before the study starts. Software changes during the trial might be costly and affect data validity.