
Data Assets

The Johns Hopkins Center for Drug Safety and Effectiveness provides a platform for investigators and trainees to access available data resources to conduct robust studies of drug utilization, safety and effectiveness.

The Center's core faculty members are national experts in pharmacoepidemiology, with particular expertise in drug utilization, comparative effectiveness, advanced methods for observational studies, propensity scores, and missing data. Trainees and young investigators work with this rich and diverse group of experts and gain hands-on practical experience by participating in several ongoing studies of drug safety and effectiveness. Our faculty also guide investigators in determining the appropriate data resource for their studies.

Center investigators use a variety of publicly available and proprietary data assets in their scholarly work. Many investigations leverage commercial insurance claims, while others use nationally representative surveys, longitudinal cohort studies, and national audits of ambulatory care, retail pharmacy sales, and pharmaceutical distribution or promotion. Some of these data are publicly available and can be readily used for any purpose, while others are proprietary and licensed under specific data use agreements. The Center for Drug Safety and Effectiveness thus has numerous analytic resources available to support scholarly investigations of drug safety, effectiveness, or utilization; the question is which one is right for the scientific investigation at hand.

Data Assets

  • Merative MarketScan Commercial Claims and Encounters Database

    The Merative™ MarketScan® Research Databases are one of the largest proprietary collections of de-identified US patient data available for healthcare research. They feature more than 264 million covered individuals, 37 billion service records, more than 160 contributing employers, and 40 contributing health plans. MarketScan data may offer distinct advantages over other databases. The data are integrated at the patient level and reflect the real-world continuum and cost of health care, and the large sample size allows for research on unique patient populations. Complete episodes of care can support more inclusive cost and treatment studies. MarketScan data have been used in more than 2,400 studies published in peer-reviewed journals, making them among the most published data sources in the US. These data are managed by the staff of the Center for Drug Safety and Effectiveness, who can be contacted at
  • Medicare Claims + Part D

    From the Centers for Medicare & Medicaid Services (CMS), we have valuable administrative claims data that have diverse potential uses. With these data, investigators can address important questions about patterns of care, quality of care, and the comparative safety and effectiveness of interventions. At Hopkins, we currently hold active data use agreements covering longitudinal files that span more than a decade, including 100% of Medicare inpatient claims and a 5% sample of Medicare outpatient claims. We also have claims for durable medical equipment, home health care, and skilled nursing facilities. The SEER-Medicare data are related files that are also valuable for research. These data link the SEER cancer registry, with its detailed information about cancer diagnoses, pathology, and treatment, to the affected patients' Medicare claims. The SEER-Medicare data also include a matched set of individuals without cancer drawn from corresponding regions of the country. The SEER-Medicare data are purchased for research use by cancer diagnosis, and we have access to many sets, including breast cancer, lung cancer, and prostate cancer. If you are interested in using these data, please contact us.
  • Medical Dictionary for Regulatory Activities 

    The Medical Dictionary for Regulatory Activities (MedDRA) is a clinically validated international medical terminology that facilitates the international sharing of regulatory information on medical products for human use. MedDRA is designed for use in the registration, documentation, and safety monitoring of medicinal products through all phases of the development cycle. MedDRA is a multilingual terminology that allows users to operate in their native languages, including Japanese, Spanish, Portuguese, German, Chinese, and others. Access MedDRA.
  • Minimum Data Set 3.0 Linked Medicare Claims

    The Minimum Data Set (MDS) is a federally mandated nursing home assessment tool that captures information on residents of Medicare- and Medicaid-certified nursing homes. MDS assessments are completed for all residents in certified nursing homes, regardless of the individual resident's source of payment. Assessments are required on admission to the nursing facility, periodically during the stay, and on discharge. All assessments are completed within specific guidelines and time frames. These datasets are linked with Medicare Parts A, B, and D claims data, which capture information on healthcare utilization and costs. We have access to these data through the Centers for Medicare & Medicaid Services Virtual Research Data Center (VRDC). If you are interested in using these data, please contact us.
  • SEER-Medicare Linked Database

    The SEER-Medicare data reflect the linkage of two large population-based sources of data that provide detailed information about Medicare beneficiaries with cancer. The data come from the Surveillance, Epidemiology and End Results (SEER) program of cancer registries, which collect clinical, demographic, and cause-of-death information for persons with cancer, and from the Medicare claims for covered health care services from the time of a person's Medicare eligibility until death. We currently have data for eleven cancer sites: breast, bladder, colorectal, esophagus, lung, ovary, kidney, pancreas, prostate, stomach, and uterus. If you are interested in using these data, please contact us.



  • IQVIA National Prescription Audit

    The National Prescription Audit™ (NPA) is an industry-standard source of national prescription activity for all pharmaceutical products. NPA measures demand for prescription drugs, including both what the provider prescribes in the retail setting and what is ultimately dispensed to consumers across four unique channels. From the selected pharmacies, IQVIA collects new and refilled prescriptions for every day of the month. Data are available on a monthly and weekly basis at varying levels of depth. For example, data can be analyzed and stratified by patient age, patient gender, co-payment, and four methods of payment. NPA is useful for addressing a variety of research topics on pharmaceuticals, especially investigations that focus on prescription drug utilization, Rx size, and average consumption, and it reports on more than 90 prescriber specialty groupings representing over 170 specialties. The NPA captures over 70% of all prescription activity in the United States, including Alaska and Hawaii, and covers all products, classes, and manufacturers.
  • IQVIA National Disease and Therapeutic Index

    The National Disease and Therapeutic Index™ (NDTI) is a monthly audit of office-based physicians that provides information regarding patterns and treatment of disease in the continental United States. For each patient seen during a consecutive two-day period each calendar quarter, participating physicians complete an encounter form that includes information about diagnoses and drug therapies. Each record of a drug therapy within the NDTI is linked to a specific six-digit taxonomic code capturing diagnostic information similar to the International Classification of Diseases 9th Revision (ICD-9). In addition to detailed characteristics regarding therapies prescribed, NDTI also contains information about patients (e.g., demographics, location of visit, insurance type, basic health statistics), and physicians (e.g., specialty, age, region).
  • IQVIA National Sales Perspectives

    The National Sales Perspectives™ (NSP) is considered the industry standard for measuring pharmaceutical spending because it captures 100% of the total U.S. pharmaceutical market, measuring sales at actual transaction prices rather than at an average wholesale price. Given this complete coverage, a variety of healthcare policy setters and decision makers use the NSP to monitor and assess national sales.
  • IQVIA Integrated Promotional Services

    IQVIA Integrated Promotional Services™ (IPS) measures total promotional activity for pharmaceutical products from office-based and hospital-based physicians. IPS provides an understanding of what, how, when and how much promotional activity is occurring for pharmaceutical products (e.g., office detailing, free samples, medical journal advertising, direct-to-consumer advertising).


National Center for Health Statistics Datasets

  • National Ambulatory Medical Care Survey (NAMCS)

    The National Ambulatory Medical Care Survey (NAMCS) is a national survey designed to meet the need for objective, reliable information about the provision and use of ambulatory medical care services in the United States. Findings are based on a sample of visits to non-federally employed office-based physicians who are primarily engaged in direct patient care.
  • National Hospital Ambulatory Medical Care Survey (NHAMCS)

    The National Hospital Ambulatory Medical Care Survey (NHAMCS) is designed to collect data on the utilization and provision of ambulatory care services in hospital emergency and outpatient departments. Findings are based on a national sample of visits to the emergency departments and outpatient departments of noninstitutional general and short-stay hospitals.
  • National Health and Nutrition Examination Survey (NHANES)

    The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations.
  • Medical Expenditure Panel Survey (MEPS)

    The Medical Expenditure Panel Survey (MEPS), which began in 1996 and is conducted by the Agency for Healthcare Research and Quality (AHRQ), is a set of large-scale surveys of families and individuals, their medical providers (doctors, hospitals, pharmacies, etc.), and employers across the United States. MEPS collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of health insurance held by and available to U.S. workers.


Johns Hopkins Clinical Databases

Center faculty also work with a variety of data derived from the clinical operations of Johns Hopkins Medicine. These include detailed patient-level health care utilization and expenditure data for hundreds of thousands of individuals, ranging from patient demographic information to clinical health service utilization to detailed billing and insurance coverage information.

Center for Clinical Data Analysis

The Center for Clinical Data Analysis (CCDA) is chartered to provide exploratory data access, data extraction, and development support to the clinical research community. The CCDA is intended to be an Honest Broker between JHM operational data and interested parties in the research community.

The CCDA recognizes the need for enhanced self-service data research capabilities in addition to traditional data extraction engagements. i2b2 (Informatics for Integrating Biology and the Bedside) is an NIH-funded informatics framework that enables patient cohort discovery using clinical data. ICTR and CCDA are currently deploying i2b2 as a platform that lets authorized Johns Hopkins Medicine researchers expedite patient cohort discovery before the IRB approves access to a fully identified data set.

The Patient Centered Outcomes Research Initiative (PCORI) draws on CCDA expertise in clinical domains, data extraction, systems architecture, and open-source systems and software to study specific patient cohorts in collaboration with several external institutions. This virtual network will aggregate specific patient populations to provide a cohort against which institutional practice variations can be evaluated.

For more information, please visit the center's official page.


Johns Hopkins Longitudinal Cohort Studies

  • The Multicenter AIDS Cohort Study (MACS)

    The Multicenter AIDS Cohort Study (MACS) is an ongoing prospective study of the natural and treated histories of HIV-1 infection in homosexual and bisexual men, conducted at sites in Baltimore, Chicago, Pittsburgh, and Los Angeles. From April 1984 through March 1985, 4,954 men were enrolled; an additional 668 men were enrolled from April 1987 through September 1991. A third enrollment of 1,350 men took place between October 2001 and August 2003. This third cohort augments research on the long-term benefits and adverse effects of therapy.
  • AIDS Linked to Intravenous Experience (ALIVE) Study

    The AIDS Linked to Intravenous Experience (ALIVE) Study is a long-standing community-based research effort that includes past and current injection drug users (IDUs). When the study started, its primary objectives were to characterize the incidence and natural history of HIV among injection drug users. It has since evolved to include characterization of access to and impact of treatment for HIV, evaluation of non-AIDS outcomes in an aging cohort, and ascertainment of the incidence, natural history, and treatment of co-infections such as hepatitis C virus.
  • Atherosclerosis Risk in Communities (ARIC) Study

    The Atherosclerosis Risk in Communities (ARIC) Study is a prospective epidemiologic study conducted in four US communities, including Washington County, Maryland. ARIC was originally designed to investigate the etiology and natural history of atherosclerosis, the etiology of clinical atherosclerotic diseases, and variation in cardiovascular risk factors, medical care, and disease by race, gender, location, and date. ARIC data have also become an important resource for the study of diabetes, kidney disease, and other chronic diseases.
  • North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD)

    The North American AIDS Cohort Collaboration on Research and Design (NA-ACCORD) began in 2006 as the North American regional representative of the International epidemiologic Databases to Evaluate AIDS (IeDEA). Comprising 25 collaborating cohorts, NA-ACCORD is designed to be widely representative of HIV care in the United States and Canada. Over 200 sites contribute data on over 130,000 HIV-infected and 150,000 HIV-uninfected participants.