Dataset Summary
Overview
CMS offers a wide variety of data products on Medicare enrollees. The focus of this brief review is Medicare claims data, which represent claims for the various types of services that Medicare pays for, including inpatient and outpatient utilization, prescription drug purchases through the Part D program, home health services, and more. CMS collects several other types of data as well, for example, the Medicare Current Beneficiary Survey (MCBS, a national, longitudinal panel survey), CAHPS survey data, HEDIS data, and extended data on nursing home residents (the Minimum Data Set, or MDS). These types of data are not covered in this review.
Who is in the data, and how are they followed?
Medicare data cover the full population of Medicare beneficiaries, including the vast majority of adults age 65 and older in the United States, as well as selected younger populations covered under Medicare, including people with end-stage renal disease and permanent disability. Medicare claims data are based on billing data, so people are followed to the extent that they use health services paid for by the Medicare program. Because of the massive scope of the data, Medicare has created research products that do not require obtaining the full dataset, including 5% random samples of Medicare beneficiaries.
What kind of information is in the data?
There are 4 main types of Medicare claims data: beneficiary enrollment information, Part A, Part B, and Part D.
- Beneficiary enrollment information (the Master Beneficiary Summary File, or MBSF) includes dates of Medicare enrollment, services in which the patient is enrolled, and basic demographic information (e.g., age, sex, race, ZIP code). This file also includes a series of indicators for the presence of a variety of chronic conditions, each defined using criteria applied to other Medicare files, and indicators for a series of “other chronic or potentially disabling conditions” including disability-related conditions, mental health, and substance use disorders.
- Part A includes information on inpatient utilization, including summary information from hospitalizations, detailed hospital claims, and claims information for Medicare-related expenses from skilled nursing facilities (for example, the post-hospitalization SNF benefit). Part A also covers the Medicare hospice benefit, and claims information for hospice services is available.
- Part B includes information from ambulatory visits, including claims from community-based and hospital-based doctors’ offices, as well as claims from other ambulatory services such as billing data from clinical laboratories and suppliers of durable medical equipment. Outpatient IV medications that are administered in doctors’ offices or infusion centers are also typically covered under Medicare Part B. These can be evaluated from Part B data files using Healthcare Common Procedure Coding System (HCPCS) codes.
- Part D includes information from the Medicare prescription drug benefit, in particular claims for drugs dispensed in outpatient settings. Data include information such as the name of the medication dispensed, dose, quantity supplied, days supplied, and cost/payment data. Most drugs dispensed in inpatient settings are covered under Medicare Part A bundled payment mechanisms and cannot be evaluated using Medicare data sources.
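As a rough illustration of the HCPCS-based approach mentioned under Part B, the sketch below counts, for each beneficiary, the line items that carry a drug code of interest. It is a minimal sketch under stated assumptions: the records are made up, the field names (`bene_id`, `service_date`, `hcpcs`) are hypothetical stand-ins for whatever the actual file layout uses, and the two codes are placeholders that should be checked against the current HCPCS code set before any real use.

```python
from collections import defaultdict

# Hypothetical, simplified Part B line items. Real files contain many more
# fields, and the field names used here are assumptions for illustration.
part_b_lines = [
    {"bene_id": "A001", "service_date": "2015-01-10", "hcpcs": "J1745"},
    {"bene_id": "A001", "service_date": "2015-03-07", "hcpcs": "J1745"},
    {"bene_id": "A001", "service_date": "2015-03-07", "hcpcs": "99213"},
    {"bene_id": "B002", "service_date": "2015-02-02", "hcpcs": "J9035"},
    {"bene_id": "C003", "service_date": "2015-02-15", "hcpcs": "99214"},
]


def administrations_by_beneficiary(lines, drug_codes):
    """Count line items per beneficiary whose HCPCS code is in drug_codes."""
    # Normalize case and whitespace so that " j1745" matches "J1745".
    wanted = {code.strip().upper() for code in drug_codes}
    counts = defaultdict(int)
    for line in lines:
        if line["hcpcs"].strip().upper() in wanted:
            counts[line["bene_id"]] += 1
    return dict(counts)


# Placeholder code list; verify against the current HCPCS code set.
codes_of_interest = {"J1745", "J9035"}
print(administrations_by_beneficiary(part_b_lines, codes_of_interest))
```

Beneficiaries with no matching line items (here, C003) simply do not appear in the result, so a real analysis would still need the enrollment file to define the denominator.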
Many Medicare data files are available at several levels of detail: Research Identifiable Files (RIF), Limited Data Set (LDS) files, and Public Use Files (PUF), in descending order of detail. The more detailed versions cost more and are subject to more stringent data security and regulatory requirements, but in most cases they are the only way of obtaining patient-level data.
Detailed information about Medicare claims data can be found at the Research Data Assistance Center (ResDAC), which contracts with CMS to support research using CMS data. They have extensive online documentation and a very helpful help desk.
Practical issues with acquiring and using Medicare claims data
Medicare data are a wonderful resource but are very complex. The files are very large, and the data fields were created for billing rather than research purposes, often requiring extensive work to understand and manipulate the available data to make it interpretable and clinically relevant. As a result, these data are best used as a long-term resource for an investigator or research group that can invest substantial start-up time and analyst effort and are less well-suited for a one-off project by someone who has little prior experience with the data. Data are typically available approximately 2 to 3 years after they are collected.
The process for obtaining Medicare data depends on the level of individual detail available in the files requested (see above). Most research questions will require patient-level Research Identifiable Files (RIF). Requests for these data require entering into a data use agreement and are reviewed by a CMS privacy board, a process that can take 6-12 months to complete. Costs per file vary widely depending on the file and number of subjects requested, and typically range from several hundred dollars to more than $10,000 per year of data. CMS can also create files tailored to the specifications of the investigator (e.g. including only patients who have a specific diagnosis); this costs anywhere from no additional fee to several thousand dollars depending on the extent of work that CMS has to do to prepare the files.
Areas of Particular Interest for Research on Function and Disability in Vulnerable Populations
For the most part, Medicare claims data lack direct measures of function and physical disability, since claims are mostly based on ICD-9 diagnosis codes and CPT procedural codes, which are not well-suited to measuring these phenomena. However, the data can be used in creative ways to study function, disability, and frailty. For example, investigators have developed highly accurate algorithms for identifying patients with frailty using Medicare claims data, using information such as diagnoses, purchases of frailty-associated durable medical equipment (e.g., walkers, home hospital beds), and so forth [1-3]. The advent of ICD-10 is also providing new opportunities to capture function and disability in claims data better than was done in ICD-9, although it is too early to tell how accurate such claims will be. Medicare claims data are very well-suited to evaluating issues related to multimorbidity through the availability of multiple sources of diagnostic information, and for evaluating outcomes given essentially complete data on hospitalization, other health services use, and mortality in older adults in the United States. Another advantage of Medicare data is the ability to follow patients over time for longitudinal or exposure-outcome studies.
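To make the idea of a claims-based frailty indicator more concrete, the following toy sketch flags which frailty-associated indicators appear in each beneficiary's claims and counts them. It is not any of the published algorithms cited above, which are validated against measured function and are considerably more elaborate. The indicator names, the code labels (such as `DME_WALKER`), and the record fields are all hypothetical placeholders rather than real diagnosis or HCPCS codes.

```python
# Toy illustration only: this is NOT one of the published frailty algorithms.
# Indicator names, code labels, and field names are hypothetical placeholders.
FRAILTY_INDICATORS = {
    "mobility_aid": {"source": "dme", "codes": {"DME_WALKER"}},
    "hospital_bed": {"source": "dme", "codes": {"DME_HOSP_BED"}},
    "frailty_dx": {"source": "dx", "codes": {"DX_FRAILTY_RELATED"}},
}

# Made-up claim records pooled from diagnosis ("dx") and durable medical
# equipment ("dme") sources.
claims = [
    {"bene_id": "A001", "source": "dme", "code": "DME_WALKER"},
    {"bene_id": "A001", "source": "dx", "code": "DX_FRAILTY_RELATED"},
    {"bene_id": "A001", "source": "dx", "code": "DX_UNRELATED"},
    {"bene_id": "B002", "source": "dme", "code": "DME_HOSP_BED"},
    {"bene_id": "C003", "source": "dx", "code": "DX_UNRELATED"},
]


def indicator_flags(claims, indicators):
    """Return {bene_id: set of indicator names found in that person's claims}."""
    flags = {}
    for claim in claims:
        for name, spec in indicators.items():
            same_source = claim["source"] == spec["source"]
            if same_source and claim["code"] in spec["codes"]:
                flags.setdefault(claim["bene_id"], set()).add(name)
    return flags


def indicator_counts(claims, indicators):
    """Return {bene_id: number of distinct indicators present}."""
    flags = indicator_flags(claims, indicators)
    return {bene_id: len(names) for bene_id, names in flags.items()}


print(indicator_counts(claims, FRAILTY_INDICATORS))
```

A simple count like this treats every indicator as equally informative, which is a simplifying assumption made for the sketch. Beneficiaries with no indicators (here, C003) are absent from the result rather than assigned a count of zero.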
Finally, Medicare data can be linked with several national surveys such as the Health and Retirement Study (HRS) and National Health and Aging Trends Study (NHATS), providing a valuable source of information that supplements data available in those surveys. Access to Medicare files linked to those surveys is generally described on those study websites.
Measures from Medicare of particular interest include:
- Inpatient and outpatient diagnoses
- Use (purchase) of durable medical equipment
- Use of skilled nursing facility services
- Hospitalization and health services utilization
Additional Information - Medicare Data
Guide to getting started with CMS data
Medicare utilization claims files
Bibliography
1. Faurot KR, Jonsson Funk M, Pate V, et al. Using claims data to predict dependency in activities of daily living as a proxy for frailty. Pharmacoepidemiol Drug Saf. 2015;24(1):59-66.
2. Kim DH, Schneeweiss S. Measuring frailty using claims data for pharmacoepidemiologic studies of mortality in older adults: evidence and recommendations. Pharmacoepidemiol Drug Saf. 2014;23(9):891-901.
3. Davidoff AJ, Zuckerman IH, Pandya N, et al. A novel approach to improve health status measurement in observational claims-based studies of cancer treatment and outcomes. J Geriatr Oncol. 2013;4(2):157-165.