Department of Veterans Affairs (VA) Data

Dataset Summary


The Department of Veterans Affairs (VA) has a wide variety of databases with information on veterans enrolled in the VA health care system. These data are generally only accessible to people with VA appointments, although this includes academic investigators who are fully or mostly paid by their university and have a non-paid guest appointment at their local VA medical center. Because VA is a single-payer health care system, it generates data from both claims-based functions (e.g. diagnoses associated with inpatient and outpatient visits, cost information) and from its clinical functions (e.g. results of laboratory tests, free text information from progress notes). It thus offers an incredibly rich laboratory for large-scale secondary data analysis. Note that there are other types of VA data not covered in this review, such as data obtained directly from local VA health systems.

Who is in the data, and how are they followed?

VA data include information on veterans who are enrolled in the VA health care system, and a small number of non-veterans who interact with the health care system (for example, VA employees seen in a VA emergency room due to a work-related accident). Many veterans with private health insurance never interact with the VA health care system, and in general the VA serves a population that is more socioeconomically disadvantaged than the US population or veteran population as a whole. Also, some veterans split their care in varying proportions between VA providers and private providers. Because VA data are generated in the course of routine clinical operations, patients are followed to the extent that they interact with the VA health care system.

What kind of information is in the data?

VA has a huge variety of data. Most information that appears in the VA electronic health record can be accessed, including demographic information, information on medication dispensing from VA pharmacies, laboratory test results, free text from progress notes and radiology reports, vital status, and so forth. Billing and claims-related data are also available, including diagnoses entered on after-visit encounter forms (where physicians record the complexity of the visit, the major diagnoses addressed during the visit, and so forth), cost information, and much more.

Some data are much easier to use for research than others. VA data stewards actively maintain a number of data files that are relatively clean and accessible for research, for example outpatient encounter and hospital discharge diagnoses, pharmacy dispensing data, and results of several dozen laboratory tests. In contrast, obtaining data not included in VA-maintained research files is often possible but challenging. Free text data from progress notes and radiology reports can be mined using natural language processing techniques, although this requires substantial work with a VA data steward. Manual chart review on a regional or national level is also possible through a separate data request process (CAPRI and/or ViSTAWeb). Finally, Medicare data can be linked to VA data using a unique patient identifier, which is very useful since many patients receive care in both systems. Access to linked VA-Medicare data can be obtained through a data request process maintained by the VA Information Resource Center (VIReC).
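As a rough illustration of the linkage idea described above, the following Python sketch joins two sets of records on a shared patient identifier and lists the patients who appear in both systems. The field names (patient_id, visit_date, claim_date), the toy records, and the function link_records are hypothetical and chosen only for illustration; the actual identifier and file layouts depend on the data obtained through the VIReC request process.

```python
def link_records(va_rows, medicare_rows, key="patient_id"):
    """Inner-join two lists of dict records on a shared patient identifier.

    `key` is a hypothetical field name, not the real VA-Medicare identifier.
    """
    # Index the Medicare records by patient identifier for fast lookup.
    medicare_by_id = {}
    for row in medicare_rows:
        medicare_by_id.setdefault(row[key], []).append(row)

    linked = []
    for va_row in va_rows:
        # Patients with no Medicare record are skipped (inner join).
        for mc_row in medicare_by_id.get(va_row[key], []):
            merged = {key: va_row[key]}
            # Prefix field names so the source of each value stays clear.
            merged.update({f"va_{k}": v for k, v in va_row.items() if k != key})
            merged.update({f"medicare_{k}": v for k, v in mc_row.items() if k != key})
            linked.append(merged)
    return linked


# Toy records, invented for this example only.
va_visits = [
    {"patient_id": "A1", "visit_date": "2020-01-05"},
    {"patient_id": "B2", "visit_date": "2020-02-11"},
]
medicare_claims = [
    {"patient_id": "A1", "claim_date": "2020-01-20"},
    {"patient_id": "C3", "claim_date": "2020-03-02"},
]

linked = link_records(va_visits, medicare_claims)
# Patients who received care in both systems.
dual_users = sorted({row["patient_id"] for row in linked})
```

An inner join is used here because the point of the example is to find patients seen in both systems; a study of all VA patients would instead keep unmatched VA records and treat the Medicare fields as missing.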

VA data are maintained by several distinct data stewards, leading to a sometimes-confusing alphabet soup of data partners. The most commonly used data files are maintained in the VA Corporate Data Warehouse (CDW). The VA Informatics and Computing Infrastructure (VINCI) serves as an intermediary to regulate access and support investigators working with CDW data. Pharmacy data are available through CDW, but a richer supply of pharmacy data can be obtained from the VA Pharmacy Benefits Management (PBM) program. Data from the nursing home Minimum Data Set, which is generated in all VA-operated nursing homes, can be obtained through the VA Office of Geriatrics and Extended Care, although this office is more accustomed to providing these data for evaluating clinical operations than for research. The VA Information Resource Center (VIReC) provides access to VA-Medicare data, research user guides that provide valuable information about VA data files, and other services. Finally, there are a variety of other VA files that may be of interest to investigators with specific research interests, for example the VA Central Cancer Registry. Note that this is only a partial list of VA data sources.

For more information about VA data resources, the best first stops are the home pages of the VIReC program and the VINCI program. Only limited information is provided on the public-access versions of these sites. Much more detailed information is available on the VA intranet sites, which use the same URL (web address) but with “vaww” instead of “www” at the beginning of the URL (see links at top of page). For those with access to the VA intranet, the VA Data Portal is another useful resource.

Practical issues with acquiring and using VA data:

VA data can only be used by VA-affiliated investigators. Also, VA data need to be stored on VA servers. Depending on the situation, this can be either a centralized server maintained by the VINCI program or a secure server maintained with appropriate safeguards at one’s local medical center. Some datasets are updated on a daily basis, others much less often.

Otherwise, the practical issues in acquiring and using VA data are similar in many ways to those for Medicare data. VA data are a highly valuable resource with national scope, but obtaining access can be time-consuming and complex. Also, working with the many data files that are often needed for a project requires a substantial investment of time, energy, and hard-earned expertise. As a result, these data can be wonderful as a long-term investment, but are not amenable to a quick research study (unless the data files have already been obtained, assembled, and merged for a related project).

Each of VA’s data stewards has a separate process for requesting and accessing data. VA data are free but the process of obtaining permission can be complex. The process of obtaining data can take weeks to several months depending on the type of data requested and the level of permissions and involvement required from data stewards.

Areas of Particular Interest for Research on Function and Disability in Vulnerable Populations

VA data are very well-suited to studies of multimorbidity, given the richness of clinical data available. Since the records are generated from routine clinical care, there are few ready-to-use measures that directly assess physical function or disability. However, there are a number of ways of measuring these conditions indirectly. For example, investigators are developing algorithms for performing natural language processing-based searches of templated nursing notes to assess functional status in older inpatients. Similar work could be done searching templated sections of physical therapy notes. Other investigators are evaluating the accuracy of data on functional status that are routinely being collected by nurses at clinic triage in some VA medical centers. These data are recorded in a little-known database known as “health factors.” To evaluate cognitive impairment, investigators have access to dementia diagnoses and information on utilization of cognitively-related services (e.g. an encounter at a dementia clinic). Investigators can also conduct free-text searches to locate results of mental status tests done in clinical settings (for example, by looking for a number following the term “MMSE”, an abbreviation of “Mini-Mental State Examination”).
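The free-text search for a number following “MMSE” can be sketched with a regular expression. The Python example below is a minimal illustration under stated assumptions, not a validated extraction method: the pattern, the 20-character gap allowed between the term and the number, and the function name extract_mmse_scores are all hypothetical choices, and real progress notes would call for more careful handling and manual validation against chart review.

```python
import re

# Hypothetical pattern: the term "MMSE", then up to 20 characters that are not
# digits, line breaks, periods, or semicolons (so the search stays within the
# same clause), then a one- or two-digit number, optionally written "NN/30".
MMSE_PATTERN = re.compile(
    r"\bMMSE\b[^\d\n.;]{0,20}?(\d{1,2})(?:\s*/\s*30)?\b",
    re.IGNORECASE,
)


def extract_mmse_scores(note_text):
    """Return every plausible MMSE score (0 to 30) found in a free-text note."""
    scores = []
    for match in MMSE_PATTERN.finditer(note_text):
        score = int(match.group(1))
        # The MMSE is scored from 0 to 30; discard anything outside that range.
        if 0 <= score <= 30:
            scores.append(score)
    return scores


# Toy note text, invented for this example only.
example_note = "Pt alert and oriented. MMSE: 28. Follow up in 6 months."
example_scores = extract_mmse_scores(example_note)
```

Stopping the search at a period or semicolon is a deliberate trade-off: it avoids picking up an unrelated number from the next sentence (as in “MMSE not done. Pain 3/10”), at the cost of missing scores that are written after a sentence break.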

Data from VA nursing homes, known as community living centers (CLCs), can also be highly valuable for studies of disability and function in vulnerable older adults. This includes information contained in Minimum Data Set (MDS) data collected by VA, as well as the many other VA data sets to which VA nursing home residents contribute data.

