Discovering Government Data and Statistics

This guide provides supporting documentation for Jennifer Huck's presentation to the VA Summer GovDocs Meeting on July 19, 2024.

I welcome any questions you might have about government data. You can reach me through the "Email Me" button on this page.

In this presentation, we will cover:

  • Search strategies
  • Data.gov vs. other options
  • Useful sources for State, Local, and International data
  • Historical data
  • Format considerations

I tend to use the terms data and statistics interchangeably. Technically, data are the raw bits of information that are recorded and analyzed, while statistics are the results of the analysis (often presented as a table).

Try these search strategies

  • When you are searching for data, try to think of which organization would care enough to collect the data in the first place. Collecting data is expensive and time-consuming; it doesn't come out of thin air. Try to identify (or guess at) a department or bureau: who cares enough to collect data on this topic? Then check their website for data.
  • It's also okay if you do not know the source of the data or can’t guess at it! In that case, search the research literature, find relevant empirical articles, and jump to the methods section and look for the source of the data.
    • This strategy applies to any kind of data, not just government data. This is my favorite strategy for finding data!
  • It can also be helpful to search for [topic] data libguide. This often leads me to important sources.

Data.gov vs. Department websites and repositories

Data.gov is not comprehensive. You might find what you are looking for there, but if you have any sense of which federal department or bureau is offering the data, I recommend you look on that agency's site first. For example, I recently got a question about how to classify rural areas.

  • Example: a regular Google search for rural census brings up:
  • That resource is not in data.gov
  • There are so many results in data.gov – it helps if you know where the source might come from (and can then use the filters), but even then there can be thousands of results.

Here is another example: Rural-Urban Commuting Area Codes

  • We get better results, but clearly this was helped by knowing the exact name of the data/codes.

Additionally, the non-federal organizations that are included are scattershot. I saw that the State of California has its data listed here, but the Commonwealth of Virginia does not. Three VA localities include their data in data.gov: Fairfax, Arlington, and Loudoun counties.

I typically do not use data.gov. If I know the source of the data in advance, I tend to go straight to the Department or Bureau (e.g., by Googling census rural).
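If you do want to check data.gov, the filtering described above can also be done programmatically. The following is a minimal sketch, assuming the catalog still exposes the standard CKAN action API at catalog.data.gov/api/3; the organization slug "census-gov" in the usage note is an assumption, and the function names are my own. It builds a search URL, optionally restricted to one publisher, and reports how many datasets match.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Data.gov's catalog exposes the standard CKAN action API here.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, organization=None, rows=10):
    """Build a CKAN package_search URL, optionally filtered to one publisher."""
    params = {"q": query, "rows": rows}
    if organization:
        # "fq" is CKAN's filter query; the organization slug is an assumption.
        params["fq"] = f"organization:{organization}"
    return f"{CATALOG_SEARCH_URL}?{urlencode(params)}"


def summarize_results(payload):
    """Return the total hit count and (title, publisher) pairs from a response."""
    result = payload["result"]
    hits = [
        (item.get("title", ""), (item.get("organization") or {}).get("title", ""))
        for item in result["results"]
    ]
    return result["count"], hits


def search_catalog(query, organization=None, rows=10):
    """Run the search over the network (requires internet access)."""
    url = build_search_url(query, organization, rows)
    with urlopen(url, timeout=30) as response:
        return summarize_results(json.load(response))
```

Calling search_catalog("rural", organization="census-gov") would return the total count and the first ten titles. Comparing that count with and without the organization filter shows how much the filter narrows the results.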

Useful sources for State, Local, and International Data

There are also open data repositories for state and local governments:

The same concern, that not everything shows up in the data portal, seems to be as true for VA as for the US. Let's look at SOL test pass rates as an example:

vs.

It is becoming more common for local governments to have data portals as well. For example, Charlottesville's Open Data portal:

Additionally, there are data repositories for international governmental organizations (e.g., the UN and the World Bank).

Historical data

Your ability to go back in time varies greatly depending on the organization and your access to proprietary databases. Most government organizations see their main mission as producing up-to-date information, and not necessarily archiving data.

Here are some specialty repositories:

It can be very helpful if your institution has access to databases that offer historical data and statistics like:

Statistical Abstracts are also extremely helpful for finding data on a particular topic, especially when you think there is likely a governmental source but you are not sure which one might have collected the data. You can often use these as sources to find the original, more detailed tables. The Census Bureau freely provides the editions for 1878-2012. ProQuest Statistical Abstracts covers 2013-present.

Some departments might have their own specialty historical databases, although this is not very common. For example, the Federal Reserve offers FRASER and ALFRED for historical economic data.

If you really need to go back in time, you can sometimes find historical statistics in print, occasionally as tables in annual reports. You can rely on your usual gov docs skills to find these. HathiTrust can be helpful here as well, since you can use it to find digitized versions. I am including an example from HathiTrust.

Format considerations

You may have to consider what format the data come in. This is especially true of historical data that you can only find in print. You will have to find a way to transform the table or data into machine-readable (computer-readable) files. Tabula can potentially help here: it extracts tables from PDFs into machine-readable files such as CSV. Note that Tabula works only on text-based PDFs, so a scanned page needs to be run through OCR first.
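Once a printed table has been transcribed or OCR'd into plain text, the last step to a machine-readable file can be quite small. The following is a minimal sketch using only the Python standard library. It assumes columns are separated by two or more spaces (or a tab) and that no cell contains a double space. The sample table holds placeholder values, not real statistics, and the function names are my own.

```python
import csv
import io
import re


def text_table_to_rows(text):
    """Split each non-empty line into cells wherever 2+ spaces or a tab occur."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text that statistical software can import."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Placeholder table for illustration only; these are not real statistics.
SAMPLE = """
Year    Category A    Category B
1900    10            20
1910    15            25
"""

print(rows_to_csv(text_table_to_rows(SAMPLE)))
```

Real scans are rarely this tidy, so expect to spot-check the output against the original page, especially for merged header cells and footnote markers.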

On the other end of the spectrum are more modern approaches to data distribution, such as APIs. APIs allow the patron to skip the intermediate step of downloading data and importing it into statistical software; instead, they can interact with the data directly in their software. Many governmental organizations provide APIs, and these are often wrapped in packages for R or Python. So if you know that your patron is technically proficient, they may want to know that they can get programmatic access to some datasets. For example, there are several Census packages for R:
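For patrons working in Python rather than R, the same idea can be sketched with only the standard library. This is a minimal illustration, not the Census Bureau's recommended client. It assumes the 2020 Decennial Census redistricting endpoint and the variable P1_001N (total population), and the function names are my own. The Census Data API returns a list of lists with the header row first, which the sketch converts into one dictionary per geography.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the 2020 Decennial Census redistricting (PL 94-171) endpoint.
BASE_URL = "https://api.census.gov/data/2020/dec/pl"


def build_census_url(variables, geography, api_key=None):
    """Build a request URL, e.g. variables=["NAME", "P1_001N"], geography="state:51"."""
    params = {"get": ",".join(variables), "for": geography}
    if api_key:
        # A key is optional for light use; heavy users are asked to register.
        params["key"] = api_key
    # Keep commas, colons, and asterisks readable instead of percent-encoding them.
    return f"{BASE_URL}?{urlencode(params, safe=',:*')}"


def table_to_records(table):
    """Convert the API's list of lists (header row first) into dictionaries."""
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def fetch_records(variables, geography, api_key=None):
    """Request the data over the network (requires internet access)."""
    url = build_census_url(variables, geography, api_key)
    with urlopen(url, timeout=30) as response:
        return table_to_records(json.load(response))
```

Calling fetch_records(["NAME", "P1_001N"], "state:51") would request the total population for Virginia (FIPS code 51), and using "state:*" instead would request every state. All values come back as strings, so convert them to numbers before analysis.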

Summary

To sum this all up:

  • Think of which department might have collected data on this topic and start with their website. 
  • Look at research literature to see what data have been used before.
  • Take advantage of proprietary databases if you have them. 
  • Consider any format issues.