
Data and Statistics

Where to find numeric data and statistics

How to go about finding data - use the research literature

The top strategy for finding data that already exists is to look in the academic literature. What data sources are researchers already using? Here's how to do this:

  1. Visit Google Scholar or another academic literature database. Conduct a search using keywords that match your research question.
  2. Jump down to the Methods section of the article. Every empirical research paper will have a Methods section. (Note that not all peer-reviewed literature is empirical. It is possible you will find articles that lack a Methods section!) Every Methods section will describe the data in some way. Take note:
    1. What data do they use? Did they cite their source? (Finding the data citation can make it much easier to track down exactly where to find the data.)
    2. You can also track: Author/year of publication; Claim; Data; Dependent variable/estimate technique; Significant findings
  3. Iterate! Try this technique with different keywords in Google Scholar to find more articles.
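
One way to keep these notes organized is a simple log with one row per article. The sketch below is only an illustration: the `ArticleNote` class, its field names, and the `notes_to_csv` helper are made up for this example, and the sample entry is a placeholder rather than a real article.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ArticleNote:
    """One row of notes taken from an article's Methods section (hypothetical layout)."""
    author_year: str
    claim: str
    data: str
    data_citation: str
    dependent_variable_technique: str
    significant_findings: str


def notes_to_csv(notes):
    """Return the notes as CSV text, with a header row and one row per article."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ArticleNote)])
    writer.writeheader()
    for note in notes:
        writer.writerow(asdict(note))
    return buffer.getvalue()


# Placeholder entry; replace each value with what you find in a real article.
example = ArticleNote(
    author_year="Author (Year)",
    claim="What the article argues",
    data="Name of the dataset used",
    data_citation="Citation or link given for the data",
    dependent_variable_technique="Outcome measured and method used",
    significant_findings="Main results",
)

print(notes_to_csv([example]))
```

The resulting text can be saved as a .csv file and opened in a spreadsheet, which makes it easy to see which data sources appear again and again across articles.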
     

Ask yourself what you really want

Ask yourself about your research question. What is your topic and what kind of claim do you want to make? Now consider, what kind of evidence would you need to support those claims? You can use that to describe your ideal dataset.

Ask yourself, in an ideal world, what do you want for your:

  • Unit of analysis - individuals, companies, counties, countries;
  • Geography - Virginia, the American South, USA, Europe, all countries;
  • Time period - 2000 to 2023, most recent year only; and
  • Frequency - quarterly, annually, every 10 years for 50 years?
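
If it helps to make this concrete, the four questions above can be written down as a small record and compared against a dataset you have found. This is a minimal sketch, assuming exact text matches and whole years are good enough for a first pass; `DatasetSpec` and `find_gaps` are hypothetical names, and the candidate dataset is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetSpec:
    """Describes a dataset using the four questions above (hypothetical layout)."""
    unit_of_analysis: str  # e.g., "counties"
    geography: str         # e.g., "Virginia"
    start_year: int
    end_year: int
    frequency: str         # e.g., "annually"


def find_gaps(ideal, candidate):
    """List the ways a candidate dataset falls short of the ideal one."""
    gaps = []
    # Simple exact-match comparison; real datasets need more judgment than this.
    for name in ("unit_of_analysis", "geography", "frequency"):
        wanted, found = getattr(ideal, name), getattr(candidate, name)
        if wanted != found:
            gaps.append(f"{name}: wanted {wanted}, found {found}")
    # The candidate must cover the whole ideal time period.
    if candidate.start_year > ideal.start_year or candidate.end_year < ideal.end_year:
        gaps.append(
            f"time period: wanted {ideal.start_year}-{ideal.end_year}, "
            f"found {candidate.start_year}-{candidate.end_year}"
        )
    return gaps


ideal = DatasetSpec("counties", "Virginia", 2000, 2023, "annually")
# Invented candidate that starts ten years later than the ideal.
candidate = DatasetSpec("counties", "Virginia", 2010, 2023, "annually")

for gap in find_gaps(ideal, candidate):
    print(gap)
```

Listing the gaps this way makes it easier to decide whether to adjust your research question or keep searching.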

Keep in mind that just because you can imagine a dataset does not mean it exists. This matters most when you are under a tight deadline to finish a research project. (This is especially true for DMP projects at UVA.) These are common limitations to finding your perfect dataset:

  • The data do not exist. No one has ever spent the time or money to collect the data you have in mind.
  • The data exist, but you can't access them. Sometimes datasets are restricted, and you can apply to access them. (This is not a good choice for a DMP project; you don't have enough time to apply for access.)
  • Sometimes data cost money to access. 
  • Intellectual property rights and Terms of Use can be a hindrance when trying to collect data yourself (i.e., it is not legal to web scrape every website you come across).
  • Data might exist but not in a machine-readable format. This is often true for historical data. They might live on paper in the library, and you might not have time to convert them to a machine-readable format.

How would the data have been collected?

Another way to brainstorm is to consider who might have collected the data. Data are expensive and time-consuming to collect. They don't just appear out of thin air. Common sources of data are:

  • Researchers. 
  • Government agencies (e.g., Census, BLS, BEA).
  • NGOs and IGOs (e.g., UN, World Bank).
  • Think tanks, research organizations, private companies (e.g., Pew Research, Gallup, Bloomberg).

Use specialized search tools

You can also use specialized search tools. Take a look at the platforms listed on Find Data Archives and Find Data by Topic.

Evaluate potential data

Ask yourself questions about the data that you find.

1. Find overview information

Who created the data? Why? What is the scope? What is the geography and time period?

2. Find technical documentation

Look for technical documentation about the dataset and download it or note where it lives. This includes information on how the data were created (e.g., survey, administrative reporting, direct measure), variable definitions, and indications of what was included or excluded. Survey instruments are also helpful. Hint: look for a codebook, user guide, or documentation section of the site.

3. Identify the Download Options and Access Restrictions

Who gets to use the data? Contact a librarian if you are unsure whether you can access it. What download formats are available - CSV, text, Excel? If the data are not formatted for the statistical package you use, contact a librarian for assistance.
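
If the download is a CSV file, a quick look at the column names and first few rows can confirm that it matches the documentation. The sketch below uses Python's built-in csv module; `peek_csv` is a hypothetical helper, and the sample text is made up to stand in for a downloaded file.

```python
import csv
import io


def peek_csv(text_stream, n_rows=3):
    """Return the column names and up to n_rows rows from an open CSV file."""
    reader = csv.DictReader(text_stream)
    rows = []
    for i, row in enumerate(reader):
        if i >= n_rows:
            break
        rows.append(row)
    return reader.fieldnames, rows


# Made-up sample standing in for a real download.
# For a real file, use: open("your_file.csv", newline="")
sample = io.StringIO("county,year,value\nA,2000,1\nB,2000,2\n")

columns, first_rows = peek_csv(sample)
print(columns)
for row in first_rows:
    print(row)
```

Compare the column names with the variable definitions in the codebook before loading the full file into a statistical package.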

Acknowledgements

Much of this content is adapted from:

Carleton College Gould Library. "Data, Datasets, and Statistical Resources - When you're not sure where to start."