STAT 3220: Finding and Evaluating Data

Data Literacy workshop for STAT 3220

Welcome, STAT 3220!

This guide will help you discover and evaluate sources. 

The presentation slides are available. They cover general concepts for finding and evaluating data (be sure to view the Speaker Notes). This research guide points to specific resources available through the library or publicly.

If you have any questions, feel free to reach out!

Citation Guide - How to Cite within a paper

Unlike many other disciplines, math and statistics do not have a single standard citation style for publications. We recommend APA style. A citation generator can help you create accurate citations that you can copy into your paper. Wherever you reference a website, journal article, data source, etc., include an in-text citation. At the end of your paper, include a "References" section with the full citation for each source.

In-text citation:

   "College Basketball Dataset" (Kaggle, 2023)
   "Vulnerable Catchments" (Ministry for the Environment, 2016)

Reference Entry: 

   College Basketball Dataset. (2023, September 12). Kaggle.
   Ministry for the Environment. (2016). Vulnerable catchments (Version 17) [Data set].

  • Citation Machine
    Citation Machine is a user-friendly citation generator. Enter your source's information, and Citation Machine produces a formatted citation.
  • Purdue Online Writing Lab
    For many years, Purdue University's Online Writing Lab has been an authority on citation formats and styles. At the top of this page
  • APA Homepage
    For more documentation on APA style, see the American Psychological Association's homepage.

Track down sources in academic literature

This is our #1 strategy when you have no idea where to start looking for data!

Every empirical academic paper has a methods section. Find articles on a topic similar to the one you are interested in, and check their methods sections: what data did the authors use?

Here is a real-life example. A group of students asked Jenn about Major League Baseball injuries, and she found the article Injury Rates in Major League Baseball During the 2020 COVID-19 Season. The very first line of its Methods section states, “Data from the 2018-2020 MLB transaction reports were extracted online at”

Find library licensed datasets

These are our "best bets" data sources available to you through UVA Library.  They cover a broad range of topics.

These are great places to look for a dataset that is clean, readily downloadable, and easy to import into your statistical software. If your topic is flexible, starting here can save you time you would otherwise spend searching for and cleaning data.

Look for government data

Use the US Census

For those of you using the US Census, you might want to check out these resources:

Find data from news sources

Also see the GitHub repositories of news organizations such as the NYTimes, the Washington Post, and BuzzFeed News.

Remember to iterate

Start in one place and keep iterating. Check whether that first source cites where its data came from, and follow those citations from there.

Statista is a great place to try this strategy. Statista offers only basic statistics (tables and charts, not full datasets), but it always cites its sources. Follow those citations to see whether you can find the original dataset. The same strategy works with PolicyMap.