
STAT 4996: Expanding Your Data

Find library licensed datasets

These data sources are available to you through the UVA Library.  They cover a broad range of topics.

Get background knowledge

Look for government data

Use the US Census

For those of you using the US Census, you might want to check out these resources:

Find data from news sources

Also see the GitHub repos that news organizations maintain: NYTimes, Washington Post, BuzzFeed News, etc.
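Many of these GitHub repos store their data as plain CSV files, which you can read straight into your analysis. Here is a minimal Python sketch of one way to do that. The owner, repo, branch, and file path below are hypothetical placeholders, not a real newsroom repo, and the sample rows are made up so the example runs without a network connection. The helper names (`raw_github_url`, `parse_csv`, `fetch_csv`) are ours, not part of any library.

```python
import csv
import io
from urllib.request import urlopen


def raw_github_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw-file URL GitHub serves for a file in a public repo."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def parse_csv(text: str) -> list[dict[str, str]]:
    """Turn CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url: str) -> list[dict[str, str]]:
    """Download a CSV over HTTP and parse it (needs network access)."""
    with urlopen(url) as response:
        return parse_csv(response.read().decode("utf-8"))


# Hypothetical placeholders: swap in the real owner, repo, branch, and path.
url = raw_github_url("example-newsroom", "example-data", "main", "data/example.csv")

# Offline demonstration with made-up rows, standing in for a downloaded file.
sample = "state,count\nVirginia,3\nMaryland,5\n"
rows = parse_csv(sample)
print(url)
print(rows[0])
```

Once you have found a real file, call `fetch_csv(url)` with its URL instead of parsing the sample text. Check the repo's README first, since it usually explains the columns and any license terms.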

Track down sources in academic literature

Every empirical academic paper has a methods section.  Look for articles on topics similar to the one you are interested in, and check the methods section.  What data are the authors using?

Here is a real-life example. A group of students asked Jenn about Major League Baseball injuries, and she found this: Injury Rates in Major League Baseball During the 2020 COVID-19 Season.  The very first line of the Methods section states, “Data from the 2018-2020 MLB transaction reports were extracted online at mlb.com/transactions.”

Remember to iterate

Start in one place and keep iterating.  Use the first place to see if sources are cited, and work from there.

Statista is a great place to try this strategy.  Statista offers only basic statistics (tables and charts, not full datasets).  However, it always cites its sources.  Keep digging at the source to see if you can find an original dataset.  You can use the same strategy with PolicyMap.