
IPUMS International (PPIRS Int'l Data Webinar)

Research Guide associated with PPIRS International Data Workshop Session 2: IPUMS International.

Getting Access

Users must apply for access to the data.  You must give IPUMS a description of your proposed research, your institutional affiliation, and other information.  IPUMS staff will review your application and may ask for additional information if necessary.  Applicants must agree to a number of terms in order to access the data.  The most important are that you will not redistribute the data and will not attempt to identify individuals.

Personally, I have noted that I am an academic librarian helping students and faculty to use IPUMS data and that I have no research project of my own, and that has been fine - I have never had a problem.

Getting Started with Data

Click on "Select Data" or "Browse and Select Data."  The initial screen includes harmonized variables.  Select "source variables" to browse variables specific to individual samples.  The page will display variables present in specific censuses.  An "x" indicates the availability of a variable for a particular sample.

Alternatively, you can start with "Select Samples" which allows you to choose the countries and years before choosing your variables.  That way, you only see the variables available to you for that country and year.

Let’s select samples: Mexico-2010, Vietnam-2009

You can search for variables.  Clicking on the variable brings up its documentation. You can see codes, description, comparability, universe, etc.  You can download the harmonization documents (e.g., harmonization table) for the specific variable here.

Look at variable: chborn (children ever born).

Let’s add to our extract cart: chborn, urban, electric, age, marst, edattain

There are several variables that are added to the cart by default (e.g., country, year, weights).

Data are delivered through a data extraction system.  You will get an email when the extract is ready.  Note: this can take a little while, from a few minutes to an hour or more.  Some of the files can get quite large and will take longer.  The data are provided as an ASCII data file, plus syntax files to import it into SPSS, SAS, Stata, and R.  You also get a codebook to download.
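To make the role of the syntax file and codebook concrete, here is a minimal sketch in Python of how a fixed-width ASCII data file is read: each variable occupies a fixed range of columns on every line.  The column positions and the two sample records below are invented for illustration; they are not taken from a real extract.  The actual positions for your extract are listed in its codebook and syntax files.

```python
# Hypothetical column layout: (variable name, first column, last column),
# 1-based and inclusive. Real positions come from the codebook or syntax
# file that accompanies your own extract.
LAYOUT = [
    ("COUNTRY", 1, 3),
    ("YEAR", 4, 7),
    ("AGE", 8, 10),
    ("CHBORN", 11, 12),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of integer-coded values."""
    return {name: int(line[start - 1:end]) for name, start, end in layout}


# Two made-up records, not real IPUMS data.
sample_lines = [
    "484201003402",
    "704200902101",
]

records = [parse_record(line) for line in sample_lines]
for rec in records:
    print(rec)
```

In practice the supplied syntax files do this work for you; the sketch only shows why the data file is not usable without them.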

Teaching with IPUMS

Instructors may wish to know that there are some good teaching resources available at IPUMS. You can get a classroom account, and use their teaching exercises, which are geared towards the online tabulator and exercises in R.

Using the online SDA Tabulator

IPUMS also offers an online SDA tabulator.  SDA tabulators are pretty common tools used for web-based data analysis.  This allows you to work with IPUMS data without having to use statistical software.  You will need to work with one sample (country-year combination) at a time. 

A quick video tutorial from IPUMS will get you started.

Here are some examples to work with:

Select sample: Mexico 2010

Age distribution of females in Mexico:
Row: age2
Column: sex
Optional Selection filters: sex(2) -- females are coded as 2

Distribution of electricity in rural and urban areas:

Row: electric
Column: urban
Optional Selection filters: electric(1-2) urban(1-2)

Number of children ever born, filtered to match Vietnam's universe for this variable:

Row: chborn (children ever born)
Optional Selection filters: age(15-100)

These are examples of relatively simple cross-tabs.  You can do much more complicated analysis, including regression, with the online SDA tabulator.
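For readers curious about what the Row, Column, and Selection filter fields amount to, the following Python sketch reproduces the logic of the electricity example on a handful of invented records.  It assumes that each record carries a person weight (named PERWT here) and that the table sums weights rather than counting rows; the records, the codes, and the function name crosstab are all illustrative and do not come from the tabulator itself.

```python
from collections import defaultdict

# Made-up person records for illustration; not real IPUMS data.
# Codes 1 and 2 stand for the two substantive categories of each variable;
# any other code is treated as out of scope.
people = [
    {"ELECTRIC": 1, "URBAN": 2, "PERWT": 10.0},
    {"ELECTRIC": 1, "URBAN": 1, "PERWT": 12.0},
    {"ELECTRIC": 2, "URBAN": 1, "PERWT": 8.0},
    {"ELECTRIC": 1, "URBAN": 2, "PERWT": 10.0},
    {"ELECTRIC": 9, "URBAN": 2, "PERWT": 10.0},  # excluded by the filter
]


def crosstab(records, row, col, filters, weight="PERWT"):
    """Sum weights into (row code, column code) cells for records that
    fall inside every inclusive (low, high) range in `filters`."""
    table = defaultdict(float)
    for rec in records:
        if all(low <= rec[var] <= high for var, (low, high) in filters.items()):
            table[(rec[row], rec[col])] += rec[weight]
    return dict(table)


# Equivalent of: Row = electric, Column = urban,
# Selection filters = electric(1-2) urban(1-2)
table = crosstab(
    people,
    row="ELECTRIC",
    col="URBAN",
    filters={"ELECTRIC": (1, 2), "URBAN": (1, 2)},
)

for cell in sorted(table):
    print(cell, table[cell])
```

The filter plays the same role as electric(1-2) urban(1-2) above: records with codes outside those ranges never reach the table.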

Using a Statistical Package, examples using R

You can download command files that will help you import data into Stata, SPSS, SAS, or R.  You should download the command files when you download your data extract.

IPUMS offers an R package to help import data into R: ipumsr.  This is really helpful for getting started with IPUMS data in R.  The ipumsr package has a useful vignette to help you get started using the package.  The examples in the vignette show how to explore the data using dplyr, as well.

To use IPUMS data in R: run your extract, and download the dataset and DDI.  Download the R file, too - it is a helpful script to get you started.  It installs the ipumsr package (if you don't already have it installed), loads it, and uses ipumsr functions to import the DDI file and read in the dataset.