PPIRS International Data Workshop - Session 1

This is a research guide designed to complement the presentation given by Jennifer Huck to PPIRS on Oct. 22, 2020.

International Censuses: IPUMS International

“IPUMS-International is dedicated to collecting and distributing census microdata from around the world. The project goals are to collect and preserve data and documentation, harmonize data, and disseminate the harmonized data free of charge.” (Quoted from landing page.)

“IPUMS International is the world's largest collection of publicly available individual-level census data. The data are anonymized samples from population censuses taken from around the world since 1960. Most census data samples contain basic demographic information, along with additional information on religion, occupation, industry, income, work status, education, type of housing, disability status, and household characteristics. In addition, supplemental information is available for some countries that includes fertility data, migration data, mortality data, and GIS boundary files.” (Quoted from project description.)

Geographic coverage: 98 countries.

Time coverage: Varies by country

Access: Free and publicly available, but you need to register your research project in order to get access.

Source: Source data for IPUMS International are provided by participating National Statistical Offices. IPUMS is based at the University of Minnesota.

What is essential to know:

(1) PRESERVATION: IPUMS-International is an archive of publicly available census samples (microdata).  This database contains the information recorded by census enumerators – not the compiled statistical tables. 

(2) HARMONIZATION: The data are coded and documented consistently across countries and over time to facilitate comparative research.

IPUMS also attempts to preserve whatever documentation the country provided at the time of the census, such as enumerator instructions and a copy of the questionnaire.

While IPUMS does not have census microdata samples for all countries, it has a pretty good inventory of the censuses conducted in other countries, and of whether or not the microdata survived.

What is harmonization? “Composite coding schemes offer a solution. The first one or two digits of the code provide information available across all samples. The next one or two digits provide additional information available in a broad subset of samples. Finally, trailing digits provide detail only rarely available. For example, in IPUMS-International, the first digit of the variable for marital status is comparable across all samples. The second digit delineates consensual unions from other forms of marriage (where appropriate) and distinguishes among the categories separated, divorced, and married with spouse absent. The final digit provides additional detail with the married and married-spouse-absent categories (such as polygamous marriages in Kenya). The basic goal of our harmonization efforts is to simplify use of the data while losing no meaningful information.”  This correspondence table might help you understand how they harmonize their variables and codes across censuses.
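To make the composite coding idea a little more concrete, here is a minimal Python sketch of how a three-digit harmonized code can be read at different levels of detail. The code values and labels below are hypothetical and chosen only for illustration (they are not copied from the IPUMS codebook), and the function names are my own. Always check the variable's documentation for the real codes.

```python
from typing import Tuple

# Hypothetical first-digit labels, for illustration only.
# Consult the IPUMS International codebook for the actual values.
GENERAL_LABELS = {
    1: "single",
    2: "married or in union",
    3: "separated, divorced, or spouse absent",
    4: "widowed",
}


def split_composite(code: int, width: int = 3) -> Tuple[int, int, int]:
    """Return (first digit, first two digits, full code) of a composite code.

    The code is left-padded with zeros to `width` digits so that
    short codes such as 0 are handled consistently.
    """
    digits = str(code).zfill(width)
    return int(digits[0]), int(digits[:2]), int(digits)


def general_label(code: int) -> str:
    """Collapse a detailed code to the category comparable across all samples."""
    first_digit, _, _ = split_composite(code)
    return GENERAL_LABELS.get(first_digit, "unknown")


# A made-up detailed code: the first digit carries the broad category,
# and the later digits add detail that only some samples record.
print(split_composite(217))  # (2, 21, 217)
print(general_label(217))    # married or in union
```

The point of the sketch is that a researcher comparing many countries can work with the first digit alone, while someone studying a single census can keep the full code and lose none of the original detail.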

One last thing: You should also be aware that a portion of the IPUMS microdata samples are “weighted, with some records representing more cases than others. This means that persons and households with some characteristics are over-represented in the samples, while others are underrepresented.” If I am working with someone who is less experienced in analyzing survey data, I simply point out that they need to pull the survey weights into their extract in order to analyze the dataset correctly. (But I’m not a statistician, so I don’t help with the analysis!)
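For anyone curious why the weights matter, here is a small Python sketch comparing an unweighted and a weighted share. The four records and the field names ("urban", "weight") are made up for illustration, and the helper function is my own. In a real extract you would use the weight variable named in the sample's documentation, and a statistician can advise on the proper analysis.

```python
def weighted_share(records, predicate, weight_key="weight"):
    """Share of the weighted total that belongs to records matching `predicate`."""
    total = sum(r[weight_key] for r in records)
    if total == 0:
        raise ValueError("weights sum to zero")
    matching = sum(r[weight_key] for r in records if predicate(r))
    return matching / total


# Made-up sample: each record stands in for `weight` people in the population.
sample = [
    {"urban": True, "weight": 10.0},
    {"urban": True, "weight": 10.0},
    {"urban": False, "weight": 30.0},
    {"urban": False, "weight": 50.0},
]

# Counting rows treats every record as one person.
unweighted = sum(1 for r in sample if r["urban"]) / len(sample)

# Applying the weights counts the people each record represents.
weighted = weighted_share(sample, lambda r: r["urban"])

print(unweighted)  # 0.5
print(weighted)    # 0.2
```

In this toy example, half of the rows are urban, but urban records represent only 20 of the 100 people the sample stands for, so ignoring the weights would overstate the urban share.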

