IPUMS (Integrated Public Use Microdata Series)
http://usa.ipums.org/usa/ Maintained by the Minnesota Population Center, University of Minnesota.
Reviewed Jan. 1, 2003.
Although the decennial census began in 1790, detailed information was first listed for each individual in 1850. One of the great archival projects of the past two decades has been to make huge samples from each of the 1850–2000 censuses machine-readable, including at least one percent of the American population enumerated in each year. The product of this immense effort is available free of charge through the magnificent Web site, IPUMS (Integrated Public Use Microdata Series; in Google, seek IPUMS and choose the first entry). Across all years, a given census question (for example, place of birth) is in the same place in the dataset, and each possible response to the question (for example, Germany) has the same numerical code. Special variables have been created on the basis of explicit assumptions, to make the data easier to use. For 1850–1870, each household member’s relationship to the head was not ascertained by the census, but an IPUMS-constructed variable offers a reasonable guess based on age, sex, and order of enumeration.
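The idea of a single code per response across all years can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the year-specific raw codes, the shared codes 100 and 200, and the function name harmonize are invented here and are not the codes or procedures IPUMS actually uses.

```python
# Hypothetical raw codes: suppose each census year numbered birthplaces differently.
RAW_TO_COMMON = {
    1850: {"17": 100, "02": 200},
    1900: {"230": 100, "205": 200},
}

# Hypothetical shared codes used for every year after harmonization.
COMMON_LABELS = {100: "Germany", 200: "England"}


def harmonize(year, raw_code):
    """Translate a year-specific raw code into the shared code."""
    return RAW_TO_COMMON[year][raw_code]


# Three invented person records: (census year, raw birthplace code).
records = [(1850, "17"), (1900, "230"), (1900, "205")]

harmonized = [(year, harmonize(year, raw)) for year, raw in records]
labels = [COMMON_LABELS[code] for _, code in harmonized]
print(labels)
```

Once every year shares the same codes, a single test such as code == 100 picks out the German-born in any census, which is what makes comparisons across decades straightforward.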
So how do we get these magnificent datasets? The IPUMS Web site provides two ways. The experienced user of quantitative data will probably want to click his or her way to complete datasets (data/data options/download entire raw datasets; the Web site discusses transmission time and space considerations). IPUMS also offers an option for selecting random subsamples: a mini version with 21,000 households, and a tiny version with 2,100 households.
The other way to access the datasets is via the Web site’s extract system (data/create a new extract). The user specifies variables and cases to include in the desired dataset (for example, choose New Englanders by race and age for 1850–1920). This strategy can save programming and downloading time, and space; it might be useful too for encouraging undergraduates to tool around in the system.
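What an extract request amounts to, choosing which variables to keep and which cases to include, can be sketched in a few lines. This is only an illustration of the logic, not the extract system itself: the records, the field names, and the function make_extract are hypothetical.

```python
NEW_ENGLAND = {
    "Connecticut", "Maine", "Massachusetts",
    "New Hampshire", "Rhode Island", "Vermont",
}


def make_extract(records, variables, keep_case):
    """Keep only the chosen variables for the cases that pass the filter."""
    return [
        {name: record[name] for name in variables}
        for record in records
        if keep_case(record)
    ]


# Invented person records standing in for a full census sample.
people = [
    {"year": 1850, "state": "Maine", "race": "White", "age": 34, "sex": "F"},
    {"year": 1900, "state": "Ohio", "race": "White", "age": 51, "sex": "M"},
    {"year": 1930, "state": "Vermont", "race": "White", "age": 19, "sex": "F"},
    {"year": 1920, "state": "Vermont", "race": "Black", "age": 8, "sex": "M"},
]


def new_englander_1850_1920(record):
    """Case selection: New England residents enumerated in 1850-1920."""
    return record["state"] in NEW_ENGLAND and 1850 <= record["year"] <= 1920


extract = make_extract(people, ["year", "race", "age"], new_englander_1850_1920)
print(extract)
```

The resulting file is smaller in both directions, fewer rows and fewer columns, which is where the savings in downloading time and space come from.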
An impressive feature of the Web site is documentation. Census instructions to enumerators, as well as discussions of sampling, coding, and construction of variables, can all be read online or printed. The crucial documentation that will be used over and over (variable location and codes for each response) is found in hyperlinked form that is easy to use online and easy to download to one’s PC (personal computer) for use with a Web browser. Also, the extract option described earlier automatically provides programming statements to address the selected data files in SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), or Stata, including all the value labels and other features that are time-consuming to enter oneself but nice to have. Select all variables in the extract option, and you get the fullest programming statements.
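For readers who have not worked with such files, the role of variable locations and value labels can be shown with a short sketch. The column positions, the variable names, the codes, and the two sample lines below are assumptions made up for this example; the real layout should be taken from the IPUMS documentation or from the programming statements supplied with an extract.

```python
# Hypothetical layout: (variable name, first column, last column), 1-indexed, inclusive.
LAYOUT = [("year", 1, 4), ("age", 5, 7), ("sex", 8, 8)]

# Hypothetical value labels: the text attached to each numeric code.
VALUE_LABELS = {"sex": {"1": "Male", "2": "Female"}}


def parse_line(line):
    """Cut one fixed-column line into named variables and apply value labels."""
    record = {}
    for name, start, end in LAYOUT:
        raw = line[start - 1:end]
        labels = VALUE_LABELS.get(name)
        # Labeled variables become text; the rest are read as integers.
        record[name] = labels[raw] if labels else int(raw)
    return record


# Two invented lines of raw data, eight characters each.
rows = [parse_line(line) for line in ["18500342", "19000081"]]
print(rows)
```

The supplied SAS, SPSS, or Stata statements do the same two jobs, locating each variable and attaching its labels, for every variable in the file, which is why they save so much typing.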
Once a dataset has been accessed, it must be analyzed. IPUMS does not provide its own shortcuts here. One option: a standard statistics package such as SAS. A second: IPUMS does provide links to two free software products for “easy” generation of crosstabs and descriptive statistics from IPUMS datasets. At least one of these (by Querylogic) can be downloaded on the spot to a PC for immediate use; still, it will require a little time before the inexperienced student can use even this package.
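To give a sense of what a crosstab is, here is a minimal sketch in Python using only the standard library. It is not how SAS or the linked packages work internally, and the handful of records is invented for the example.

```python
from collections import Counter

# Invented person records: (sex, place of birth).
people = [
    ("Male", "Germany"),
    ("Female", "Germany"),
    ("Male", "Ireland"),
    ("Male", "Germany"),
]

# Count how many people fall into each combination of the two variables.
table = Counter(people)

row_values = sorted({sex for sex, _ in people})
col_values = sorted({birthplace for _, birthplace in people})

# One row per sex, one column per birthplace; combinations never seen count as 0.
crosstab = {
    row: [table[(row, col)] for col in col_values]
    for row in row_values
}

print(col_values)
for row in row_values:
    print(row, crosstab[row])
```

A statistics package adds percentages, weights, and significance tests on top of counts like these, but the underlying table is the same.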
More? The “international” option at the home page reveals that IPUMS will soon provide access to machine-readable samples from the censuses of many other countries, as well as to some additional U.S. data.
Levy Economics Institute of Bard College
Annandale-on-Hudson, New York