Reclaiming Lost Data on American Racial Inequality, 1865-1940
Social scientists have begun to document the deep roots of American inequality by studying the long-run consequences of historical institutions such as slavery, sharecropping, and Jim Crow. To provide this historical perspective, scholars have linked individual records across time, making considerable progress on data problems such as duplicate records, incomplete fields, inconsistent naming conventions, and selection into surviving records. Examples include linking administrative datasets to the complete-count 1880 and 1940 censuses. However, because surnames vary less among African Americans, and because women’s name changes at marriage make their records harder to trace, previous studies have mostly focused on white men.
Peter Bearman and colleagues will document the effects of historical events and institutions on racial inequality by combining historical and genealogical information with machine-learning approaches in order to link different records of marginalized groups that were excluded from, or missed in, official tabulations.