- Susan Athey, Stanford University
- Peter Bearman, Columbia University
- Karen Cook, Stanford University / RSF trustee
- Paul DiMaggio, New York University
- Liran Einav, Stanford University
- Bruce Kogut, Columbia University
- Bernice Pescosolido, Indiana University
- Matthew Salganik, Princeton University
- Duncan Watts, Microsoft Research
- Lynn Wu, University of Pennsylvania
In June 2015, the Foundation’s trustees approved the creation of a working group in Computational Social Science (CSS) to explore how RSF might stimulate research in its core program areas that leverages the strengths of new data and methods.
The working group, which was officially discontinued in 2018, supported innovative research that brought new forms of data and new methods of analysis to bear on questions of interest in RSF’s core programs in Social Inequality, Behavioral Economics, Future of Work, and Race, Ethnicity and Immigration.
Over the last five decades, survey research, including face-to-face interviews, telephone surveys and internet panels, has provided most of the data for knowledge production in the social sciences. But declining response rates and the increasing cost of surveys raise questions regarding their long-run sustainability. In addition, research on many topics in the social sciences is hampered by data of insufficient scale and quality.
The emergence of the digital age has rapidly increased access to large and comprehensive data sources such as public and private administrative databases, as well as novel sources of information such as online transactions, social-media interactions, and internet searches. New computational tools allow for the extraction and coding of text from digital sources, and analytical methods for exploiting large amounts of data have also advanced. However, the availability of these new data raises complicated issues regarding access, confidentiality, and privacy.
The study of social mobility is one of many areas where new data have furthered our understanding of the social world. Although survey data have long been used to examine mobility, the data often have limitations including small samples, cross-sectional designs, an inability to link parents with children, and differential attrition. Recent studies have advanced our understanding of social and economic outcomes by analyzing administrative data from tax and social security records.
Chetty et al. (2014a; 2014b) use tax records to examine intergenerational mobility for recent birth cohorts. Using rank-based measures of mobility, and combining their findings with prior results from other studies, they conclude that social mobility has been essentially stable for birth cohorts from the 1950s to the early 1990s. They also find substantial geographic variation in mobility, with some metropolitan areas having much lower rates of mobility than others.
Mitnick and Grusky (2015) also rely on tax data to estimate intergenerational income elasticities (IGEs) for recent birth cohorts, finding that the U.S. is a much less mobile society than has been suggested by many previous studies. Carr and Wiemers (2016) link multiple panels from the Survey of Income and Program Participation with administrative earnings records and find that intragenerational mobility has declined for the most recent birth cohorts, with the greatest decline for those with a college degree.
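The two kinds of mobility measures mentioned above, rank-based measures and the intergenerational income elasticity, can be illustrated with a minimal sketch. It is not the estimation procedure of any of the cited studies: the data are synthetic, the function names are hypothetical, and the elasticity of 0.4 built into the simulated incomes is an arbitrary assumption chosen only to show that the regression recovers it.

```python
import math
import random


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


def percentile_ranks(values):
    """Rank each value from 0 to 100 within its own distribution.

    Ties are broken by input order, which is adequate for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * position / (len(values) - 1)
    return ranks


def rank_rank_slope(parent_income, child_income):
    """Rank-based measure: regress child income rank on parent income rank."""
    return ols_slope(percentile_ranks(parent_income),
                     percentile_ranks(child_income))


def income_elasticity(parent_income, child_income):
    """IGE: regress log child income on log parent income."""
    return ols_slope([math.log(p) for p in parent_income],
                     [math.log(c) for c in child_income])


# Synthetic parent-child pairs (assumed values, not real tax data).
rng = random.Random(0)
ASSUMED_ELASTICITY = 0.4
parent_income = [math.exp(rng.gauss(10.5, 0.8)) for _ in range(5000)]
child_income = [
    math.exp(6.3 + ASSUMED_ELASTICITY * math.log(p) + rng.gauss(0, 0.6))
    for p in parent_income
]

ige = income_elasticity(parent_income, child_income)
slope = rank_rank_slope(parent_income, child_income)
print(f"estimated IGE: {ige:.2f}, rank-rank slope: {slope:.2f}")
```

In both measures a value near 0 indicates high mobility and a value near 1 indicates that children's positions closely track their parents'. The rank-based version depends only on relative position, so it is less sensitive to changes in the overall spread of incomes than the elasticity is.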
These examples illustrate how “big data” and improved research methods can expand our understanding of social issues and advance research in the social sciences. Other examples include:
- Advances in text analysis make it possible to analyze large volumes of written material. Enns and colleagues (2016) rely on automated content analysis and other qualitative analysis software to classify all speeches and content inserted into the Congressional Record from 1970 to 2011 to examine the association between political donations and issues raised by members of Congress.
- Price indexes are traditionally constructed using in-person store visits that occur on a regular basis to collect price data for goods and services. The Billion Prices Project (Cavallo and Rigobon, 2016) uses web-scraping technology to collect prices from hundreds of online retailers daily to construct new economic indicators that track the traditional indicators quite well.
- The literature showing an association between race and economic outcomes is extensive, but it is difficult to determine how much of this association reflects racial discrimination rather than characteristics correlated with race. Doleac and Stein (2013) use online classified advertisements to examine the effect of race on market outcomes and find that black sellers receive fewer and lower offers than white sellers, and that buyer communication with black sellers indicates lower levels of trust.
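As a rough illustration of the price-index example above, the sketch below computes a simple index from two days of scraped prices. It is not the Billion Prices Project's methodology: it uses a standard Jevons index (the geometric mean of price relatives over products observed on both days), and the product names, prices, and function name are made up for the example.

```python
import math


def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives, scaled so the base period is 100.

    Only products observed in both periods are used, since online
    listings appear and disappear between scrapes.
    """
    matched = [product for product in base_prices if product in current_prices]
    if not matched:
        raise ValueError("no products observed in both periods")
    log_relatives = [
        math.log(current_prices[product] / base_prices[product])
        for product in matched
    ]
    return 100.0 * math.exp(sum(log_relatives) / len(log_relatives))


# Hypothetical scraped prices keyed by product identifier.
day_0 = {"milk-1l": 1.00, "bread": 2.00, "eggs-12": 3.00}
day_1 = {"milk-1l": 1.10, "bread": 2.00, "coffee": 5.00}

# Only milk and bread appear on both days, so the index reflects those two.
index = jevons_index(day_0, day_1)
print(f"price index on day 1 (day 0 = 100): {index:.2f}")
```

Restricting the calculation to matched products is a design choice that avoids treating a product's entry or exit as a price change, at the cost of ignoring items that are new to the sample.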