Social science research on many topics has often been hampered by the limitations of survey data. However, the digital age has rapidly increased access to large and comprehensive data sources such as public and private administrative databases, and unique new sources of information from online transactions, social-media interactions, and internet searches. New computational tools also allow for the extraction, coding, and analysis of large volumes of text. Advances in analytical methods for exploiting and analyzing data have accompanied the rise of these data. The emergence of these new data also raises questions about access, privacy and confidentiality.
The Russell Sage Foundation’s initiative on Computational Social Science (CSS) supports innovative social science research that brings new data and methods to bear on questions of interest in its core programs in Behavioral Economics, Future of Work, Race, Ethnicity and Immigration, and Social Inequality. Limited consideration will be given to questions that pertain to core methodologies, such as causal inference and innovations in data collection.
Some inquiries may also be considered by the Alfred P. Sloan Foundation for joint funding, especially projects with larger scale potential that relate to behavioral economics, the economics of science and technology, regulation and industrial organization, as well as privacy, empirical methodologies, economic measurement, or administrative data curation generally.
Examples of research (some recently funded by RSF) that are of interest include, but are not restricted to, the following:
Small Awards Competitions with Big Data
Many investigators have invested significant time and resources in assembling data from various sources to address important questions. This involves linking and harmonizing administrative data from multiple jurisdictions or agencies (federal, state, local), or linking administrative to survey data, or developing new forms of information from online or archival sources. These data, once assembled, can have great value to other investigators beyond their original purpose.
RSF has funded small awards competitions where investigators have made their data available to the wider research community. Specifically, RSF issues a call for proposals developed by the investigators that offers small awards to graduate students and early career researchers to analyze these new data. The investigators lead the review panel that evaluates the proposals, and participate in a conference at RSF (about a year after awards are made) at which funded researchers present their results. Examples of prior small awards competitions include Raj Chetty and Nathan Hendren’s release of public use statistics derived from IRS administrative tax records (see Equality of Opportunity Project; RSF Request for Proposals) or Sean Reardon’s assembling of educational achievement data for roughly 40 million public school students (see the Stanford Education Data Archive; RSF Request for Proposals).
We welcome inquiries from investigators who have compiled such data and have an interest in making these data available for wider analyses through an RSF small awards competition. RSF is unable to provide support to assemble and prepare the data for release or the infrastructure support to house it once released.
Linked Administrative Data
Chetty, Friedman and Rockoff (2014a; 2014b) linked school district administrative records with federal income tax data to identify which teachers, in the short term, have the largest impact on student achievement, and in the longer term, to show that students assigned to teachers with higher value-added scores have higher college attendance and higher salaries as adults.
Imberman, Lovenheim and Andrews link K-12 student-level administrative data from the Texas Education Agency (TEA), post-secondary administrative data from the Texas Higher Education Coordinating Board (THECB) and individual quarterly earnings data from the state’s Workforce Commission to assess the effectiveness of targeted scholarship programs on educational attainment and earnings in young adulthood.
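The kind of linkage described above can be illustrated with a minimal sketch. It assumes that all three sources share a common student identifier, which is a simplification; the records, field names, and the `link_records` helper below are hypothetical and are not drawn from the Texas data or from the investigators' methods.

```python
# Hypothetical, made-up records; real administrative files are restricted-access
# and rarely share a clean common identifier.
k12 = [
    {"student_id": "A1", "hs_grad_year": 2004},
    {"student_id": "B2", "hs_grad_year": 2004},
    {"student_id": "C3", "hs_grad_year": 2005},
]
college = [
    {"student_id": "A1", "enrolled": True},
    {"student_id": "C3", "enrolled": False},
]
earnings = [
    {"student_id": "A1", "quarter": "2012Q1", "earnings": 9000},
    {"student_id": "A1", "quarter": "2012Q2", "earnings": 9500},
    {"student_id": "B2", "quarter": "2012Q1", "earnings": 6000},
]


def link_records(k12, college, earnings):
    """Left-join college and earnings records onto the K-12 roster by student_id."""
    college_by_id = {row["student_id"]: row for row in college}

    # Sum quarterly earnings per student so each student has one earnings value.
    total_earnings = {}
    for row in earnings:
        sid = row["student_id"]
        total_earnings[sid] = total_earnings.get(sid, 0) + row["earnings"]

    linked = []
    for row in k12:
        sid = row["student_id"]
        match = college_by_id.get(sid)
        linked.append({
            "student_id": sid,
            "hs_grad_year": row["hs_grad_year"],
            # None marks "no matching record", which is distinct from False or 0.
            "enrolled": match["enrolled"] if match is not None else None,
            "total_earnings": total_earnings.get(sid),
        })
    return linked


linked = link_records(k12, college, earnings)
for row in linked:
    print(row)
```

Keeping unmatched students in the output with `None` values, rather than dropping them, makes the match rate across sources visible before any analysis is run.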
Private Administrative Data
Normative decision theory implies that a dollar is a dollar no matter its source, but psychological research suggests that financial windfalls or additional expenses have different effects depending on which “mental accounts” they impact. Shapiro and Hastings analyze retail panel data (500,000 households, 6 billion transactions) to understand “mental accounting,” or how households think about and spend money from different sources.
Evidence from tax return data suggests no clear trend in intergenerational income mobility for recent cohorts of young adults (Chetty et al., 2014a; 2014b). In contrast, survey data suggest an increasing intergenerational persistence of occupational mobility. To date, no single “big data” source allows the analysis of income and occupational mobility simultaneously. Hout and Grusky are utilizing a machine-learning approach to code taxpayer occupation on Internal Revenue Service forms consistent with Current Population Survey records that already have respondent occupation reliably coded.
Online Surveys and Experiments
Survey response rates for in-person and telephone interviews have declined significantly, and surveys are expensive to administer. Salganik and Levy (2015) highlight the advantages of wiki surveys, whose data collection instruments can capture as much information as a respondent is willing to provide, collect information contributed by respondents that was unanticipated by the researcher, and adapt as more information is obtained.
The literature showing an association between race and economic outcomes is extensive, but it is difficult to determine the extent to which these associations are due to racial discrimination or to characteristics correlated with race. Doleac and Stein (2013) use online classified advertisements to examine the effect of race on market outcomes by featuring a photograph of the item for sale, and experimentally manipulating the color of the seller’s hand (dark- or light-skinned). They find that black sellers receive fewer and lower offers than white sellers, and that buyer communication with black sellers indicates lower levels of trust.
Bail (2012) assessed competing predictions about how civil society organizations influence media portrayals of Muslims in the aftermath of 9/11. Using plagiarism detection software, he compared press releases about Muslims produced by civil society organizations to more than 50,000 newspaper articles and television transcripts produced between 2001 and 2008. He finds that anti-Muslim fringe organizations were overrepresented in media portrayals and exerted a powerful influence on media discourse, allowing these groups to evolve and become part of the “mainstream.”
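To give a sense of how text reuse between a press release and a news article can be measured, here is a minimal sketch based on overlapping three-word sequences ("shingles"). It is a simple stand-in, not the plagiarism detection software used in the study, and the example texts and function names are invented for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(source, target, n=3):
    """Share of the source's n-word shingles that also appear in the target."""
    src = shingles(source, n)
    if not src:
        return 0.0
    return len(src & shingles(target, n)) / len(src)


# Invented example texts.
press_release = "The group called for calm and urged dialogue between communities."
article_a = ("In a statement, the group called for calm and urged dialogue "
             "between communities on Friday.")
article_b = "City officials announced a new budget for road repairs this week."

print(containment(press_release, article_a))
print(containment(press_release, article_b))
```

Containment is measured relative to the press release rather than the article, so a short release quoted in full inside a long article still scores highly.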
Enns and colleagues hypothesize that levels of redistributive and egalitarian policy rhetoric will decline as the level of campaign contributions from wealthy donors and business interests increases. Using data on campaign contributions collected by the Federal Election Commission from the 1970s through the present, they incorporate automated content analysis and other qualitative analysis software to examine all speeches and content inserted into the Congressional Record by members of Congress during the same period.
Jelveh, Kogut, and Naidu (2015) used a combination of machine-learning and text tools to examine the extent to which social science empirical research has an ideological bias. Using a large corpus of economic articles, they created partisan scores for economists whose political contributions are recorded in Federal Election Commission data. Articles written by economists with known political ideology provided the text that was mined to predict the ideological scores of economists whose political preferences are unknown.
The large volume of data from social media sites and online interactions presents methodological challenges because the data are often highly unstructured and lack demographic information that is central to social science research. Bail (2015) describes the development and application of “social media survey apps” (SMSAs) using Facebook data to illustrate how such data can be mined to study organizational behavior. McCormick and colleagues (2015) developed and implemented a method for retrieving demographic information from non-text images using Twitter data. Barberá (2016) combines voting registration records and home valuations from Zillow with Twitter data to generate representative public opinion estimates. He uses machine learning methods to estimate key demographics (age, gender, race, income, party affiliation, propensity to vote) of any Twitter user in the U.S.
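One common way to make estimates from a non-representative sample more representative is post-stratification weighting: each respondent is weighted by the ratio of their group's population share to its sample share. The sketch below illustrates that general idea only; it is not the estimation procedure from any of the studies above, and the age groups, population shares, and responses are invented.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each unit by (population share) / (sample share) of its group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented sample that over-represents younger users (60% of the sample,
# against an assumed 20% of the target population).
groups = ["18-29"] * 6 + ["30+"] * 4
support = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]  # 1 = supports a policy
population_shares = {"18-29": 0.2, "30+": 0.8}

weights = poststratification_weights(groups, population_shares)
raw_estimate = sum(support) / len(support)
weighted_estimate = weighted_mean(support, weights)

print(raw_estimate, weighted_estimate)
```

In this invented example the unweighted estimate is pulled toward the over-represented younger group, and weighting moves it toward the value implied by the assumed population shares.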
Applicants should specify how the proposed project informs and advances RSF’s computational social science research priorities in its core program areas: Behavioral Economics, Future of Work, Race, Ethnicity and Immigration, and Social Inequality. RSF values reproducibility and open science, and where applicable, investigators should explain their data release plan (data, code, codebooks) or any prohibitions on providing such materials.
The kinds of questions that are of interest are described on the Foundation’s website at each of the links above. Examples include:
Program on Behavioral Economics
- What are the psychological consequences of income scarcity and how do they affect individual decision-making and judgment?
- What factors influence decision-making processes that involve tradeoffs between costs and benefits that occur at different points in time, or the tendency to over-value immediate rewards at the expense of longer-term benefits?
Program on the Future of Work
- To what extent have labor market changes affected family formation, transitions to adulthood, or social mobility?
- Job quality is related to many different factors including government policies (e.g., minimum-wage laws or parental and sick leave policies) and employer instituted policies (e.g., flex hours, retirement plans). What are the consequences of such policies for employers, workers and families?
Program on Race, Ethnicity and Immigration
- How do race-related beliefs evolve in the context of growing population diversity?
- What is the impact of immigration policies on the social and political development of immigrants? To what extent have these policies influenced public opinion, inter-group relations or civic participation?
Program on Social Inequality
- To what extent has increased economic inequality (income, wealth, consumption) affected equality of opportunity or social mobility?
- Are changes in the labor market and occupational structure related to changes in economic inequality?
- Upcoming deadlines
- Submit a letter of inquiry (LOI) or invited project proposal
- Detailed information about eligibility and application requirements
- Detailed information about budget requirements
- Frequently asked questions about applying for a grant
Examples of Recently Funded CSS Projects
Behavioral Economics:
- Jesse Shapiro and Justine Hastings (Brown University) - “Mental Accounting and Fungibility of Money: Evidence from a Retail Panel”
- Saurabh Bhargava (Carnegie Mellon University) - “The Behavioral Economics of Persistent Unemployment: New Evidence on Psychological Frictions in Job Search”
Future of Work:
- Kyle Handley (University of Michigan), and Nicholas Bloom (Stanford University) - “Offshoring and the Future of Work: New Evidence from Global Production and Employer-Employee Microdata”
- Daniel Shoag (Harvard University), and Alicia Sasser Modestino (Northeastern University) - “Upskilling During the Great Recession: Do Employers Demand Greater Skill When Workers Are Plentiful?”
- Katherine Abraham and John Haltiwanger (University of Maryland), and Lee Sandusky and James Spletzer (U.S. Census Bureau) - “Understanding the Growth and Nature of Non-Employee Work”
Race, Ethnicity and Immigration:
- Ran Abramitzky (Stanford University), Leah Boustan (UC Los Angeles), and Katherine Eriksson (UC Davis) - “Cultural Assimilation During the Age of Mass Migration”
- Alexandra Filindra (University of Illinois, Chicago), and Shanna Pearson-Merkowitz (University of Rhode Island) - “State-Level Immigration-Related Bills, 1990-2015: Database Completion, Data Cleaning and Analyses”
Social Inequality:
- Nathaniel Hilger (Brown University) - “Estimating Intergenerational Mobility on Census Data”
- John Friedman (Brown University), Raj Chetty (Harvard University), and Emmanuel Saez and Danny Yagan (UC-Berkeley) - “College and Intergenerational Mobility: New Evidence from Administrative Data”
- John Haltiwanger (University of Maryland), Fredrik Anderson (US Treasury), Mark Kutzbach (Bureau of the Census), and Henry Pollakowski (Harvard University) - “Economic Mobility: The Impact of Individual, Parent and Spatial Factors Using National Survey and Administrative Data”
- Emmanuel Saez and Gabriel Zucman (University of California, Berkeley) - “Distributional National Accounts”
- Christopher Wildeman and Maria Fitzpatrick (Cornell University) - “Linking New York City Administrative Data to Estimate the Effects of Paternal Incarceration”
References
Bail, Christopher A. 2012. The fringe effect: civil society organizations and the evolution of media discourse about Islam since the September 11th attacks. American Sociological Review, 77(6): 855-879. DOI: 10.1177/0003122412465743.
Bail, Christopher A. 2015. Taming big data: using app technology to study organizational behavior on social media. Sociological Methods and Research, May: 1-29. DOI: 10.1177/0049124115587825.
Barberá, Pablo. (2016; working paper). Less is more? how demographic sample weights can improve public opinion estimates based on Twitter data. http://pablobarbera.com/static/less-is-more.pdf.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2014a. Measuring the impacts of teachers I: evaluating bias in teacher value-added estimates. American Economic Review, 104(9): 2593-2632.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2014b. Measuring the impacts of teachers II: teacher value-added and student outcomes in adulthood. American Economic Review, 104(9): 2633-2679.
Doleac, J., and L. C. D. Stein. 2013. The visible hand: race and online market outcomes. The Economic Journal, 123 (572): F469–F492. DOI: 10.1111/ecoj.12082.
Enns, Peter, Nathan Kelly, Jana Morgan and Christopher Witko. Campaign funding, political rhetoric, and the public (non)response to rising inequality. http://www.russellsage.org/awarded-project/campaign-funding-political-rh.... Also see the WCEG working paper: http://equitablegrowth.org/working-papers/the-power-of-economic-interest....
Hout, Michael, and David Grusky. 2016. Recovering and coding occupational data in U.S. tax returns. http://www.russellsage.org/awarded-project/recovering-and-coding-occupat....
Imberman, Scott, Michael Lovenheim and Rodney Andrews. 2014. Does Attending an Elite University Help Low Income Students? Evidence from the Texas Longhorn and Texas A&M Century Scholars Program. http://www.russellsage.org/awarded-project/does-attending-elite-universi.... Also see the NBER working paper: http://www.nber.org/papers/w22260.
Jelveh, Zubin, Bruce Kogut, and Suresh Naidu. 2015. Political language in economics. Columbia Business School Research Paper No. 14-57. (October 22). Available at SSRN: http://ssrn.com/abstract=2535453 or http://dx.doi.org/10.2139/ssrn.2535453
McCormick, Tyler H., Hedwig Lee, Nina Cesare, Ali Shojaie, and Emma S. Spiro. 2015. Using Twitter for demographic and social science research: tools for data collection and processing. Sociological Methods and Research, October: 1-32. DOI: 10.1177/0049124115605339.
Salganik, Matthew J., and Karen C. Levy. 2015. Wiki surveys: open and quantifiable social data collection. PLoS ONE 10(5): e0123483. doi:10.1371/journal.pone.0123483.
Shapiro, Jesse, and Justine Hastings. 2015. Mental Accounting and Fungibility of Money: Evidence from a Retail Panel. http://www.russellsage.org/awarded-project/mental-accounting-and-fungibi....