Social science research on many topics is often hampered by the limitations of survey data, including relatively small sample sizes, low response rates and high costs. However, the digital age has increased access to large, comprehensive data sources, such as public and private administrative databases, and new sources of information from online transactions, social-media interactions, and internet searches. New computational methods also allow for the extraction, coding, and analysis of large volumes of text. Advances in analytical methods for exploiting and analyzing data, including machine learning, have accompanied the rise of these data. The emergence of these new data and methods also raises questions about access, privacy and confidentiality.
The Russell Sage Foundation’s initiative on Computational Social Science (CSS) supports innovative social science research that brings new data and methods to bear on questions of interest in its core programs: Behavioral Economics; Future of Work; Race, Ethnicity and Immigration; and Social Inequality. Limited consideration will be given to research that focuses primarily on methodologies, such as causal inference and innovations in data collection.
Examples of research (some recently funded by RSF) that are of interest include, but are not restricted to, the following:
Small Grants Competitions with Big Data
Many investigators have invested significant time and resources in assembling big data sets by linking and harmonizing administrative data from multiple jurisdictions or agencies (federal, state, local), linking administrative to survey data, or deriving new information from online or archival sources. Once assembled, these data can be of great value to other investigators beyond the purpose for which they were originally compiled.
RSF has funded small grants competitions in which investigators have made large data sets available to the wider research community. Specifically, RSF issues a call for proposals, developed by the investigators, that offers small grants to graduate students and early career researchers who propose new projects using these data. The investigators lead the review panel that evaluates the proposals and participate in a conference at RSF, held about a year after the grants are made, at which funded researchers present their results. Examples of small grants competitions include Raj Chetty and Nathan Hendren’s release of public use statistics derived from IRS administrative tax records (see Equality of Opportunity Project; RSF Request for Proposals) and Sean Reardon’s assembly of educational achievement data for roughly 40 million public school students (see the Stanford Education Data Archive; RSF Request for Proposals).
We welcome inquiries from investigators who have developed similar data sets and are willing to make those data available for wider analyses through an RSF small grants competition. RSF does not provide support to assemble and prepare the data for release, nor does it provide infrastructure support to house the data once released.
Linked Administrative Data
Linking public administrative records from different agencies or jurisdictions can help answer long-standing questions of interest. Chetty, Friedman and Rockoff (2014a; 2014b) linked school district administrative records with federal income tax data. They used the linked data to identify which teachers have the largest short-term impact on student achievement. They also showed, over the longer term, the extent to which students assigned to teachers with higher value-added scores have higher rates of college attendance and higher salaries as adults.
Imberman, Lovenheim and Andrews (2016) link three sources of Texas administrative data to assess the effects of targeted scholarship programs on educational attainment and on earnings in young adulthood. These sources are K-12 student-level data from the Texas Education Agency (TEA), post-secondary data from the Texas Higher Education Coordinating Board (THECB), and individual quarterly earnings data from the state’s Workforce Commission.
Algorithms and Automated Decision-Making
Human decision-making is subject to a variety of biases. Algorithms are increasingly used in decisions that affect people’s lives, such as hiring and promotion, policing strategies, bail and sentencing, credit determinations, and the allocation of social services, and their use raises many policy questions. Algorithms are often perceived to be neutral and fair in their processes. Even so, some recent studies have found that algorithmic systems may contribute to biased and harmful outcomes, especially for disadvantaged populations (e.g., Sweeney, 2013). Other studies (e.g., Kleinberg et al., 2018), however, suggest that algorithms can significantly improve decision-making outcomes, including by reducing racial disparities.
Under what circumstances are algorithms fair and neutral, and when do they outperform human decision-making? Under what conditions do they incorporate existing social biases that cause them to have a disparate impact on some populations? When this occurs, how do such biases become embedded in the algorithms? How do governments, organizations and social scientists evaluate algorithms for such biases and establish accountability? Jens Ludwig and colleagues are investigating the tradeoffs between algorithmic fairness and efficiency. They are also testing how well four proposed methods for promoting algorithmic fairness in machine learning actually work in practice.
Private Administrative Data
Proprietary data from sources such as credit reporting agencies, online real estate marketplaces, or retail firms are often extremely useful for addressing social science and policy questions. Normative decision theory implies that a dollar is a dollar no matter its source, but psychological research suggests that financial windfalls or additional expenses have different effects depending on which “mental accounts” they impact. Shapiro and Hastings (2017) analyze retail panel data (500,000 households, 6 billion transactions) to understand “mental accounting,” or how households think about and spend money from different sources.
Evidence from tax return data suggests no clear trend in intergenerational income mobility for recent cohorts of young adults (Chetty et al., 2014a; 2014b). In contrast, survey data suggest increasing intergenerational persistence in occupation, that is, declining occupational mobility. To date, no single “big data” source allows income and occupational mobility to be analyzed simultaneously. Michael Hout and David Grusky are using a machine-learning approach to code taxpayers’ occupations as reported on Internal Revenue Service forms. The coding is designed to be consistent with Current Population Survey records, in which respondents’ occupations are already reliably coded.
Atalay, Tannenbaum and Sotelo will use machine learning techniques to extract job-related elements, including tasks, skills, and technology requirements. Their dataset of job vacancies combines newspaper help-wanted ads published between 1940 and 2000 with online job postings from 2011 to 2017. They will study the extent to which the task content of occupations has changed over time, the impact of technology on tasks within occupations, and how these changes have affected earnings.
Online Surveys and Experiments
Survey response rates for in-person and telephone interviews have declined significantly, and surveys are expensive to administer. Salganik and Levy (2015) highlight the advantages of wiki surveys. Their data collection instruments can capture as much information as a respondent is willing to provide, including information that the researcher did not anticipate. They can also be modified as more information is obtained.
An extensive literature shows an association between race and economic outcomes, but it is difficult to determine the extent to which these associations are due to racial discrimination or to characteristics correlated with race. Doleac and Stein (2013) use online classified advertisements to examine the effect of race on market outcomes. Each ad features a photograph of the item for sale held in the seller’s hand, and the researchers experimentally vary the hand’s skin color (dark or light). They find that black sellers receive fewer and lower offers than white sellers, and that buyers’ communications with black sellers indicate lower levels of trust.
Bail (2012) assessed competing predictions about how civil society organizations influence media portrayals of Muslims in the aftermath of 9/11. Using plagiarism detection software, he compared press releases about Muslims produced by civil society organizations with more than 50,000 newspaper articles and television transcripts produced between 2001 and 2008. He found that anti-Muslim fringe organizations were overrepresented in media portrayals and exerted a powerful influence on media discourse, allowing these groups to enter the “mainstream.”
Enns and colleagues hypothesize that levels of redistributive and egalitarian policy rhetoric in Congress will decline as campaign contributions from wealthy donors and business interests increase. They draw on Federal Election Commission data from the 1970s onward. Using automated content analysis and other qualitative analysis software, they examine all speeches and other content that members of Congress inserted into the Congressional Record over the same period.
The large volume of data from social media sites and online interactions presents methodological challenges because the data are unstructured and lack the demographic information that is central to social science research. Bail (2015) describes the development and application of “social media survey apps” (SMSAs), using Facebook data to illustrate how such data can be mined to study organizational behavior. McCormick and colleagues (2015) developed and implemented a method for retrieving demographic information from non-text images in Twitter data. Barberá (2016) combines voter registration records and home valuations from Zillow with Twitter data to generate representative public opinion estimates. He uses machine learning methods to estimate key demographics (age, gender, race, income, party affiliation, propensity to vote) of any Twitter user in the U.S.
Applicants should specify how the proposed project informs and advances RSF’s computational social science research priorities in one of its core program areas: Behavioral Economics; Future of Work; Race, Ethnicity and Immigration; or Social Inequality. RSF values reproducibility and open science. Where applicable, investigators should explain their plan for releasing data, code, and codebooks, or any prohibitions on providing such materials.
Further examples of questions of interest can be found on the Foundation’s web page for each of these programs. They include:
Program on Behavioral Economics
- What are the psychological consequences of income scarcity and how do they affect individual decision-making and judgment?
- What factors influence decision-making processes that involve tradeoffs between costs and benefits occurring at different points in time, or the tendency to overvalue immediate rewards at the expense of longer-term benefits?
Program on the Future of Work
- To what extent have labor market changes affected family formation, transitions to adulthood, or social mobility?
- Job quality is related to many different factors, including government policies (e.g., minimum-wage laws or parental and sick leave policies) and employer-instituted policies (e.g., flexible hours, retirement plans). What are the consequences of such policies for employers, workers and families?
Program on Race, Ethnicity and Immigration
- How do race-related beliefs evolve in the context of growing population diversity?
- What is the impact of immigration policies on the social and political development of immigrants? To what extent have these policies influenced public opinion, inter-group relations or civic participation?
Program on Social Inequality
- To what extent has increased economic inequality (income, wealth, consumption) affected equality of opportunity or social mobility?
- Are changes in the labor market and occupational structure related to changes in economic inequality?
Examples of CSS Projects Recently Funded by RSF
Behavioral Economics:
- Jesse Shapiro and Justine Hastings (Brown University) - “Mental Accounting and Fungibility of Money: Evidence from a Retail Panel”
- Saurabh Bhargava (Carnegie Mellon University) - “The Behavioral Economics of Persistent Unemployment: New Evidence on Psychological Frictions in Job Search”
Future of Work:
- Kyle Handley (University of Michigan), and Nicholas Bloom (Stanford University) - “Offshoring and the Future of Work: New Evidence from Global Production and Employer-Employee Microdata”
- Daniel Shoag (Harvard University), and Alicia Sasser Modestino (Northeastern University) - “Upskilling During the Great Recession: Do Employers Demand Greater Skill When Workers Are Plentiful?”
- Katherine Abraham and John Haltiwanger (University of Maryland), and Lee Sandusky and James Spletzer (U.S. Census Bureau) - “Understanding the Growth and Nature of Non-Employee Work”
- Enghin Atalay (University of Wisconsin), Daniel Tannenbaum (University of Nebraska), Sebastian Sotelo (University of Michigan) - “Using Job Vacancy Ads to Study Long-Run Occupational Change”
Race, Ethnicity and Immigration:
- Ran Abramitzky (Stanford University), Leah Boustan (UC Los Angeles), and Katherine Eriksson (UC Davis) - “Cultural Assimilation During the Age of Mass Migration”
- Alexandra Filindra (University of Illinois, Chicago), and Shanna Pearson-Merkowitz (University of Rhode Island) - “State-Level Immigration-Related Bills, 1990-2015: Database Completion, Data Cleaning and Analyses”
- Nathaniel Hilger (Brown University) - “Estimating Intergenerational Mobility on Census Data”
- John Friedman (Brown University), Raj Chetty (Harvard University), and Emmanuel Saez and Danny Yagan (UC Berkeley) - “College and Intergenerational Mobility: New Evidence from Administrative Data”
- John Haltiwanger (University of Maryland), Fredrik Anderson (US Treasury), Mark Kutzbach (Bureau of the Census), and Henry Pollakowski (Harvard University) - “Economic Mobility: The Impact of Individual, Parent and Spatial Factors Using National Survey and Administrative Data”
- Emmanuel Saez and Gabriel Zucman (University of California, Berkeley) - “Distributional National Accounts”
- Christopher Wildeman and Maria Fitzpatrick (Cornell University) - “Linking New York City Administrative Data to Estimate the Effects of Paternal Incarceration”
- Jens Ludwig (University of Chicago), Sendhil Mullainathan (Harvard University), Jon Kleinberg (Cornell University), and Benjamin Keys (University of Pennsylvania) - “Addressing Discrimination in Prediction Policy Problems”
- Peter Bearman and Suresh Naidu (Columbia University), Mara Loveman, Eric Schickler, and Christopher Muller (University of California, Berkeley), Marcella Alsan (Stanford University), James Feigenbaum (Boston University), Trevon Logan (Ohio State University) - “Reclaiming Lost Data on American Racial Inequality, 1865-1940”
- Bruce D. Meyer (University of Chicago), and James X. Sullivan (University of Notre Dame) - “Creating a Comprehensive Income Dataset”
- Justine Hastings (Brown University), and Eric Chyn (University of Virginia) - “The Impact of Paid Maternity Leave: Evidence from Temporary Disability Insurance in Rhode Island”
Bail, Christopher A. 2012. The fringe effect: civil society organizations and the evolution of media discourse about Islam since the September 11th attacks. American Sociological Review, 77(6): 855-879. DOI: 10.1177/0003122412465743.
Bail, Christopher A. 2015. Taming big data: using app technology to study organizational behavior on social media. Sociological Methods and Research, May: 1-29. DOI: 10.1177/0049124115587825.
Barberá, Pablo. 2016. Less is more? How demographic sample weights can improve public opinion estimates based on Twitter data. Working paper.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2014a. Measuring the impacts of teachers I: evaluating bias in teacher value-added estimates. American Economic Review, 104(9): 2593-2632.
Chetty, Raj, John N. Friedman, and Jonah E. Rockoff. 2014b. Measuring the impacts of teachers II: teacher value-added and student outcomes in adulthood. American Economic Review, 104(9): 2633-2679.
Doleac, J., and L. C. D. Stein. 2013. The visible hand: race and online market outcomes. The Economic Journal, 123(572): F469-F492. DOI: 10.1111/ecoj.12082.
Enns, Peter, Nathan Kelly, Jana Morgan and Christopher Witko. Campaign funding, political rhetoric, and the public (non)response to rising inequality. Also see the WCEG working paper.
Hout, Michael, and David Grusky. 2016. Recovering and coding occupational data in U.S. tax returns.
Imberman, Scott, Michael Lovenheim and Rodney Andrews. 2014. Does Attending an Elite University Help Low Income Students? Evidence from the Texas Longhorn and Texas A&M Century Scholars Program. Also see the NBER working paper.
Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. 2018. Human Decisions and Machine Predictions. The Quarterly Journal of Economics, Volume 133, Issue 1 (February): 237–293.
McCormick, Tyler H., Hedwig Lee, Nina Cesare, Ali Shojaie, and Emma S. Spiro. 2015. Using Twitter for demographic and social science research: tools for data collection and processing. Sociological Methods and Research, October: 1-32. DOI: 10.1177/0049124115605339.
Salganik, Matthew J., and Karen C. Levy. 2015. Wiki surveys: open and quantifiable social data collection. PLoS ONE, 10(5): e0123483. DOI: 10.1371/journal.pone.0123483.
Shapiro, Jesse, and Justine Hastings. 2015. Mental Accounting and Fungibility of Money: Evidence from a Retail Panel.
Sweeney, Latanya. 2013. Discrimination in Online Ad Delivery. Communications of the ACM, Vol. 56 No. 5, Pages 44-54.