Big Data, Prediction & Explanation
The contemporary data deluge offers sociologists rich opportunities to deploy old and new tools to study patterns of social life. We focus in particular on whether recent developments in AI and large language models, notably since the release of ChatGPT, improve the prediction of life outcomes.
Background readings
Bail, 2023, arXiv, "Can Generative AI Improve Social Science?"
McFarland & al., 2015, AmSoc, "Sociology in the era of big data: The ascent of forensic social science"
Molina & Garip, 2019, ARS, "Machine learning for sociology"
Salganik & al., 2019, Socius, "Introduction to the special collection on the Fragile Families Challenge"
Salganik & al., 2020, PNAS, "Measuring the predictability of life outcomes with a scientific mass collaboration"
Ziems & al., 2023, arXiv, "Can large language models transform computational social science?"
Optional readings
Bail, 2014, TS, "The cultural environment: Measuring culture with big data"
Boelaert & Ollion, 2018, RFS, "The great regression. Machine learning, econometrics, and the future of quantitative social sciences"
Blei & al., 2003, JMLR, "Latent Dirichlet Allocation"
Colbaugh & al., 2012, arXiv, "Leveraging sociological models for predictive analysis"
Evans & Aceves, 2016, ARS, "Machine translation: Mining text for social theory"
Garip, 2020, PNAS, "What failure to predict life outcomes can teach us"
Grimmer & Stewart, 2013, PA, "Text as data: The promise and pitfalls of automatic content analysis methods for political texts"
Kitchin, 2014, BDS, "Big data, new epistemologies and paradigm shifts"
McFarland & al., 2013, Poet, "Differentiating language usage through topic models"
Mohr & Bogdanov, 2013, Poet, "Introduction - Topic models. What they are and why they matter"
Varian, 2014, JEP, "Big data: New tricks for econometrics"
Case-study for reading and commentary
Argyle & al., 2022, PA, "Out of One, Many. Using Language Models to Simulate Human Samples"
Case-studies for presentation
Filippova & al., 2019, Socius, "Humans in the Loop: Incorporating Expert and Crowd-Sourced Knowledge for Predictions Using Survey Data"
Rigobon & al., 2019, Socius, "Winning Models for Grade Point Average, Grit, and Layoff in the Fragile Families Challenge"