Big Data, Prediction & Explanation
The contemporary data deluge offers sociologists rich opportunities to deploy old and new tools to study patterns of social life. We focus in particular on whether combining big data with statistical learning techniques improves the prediction of life outcomes. (Listen to a podcast about Salganik's experiment, which we study in class.)
Background readings
McFarland & al., 2015, AmSoc, "Sociology in the era of big data: The ascent of forensic social science"
Molina & Garip, 2019, ARS, "Machine learning for sociology"
Salganik & al., 2019, Socius, "Introduction to the special collection on the Fragile Families Challenge"
Salganik & al., 2020, PNAS, "Measuring the predictability of life outcomes with a scientific mass collaboration"
Optional readings
Bail, 2014, TS, "The cultural environment: Measuring culture with big data"
Boelaert & Ollion, 2018, RFS, "The great regression. Machine learning, econometrics, and the future of quantitative social sciences"
Blei & al., 2003, JMLR, "Latent Dirichlet Allocation"
Colbaugh & al., 2012, arXiv, "Leveraging sociological models for predictive analysis"
Evans & Aceves, 2016, ARS, "Machine translation: Mining text for social theory"
Garip, 2020, PNAS, "What failure to predict life outcomes can teach us"
Grimmer & Stewart, 2013, PA, "Text as data: The promise and pitfalls of automatic content analysis methods for political texts"
Kitchin, 2014, BDS, "Big data, new epistemologies and paradigm shifts"
McFarland & al., 2013, Poet, "Differentiating language usage through topic models"
Mohr & Bogdanov, 2013, Poet, "Introduction - Topic models. What they are and why they matter"
Varian, 2014, JEP, "Big data: New tricks for econometrics"
Case-studies for reading, presentation and commentary
Note: All four papers are short, draw on the same data, and are entries in the same prediction challenge (the Fragile Families Challenge).
Filippova & al., 2019, Socius, "Humans in the Loop: Incorporating Expert and Crowd-Sourced Knowledge for Predictions Using Survey Data"
Compton, 2019, Socius, "A Data-Driven Approach to the Fragile Families Challenge: Prediction through Principal-Components Analysis and Random Forests"
Rigobon & al., 2019, Socius, "Winning Models for Grade Point Average, Grit, and Layoff in the Fragile Families Challenge"
Davidson, 2019, Socius, "Black-Box Models and Sociological Explanations. Predicting High School Grade Point Average Using Neural Networks"
Case-studies for written reviews
Hofstra & al., 2017, ASR, "Sources of segregation in social networks: A novel approach using Facebook"
Mazieres & al., 2021, Nature, "Computational appraisal of gender representativeness in popular movies"
Roth & al., 2020, PlosOne, "Tubes and bubbles topological confinement of YouTube recommendations"
Spiro & al., 2016, Socius, "The persistence of division: Geography, institutions, and online friendship ties"