Big Data, Prediction & Explanation
The contemporary data deluge offers sociologists rich opportunities to deploy old and new tools to study patterns of social life. We focus in particular on whether recent developments in AI and large language models, especially since the release of ChatGPT, improve the prediction of life outcomes.
Background readings
Bail, 2023, arXiv, "Can Generative AI Improve Social Science?"
McFarland & al., 2015, AmSoc, "Sociology in the era of big data: The ascent of forensic social science"
Molina & Garip, 2019, ARS, "Machine learning for sociology"
Salganik & al., 2019, Socius, "Introduction to the special collection on the Fragile Families Challenge"
Salganik & al., 2020, PNAS, "Measuring the predictability of life outcomes with a scientific mass collaboration"
Ziems & al., 2023, arXiv, "Can large language models transform computational social science?"
Optional readings
Bail, 2014, TS, "The cultural environment: Measuring culture with big data"
Boelaert & Ollion, 2018, RFS, "The great regression. Machine learning, econometrics, and the future of quantitative social sciences"
Blei & al., 2003, JMLR, "Latent Dirichlet Allocation"
Colbaugh & al., 2012, arXiv, "Leveraging sociological models for predictive analysis"
Evans & Aceves, 2016, ARS, "Machine translation: Mining text for social theory"
Garip, 2020, PNAS, "What failure to predict life outcomes can teach us"
Grimmer & Stewart, 2013, PA, "Text as data: The promise and pitfalls of automatic content analysis methods for political texts"
Kitchin, 2014, BDS, "Big data, new epistemologies and paradigm shifts"
McFarland & al., 2013, Poet, "Differentiating language usage through topic models"
Mohr & Bogdanov, 2013, Poet, "Introduction - Topic models. What they are and why they matter"
Varian, 2014, JEP, "Big data: New tricks for econometrics"
Case-studies for reading and commentary
Rigobon & al., 2019, Socius, "Winning Models for Grade Point Average, Grit, and Layoff in the Fragile Families Challenge"
Filippova & al., 2019, Socius, "Humans in the Loop: Incorporating Expert and Crowd-Sourced Knowledge for Predictions Using Survey Data"
Case-studies for written reviews
Algee-Hewitt, Mark, Sarah Allison, Marissa Gemma, Ryan Heuser, Franco Moretti, and Hannah Walser. 2016. Canon/Archive: Large-Scale Dynamics in the Literary Field. Stanford, California: Literary Lab.
Anderson, Gordon, Maria Grazia Pittau, and Roberto Zelli. 2016. “Assessing the Convergence and Mobility of Nations without Artificially Specified Class Boundaries.” Journal of Economic Growth 21(3):283–304. doi: 10.1007/s10887-016-9128-5.
Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. 2023. “Out of One, Many: Using Language Models to Simulate Human Samples.” Political Analysis 31(3):337–51. doi: 10.1017/pan.2023.2.
Bail, Christopher A. 2012. “The Fringe Effect: Civil Society Organizations and the Evolution of Media Discourse about Islam since the September 11th Attacks.” American Sociological Review 77(6):855–79. doi: 10.1177/0003122412465743.
Bail, Christopher A., Brian Guay, Emily Maloney, Aidan Combs, D. Sunshine Hillygus, Friedolin Merhout, Deen Freelon, and Alexander Volfovsky. 2020. “Assessing the Russian Internet Research Agency’s Impact on the Political Attitudes and Behaviors of American Twitter Users in Late 2017.” Proceedings of the National Academy of Sciences 117(1):243–50. doi: 10.1073/pnas.1906420116.
Bolukbasi T., Saligrama V., Chang K.-W., Zou J., and Kalai A. 2016. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” in 30th Annual Conference on Neural Information Processing Systems.
Cardon, Dominique, Guilhem Fouetillou, and Camille Roth. 2021. “Two Paths of Glory — Structural Positions and Trajectories of Websites within Their Topical Territory.” Proceedings of the International AAAI Conference on Web and Social Media 5(1):58–65. doi: 10.1609/icwsm.v5i1.14101.
Centola, Damon. 2011. “An Experimental Study of Homophily in the Adoption of Health Behavior.” Science 334(6060):1269–72. doi: 10.1126/science.1207055.
Chu, Johan S. G., and James A. Evans. 2021. “Slowed Canonical Progress in Large Fields of Science.” Proceedings of the National Academy of Sciences 118(41):1–5.
DiMaggio, Paul, Manish Nag, and David Blei. 2013. “Exploiting Affinities between Topic Modeling and the Sociological Perspective on Culture: Application to Newspaper Coverage of U.S. Government Arts Funding.” Poetics 41(6):570–606. doi: 10.1016/j.poetic.2013.08.004.
Evans, James A. 2010. “Industry Induces Academic Science to Know Less about More.” American Journal of Sociology 116(2):389–452.
Gross, Neil, and Marcus Mann. 2017. “Is There a ‘Ferguson Effect?’ Google Searches, Concern about Police Violence, and Crime in U.S. Cities, 2014–2016.” Socius: Sociological Research for a Dynamic World 3:237802311770312. doi: 10.1177/2378023117703122.
Hofstra, Bas, Rense Corten, Frank Van Tubergen, and Nicole B. Ellison. 2017. “Sources of Segregation in Social Networks: A Novel Approach Using Facebook.” American Sociological Review 82(3):625–56. doi: 10.1177/0003122417705656.
Hunzaker, M. B. Fallin, and Lauren Valentino. 2019. “Mapping Cultural Schemas: From Theory to Method.” American Sociological Review 84(5):950–81. doi: 10.1177/0003122419875638.
Kozlowski, Austin C., Matt Taddy, and James A. Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84(5):905–49. doi: 10.1177/0003122419877135.
Latour, Bruno, Pablo Jensen, Tommaso Venturini, Sébastian Grauwin, and Dominique Boullier. 2012. “‘The Whole Is Always Smaller than Its Parts’ – a Digital Test of Gabriel Tardes’ Monads.” The British Journal of Sociology 63(4):590–615. doi: 10.1111/j.1468-4446.2012.01428.x.
Levy, Brian L., Nolan E. Phillips, and Robert J. Sampson. 2020. “Triple Disadvantage: Neighborhood Networks of Everyday Urban Mobility and Violence in U.S. Cities.” American Sociological Review 85(6):925–56. doi: 10.1177/0003122420972323.
Liu, David M., and Matthew J. Salganik. 2019. “Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge.” Socius: Sociological Research for a Dynamic World 5:1–21.
Mazières, Antoine, Telmo Menezes, and Camille Roth. 2021. “Computational Appraisal of Gender Representativeness in Popular Movies.”
McKay, Stephen. 2019. “When 4 ≈ 10,000: The Power of Social Science Knowledge in Predictive Performance.” Socius: Sociological Research for a Dynamic World 5:237802311881177. doi: 10.1177/2378023118811774.
Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2011. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science 331(6014):176–82. doi: 10.1126/science.1199644.
Miller, Ian Matthew. 2013. “Rebellion, Crime and Violence in Qing China, 1722–1911: A Topic Modeling Approach.” Poetics 41(6):626–49. doi: 10.1016/j.poetic.2013.06.005.
Nowak, Adam, and Patrick Smith. 2017. “Textual Analysis in Real Estate.” Journal of Applied Econometrics 32(4):896–918. doi: 10.1002/jae.2550.
Roth, Camille, Antoine Mazières, and Telmo Menezes. 2020. “Tubes and Bubbles Topological Confinement of YouTube Recommendations” edited by T. P. Peixoto. PLOS ONE 15(4):e0231703. doi: 10.1371/journal.pone.0231703.
Salganik, Matthew J., Peter Sheridan Dodds, and Duncan J. Watts. 2006. “Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market.” Science 311(5762):854–56. doi: 10.1126/science.1121066.
Salganik, Matthew J., and Duncan J. Watts. 2009. “Web‐Based Experiments for the Study of Collective Social Dynamics in Cultural Markets.” Topics in Cognitive Science 1(3):439–68. doi: 10.1111/j.1756-8765.2009.01030.x.
Spiro, Emma S., Zack W. Almquist, and Carter T. Butts. 2016. “The Persistence of Division: Geography, Institutions, and Online Friendship Ties.” Socius: Sociological Research for a Dynamic World 2:237802311663434. doi: 10.1177/2378023116634340.
Stanescu, Diana, Erik Wang, and Soichiro Yamauchi. 2019. “Using LASSO to Assist Imputation and Predict Child Well-Being.” Socius: Sociological Research for a Dynamic World 5:237802311881462. doi: 10.1177/2378023118814623.