Emily M. Cantrell

PhD Candidate in Sociology & Social Policy, Princeton University

About


I research the capabilities and limitations of predictive machine learning models in social services, with a focus on child and family policy. My goal is to contribute to the safe regulation of these rapidly proliferating tools. I am also enthusiastic about data science, administrative data, and social safety net research more broadly!

Many social services agencies have begun to use “predictive risk models”: machine learning models that estimate a person’s future risk of some negative experience or behavior, for use as a decision-making aid. For example, child protective services agencies use predictive risk models to estimate the risk that a child will be removed from the home in a future investigation, in order to guide decisions about how to respond to reports of neglect, and courts use predictive risk models to estimate the likelihood of criminal recidivism, in order to guide decisions about bail. With the rise of predictive risk models comes a host of concerns about accuracy, fairness, and transparency.

My dissertation examines the capabilities and limitations of machine learning models that make predictions about individual people’s futures using two datasets: (1) the Future of Families and Child Wellbeing Study, an in-depth longitudinal survey that contains data on many important aspects of participants’ lives starting from birth, and (2) Dutch administrative register data, which contain demographic, economic, employment, school, and social services data from the entire population of the Netherlands.

My focus is on predictive models in child and family policy. I have been developing expertise in child and family policy since my undergraduate years, through a self-designed undergraduate major in Human Development and Social Policy, an internship in public benefits and community resource navigation, two years as a research assistant at the child and family policy research organization Child Trends, and a job as committee assistant for the Health and Human Services Committee of the New Mexico House of Representatives.

Updates

October 2024: Hanzhang Ren and I scored first place in Part 2 of the PreFer data challenge! This competition used the same outcome as Part 1 (births and adoptions), but this time, participants used Dutch administrative register data. We are now working on a paper to explore how sample size and feature sets affect predictive accuracy, as well as a paper on differences in predictability across demographic subgroups. Huge thanks to Lisa Sivak and Gert Stulp for organizing the challenge, to Malte Lüken and Flavio Hafner for assisting with supercomputer access, and to Juan Carlos Perdomo for his advice on using CatBoost.

September 2024: Hanzhang Ren and I scored first place in Part 1 of the PreFer data challenge, in which participants competed to predict childbirths and adoptions using Dutch survey data! We are now working on a paper to explore which strategies contributed most to predictive accuracy.

May 2024: Check out our paper introducing the "REFORMS" checklist of recommendations for conducting and communicating high-quality machine-learning-based science! Huge thanks to Sayash Kapoor and Arvind Narayanan for leading this effort. Sayash, Arvind, and I were interviewed about the work here.

