Resources

Where to find me

If you have any questions about anything we talked about, feel free to reach out and we can set up a time to chat. I can be reached via email but also have a website with more ways to find me. :-)

Some Things To Read

I don’t pretend that this is any kind of systematic, comprehensive, or exhaustive bibliography; please consider it a short and opinionated list of academic articles and books that I have found informative, as well as a few non-book readings that may be of interest. Obviously there is so much more I could have included!

Books

Most of the links below go to my neighborhood bookstore, Annie Bloom’s Books; if you prefer, you can find your local indie store on Bookshop.org. The books are in no particular order!

Artificial Intelligence

Machine Learning and Natural Language Processing

Some Recent Non-Book Writing

Academic Articles

Again, this is not meant to be some sort of exhaustive bibliography! It is just the current list I like to give to students I am starting to work with, or who contact me asking for things to read about LLMs. Topics include theoretical discussions of large language models, more general issues in applied machine learning, a useful paper or two on AI/ML in biomedicine, and at least one on reproducibility in computational research.

  1. Farrell H, Gopnik A, Shalizi C, Evans J. Large AI models are cultural and social technologies. Science. 2025 Mar 14;387(6739):1153–6.
  2. Hicks MT, Humphries J, Slater J. ChatGPT is bullshit. Ethics Inf Technol. 2024 Jun;26(2):38.
  3. Shanahan M. Talking about Large Language Models. Commun ACM. 2024 Feb;67(2):68–79.
  4. Lyons PG, Hofford MR, Yu SC, Michelson AP, Payne PRO, Hough CL, et al. Factors Associated With Variability in the Performance of a Proprietary Sepsis Prediction Model Across 9 Networked Hospitals in the US. JAMA Intern Med. 2023 Jun 1;183(6):611–2.
  5. Shah C, Bender EM. Situating Search. In: ACM SIGIR Conference on Human Information Interaction and Retrieval. Regensburg, Germany: ACM; 2022. p. 221–32.
  6. Bender EM, Gebru T, McMillan-Major A, Shmitchell S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Virtual Event, Canada: ACM; 2021. p. 610–23.
  7. Suresh H, Guttag J. A framework for understanding sources of harm throughout the machine learning life cycle. In: EAAMO '21: Equity and Access in Algorithms, Mechanisms, and Optimization. New York, NY, USA; 2021. p. 1–9.
  8. Bender EM, Koller A. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics; 2020. p. 5185–98.
  9. Passi S, Barocas S. Problem Formulation and Fairness. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. Atlanta, GA, USA: ACM; 2019. p. 39–48.
  10. Sculley D, Holt G, Golovin D, Davydov E, Phillips T, Ebner D, et al. Hidden technical debt in machine learning systems. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, editors. Advances in neural information processing systems. Curran Associates, Inc.; 2015.
  11. Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol. 2013 Oct 24;9(10):e1003285.
  12. Gordon J, Van Durme B. Reporting bias and knowledge acquisition. In: Proceedings of the 2013 Workshop on Automated Knowledge Base Construction. New York, NY, USA: ACM; 2013. p. 25–30.
  13. Hand DJ. Classifier technology and the illusion of progress. Statistical Science. 2006;21(1):1–14.
  14. Weizenbaum J. ELIZA—a computer program for the study of natural language communication between man and machine. Commun ACM. 1966 Jan;9(1):36–45.

Software, etc.

  • In this workshop, we used the ellmer R package.
  • Luis D. Verde Arregoitia has published a phenomenal compendium of other LLM-related R tools.
  • If you find yourself working in Python, you may want to check out Instructor, which (among many other things) provides some of the same LLM-driven structured data extraction functionality as ellmer.