Resources

Where to find me

If you have any questions about anything we talked about, feel free to reach out and we can set up a time to chat. I can be reached via email but also have a website with more ways to find me. :-)

Some Things To Read

I don’t pretend that this is any kind of systematic, comprehensive, or exhaustive bibliography; please consider it a short and opinionated list of academic articles and books that I have found informative, as well as a few non-book readings that may be of interest. Obviously there is so much more I could have included!

Books

Most of the links below go to my neighborhood bookstore, Annie Bloom’s Books; if you prefer, you can find your local indie store on Bookshop.org. The books are in no particular order!

Artificial Intelligence

Machine Learning and Natural Language Processing

Some Recent Non-Book Writing

Academic Articles

Again, this is not meant to be some sort of exhaustive bibliography! It is just the current list I like to give to students I am starting to work with, or who contact me asking for things to read about LLMs. Topics include theoretical discussions of large language models, more general issues in applied machine learning, a useful paper or two on AI/ML in biomedicine, and at least one on reproducibility in computational research.

  1. Farrell H, Gopnik A, Shalizi C, Evans J. Large AI models are cultural and social technologies. Science. 2025 Mar 14;387(6739):1153–6.
  2. Hicks MT, Humphries J, Slater J. ChatGPT is bullshit. Ethics Inf Technol. 2024 Jun;26(2):38.
  3. Shanahan M. Talking about Large Language Models. Commun ACM. 2024 Feb;67(2):68–79.
  4. Lyons PG, Hofford MR, Yu SC, Michelson AP, Payne PRO, Hough CL, et al. Factors Associated With Variability in the Performance of a Proprietary Sepsis Prediction Model Across 9 Networked Hospitals in the US. JAMA Intern Med. 2023 Jun 1;183(6):611–2.
  5. Shah C, Bender EM. Situating Search. In: ACM SIGIR Conference on Human Information Interaction and Retrieval. Regensburg, Germany: ACM; 2022. p. 221–32.
  6. Bender EM, Gebru T, McMillan-Major A, Shmitchell S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Virtual Event, Canada: ACM; 2021. p. 610–23.
  7. Suresh H, Guttag J. A framework for understanding sources of harm throughout the machine learning life cycle. In: EAAMO '21: Equity and Access in Algorithms, Mechanisms, and Optimization. New York, NY, USA; 2021. p. 1–9.
  8. Bender EM, Koller A. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics; 2020. p. 5185–98.
  9. Passi S, Barocas S. Problem Formulation and Fairness. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. Atlanta, GA, USA: ACM; 2019. p. 39–48.
  10. Sculley D, Holt G, Golovin D, Davydov E, Phillips T, Ebner D, et al. Hidden technical debt in machine learning systems. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, editors. Advances in neural information processing systems. Curran Associates, Inc.; 2015.
  11. Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol. 2013 Oct 24;9(10):e1003285.
  12. Gordon J, Van Durme B. Reporting bias and knowledge acquisition. In: Proceedings of the 2013 Workshop on Automated Knowledge Base Construction. New York, NY, USA: ACM; 2013. p. 25–30.
  13. Hand DJ. Classifier technology and the illusion of progress. Statistical Science. 2006;21(1):1–14.
  14. Weizenbaum J. ELIZA—a computer program for the study of natural language communication between man and machine. Commun ACM. 1966 Jan;9(1):36–45.

Software, etc.

  • In this workshop, we used the ellmer R package.
  • Luis D. Verde Arregoitia has published a phenomenal compendium of other LLM-related R tools.
  • If you find yourself working in Python, you may want to check out Instructor, which (among many other things) provides some of the same LLM-driven structured data extraction functionality as ellmer.