CS 5/662, Winter 2021

**Week 2 / Wed, Jan 13**

We have numerous reading options today. Pick at least one of these three:

- Eisenstein Chapters 2 and 3
- Eisenstein’s treatment of this subject is well written, clear, and concise, but it assumes considerable comfort with mathematical notation and gives less background than one might like to see.

- Goldberg Chapters 2 and 3
- Goldberg covers much of the same material, but from a slightly different perspective and with different notation.
- Why do I mention notation? Because different people prefer different styles, and Goldberg’s may be easier for some folks to follow.

- J&M Chapters 4 and 5
- These chapters are longer, include more context and background, and are much gentler in how they approach the math.

For the homework, we will use notation similar to Eisenstein’s.

My personal advice is to pick one of the above to read closely, and then skim the others (they are mostly short chapters). It is very educational to see how different authors approach the same subject!

- Goldberg, Chapters 6 and 7
- J&M Chapter 4 (if you didn’t read it for Wednesday)
- Eisenstein Chapter 4
- @tw, Evaluating language identification performance, 11/16/2015
- Dror, R., Baumer, G., Shlomov, S., & Reichart, R. The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing. ACL 2018
- Gorman & Bedrick, We Need to Talk About the Standard Splits. ACL 2019

- Resnik, P. & Lin, J. “Evaluation of NLP Systems.” In Clark et al., eds., *The Handbook of Computational Linguistics and Natural Language Processing*
- Ng, A. & Jordan, M. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. NeurIPS 2002
- Hand, D. J. (2006). Classifier Technology and the Illusion of Progress. Statistical Science, 21(1), 1–14.