CS/EE 5/606: Information Retrieval

Spring 2015, Tuesdays & Thursdays at 2:15


Synopsis

How many information retrieval systems have you used since waking up this morning? Probably more than you think. Information retrieval systems, including but not limited to web search engines, product recommender systems, library catalogues, and social media applications, are vital tools for navigating our modern information ecosystem. The underlying algorithms and technologies that power these systems come from every corner of computer and information science, and have a rich and fascinating history.

In this course, we will study the art and science of information retrieval. We will cover a wide range of technical topics and applications of IR. Furthermore, because information is generally produced and consumed by humans, we will have a particular focus on issues surrounding human users of IR systems.

The course will not involve a final exam; however, it will involve a final project, which will include an in-class presentation as well as a formal written paper. There will also be several homework assignments, including a pilot study for the final project. Furthermore, students will be expected to present at least one paper in class as well as participate in class discussions. There will also be a considerable amount of reading assigned, which students will be expected to actually do.


Fig. 1: Vannevar Bush's "Memex", a hypothetical electro-mechanical hypertext system described in 1945 and arguably the blueprint for modern information systems.

Learning objectives

By the end of the course, students will:


Prerequisites

A working knowledge of programming is required for this course. While not strictly required, many parts of this course will be easier if you are familiar with standard mathematical concepts used in NLP: n-gram language models, Bayesian statistics, etc.


Instructor

Fig. 2: The instructor, next to a large sculpture of the majestic Crotaphytus bicinctores.

CS/EE 5/606 is being taught by Steven Bedrick. He can usually be found in his natural habitat, Gaines Hall room 19. While he has no set office hours, GH is far enough off the beaten path that you should probably schedule something with him before making the schlep.

We strongly encourage you to consult the Student Health Center for guidance about any pre-travel immunizations that may be required before visiting Gaines Hall.


Textbook

In addition to a large number of articles and book chapters, we will be using the following texts extensively:


Schedule

Date | Topic | Reading | HW Assigned
Mar 31 (T) | Course Overview; Information Behavior | Hearst Ch. 3, Case Ch. 3, Belkin (1980), and Patterson (2001) | HW1
Apr 2 (Th) | IR Basics | Manning et al. Ch. 1 and 2 |
Apr 7 (T) | IR Models: Boolean, Vector, Probabilistic | Manning et al. Ch. 6; Zhai 2007 (only sections 1 and 2); Sparck Jones 1972; and Robertson 2004 | HW2
Apr 9 (Th) | Index Construction/Optimization/Compression | Manning et al. Ch. 4 and 5 |
Apr 14 (T) | Experimental Evaluation | Manning et al. Ch. 8, Hearst Ch. 2, Cleverdon 1991, and Käki & Aula 2008 |
Apr 16 (Th) | Web search, PageRank | Manning et al. Ch. 19 and 20, Leskovec Ch. 5, Kurland & Lee 2010, and Bing 2014 |
  Note: The Manning textbook also has a good chapter on link analysis; it overlaps enough with the Leskovec chapter that I'm not assigning it as reading, but you might find it useful and/or easier to follow than the Leskovec chapter.
Apr 21 (T) | No class: Steven in Bethesda | |
Apr 23 (Th) | Search UI/UX (Presenter: Joe Hamilton) | Hearst in Baeza-Yates & Ribeiro-Neto, Ch. 2; Hearst Ch. 1, 4, and 5; Wu et al. 2012; Clarke et al. 2007; and Guan 2007 |
  Note: There is some overlap in today's readings, but much less than it may initially appear.
Apr 28 (T) | Learning From User Behavior | Jiang 2013, Jones & Klinkner 2008, Agichtein et al. 2006, and Lagun et al. 2014 |
Apr 30 (Th) | Relevance Feedback | Manning et al. Ch. 9, Lee & Croft 2013, and Caballero & Akella 2012 |
May 5 (T) | Query suggestion/reformulation (Presenter: Joseph Hackman) | Hearst Ch. 6, Huang & Efthimiadis 2009, Jain et al. 2011, and Ozertem et al. 2012 |
May 7 (Th) | Machine Learning & Ranking (Presenter: Shiran) | Manning et al. Ch. 15, Liu 2009 sections 1–5, and Zhu et al. 2014 |
  Note: In the Manning chapter, focus on the section on "Machine learning methods in ad hoc information retrieval" (15.4). Don't be put off by the length of the Liu paper: the page layout involves very large margins, and it's not as long as it looks!
May 12 (T) | Multimedia Retrieval (Presenter: Krystal) | Larson & Jones 2012 (Sections 1, 2, and 6), Mei et al. 2014, Kennedy & Naaman 2008, Apostolova 2013, and Zhang et al. 2012 |
  Note: The Larson and Mei articles are background; Krystal will be presenting the others.
May 14 (Th) | Document clustering (Presenter: Joel) | Manning et al. Ch. 17, Slaney 2008, Cohen 2010, and Chappell 2013 |
May 19 (T) | Microblog search, Time & Space (Presenter: Meikun) | Teevan 2011 and Woodward 2015 (as background), Bennett 2011, Cheng 2014, and Mishra 2014 | Pilot project presentations!
May 21 (Th) | Cross-Language IR (Presenter: Allison) | Zhou et al. 2012 (as background); Oard 2008; Steichen et al. 2015; and Nikoulina et al. 2012 |
May 26 (T) | Guest Lecture: Bill Hersh, MD (OHSU) | Hersh 2014, Lin 2008, and Stanton 2014 |
  Note: Class will start at 3:15 today.
May 28 (Th) | Guest Lecture: Stephen Wu, PhD (Mayo Clinic, TrapIt) | |
June 2 (T) | No class: NAACL | |
June 4 (Th) | No class: NAACL | |
June 11 (Th) | Project presentations! | |
June 12 (F) | Project presentations! | |

Logistics

CS/EE 5/606 will be held Tuesdays and Thursdays, from 4:00 to 5:30 PM, in GH5.


Homework

We will have homework, basically all of which will involve programming. The point of the homework is to give you hands-on experience with the algorithms and techniques we'll be covering, not to learn how to write production-ready code. For some of the assignments, I will provide "scaffolding" code that may save you significant time; mostly, this code will be written in either Python or Java. If you want to use something else to do the assignment, you are of course free to do so.

The homework assignments will all come with a "due date." Assignments will be due at 11:59 pm (Portland time) on their due date. If you think you will need additional time to complete an assignment, let me know as soon as possible. If something serious and unexpected comes up at the last minute (illness, family emergency, etc.), we'll work something out.

The deliverables for the final project are an in-class presentation and a short paper done in the style of a conference submission: a maximum of eight pages, not counting references. The writeup will be due on June 15. Unless otherwise agreed, this will be a hard deadline, as grades are due later that week.


Grading

Your grade will be based on three things: in-class participation, including paper presentations (30%); homework (30%); and the final project (40%).

Resources

Useful books

Websites of note

We will be filling these in as we go along!

Articles


Student Access Statement

Our program is committed to all students achieving their potential. If you have a disability or think you may have a disability (physical, learning, hearing, vision, psychological) that may need a reasonable accommodation, please contact Student Access at (503) 494-0082 or e-mail studentaccess@ohsu.edu to discuss your needs. You can also find more information at www.ohsu.edu/student-access. Because accommodations can take time to implement, it is important to have this discussion as soon as possible. All information regarding a student’s disability is kept in accordance with relevant state and federal laws.