In addition to homework assignments, there will be a final project on the subject of your choice.
In the past, students have reproduced a paper, developed a dataset, implemented an algorithm or approach, etc.
The only requirements vis-a-vis the topic of the project are:
That the project have a linguistic component to it,
That it involve the techniques and approaches we cover in this class,
That it include an evaluation component comprising both quantitative and qualitative evaluations.
A couple of notes:
I encourage you to discuss possible ideas for your project with me early in the term; this is an excellent use of office hours!
Most often, the evaluation component of projects takes the form of an experimental evaluation, but depending on your project this may not be appropriate. If you are developing a dataset, for instance, your evaluation component will look rather different from how it would look if you were replicating a machine translation paper. Don’t be afraid to get creative with your evaluation, and please swing by office hours if you would like help brainstorming!
Please note that many current topics in NLP necessitate substantial computational resources, but many others do not. Your choice of project should take into account the resources you have available, and should be feasible in that regard. I am happy to provide advice on this issue!
There exist many wonderful “out of the box” tools and models (e.g. large pre-trained language models, tools such as fairseq for machine translation, etc.); your project can certainly use such things, but must go beyond simply “downloading and running sample code”. If you are unsure about whether your idea might fall into this category, please ask!
The deliverables and grading breakdown for the project are as follows:
The grading rubric can be found at the end of this page.
For each of the written deliverables, you are required to use the ACL 2020 LaTeX style template (don’t forget to comment out the “final copy” command!), and to use BibTeX for your bibliography.
If you are not familiar with LaTeX, this is an excellent time to learn, but do make sure to give yourself a little extra time when writing to allow for some trial-and-error.
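For orientation, a minimal document skeleton might look like the sketch below. It assumes the file names distributed with the ACL 2020 style files (`acl2020.sty` and `acl_natbib.bst`); the bibliography file `references.bib` and the citation key `example2020` are hypothetical placeholders. Please check the details against the copy of the template you actually download.

```latex
\documentclass[11pt,a4paper]{article}
\usepackage[hyperref]{acl2020}  % style file from the ACL 2020 template (assumed name)
\usepackage{times}
\usepackage{latexsym}

% The "final copy" command, left commented out as described above.
%\aclfinalcopy

\title{Your Project Title}
\author{Your Name}

\begin{document}
\maketitle

\begin{abstract}
One-paragraph summary of the project.
\end{abstract}

\section{Introduction}
% "example2020" is a hypothetical key that would live in references.bib.
Prior work can be cited in running text, as in \citet{example2020},
or parenthetically \citep{example2020}.

\bibliographystyle{acl_natbib}  % BibTeX style shipped with the template (assumed name)
\bibliography{references}       % hypothetical file: references.bib
\end{document}
```

With this layout, the usual build sequence is to run LaTeX, then BibTeX, then LaTeX twice more so that citations and references resolve.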
Proposal
The proposal must be turned in to me by the date shown on the course schedule, and must include the following components:
What task you propose to implement, what paper you wish to replicate, etc.
What data set you will use, and where you will get it
What your experimental evaluation will consist of, including stated hypotheses
What possible obstacles you foresee, and what your “Plan B” will be
The written proposal must be a complete standalone prose document (not an outline, list of bullet points, collection of sentence fragments, etc.) covering the above points, with citations as appropriate. It need not be very long (2-3 pages), but it is important that it cover these elements in sufficient detail to demonstrate that you have thought each part through. See the grading rubric for more details.
Along with the written proposal, you must also deliver a short (very strict time limit, TBD based on enrollment) in-class presentation, which should cover at a high level the points from your proposal. See the grading rubric below for an idea of what you should cover, and note that one slide per item is likely appropriate for the proposal presentation.
Check-in report
The purpose of the check-in report is to help ensure that things are going according to plan, and to keep you from encountering too many surprises at the end of the term.
The report should detail what you have finished thus far, and explain what is left to do. There is no minimum length, but the report should contain at least:
A draft of the “Introduction” and “Background” sections of your final paper, including your literature review.
At least some preliminary result for your quantitative evaluation, e.g., the output of a very simple baseline model.
At least some preliminary result for your qualitative evaluation, e.g., a preliminary error analysis of some of your baseline model’s behavior.
If needed, a description of any departure from your original proposal that you may have taken.
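To give a sense of scale for the two preliminary-result items above, here is a minimal sketch, assuming a text classification project. It uses a majority-class baseline for the quantitative number and a list of misclassified examples as the starting point for an error analysis. The function names and the toy data are made up for illustration; your own task, data, and metric will likely differ.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(gold, predicted):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def errors(texts, gold, predicted):
    """Collect (text, gold, predicted) triples for misclassified examples."""
    return [(t, g, p) for t, g, p in zip(texts, gold, predicted) if g != p]


# Toy, made-up data standing in for a real dataset.
train_labels = ["pos", "neg", "pos", "pos"]
test_texts = ["great film", "dull plot", "loved it"]
test_gold = ["pos", "neg", "pos"]

# Quantitative: the baseline predicts the majority label for every test item.
label = majority_baseline(train_labels)
predicted = [label] * len(test_gold)
print(f"baseline accuracy = {accuracy(test_gold, predicted):.2f}")

# Qualitative: inspect what the baseline gets wrong.
for text, gold, pred in errors(test_texts, test_gold, predicted):
    print(f"gold={gold} predicted={pred} text={text!r}")
```

Even a baseline this simple gives you a number to beat and a concrete set of errors to discuss in the report.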
Final written report
The final written report should follow the standard structure of a scientific article (background, methods, etc.), and should be in the ballpark of 8–10 pages, not counting references. You are expected to include at least some review of scientific literature relevant to your topic; the project proposal and check-in report will serve as a good start on this part of the paper.
All papers must include an “ethical considerations” section in which the author should discuss the broader ethical and social implications of their work (or, as appropriate, the implications of the subject area on which their project focuses). See the Ethics FAQ from the NAACL 2021 Call for Papers for a good description of what I’ve got in mind. If you are unsure about how to approach this section regarding your particular project, please ask.
End-of-term presentation
The in-class presentation will be delivered at the end of the term, and will show off your final result.
It should be about 10 minutes long, and should cover all parts of your project with an emphasis on what you did (i.e., the background portion should be relatively short).
The exact format of your presentation is up to you: you may wish to give a traditional research presentation, give a live demonstration of your project, etc.
Interpretive dance, theatrical renditions, and musical presentations are encouraged, as appropriate.
Grading Rubrics
In the interest of clarity and transparency, here is the grading rubric I will be using to evaluate your projects. Note that point totals will be weighted according to the breakdown given above.
Also, remember that these rubrics are not meant to serve as the outline for your paper; please do not use them as such. You will need to organize your paper according to a standard scientific paper in the field of NLP, as appropriate for your particular project.
Project Proposal:
Written portion:
Present (i.e., turned in): 1pt
Using appropriate ACL template: 1pt
Consists of actual prose (as opposed to bullet points) and is well-structured: 2pts
Topic is relevant and appropriate: 1pt
Includes detailed description of your proposed topic and task, including some discussion of relevant technical background: 5pts
Description of Dataset:
Includes detailed description of dataset to be used, including relevant citations, description of dataset size and characteristics, and justification for its use for this project: 3pts
Includes description of how student will access dataset (i.e., whether a data use agreement will be necessary, relevant LDC catalogue number, etc.): 3pts
Description of evaluation plan:
Quantitative:
What will be measured (i.e., what aspect of system behavior)? 3pts
Why will this be measured? 3pts
How will this be measured (i.e., what metric(s) will you use to capture this?) 2pts
What are your hypotheses, and how will you test them? 3pts
Qualitative:
What will be examined (i.e., what aspect of system behavior)? 3pts
Why will this be examined? 3pts
How will this be examined? 2pts
Description of possible obstacles/barriers: 3pts
Description of alternatives: 2pts
Presentation:
Occurs: 2pts
Within time limit: 2pts
Describes task/paper/etc.: 1pt
Description includes appropriate citation: 1pt
Describes dataset: 1pt
Describes evaluation plan: 1pt
Describes barriers and alternatives: 1pt
Total: 49 pts.
Check-in Report
Present (i.e., turned in): 1pt
Using appropriate LaTeX template: 1pt
Draft of “Introduction”:
Clearly states and motivates the project’s aims: 3pts
Gives overview of structure of the paper, including high-level description of what experiments will be conducted: 3pts
Draft of “Background”:
Includes discussion of relevant related work: 5pts
Includes discussion of theoretical background for methods used in project (e.g. explanation of model architecture, etc.): 5pts
Preliminary results:
Quantitative:
Present: 3pts
Qualitative:
Present: 3pts
Discussion of plan for remainder of the project, including (if needed) description of variance from original proposal: 5pts
Total: 29 pts.
Final Project
Written portion:
Present: 1pt
Format:
Using appropriate template: 2pts
Using BibTeX for citations: 1pt
Follows appropriate structure for a scientific paper: 2pts
Content:
Introduction:
Clearly states and motivates the project’s aims: 3pts
Gives overview of structure of the paper, including high-level description of what experiments will be conducted: 3pts
Background:
Includes discussion of relevant related work: 5pts
Includes discussion of theoretical background for methods used in project (e.g. explanation of model architecture, etc.): 5pts
Methods:
Provides sufficient detail about datasets used (source, contents, languages, special considerations, etc.): 3pts
Provides clear and detailed description of methodologies used (training methods, hyperparameters chosen, libraries used, etc.): 2pts
Methodologies are appropriate for the task at hand: 3pts
Results:
Quantitative results reported clearly and properly: 5pts
Qualitative results reported clearly and properly: 5pts
Discussion & Conclusions: 10pts
Ethical Considerations: 3pts
Presentation:
Occurs: 2pts
Content:
Level of background detail appropriate: 3pts
Clear description of task/model/goal/etc. (can include demonstration, etc.): 5pts
Clear description of dataset and languages used: 2pts
Clear description of quantitative evaluation
method: 2pts
results: 2pts
Clear description of qualitative evaluation
method: 2pts
results: 2pts
Interpretation of results: 3pts
Discussion of ethical/social considerations and impact: 2pts
Organization & Delivery:
Fits within time limit: 3pts
Logical organization of talk (beginning, middle, end): 3pts