CS 5/662, Winter 2021

# Final Project

## Overview

In addition to homework assignments, there will be a final project on the subject of your choice. In the past, students have reproduced a paper, developed a dataset, implemented an algorithm or approach, etc. The only requirements vis-a-vis the topic of the project are:

• That the project have a linguistic component to it,
• That it involve the techniques and approaches we cover in this class,
• That it involve an evaluation component, involving both quantitative and qualitative evaluations.

A couple of notes:

• I encourage you to discuss possible ideas for your project with me early in the term- this is an excellent use of office hours!
• Most often, the evaluation component of projects takes the form of an experimental evaluation, but depending on your project this may not be appropriate. If you are developing a dataset, for instance, your evaluation component will look rather different from how it would look if you were replicating a machine translation paper. Don’t be afraid to get creative with your evaluation, and please swing by office hours if you would like help brainstorming!
• Please note that many current topics in NLP necessitate substantial computational resources, but many others do not- your choice of project should take into account what resources you have available, and should be feasible in that regard. I am happy to provide advice on this issue!
• There exist many wonderful “out of the box” tools and models (e.g. large pre-trained language models, tools such as fairseq for machine translation, etc.); your project can certainly use such things, but must go beyond simply “downloading and running sample code”. If you are unsure about whether your idea might fall into this category, please ask!

The deliverables and grading breakdown for the project are as follows:

For each of the written deliverables, you are required to use the ACL 2020 LaTeX style template (don’t forget to comment out the “final copy” command!), and to use BibTeX for your bibliography. If you are not familiar with LaTeX, this is an excellent time to learn, but do make sure give yourself a little extra time when writing to allow for some trial-and-error.

If you are an Overleaf user, the ACL 2020 style is available as a project template.

## Project proposal

The proposal must be turned in to me by the date shown on the course schedule, and must include the following components:

• What task you propose to try and implement, paper you wish to replicate, etc.
• What data set you will use, and where you will get it
• What your experimental evaluation will consist of, including stated hypotheses
• What possible obstacles you foresee, and what your “Plan B” will be

The written proposal must be a complete standalone prose document (not an outline, list of bullet points, collection of sentence fragments, etc.) covering the above points, with citations as appropriate. It need not be very long (2-3 pages), but it is important that it cover these elements in sufficient detail as to demonstrate that you have thought each part through. See the grading rubric for more details.

Along with the written proposal, you must also deliver a short (very strict time limit, TBD based on enrollment) in-class presentation, which should cover at a high level the points from your proposal. See the grading rubric below for an idea of what you should cover, and note that one slide per item is likely appropriate for the proposal presentation.

## Check-in report

The purpose of the check-in report is to help ensure that things are going according to plan, and to keep you from encountering too many surprises at the end of the term. The report should detail what you have finished thus far, and explain what is left to do. There is no minimum length, but it should contain (at a minimum):

• A draft of the “Introduction” and “Background” sections of your final paper, including your literature review.
• At least some preliminary result for your quantitative evaluation- e.g. the output of a very simple baseline model.
• At least some preliminary result for your qualitative evaluation- e.g. a preliminary error analysis of some of your baseline model’s behavior.
• If needed, a description of any departure from your original proposal that you may have taken.

## Final written report

The final written report should follow the standard structure of a scientific article (background, methods, etc.), and should be in the ballpark of 8–10 pages, not counting references. You are expected to include at least some review of scientific literature relevant to your topic; the project proposal and check-in report will serve as a good start on this part of the paper.

All papers must include an “ethical considerations” section in which the author should discuss the broader ethical and social implications of their work (or, as appropriate, the implications of the subject area on which their project focuses). See the Ethics FAQ from the NAACL 2021 Call for Papers for a good description of what I’ve got in mind. If you are unsure about how to approach this section regarding your particular project, please ask.

## End-of-term presentation

The in-class presentation will be delivered at the end of the term, and will show off your final result. It should be about 10 minutes long, and should cover all parts of your project with an emphasis on what you did (i.e., the background portion should be relatively shorter). The exact format of your presentation is up to you- you may wish to give a traditional research presentation, give a live demonstration of your project, etc. Interpretive dance, theatrical renditions, and musical presentations are encouraged, as appropriate.

In the interest of clarity and transparency, here is the grading rubric I will be using to evaluate your projects. Note that point totals will be weighted according to the breakdown given above.

Also, remember that these rubrics are not meant to serve as the outline for your paper; please do not use them as such. You will need to organize your paper according to a standard scientific paper in the field of NLP, as appropriate for your particular project.

### Project Proposal:

• Written portion:
• Present (i.e., turned in): 1pt
• Using appropriate ACL template: 1pt
• Consists of actual prose (as opposed to bullet points) and is well-structured: 2pts
• Topic is relevant and appropriate: 1pt
• Includes detailed description of your proposed topic and task, including some discussion of relevant technical background: 5pts
• Description of Dataset:
• Includes detailed description of dataset to be used, including relevant citations, description of dataset size and characteristics, and justification for its use for this project: 3pts
• Includes description of how student will access dataset (i.e., whether a data use agreement will be necessary, relevant LDC catalogue number, etc.): 3pt;
• Description of evaluation plan:
• Quantitative:
• What will be measured (i.e., what aspect of system behavior)? 3pts
• Why will this be measured? 3pts
• How will this be measured (i.e., what metric(s) will you use to capture this?) 2pts
• What are your hypotheses, and how will you test them? 3pts
• Qualitative:
• What will be examined (i.e., what aspect of system behavior)? 3pts
• Why will this be examined? 3pts
• How will this be examined? 2pts
• Description of possible obstacles/barriers: 3pts
• Description of alternatives: 2pts
• Presentation:
• Occurs: 2pts
• Within time limit: 2pts
• Description includes appropriate citation: 1pt
• Describes dataset: 1pt
• Describes evaluation plan: 1pt
• Describes barriers and alternatives: 1pt

Total: 49 pts.

### Check-in Report

• Present (i.e., turned in): 1pt
• Using appropriate LaTeX template: 1pt
• Draft of “Introduction”:
• Clearly states and motivates the project’s aims: 3pt
• Gives overview of structure of the paper, including high-level description of what experiments will be conducted: 3pt
• Draft of “Background”:
• Includes discussion of relevant related work: 5pt
• Includes discussion of theoretical background for methods used in project (e.g. explanation of model architecture, etc.): 5pt
• Preliminary results:
• Quantitative:
• Present: 3pt
• Qualitative:
• Present: 3pt
• Discussion of plan for remainder of the project, including (if needed) description of variance from original proposal: 5pt

Total: 29

### Final Project

• Written portion:
• Present: 1pt
• Format:
• Using appropriate template: 2pt
• Using BibTeX for citations: 1pt
• Follows appropriate structure for a scientific paper: 2pts
• Content:
• Introduction:
• Clearly states and motivates the project’s aims: 3pt
• Gives overview of structure of the paper, including high-level description of what experiments will be conducted: 3pt
• Background:
• Includes discussion of relevant related work: 5pt
• Includes discussion of theoretical background for methods used in project (e.g. explanation of model architecture, etc.): 5pt
• Methods:
• Provides sufficient detail about datasets used (source, contents, languages, special considerations, etc.): 3pts
• Provides clear and detailed description of methodologies used (training methods, hyperparameters chosen, libraries used, etc.): 2pts
• Methodologies are appropriate for the task at hand: 3pts
• Results:
• Quantitative results reported clearly and properly: 5pts
• Qualitative results reported clearly and properly: 5pts
• Discussion & Conclusions: 10pts
• Ethical Considerations: 3pts
• Presentation:
• Occurs: 2pts
• Content:
• Level of background detail appropriate: 3pts
• Clear description of task/model/goal/etc. (can include demonstration, etc.): 5pts
• Clear description of dataset and languages used: 2pts
• Clear description of quantitative evaluation
• method: 2pts
• results: 2pts
• Clear description of qualitative evaluation
• method: 2pts
• results: 2pts
• Interpretation of results: 3pts
• Discussion of ethical/social considerations and impact: 2pts
• Organization & Delivery:
• Fits within time limit: 3pts
• Logical organization of talk (beginning, middle, end): 3pts
• Transitions between major points are clear: 2pts