TREC Genomics Track - Roadmap

William Hersh, Track Chair
Oregon Health & Science University
hersh@ohsu.edu

This document provides a roadmap for the TREC Genomics Track from 2004-2008.  It was developed by the track steering committee in January-February, 2003. We recognize that new data resources, information tools, and even new understandings of the genome may lead us to change our specific path along the way, although we will not deviate from our goal of improving IR systems in the genomics domain.

General Context

Before describing the details of the roadmap, we will describe the experimental context.  We have developed a model of the different facets of experiments that includes four categories, as shown in Table 1.

Table 1 - Facets of experiments for this project.

Data:
  Citation databases (e.g., bibliographic databases)
  Full-text literature (e.g., journal articles)
  Summary resources (e.g., textbooks, review articles)
  Nontextual data (e.g., sequence or structure data)
  Genome databases (e.g., mouse, yeast)
  Gene/protein function annotations (e.g., Gene Ontology, LocusLink, and GeneRIF)

Tasks:
  Exhaustive retrieval
  Question-answering
  Finding summary information
  Categorizing output (e.g., into subsets such as diagnosis, pharmacology, etc.)
  Annotation/curation
  Integration of information using all of these data sources and results

Users:
  Scientists
  Clinicians
  Non-scientists

Experiments:
  Batch
  Interactive

Our general aim is to add a new facet to the track each year.  The current year (Year 0) is taking place in 2003.  The remainder of this section provides a roadmap for how the project will operate in Years 1-5, as summarized in Table 2.  For each year, we present a narrative description of that year's track activities, followed by a table that provides specific details about that year's experiments.

Table 2 - Five-year overview of the project

Year 1 - Expand data: add new information resources, including full-text articles, summarizing textbooks, and other databases.
Year 2 - Expand tasks: add more complex user tasks than just finding information on genes.
Year 3 - Expand experiments: add real users who integrate various information needs.
Year 4 - Expand users: address different types of users, including non-scientists.
Year 5 - Update and refine the test collection; create a resource that provides education on IR evaluation.

Year 1

The infrastructure task of the first year will be to expand beyond the MEDLINE data of Year 0.  The range of additional resources we could include is enormous, e.g., the 100+ databases in the catalog of Baxevanis (Nucleic Acids Research, 2003).  Based on our guiding principles noted above, we will aim to expand the data to include additional publicly available resources.

Of the new resources, probably the most challenging to obtain will be full-text journal articles.  As noted above, we will be assisted by HighWire Press.  We will also utilize journal articles freely available in public repositories such as PubMed Central and BioMed Central.  Gaining access to other data will be easier due to its public availability.  MEDLINE is available publicly in an annual release (http://www.nlm.nih.gov/databases/leased.html).  NCBI resources such as OMIM, GenBank, and LocusLink are available by FTP (ftp://ftp.ncbi.nih.gov/).  We will also use other books available from the NCBI Bookshelf (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books).  Other resources we will incorporate into the test collection include some of the model organism databases, such as the Mouse Genome Informatics system (http://www.informatics.jax.org/), which provides linkage and annotation across an even wider spectrum of genes and their functions.

In Year 1, we will also go beyond purely textual resources to provide access to databases and other more structured materials.  We will make use of LocusLink, which provides a switchboard of information about genes.  Unlike Year 0, where LocusLink is used to obtain proxies for relevance judgments, in subsequent years it will be usable as a resource from which to harvest knowledge (e.g., annotations, literature linkages, etc.) to improve searching.  We will also incorporate sequence databases, such as GenBank and SWISS-PROT.  We will not require systems to use the sequence data proper, but will instead utilize these resources in support of our focus on improving text retrieval.  The two databases contain, for example, links to journal articles as well as descriptive data about their gene and protein sequences that some research groups may attempt to leverage to improve their systems.

Content:
  MEDLINE (larger subset than in Year 0)
  Full-text articles (via HighWire Press publishers)
  Books - OMIM and NCBI Bookshelf
  LocusLink
  GenBank
  SWISS-PROT
  Mouse Genome Database

Task(s):
  Find all documents about the function of a gene (50 queries)

Metric(s):
  Mean average precision (see the sketch after this table)

Judgments:
  Binary relevance

Experiment:
  Batch

Statistics:
  Analysis of variance with post-hoc Scheffé test to compare groups

Resources:
  Obtain content and standardize format
  Relevance judgments

Extensions:
  Annotate function of genes
  Perform interactive experiments
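To make the primary Year 1 metric concrete, the following is a minimal sketch of how mean average precision could be computed from a ranked run and binary relevance judgments, assuming the usual TREC-style qrels and run file layouts; the file names and function names are illustrative only and are not part of any track-supplied software.

```python
from collections import defaultdict

def read_qrels(path):
    """Read TREC-style qrels lines of the form: topic 0 docid judgment."""
    relevant = defaultdict(set)
    with open(path) as f:
        for line in f:
            topic, _, docid, judgment = line.split()
            if int(judgment) > 0:                      # binary relevance
                relevant[topic].add(docid)
    return relevant

def read_run(path):
    """Read TREC-style run lines of the form: topic Q0 docid rank score tag."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            topic, _, docid, rank, _score, _tag = line.split()
            run[topic].append((int(rank), docid))
    for topic in run:
        run[topic].sort()                              # order each topic's results by rank
    return run

def mean_average_precision(run, relevant):
    """Average precision per topic, averaged over all judged topics."""
    ap_values = []
    for topic, rel_docs in relevant.items():
        hits, precision_sum = 0, 0.0
        for position, (_, docid) in enumerate(run.get(topic, []), start=1):
            if docid in rel_docs:
                hits += 1
                precision_sum += hits / position       # precision at this relevant document
        ap_values.append(precision_sum / len(rel_docs) if rel_docs else 0.0)
    return sum(ap_values) / len(ap_values) if ap_values else 0.0

# Hypothetical file names for illustration only.
print(f"MAP = {mean_average_precision(read_run('myrun.txt'), read_qrels('genomics.qrels')):.4f}")
```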

Year 2

In the second year, we will add more complex query tasks than just finding information about the function of genes.  The same content from Year 1 will be used.  The queries, however, will be different.  We will adopt question answering, using an approach modeled on the TREC Question-Answering Track.  As in the 2002 version of that track, we will include not only simple short-answer questions, but also instance questions (i.e., those with multiple answers) and discourse questions (i.e., a series of questions in which the answer depends on the context of a prior question).  For example, an instance question might be to find all K-channel genes that lead to neurologic abnormalities in different organisms.  A discourse question might then build on the answer to that question and ask for all reported polymorphisms of these genes.  As the TREC Question-Answering Track has found that relevance assessments can be done more quickly for these types of questions, we will develop 100 or more questions for this task.

Systems will be required not only to identify the relevant document but also to extract the snippet of text that provides the answer to the question.  The metric for evaluating systems will be the same as that used in the TREC Question-Answering Track: mean reciprocal rank.
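For clarity, mean reciprocal rank scores each question as 1 divided by the rank of the first correct answer returned (0 if no correct answer is returned) and averages over all questions.  The following minimal sketch illustrates the computation; the question identifiers and gene names in the toy example are hypothetical illustrations, not actual track data.

```python
def mean_reciprocal_rank(answers, gold):
    """answers: question id -> ranked list of candidate answer strings.
    gold: question id -> set of strings accepted as correct."""
    rr_values = []
    for qid, accepted in gold.items():
        rr = 0.0
        for rank, candidate in enumerate(answers.get(qid, []), start=1):
            if candidate in accepted:
                rr = 1.0 / rank            # only the first correct answer counts
                break
        rr_values.append(rr)
    return sum(rr_values) / len(rr_values) if rr_values else 0.0

# Toy example: the first question is answered correctly at rank 2 and the
# second at rank 1, so MRR = (0.5 + 1.0) / 2 = 0.75.
answers = {"Q1": ["KCNQ2", "KCNQ1"], "Q2": ["SCN1A"]}
gold = {"Q1": {"KCNQ1"}, "Q2": {"SCN1A"}}
print(mean_reciprocal_rank(answers, gold))
```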

Content:
  Same as Year 1

Task(s):
  Answer specific questions about a gene (100 queries)

Metric(s):
  Mean reciprocal rank of document plus text containing answer

Judgments:
  Location in document of answer

Experiment:
  Batch

Statistics:
  Analysis of variance with post-hoc Scheffé test to compare groups (see the sketch after this table)

Resources:
  Answer judgments

Extensions:
  Perform interactive experiments
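The statistical analysis listed for Years 1 and 2, analysis of variance with a post-hoc Scheffé test, compares the per-topic scores of the participating systems.  A minimal sketch follows: SciPy supplies the overall one-way ANOVA, and the Scheffé criterion for pairwise comparisons is written out from its standard definition.  The system names and scores are randomly generated placeholders, and the use of a one-way design here is an illustrative assumption rather than a prescription of the exact experimental analysis.

```python
import numpy as np
from scipy import stats

def scheffe_pairwise(groups, alpha=0.05):
    """Post-hoc Scheffé test for all pairwise comparisons after a one-way ANOVA.
    groups: dict mapping system name -> array of per-topic scores (e.g., AP)."""
    names = list(groups)
    data = [np.asarray(groups[n], dtype=float) for n in names]
    k = len(data)                                   # number of systems
    n_total = sum(len(d) for d in data)

    # Within-group (error) mean square from the one-way ANOVA decomposition.
    ss_within = sum(((d - d.mean()) ** 2).sum() for d in data)
    df_within = n_total - k
    ms_within = ss_within / df_within

    f_crit = stats.f.ppf(1 - alpha, k - 1, df_within)
    results = []
    for i in range(k):
        for j in range(i + 1, k):
            diff = data[i].mean() - data[j].mean()
            # Scheffé criterion: the contrast F must exceed (k - 1) * F_crit.
            f_contrast = diff ** 2 / (ms_within * (1 / len(data[i]) + 1 / len(data[j])))
            results.append((names[i], names[j], diff, f_contrast > (k - 1) * f_crit))
    return results

# Hypothetical per-topic average precision scores for three systems.
groups = {"sysA": np.random.rand(50), "sysB": np.random.rand(50), "sysC": np.random.rand(50)}
print("overall ANOVA:", stats.f_oneway(*groups.values()))
for a, b, diff, significant in scheffe_pairwise(groups):
    print(f"{a} vs {b}: diff={diff:+.3f}, significant={significant}")
```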

Year 3

In the third year, we will add interactive experiments with real users who will be required to integrate various information needs.  We will allow participants to perform both quantitative and qualitative studies beyond a common baseline that each group will be required to carry out.  These experiments will be informed by the experience of the TREC Interactive Track.  As these experiments will be user-oriented, research groups with an interest in user interfaces and other aspects of human-computer interaction will find them particularly appealing.

The experimental approach will follow the model established by the TREC Interactive Track.  A common set of content, user tasks/problems, and baseline data about the experiments to be collected will be specified.  The content will consist of the same broad spectrum as used for the first two years.  The users will be presumed to be biologists or graduate students in biology seeking knowledge in a new area where they are informed but not experts.  The user tasks will focus on users solving particular problems, e.g., determining what a gene does or what abnormality a defect in it causes.  The relevance judges will determine the “correct” solutions to the tasks and “grade” them appropriately.   The minimum data about users to be collected will include demographics (e.g., age, gender), experience (both in searching and biology), and search logging (e.g., terms used, complexity of queries, time taken).
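To make the baseline data collection concrete, one possibility is to agree on a common record format for the per-user and per-search data described above; the field names in the sketch below are purely illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserProfile:
    """Minimum demographic and experience data collected for each user."""
    user_id: str
    age: int
    gender: str
    years_search_experience: float
    years_biology_experience: float

@dataclass
class SearchRecord:
    """One logged search by one user on one task (illustrative fields only)."""
    user_id: str
    task_id: str
    query_terms: List[str]             # terms as typed by the user
    used_boolean_operators: bool       # a crude proxy for query complexity
    seconds_elapsed: float             # time from issuing the query to viewing results
    documents_viewed: List[str] = field(default_factory=list)
```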

Employing real users, of course, limits the number of queries we can use.  Experience in the Interactive Track has shown that users take about 15 minutes to do tasks like these, and that they fatigue after a couple of hours (i.e., six to eight tasks) of work.  Fortunately, a reasonable number of users (16 to 32) has usually provided enough statistical power to begin to detect statistically significant differences where they exist.

Groups will be encouraged to extend the experiments beyond the baseline through additional analyses pertinent to their systems.  They may carry out more extensive usability testing or think-aloud protocol analysis, which some participants in the TREC Interactive Track have used in the past.  We will also encourage participants to carry out cross-site experiments.  To do this, groups will need an adequate amount of reliable bandwidth, especially if their applications have graphics-intensive user interfaces.  Fortunately, most participating groups, at least those at universities, have access to Internet2, so this should be feasible.

Content:
  Same as Year 1

Task(s):
  Ask users to solve multi-faceted problems (8 problems)

Metric(s):
  Correctness of answer

Judgments:
  Answers to problems

Experiment:
  Interactive

Statistics:
  Generalized estimating equations (see the sketch after this table)

Resources:
  Problem answer judgments

Extensions:
  More extensive usability testing, e.g., think-aloud protocol analysis, capture and analysis of user actions, etc.
  Cross-site experiments
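Because each user contributes several correlated observations (one per task), the generalized estimating equations analysis listed above accounts for within-user correlation when comparing systems.  A minimal sketch using statsmodels follows, assuming a binary correctness outcome; the column names and values are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format results: one row per (user, task) with a binary
# 'correct' outcome and the system the user searched with on that task.
df = pd.DataFrame({
    "user":    ["u01", "u01", "u02", "u02", "u03", "u03", "u04", "u04"],
    "system":  ["A",   "B",   "A",   "B",   "A",   "B",   "A",   "B"],
    "correct": [1,     0,     1,     1,     0,     1,     1,     0],
})

# GEE with an exchangeable working correlation clusters observations by user,
# so repeated tasks by the same searcher are not treated as independent.
model = smf.gee(
    "correct ~ system",
    groups="user",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())
```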

Year 4

In the fourth year, we will investigate different types of users, in particular non-scientists.  As there is a growing need for non-experts to better understand genomics research so that they can be informed citizens, we will add resources to the collection that are oriented to non-scientists.  We will then develop tasks appropriate to these users and design both batch and interactive experiments to assess them.

The diversity of non-scientist users is, of course, very large, so we will need to limit our focus to a relatively homogeneous group of users for our experiments to have validity; too heterogeneous a group would preclude drawing inferences, particularly statistical ones.  We will therefore focus on a group that is both relatively homogeneous and convenient for most university-based TREC groups likely to participate in these experiments: undergraduates whose majors are not in the life sciences.

Experiments with non-scientists will require us to modify the data sources.  We will remove the highly technical journal articles and MEDLINE records from the collection but will keep summarizing resources such as OMIM and the NCBI Bookshelf.  We will add material that is more basic in nature, such as the "Getting Started" materials from NCBI (http://www.ncbi.nlm.nih.gov/About/outreach/gettingstarted/).

Content:
  NCBI "Getting Started"
  Other non-scientist-oriented genomics materials
  Books - OMIM and NCBI Bookshelf
  Allowable use of Year 1 content behind the scenes

Task(s):
  Answer questions about gene function

Metric(s):
  Correctness of answer

Judgments:
  Answers to problems

Experiment:
  Interactive

Statistics:
  Generalized estimating equations

Resources:
  Problem answer judgments

Extensions:
  Extend beyond undergraduates to other populations of searchers

Year 5

In the final year, we will re-evaluate the collection to make sure the queries and data are appropriately up to date so that experimentation can continue after this funding ends.  In this year, we will return to scientist users, resurrecting the experiments of Years 2 and 3.

Content:
  Updated material from Year 1 or other appropriate sources

Task(s):
  Answer specific questions about the function of a gene
  Ask users to solve multi-faceted problems

Metric(s):
  Identification of text containing answer
  Correctness of answer

Judgments:
  Location in document of answer
  Answers to problems

Experiment:
  Batch
  Interactive

Statistics:
  Analysis of variance with post-hoc Scheffé test to compare groups
  Generalized estimating equations

Resources:
  Problem answer judgments

Extensions:
  Enhancing those from previous years

Concluding Remarks

As noted already, this roadmap is not a definitive plan but a general overview of where the track might head.  We recognize that new technologies, priorities, and other factors might change things.  Perhaps its most important value is to provide a reference for further discussion.

Last updated - November 12, 2003