TREC 2005 Genomics Track Data

This page lists the files that are in the distribution for the TREC Genomics Track.  More detail about these files can be found in the 2005 track protocol.  The data files themselves are located in the active user portion of the track Web site at http://ir.ohsu.edu/genomics/data/.  This area is password protected, with the password only available to those who have completed data usage agreements and/or are registered for TREC.  Please note that the data in the active user portion of the NIST TREC Web site are out of date and incomplete.

Ad Hoc Retrieval Task

Documents

Topics

Relevance judgments

Categorization Task

Documents

PMID-full text crosswalk

Sample, training, and test files

Category            Sample output file           Training data file   Test data file
A (allele)          sample.Atrain.txt (18 KB)    Atrain.txt (4 KB)    Atest.txt (4 KB)
E (expression)      sample.Etrain.txt (64 KB)    Etrain.txt (1 KB)    Etest.txt (2 KB)
G (GO annotation)   sample.Gtrain.txt (28 KB)    Gtrain.txt (5 KB)    Gtest.txt (6 KB)
T (tumor)           sample.Ttrain.txt (16 KB)    Ttrain.txt (1 KB)    Ttest.txt (1 KB)

Gene tagging of MEDLINE corpus

Evaluation program cat_eval

The required version of the program is 2.0, updated on Sept. 9, 2005.  Sample data for testing the program are listed in the table above.  The program is provided as source code and as a Windows executable (see the protocol page for documentation).

Last updated: September 30, 2005