TREC 2004 Genomics Track Data

This page lists the files in the TREC 2004 Genomics Track distribution.  More detail about these files can be found in the 2004 track protocol.  There is also a description of how the files were updated.  To use the data, read the instructions below.

Ad Hoc Retrieval Task



Relevance judgments

Categorization Task


PMID-full text crosswalk

Triage subtask

Categorization subtask

File contents                                                    Training data file name  Test data file name
Documents - PMIDs                                                ptrain.txt (5 KB)        ptest.txt (4 KB)
Genes - Gene symbol, MGI identifier, and gene name for all used  gtrain.txt (68 KB)       gtest.txt (41 KB)
Document gene pairs - PMID-gene pairs                            pgtrain.txt (24 KB)      pgtest.txt (24 KB)
Positive examples - PMIDs                                        p+train.txt (2 KB)       p+test.txt (2 KB)
Positive examples - PMID-gene pairs                              pg+train.txt (6 KB)      pg+test.txt (5 KB)
Positive examples - PMID-gene-domain tuples                      pgd+train.txt (12 KB)    pgd+test.txt (10 KB)
Positive examples - PMID-gene-domain-evidence tuples             pgde+train.txt (15 KB)   pgde+test.txt (12 KB)
Positive examples - all PMID-gene-GO-evidence tuples             all+train.txt (92 KB)    all+test.txt (48 KB)
Negative examples - PMIDs                                        p-train.txt (4 KB)       p-test.txt (3 KB)
Negative examples - PMID-gene pairs                              pg-train.txt (19 KB)     pg-test.txt (10 KB)
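
As a sanity check after downloading, it can be useful to confirm how the positive and negative PMID files relate to the full document list. The sketch below is not part of the distribution. It assumes, without confirmation from this page, that each file is plain text with one record per line and the PMID as the first whitespace-delimited field; the function names `read_pmids` and `check_partition` are hypothetical. Consult the track protocol for the authoritative file formats.

```python
from typing import Dict, Iterable, Set


def read_pmids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited field of each non-empty line.

    Assumption: the PMID is the first field on every line.
    """
    pmids: Set[str] = set()
    for line in lines:
        fields = line.split()
        if fields:
            pmids.add(fields[0])
    return pmids


def check_partition(all_ids: Set[str], pos: Set[str], neg: Set[str]) -> Dict[str, Set[str]]:
    """Report how the positive and negative sets relate to the full set."""
    labeled = pos | neg
    return {
        "overlap": pos & neg,          # PMIDs labeled both positive and negative
        "missing": all_ids - labeled,  # PMIDs in the document list with no label
        "extra": labeled - all_ids,    # labeled PMIDs absent from the document list
    }


def load(path: str) -> Set[str]:
    """Read one of the distribution files from disk (format is assumed)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return read_pmids(handle)


if __name__ == "__main__":
    # File names are taken from the table above; paths assume the files
    # have been unpacked into the current directory.
    report = check_partition(
        load("ptrain.txt"), load("p+train.txt"), load("p-train.txt")
    )
    for name, ids in report.items():
        print(f"{name}: {len(ids)} PMIDs")
```

If the positive and negative files do partition the document list, all three counts should be zero; non-zero counts may simply mean the format assumption above does not hold. The same check can be run on the test files by substituting `ptest.txt`, `p+test.txt`, and `p-test.txt`.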

Also available are the original files that were used for the results reported in the official track runs.  The derivation of these files is described on the data update page.  The original files have been renamed and include:

Evaluation program cat_eval

Accessing the files

Excluding the MEDLINE XML files (which are identical to the ASCII MEDLINE files except for their format), there are 45 files totaling 2.9 GB in size.  The data files are available on this site.

The document files are:
The other archived/compressed files that do not require signing the data usage agreement include:
The rest of the files also do not require signing the data usage agreement.  They are archived/compressed into a single file, which contains the following files:
Last update - April 6, 2015