Relevance Analysis of Test Data for the Primary Task of the TREC Genomics Track

Bill Hersh, Sarah Corley, Ravi Teja Bhupatiraju
Oregon Health & Science University


All GeneRIFs, plus up to the top 20 documents retrieved by the OHSU Boost run for the test topics, were evaluated for relevance. The evaluation was done by a physician who is also a biomedical informatics graduate student.

We assigned each GeneRIF and each retrieved document to one of four categories:
Document relevant for query
Document not relevant for query
Document relevant for a different species
Unable to determine relevance due to insufficient text in MEDLINE record
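
Crossing these four categories with whether or not a document is a GeneRIF gives the eight cells tallied in the results table. A minimal Python sketch of that tallying step follows. The `Judgment` enum, the `cross_tabulate` function, and the input format (one `(is_generif, judgment)` pair per judged document) are hypothetical choices for illustration, not the tooling actually used in this analysis.

```python
from collections import Counter
from enum import Enum


class Judgment(Enum):
    """The four judgment categories used in this analysis."""
    RELEVANT = "Document relevant for query"
    NOT_RELEVANT = "Document not relevant for query"
    OTHER_SPECIES = "Document relevant for a different species"
    INSUFFICIENT_TEXT = "Unable to determine relevance due to insufficient text"


def cross_tabulate(judged_docs):
    """Count judged documents by (is_generif, Judgment).

    judged_docs: iterable of (is_generif, judgment) pairs, one per judged
    document (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(judged_docs)
    # Emit all eight cells, including empty ones, so zero rows still appear.
    return {
        (is_generif, judgment): counts.get((is_generif, judgment), 0)
        for is_generif in (True, False)
        for judgment in Judgment
    }


if __name__ == "__main__":
    # Three toy judgments (not data from this analysis).
    table = cross_tabulate([
        (True, Judgment.RELEVANT),
        (False, Judgment.OTHER_SPECIES),
        (False, Judgment.OTHER_SPECIES),
    ])
    print(table[(False, Judgment.OTHER_SPECIES)])  # 2
```

Returning every cell, even empty ones, mirrors the results table, which lists zero-count rows such as "No Abstract" explicitly.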


Essentially all GeneRIFs were relevant. The number of GeneRIFs per topic ranged from 3 to 66, with an average of 11. Of the 566 GeneRIFs for the 50 topics, 551 (97.3%) were judged relevant. None were judged outright not relevant; 13 (2.3%) were relevant for a different species, and 2 (0.4%) could not be judged because the MEDLINE record had insufficient text (no abstract and too little information in the title).
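
The percentages above follow directly from the reported counts. The short Python check below recomputes them from those counts alone; the variable and function names are illustrative.

```python
# Counts reported above for the 566 GeneRIFs across the 50 test topics.
GENERIF_COUNTS = {
    "relevant": 551,
    "relevant for another species": 13,
    "insufficient text": 2,
}
N_TOPICS = 50


def percentages(counts):
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


total_generifs = sum(GENERIF_COUNTS.values())  # 551 + 13 + 2 = 566
mean_per_topic = total_generifs / N_TOPICS     # 566 / 50 = 11.32

print(percentages(GENERIF_COUNTS))
# {'relevant': 97.3, 'relevant for another species': 2.3, 'insufficient text': 0.4}
print(round(mean_per_topic))  # 11
```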

Among the retrieved documents there were relatively few nonrelevant ones, probably because of the broad nature of the queries. The table below breaks the retrieved documents down by the four judgment categories and by whether or not each document was a GeneRIF. Only 9.2% of the retrieved documents were not relevant, and none of those nonrelevant documents were GeneRIFs. Virtually all of the retrieved GeneRIFs were relevant, although 2 were relevant for another species. About 42% of the retrieved documents were relevant but not GeneRIFs, while about 36% were not GeneRIFs and were relevant for another species. The last two columns give the per-topic mean and standard deviation of the count in each category. Five topics that had 12 or fewer documents retrieved were excluded from these two columns; each of the other 45 topics had more than 16 documents retrieved.

Document rating                       Total        %            Mean per topic   SD per topic
                                      (50 topics)  (50 topics)  (>16 retrieved)  (>16 retrieved)
GeneRIF & Relevant                    117          12.7%        2.53             1.85
GeneRIF & Not relevant                0            0.0%         0                -
GeneRIF & Relevant Other Species      2            0.2%         0.04             0.21
GeneRIF & No Abstract                 0            0.0%         0                -
Not GeneRIF & Relevant                386          41.8%        8.24             3.90
Not GeneRIF & Not relevant            85           9.2%         1.84             4.01
Not GeneRIF & Relevant Other Species  333          36.1%        7.27             4.67
Not GeneRIF & No Abstract             0            0.0%         0                -
Total                                 923          100.0%
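
The last two columns of the table are per-topic statistics computed only over the topics that remain after the small ones are dropped. A minimal Python sketch of that computation follows. The function name, its arguments, and the choice of the sample standard deviation (`statistics.stdev`, rather than the population form `statistics.pstdev`) are assumptions; the report does not say exactly how these values were computed.

```python
import statistics


def per_topic_stats(per_topic_counts, retrieved_per_topic, exclusion_cutoff=12):
    """Return (mean, standard deviation) of one category's per-topic counts.

    per_topic_counts[i]: documents in the category (e.g. "Not GeneRIF &
    Relevant") for topic i. retrieved_per_topic[i]: documents retrieved for
    topic i. Topics with exclusion_cutoff or fewer retrieved documents are
    dropped first. Hypothetical helper; using the sample standard deviation
    is an assumption, not something the report confirms.
    """
    kept = [
        count
        for count, retrieved in zip(per_topic_counts, retrieved_per_topic)
        if retrieved > exclusion_cutoff
    ]
    if len(kept) < 2:
        raise ValueError("need at least two retained topics for a standard deviation")
    return statistics.mean(kept), statistics.stdev(kept)


if __name__ == "__main__":
    # Toy numbers for four topics (not data from this analysis); the fourth
    # topic retrieved only 8 documents, so it is excluded.
    mean, sd = per_topic_stats([3, 5, 4, 0], [20, 20, 20, 8])
    print(mean, sd)  # mean of [3, 5, 4] is 4; sample SD is 1.0
```

Note that the totals and percentage columns use all 50 topics, so only the mean and standard deviation go through the exclusion step.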

Last updated - November 12, 2003