OHSUMED Test Collection
This test collection was created to assist information retrieval
research. It is a clinically-oriented MEDLINE subset, consisting
of 348,566 references (out of a total of over 7 million), covering all
references from 270 medical journals over a five-year period
(1987-1991). The test database is about 400 megabytes in
size. A number of fields normally present in the MEDLINE record
but not pertinent to content-based information retrieval have been
deleted. The fields retained are the title, abstract,
MeSH indexing terms, author, source, and publication type. Since
this database is neither up-to-date nor complete, it is useless as a
tool for real searchers and only useful for research purposes.
The test collection was built as part of a study assessing the use of
MEDLINE by physicians in a clinical setting (1). Novice
physicians using MEDLINE generated 106 queries. Before they
searched, they were asked to provide a statement of information about
their patient as well as their information need. Each query was
later replicated by four searchers, two physicians experienced in
searching and two medical librarians. The results were assessed
for relevance by a different group of physicians, using a three point
scale: definitely, possibly, or not relevant. There were
12,565 unique query-reference pairs. Over 10% of the
query-document pairs were judged in duplicate to assess interobserver
reliability.
The original test collection was subsequently used in experiments with
the SMART retrieval system (2). As would be expected, SMART
retrieved a number of references not retrieved by the original
searchers. A second round of relevance
judgments was done after these experiments. There were 3,575 new
query-reference pairs judged, along with an overlap of over 10% to
again assess interobserver reliability.
Thus there are now a total of 16,140 query-document pairs that have
been judged for relevance. These are in a file (judged), along
with each of the relevance judgments done. There are also files
that list relevant query-document pairs (drel.i, drel.ui, pdrel.i, and
pdrel.ui). In these files, only the original relevance judgment
is used.
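As an illustration, the following Python sketch reads one line of the
judged file, using the tab-separated layout given for that file in the
file list below. It rests on assumptions not stated in this readme:
that a missing optional judgment appears either as an absent trailing
field or as an empty field between tabs, and that the only judgment
codes are d, p, and n. The names Judgment and parse_judged_line are
made up for this example.

```python
from typing import NamedTuple, Optional

VALID_CODES = {"d", "p", "n"}  # definitely, possibly, not relevant


class Judgment(NamedTuple):
    query: int
    doc_ui: str
    doc_i: int
    relevance1: Optional[str]
    relevance2: Optional[str]
    relevance3: Optional[str]


def _code(value: str) -> Optional[str]:
    """Return a judgment code, or None for an empty (missing) field."""
    value = value.strip()
    if not value:
        return None
    if value not in VALID_CODES:
        raise ValueError(f"unexpected relevance code: {value!r}")
    return value


def parse_judged_line(line: str) -> Judgment:
    """Split one line of the judged file into its fields.

    Assumption: optional judgments that are missing show up as absent
    or empty fields; the readme does not spell this out.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    fields = (fields + ["", ""])[:6]  # pad the optional columns
    return Judgment(
        query=int(fields[0]),
        doc_ui=fields[1].strip(),
        doc_i=int(fields[2]),
        relevance1=_code(fields[3]),
        relevance2=_code(fields[4]),
        relevance3=_code(fields[5]),
    )


# The identifiers on this line are invented for the example.
example = parse_judged_line("1\t87000001\t30\td\tp")
```

Pairs for which both relevance1 and relevance2 are present are the ones
that were judged in duplicate.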
(Note: There are five queries for which there are no definitely
relevant documents, and you may wish to delete these from your
experiments. They are being left in the query file for this
distribution, because further analysis may uncover relevant documents
for them. Some systems, such as SMART, automatically drop queries
with no relevant documents from their analysis. The five queries
for which no definitely relevant documents exist are 8, 28, 49, 86, and
93.)
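If you do wish to drop these queries, the following Python sketch shows
one way. It loads query-document pairs in the two-column tab-separated
layout used by the drel and pdrel files and removes the five queries
listed above. The function names are made up for this example, and the
usage note assumes that the queries are numbered 1 through 106, which
this readme does not state outright.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# The five queries listed above as having no definitely relevant documents.
NO_DEFINITELY_RELEVANT = {8, 28, 49, 86, 93}


def load_pairs(lines: Iterable[str]) -> Dict[int, Set[str]]:
    """Map each query number to the set of document identifiers listed for it."""
    relevant: Dict[int, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        query, doc = line.split("\t")[:2]
        relevant[int(query)].add(doc.strip())
    return dict(relevant)


def evaluation_queries(all_queries: Iterable[int],
                       relevant: Dict[int, Set[str]]) -> List[int]:
    """Keep queries that are not on the list above and have at least one pair."""
    return [
        q for q in all_queries
        if q not in NO_DEFINITELY_RELEVANT and relevant.get(q)
    ]
```

With the real files, something like load_pairs(open("drel.i")) followed
by evaluation_queries(range(1, 107), pairs) would give the queries to
evaluate. The explicit exclusion matters when the pairs come from
pdrel.i or pdrel.ui, where these queries may still have possibly
relevant documents.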
The National Library of Medicine has agreed to make the MEDLINE
references in the test database available for experimentation,
subject to the following conditions:
1. The data will not be used in any non-experimental clinical,
library, or other setting.
2. Any human users of the data will explicitly be told that the
data is incomplete and out-of-date.
There are 13 files that make up the test collection, and each is
described below. (For those of you receiving compressed files,
you will obtain only seven files. Each of the files 1-5 below is
compressed by itself, and has the suffix .tar.Z. All of the files
6-12 are compressed into one file, which is called
ohsumed.rest.tar.Z. The final file is this file, readme, which is
not compressed.)
Here are the files, their uncompressed size, and a description of their
content:
1) ohsumed.87 (60,303,307) -- Contains the MEDLINE documents for
the year 1987. The format for each of the MEDLINE document files
follows the conventions of the SMART system, with each field defined as
below (NLM designator in parentheses):
.I sequential identifier
.U MEDLINE identifier (UI)
.M Human-assigned MeSH terms (MH)
.T Title (TI)
.P Publication type (PT)
.W Abstract (AB)
.A Author (AU)
.S Source (SO)
(Note: Some references have their abstracts truncated at 250
words, while some have no abstracts at all.)
2) ohsumed.88 (78,585,929) -- Contains the MEDLINE documents for
the year 1988, formatted as above.
3) ohsumed.89 (84,719,077) -- Contains the MEDLINE documents for
the year 1989, formatted as above.
4) ohsumed.90 (86,754,890) -- Contains the MEDLINE documents for
the year 1990, formatted as above.
5) ohsumed.91 (89,761,122) -- Contains the MEDLINE documents for
the year 1991, formatted as above.
6) queries (11,591) -- Contains the 106 queries in the test set, with
patient and topic information, in the format:
.I Sequential identifier
.B Patient information
.W Information request
7) drel.ui (26,919) -- Contains the query-document pairs rated as
definitely relevant, with documents listed by MEDLINE UI, in the format:
<query><tab><document-ui>
8) drel.i (21,709) -- Contains the query-document pairs rated as
definitely relevant, with documents listed by sequential number (from
the .I field), in the format:
<query><tab><document-i>
9) pdrel.ui (57,831) -- Contains the query-doc pairs rated as
definitely or possibly relevant, with documents listed by MEDLINE
UI, in the format:
<query><tab><document-ui>
10) pdrel.i (46,664) -- Contains the query-doc pairs rated as
definitely or possibly relevant, with documents listed by sequential
number (from the .I field), in the format:
<query><tab><document-i>
11) judged (368,366) -- Contains a list of all documents retrieved
by any of the five original searchers or SMART, sorted first
by query number and then document number, along with their relevance
judgments. The relevance judgments are either d (definitely
relevant), p (possibly relevant), or n (not relevant). The
relevance1 judgment is the original relevance judgment done on the
documents retrieved by the original searchers. The relevance2
judgment is the second relevance judgment done to assess interobserver
reliability of the relevance1 judgments. The relevance3 judgment
is the relevance judgment done on documents retrieved by SMART but not
the original searchers, or another relevance judgment on an originally
retrieved document to assess interobserver reliability.
<query><tab><document-ui><tab><document-i><tab>
<relevance1>[<tab><relevance2>][<tab><relevance3>]
12) ui (3,137,094) -- Contains the MEDLINE UIs for all 348,566
documents in the test database, listed one per line.
13) readme -- This file.
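The document files (1-5) and the queries file (6) share the same
dot-letter field layout, so one reader can serve for both. The Python
sketch below is a minimal example, not a reference implementation. It
assumes that a field marker is a line beginning with a period, one
capital letter, and then either end of line or a space; that a field's
value may follow the marker on the same line or on the lines after it;
and that no line of field text happens to begin that way. The name
parse_records is made up for this example.

```python
from typing import Dict, Iterable, Iterator


def _is_marker(line: str) -> bool:
    """True for lines such as '.I 12' or '.W' (assumed marker shape)."""
    return (
        len(line) >= 2
        and line[0] == "."
        and line[1].isupper()
        and (len(line) == 2 or line[2] == " ")
    )


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, keyed by field letter (I, U, M, T, ...)."""
    record = None
    field = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if _is_marker(line):
            tag, rest = line[1], line[2:].strip()
            if tag == "I":  # .I starts a new record
                if record is not None:
                    yield record
                record = {}
            if record is None:
                continue  # ignore anything before the first .I
            field = tag
            record[field] = rest
        elif record is not None and field is not None:
            text = line.strip()
            if text:  # continuation lines are joined with single spaces
                record[field] = (record[field] + " " + text).strip()
    if record is not None:
        yield record


# Invented sample data, for illustration only.
sample = [
    ".I 1", ".U", "87000001", ".T", "An example title",
    ".W", "First line of abstract", "second line.",
    ".I 2", ".U", "87000002",
]
records = list(parse_records(sample))
```

Because the files are large, the function is written as a generator, so
a file object can be passed in directly and read one line at a time.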
We realize that, due to the relative recall procedures used in building
this collection, as well as the subjective nature of relevance
judgments, there may be disagreements about the relevance
judgments. I do want to be able to update the collection, but I
want to do it in a systematic fashion, so that results among researchers
will be comparable. Therefore I am asking that results be
reported based on this collection unchanged. If you find new
documents that you feel are relevant, or if you find documents for
which you disagree with the relevance judgment, please notify me by
email or in writing. Periodically, we will update the relevance
judgments and release updated versions.
This work was made possible with support from NLM Grant LM05307.
All opinions expressed and relevance judgments made, however, are the
responsibility of William Hersh.
For more details, contact:
William Hersh, M.D.
Assistant Professor of Medicine and Medical Informatics
Oregon Health Sciences University
BICC
3181 SW Sam Jackson Park Rd.
Portland, OR 97201
Voice: 503-494-4563
Fax: 503-494-4551
Email: hersh@ohsu.edu
Bibliography:
1. Hersh WR, Hickam DH, Use of a multi-application computer
workstation in a clinical setting, Bulletin of the Medical Library
Association, 1994, 82: 382-389.
2. Hersh WR, Buckley C, Leone TJ, Hickam DH, OHSUMED: An
interactive retrieval evaluation and new large test collection for
research, Proceedings of the 17th Annual ACM SIGIR Conference, 1994,
192-201.