Here is the data used in our research on automated classification of document citations for systematic review of drug classes. If you use any of this data in your published research, we would appreciate your crediting our work by citing this article:
Cohen AM, Hersh WR, Peterson K, Yen PY. Reducing Workload in Systematic Review Preparation Using Automated Citation Classification. JAMIA 2006: (in press). [pre-print pdf]
Gold standard data file of drug review topics, EndNote ID, PubMed ID, Abstract Triage Status, and Article Triage Status [file].
This data file is in tab-separated value (.tsv) format.
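A minimal sketch of reading such a file in Python is shown here. It assumes the five columns appear in the order listed above (drug review topic, EndNote ID, PubMed ID, Abstract Triage Status, Article Triage Status) and that there is no header row; check both against the actual file. The `TriageRecord` and `read_gold_standard` names and the sample rows are hypothetical, introduced only for illustration.

```python
import csv
import io
from typing import NamedTuple


class TriageRecord(NamedTuple):
    """One row of the gold standard file (column order is an assumption)."""
    topic: str
    endnote_id: str
    pubmed_id: str
    abstract_status: str
    article_status: str


def read_gold_standard(handle):
    """Yield one TriageRecord per row of a tab-separated stream."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or short lines
        yield TriageRecord(*(field.strip() for field in row[:5]))


# Made-up rows for illustration only; they are not taken from the real file.
sample = "ADHD\t12\t10000001\tI\tE\nStatins\t7\t10000002\t3\t3\n"
records = list(read_gold_standard(io.StringIO(sample)))
print(records[0].topic, records[0].abstract_status)
```

To read the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="")`, where `path` is wherever you saved the download. If the file turns out to have a header row, skip the first record.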
Systematic review decisions for abstracts and articles are included for these fifteen drug review topics:
ACEInhibitors | CalciumChannelBlockers | ProtonPumpInhibitors |
ADHD | Estrogens | SkeletalMuscleRelaxants |
Antihistamines | NSAIDS | Statins |
AtypicalAntipsychotics | Opioids | Triptans |
BetaBlockers | OralHypoglycemics | UrinaryIncontinence |
Abstract or Article Code | Meaning |
I | Included at abstract or article level |
E | Non-specifically excluded |
1 | Excluded due to foreign language |
2 | Excluded due to wrong outcome |
3 | Excluded due to wrong drug |
4 | Excluded due to wrong population |
5 | Excluded due to wrong publication type |
6 | Excluded due to wrong study design |
7 | Excluded due to wrong study duration |
8 | Excluded due to background article |
9 | Excluded due to only abstract being available |
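The code table above can be expressed as a simple lookup, sketched here in Python. The code strings and meanings are copied from the table; the `TRIAGE_CODES`, `describe`, and `is_included` names are hypothetical, and treating every code other than `I` as an exclusion is our reading of the table rather than something the data file itself states.

```python
# Triage codes and their meanings, as listed in the table above.
TRIAGE_CODES = {
    "I": "Included at abstract or article level",
    "E": "Non-specifically excluded",
    "1": "Excluded due to foreign language",
    "2": "Excluded due to wrong outcome",
    "3": "Excluded due to wrong drug",
    "4": "Excluded due to wrong population",
    "5": "Excluded due to wrong publication type",
    "6": "Excluded due to wrong study design",
    "7": "Excluded due to wrong study duration",
    "8": "Excluded due to background article",
    "9": "Excluded due to only abstract being available",
}


def describe(code):
    """Return the meaning of a triage code, or a marker for unknown codes."""
    return TRIAGE_CODES.get(code.strip(), "Unknown code")


def is_included(code):
    """True only for the inclusion code 'I' (assumption: all others exclude)."""
    return code.strip() == "I"


print(describe("3"))
print(is_included("I"), is_included("E"))
```

A binary label for classifier training can then be derived with `is_included` on either the abstract-level or the article-level status, depending on which decision is being modeled.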
MEDLINE records corresponding to the PubMed IDs in the gold standard data file.
As described in the paper, the MEDLINE records corresponding to the PubMed IDs in the gold standard data file were taken from the TREC Genomics Track 10-year MEDLINE corpus.
The 10-year corpus is available from the NIST TREC web site on the Genomics Track data page. Click on the 2004 Document Set link. If you have not already done so, you will have to fill out an Application for Use form.
The home page for the TREC Genomics Track may also be of interest.
The corresponding MEDLINE records are also available for download from PubMed using either the browser interface or the NLM E-Utilities, although these records may differ from the ones we used in our research.
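For the E-Utilities route, the following Python sketch builds an EFetch request URL for a batch of PubMed IDs. It only constructs the URL and makes no network call. The `build_efetch_url` helper and the sample IDs are hypothetical; the endpoint and the `db`, `id`, `rettype`, and `retmode` parameters are those of the public EFetch service as we understand it, so check the current NLM documentation (including its usage and rate-limit guidelines) before relying on them.

```python
from urllib.parse import urlencode

# Base URL of the NCBI EFetch service (verify against current NLM docs).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pubmed_ids):
    """Build an EFetch URL requesting MEDLINE-format text for the given IDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pubmed_ids),
        "rettype": "medline",
        "retmode": "text",
    }
    # Keep commas unescaped so the ID list stays readable in the URL.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


# Made-up PubMed IDs for illustration only.
url = build_efetch_url(["10000001", "10000002"])
print(url)
```

The resulting URL can be fetched with any HTTP client. For long ID lists, splitting the request into smaller batches is a reasonable precaution.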
More details are available in the paper referenced above.