Information Retrieval: A Health and Biomedical Perspective, Third Edition

William Hersh, M.D.

Chapter 5 Update

This update contains all new references cited in the author's OHSU BMI 514/614 course for Chapter 5.

One constant about retrieval interfaces is that their look, feel, URLs, etc. are constantly changing. Most of the screens shown in the (2009) textbook look and behave differently as of this update. The focus in this chapter is on retrieval principles, so that the reader will hopefully be able to figure out new interfaces as they emerge.
Although noted in the Errata, it is important to note here as well that the NOT operator is diagrammed incorrectly in Figure 5.1; the figure is corrected on the Errata page.
A couple of times since publication of the third edition of the book, NLM has completely revamped the user interface of PubMed. Part of the motivation has been to make retrieval simpler, i.e., more Google-like. One major change is that there is no longer a separate screen for applying limits. Instead, these limits, which are now called Filters in PubMed parlance, are listed and actionable through links on the left-hand side of the search results screen. Some of them can be customized, and additional filters are also available.

The advanced search capability previously available under the Preview/Index and History tabs is now available by clicking on Advanced in the default screen right under the text entry box. This functionality allows "power" searchers to build sets and combine them with Boolean operators.
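Combining search sets with Boolean operators corresponds directly to set operations. The following minimal sketch uses made-up record identifiers (the set names and numbers are hypothetical, not real PMIDs) to show what AND, OR, and NOT do to two previously built sets:

```python
# Hypothetical result sets from two earlier searches (identifiers are made up).
set_1 = {101, 102, 103, 104}   # e.g., records retrieved for heart failure
set_2 = {103, 104, 105}        # e.g., records retrieved for ACE inhibitors

both = set_1 & set_2      # AND: records present in both sets
either = set_1 | set_2    # OR: records present in at least one set
only_1 = set_1 - set_2    # NOT: records in set 1 that are not in set 2

print(sorted(both), sorted(either), sorted(only_1))
```

AND narrows the result, OR broadens it, and NOT removes from the first set anything that also appears in the second.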

Also available under the text box are links to create a Rich Site Summary (RSS) feed as well as an Alert, the latter of which requires having an NCBI login.

It also looks like the behind-the-scenes mapping to MeSH has changed somewhat. The query colon cancer treatment with radiation shown on page 212 of the book now maps into the following search details:
("colonic neoplasms"[MeSH Terms] OR ("colonic"[All Fields] AND "neoplasms"[All Fields]) OR "colonic neoplasms"[All Fields] OR ("colon"[All Fields] AND "cancer"[All Fields]) OR "colon cancer"[All Fields]) AND ("therapy"[Subheading] OR "therapy"[All Fields] OR "treatment"[All Fields] OR "therapeutics"[MeSH Terms] OR "therapeutics"[All Fields]) AND ("radiation"[MeSH Terms] OR "radiation"[All Fields] OR "electromagnetic radiation"[MeSH Terms] OR ("electromagnetic"[All Fields] AND "radiation"[All Fields]) OR "electromagnetic radiation"[All Fields])

Another example of search mapping can be seen for the search, congestive heart failure ace inhibitors:
("heart failure"[MeSH Terms] OR ("heart"[All Fields] AND "failure"[All Fields]) OR "heart failure"[All Fields] OR ("congestive"[All Fields] AND "heart"[All Fields] AND "failure"[All Fields]) OR "congestive heart failure"[All Fields]) AND ("angiotensin-converting enzyme inhibitors"[Pharmacological Action] OR "angiotensin-converting enzyme inhibitors"[MeSH Terms] OR ("angiotensin-converting"[All Fields] AND "enzyme"[All Fields] AND "inhibitors"[All Fields]) OR "angiotensin-converting enzyme inhibitors"[All Fields] OR ("ace"[All Fields] AND "inhibitors"[All Fields]) OR "ace inhibitors"[All Fields])
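The pattern visible in both search details above can be imitated with a few lines of code. This is only a sketch of the visible pattern, not NLM's actual mapping algorithm: the function names are hypothetical, and the MeSH heading for each user phrase is supplied by hand rather than looked up in a translation table.

```python
def expand_phrase(phrase: str) -> str:
    """Expand a phrase into its ANDed single words OR the quoted phrase, all in All Fields."""
    words = phrase.split()
    quoted = f'"{phrase}"[All Fields]'
    if len(words) == 1:
        return quoted
    anded = " AND ".join(f'"{w}"[All Fields]' for w in words)
    return f"({anded}) OR {quoted}"


def map_concept(user_phrase: str, mesh_heading: str) -> str:
    """Build one parenthesized OR group for a user phrase and its (hand-supplied) MeSH heading."""
    parts = [f'"{mesh_heading}"[MeSH Terms]', expand_phrase(mesh_heading)]
    # Add the user's own wording only when it differs from the MeSH heading.
    if user_phrase.lower() != mesh_heading.lower():
        parts.append(expand_phrase(user_phrase))
    return "(" + " OR ".join(parts) + ")"


print(map_concept("colon cancer", "colonic neoplasms"))
print(map_concept("congestive heart failure", "heart failure"))
```

Under these assumptions, the two calls reproduce the first parenthesized group of each search shown above. The real mapping does more than this, such as adding Subheading and Pharmacological Action matches.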

Although PubMed retains a predominantly exact-match approach to searching, a couple of partial-match searching approaches have worked their way into it.

One of these stems from the ability to change the Sort Order of the search results. The historical default for MEDLINE searching has been reverse chronological order of publication, based on the premise that users most likely want to view the latest literature. But we may prefer a different sort order, such as one using a relevance ranking approach that might put the most relevant results at the top of the list. PubMed now allows sorting of output by best match, based on an algorithm that uses both probabilistic (BM25) and machine learning approaches. The details of the algorithm are described in the PubMed Help documentation. Other options for sorting include author name, journal name, and article title.
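To give a feel for the probabilistic component, here is a minimal sketch of the textbook Okapi BM25 formula applied to three toy documents. It is not PubMed's implementation, which also applies machine learning; the parameter values (k1 = 1.2, b = 0.75), the tokenized documents, and the function name are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization penalizes documents longer than average.
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy "documents" (already tokenized); real records would be titles and abstracts.
docs = [
    ["colon", "cancer", "radiation", "therapy"],
    ["heart", "failure", "ace", "inhibitors"],
    ["colon", "cancer", "screening", "colon", "polyps"],
]
query = ["colon", "cancer", "radiation"]

scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The document containing all three query terms ranks first, the one with two of them second, and the one with none last. A rarer term such as radiation contributes more to the score than a term found in most documents.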

A second approach, which has been around for quite some time, is to retrieve articles similar to a specific article, a form of relevance feedback. The algorithm used has been described by Lin and Wilbur (2007), although a recent analysis suggested some enhancements that improved results with data from the TREC 2005 Genomics and TREC 2014 Clinical Decision Support Tracks (Wei et al., 2016).
Lin, J and Wilbur, WJ (2007). PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics. 8: 423.
Wei, W, Marmor, R, et al. (2016). Finding related publications: extending the set of terms used to assess article similarity. AMIA Summits on Translational Science Proceedings. 225–234.
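The general idea of retrieving articles similar to a seed article can be sketched with a much simpler technique than the one PubMed uses. The example below ranks candidates by bag-of-words cosine similarity, which is a stand-in and not the probabilistic topic-based model of Lin and Wilbur; the function names, record labels, and texts are hypothetical.

```python
import math
from collections import Counter


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_articles(seed_text: str, candidates: dict) -> list:
    """Return candidate labels ordered from most to least similar to the seed text."""
    return sorted(candidates, key=lambda k: cosine(seed_text, candidates[k]), reverse=True)


# Hypothetical records: label -> title text.
candidates = {
    "A": "radiation therapy for colon cancer",
    "B": "ace inhibitors in heart failure",
}
order = similar_articles("colon cancer radiation", candidates)
print(order)
```

The seed article itself plays the role of the query, which is why this kind of retrieval is considered a form of relevance feedback.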

Although we tend to think of PubMed as the one and perhaps only way to access biomedical literature, there are indeed many other tools, including those that search other sources of literature. Lu (2011) reviewed searching the biomedical literature "beyond" PubMed, describing tools that rank search results, cluster them into topics, enhance them with semantics and visualization, and otherwise improve searching. The review also explored tools for use cases beyond searching. (It is likely that some of these tools are no longer operational.)
Lu, Z (2011). PubMed and beyond: a survey of web tools for searching biomedical literature. Database. 2011

NLM offers a variety of mobile apps on both the iOS and Android platforms as well as on a mobile-friendly Web site. A listing of all such apps is available at:

One of the main apps available is PubMed for Handhelds. In addition to a mobile-friendly Web site, there are apps for iOS and Android. All of these feature the following search interfaces:
  • PICO Search - oriented to patient, intervention, comparison, and outcome clinical study searching
  • askMEDLINE - natural language searching interface
  • Journal Browser - searching by journal name
  • Consensus Abstracts - searching via askMEDLINE and Clinical Queries over a subset of important recent articles, with an opportunity to display the conclusion (called "the bottom line" or TBL) from the abstract (Fontelo, 2011)
  • Clinical Queries - applying the search approaches of clinical queries
  • Archive - ability to store and later view records retrieved from the above approaches
Both PICO and askMEDLINE have a feature that allows specification of MEDLINE Publication Type.
Fontelo, P (2011). Consensus abstracts for evidence-based medicine. Evidence-Based Medicine. 16(2): 36-38.

NLM has also made substantial revisions to E-utilities, which are for applications that want to access PubMed directly (and not via the PubMed interface). E-utilities use a fixed URL syntax for different functions. For example, if a program needs to "get the PMIDs for articles about breast cancer published in Science in 2008," the following URL will be passed to E-utilities, with appropriate XML returned:[journal]+AND+breast+cancer+AND+2008[pdat]
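As a rough illustration of how a program might assemble such a request and read the reply, the sketch below builds an ESearch URL and extracts PMIDs from a returned XML document. The base URL, the `db`, `term`, and `retmax` parameters, and the search term `science[journal]` are supplied here as assumptions, since they do not appear in the text above. The function names are hypothetical and the sample XML contains made-up identifiers. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed base URL of the ESearch utility (not given in the text above).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Assemble an ESearch URL; urlencode escapes spaces as '+' and brackets as %5B/%5D."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(xml_text: str) -> list:
    """Pull the PMIDs out of the IdList element of an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return [id_elem.text for id_elem in root.findall("./IdList/Id")]


url = build_esearch_url("science[journal] AND breast cancer AND 2008[pdat]")
print(url)

# Illustrative reply with made-up identifiers, shaped like an ESearch result.
sample_xml = (
    "<eSearchResult><Count>2</Count>"
    "<IdList><Id>11111111</Id><Id>22222222</Id></IdList>"
    "</eSearchResult>"
)
print(parse_pmids(sample_xml))
```

A real application would fetch the URL, respect NCBI's usage guidelines, and handle errors and result paging, all of which are omitted here.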

Another area of PubMed that has been expanded in recent years is what is now called Special and Clinical Queries (also called Topic-Specific Queries). Special Queries are grouped into four categories:
  • Clinicians and Health Services Researchers Queries
    • e.g., Clinical Queries, electronic health records, comparative effectiveness research, health services research, Healthy People 2020
  • Subjects
    • e.g., AIDS, bioethics, cancer, veterinary science, etc.
  • Additional Search Queries / Interfaces
    • e.g., Complementary and Alternative Medicine (CAM) on PubMed
  • Journal Collections
    • e.g., core clinical journals
The main PubMed search page features links to a Clinical Queries interface and to the larger Topic-Specific Queries.

A continued area of retrieval-related research is the development of search filters, focused on both clinicians and systematic reviewers seeking high-quality research studies. A review published in 2009 identified 38 studies of search filters, finding that in general, most aimed for recall over precision, i.e., they tended to retrieve most relevant articles at the expense of also retrieving many nonrelevant articles (McKibbon et al., 2009). A more recently developed filter was found to maintain high recall while improving precision, resulting in fewer bibliographic records needing to be assessed in the production of systematic reviews (Lee et al., 2012). A Web site devoted to evidence-based search filters is at:

Lokker et al. (2011) compared the PubMed Clinical Queries function with straight PubMed searching for retrieval of high-quality studies, finding that more were retrieved using the Clinical Queries interface, although more relevant articles overall were retrieved only for diagnosis and not treatment articles. Hoogendam et al. (2009) noted, however, that the PubMed Clinical Queries were not well-suited for clinical questions about therapy for which no randomized controlled trials existed. In that situation, focusing searching on the Abridged Index Medicus collection with search filters led to the best retrieval results. Focusing on the domain of nephrology, Shariff et al. (2012) found that PubMed search filters led to a higher ratio of relevant to nonrelevant articles (1.16 without the filters and 1.5 with them). Garg et al. (2009) had previously demonstrated that a variety of filters improved recall and precision of searching for nephrology articles in PubMed.
McKibbon, K., Wilczynski, N., et al. (2009). Retrieving randomized controlled trials from MEDLINE: a comparison of 38 published search filters. Health Information and Libraries Journal, 26: 187-202.
Lee, E., Dobbins, M., et al. (2012). An optimal search filter for retrieving systematic reviews and meta-analyses. BMC Medical Informatics & Decision Making, 12(1): 51.
Lokker, C., Haynes, R., et al. (2011). Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters. Journal of the American Medical Informatics Association, 18: 652-659.
Hoogendam, A., deVriesRobbé, P., et al. (2009). Evaluation of PubMed filters used for evidence-based searching: validation using relative recall. Journal of the Medical Library Association, 97: 186-193.
Shariff, S., Sontrop, J., et al. (2012). Impact of PubMed search filters on the retrieval of evidence by physicians. Canadian Medical Association Journal, 184: E184-E190.
Garg, A., Iansavichus, A., et al. (2009). Filtering Medline for a clinical discipline: diagnostic test assessment framework. British Medical Journal, 339: b3435.
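The recall and precision trade-off that runs through these filter studies can be made concrete with a little arithmetic. In the sketch below, the counts for the two filters are hypothetical. The last function converts a ratio of relevant to nonrelevant retrieved articles into precision, applied to the 1.16 and 1.5 figures reported above on the assumption that they are plain relevant:nonrelevant ratios.

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Fraction of retrieved articles that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Fraction of all relevant articles that were retrieved."""
    return relevant_retrieved / total_relevant


def precision_from_ratio(ratio: float) -> float:
    """Convert a relevant:nonrelevant ratio among retrieved articles into precision."""
    return ratio / (1 + ratio)


# Hypothetical counts: 100 relevant articles exist in the database.
broad = (recall(95, 100), precision(95, 1000))   # recall-oriented filter
narrow = (recall(80, 100), precision(80, 200))   # precision-oriented filter
print(broad, narrow)

# Ratios quoted above, read as relevant:nonrelevant (an assumption).
without_filters = precision_from_ratio(1.16)
with_filters = precision_from_ratio(1.5)
print(round(without_filters, 3), round(with_filters, 3))
```

With these made-up counts, the broad filter finds nearly everything but buries it among nonrelevant records, while the narrow filter misses more but leaves far fewer records to screen.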

Enhancements have also been made to other systems that search bibliographic databases. A description of the Google Scholar searching algorithm has been provided by Beel and Gipp (2009). (As with all Google search systems, however, there may be other functionality in Google Scholar that is not known to the larger world.)
Beel, J and Gipp, B (2009). Google Scholar's ranking algorithm: an introductory overview. Proceedings of the 12th International Conference on Scientometrics and Informetrics. 230-241.

All of the major textbook searching products continue to change and enhance their search interfaces. In UpToDate, for example, the search interface starts to display a list of possible matches after characters have been entered. The interface also allows the user to designate "facets" for searching that include All topics, Adult, Pediatric, Patient, and Graphics. The system also recognizes common synonyms, abbreviations, and acronyms.

A number of new search systems for images have been developed. The Yale Image Finder (Xu et al., 2008) provides searching over tables and figures from biomedical literature displayed in PubMed records. UpToDate has added an image searching feature to its search functionality. One interesting approach to image retrieval is retrievr, which allows users to sketch an image that is then applied in a similarity search against images from the Flickr photo-sharing site. An overview of efficient algorithms for similarity searching was provided by Grauman (2010). NLM has also developed a database and search interface of open access biomedical images (Demner-Fushman et al., 2012).
Xu, S., McCusker, J., et al. (2008). Yale Image Finder (YIF): a new search engine for retrieving biomedical images. Bioinformatics, 24: 1968-1970.
Grauman, K. (2010). Efficiently searching for similar images. Communications of the ACM, 53(6): 84-94.
Demner-Fushman, D, Antani, S, et al. (2012). Design and development of a multimodal biomedical information retrieval system. Journal of Computing Science and Engineering. 6: 168-177.

A number of systems allow citation searching in addition to Web of Science. The one with the most functionality for managing citation counts and searches is Scopus, from Elsevier.

Genomic aggregations come and go. Two mentioned in the book, SOURCE and GoGene, are now defunct. This is no doubt due in part to the ever-growing databases of NCBI.

MedlinePlus has also substantially revamped its search interface. A search box finds words in text across the site. There are a number of features used automatically:
  • Refine by certain collection types
  • Spell-checking against medical dictionary
  • Boolean operators (default is AND)
  • Phrase search (with quotes)
  • Automated search expansion with synonyms

A series of retrieval systems have been developed to enable searchers to take advantage of the Gene Ontology (GO) for gene-related searching. GoGene associates genes from different model organisms with both GO concepts and Medical Subject Headings (MeSH) terms (Plake et al., 2009). GoPubMed acts as an intermediary for searching PubMed for GO concepts (Doms and Schroeder, 2005). GoWeb extends this functionality to World Wide Web searching (Dietze and Schroeder, 2009).
Plake, C., Royer, L., et al. (2009). GoGene: gene annotation in the fast lane. Nucleic Acids Research, 37: W300-W304.
Doms, A. and Schroeder, M. (2005). GoPubMed: exploring PubMed with the Gene Ontology. Nucleic Acids Research, 33: W783-W786.
Dietze, H. and Schroeder, M. (2009). GoWeb: a semantic search engine for the life science web. BMC Bioinformatics, 10(Suppl 10): S7.

Last updated April 9, 2017