Information Retrieval: A Health and Biomedical Perspective, Third Edition. William Hersh, M.D. |
Section |
Topic |
Reference |
|
---|---|---|---|
5 |
One constant about retrieval interfaces is their constantly changing look, feel, URLs, etc. Most of the screens shown in the (2009) textbook look and behave differently as of this update. The focus in this chapter is on retrieval principles, so that the reader will hopefully be able to figure out new interfaces as they emerge. | ||
5.2 |
Although noted in the Errata, it is important to note here as well that the NOT operator is diagrammed incorrectly in Figure 5.1; the corrected figure appears on the Errata page. | ||
5.3.1 |
A couple of times since publication
of the third edition of the book, NLM has completely
revamped the user interface of PubMed (http://pubmed.gov). Part of the motivation has been
to make retrieval simpler, i.e., more Google-like. One
major change is that there is no longer a separate screen
for applying limits. Instead, these limits, which are
now called Filters in PubMed parlance, are listed
and actionable through links on the left-hand side of the
search results screen. Some of them can be customized, and
additional filters are also available. The advanced search capability previously available under the Preview/Index and History tabs is now available by clicking on Advanced in the default screen, right under the text entry box. This functionality allows "power" searchers to build sets and combine them with Boolean operators. Also available under the text box are links to create a Rich Site Summary (RSS) feed as well as an Alert, the latter of which requires an NCBI login. |
||
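The set-building that the Advanced screen supports can be pictured with ordinary set operations. The following is a minimal sketch, not PubMed's implementation: the PMIDs and the function names are made up for illustration, and each set stands for the records retrieved by one search statement.

```python
# Hypothetical result sets: each holds the PMIDs retrieved by one search statement.
# The identifiers are invented for illustration only.
heart_failure = {101, 102, 103, 104}
ace_inhibitors = {103, 104, 105}


def boolean_and(a, b):
    """Records present in both sets (narrows the search)."""
    return a & b


def boolean_or(a, b):
    """Records present in either set (broadens the search)."""
    return a | b


def boolean_not(a, b):
    """Records in the first set that are absent from the second."""
    return a - b


if __name__ == "__main__":
    print(sorted(boolean_and(heart_failure, ace_inhibitors)))
    print(sorted(boolean_or(heart_failure, ace_inhibitors)))
    print(sorted(boolean_not(heart_failure, ace_inhibitors)))
```

Note that NOT is asymmetric: swapping the two sets changes the result, which is why the order of search statements matters when combining them.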
5.3.1 |
It also appears that the behind-the-scenes
mapping to MeSH has changed somewhat. The query "colon
cancer treatment with radiation" shown on page 212 of
the book now maps into the following search details: ("colonic neoplasms"[MeSH Terms] OR ("colonic"[All Fields] AND "neoplasms"[All Fields]) OR "colonic neoplasms"[All Fields] OR ("colon"[All Fields] AND "cancer"[All Fields]) OR "colon cancer"[All Fields]) AND ("therapy"[Subheading] OR "therapy"[All Fields] OR "treatment"[All Fields] OR "therapeutics"[MeSH Terms] OR "therapeutics"[All Fields]) AND ("radiation"[MeSH Terms] OR "radiation"[All Fields] OR "electromagnetic radiation"[MeSH Terms] OR ("electromagnetic"[All Fields] AND "radiation"[All Fields]) OR "electromagnetic radiation"[All Fields]). Another example of search mapping can be seen for the search "congestive heart failure ace inhibitors": ("heart failure"[MeSH Terms] OR ("heart"[All Fields] AND "failure"[All Fields]) OR "heart failure"[All Fields] OR ("congestive"[All Fields] AND "heart"[All Fields] AND "failure"[All Fields]) OR "congestive heart failure"[All Fields]) AND ("angiotensin-converting enzyme inhibitors"[Pharmacological Action] OR "angiotensin-converting enzyme inhibitors"[MeSH Terms] OR ("angiotensin-converting"[All Fields] AND "enzyme"[All Fields] AND "inhibitors"[All Fields]) OR "angiotensin-converting enzyme inhibitors"[All Fields] OR ("ace"[All Fields] AND "inhibitors"[All Fields]) OR "ace inhibitors"[All Fields]) |
||
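The shape of these search details follows a regular pattern for a single concept: the MeSH heading, the heading's words ANDed together, the heading as a phrase, and then the same word and phrase variants for what the user typed. The sketch below reproduces only that pattern. It is not PubMed's mapping algorithm; the function names are hypothetical, and the MeSH heading is passed in by hand because the real lookup against translation tables is not modeled.

```python
def all_fields_variants(phrase):
    """Return the [All Fields] clauses for a phrase: its words ANDed, then the phrase itself."""
    words = phrase.split()
    variants = []
    if len(words) > 1:
        anded = " AND ".join(f'"{word}"[All Fields]' for word in words)
        variants.append(f"({anded})")
    variants.append(f'"{phrase}"[All Fields]')
    return variants


def expand_concept(user_phrase, mesh_heading):
    """Build the OR group for one concept, given the MeSH heading it maps to (assumed known)."""
    clauses = [f'"{mesh_heading}"[MeSH Terms]']
    clauses += all_fields_variants(mesh_heading)
    if user_phrase != mesh_heading:
        clauses += all_fields_variants(user_phrase)
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(expand_concept("colon cancer", "colonic neoplasms"))
```

For the pair "colon cancer" and "colonic neoplasms", this produces the first parenthesized group of the search details quoted above. Groups that also involve subheadings or pharmacological actions would need additional clauses that this sketch does not attempt.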
5.3.1 |
Although PubMed retains a predominantly
exact-match approach to searching, a couple of
partial-match searching approaches have worked their way
into it. One of these stems from the ability to change the Sort Order of the search results. The historical default for MEDLINE searching has been reverse chronological order of publication date, based on the premise that users most likely want to view the latest literature. But we may prefer a different sort order, such as one using a relevance ranking approach that puts the most relevant results at the top of the list. PubMed now allows sorting of output by best match, based on an algorithm that uses both probabilistic (BM25) and machine learning approaches. The details of the algorithm are described in the PubMed Help documentation (https://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.Algorithm_for_finding_best_ma). Other options for sorting include author name, journal name, and article title. A second approach, which has been around for quite some time, is to retrieve articles similar to a specific article, a form of relevance feedback. The algorithm used has been described by Lin and Wilbur (2007), although a recent analysis suggested some enhancements that improved results with data from the TREC 2005 Genomics and TREC 2014 Clinical Decision Support Tracks (Wei et al., 2016). |
Lin, J and Wilbur, WJ (2007). PubMed related
articles: a probabilistic topic-based model for content
similarity. BMC Bioinformatics. 8: 423. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-423. Wei, W, Marmor, R, et al. (2016). Finding related publications: extending the set of terms used to assess article similarity. AMIA Summits on Translational Science Proceedings. 225–234. |
|
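To make the probabilistic part of best-match sorting concrete, here is a minimal sketch of textbook Okapi BM25 scoring. It is not PubMed's implementation: the whitespace tokenizer, the parameter values k1 = 1.2 and b = 0.75, the non-negative IDF variant, and the three toy documents are all assumptions made for illustration, and the machine learning re-ranking stage is not modeled at all.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with the standard BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            f = term_freq[term]
            if f == 0:
                continue
            # Rare terms get a higher weight; the "+ 1" keeps the weight non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalized for document length via b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "heart failure treatment with ace inhibitors",
        "colon cancer radiation therapy",
        "ace inhibitors in hypertension",
    ]
    for doc, score in zip(docs, bm25_scores("ace inhibitors heart failure", docs)):
        print(f"{score:.3f}  {doc}")
```

Unlike an exact-match Boolean search, every document receives a score, so a record matching only some of the query terms can still appear in the output, ranked below records that match more of them.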
5.3.1 |
Although we tend to think of PubMed as the
one and perhaps only way to access biomedical literature,
there are indeed many other tools, including those that
search other sources of literature. Lu (2011) reviewed
searching the biomedical literature "beyond" PubMed,
describing tools that rank search results, cluster them into
topics, enhance them with semantics and visualization, and
otherwise improve searching. The review also explored tools that
can be used for use cases beyond that of searching. (It is
likely that some of these tools are no longer operational.) |
Lu, Z (2011). PubMed and beyond: a survey of
web tools for searching biomedical literature. Database.
2011 http://database.oxfordjournals.org/content/2011/baq036.full. |
|
5.3.1 |
NLM offers a variety of mobile apps on both
the iOS and Android platforms as well as on a
mobile-friendly Web site. A listing of all such apps is
available at: https://www.nlm.nih.gov/mobile/. One of the main apps available is PubMed for Handhelds (https://pubmedhh.nlm.nih.gov/). In addition to a mobile-friendly Web site, there are apps for iOS and Android. All of these feature the following search interfaces:
|
Fontelo, P (2011). Consensus abstracts for
evidence-based medicine. Evidence-Based Medicine.
16(2): 36-38. |
|
5.3.1 |
NLM has also made substantial revisions to E-utilities,
which are for applications that want to access PubMed
directly (and not via the PubMed interface). E-utilities use
a fixed URL syntax for different functions. For example, if
a program needs to "get the PMIDs for articles about breast
cancer published in Science in 2008," the following URL will
be passed to E-utilities, with appropriate XML returned: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat] |
||
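The URL above can also be assembled and its response parsed programmatically. The following is a minimal sketch rather than an official client: the function names are hypothetical, the base URL uses https on the assumption that NCBI now expects it, and the sample XML is a hand-written fragment with made-up PMIDs that mimics the IdList portion of an ESearch response. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Same host and path as the URL in the text; the https scheme is an assumption.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed"):
    """Return an ESearch URL for the given query, with the term URL-encoded."""
    return ESEARCH_BASE + "?" + urlencode({"db": db, "term": term})


def parse_pmids(xml_text):
    """Extract the PMIDs listed under IdList in an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_element.text for id_element in root.findall("./IdList/Id")]


if __name__ == "__main__":
    print(build_esearch_url("science[journal] AND breast cancer AND 2008[pdat]"))

    # Hand-written sample with invented identifiers, for illustration only.
    sample_response = (
        "<eSearchResult><Count>2</Count>"
        "<IdList><Id>11111111</Id><Id>22222222</Id></IdList>"
        "</eSearchResult>"
    )
    print(parse_pmids(sample_response))
```

Encoding the term with `urlencode` percent-encodes the square brackets, which is equivalent to the literal brackets in the URL shown in the text. An application would fetch the URL with an HTTP client and pass the response body to `parse_pmids`.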
5.3.1 |
Another area of PubMed that has been expanded
in recent years is what is now called Special
and Clinical Queries (also called Topic-Specific
Queries). Special Queries are grouped into four categories:
|
||
5.3.1 |
A continued area of retrieval-related
research is the development of search filters, focused on
both clinicians and systematic reviewers seeking
high-quality research studies. A review published in 2009 identified
38 studies of search filters, finding that in general, most
aimed for recall over precision, i.e., they tended to
retrieve most relevant articles at the expense of also
retrieving many nonrelevant articles (McKibbon et al.,
2009). A more recently developed filter was found to
maintain high recall while improving precision, resulting in
fewer bibliographic records needing to be assessed in the
production of systematic reviews (Lee et al., 2012). A Web
site devoted to evidence-based search filters is at: http://www.york.ac.uk/inst/crd/intertasc/. Lokker et al. (2011) compared the PubMed Clinical Queries function with straight PubMed searching for retrieval of high-quality studies, finding that more were retrieved using the Clinical Queries interface, although more relevant articles overall were retrieved only for diagnosis and not treatment articles. Hoogendam et al. (2009) noted, however, that the PubMed Clinical Queries were not well-suited for clinical questions about therapy for which no randomized controlled trials existed. In that situation, focusing searching on the Abridged Index Medicus collection with search filters led to the best retrieval results. Focusing on the domain of nephrology, Shariff et al. (2012) found that PubMed search filters led to a higher ratio of relevant to nonrelevant articles (1.16 without the filters and 1.5 with them). Garg et al. (2009) had previously demonstrated that a variety of filters improved recall and precision of searching for nephrology articles in PubMed. |
McKibbon, K., Wilczynski, N., et al. (2009).
Retrieving randomized controlled trials from MEDLINE: a
comparison of 38 published search filters. Health Information and
Libraries Journal, 26: 187-202. Lee, E., Dobbins, M., et al. (2012). An optimal search filter for retrieving systematic reviews and meta-analyses. BMC Medical Research Methodology, 12: 51. http://www.biomedcentral.com/1471-2288/12/51. Lokker, C., Haynes, R., et al. (2011). Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters. Journal of the American Medical Informatics Association, 18: 652-659. Hoogendam, A., de Vries Robbé, P., et al. (2009). Evaluation of PubMed filters used for evidence-based searching: validation using relative recall. Journal of the Medical Library Association, 97: 186-193. Shariff, S., Sontrop, J., et al. (2012). Impact of PubMed search filters on the retrieval of evidence by physicians. Canadian Medical Association Journal, 184: E184-E190. Garg, A., Iansavichus, A., et al. (2009). Filtering Medline for a clinical discipline: diagnostic test assessment framework. British Medical Journal, 339: b3435. http://www.bmj.com/content/339/bmj.b3435. |
|
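The recall and precision trade-off discussed for search filters comes down to simple arithmetic. The sketch below uses hypothetical function names and assumes that the reported ratio of relevant to nonrelevant articles refers to the retrieved set, so that a ratio r corresponds to a precision of r / (1 + r). That reading is an interpretation for illustration, not a figure reported by the cited study.

```python
def precision(relevant_retrieved, total_retrieved):
    """Proportion of retrieved articles that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Proportion of all relevant articles that were retrieved."""
    return relevant_retrieved / total_relevant


def precision_from_ratio(relevant_to_nonrelevant):
    """Convert a relevant:nonrelevant ratio in the retrieved set into precision."""
    return relevant_to_nonrelevant / (1 + relevant_to_nonrelevant)


if __name__ == "__main__":
    # Ratios quoted in the text: 1.16 without filters, 1.5 with filters.
    for label, ratio in [("without filters", 1.16), ("with filters", 1.5)]:
        print(f"{label}: precision = {precision_from_ratio(ratio):.3f}")
```

Under that assumption, moving from a ratio of 1.16 to 1.5 corresponds to precision rising from roughly 54% to 60% of retrieved articles being relevant.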
Other enhancements have been made to other systems that search bibliographic databases. A description of the Google Scholar searching algorithm has been provided by Beel and Gipp (2009). (Although as with all Google search systems, there may be other functionality in Google Scholar that is not known to the larger world.) | Beel, J and Gipp, B (2009). Google Scholar's ranking algorithm: an introductory overview. Proceedings of the 12th International Conference on Scientometrics and Informetrics. 230-241. | ||
5.3.2 |
All of the major textbook searching products continue to change and enhance their search interfaces. In UpToDate, for example, the search interface starts to display a list of possible matches after characters have been entered. The interface also allows the user to designate "facets" for searching that include All topics, Adult, Pediatric, Patient, and Graphics. The system also recognizes common synonyms, abbreviations, and acronyms. | ||
5.3.3 |
A number of new search systems for images have been developed. The Yale Image Finder (Xu et al., 2008) provides searching over tables and figures from biomedical literature displayed in PubMed records. UpToDate has added an image searching feature to its search functionality. One interesting approach to image retrieval is retrievr, which allows users to sketch an image that is then applied in a similarity search against images from the Flickr photo-sharing site. An overview of efficient algorithms for similarity searching was provided by Grauman (2010). NLM has also developed a database and search interface of open access biomedical images (https://openi.nlm.nih.gov/index.php) (Demner-Fushman et al., 2012). | Xu, S., McCusker, J., et al. (2008). Yale
Image Finder (YIF): a new search engine for retrieving
biomedical images. Bioinformatics,
24: 1968-1970. Grauman, K. (2010). Efficiently searching for similar images. Communications of the ACM, 53(6): 84-94. Demner-Fushman, D, Antani, S, et al. (2012). Design and development of a multimodal biomedical information retrieval system. Journal of Computing Science and Engineering. 6: 168-177. |
|
5.3.3 |
A number of systems allow citation searching
in addition to Web of Science. The one with the most
functionality for managing citation counts and searches is
Scopus, from Elsevier. |
||
5.3.4 |
Genomic aggregations come and go. Two
mentioned in the book, SOURCE and GoGene, are now defunct.
This is no doubt due in part to the ever-growing databases of
NCBI. |
||
5.3.4 |
MedlinePlus has also substantially revamped
its search interface. A search box finds words in text
across the site. There are a number of features used
automatically:
|
||
5.3.4 |
A series of retrieval systems has been developed to enable searchers to take advantage of the Gene Ontology (GO) for gene-related searching. GoGene associates genes from different model organisms with both GO concepts and Medical Subject Headings (MeSH) terms (Plake et al., 2009). GoPubMed acts as an intermediary for searching PubMed for GO concepts (Doms and Schroeder, 2005). GoWeb extends this functionality to World Wide Web searching (Dietze and Schroeder, 2009). | Plake, C., Royer, L., et al. (2009). GoGene:
gene annotation in the fast lane. Nucleic Acids Research, 37: W300-W304. Doms, A. and Schroeder, M. (2005). GoPubMed: exploring PubMed with the Gene Ontology. Nucleic Acids Research, 33: W783-W786. Dietze, H. and Schroeder, M. (2009). GoWeb: a semantic search engine for the life science web. BMC Bioinformatics, 10(Suppl 10): S7. http://www.biomedcentral.com/1471-2105/10/S10/S7. |