Information Retrieval: A Health and Biomedical Perspective, Third Edition

William Hersh, M.D.

Chapter 6 Update

This update contains all new references cited in the author's OHSU BMI 514/614 course for Chapter 6.

A variety of paper archives of historical medical information-related documents held at the NLM are listed online. The entire history of the AMIA Annual Symposium Proceedings is available in PubMed Central, although the symposium has gone by different names over the years, so a search over all of them must incorporate four different journal title abbreviations:
  • AMIA Annu Symp Proc: 2003 to 2016
  • Proc AMIA Symp: 1998 to 2002
  • Proc AMIA Annu Fall Symp: 1996 to 1997
  • Proc Annu Symp Comput Appl Med Care: 1977 to 1995
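The four title abbreviations above can be OR-combined into a single PubMed query using the journal title abbreviation field tag [ta]. The Python sketch below builds such a query and a corresponding NCBI E-utilities esearch URL. The helper names and the optional topic term are illustrative assumptions, and the sketch only builds the URL without sending it. It is also worth checking that PubMed indexes each title exactly as listed before relying on the results.

```python
from urllib.parse import urlencode

# Journal title abbreviations the AMIA symposium has used (from the list above)
AMIA_TITLES = [
    "AMIA Annu Symp Proc",                  # 2003 onward
    "Proc AMIA Symp",                       # 1998-2002
    "Proc AMIA Annu Fall Symp",             # 1996-1997
    "Proc Annu Symp Comput Appl Med Care",  # 1977-1995
]

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def amia_query(topic: str = "") -> str:
    """OR together all four titles with the [ta] tag; optionally AND a topic."""
    titles = " OR ".join(f'"{t}"[ta]' for t in AMIA_TITLES)
    return f"({titles}) AND ({topic})" if topic else f"({titles})"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) an E-utilities esearch request against PubMed."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


if __name__ == "__main__":
    print(amia_query("infobutton"))
    print(esearch_url(amia_query("infobutton")))
```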

Although great strides have been made in digital libraries (DLs), Hull et al. (2008) argue that we still need better tools to integrate and navigate across resources. Monastersky (2013) notes that libraries need a "reboot" as we move into the era of open data.

Fox et al. (2012) have presented their 5S model in book form. Witten et al. (2010) have updated their volume on building DLs.

Fox, EA, Goncalves, MA, et al. (2012). Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures, Streams) Approach. San Rafael, CA, Morgan & Claypool.
Hull, D, Pettifer, S, et al. (2008). Defrosting the digital library: bibliographic tools for the next generation web. PLoS Computational Biology. 4(10): e1000204.
Monastersky, R (2013). Publishing frontiers: The library reboot. Nature. 495: 430-432.
Witten, IH, Bainbridge, D, et al. (2010). How to Build a Digital Library, Second Edition. San Francisco, Morgan Kaufmann.
The Digital Object Identifier (DOI) has emerged as the de facto standard for identifiers of digital objects on the Internet. The DOI Handbook provides an overview of the system, while fact sheets describe specific aspects. The DOI for the third edition of this book is:

MEDLINE allows designation of the DOI in its Article Identifier (AID) field, which is populated by the publisher. AID values may include the Publisher Item Identifier (PII), an internal reference identifier used in the publishing process, or the DOI, which is assigned in accordance with the publisher's use of the DOI system. The AID field may also contain NCBI Bookshelf accession numbers, labeled [bookaccession], for citations to books and book chapters from the NCBI Books Database.
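Here is one way software might use these fields. The Python sketch below splits a MEDLINE AID value such as "10.1000/182 [doi]" into its label and identifier, then turns the DOIs into links through the https://doi.org/ resolver. The identifier values are placeholders. The DOI pattern is a loose heuristic (a "10." prefix, a registrant code, a slash, and a suffix) rather than the full DOI syntax, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

# An AID value is "<identifier> [<label>]", e.g. "... [doi]" or "... [pii]"
AID_RE = re.compile(r"^(?P<value>.+?)\s+\[(?P<label>[a-z]+)\]$")
# Heuristic DOI check: "10." + registrant code, "/", then a non-empty suffix
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def parse_aid(aid: str) -> tuple[str, str]:
    """Split an AID value into (label, identifier)."""
    m = AID_RE.match(aid.strip())
    if m is None:
        raise ValueError(f"unrecognized AID value: {aid!r}")
    return m.group("label"), m.group("value")


def doi_to_url(doi: str) -> str:
    """Turn a DOI into a link via the doi.org resolver."""
    if DOI_RE.match(doi) is None:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/"
    return "https://doi.org/" + quote(doi, safe="/()")


def links_from_aids(aids: list[str]) -> list[str]:
    """Return resolver links for the [doi]-labeled entries only."""
    return [doi_to_url(v) for label, v in map(parse_aid, aids) if label == "doi"]
```

For example, given the placeholder values ["S0000-0000(00)00000-0 [pii]", "10.1000/182 [doi]"], links_from_aids skips the PII and returns a single doi.org link.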

Other standards for persistent digital identifiers in addition to DOI have emerged as well.

CrossRef maintains metadata records, including DOIs, for digital content.

Another identifier, already discussed in Chapter 2, is the ORCID, which provides a unique identifier for scientific authors (mine is 0000-0002-4114-5148).
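The last character of an ORCID is a check digit computed with the ISO 7064 MOD 11-2 algorithm. This lets software catch most typing errors before looking an identifier up. The Python sketch below performs that check, and the function names are illustrative.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Validate an ORCID such as 0000-0002-4114-5148 (hyphens optional)."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not all(c in "0123456789" for c in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-4114-5148"))  # True
```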

Increasingly, researchers publish the data underlying their papers, and authors or publishers may also want to make those data citable. DataCite provides metadata records, including DOIs, for digital datasets so that they are persistent and findable. FORCE11, a group devoted to research communications and e-scholarship, has developed a set of principles for data citation (Martone, 2014).

CrossRef, DataCite, and ORCID have embarked upon a collaboration to establish an Organization Identifier (OI) that will tie publications, scientists, and data to institutions (Fenner, 2016).

Zenodo allows archiving of data and other research artifacts and can assign DOIs to them. It also integrates with GitHub and supports OAI-PMH.
Martone, M (2014). Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. San Diego, CA, FORCE11.
Fenner, M, Paglione, L, et al. (2016). Technical Considerations for an Organization Identifier Registry.
The Search/Retrieve for the Web (SRW) project described on page 242 of the book has undergone a name change. It is now called the Search/Retrieve via URL (SRU) project. This project offers a standard XML-focused search protocol for Internet search queries using its Contextual Query Language (CQL, formerly the Common Query Language). Descriptions of SRU Web services are provided via ZeeRex (Z39.50 Explain, Explained and Re-Engineered in XML). An overview of some of the issues related to the project is also available online.
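An SRU searchRetrieve request is an ordinary HTTP GET whose parameters carry the CQL query. The sketch below assumes the SRU 1.2 parameter names (operation, version, query, maximumRecords, recordSchema) and a hypothetical endpoint URL. Servers differ in which record schemas and CQL indexes they support, so the "dc" schema and the dc.title index are assumptions to adjust for a real server.

```python
from urllib.parse import urlencode, quote


def sru_search_url(base_url: str, cql: str, max_records: int = 10,
                   schema: str = "dc") -> str:
    """Build an SRU 1.2 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,                  # e.g. dc.title = "digital libraries"
        "maximumRecords": max_records,
        "recordSchema": schema,
    }
    # quote_via=quote encodes spaces as %20 rather than "+"
    return base_url + "?" + urlencode(params, quote_via=quote)


# Hypothetical endpoint; substitute a real SRU server
url = sru_search_url("https://sru.example.org/catalog", 'dc.title = "digital libraries"')
print(url)
```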

Although not widely used, the Open Archives Initiative (OAI) project continues to evolve. Devarakonda et al. (2011) describe the implementation of a system called Mercury that makes use of the OAI Protocol for Metadata Harvesting (OAI-PMH). Also available is a primer on the Object Reuse and Exchange (OAI-ORE) project, which focuses on the description and exchange of Web resource aggregations (Lagoze and van de Sompel, 2008).
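OAI-PMH is a simple HTTP protocol. A harvester issues a request such as verb=ListRecords&metadataPrefix=oai_dc, then follows resumptionTokens until the repository has returned every page. The Python sketch below builds those request URLs and parses one response page with the standard library. The embedded sample response and repository URL are made up for illustration, and error handling (e.g., OAI-PMH error elements) is omitted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
          "dc": "http://purl.org/dc/elements/1.1/"}

# Made-up single-page ListRecords response for illustration
OAI_SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, token=None):
    """First page asks for Dublin Core; later pages send only the resumptionToken."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = "oai_dc"
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=OAI_NS)
        title = rec.findtext(".//dc:title", default="", namespaces=OAI_NS)
        records.append((ident, title))
    # An empty or missing token means this was the last page
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return records, (token.strip() or None)
```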
Devarakonda, R, Palanisamy, G, et al. (2011). Data sharing and retrieval using OAI-PMH. Earth Science Informatics. 4: 1-5.
Lagoze, C and van de Sompel, H (2008). ORE User Guide - Primer. Open Archives Initiative.
A newer OAI project is ResourceSync, which provides tools for keeping resources synchronized across servers.
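ResourceSync builds on the Sitemap XML format. A source publishes a resource list of URLs with last-modification times, and a destination compares that list against its local copies. The Python sketch below shows such a comparison for a made-up resource list. It checks only the lastmod values, whereas the framework also defines hashes, change lists, and other capabilities that this sketch omits. The function names and the dictionary of local timestamps are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

RS_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
         "rs": "http://www.openarchives.org/rs/terms/"}

# Made-up resource list for illustration
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-01-03T09:00:00Z"/>
  <url><loc>https://example.org/res1</loc><lastmod>2017-01-02T13:00:00Z</lastmod></url>
  <url><loc>https://example.org/res2</loc><lastmod>2016-12-01T08:00:00Z</lastmod></url>
</urlset>"""


def parse_time(s):
    """Parse a UTC timestamp ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def stale_resources(resource_list_xml, local_copies):
    """Return locs that are missing locally or newer at the source."""
    root = ET.fromstring(resource_list_xml)
    stale = []
    for url in root.iterfind("sm:url", RS_NS):
        loc = url.findtext("sm:loc", namespaces=RS_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=RS_NS)
        local = local_copies.get(loc)
        if local is None or (lastmod and parse_time(lastmod) > local):
            stale.append(loc)
    return stale
```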

Meanwhile, there is substantial ongoing work on "infobuttons." An archival Web site for the Infobutton Manager Project remains available. The infobutton process and manager have become an HL7 standard for electronic health records (EHRs) called Context-Aware Knowledge Retrieval (Infobutton). Although there is in general little connection between IR and the "meaningful use" rules that provide incentives for adoption of EHRs in the United States, one of the required standards for EHR certification in Stage 2 of meaningful use is the HL7 Context-Aware Knowledge Retrieval (Infobutton) standard (HHS, 2012). Infobuttons can be used to achieve two meaningful use criteria, namely the provision of online information resources for patients and the linkage to knowledge justifying clinical decision support rules.

Cimino et al. (2012) described open-source tools for implementing infobuttons, particularly in the context of the meaningful use criteria. These included a Web-based version of a reference implementation, called OpenInfobutton, as well as the Librarian Infobutton Tailoring Environment (LITE), which helps librarians and others specify the resources to be selected in a given infobutton context. Also of note is the National Library of Medicine's implementation of an infobutton capability with MedlinePlus that allows connection to both EHRs and personal health records.
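In the standard's URL-based form, an EHR passes its context as name-value pairs on an HTTP GET request. The Python sketch below builds such a request for NLM's MedlinePlus Connect service. It assumes the parameter names mainSearchCriteria.v.c (code), mainSearchCriteria.v.cs (code system OID), mainSearchCriteria.v.dn (display name), and knowledgeResponseType. It also assumes a base URL that should be verified against NLM's current documentation. The function name is hypothetical, and the code only builds the URL rather than sending it.

```python
from urllib.parse import urlencode

# HL7 OIDs identifying some code systems an EHR might send
CODE_SYSTEMS = {
    "ICD-10-CM": "2.16.840.1.113883.6.90",
    "SNOMED CT": "2.16.840.1.113883.6.96",
    "LOINC": "2.16.840.1.113883.6.1",
    "RxNorm": "2.16.840.1.113883.6.88",
}


def infobutton_url(base_url, system, code, display="", response_type="text/xml"):
    """Build a URL-based infobutton request for one coded concept."""
    params = {
        "mainSearchCriteria.v.cs": CODE_SYSTEMS[system],  # which terminology
        "mainSearchCriteria.v.c": code,                   # the concept code
        "knowledgeResponseType": response_type,           # requested format
    }
    if display:
        params["mainSearchCriteria.v.dn"] = display       # human-readable name
    return base_url + "?" + urlencode(params)


# Assumed MedlinePlus Connect endpoint; verify before use
url = infobutton_url("https://connect.medlineplus.gov/service", "ICD-10-CM", "E11.9",
                     display="Type 2 diabetes mellitus without complications")
print(url)
```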

Infobuttons have been expanded to provide access to genomics resources in the context of the EHR (Heale, 2016).

Research on infobuttons also continues. A randomized controlled trial found that infobuttons reduced the time needed to seek information compared with conventional methods (35.5 seconds vs. 43 seconds, a 17.4% reduction) (Del Fiol, 2008). An analysis of IR activity logs from Columbia University assessed the use in 2008 of a fixed menu of health resources as well as infobutton capability by four groups of users: attending physicians, housestaff, nurse practitioners, and nurses (Hunt, 2013). The logs showed more frequent use of the health resources menu than of infobuttons, with statistically significant variation in the specific resources chosen by the different groups.
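The 17.4% figure is the relative reduction in mean time, taking the conventional-method time of 43 seconds as the baseline:

```latex
\[
\frac{43 - 35.5}{43} = \frac{7.5}{43} \approx 0.174 = 17.4\%
\]
```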

An analysis of 17 organizations implementing infobuttons found generally positive experiences, although concerns were raised about the learning curve of the standard and access to information about it (Del Fiol, 2012). Strasberg et al. (2013) have identified a number of terminology-related challenges for infobutton implementation. When EHRs and information resources use ICD-9 codes, there can be mismatches in the granularity of terms as well as in how concepts are coded in the EHR and/or the information resource. For RxNorm and LOINC, there may be many codes for a specific drug or test, respectively. Mismatches may also occur in other attributes passed to the Infobutton Manager, such as user type (e.g., physician, nurse, or patient).
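One hypothetical way to soften the granularity mismatch is to fall back from a specific code to progressively broader parent codes until the information resource recognizes one. This is a sketch, not a method described by Strasberg et al. It assumes that truncating an ICD-9-CM code approximates moving up its hierarchy (for example, 250.00 to 250.0 to 250). The resource index and function name are invented for illustration.

```python
def lookup_with_fallback(code: str, resource_index: dict[str, str]):
    """Try the exact ICD-9-CM code, then progressively less specific prefixes."""
    candidate = code
    while candidate:
        if candidate in resource_index:
            return candidate, resource_index[candidate]
        # Drop one character of specificity, plus any dangling decimal point
        candidate = candidate[:-1].rstrip(".")
    return None


# Hypothetical resource that indexes content only at the three-digit category level
index = {"250": "Diabetes mellitus overview"}
print(lookup_with_fallback("250.00", index))  # ('250', 'Diabetes mellitus overview')
```

A real implementation would need the terminology's actual hierarchy, because truncation can skip or invent levels for some codes.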

Cook et al. (2017) performed a systematic review of studies assessing infobuttons. They identified 17 studies, three of which were randomized controlled trials. Usage frequency ranged from 0.3 to 7.4 uses per month per user and was influenced by EHR task. Infobuttons were used about one-fifth to one-third as often as direct (non-context-sensitive) links. In three studies, users answered their clinical question about 70% of the time that they used an infobutton. No studies assessed the impact of infobuttons on patient outcomes.
Anonymous (2012). Health Information Technology: Revisions to the 2014 Edition Electronic Health Record Certification Criteria; and Medicare and Medicaid Programs; Revisions to the Electronic Health Record Incentive Program. Washington, DC, Department of Health and Human Services.
Cimino, JJ, Jing, X, et al. (2012). Meeting the electronic health record "meaningful use" criterion for the HL7 infobutton standard using OpenInfobutton and the Librarian Infobutton Tailoring Environment (LITE). AMIA Annual Symposium 2012 Proceedings, Chicago, IL. 112-120.
Heale, BS, Overby, CL, et al. (2016). Integrating genomic resources with electronic health records using the HL7 infobutton standard. Applied Clinical Informatics. 7: 817-831.
Del Fiol, G, Haug, P, et al. (2008). Effectiveness of topic-specific infobuttons: a randomized controlled trial. Journal of the American Medical Informatics Association. 15: 752-759.
Del Fiol, G, Huser, V, et al. (2012). Implementations of the HL7 Context-Aware Knowledge Retrieval ("Infobutton") standard: challenges, strengths, limitations, and uptake. Journal of Biomedical Informatics. 45: 726-735.
Hunt, S, Cimino, JJ, et al. (2013). A comparison of clinicians' access to online knowledge resources using two types of information retrieval applications in an academic hospital setting. Journal of the Medical Library Association. 101: 26-31.
Strasberg, HR, Del Fiol, G, et al. (2013). Terminology challenges implementing the HL7 context-aware knowledge retrieval ('Infobutton') standard. Journal of the American Medical Informatics Association. 20: 218-223.
Cook, DA, Teixeira, MT, et al. (2017). Context-sensitive decision support (infobuttons) in electronic health records: a systematic review. Journal of the American Medical Informatics Association. 24: 460-468.
The Copyright Clearance Center (CCC) has developed a two-page PDF that describes what is and is not covered by copyright (CCC, 2017).
Anonymous (2017). So, What Is (and Isn’t) Protected by Copyright? Where Copyright Protection Begins and Ends. Danvers, MA, Copyright Clearance Center.
Two models of open-access (OA) publishing have emerged (Laakso, 2011; Björk, 2012; Frank, 2013):
  • Gold – "author pays" model, i.e., research funding covers costs
  • Green – author required to deposit manuscript in public repository, e.g., PubMed Central
The OA publishing movement has reached a sort of equilibrium, with some journals exclusively adopting the model, others rejecting it, and some adopting a hybrid. The major OA publishers in biomedicine remain BioMed Central (BMC) and the Public Library of Science (PLoS). Some publishers, such as Springer and BMJ, give authors the option of paying (usually $2000-$3000) for open access rights, although Phelps et al. (2012) have pointed out that true full open access is not always provided. BMC was sold to Springer in 2008 (Gawrylewski, 2008). Springer has so far made no major changes to BMC's operations.

The book lists a number of counter-arguments to OA publishing. Another is the growth of new "predatory" OA journals that offer inexpensive publishing and appointment to editorial boards yet exist mainly to make money (Haug, 2013). Some have spoofed predatory journals by publishing "fake papers" in them to demonstrate their fraudulence (McCool, 2017). A blog devoted to exposing predatory journals, known as Beall's List, was shut down in early 2017 (Straumsheim, 2017). Calls for exposing and stopping predatory publishing have appeared in major medical journals (Moher, 2016).

(I receive several emails per day inviting me to submit articles or join editorial boards. One tip-off that a journal is predatory is the lack of a link to unsubscribe from its emails.)
Laakso, M, Welling, P, et al. (2011). The development of open access journal publishing from 1993 to 2009. PLoS ONE. 6(6): e20961.
Björk, BC and Paetau, P (2012). Open Access to the Scientific Journal Literature – Status and Challenges for the Information Systems Community. Bulletin of the American Society for Information Science and Technology, June/July, 2012.
Frank, M (2013). Open but not free--publishing in the 21st century. New England Journal of Medicine. 368: 787-789.
Phelps, L, Fox, B, et al. (2012). Supporting the advancement of science: open access publishing and the role of mandates. Journal of Translational Medicine. 10: 13.
Gawrylewski, A (2008). BioMed Central sold to Springer. October 7, 2008.
Haug, C (2013). The downside of open-access publishing. New England Journal of Medicine. 368: 791-793.
McCool, JH (2017). Opinion: Why I Published in a Predatory Journal. The Scientist, April 6, 2017.
Straumsheim, C (2017). No More 'Beall's List'. Inside Higher Ed, January 18, 2017.
Moher, D and Moher, E (2016). Stop predatory publishers now: act collaboratively. Annals of Internal Medicine. 164: 616-617.
The use of OA in 2008 was found to vary by discipline (Björk, 2010). The highest use was seen in earth sciences, with biomedicine falling in the middle of various scientific fields (13.9% of articles published under the gold model and 7.8% under the green model). A growing number of authors post (or "self-archive") their manuscripts, which have been found to have higher citation rates whether self-archiving is self-selected or mandated, with the advantage greatest for higher-quality research (Gargouri, 2010). Some publishers, such as Elsevier, post guidelines for authors on sharing their publications.

In terms of citations, OA journals have been found to have the same citation rates as newer non-OA journals but not older ones (Björk and Solomon, 2012). This analysis also found that OA journals funded by article processing charges were cited more than other OA journals. More recently, Khabsa and Giles (2014) estimate that about one-quarter of scholarly documents on the Web are freely available (whether by OA or in violation of copyright), with the proportion varying by discipline and medicine falling in the middle at 24%. The highest rates are for computer science (50%) and "multidisciplinary" (43%).

A growing body of research on OA publishing has also emerged (Miguel, 2016). Spedding (2016) summarizes such research in the health sciences and argues that OA publishing facilitates the translation of research into health policy and practice. Butler (2016) notes a strong push, led by the Netherlands, to move European journals to the OA model.
Björk, BC, Welling, P, et al. (2010). Open access to the scientific journal literature: situation 2009. PLoS ONE. 5(6): e11273.
Björk, BC and Solomon, D (2012). Open access versus subscription journals: a comparison of scientific impact. BMC Medicine. 10: 73.
Gargouri, Y, Hajjem, C, et al. (2010). Self-selected or mandated, open access increases citation impact for higher quality research. PLoS ONE. 5(10): e13636.
Khabsa, M and Giles, CL (2014). The number of scholarly documents on the public web. PLoS ONE. 9(5): e93949.
Miguel, S, Tannuri de Oliveira, EF, et al. (2016). Scientific production on open access: a worldwide bibliometric analysis in the academic and scientific context. Publications. 4(1): 1.
Spedding, S (2016). Open access publishing of health research: does open access publishing facilitate the translation of research into health policy and practice? Publications. 4(1): 2.
Butler, D (2016). Dutch lead European push to flip journals to open access. Nature. 529: 13.
OA can also be viewed as part of the larger movement toward "open science." In addition to OA, most authors (Kraker, 2011; Pontika, 2015) include:
  • Open data – all data collected in research
  • Open source – all software code developed and used
  • Open methodology – clear and detailed description, all surveys and other tools
  • Open peer review – all comments of peer reviewers
An example of implementing open science in the neuroscience community has been described by Wiener et al. (2016).
Kraker, P, Leony, D, et al. (2011). The case for an open science in technology enhanced learning. International Journal of Technology Enhanced Learning. 3: 643-654.
Pontika, N, Knoth, P, et al. (2015). Fostering Open Science to Research using a Taxonomy and an eLearning Portal. Milton Keynes, UK, Open University.
Wiener, M, Sommer, FT, et al. (2016). Enabling an open data ecosystem for the neurosciences. Neuron. 92: 617-621.
Another issue related to OA publishing is the use of tools developed for medical practice. Newman and Feldman (2011) have described the dilemma over the Mini-Mental State Examination (MMSE), which is in widespread use despite the publisher insisting on a royalty fee for each use. This raises some thorny issues around tools that have been validated for clinical use and how they can actually be used.
Newman, J and Feldman, R (2011). Copyright and open access at the bedside. New England Journal of Medicine. 365: 2447-2449.
The National Institutes of Health (NIH) Public Access Policy remains in effect, despite legislation proposed to discontinue it (Carroll, 2011). The policy aims to ensure that the public has access to the published results of NIH-funded research. It requires scientists to submit to PubMed Central the final peer-reviewed journal manuscripts that result from NIH funding upon acceptance for publication, to be made publicly available no later than 12 months after publication. According to the policy, authors must comply via these four steps:
  1. Determine Applicability - Applies to any manuscript that
    1. Is peer-reviewed
    2. Is accepted for publication in a journal on or after April 7, 2008
    3. Arises from:
      • Any direct funding from an NIH grant or cooperative agreement active in Fiscal Year 2008 or beyond
      • Any direct funding from an NIH contract signed on or after April 7, 2008
      • Any direct funding from the NIH Intramural Program
      • An NIH employee
  2. Address Copyright - Ensure that the journal's publishing agreement allows the paper to be posted to PubMed Central in accordance with the NIH Public Access Policy
  3. Submit Papers - Submit the final peer-reviewed manuscript or published paper to PubMed Central
  4. Cite Papers - Include the PMCID at the end of the full citation of the article whenever it is cited
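The applicability test in step 1 is essentially a Boolean rule, sketched below in Python. The field names are hypothetical. The grant test assumes that "active in Fiscal Year 2008" means active on or after October 1, 2007, the start of US federal FY2008. The sketch simplifies the policy, and NIH's own guidance governs actual compliance decisions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

POLICY_START = date(2008, 4, 7)   # acceptance / contract-signing cutoff
FY2008_START = date(2007, 10, 1)  # US federal fiscal year 2008 began Oct 1, 2007


@dataclass
class Manuscript:
    # Hypothetical fields describing one manuscript and its NIH support
    peer_reviewed: bool
    accepted: date
    grant_active_through: Optional[date] = None  # last day an NIH grant/cooperative agreement was active
    contract_signed: Optional[date] = None       # signing date of a directly funding NIH contract
    intramural: bool = False                     # funded by the NIH Intramural Program
    nih_employee: bool = False                   # authored by an NIH employee


def policy_applies(m: Manuscript) -> bool:
    """Simplified step 1 check: peer review, acceptance date, and NIH support."""
    nih_support = (
        (m.grant_active_through is not None and m.grant_active_through >= FY2008_START)
        or (m.contract_signed is not None and m.contract_signed >= POLICY_START)
        or m.intramural
        or m.nih_employee
    )
    return m.peer_reviewed and m.accepted >= POLICY_START and nih_support


print(policy_applies(Manuscript(True, date(2016, 5, 1),
                                grant_active_through=date(2017, 3, 31))))  # True
```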
Carroll, M (2011). Why full open access matters. PLoS Biology. 9(11): e1001210.
Wouters (2008) has published a paper about issues in preserving paper content. A report commissioned by the National Science Foundation (2008) highlighted some of the issues and challenges for digital preservation. It identified the following problems:
  • Inadequacy of funding models to address long-term access and preservation needs
  • Confusion and/or lack of alignment between stakeholders, roles, and responsibilities with respect to digital access and preservation
  • Inadequate institutional, enterprise, and/or community incentives to support the collaboration needed to reinforce sustainable economic models
  • Complacency that current practices are “good enough”
  • Fear that digital access and preservation is too big to take on
Nearly a decade later, digital preservation issues are still a concern, perhaps even more so as fewer and fewer paper copies of scholarly works are published (Kirchhoff, 2015).
Wouters, J (2008). Chemistry. Coming soon to a library near you? Science. 322: 1196-1198.
Anonymous (2008). Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation, Blue Ribbon Task Force on Sustainable Digital Preservation and Access.
Kirchhoff, A, Morrissey, S, et al. (2015). Networked Information's Risky Future: The Promises and Challenges of Digital Preservation. EDUCAUSE Review, March 2, 2015.
Two other preservation activities have emerged:
Preservation of the Web is also increasingly recognized as important. In two articles, Niu provides an overview of approaches to Web archiving (Niu, 2012a) and describes the functionality offered by various current archives (Niu, 2012b).

Current major initiatives of the National Digital Information Infrastructure and Preservation Program (NDIIPP) include:
Niu, J (2012a). An Overview of Web Archiving. D-Lib Magazine. March/April 2012.
Niu, J (2012b). Functionalities of Web Archives. D-Lib Magazine. March/April 2012.
Rosenthal, DSH and Vargas, DL (2012). LOCKSS Boxes in the Cloud. Palo Alto, CA, LOCKSS Program, Stanford University Libraries.
A further evolution of the informationist concept has been an entire service devoted not only to answering the complex clinical questions of care teams but also to helping develop clinical order sets and a patient portal (Giuse, 2010). An evaluation study based on focus groups and interviews found that clinical teams valued informationists' expertise and increased their use of information resources (Grefsheim, 2010). Additional evaluation studies show the enduring value of health sciences librarians as well (Marshall, 2013).

A systematic review of studies of librarian-provided services in healthcare settings (Perrier, 2014) found documented benefit for participants in training programs (e.g., students and residents) in improving their skills in searching the literature to use research evidence for clinical decision-making. It also found studies showing that services provided to clinicians saved healthcare professionals time and provided relevant information for decision-making. A few studies showed that such services led to decreased patient length of stay when literature searches related to a patient’s case were performed for clinicians. No studies assessed the value of these services for researchers or patients.
Giuse, N, Williams, A, et al. (2010). Integrating best evidence into patient care: a process facilitated by a seamless integration with informatics tools. Journal of the Medical Library Association. 98: 220-222.
Grefsheim, SF, Whitmore, SC, et al. (2010). The informationist: building evidence for an emerging health profession. Journal of the Medical Library Association. 98: 147-156.
Marshall, JG, Sollenberger, J, et al. (2013). The value of library and information services in patient care: results of a multisite study. Journal of the Medical Library Association. 101: 38-46.
Perrier, L, Farrell, A, et al. (2014). Effects of librarian-provided services in healthcare settings: a systematic review. Journal of the American Medical Informatics Association. 21: 1118-1124.
Online medical expert services also continue to be developed and evaluated. One study assessed an “ask the doctor” text-based consultation service in Sweden and reported 38,217 inquiries over 4 years (Umefjord, 2008). Another described a Facebook-based service aimed at parents of young children, which received about 70 questions per month, the largest proportion concerning infections (Helve, 2014).
Umefjord, G, Sandström, H, et al. (2008). Medical text-based consultations on the Internet: a 4-year study. International Journal of Medical Informatics. 77: 114-121.
Helve, O (2014). A medical consultation service on Facebook: descriptive analysis of questions answered. Journal of Medical Internet Research. 16(9): e202.

Last updated April 16, 2017