Information Retrieval: A Health and Biomedical Perspective, Third Edition

William Hersh, M.D.

Chapter 2 Update

This update contains all new references cited in the author's OHSU BMI 514/614 course for Chapter 2.

One enduring property of the scientific literature is its incessant growth, both in science generally and in biomedicine and health more specifically. According to the de Solla Price doubling time cited in the book, the number of all scientific papers published by 2006 should have been about 50 million, a figure that Jinha (2010) verified by other means. As of 2009, it was estimated that there were 25,400 journals in science, technology, and medicine, publishing 1.5 million articles annually and increasing at a rate of 3.5% (Ware, 2009).
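The growth figures above imply a doubling time that can be checked with a little arithmetic. Here is a minimal sketch; the only inputs taken from the text are the 3.5% annual growth rate and the 1.5 million articles/year figure from Ware (2009).

```python
import math

def doubling_time(annual_growth_rate):
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

def projected_annual_articles(base_articles, annual_growth_rate, years):
    """Project yearly article output forward under compound growth."""
    return base_articles * (1 + annual_growth_rate) ** years

# At the 3.5% annual growth reported by Ware (2009), the literature
# doubles roughly every two decades.
print(round(doubling_time(0.035), 1))  # ~20.1 years
# Projecting the 1.5 million articles/year figure a decade forward:
print(round(projected_annual_articles(1_500_000, 0.035, 10)))
```

At this rate, annual output passes 2 million articles within a decade, consistent with the continued-growth theme of this section.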

There is also plenty of evidence for continued growth in biomedicine and health. By 2010, it was estimated that 75 clinical trials and 11 systematic reviews were being published per day (Bastian, 2010). Another measure of growth comes from the Key MEDLINE Indicators for the MEDLINE database: in US Fiscal Year 2016, a total of 869,666 new records from 5,623 journals were added, bringing the total number of records to over 22 million. Fraser and Dunstan (2010), looking at the narrow field of echocardiography, noted the "impossibility of being expert." An accompanying editorial to their study noted that we need a "machine to help" manage this information (Smith, 2010).

Another aspect of growth is in the number of authors per paper. There continues to be an increase in the number of papers with more than 50, 100, and even 1000 authors (King, 2012). Clearly that many people cannot have written such papers, but they likely did participate in the research, so authorship is one method to acknowledge that.
Jinha, AE (2010). Article 50 million: an estimate of the number of scholarly articles in existence. Learned Publishing. 23: 258-263.
Ware, M and Mabe, M (2009). The STM report. An overview of scientific and scholarly journal publishing. Oxford, England, International Association of Scientific, Technical and Medical Publishers.
Bastian, H, Glasziou, P, et al. (2010). Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Medicine. 7(9): e1000326.
Fraser, AG and Dunstan, FD (2010). On the impossibility of being expert. British Medical Journal. 341: 1234-1235.
Smith, R (2010). Strategies for coping with information overload. British Medical Journal. 341: 1281-1282.
King, C (2012). Multiauthor Papers: Onward and Upward. Science Watch, July, 2012.
An additional property noted in the book is fragmentation, wherein researchers often try to maximize their number of publications by breaking their research into "minimal publishable units." One study found that the more publications that result from a project, the more likely those publications are to be cited in total, especially when the articles are long (Bornmann, 2007).
Bornmann, L and Daniel, HD (2007). Multiple publication on a single research study: Does it pay? The influence of number of research articles on total citation counts in biomedicine. Journal of the American Society for Information Science & Technology. 58: 1100-1107.
Linkage of the scientific literature also continues to be an important - and debated - property. The study of linkage of scientific papers is bibliometrics, and a couple of recent publications provide updated overviews of the field (Berger, 2014; Anonymous, 2014). The overviews note the continued primacy of Lotka's and Bradford's Laws. One recent analysis criticized the latter because of ambiguity in defining the subjects that journals cover, which leads to different results when the law is applied (Nicolaisen, 2007).

The nature of linkage has been changing in the modern era of electronic publishing. Since 1990, the number of highly cited papers coming from highly cited journals has been diminishing (Lozano, 2011). Also in recent years, the number of highly cited papers in "non-elite" journals has increased, perhaps reflecting the democratizing effect of the ease of access to journals in electronic form (Acharya, 2014).

New research also provides more insights into citations. Letchford et al. have found that papers with shorter titles (2015) and with shorter abstracts containing more frequently used words (2016) are more likely to be cited. Greenberg (2009) notes, however, that while citations can indicate the authority of papers, they can also lend unfounded authority to claims. Trinquart et al. (2016) found that papers on one contested topic, the role of reducing salt intake, tended to be polarized in their citations, preferentially citing work that supported their own conclusions.

Another challenge is that sometimes papers get cited that do not even exist, and then cited subsequently by other papers. This was shown in "the most influential paper Gerard Salton never wrote," a paper from the IR pioneer that was cited repeatedly (fortunately never by me!) (Dubin, 2004).
Berger, JM and Baker, CM (2014). Bibliometrics: an overview. RGUHS Journal of Pharmaceutical Sciences. 4(3): 81-92.
Anonymous (2014). Bibliometrics: an overview. Leeds, England, University of Leeds.
Nicolaisen, J and Hjørland, B (2007). Practical potentials of Bradford’s law: a critical examination of the received view. Journal of Documentation. 63: 359-377.
Lozano, GA, Larivière, V, et al. (2011). The weakening relationship between the impact factor and papers' citations in the digital age. Journal of the American Society for Information Science & Technology. 63: 2140-2145.
Acharya, A, Verstak, A, et al. (2014). Rise of the Rest: The Growing Impact of Non-Elite Journals. Mountain View, CA, Google.
Letchford, A, Moat, HS, et al. (2015). The advantage of short paper titles. Royal Society Open Science. 2(8): 150266.
Letchford, A, Preis, T, et al. (2016). The advantage of simple paper abstracts. Journal of Informetrics. 10: 1-8.
Greenberg, SA (2009). How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal. 339: b2680.
Trinquart, L, Johns, DM, et al. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology. 45: 251-260.
Dubin, D (2004). The most influential paper Gerard Salton never wrote. Library Trends. 52: 748-764.
The debate over the impact factor (IF) and other citation-related metrics continues. Some challenges are related to the measures themselves. Kulkarni et al. (2009), for example, found that three sources of citations - Web of Science, Scopus, and Google Scholar - varied "quantitatively and qualitatively" in their citation counts for articles published in three journals, JAMA, Lancet, and the New England Journal of Medicine, most likely because of the differing coverage of the sources. A related commentary defends the approach taken by Web of Science (which includes the Science Citation Index) (McVeigh and Mann, 2009). Althouse et al. (2009) noted that IF has in general been increasing over time, which their research attributes to the growing length of reference lists in journal articles. They also note that IF varies widely across fields, as continues to be shown in comparing medical and information science journals.
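For readers who want the underlying arithmetic, the standard two-year IF published in Journal Citation Reports divides citations received in a year by the citable items the journal published in the previous two years. A minimal sketch with hypothetical numbers:

```python
def two_year_impact_factor(cites_to_prior_two_years, citable_items_prior_two_years):
    """Standard two-year impact factor: citations received in year Y to
    items the journal published in years Y-1 and Y-2, divided by the
    number of citable items the journal published in those two years."""
    return cites_to_prior_two_years / citable_items_prior_two_years

# Hypothetical journal: 6,000 citations in 2015 to articles it
# published in 2013-2014, during which it published 400 citable items.
print(two_year_impact_factor(6000, 400))  # 15.0
```

Note that what counts as a "citable item" in the denominator is itself contested, which is the issue the McVeigh and Mann commentary addresses.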

The tables below show 2016 updates (data from 2015) of IF from Table 2.1 in the book, drawn from the Journal Citation Reports, for general medical journals, medical informatics journals, and computer science/information systems journals.

Top General Medical Journals by Impact Factor for 2015:
New England Journal of Medicine - 59.6
Lancet - 44.0
JAMA-Journal of the American Medical Association - 37.7
BMJ-British Medical Journal - 19.7
Annals of Internal Medicine - 16.6
JAMA Internal Medicine - 14.0
PLoS Medicine - 13.6
BMC Medicine - 8.01
Journal of Cachexia Sarcopenia and Muscle - 7.88
Journal of Internal Medicine - 7.80
Canadian Medical Association Journal - 6.72
Cochrane Database of Systematic Reviews - 6.10
Mayo Clinic Proceedings - 5.92
American Journal of Medicine - 5.61
Annals of Family Medicine - 5.09

Top Medical Informatics Journals by Impact Factor for 2015:
Statistical Methods in Medical Research - 4.63
Journal of Medical Internet Research - 4.53
Journal of the American Medical Informatics Association - 3.43
Medical Decision Making - 2.91
Journal of Biomedical Informatics - 2.45
International Journal of Medical Informatics - 2.36
Artificial Intelligence in Medicine - 2.14
IEEE Journal of Biomedical and Health Informatics - 2.09
BMC Medical Informatics and Decision Making - 2.04
Computer Methods and Programs in Biomedicine - 1.86

Top Computer Science, Information Systems Journals by Impact Factor for 2015:
IEEE Communications Surveys and Tutorials - 9.22
MIS Quarterly - 5.38
Journal of Information Technology - 4.78
IEEE Wireless Communications - 4.15
Journal of Cheminformatics - 3.95
Journal of Chemical Information and Modeling - 3.66
Journal of the American Medical Informatics Association - 3.43
Information Sciences - 3.36
Journal of Management Information Systems - 3.03
Internet Research - 3.02
Journal of the American Society for Information Science and Technology - 2.45
Information Processing & Management - 1.40
ACM Transactions on Information Systems - 0.977
Information Retrieval Journal - 0.896

There continue to be criticisms of IF, not so much of how it is calculated but rather of how it is used. These criticisms tend to fall into two categories. The first category relates to misuse of IF in the assessment of journals and what journals do to manipulate it. In addition to what was noted in the book, another commentary by Smith (2006) reinforces this.

The second category of criticism of IF consists of what some consider its misapplication to measuring the value of the work of individual scientists. The criticisms in this category have attracted the lion's share of recent publication and commentary. Many commentators have noted that IF is really a measure that should be applied to journals and not individual scientists (Browman and Stergiou, 2008; Simons, 2008; Campbell, 2008). In an analysis of the literature on the role of beta-amyloid in Alzheimer's disease, Greenberg (2009) found that articles discounting the connection were less likely to be cited, resulting in "unfounded authority" and a "citation bias" in this area of science. This actually harms science, giving researchers the incentive to publish sooner and more frequently than might be warranted by their work (Lawrence, 2008). Another concern is that papers being commented upon, often because other researchers criticize the methods and/or disagree with the conclusions, have higher rates of citation than non-commented papers, potentially giving more citations to papers whose methods or conclusions may be suspect (Radicchi, 2012). As the latter study notes, similar to show business, any publicity is good publicity. Another analysis focusing on post-publication review finds that IF and number of citations are poor measures of scientific merit (Eyre-Walker and Stoletzki, 2013). Ioannidis et al. (2014) surveyed highly cited scientists and found that their most cited work is not necessarily their best work. (This is true for me, as my top-cited paper ever is one I published in 1994 describing my first test collection; see my Google Scholar profile below for details.)
All of these concerns have led to a statement, the San Francisco Declaration on Research Assessment, originated by the American Society for Cell Biology, calling for journal-based metrics, most prominently IF, to not be used to evaluate the output of individual scientists (ASCB, 2013).

This has led to a search for other measures that might quantify the productivity of scientists. One measure to gain prominence has been the h-index, the largest number h such that a scientist has h papers with at least h citations each (Hirsch, 2005). The h-index thus gives value to a scientist's most highly cited papers. It has been shown to be correlated with the number of downloads of one major journal, the Proceedings of the National Academy of Sciences (Fersht, 2009). The h-index can be used to group researchers into discrete categories. El Emam et al. (2012) have found that the PR6 approach used by the US National Science Foundation performs well at grouping medical informatics researchers by their academic rank (i.e., assistant, associate, and full professor), while grouping them into 10 categories (i.e., PR10) performs this grouping even better.
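The h-index can be computed directly from a list of citation counts. A minimal sketch of Hirsch's definition:

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers with at least h
    citations each (Hirsch, 2005)."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Five papers cited 25, 8, 5, 3, and 3 times: three papers have at
# least 3 citations each, but there are not four papers with at least
# 4 citations, so h = 3.
print(h_index([25, 8, 5, 3, 3]))  # 3
```

The sort-and-scan form makes the metric's key property visible: one enormously cited paper cannot raise h by more than one.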

The h-index has been popularized by its prominent use by Google Scholar. Google Scholar provides h-index calculations both at the level of the scientific author as well as the journal. At the author level, this allows a measure of scientific output, which can be ranked across fields, as has been done for biomedical informatics and information retrieval (where my h-index of 63 places me 22nd in the former and 17th in the latter). Another benefit of Google Scholar for scientists is that it allows one to set up a profile page that provides a wealth of information, including a list of top-cited papers as well as h-index metrics. Here is a link to my Google Scholar page.

Another tool for calculating the h-index is Publish or Perish, which uses Google Scholar to calculate a wide range of measures for a scientist, with tools to control for author names, in particular to disambiguate authors from different fields who share the same name (Harzing, 2009). Harzing and van der Wal (2009) were calculating the h-index from Google Scholar before Google developed the capability explicitly. The Publish or Perish screen for myself is shown below, with the top-ranking publications not authored by me deleted from the analysis (although the effect of this is minimal). My h-index in this system (64) is similar to the one calculated by Google Scholar, while my most highly cited paper continues to be my 1994 paper describing the OHSUMED test collection (Hersh, 1994) (which I still do not believe is my best research work).

Google Scholar also calculates metrics for journals based on the h-index. The site also lists journals by subject area, such as health and medical sciences broadly as well as medical informatics. One analysis has raised concerns that the imperfect calculation of h-index scores by Google Scholar makes these rankings "unreliable" (Delgado-López-Cózar and Cabezas-Clavijo, 2012). It has also been shown that uploads of false papers that are indexed by Google Scholar can be used to manipulate its metrics (Delgado-López-Cózar et al., 2014).

Concerns about the correct identification of authors, which go beyond the calculation of citation metrics, have led to the development of the ORCID identifier. ORCID provides a unique identifier for scientific authors that allows more accurate calculation of citation metrics (mine is 0000-0002-4114-5148).

Martinez et al. (2014) have created a new measure, the h-classic, which identifies citation classics in a field. Flor-Martinez et al. (2016) demonstrate its use in three dental-related fields. A variant of the h-index is the g-index, the largest number g such that a scientist's g most highly cited papers have received, on average, at least g citations each (Egghe, 2006). The g-index correlates with the total number of citations for an author, whereas the h-index correlates with the citation counts of the most highly cited papers. A number of other measures have been proposed, all of which mostly correlate with each other but also differ in significant ways (Schreiber, 2008; Antonakis and Lalive, 2008; Bollen et al., 2009).
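Like the h-index, the g-index can be computed from a sorted list of citation counts. A minimal sketch of Egghe's definition (capped at the number of papers):

```python
def g_index(citation_counts):
    """Largest g such that the g most-cited papers together have at
    least g*g citations (Egghe, 2006), i.e. an average of at least g
    citations each."""
    counts = sorted(citation_counts, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(counts, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g

# Five papers cited 25, 8, 5, 3, and 3 times: the top four papers have
# 25+8+5+3 = 41 >= 16 citations, and all five have 44 >= 25, so g = 5.
print(g_index([25, 8, 5, 3, 3]))  # 5
```

The contrast with the h-index shows in this example: the same citation counts give h = 3 but g = 5, because the g-index lets the heavily cited top paper pull up the cumulative total.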

Another measure whose use has been advocated by the US National Institutes of Health is the relative citation ratio (RCR) (Hutchins, 2016). The RCR uses the co-citation network of a paper to field-normalize the number of citations it has received. In other words, it adjusts citations relative to a discipline and allows comparison of peers within it. The NIH has developed a Web site that calculates RCR. One analysis found that while the amount and length of grant funding were associated with RCR, there were diminishing returns over time (Lauer, 2016).
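The field normalization at the heart of the RCR can be illustrated with a deliberately simplified sketch. The real metric derives its expected citation rate from the paper's co-citation network and benchmarks it against NIH-funded papers; here the expected rate is just the mean rate of hypothetical co-cited neighbors.

```python
def relative_citation_ratio(article_cites_per_year, cocited_cites_per_year):
    """Simplified sketch of RCR-style field normalization (after
    Hutchins et al., 2016): the article's citation rate divided by the
    expected rate of its field. Here the field rate is approximated as
    the mean citation rate of the papers co-cited with the article;
    the actual RCR adds benchmarking against NIH-funded papers.
    """
    expected = sum(cocited_cites_per_year) / len(cocited_cites_per_year)
    return article_cites_per_year / expected

# An article cited 12 times/year whose co-cited neighbors average
# (4 + 6 + 8 + 6) / 4 = 6 citations/year performs at twice its field.
print(relative_citation_ratio(12, [4, 6, 8, 6]))  # 2.0
```

The point of the normalization is that the same raw citation count yields a high ratio in a slow-citing field and a low one in a fast-citing field.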

Others have proposed metrics for scientist productivity that go beyond citations. Most prominent among these are article-level metrics, or altmetrics (Chamberlain, 2013). Lin and Fenner (2013) defined an ontology of article-level metrics that includes five categories and provides examples of how each can be measured both for scholars and the public, as shown in the table below. This has been developed into an altmetrics score that gives relative weight to different types of citations, such as mentions in news stories (8), blogs (5), Wikipedia (3), tweets (1), and Facebook (0.25), among others. The altmetrics approach was adopted earliest by the PLoS family of journals (Yan, 2011), but many others now provide it, such as JAMA and BMJ (Warren, 2017). Neylon and Wu (2009) have described some issues associated with article-level metrics. Haustein et al. (2014) have found that only 10% of the biomedical literature is tweeted, and that there is no correlation between tweets and citations.

Recommended - Scholars: citations by editorials, Faculty of 1000; Public: press articles
Cited - Scholars: citations, full-text mentions; Public: Wikipedia mentions
Saved - Scholars and public: CiteULike, Mendeley
Discussed - Scholars: science blogs, journal comments; Public: blogs, Twitter, Facebook
Viewed - Scholars: PDF downloads; Public: HTML downloads
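The weighted altmetrics score described above reduces to a weighted sum over mention counts. Here is a minimal sketch using only the weights quoted in the text; the production Altmetric Attention Score covers more source types and applies additional rules.

```python
# Weights for mention types as quoted in the text; the real Altmetric
# Attention Score includes further sources and adjustments.
WEIGHTS = {"news": 8, "blog": 5, "wikipedia": 3, "tweet": 1, "facebook": 0.25}

def attention_score(mention_counts, weights=WEIGHTS):
    """Weighted sum of mentions across source types (a simplified
    sketch of an Altmetric-style score)."""
    return sum(weights[source] * n for source, n in mention_counts.items())

# Two news stories, one blog post, ten tweets, and four Facebook posts:
# 2*8 + 1*5 + 10*1 + 4*0.25 = 32.
print(attention_score({"news": 2, "blog": 1, "tweet": 10, "facebook": 4}))
```

The weighting encodes an editorial judgment that a news story signals far more attention than a tweet, which is one reason such composite scores attract the same gaming concerns as citation metrics.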

Another attempt at multi-factorial measurement of impact is the ImpactStory Project, which attempts to measure impact beyond just papers, including that of research groups, funders, and electronic repositories and collections. The leader of this project has noted that one of the major US government science research funders, the National Science Foundation, has changed from asking principal investigators to list their publications to asking for research products, of which publications are only one category (Piwowar, 2013). The site can take an ORCID identifier and provide an overview of a researcher's impact (such as mine, which notes that I am in the top 10% of scholars for research mentions, no doubt due in part to my own use of social media).

Another approach to measuring impact comes from Sarli et al. (2010), who have developed the Becker Library Model for Assessment of Research that assesses impact of research in four dimensions:
  • Research output, e.g., resulting pharmaceutical preparations, software, medical devices, licensing, etc.
  • Knowledge transfer, e.g., cited references, use in clinical guidelines, cited in meta-analysis, etc.
  • Clinical implementation, e.g., items from research output as well as citation in guidelines, coverage by insurance, use in quality measures, etc.
  • Community benefit, e.g., measured in healthcare outcomes, quality of life measurements, etc.

A final alternative approach comes from Ioannidis and Khoury (2014), who advocate the PQRST approach for appraising and rewarding research:

  • P - productivity
  • Q - quality of scientific work
  • R - reproducibility of scientific work
  • S - sharing of data and other resources
  • T - translational influence of research
Kulkarni, A., Aziz, B., et al. (2009). Comparisons of citations in Web of Science, Scopus, and Google Scholar for articles published in general medical journals. Journal of the American Medical Association, 302: 1092-1096.
McVeigh, M. and Mann, S. (2009). The journal impact factor denominator: defining citable (counted) items. Journal of the American Medical Association, 302: 1107-1109.
Althouse, BM, West, JD, et al. (2009). Differences in impact factor across fields and over time. Journal of the American Society for Information Science & Technology. 60: 27-34.
Smith, R (2006). Commentary: the power of the unrelenting impact factor--is it a force for good or harm? International Journal of Epidemiology. 35: 1129-1130.
Browman, H. and Stergiou, K. (2008). Factors and indices are one thing, deciding who is scholarly, why they are scholarly, and the relative value of their scholarship is something else entirely. Ethics in Science and Environmental Politics, 8: 1-3.
Simons, K. (2008). The misused impact factor. Science, 322: 165.
Campbell, P. (2008). Escape from the impact factor. Ethics in Science and Environmental Politics, 8: 5-7.
Greenberg, S. (2009). How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal, 339: b2680.
Lawrence, P. (2008). Lost in publication: how measurement harms science. Ethics in Science and Environmental Politics, 8: 9-11.
Radicchi, F (2012). In science “there is no bad publicity”: Papers criticized in comments have high scientific impact. Scientific Reports. 2: 815.
Eyre-Walker, A and Stoletzki, N (2013). The assessment of science: the relative merits of post-publication review, the impact factor, and the number of citations. PLoS Biology. 11: e1001675.
Ioannidis, JP, Boyack, KW, et al. (2014). Bibliometrics: Is your most cited work your best? Nature. 514: 561-562.
Anonymous (2013). San Francisco Declaration on Research Assessment. Bethesda, MD, American Society for Cell Biology.
Hirsch, J. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102: 16569-16572.
Fersht, A. (2009). The most influential journals: Impact Factor and Eigenfactor. Proceedings of the National Academy of Sciences, 106: 6883-6884.
El Emam, K, Arbuckle, L, et al. (2012). Two h-index benchmarks for evaluating the publication performance of medical informatics researchers. Journal of Medical Internet Research. 14(5): e144.
Harzing, A. and van der Wal, R. (2009). A Google Scholar h-index for journals: An alternative metric to measure journal impact in economics and business. Journal of the American Society for Information Science & Technology, 60: 41-46.
Harzing, A. (2009). The Publish or Perish Book: Your Guide to Effective and Responsible Citation Analysis. Melbourne, Australia. Tarma Software Research Pty Ltd.
Hersh, W., Buckley, C., et al. (1994). OHSUMED: an interactive retrieval evaluation and new large test collection for research. Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland. Springer-Verlag. 192-201.
Delgado-López-Cózar, E and Cabezas-Clavijo, A (2012). Google Scholar Metrics: an unreliable tool for assessing scientific journals. El profesional de la información. 21: 419-427.
Delgado-López-Cózar, E, Robinson-García, N, et al. (2014). The Google Scholar experiment: how to index false papers and manipulate bibliometric indicators. Journal of the American Society for Information Science & Technology. 65: 446-454.
Martínez, MA, Herrera, M, et al. (2014). H-Classics: characterizing the concept of citation classics through H-index. Scientometrics. 98: 1971-1983.
Flor-Martínez, M, Galindo-Moreno, P, et al. (2016). H-classic: a new method to identify classic articles in implant dentistry, periodontics, and oral surgery. Clinical Oral Implants Research. 27: 1317-1330.
Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69: 131-152.
Schreiber, M. (2008). An empirical investigation of the g-index for 26 physicists in comparison with the h-index, the A-index, and the R-index. Journal of the American Society for Information Science & Technology, 59: 1513-1522.
Antonakis, J. and Lalive, R. (2008). Quantifying scholarly impact: IQp versus the Hirsch h. Journal of the American Society for Information Science & Technology, 59: 956-969.
Bollen, J., VandeSompel, H., et al. (2009). A principal component analysis of 39 scientific impact measures. PLoS ONE, 4(6): e6022.
Hutchins, BI, Yuan, X, et al. (2016). Relative citation ratio (RCR): a new metric that uses citation rates to measure influence at the article level. PLoS Biology. 14(9): e1002541.
Lauer, M (2016). Applying the Relative Citation Ratio as a Measure of Grant Productivity. Open Mike.
Chamberlain, S (2013). Consuming article-level metrics: observations and lessons. Information Standards Quarterly. 25(2): 4-13.
Lin, J and Fenner, M (2013). Altmetrics in evolution: defining & redefining the ontology of article-level metrics. Information Standards Quarterly. 25(2): 20-26.
Anonymous (2016). How is the Altmetric Attention Score calculated?
Yan, KK and Gerstein, M (2011). The spread of scientific information: insights from the web usage statistics in PLoS article-level metrics. PLoS ONE. 6(5): e19917.
Warren, HR, Raison, N, et al. (2017). The rise of altmetrics. Journal of the American Medical Association. 317: 131-132.
Neylon, C. and Wu, S. (2009). Article-level metrics and the evolution of scientific impact. PLoS Biology, 7(11): e1000242.
Haustein, S, Peters, I, et al. (2014). Tweeting biomedicine: an analysis of tweets and citations in the biomedical literature. Journal of the American Society for Information Science & Technology. 65: 656-669.
Piwowar, H (2013). Altmetrics: Value all research products. Nature. 493(7431): 159.
Sarli, C., Dubinsky, E., et al. (2010). Beyond citation analysis: a model for assessment of research impact. Journal of the Medical Library Association, 98: 17-23.
Ioannidis, JP and Khoury, MJ (2014). Assessing value in biomedical research: the PQRST of appraisal and reward. Journal of the American Medical Association. 312: 483-484.

The Erdős Number Project has a new URL.
The Web Impact Factor has been evaluated and found to be difficult to compare across countries (Noruzi, 2006). Noruzi, A (2006). The Web Impact Factor: a critical review. The Electronic Library. 24: 490-500.
The Haynes 4S model was updated to the 5S (Haynes, 2006) and now the 6S (DiCenso et al., 2009) model. The 5S model adds summaries, i.e., Studies-Syntheses-Synopses-Summaries-Systems. The 6S model adds synopses of summaries, i.e., Studies-Syntheses-Synopses-Summaries-Synopses of Summaries-Systems. I actually prefer the 4S model for its simple elegance, in that syntheses are exhaustive systematic reviews whereas synopses are more digestible summaries of them.
Haynes, R. (2006). Of studies, syntheses, synopses, summaries, and systems: the "5S" evolution of information services for evidence-based healthcare decisions. Evidence-Based Medicine, 11: 162-164.
DiCenso, A., Bayley, L., et al. (2009). ACP Journal Club. Editorial: Accessing preappraised evidence: fine-tuning the 5S model into a 6S model. Annals of Internal Medicine, 151(6): JC3-2, JC3-3.
A number of societal developments, not only in technology, continue to change the production of scientific information. Science has pretty much made the transition to fully electronic publishing, even if some people do still receive paper copies of journals in the mail. While there have been some challenges, the life-cycle of scientific information is still mostly intact. Science still mostly operates from a cycle of research being done, with results described and analyzed in manuscripts, which are then peer-reviewed and ultimately published. Perhaps the newest development is the growing advocacy for availability of underlying data, not only for transparency but also to allow re-analysis (Hudson, 2015). This ties into growing concerns about the reproducibility of scientific results.

While some disciplines have required publishing of data for years (e.g., genomics, physics), there are a number of concerns with publication of clinical data, especially that from randomized clinical trials (RCTs) (Ross, 2013). One obvious concern is privacy, as there is always the potential to re-identify individuals who took part in trials. Another concern is whether the data will be used responsibly. Mello et al. (2013) have laid out principles for responsible re-use of RCT data. Many journals have developed policies around availability of data from the papers they publish, although adherence to such policies was found to be incomplete in a majority of papers from high-IF journals (Alsheikh-Ali, 2011). This study also found that journals lacking data-availability policies had no publicly available data. Of course, there is value in making data available for re-analysis. Ebrahim et al. (2014) reviewed 37 re-analyses of RCT data, finding that 13 (35%) led to interpretations of results different from the original paper, with a majority identifying a larger group of patients who might benefit from treatment.

One of the most comprehensive data publishing policies comes from SpringerNature. They define four types of policies:
Type 1 - Data sharing and data citation is encouraged
Type 2 - Data sharing and evidence of data sharing encouraged  
Type 3 - Data sharing encouraged and statements of data availability required
Type 4 - Data sharing, evidence of data sharing and peer review of data required
An example of Type 4 comes from the journal Scientific Data.

The CrossRef standard now allows linking of clinical trials publications with their data (Shanahan, 2016).

Hudson, KL and Collins, FS (2015). Sharing and reporting the results of clinical trials. Journal of the American Medical Association. 313: 355-356.
Ross, JS, Mocanu, M, et al. (2013). Time to publication among completed clinical trials. JAMA Internal Medicine. 173: 825-828.
Mello, MM, Francer, JK, et al. (2013). Preparing for responsible sharing of clinical trial data. New England Journal of Medicine. 369: 1651-1658.
Alsheikh-Ali, AA, Qureshi, W, et al. (2011). Public availability of published research data in high-impact journals. PLoS ONE. 6(9): e24357.
Ebrahim, S, Sohani, ZN, et al. (2014). Reanalyses of randomized clinical trial data. Journal of the American Medical Association. 312: 1024-1032.
Shanahan, D (2016). Clinical trial data and articles linked for the first time. CrossRef Blog.

What is the right approach to sharing clinical research data?

While many people and organizations have long called for data from randomized clinical trials (RCTs) and other clinical research to be shared with other researchers for re-analysis and other re-use, the impetus for it accelerated about a year ago with two publications. One was a call by the International Committee of Medical Journal Editors (ICMJE) for de-identified data from RCTs to be shared as a condition of publication (Taichman et al., 2016). The other was the publication of an editorial in the New England Journal of Medicine wondering whether those who do secondary analysis of such data were "research parasites" (Longo and Drazen, 2016). The latter set off a flurry of debate across the spectrum, e.g., (Berger et al., 2016), from those who argued that primary researchers labored hard to devise experiments and collect their data, and thus have a claim to control over it, to those who argued that since most research is government-funded, taxpayers deserve access to that data. (Some of those in the latter group proudly adopted the "research parasite" tag.)

Many groups and initiatives have advocated for the potential value of wider re-use of data from clinical research. The cancer genomics community has long seen the value of a data commons to facilitate sharing among researchers (Grossman et al., 2016). Recent US federal research initiatives, such as the Precision Medicine Initiative (Collins et al., 2015) and the 21st Century Cures program (Kesselheim and Avorn, 2017) envision an important role for large repositories of data to accompany patients in cutting-edge research. There are a number of large-scale efforts in clinical data collection that are beginning to accumulate substantial amounts of data, such as the National Patient-Centered Clinical Research Network (PCORNet) and the Observational Health Data Sciences and Informatics (OHDSI) initiative.

As with many contentious debates, there are valid points on both sides. The case for requiring publication of data is strong. As most research is taxpayer-funded, it only seems fair that those who paid are entitled to all the data for which they paid. Likewise, all of the subjects were real people who potentially took risks to participate in the research, and their data should be used for discovery of knowledge to the fullest extent possible. And finally, new discoveries may emerge from re-analysis of data. This was actually the case that prompted the Longo "research parasites" editorial, which praised the "right way" to do secondary analysis, including working with the original researchers. The paper that the editorial described had discovered that lack of expression of a gene (CDX2) was associated with benefit from adjuvant chemotherapy (Dalerba et al., 2016).

Some researchers, however, are pushing back. They argue that those who carry out the work of designing, implementing, and evaluating experiments certainly have some exclusive rights to the data generated by their work. Some also question whether the cost is a good expenditure of limited research dollars, especially since the demand for such data sets may be modest and the benefit is not clear. One group of 282 researchers in 33 countries, the International Consortium of Investigators for Fairness in Trial Data Sharing, notes that there are risks, such as misleading or inaccurate analyses as well as efforts aimed at discrediting or undermining the original research (Anonymous, 2016). They also express concern about the costs, given that there are over 27,000 RCTs performed each year. As such, this group calls for an embargo on reuse of data for two years plus another half-year for each year of the length of the RCT. Even those who support data sharing point out the requirement for proper curation, wide availability to all researchers, and appropriate credit to and involvement of those who originally obtained the data (Merson et al., 2016).

There are a number of challenges to more widespread dissemination of RCT data for re-use. A number of pharmaceutical companies have begun making such data available over the last few years. Their experience has shown that the costs are not insignificant (estimated at about $30,000-$50,000 per RCT) and that a scientific review process is essential (Rockhold et al., 2016). Another analysis found that the time to re-analyze data sets can be long, and so far the number of publications has been small (Strom et al., 2016). An additional study found that identifiable data sets were explicitly visible for only 12% of all clinical research funded by the National Institutes of Health in 2011 (Read et al., 2015). This means that from 2011 alone, there are possibly more than 200,000 data sets that could be made publicly available, indicating that some type of prioritization might be required.

There are also a number of informatics-related issues to be addressed. These not only include adherence to standards and interoperability (Kush and Goldman, 2016), but also attention to workflows, integration with other data, such as that from electronic health records (EHRs), and consumer/patient engagement (Tenenbaum et al., 2016). Clearly the trialists who generate the data must be given incentives for their data to be re-used (Lo and DeMets, 2016).

There is definitely great potential for re-use of RCT and other clinical research data to advance research and ultimately improve health and clinical care for the population. However, it must be done in ways that represent an appropriate use of resources and result in data that truly advances research, clinical care, and ultimately individual health.
Taichman, DB, Backus, J, et al. (2016). Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. New England Journal of Medicine. 374: 384-386.
Longo, DL and Drazen, JM (2016). Data sharing. New England Journal of Medicine. 374: 276-277.
Berger, B, Gaasterland, T, et al. (2016). ISCB’s initial reaction to The New England Journal of Medicine Editorial on data sharing. PLoS Computational Biology. 12(3): e1004816.
Grossman, RL, Heath, AP, et al. (2016). Toward a shared vision for cancer genomic data. New England Journal of Medicine. 375: 1109-1112.
Collins, FS and Varmus, H (2015). A new initiative on precision medicine. New England Journal of Medicine. 372: 793-795.
Kesselheim, AS and Avorn, J (2017). New "21st Century Cures" legislation: speed and ease vs science. Journal of the American Medical Association. Epub ahead of print.
Dalerba, P, Sahoo, D, et al. (2016). CDX2 as a prognostic biomarker in stage II and stage III colon cancer. New England Journal of Medicine. 374: 211-222.
Anonymous (2016). Toward fairness in data sharing. New England Journal of Medicine. 375: 405-407.
Merson, L, Gaye, O, et al. (2016). Avoiding data dumpsters — toward equitable and useful data sharing. New England Journal of Medicine. 374: 2414-2415.
Rockhold, F, Nisen, P, et al. (2016). Data sharing at a crossroads. New England Journal of Medicine. 375: 1115-1117.
Strom, BL, Buyse, ME, et al. (2016). Data sharing — is the juice worth the squeeze? New England Journal of Medicine. 375: 1608-1609.
Read, KB, Sheehan, JR, et al. (2015). Sizing the problem of improving discovery and access to NIH-funded data: a preliminary study. PLoS ONE. 10(7): e0132735.
Kush, R and Goldman, M (2016). Fostering responsible data sharing through standards. New England Journal of Medicine. 374: 2163-2165.
Tenenbaum, JD, Avillach, P, et al. (2016). An informatics research agenda to support precision medicine: seven key areas. Journal of the American Medical Informatics Association. 23: 791-795.
Lo, B and DeMets, DL (2016). Incentives for clinical trialists to share data. New England Journal of Medicine. 375: 1112-1115.
Another concern about science is the lack of reproducibility of research findings. This problem was brought to light by two pharmaceutical companies noting the declining success of Phase II clinical trials for cancer chemotherapy (Arrowsmith, 2011). Begley and Ellis (2012) found that only 11% of the published preclinical studies they examined were reproducible, with similar results (about 25%) obtained by Prinz et al. (2011). A survey of over 1500 scientists found that more than half believed there was a "crisis" in reproducibility (Baker, 2016). These results and concerns led to the Reproducibility Initiative, an effort to validate 50 landmark cancer biology studies (Errington, 2014). Early results have started to achieve publication, although not enough studies have been completed to draw larger conclusions (Anonymous, 2017; Nosek and Errington, 2017).

The identification of this problem led to concerns in other fields, such as psychology (Yong, 2012). A subsequent attempt by the Open Science Collaboration to replicate 100 studies from psychology found that only 36% achieved statistical significance, with the mean effect size about one-half that of the original studies (Anonymous, 2015).

There are also informatics-related problems related to reproducibility of research. Many scientific researchers write code but are not always well versed in best practices of testing and error detection (Merali, 2010). Scientists have a history of relying on incorrect data or models (Sainani, 2011). In addition, they may not be good at selecting the best software packages for their work (Joppa, 2013). One example comes from errors discovered in a widely used algorithm for interpreting functional magnetic resonance imaging (fMRI), which called into question the validity of up to 40,000 studies that make use of fMRI (Eklund et al., 2016).

Informatics can also, however, play a role in addressing the reproducibility problem. For example, it may lead to approaches allowing more precise identification of research resources (organisms, cell lines, genes, reagents, etc.) in the biomedical literature (Vasilevsky, 2013). In the case of "omics" research, release of computer code can allow inspection for validity (Baggerly, 2011). In addition, at least one study has shown that merely reproducing the data analysis overturned the results, making the case for more widespread availability of primary data from studies (LeNoury, 2015).

NIH has launched a Reproducibility and Rigor initiative. New proposals for research must address four areas:
  • Scientific Premise of Proposed Research
  • Rigorous Experimental Design
  • Consideration of Sex and Other Relevant Biological Variables
  • Authentication of Key Biological and/or Chemical Resources
Freedman et al. (2015) provide a practical framework focused on economically efficient means to address the problem, with recommendations similar to those from NIH: training, use of validated reagents in experiments, and standardized workflows and identification of research materials and processes. Ioannidis (2017) notes that the problem is more serious for preclinical than for clinical research.
Arrowsmith, J (2011). Trial watch: Phase II failures: 2008-2010. Nature Reviews Drug Discovery. 10: 328-329.
Begley, CG and Ellis, LM (2012). Raise standards for preclinical cancer research. Nature. 483: 531-533.
Prinz, F, Schlange, T, et al. (2011). Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery. 10: 712.
Baker, M (2016). Is there a reproducibility crisis? Nature. 533: 452-454.
Errington, TM, Iorns, E, et al. (2014). Science forum: An open investigation of the reproducibility of cancer biology research. eLife. 3: e04333.
Anonymous (2017). Reproducibility in cancer biology: the challenges of replication. eLife. 2017(6): e23693.
Nosek, BA and Errington, TM (2017). Making sense of replications. eLife. 2017(6): e23383.
Yong, E (2012). Replication studies: Bad copy. Nature. 485: 298-300.
Anonymous (2015). Estimating the reproducibility of psychological science. Science. 349: aac4716.
Merali, Z (2010). Computational science: ...Error. Nature. 467: 775-777.
Sainani, K (2011). Error! – What Biomedical Computing Can Learn From Its Mistakes. Biomedical Computation Review, September 1, 2011.
Joppa, LN, McInerny, G, et al. (2013). Troubling trends in scientific software use. Science. 340: 814-815.
Eklund, A, Nichols, TD, et al. (2016). Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences. 113: 7900–7905.
Vasilevsky, NA, Brush, MH, et al. (2013). On the reproducibility of science: unique identification of research resources in the biomedical literature. PeerJ. 1: e148.
Baggerly, KA and Coombes, KR (2011). What information should be required to support clinical "omics" publications? Clinical Chemistry. 57: 688-690.
LeNoury, J, Nardo, JM, et al. (2015). Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. British Medical Journal. 351: h4320.
Freedman, LP, Cockburn, TM, et al. (2015). The economics of reproducibility in preclinical research. PLoS Biology. 13(6): e1002165.
Ioannidis, JPA (2017). Acknowledging and overcoming nonreproducibility in basic and preclinical research. Journal of the American Medical Association: Epub ahead of print.
There continue to be other concerns about the scientific process. Chalmers and others continue to criticize biomedical research generally, noting the "waste" from research questions that are not relevant to patients or physicians, inappropriate design and methods, reports that are never published, and published reports that are biased or not usable (Chalmers, 2009; MacLeod, 2014). Another concern is the "hypercompetitive" grant funding environment that has emerged in the US. This results in younger scientists having difficulty establishing their careers, while older scientists may be less willing to pursue high-risk research that may be more difficult to get funded (Alberts, 2014). Both Rzhetsky et al. (2015) and Smaldino and McElreath (2016) have demonstrated that the current environment of science favors scientists being conservative and incremental in the areas they choose for their research. In the meantime, a growing amount of "stealth" research is bypassing the scientific literature (and its peer-review process) entirely (Ioannidis, 2015).
Chalmers, I and Glasziou, P (2009). Avoidable waste in the production and reporting of research evidence. Lancet. 374: 86-89.
Macleod, MR, Michie, S, et al. (2014). Biomedical research: increasing value, reducing waste. Lancet. 383: 101-104.
Alberts, B, Kirschner, MW, et al. (2014). Rescuing US biomedical research from its systemic flaws. Proceedings of the National Academy of Sciences. 111: 5773-5777.
Rzhetsky, A, Foster, JG, et al. (2015). Choosing experiments to accelerate collective discovery. Proceedings of the National Academy of Sciences. 112: 14569-14574.
Smaldino, PE and McElreath, R (2016). The natural selection of bad science. Royal Society Open Science. 3(9): 160384.
Ioannidis, JP (2015). Stealth research: is biomedical innovation happening outside the peer-reviewed literature? Journal of the American Medical Association. 313: 663-664.
There continue to be criticisms of the peer-review process. The "winner's curse" pointed out by Young et al. (2008) refers to the distortion that may occur when more positive results are published. Schooler (2011) attributes the "decline effect" of scientific results to the most dramatically positive findings being published first and foremost. A variety of solutions have been proposed for this problem. In the field of political science, Nyhan (2014) has advocated that "rejected research" meeting a basic standard of conduct should be published in some manner. Some have proposed that biomedicine take the approach of physics and mathematics in having preprint servers that allow scientists to post publications that later undergo formal peer review (Berg, Bhalla, et al., 2016).

This is due in part to problems with the peer review process; earlier research described in the book notes its inconsistency and biases. More recently, Smith has continued to write critically about peer review, calling it "an empty gun" (Smith, 2010). One study gave peer reviewers papers with identical methods but positive vs. neutral results and found a higher rate of recommending publication for positive (97.3%) than for neutral (80.0%) papers (Emerson, 2010). Even studies with negative results do not always provide sufficient information to assess their validity (Hebert, 2002). Siler et al. (2015) assessed over 1000 articles submitted to three elite medical journals, of which about 800 were eventually published. While there was an association between peer review scores and subsequent citations, the 14 most-cited articles had been rejected by these elite journals. Smith (2015) recently proposed a variety of steps for radically altering the publishing of scientific results, essentially requiring publication of protocols and results with data, after which studies meeting a certain screen would be published with a unique identifier. Reviewers would then assess the studies, and authors could use that feedback to improve their work in an auditable way. This would make it easy to publish but also allow a great deal more transparency about the results and their meaning.

Young, NS, Ioannidis, JP, et al. (2008). Why current publication practices may distort science. PLoS Medicine. 5(10): e201.
Schooler, J (2011). Unpublished results hide the decline effect. Nature. 470: 437.
Nyhan, B (2014). Increasing the credibility of political science research: a proposal for journal reforms. Hanover, NH, Dartmouth College.
Nyhan, B (2014). To Get More Out of Science, Show the Rejected Research. New York Times. September 18, 2014.
Berg, JM, Bhalla, N, et al. (2016). Preprints for the life sciences. Science. 352: 899-901.
Smith, R (2010). Classical peer review: an empty gun. Breast Cancer Research. 12(Suppl 4): S13.
Emerson, GB, Warme, WJ, et al. (2010). Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial. Archives of Internal Medicine. 170: 1934-1939.
Hebert, RS, Wright, SM, et al. (2002). Prominent medical journals often provide insufficient information to assess the validity of studies with negative results. Journal of Negative Results in Biomedicine. 30(1): 1.
Siler, K, Lee, K, et al. (2015). Measuring the effectiveness of scientific gatekeeping. Proceedings of the National Academy of Sciences. 112: 360-365.
Smith, R (2015). A better way to publish science. BMJ Opinions.
The book notes that much less research has been done concerning peer review of grant proposals. This is still the case, but there has been some recent research. Li and Agha (2015) evaluated more than 130,000 research project (R01) grants funded by the US NIH from 1980 to 2008 and found that better peer-review scores were consistently associated with better research outcomes. A one-standard-deviation worse peer-review score among awarded grants was associated with 15% fewer citations, 7% fewer publications, 19% fewer high-impact publications, and 14% fewer follow-on patents. Re-analysis of this data found, however, that for scientists with highly scoring proposals, i.e., those likely to receive funding, the correlation disappeared (Fang et al., 2016).
Li, D and Agha, L (2015). Big names or big ideas: Do peer-review panels select the best science proposals? Science. 348: 434-438.
Fang, FC, Bowen, A, et al. (2016). NIH peer review percentile scores are poorly predictive of grant productivity. eLife. 2016(5): e13323.

Another concern about peer review comes from the proliferation of open-access journals (see Chapter 6). These journals, which typically have an "author pays" model, have a great incentive to accept articles. While some of these journals have achieved high prestige (e.g., BioMed Central and PLoS), many of them serve as money-making vehicles for those who publish them (Haug, 2013). Bohannon (2013) wrote a fabricated paper with obvious research flaws. Of the 304 open-access journals to which the paper was submitted, over half accepted it without noting the flaws. Rice (2013) criticized the study for assessing only open-access journals, asserting that the real problem is the inadequate peer review process and not the open-access journals, which merely take advantage of it. Some of these journals that exist solely for profit have become known as "predatory journals" (Moher and Moher, 2016).
Haug, C (2013). The downside of open-access publishing. New England Journal of Medicine. 368: 791-793.
Bohannon, J (2013). Who's afraid of peer review? Science. 342: 60-65.
Rice, C (2013). Open access publishing hoax: what Science magazine got wrong. The Guardian. October 4, 2013.
Moher, D and Moher, E (2016). Stop predatory publishers now: act collaboratively. Annals of Internal Medicine. 164: 616-617.
There is some new research concerning peer reviewers. Callaham and Tercier (2007) looked for characteristics associated with the quality of peer reviews. They found that academic rank, status as principal investigator of a grant, and formal training in critical appraisal or statistics were not associated with the quality of peer reviews, although working in a university-affiliated hospital and being under ten years removed from training were. Patel (2014) reviewed the models of peer review and called for more specialization and training in the process.
Callaham, ML and Tercier, J (2007). The relationship of previous training and experience of journal peer reviewers to subsequent review quality. PLoS Medicine. 4(1): e40.
Patel, J (2014). Why training and specialization is needed for peer review: a case study of peer review for randomized controlled trials. BMC Medicine. 12: 128.
Where do scientists choose to publish? One recent study of submission patterns among different journals found a pattern of researchers aiming for high-impact general journals initially and, if articles were not accepted, aiming for more focused journals in their own fields (Calcagno, 2012). As one who academically straddles the computer science and biomedical communities, it is interesting to see diverging approaches to the publication of science. Biomedicine remains firmly wedded to the notion of journal articles being the primary vehicle for dissemination of science. In computer science, on the other hand, the critical metric of success is having a paper published in highly selective conferences (i.e., those with low acceptance rates, on the order of 10-20%). This is clearly exemplified by data showing that papers from those conferences have higher rates of subsequent citation, comparable to major journals in the field (Chen and Konstan, 2010). Some are concerned that the tight timeline of publishing conference papers does not provide for the back and forth between authors and peer reviewers seen in journals. Terry (2014) has called for a policy of "publish now, judge later." He notes that the large number of papers that must be rejected (often 80%) means that peer reviewers will look for obvious flaws and not delve into more nuanced ones. Some of these flaws could potentially be corrected, but the short time cycle to publication also prevents papers from being improved along the process.
Calcagno, V, Demoinet, E, et al. (2012). Flows of research manuscripts among scientific journals reveal hidden submission patterns. Science. 338: 1065-1069.
Chen, J. and Konstan, J. (2010). Conference paper selectivity and impact. Communications of the ACM, 53(6): 79-83.
Terry, D (2014). Publish now, judge later. Communications of the ACM. 57(1): 44-46.

Although the process of peer review and scientific publishing still seem to be unparalleled in disseminating the results of science, there continue to be problems in a number of areas, including reporting limitations, publication bias, conflict of interest, and fraud.

Reporting limitations mainly stem from incomplete publishing of data and results. Obtaining information from FDA reviews of drugs has improved, but publication of results from the clinical trials in those reviews is still incomplete. Turner et al. (2012) extended a previous analysis of studies submitted to the US Food and Drug Administration (FDA) from antidepressant drugs to antipsychotic drugs and found a similar (though lower in magnitude) bias toward positive clinical trial results being more likely to be published in medical journals. In this same area, Roest et al. (2014) discovered reporting biases in trials of FDA-approved second-generation antidepressants for anxiety disorders. Although the biases did not significantly inflate estimates of drug efficacy, they led to significant increases in the number of positive studies in the literature. In addition, de Vries et al. (2016) found that journal publications of clinical trials were much less complete in reporting adverse effects, especially serious ones, than FDA reviews of the drugs.

Schwartz and Woloshin (2009) have noted that data from FDA reports do not always make it into the drug label information for clinicians, such as information on harms, limited efficacy, and uncertainty. Spielmans and Parry (2010) reported on internal industry documents that disclose strategies for suppression of such information. A related problem is the reporting of statistically nonsignificant results for primary outcomes in RCTs, with one analysis of 72 such reports finding "spin" in the titles (18%), abstracts (58%), and main text (50%) (Boutron, 2010). Also, important data from study reports and registries sometimes do not make it into the journal publication (Wieseler, 2012). Higginson and Munafò (2016) developed a model demonstrating that our scientific system provides incentives for novel results from underpowered studies. A consequence of these reporting problems is that "exposure" of clinicians to information provided directly by pharmaceutical companies sometimes leads to higher prescribing frequency, higher costs, and/or lower prescribing quality (Spurling, 2010).

A well-documented instance of these effects came from clinical trials of the drug oseltamivir (Tamiflu), which is used early in the course of influenza to reduce the severity and duration of symptoms. In performing a systematic review of the efficacy of this drug (Jefferson, 2009), the authors found that 60% of the data from clinical trials performed by the manufacturer (Roche) had never been released (Doshi, 2012). This led to calls by the BMJ to make all clinical trials data available (anonymized) in addition to publishing papers (Godlee, 2012), building on previous efforts calling for such data to be available to peer reviewers (Godlee, 2011). The BMJ also developed a Web site that ultimately led Roche to release all of the data on oseltamivir. A subsequent meta-analysis found that oseltamivir did hasten time to clinical symptom alleviation and lowered the risk of lower respiratory tract complications and hospital admission, but also led to increased nausea and vomiting (Abbasi, 2014). This led to the conclusion that the drug's harms do not outweigh its benefits and that a total of probably $20 billion was wasted by various countries in stockpiling the medication. This did lead drug companies to release data from other clinical trials of other drugs (Strom, 2014).

Although scientific reporting of statistics has improved over the years, especially in top-tier journals, there is still a tendency to focus on p values rather than confidence intervals, Bayes factors, or effect sizes (Chavalarias, 2016). Bayes factors are important because a study testing a hypothesis that is a priori unlikely (or highly likely) will only have its probability of being correct increased a small (or large) amount when results with a p value indicating a low likelihood of chance are obtained (Nuzzo, 2014). This has led the American Statistical Association to issue guidance on the proper use of p values in scientific publications (Wasserstein and Lazar, 2016).
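The Bayesian point can be illustrated with a small calculation. This is a minimal sketch, not taken from the studies cited: the Bayes factor of 3 below is an assumed, illustrative value for the evidential strength of a result near p = 0.05.

```python
def posterior_prob(prior_prob, bayes_factor):
    """Update a prior probability by a Bayes factor using the odds form of Bayes' rule."""
    # Convert the prior probability to odds, multiply by the Bayes factor,
    # then convert the posterior odds back to a probability.
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)

# Illustrative Bayes factor (assumed value, roughly the evidential
# strength often attributed to p ~ 0.05).
BF = 3.0
for prior in (0.1, 0.5, 0.9):
    print(f"prior {prior:.0%} -> posterior {posterior_prob(prior, BF):.0%}")
# An a priori unlikely hypothesis (10%) only rises to about 25%,
# while an a priori likely one (90%) rises to about 96%.
```

The same modest evidence thus leaves a long-shot hypothesis still probably false, which is why a "significant" p value alone says little about whether a finding is true.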
Turner, EH, Knoepflmacher, D, et al. (2012). Publication bias in antipsychotic trials: an analysis of efficacy comparing the published literature to the US Food and Drug Administration Database. PLoS Medicine. 9(3): e1001189.
Roest, AM, deJonge, P, et al. (2014). Reporting bias in clinical trials investigating the efficacy of second-generation antidepressants in the treatment of anxiety disorders: a report of 2 meta-analyses. JAMA Psychiatry. 72: 500-510.
de Vries, YA, Roest, AM, et al. (2016). Bias in the reporting of harms in clinical trials of second-generation antidepressants for depression and anxiety: A meta-analysis. European Neuropsychopharmacology. 26: 1752-1759.

Schwartz, L. and Woloshin, S. (2009). Lost in transmission--FDA drug information that never reaches clinicians. New England Journal of Medicine, 361: 1717-1720.
Spielmans, G. and Parry, P. (2010). From evidence-based medicine to marketing-based medicine: Evidence from internal industry documents. Bioethical Inquiry, 7(1).
Boutron, I, Dutton, S, et al. (2010). Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. Journal of the American Medical Association. 303: 2058-2064.
Wieseler, B., Kerekes, M., et al. (2012). Impact of document type on reporting quality of clinical drug trials: a comparison of registry reports, clinical study reports, and journal publications. British Medical Journal, 344: d8141.
Higginson, AD and Munafò, MR (2016). Current incentives for scientists lead to underpowered studies with erroneous conclusions. PLoS Biology. 14(11): e2000995.
Spurling, GK, Mansfield, PR, et al. (2010). Information from pharmaceutical companies and the quality, quantity, and cost of physicians' prescribing: a systematic review. PLoS Medicine. 7(10): e1000352.
Jefferson, T, Jones, M, et al. (2009). Neuraminidase inhibitors for preventing and treating influenza in healthy adults: systematic review and meta-analysis. British Medical Journal. 339: b5106.
Doshi, P, Jefferson, T, et al. (2012). The imperative to share clinical study reports: recommendations from the Tamiflu experience. PLoS Medicine. 9(4): e1001201.
Godlee, F (2012). Clinical trial data for all drugs in current use. British Medical Journal. 345: e7304.
Godlee, F (2011). Goodbye PubMed, hello raw data. British Medical Journal. 342: d212.
Abbasi, K (2014). The missing data that cost $20bn. British Medical Journal. 348: g2695.
Strom, BL, Buyse, M, et al. (2014). Data sharing, year 1--access to data from industry-sponsored clinical trials. New England Journal of Medicine. 371: 2052-2054.
Chavalarias, D, Wallach, JD, et al. (2016). Evolution of reporting p values in the biomedical literature, 1990-2015. Journal of the American Medical Association. 315: 1141-1148.
Wasserstein, RL and Lazar, NA (2016). The ASA's statement on p-values: context, process, and purpose. The American Statistician. 70: 129-133.
Nuzzo, R (2014). Scientific method: statistical errors. Nature. 506: 150-152.
Publication bias also continues to be a problem. An updated systematic review of publication bias found that positive study results continue to be more likely to be published (Dwan, 2013). Analyses in other fields beyond biomedicine have also found evidence of publication bias (Fanelli, 2012; Franco, 2014).

One method for overcoming publication bias has been requirements for registration and timely publication of RCTs. While most clinical trials appear to be adhering to the requirement for registration, some lament that the process has been falling short (Lehman, 2012; Dickersin, 2012). Another analysis by Califf et al. (2012) found that reporting was incomplete and not updated in a timely manner.

One requirement of ClinicalTrials.gov, stipulated in the FDAAA of 2007, was that RCT results be reported within 12 months of completion (Hudson, 2015). A variety of research shows researchers falling short of this, which impairs timely dissemination of results. Ross et al. (2012) found that for publicly funded research in the US, more than half of all trials were not published by 30 months after completion. A similar analysis of trials published in 2009 found an average of two years to publication (Ross et al., 2013). Prayle et al. (2012) also reported that only 22% of trials reported results within 12 months. Another analysis by Jones et al. (2013) found that 29% of 171 RCTs involving nearly 300,000 patients that had been registered in ClinicalTrials.gov by 2009 still had not had their results published by 2013. RCTs appearing in the "big five" journals do have a high rate of adherence to registration (Huser, 2013).

In recent years, the mission of ClinicalTrials.gov has expanded to include RCT results, and results reporting may soon become a requirement (Zarin, 2015). Riveros et al. (2013) found that reporting was likely to be more complete in ClinicalTrials.gov than in the subsequent peer-reviewed publications that most clinicians are more likely to read. Hartung et al. (2014) similarly analyzed data in ClinicalTrials.gov and subsequent peer-reviewed publications, with the latter likely to contain less information about secondary outcomes and lower rates of reported serious adverse events and deaths.

Multiple studies have found that despite the requirement for reporting clinical trials results in ClinicalTrials.gov, the actual amount of reporting is still low (Anderson et al., 2015; Chen, 2016). Anderson et al. (2015) found that reporting of results from trials highly likely to be clinically applicable was still low, although higher for industry-sponsored trials than for NIH or other US government-funded trials.

The systematic review by Dwan et al. (2013) described above also found that in general, studies published tend to have a bias toward reporting positive outcomes.

These problems have led one major funder, Wellcome Trust in the United Kingdom, to plan to self-publish all of its funded research (Bohannon, 2016).
Dwan, K, Gamble, C, et al. (2013). Systematic review of the empirical evidence of study publication bias and outcome reporting bias - an updated review. PLoS ONE. 8(7): e66844.
Fanelli, D (2012). Negative results are disappearing from most disciplines and countries. Scientometrics. 90: 891-904.
Franco, A, Malhotra, N, et al. (2014). Publication bias in the social sciences: unlocking the file drawer. Science. 345: 1502-1505.
Lehman, R. and Loder, E. (2012). Missing clinical trial data. British Medical Journal, 344: d8158.
Dickersin, K and Rennie, D (2012). The evolution of trial registries and their use to assess the clinical trial enterprise. Journal of the American Medical Association. 307: 1861-1864.
Califf, RM, Zarin, DA, et al. (2012). Characteristics of clinical trials registered in ClinicalTrials.gov, 2007-2010. Journal of the American Medical Association. 307: 1838-1847.
Hudson, KL and Collins, FS (2015). Sharing and reporting the results of clinical trials. Journal of the American Medical Association. 313: 355-356.
Ross, J, Tse, T, et al. (2012). Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis. British Medical Journal. 344: d7292.
Prayle, A, Hurley, M, et al. (2012). Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study. British Medical Journal. 344: d7373.
Jones, CW, Handler, L, et al. (2013). Non-publication of large randomized clinical trials: cross sectional analysis. British Medical Journal. 347: f6104.
Huser, V and Cimino, JJ (2013). Evaluating adherence to the International Committee of Medical Journal Editors' policy of mandatory, timely clinical trial registration. Journal of the American Medical Informatics Association. 20: e169-e174.
Zarin, DA, Tse, T, et al. (2015). The proposed rule for U.S. clinical trial registration and results submission. New England Journal of Medicine. 372: 174-180.
Riveros, C, Dechartres, A, et al. (2013). Timing and completeness of trial results posted at ClinicalTrials.gov and published in journals. PLoS Medicine. 10(12): e1001566.
Hartung, DM, Zarin, DA, et al. (2014). Reporting discrepancies between the ClinicalTrials.gov results database and peer-reviewed publications. Annals of Internal Medicine. 160: 477-483.
Anderson, ML, Chiswell, K, et al. (2015). Compliance with results reporting at ClinicalTrials.gov. New England Journal of Medicine. 372: 1031-1039.
Chen, R, Desai, NR, et al. (2016). Publication and reporting of clinical trial results: cross sectional analysis across academic medical centers. British Medical Journal. 352: i637.
Bohannon, J (2016). U.K. research charity will self-publish results from its grantees. Science Insider.
Another ongoing problem is reporting of conflict of interest. The International Committee of Medical Journal Editors (ICMJE) has developed a uniform format for disclosure of competing interests (Drazen, 2009). The rationale is based on research showing that industry-sponsored drug trials are more likely to have positive outcomes despite no differences in study methods, raising concern about selective methods, analysis, or reporting (Lundh, 2012). This has led to debate, pro and con, over whether medical journals should stop publishing research funded by the pharmaceutical industry (Smith, 2014). One group of contrarians, however, notes that there is no evidence that conflict of interest rules have prevented biased research from being published, and argues that the effort around them is a waste of time and resources (Barton, 2014).

The International Society for Medical Publication Professionals has published a statement on Good Publication Practice for Communicating Company-Sponsored Medical Research (Battisti et al., 2015).

Tierney et al. (2016) note instances in which the interests of commercial sponsors of research and academic scientists could be aligned, in areas such as medication adherence and postmarketing surveillance.
Drazen, J., VanDerWeyden, M., et al. (2009). Uniform format for disclosure of competing interests in ICMJE journals. New England Journal of Medicine, 361: 1896-1897.
Lundh, A, Sismondo, S, et al. (2012). Industry sponsorship and research outcome. Cochrane Database of Systematic Reviews. 12: MR000033.
Smith, R, Gøtzsche, PC, et al. (2014). Should journals stop publishing research funded by the drug industry? British Medical Journal. 348: g171.
Barton, D, Stossel, T, et al. (2014). After 20 years, industry critics bury skeptics, despite empirical vacuum. International Journal of Clinical Practice. 68: 666-673.
Battisti, WP, Wager, E, et al. (2015). Good publication practice for communicating company-sponsored medical research: GPP3. Annals of Internal Medicine. 163: 461-464.
Tierney, WM, Meslin, EM, et al. (2016). Industry support of medical research: important opportunity or treacherous pitfall? Journal of General Internal Medicine. 23: 544-552.
Fraud continues to be a problem as well (e.g., Harris, 2009; Anesthesiology News, 2009), with a recent analysis finding an "epidemic," including a tenfold increase in retractions between 1975 and 2012 (Fang, 2012). This analysis also found that nearly 80% of all article retractions were due to fraud, with the remainder due to researcher error. Another analysis of retracted articles found that most did not contain flawed data and were not from authors accused of research misconduct (Grieneisen, 2012). Others have also noted that handling of research misconduct remains an ongoing challenge (Tavare, 2011).

There continue to be important reports of fraud in the scientific literature. One instance uncovered a massive "peer review and citation ring" led by a Taiwanese scientist in the Journal of Vibration and Control, leading to retraction of 60 articles (Anonymous, 2014). Another high-profile event leading to article retractions centered on a cancer researcher from Duke University, Anil Potti, whose work using gene-expression arrays to predict drug response was found to be fraudulent (Reich, 2011). An additional concern is that research misconduct identified by the US FDA is rarely reflected in the peer-reviewed literature, even when evidence of data fabrication or other forms of research misconduct has been uncovered (Seife, 2015). A recent investigation by the Chinese government raised concerns that as many as 80% of all clinical drug trials performed in China have fraudulent results (Woodhead, 2016).

Fraud has also been found to occur through manipulation of the peer review process, with paper authors suggesting peer reviewers who carried out bogus positive reviews (Haug, 2015). This led the publishers Springer and BioMed Central to retract dozens of papers. Dansinger (2017) reports the story of a paper of his that was republished (with title and author names changed but otherwise unaltered) by another scientist to whom the paper had been sent for peer review. The co-authors of the plagiarized version subsequently retracted it (Finelli, 2016).

The continued problem of fraud has led to development of a Web site and blog, Retraction Watch, which tracks retractions and other instances of fraud in the scientific literature. The site has a "Leaderboard" of scientists with the most retractions and a top-ten list of the most highly cited retracted papers of all time, one of which has been cited over 1000 times. There continue to be stories such as that of Scott Reuben, who continues to be cited after exposure of his discredited research, with not all citations mentioning the fraudulent nature of his work (Bornemann-Cimenti, 2015).

Another type of proliferation is that of so-called "seeding trials," where the goal of the trial itself is to increase use of a drug or other treatment (Alexander, 2011). One well-documented example came with the drug Neurontin (gabapentin), with analysis of documents from the manufacturer showing more concern for the prescribers of the drug than for the patients taking it (Krumholz, 2011).

An additional problem is the proliferation of journals that are not motivated by the best reporting of science. It was revealed in 2009 that the publisher Elsevier had been commissioned by unnamed companies to create journals that served as vehicles for reprinting previously published, usually positive, papers under the imprimatur of an impartial journal (Grant, 2009). One of those companies turned out to be Merck (Grant, 2009). There has also been growth of journals in the open-access era that appear to serve as vehicles for profit more than for disseminating science (Haug, 2013).
Harris, G (2009). Drug Maker Told Studies Would Aid It, Papers Say. New York, NY. New York Times. March 20, 2009.
Fang, FC, Steen, RG, et al. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences. 109: 17028-17033.
Grieneisen, ML and Zhang, M (2012). A comprehensive survey of retracted articles from the scholarly literature. PLoS ONE. 7(10): e44118.
Tavare, A. (2011). Managing research misconduct: is anyone getting it right? British Medical Journal, 343: d8212.
Anonymous (2014). Retraction Notice. Journal of Vibration and Control. 20: 1601-1604.
Reich, ES (2011). Cancer trial errors revealed. Nature. 469: 139-140.
Seife, C (2015). Research misconduct identified by the US Food and Drug Administration: out of sight, out of mind, out of the peer-reviewed literature. JAMA Internal Medicine. 175: 567-577.
Woodhead, M (2016). 80% of China’s clinical trial data are fraudulent, investigation finds. British Medical Journal. 355: i5396.
Haug, CJ (2015). Peer-review fraud — hacking the scientific publication process. New England Journal of Medicine. 373: 2393-2395.
Dansinger, M (2017). Dear plagiarist: a letter to a peer reviewer who stole and published our manuscript as his own. Annals of Internal Medicine. 166: 143.
Finelli, C, Crispino, P, et al. (2016). Retraction: The improvement of large High-Density Lipoprotein (HDL) particle levels, and presumably HDL metabolism, depend on effects of low-carbohydrate diet and weight loss. EXCLI Journal. 15: 570.
Bornemann-Cimenti, H, Szilagyi, IS, et al. (2015). Perpetuation of retracted publications using the example of the Scott S. Reuben case: incidences, reasons and possible improvements. Science and Engineering Ethics. 22: 1063-1072.
Alexander, GC (2011). Seeding trials and the subordination of science. Archives of Internal Medicine. 171: 1107-1108.
Krumholz, SD, Egilman, DS, et al. (2011). Study of neurontin: titrate to effect, profile of safety (STEPS) trial: a narrative account of a gabapentin seeding trial. Archives of Internal Medicine. 171: 1100-1107.
Grant, B (2009). Merck published fake journal. The Scientist, April 30, 2009.
Grant, B (2009). Elsevier published 6 fake journals. The Scientist, May 7, 2009.
Haug, C (2013). The downside of open-access publishing. New England Journal of Medicine. 368: 791-793.
Another problem in the biomedical literature is errors in gene names. Many of these emanate from use of the Microsoft Excel spreadsheet program, which may automatically convert gene names into what Excel interprets as dates (e.g., Apr-1), floating-point numbers (e.g., 2310009E13), or other types of data (Ziemann, 2016).
Ziemann, M, Eren, Y, et al. (2016). Gene name errors are widespread in the scientific literature. Genome Biology. 17: 177.
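These Excel conversions leave characteristic footprints that can be screened for. The following is a minimal, hypothetical Python sketch (not taken from Ziemann et al.) that flags values in a gene-symbol column that look like Excel's date or scientific-notation conversions:

```python
import re

# Values that look like Excel's automatic conversions of gene symbols:
# date-like forms (e.g., "Apr-1" or "1-Apr") and scientific notation
# (e.g., "2.31E+19", what Excel makes of an identifier like 2310009E13).
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}})$", re.IGNORECASE)
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)

def flag_mangled_symbols(values):
    """Return the values that look like Excel-converted gene names."""
    return [v for v in values if DATE_LIKE.match(v) or SCI_NOTATION.match(v)]

print(flag_mangled_symbols(["TP53", "Apr-1", "2-Sep", "2.31E+19", "BRCA1"]))
# → ['Apr-1', '2-Sep', '2.31E+19']
```

Such a check can only detect the damage after the fact; it cannot recover the original gene symbol, which is why the authors recommend avoiding the automatic conversion in the first place.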
The problem of incorrect references in articles continues, with a newer twist being the decay of references to Web sites, which appears to be worse in smaller journals than in better-known ones (Habibzadeh, 2013). Another analysis found "reference rot," in which URLs become invalid and/or the content of the cited pages changes, for about one-fifth of citations to URLs in the scientific literature (Klein, 2014). This has led to the development of the site Hiberlink, which aims to preserve the content of Web pages cited in the scientific literature (Perkel, 2015).
Habibzadeh, P (2013). Decay of references to Web sites in articles published in general medical journals: mainstream vs small journals. Applied Clinical Informatics. 4: 455-464.
Klein, M, VandeSompel, H, et al. (2014). Scholarly context not found: one in five articles suffers from reference rot. PLoS ONE. 9(12): e115253.
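The two failure modes that make up reference rot, a dead URL versus changed content at a live URL, can be illustrated with a small sketch. The function and its inputs below are hypothetical, assuming a stored checksum of the page as it existed when cited:

```python
def classify_reference(status_code, checksum_at_citation, checksum_now):
    """Classify the state of a cited URL.

    status_code: HTTP status when the URL is fetched today (None = unreachable).
    checksum_at_citation / checksum_now: hashes of the page content
    when originally cited versus today.
    """
    if status_code is None or status_code >= 400:
        return "link rot"       # the URL no longer resolves
    if checksum_at_citation != checksum_now:
        return "content drift"  # the page exists but its content has changed
    return "intact"

print(classify_reference(404, "abc123", None))      # → link rot
print(classify_reference(200, "abc123", "ffe201"))  # → content drift
print(classify_reference(200, "abc123", "abc123"))  # → intact
```

The second case, content drift, is the subtler problem: the citation still "works," but no longer points to what the author actually read, which is the gap archiving services such as Hiberlink aim to close.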
In the area of systematic reviews, Murad and Montori (2013) call for the focus of evidence synthesis to shift from single studies to the body of evidence, with new studies always presented in the context of the existing body of evidence. While researchers have always cited past work in their papers, there has never been endorsement of any systematic approach for doing so. Murad, MH and Montori, VM (2013). Synthesizing evidence: shifting the focus from individual studies to the body of evidence. Journal of the American Medical Association. 309: 2217-2218.
Another problem with systematic reviews is their (probably excessive) proliferation. An analysis by Ioannidis (2016) found substantial numbers of systematic reviews and meta-analyses on similar topics that yielded disparate results. He also found that the number of systematic reviews and meta-analyses published was growing much faster than that of other types of papers. His analysis concluded that most topics addressed by meta-analyses of randomized trials had overlapping, redundant meta-analyses, with same-topic meta-analyses sometimes exceeding 20. He also noted that some fields produce massive numbers of meta-analyses; e.g., 185 meta-analyses of antidepressants for depression were published between 2007 and 2014. These meta-analyses were often produced either by industry employees or by authors with industry ties, with results aligned with sponsor interests. In assessing Ioannidis' results, Page and Moher (2016) called for online systematic reviews, updated in real time.
Ioannidis, JPA (2016). The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Quarterly. 94: 485-514.
Page, MJ and Moher, D (2016). Mass production of systematic reviews and meta-analyses: an exercise in mega-silliness? Milbank Quarterly. 94: 515-519.
A continued problem with secondary publication sources is their ability to keep up with the advance of the primary literature. Banzi et al. (2011) assessed various online point-of-care summary sources, finding that some were faster than others in incorporating new evidence. Ketchum et al. (2011) found similar variation in the amount and currency of evidence for four common clinical problems, with very little overlap in the citations provided. Banzi, R, Cinquini, M, et al. (2011). Speed of updating online evidence based point of care summaries: prospective cohort analysis. British Medical Journal. 343: d5856.
Ketchum, AM, Saleh, AA, et al. (2011). Type of evidence behind point-of-care clinical information products: a bibliometric analysis. Journal of Medical Internet Research. 13(1): e21.
Another problem with secondary information sources is error. Two recent studies have found numerous errors in drug compendia, which physicians and other healthcare professionals use constantly for information on the drugs they prescribe and administer. One study found that across 270 drug summaries reviewed within five well-known drug compendia, the median of the total number of errors identified per source was 782, with the greatest number of errors occurring in the categories of Dosage and Administration, Patient Education, and Warnings and Precautions (Randhawa, 2016). The majority of errors were classified as incomplete, followed by inaccurate and omitted. Another study assessed one company's prescription opioid products across seven different compendia and discovered 859 errors, with the greatest percentage in Safety and Patient Education categories. The authors reported errors to publishers of the compendia, but the complete or partial resolution of errors was only 34%; i.e., leaving about two-thirds of the identified errors remaining (Talwar, 2016).
Randhawa, AS, Babalola, O, et al. (2016). A collaborative assessment among 11 pharmaceutical companies of misinformation in commonly used online drug information compendia. Annals of Pharmacotherapy. 50: 352-359.
Talwar, SR, Randhawa, AS, et al. (2016). Caveat emptor: Erroneous safety information about opioids in online drug-information compendia. Journal of Opioid Management. 12: 281-288.
Conflict of interest continues to be a problem, not only in the primary literature but also in the secondary literature, especially clinical practice guidelines. Several new studies build upon the data reported in the book. Neuman et al. (2011) identified 14 guidelines on screening and/or treatment for hypertension and/or diabetes mellitus. Of 288 panel members on these guidelines, 138 (48%) were found to have declared a conflict of interest, while 8 (11%) more declared no conflicts but actually had them. Likewise, Mendelson et al. (2011) looked at 17 guidelines on cardiovascular conditions and identified 279 of 498 (56%) panelists as having a research grant, being on a speaker's bureau, and/or receiving honoraria, with a variation of 13-87% per guideline. In addition, Bennett et al. (2012) assessed 11 guidelines on oral medication treatment for Type 2 diabetes mellitus and found not only substantial variation in consistency with known best evidence, but also that about a third rated low in editorial independence, reflecting either a lack of independence from funding bodies or the presence of a conflict of interest. A systematic analysis of guidelines in interventional medicine subspecialties also found that most failed to grade evidence, made use of lower-quality evidence when present, and failed to adequately disclose conflicts of interest (Feuerstein, 2014). These problems with guidelines led the Institute of Medicine (IOM) to publish a report calling for "trustworthy" clinical guidelines (Anonymous, 2011; Laine, 2011). The challenges with guidelines have also motivated the development of the Guidelines International Network (G-I-N), which has created standards for many aspects of guideline development, including management of conflicts of interest (Qaseem, 2012).
Neuman, J, Korenstein, D, et al. (2011). Prevalence of financial conflicts of interest among panel members producing clinical practice guidelines in Canada and United States: cross sectional study. British Medical Journal. 343: d5621.
Mendelson, TB, Meltzer, M, et al. (2011). Conflicts of interest in cardiovascular clinical practice guidelines. Archives of Internal Medicine. 171: 577-584.
Bennett, WL, Odelola, OA, et al. (2012). Evaluation of guideline recommendations on oral medications for type 2 diabetes mellitus: a systematic review. Annals of Internal Medicine. 156: 27-36.
Feuerstein, JD, Akbari, M, et al. (2014). Systematic analysis underlying the quality of the scientific evidence and conflicts of interest in interventional medicine subspecialty guidelines. Mayo Clinic Proceedings. 89: 16-24.
Anonymous (2011). Clinical Practice Guidelines We Can Trust. Washington, DC, Institute of Medicine.
Laine, C, Taichman, DB, et al. (2011). Trustworthy clinical guidelines. Annals of Internal Medicine. 154: 774-775.
Qaseem, A, Forland, F, et al. (2012). Guidelines International Network: toward international standards for clinical practice guidelines. Annals of Internal Medicine. 156: 525-531.
A related problem with clinical practice guidelines is that while the evidence in the underlying randomized controlled trials may be valid, it may not be applicable to the populations, interventions, or outcomes stated in the guideline recommendations (McAlister, 2007). One well-documented example is the persistent exclusion of elderly patients from RCTs of heart failure management (Cherubini, 2011). Cherubini, A, Oristrell, J, et al. (2011). The persistent exclusion of older patients from ongoing clinical trials regarding heart failure. Archives of Internal Medicine. 171: 550-556.
McAlister, FA, vanDiepen, S, et al. (2007). How evidence-based are the recommendations in evidence-based guidelines? PLoS Medicine. 4(8): e250.
Similar to other statements for standard and complete reporting, the Reporting Items for practice Guidelines in HealThcare (RIGHT) Statement has been developed for reporting on clinical practice guidelines (Chen, 2017).
Chen, Y, Yang, K, et al. (2017). A reporting tool for practice guidelines in health care: the RIGHT statement. Annals of Internal Medicine. 166: 128-132.
As data, especially from genomics-related research, continue to be a major product of biomedical research, journals increasingly require deposit of data in public repositories as a condition for publication of articles. One of the first publishers to require this was Nature Publishing, which laid out a policy requiring deposition of data into the appropriate repositories (Anonymous, 2008). Anonymous (2008). Thou shalt share your data. Nature Methods. 5: 209.
The quality of information on the Web remains an issue, especially in the "Web 2.0" era of social media, blogging, etc., where there is much more social interaction on the Web (Adams, 2010). Since social media sites are less regulated than the formal Web sites of pharmaceutical manufacturers, some have expressed concern about how social media such as Facebook and Twitter may be exploited by marketers to circumvent regulations around promotion of products (Greene, 2010). The HONcode continues to be used in a voluntary fashion to indicate adherence to quality principles, but is of course limited to more traditional Web sites (Boyer, 2011). Adams, S (2010). Revisiting the online health information reliability debate in the wake of "web 2.0": an inter-disciplinary literature and website review. International Journal of Medical Informatics. 79: 391-400.
Greene, JA and Kesselheim, AS (2010). Pharmaceutical marketing and the new social media. New England Journal of Medicine. 363: 2087-2089.
Boyer, C., Baujard, V., et al. (2011). Evolution of health web certification through the HONcode experience. Studies in Health Technology and Informatics, 169: 53-57.
A number of studies continue to assess the quality of online information resources:
  • An assessment of Web sites for ten common surgical procedures found that "unsponsored" sites and those accessed with technical as opposed to lay terms tended to have higher-quality information (Yermilov et al., 2008).
  • An analysis of Web sites for ten common orthopedic sports medicine diagnoses found that sites displaying the HONcode were more likely to have higher-quality information (Starman, 2010).
  • A study assessing the ability of Google to answer five common pediatric questions found that when the answer was available, 78% of sites gave correct advice, with government sites giving correct information 100% of the time, news sites 55% of the time, and sponsored sites never giving correct advice (Scullard, 2010).
  • Two studies of Web sites promoting robotic surgery found a tendency to overstate the evidence for the value of this type of surgery and downplay its risks (Jin, 2011; Mirkin, 2012).
  • Quality varies by topic but continues to be a problem for Web information (Kitchens, 2014).
  • US hospital Web sites offering transcatheter aortic valve replacement always touted its benefits, but only about a quarter reported any of its known risks (Kincaid, 2015).
  • Most Web sites devoted to breast cancer are insufficient in helping women actively participate in decision-making for breast cancer surgery (Bruce, 2015).
Yermilov, I., Chow, W., et al. (2008). What is the quality of surgery-related information on the internet? Lessons learned from a standardized evaluation of 10 common operations. Journal of the American College of Surgeons, 207: 580-586.
Starman, JS, Gettys, FK, et al. (2010). Quality and content of Internet-based information for ten common orthopaedic sports medicine diagnoses. Journal of Bone and Joint Surgery. 92: 1612-1618.
Scullard, P, Peacock, C, et al. (2010). Googling children's health: reliability of medical advice on the internet. Archives of Disease in Childhood. 95: 580-582.
Jin, LX, Ibrahim, AM, et al. (2011). Robotic surgery claims on United States hospital websites. Journal for Healthcare Quality. 33: 48-52.
Mirkin, JN, Lowrance, WT, et al. (2012). Direct-to-consumer Internet promotion of robotic prostatectomy exhibits varying quality of information. Health Affairs. 31: 760-769.
Kitchens, B, Harle, CA, et al. (2014). Quality of health-related online search results. Decision Support Systems. 57: 454-462.
Kincaid, ML, Fleisher, LA, et al. (2015). Presentation on US hospital websites of risks and benefits of transcatheter aortic valve replacement procedures. JAMA Internal Medicine. 175: 440-441.
Bruce, JG, Tucholka, JL, et al. (2015). Quality of online information to support patient decision-making in breast cancer surgery. Journal of Surgical Oncology. 112: 575-580.
A number of studies have assessed the perennially popular Wikipedia. Clauson et al. (2008) assessed drug information, finding that a traditionally edited database (in this case, Medscape Drug Reference) was more complete, broader in scope, and had fewer errors of omission. Rajagopalan et al. (2011) found that the accuracy and depth of Wikipedia were comparable to information in the Physician Data Query (PDQ) of the National Cancer Institute, but that Wikipedia was less readable. Hasty et al. (2014) found that Wikipedia articles covering the ten most costly medical conditions in the US contained many errors when assessed against standard peer-reviewed sources. Hwang et al. (2014) found that updating of Wikipedia after new FDA drug warnings were issued occurred within two weeks for 41% of drugs (58% for those used in high-prevalence diseases), with the remainder remaining un-updated for much longer. A number of recent efforts have been undertaken to improve the quality of information in Wikipedia (Heilman, 2014), including a collaborative effort with the Cochrane Collaboration (Mathew, 2013). Maskalyk (2014) describes the first formal peer review of a Wikipedia article (Heilman, 2014). Clauson, K, Polen, H, et al. (2008). Scope, completeness, and accuracy of drug information in Wikipedia. Annals of Pharmacotherapy. 42: 1814-1821.
Rajagopalan, M., Khanna, V., et al. (2011). Patient-oriented cancer information on the Internet: a comparison of wikipedia and a professionally maintained database. Journal of Oncology Practice, 7: 319-323.
Hasty, RT, Garbalosa, RC, et al. (2014). Wikipedia vs peer-reviewed medical literature for information about the 10 most costly medical conditions. Journal of the American Osteopathic Association. 114: 368-373.
Hwang, TJ, Bourgeois, FT, et al. (2014). Drug safety in the digital age. New England Journal of Medicine. 370: 2460-2462.
Heilman, J (2014). 5 ways Wikipedia’s health information is improving. London, England, Chartered Institute of Library and Information Professionals.
Mathew, M, Joseph, A, et al. (2013). Cochrane and Wikipedia: the collaborative potential for a quantum leap in the dissemination and uptake of trusted evidence. Cochrane Database of Systematic Reviews. 10: ED000069.
Maskalyk, J (2014). Modern medicine comes online: how putting Wikipedia articles through a medical journal’s traditional process can get free, reliable information into as many hands as possible. Open Medicine. 8(4): 116-119.
Heilman, JM, deWolff, JF, et al. (2014). Dengue fever: a Wikipedia clinical review. Open Medicine. 8(4): 105-115.
There are other challenges for consumer health information, probably the most significant of which is that the popular media do not provide a complete picture of treatments and their risks. One study analyzed 436 news reports on cancer from the US and found that aggressive treatment and survival were frequently discussed, but that treatment failure, adverse events, and end-of-life care were rarely discussed (Fishman et al., 2010). Another study of consumers shown video and audio messages about cancer found that they tended to overgeneralize, lose details, and misunderstand some concepts (such as "early stage" referring to the cancer versus one's life) (Mazor, 2010). On the other hand, cancer prevention messages may be promulgated by reports of cancer in prominent public figures (Ayers, 2013). The decision of Angelina Jolie to have a prophylactic mastectomy led to an upsurge in queries on the subject for one week, which then returned to baseline. Another challenge is the widespread belief in conspiracy theories around health topics by many consumers (Oliver and Wood, 2014). Fishman, J, Have, TT, et al. (2010). Cancer and the media: how does the news report on treatment and outcomes? Archives of Internal Medicine. 170: 515-518.
Mazor, KM, Calvi, J, et al. (2010). Media messages about cancer: what do people understand? Journal of Health Communications. 15: 126-145.
Ayers, JW, Althouse, BM, et al. (2013). Do celebrity cancer diagnoses promote primary cancer prevention? Preventive Medicine. 58: 81-84.
Oliver, JE and Wood, T (2014). Medical conspiracy theories and health behaviors in the united states. JAMA Internal Medicine. 174: 817-818.
In other areas of medicine, analysis of news media stories has found a mixed picture of accuracy of reporting as well. Downing et al. (2014) found that half of media reports on the ACCORD-Lipid trial did not correctly report the lack of value of adding the drug fenofibrate to statin therapy in hyperlipidemia. Schwitzer (2014) analyzed 1889 news stories on various medical topics and found that drugs, medical devices, and other interventions were usually portrayed positively, with potential harms often minimized and costs ignored. Yavchitz et al. (2012) noted that about half of press releases and news reports of randomized controlled trials contained "spin" (reporting strategies emphasizing the beneficial effects of experimental treatments). A Web site devoted to educating patients and others about health news reporting, with a focus on understanding evidence, is HealthNewsReview.org. The NLM consumer site MedlinePlus has developed extensive guidelines for assessing the quality of the content on its site. Downing, NS, Cheng, T, et al. (2014). Descriptions and interpretations of the ACCORD-Lipid trial in the news and biomedical literature: a cross-sectional analysis. JAMA Internal Medicine. 174: 1176-1182.
Schwitzer, G (2014). A guide to reading health care news stories. JAMA Internal Medicine. 174: 1183-1186.
Yavchitz, A, Boutron, I, et al. (2012). Misrepresentation of randomized controlled trials in press releases and news coverage: a cohort study. PLoS Medicine. 9(9): e1001308.
Beyond biomedical topics, another growing concern that emerged in 2016 was "fake news," referring to Web sites reporting alleged news designed not only to deceive but also to manipulate search engine results (Mackey, 2016). Even before 2016, research had shown that the "search engine manipulation effect" (SEME) could be used to influence elections, as was demonstrated in India (Epstein, 2015). Wineburg and McGrew (2016) reviewed research on the ability of students from elementary school through college to judge the credibility of information on the Web and found several deficits. Noar et al. (2015) demonstrated how popular news affects searching, showing a substantial uptick in breast cancer-related searching after Angelina Jolie's announcement of her prophylactic mastectomy. Writing in Nature, Williamson (2016) called for the scientific community to be vigilant in taking the time and effort to correct misinformation.
Mackey, TP and Jacobson, T (2016). How can we learn to reject fake news in the digital world? The Conversation.
Epstein, R and Robertson, RE (2015). The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences. 112: E4512-4521.
Wineburg, S and McGrew, S (2016). Evaluating information: The cornerstone of civic online reasoning. Palo Alto, CA, Stanford History Education Group.
Noar, SM, Althouse, BM, et al. (2015). Cancer information seeking in the digital age: effects of Angelina Jolie's prophylactic mastectomy announcement. Medical Decision Making. 35: 16-21.
Williamson, P (2016). Take the time and effort to correct misinformation. Nature. 540: 171.

An additional challenge is numeracy, not only for consumers but even for clinicians. A study of adults in the US and Germany found that numeracy skills varied widely, with a gap between those with high and low levels of education, especially in the US (Galesic, 2010). Another study of young and elderly adults in Berlin, Germany, found that frequencies, rather than single-event probabilities, are much more consistently understood (Gigerenzer, 2012). Even physicians have problems understanding a number of statistical concepts (Windish, 2007). More recently, a survey voluntarily administered to physicians showed that they tended to overstate the benefits and understate the harms of a number of common treatments (Krouss et al., 2016). Another recent analysis explored the reasons why physicians tend to be drawn to treatments not supported by evidence (Epstein, 2017).
Galesic, M and Garcia-Retamero, R (2010). Statistical numeracy for health: a cross-cultural comparison with probabilistic national samples. Archives of Internal Medicine. 170: 462-468.
Gigerenzer, G and Galesic, M (2012). Why do single event probabilities confuse patients? British Medical Journal. 344: e245.
Windish, DM, Huot, SJ, et al. (2007). Medicine residents' understanding of the biostatistics and results in the medical literature. Journal of the American Medical Association. 298: 1010-1022.
Krouss, M, Croft, L, et al. (2016). Physician understanding and ability to communicate harms and benefits of common medical treatments. JAMA Internal Medicine. 176: 1565-1567.
Epstein, D (2017). When Evidence Says No, But Doctors Say Yes. ProPublica, February 22, 2017.
One new and growing concern about searching for health information on the Web is protecting the privacy of those searching. A great deal can be gleaned about an individual from their public online behavior, including sexual orientation, ethnicity, and political preferences (Kosinski, 2013). Many Web sites, including those of prominent health organizations and biomedical journals, allow tracking of navigation by third-party advertisers (Huesch, 2013), with few laws or regulations governing the process (Libert, 2015).
Kosinski, M, Stillwell, D, et al. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences. 110: 5802-5805.
Huesch, MD (2013). Privacy threats when seeking online health information. JAMA Internal Medicine. 173: 1838-1839.
Libert, T (2015). Privacy implications of health information seeking on the Web. Communications of the ACM. 58(3): 68-77.
The volume of research assessing information needs has declined in recent years, perhaps ironically, since we have been moving from a state of information paucity to information overload. It is therefore not known whether the results of research in this area from the 1990s and 2000s are truly applicable today. We know, for example, that physicians now use computers far more than the occasional use reported during that earlier period (Manhattan Research/Google, 2012).
Anonymous (2012). From Screen to Script: The Doctor's Digital Path to Treatment. New York, NY, Manhattan Research; Google.
We do continue to learn more about the needs of different groups who use information. Physician usage of electronic resources in the United States, United Kingdom, and Canada is now very high (Davies, 2010; Manhattan Research/Google, 2012). Physicians in these three countries tend to have a bias toward using literature produced in their own countries. Turner et al. (2008) studied public health nurses in rural settings and found they do not always have access to electronic resources, although it would be interesting to see how this has changed with the growth of the Internet a half-decade later. Hemminger et al. (2007) have studied academic scientists and found that they make substantial use of electronic resources, from accessing journal articles on the Web to communicating mostly electronically and making few visits to the physical library. (I can vouch for this myself, as my office sits in the same building as the OHSU Library yet I rarely enter it any more, even as I make substantial use of OHSU's on-line collection of journals and other databases, including from all over the planet when I travel.) More recently, Roberts and Demner-Fushman (2015) analyzed large corpora of online questions from clinicians and consumers. They found that the form of consumer questions was highly dependent upon the individual online resource, especially in the amount of background information provided. In contrast, they found that clinicians provided very little background information and often asked much shorter questions. The content of consumer questions was also highly dependent upon the resource. They also noted that while clinician questions commonly were about treatments and tests, consumer questions focused more on symptoms and diseases. In addition, consumers placed far more emphasis on certain types of health problems, such as sexual health.
A survey of 13-18 year-olds in the US found that 86% searched for health information online, with 25% searching for such information "a lot" (Wartella, 2015). However, more common sources of information were parents, school health classes, and health professionals. A survey of clinicians at Mayo Clinic found that 48% of respondents performed online searches for more than half of their patient interactions, with 91% occurring either before or within three hours of the patient interaction (Ellsworth, 2015). About 57% of respondents preferred synthesized information sources as compared to 13% who preferred primary literature. About 82% of knowledge searches took place on a workstation or office computer while just 10% were done from a mobile device or at home.
Davies, K. (2010). Physicians and their use of information: a survey comparison between the United States, Canada, and the United Kingdom. Journal of the Medical Library Association, 99: 88-91.
Anonymous (2012). From Screen to Script: The Doctor's Digital Path to Treatment. New York, NY, Manhattan Research; Google.
Turner, A., Stavri, Z., et al. (2008). From the ground up: information needs of nurses in a rural public health department in Oregon. Journal of the Medical Library Association, 96: 335-342.
Hemminger, B., Lu, D., et al. (2007). Information seeking behavior of academic scientists. Journal of the American Society for Information Science & Technology, 58: 2205-2225.
Roberts, K and Demner-Fushman, D (2015). Interactive use of online health resources: a comparison of consumer and professional questions. Journal of the American Medical Informatics Association. 23: 802-811.
Wartella, E, Rideout, V, et al. (2015). Teens, Health, and Technology - A National Survey. Evanston, IL, Center on Media and Human Development, Northwestern University.
Ellsworth, MA, Homan, JM, et al. (2015). Point-of-care knowledge-based resource needs of clinicians: a survey from a large academic medical center. Applied Clinical Informatics. 6: 305-317.
Del Fiol et al. (2014) recently performed a systematic review of three decades of research on clinician questions posed at the point of care. Reiterating individual study findings, they found that the mean frequency of questions raised was 0.57 per patient seen, with clinicians pursuing 51% of questions and finding answers to 78% of those pursued. The most common types of questions concerned drug treatment (34%) and potential causes of a symptom, physical finding, or diagnostic test finding (24%). The main barriers to information seeking were lack of time and doubt that an answer existed.
Del Fiol, G, Workman, TE, et al. (2014). Clinical questions raised by clinicians at the point of care: a systematic review. JAMA Internal Medicine. 174: 710-718.

Last updated March 5, 2017