Information Retrieval: A Health and Biomedical Perspective, Third Edition. William Hersh, M.D. |
Section | Topic | Reference |
---|---|---|
9.1 |
One natural language processing (NLP) tool that has been used for both patient-specific and knowledge-based information is MetaMap, which was
developed at the National Library of Medicine. A recent
paper gives a historical update and describes recent
enhancements (Aronson and Lang, 2010). There is also a new book (Cohen and Demner-Fushman, 2014) as well as several book chapters (Denny, 2012; Cohen and Hunter, 2013; Chen and Sarkar, 2014; Doan et al., 2014). |
Aronson, A. and Lang, F. (2010).
An overview of MetaMap: historical perspective and recent
advances. Journal of the American Medical Informatics
Association, 17: 229-236. Cohen, KB and Demner-Fushman, D (2014). Biomedical Natural Language Processing. Amsterdam, Netherlands, John Benjamins Publishing. Denny, JC (2012). Mining Electronic Health Records in the Genomics Era. PLOS Computational Biology: Translational Bioinformatics. M. Kann and F. Lewitter. http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002823. Cohen, KB and Hunter, LE (2013). Text mining for translational bioinformatics. PLoS Computational Biology. 9(4): e1003044. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003044. Chen, ES and Sarkar, IN (2014). Mining the Electronic Health Record for Disease Knowledge. In Biomedical Literature Mining. V. Kumar and H. Tipney. New York, NY, Springer: 269-286. Doan, S, Conway, M, et al. (2014). Natural Language Processing in Biomedicine: A Unified System Architecture Overview. In Clinical Bioinformatics. R. Trent. New York, NY, Springer. 275-294. |
9.1.1 |
A systematic review published in 2010 covered all of the automated coding and classification studies in clinical natural language processing (NLP) (Stanfill et al., 2010 - a paper that started out as an OHSU BMI 510 term paper and culminated in a master's capstone!). The aggregated studies showed a wide variety of clinical areas where NLP was used and an equally wide variety of results, usually measured in terms of recall and precision. One of the major unanswered questions, raised in a paper I wrote in 2005, is what constitutes "good enough" performance in clinical NLP (Hersh, 2005). Chapman et al. (2011) also note a need for more and varied evaluation tasks and larger test collections. | Stanfill, MH, Williams, M, et al. (2010). A
systematic literature review of automated clinical coding
and classification systems. Journal of the American
Medical Informatics Association. 17: 646-651. Hersh, W (2005). Evaluation of biomedical text mining systems: lessons learned from information retrieval. Briefings in Bioinformatics. 6: 344-356. Chapman, W., Nadkarni, P., et al. (2011). Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. Journal of the American Medical Informatics Association, 18: 540-543. |
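Since the studies aggregated in these reviews were usually measured in terms of recall and precision, a minimal sketch of how those measures are computed from a system's extracted concepts against a gold standard may be useful (the concept sets below are invented for illustration):

```python
def recall_precision(system: set, gold: set) -> tuple:
    """Compute recall and precision of system output against a gold standard."""
    tp = len(system & gold)                          # true positives
    recall = tp / len(gold) if gold else 0.0         # coverage of the gold standard
    precision = tp / len(system) if system else 0.0  # correctness of system output
    return recall, precision

# Hypothetical concepts extracted from a discharge summary
gold = {"diabetes mellitus", "hypertension", "heart failure", "aspirin"}
system = {"diabetes mellitus", "hypertension", "renal failure"}
r, p = recall_precision(system, gold)
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.50 precision=0.67
```

This asymmetry is why both measures are reported: a system can trivially maximize one at the expense of the other.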
9.1.1 |
The i2b2 challenge evaluations have continued and have drawn a wide variety of research groups. The tasks covered in different years include:
|
Uzuner, O., Luo, Y., et al.
(2007). Evaluating the state-of-the-art in automatic
de-identification. Journal of the American
Medical Informatics Association, 14: 550-563. Uzuner, O., Goldstein, I., et al. (2008). Identifying patient smoking status from medical discharge records. Journal of the American Medical Informatics Association, 15: 14-24. Uzuner, O. (2009). Recognizing obesity and comorbidities in sparse data. Journal of the American Medical Informatics Association, 16: 561-570. Uzuner, O., Solti, I., et al. (2010). Extracting medication information from clinical text. Journal of the American Medical Informatics Association, 17: 514-518. Uzuner, Ö., South, B., et al. (2011). 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association, 18: 552-556. Uzuner, O., Bodnari, A., et al. (2012). Evaluating the state of the art in coreference resolution for electronic medical records. Journal of the American Medical Informatics Association: Epub ahead of print. Sun, W, Rumshisky, A, et al. (2013). Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. Journal of the American Medical Informatics Association. 20: 806-813. Uzuner, O and Stubbs, A (2015). Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks. Journal of Biomedical Informatics. 58(Suppl): S1-S5. |
9.1.1 |
One task of the i2b2 challenge has continued to be an important research area, not only for its intrinsic value but also for its value in enabling clinical NLP research: automated de-identification of clinical narratives. A recent review of research on this topic found a variety of methods applied to an equally varied range of document types, making comparison of approaches and systems difficult (Meystre et al., 2010). Stubbs and Uzuner (2015) describe the corpus for the 2014 i2b2/UTHealth de-identification task. |
Meystre, SM, Friedlin, FJ, et al. (2010).
Automatic de-identification of textual documents in the
electronic health record: a review of recent research. BMC
Medical Research Methodology. 10: 70. http://www.biomedcentral.com/1471-2288/10/70. Stubbs, A and Uzuner, O (2015). Annotating longitudinal clinical narratives for de-identification: the 2014 i2b2/UTHealth corpus. Journal of Biomedical Informatics. 58: S20-S29. |
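The de-identification methods reviewed range from rule-based pattern matching to statistical learning. As a minimal sketch of the rule-based end, the patterns and placeholder tags below are hypothetical and not drawn from any cited system:

```python
import re

# Hypothetical rule-based patterns for a few HIPAA identifier types;
# real systems combine many more rules, dictionaries, and ML models.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+\b"), "[NAME]"),
]

def deidentify(text: str) -> str:
    """Replace matched identifiers with placeholder tags."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text

note = "Seen by Dr. Smith on 3/14/2013; callback 503-555-1234."
print(deidentify(note))
# Seen by [NAME] on [DATE]; callback [PHONE].
```

Pure pattern matching misses names that lack a title and other context-dependent identifiers, which is why the reviewed systems increasingly supplement rules with machine learning.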
9.1.1 |
Another development is the
emergence of large projects focused on use of clinical
NLP. Among the most productive of these is the Electronic Medical Records and Genomics (eMERGE) Network, a large-scale consortium that
links the growing number of DNA biorepositories with
electronic health record (EHR) systems for "large-scale,
high-throughput genetic research" (McCarty et al., 2011;
Wilke et al., 2011). Some of the findings include use of
clinical NLP for:
|
McCarty, C., Chisholm, R., et al. (2011). The eMERGE Network: a consortium of biorepositories linked to electronic medical records
data for conducting genomic studies. BMC Genomics, 4(1): 13. http://www.biomedcentral.com/1755-8794/4/13. Wilke, R., Xu, H., et al. (2011). The emerging role of electronic medical records in pharmacogenomics. Clinical Pharmacology and Therapeutics, 89: 379-386. Denny, J., Miller, R., et al. (2009). Identifying QT prolongation from ECG impressions using a general-purpose natural language processor. International Journal of Medical Informatics, 78(Suppl 1): S34-42. Denny, J., Ritchie, M., et al. (2010). Identification of genomic predictors of atrioventricular conduction: using electronic medical records as a tool for genome science. Circulation, 122: 2016-2021. Ritchie, M., Denny, J., et al. (2010). Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record. American Journal of Human Genetics, 86: 560-572. Denny, J., Ritchie, M., et al. (2010). PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics, 26: 1205-1210. Kullo, LJ, Ding, K, et al. (2010). A genome-wide association study of red blood cell traits using the electronic medical record. PLoS ONE. 5(9): e13011. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0013011. Denny, JC, Crawford, DC, et al. (2011). Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. American Journal of Human Genetics. 89: 529-542. Crosslin, DR, McDavid, A, et al. (2012). Genetic variants associated with the white blood cell count in 13,923 subjects in the eMERGE Network. Human Genetics. 131: 639-652. Denny, J., Choma, N., et al. (2012). Natural language processing improves identification of colorectal cancer testing in the electronic medical record. Medical Decision Making, 32: 188-197. Denny, JC (2012). Mining Electronic Health Records in the Genomics Era. 
PLOS Computational Biology: Translational Bioinformatics. M. Kann and F. Lewitter. http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002823. Newton, KM, Peissig, PL, et al. (2013). Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. Journal of the American Medical Informatics Association. 20(e1): e147-154. |
9.1.1 |
An additional large-scale project is one of the four collaborative research centers developed as part of the Strategic Health IT Advanced Research Projects (SHARP) Program of the Office of the National Coordinator for Health IT (ONC). The SHARPn Project aims to "transform EHR data into standards-conforming, comparable information suitable for large-scale analyses, inferencing, and integration of disparate health data" (Chute et al., 2011; Rea et al., 2012). Projects include data normalization; clinical NLP, consisting of extraction from clinical free text based on standards and interoperability as well as transformation of unstructured text into structured data; high-throughput phenotyping; and data quality assessment. The SHARPn NLP work builds on earlier NLP work from the Mayo Clinic in the clinical Text Analysis and Knowledge Extraction System (cTAKES) (Savova et al., 2010). | Chute, C., Pathak, J., et al.
(2011). The SHARPn project on secondary use of
Electronic Medical Record data: progress, plans, and
possibilities. AMIA Annual
Symposium Proceedings 2011, Washington, DC. 248-256. Rea, S., Pathak, J., et al. (2012). Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: The SHARPn project. Journal of Biomedical Informatics, 45: 763-771. Savova, G., Masanz, J., et al. (2010). Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association, 17: 507-513. |
9.1.1 |
The chapter omitted some older
research in clinical NLP:
|
Chapman, W., Bridewell, W., et
al. (2001). A simple algorithm for identifying negated
findings and diseases in discharge summaries. Journal of Biomedical
Informatics, 34: 301-310. Chapman, W., Dowling, J., et al. (2005). Classification of emergency department chief complaints into 7 syndromes: a retrospective analysis of 527,228 patients. Annals of Emergency Medicine, 46: 445-455. Denny, JC, Spickard, A, et al. (2005). Identifying UMLS concepts from ECG Impressions using KnowledgeMap. AMIA Annual Symposium Proceedings, Washington, DC. 196-200. Brown, S., Speroff, T., et al. (2006). eQuality: electronic quality assessment from narrative clinical reports. Mayo Clinic Proceedings, 81: 1472-1481. Pakhomov, S., Weston, S., et al. (2007). Electronic medical records for clinical research: application to the identification of heart failure. American Journal of Managed Care, 13: 281-288. Pakhomov, S., Hanson, P., et al. (2008). Automatic classification of foot examination findings using statistical natural language processing and machine learning. Journal of the American Medical Informatics Association, 15: 198-202. |
9.1.1 |
Newer clinical NLP research also continues to appear:
|
Coden, A., Savova, G., et al.
(2009). Automatically extracting cancer disease
characteristics from pathology reports into a Disease
Knowledge Representation Model. Journal of Biomedical
Informatics, 42: 937-949. Denny, J., Spickard, A., et al. (2009). Evaluation of a method to identify and categorize section headers in clinical documents. Journal of the American Medical Informatics Association, 16: 806-815. Pakhomov, S., Shah, N., et al. (2010). Automated processing of electronic medical records is a reliable method of determining aspirin use in populations at risk for cardiovascular events. Informatics in Primary Care, 18: 125-133. Hristidis, V., Varadarajan, R., et al. (2010). Information discovery on electronic health records using authority flow techniques. BMC Medical Informatics & Decision Making, 10: 64. http://www.biomedcentral.com/1472-6947/10/64. Gerbier, S., Yarovaya, O., et al. (2011). Evaluation of natural language processing from emergency department computerized medical records for intra-hospital syndromic surveillance. BMC Medical Informatics & Decision Making, 11:50. http://www.biomedcentral.com/1472-6947/11/50. Murff, H., FitzHenry, F., et al. (2011). Automated identification of postoperative complications within an electronic medical record using natural language processing. Journal of the American Medical Association, 306: 848-855. Elkin, P., Froehling, D., et al. (2012). Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Annals of Internal Medicine, 156: 11-18. Singh, B, Singh, A, et al. (2012). Derivation and validation of automated electronic search strategies to extract Charlson comorbidities from electronic medical records. Mayo Clinic Proceedings. 87: 817-824. FitzHenry, F, Murff, HJ, et al. (2013). Exploring the frontier of electronic health record surveillance: the case of postoperative complications. Medical Care. 51: 509-516. Tien, M, Kashyap, R, et al. (2015). Retrospective derivation and validation of an automated electronic search algorithm to identify post operative cardiovascular and thromboembolic complications. Applied Clinical Informatics. 
6: 565-576. Epstein, RH, StJacques, P, et al. (2013). Automated identification of drug and food allergies entered using non-standard terminology. Journal of the American Medical Informatics Association. 20: 962-968. Ou, Y and Patrick, J (2014). Automatic structured reporting from narrative cancer pathology reports. e-Journal of Health Informatics. 8(2): e20. http://www.ejhi.net/ojs/index.php/ejhi/article/view/286. Hanauer, DA, Saeed, M, et al. (2014). Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis. Journal of the American Medical Informatics Association. 21: 925-937. Yu, S, Liao, KP, et al. (2015). Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources. Journal of the American Medical Informatics Association. 22: 993-1000. Zheng, L, Wang, Y, et al. (2016). Web-based real-time case finding for the population health management of patients with diabetes mellitus: a prospective validation of the natural language processing–based algorithm with statewide electronic medical records. JMIR Medical Informatics. 4(4): e37. http://medinform.jmir.org/2016/4/e37/. Evans, RS, Benuzillo, J, et al. (2016). Automated identification and predictive tools to help identify high-risk heart failure patients: pilot evaluation. Journal of the American Medical Informatics Association. 23: 872-878. Karnes, JH, Bastarache, L, et al. (2017). Phenome-wide scanning identifies multiple diseases and disease severity phenotypes associated with HLA variants. Science Translational Medicine. 9: eaai8708. http://stm.sciencemag.org/content/9/389/eaai8708. Patel, TA, Puppala, M, et al. (2017). Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods. Cancer. 123: 114-121. |
9.1.1 |
There have also been systematic reviews of
NLP in medical fields:
|
Spasić, I, Livsey, J, et al. (2014). Text
mining of cancer-related information: review of current
status and future directions. International Journal of
Medical Informatics. 83: 605-623. Yim, WW, Yetisgen, M, et al. (2016). Natural language processing in oncology: a review. JAMA Oncology. 2: 797-804. Burger, G, Abu-Hanna, A, et al. (2016). Natural language processing in pathology: a scoping review. Journal of Clinical Pathology. 69: 949-955. Pons, E, Braun, LMM, et al. (2016). Natural language processing in radiology: a systematic review. Radiology. 279: 329-343. |
9.1.1 |
One of the challenges for clinical notes comes from the "tension" that clinicians face between structured documentation systems, which enable easier processing of the data and text, and the flexibility of what is entered (Rosenbloom et al., 2011). Another challenge is the lack of large-scale shared data (Chapman et al., 2011; Friedman et al., 2013), although larger annotated corpora are being developed (Albright et al., 2013). |
Rosenbloom, S., Denny, J., et
al. (2011). Data from clinical notes: a perspective on the
tension between structure and flexible documentation. Journal of the American
Medical Informatics Association, 18: 181-186. Chapman, W., Nadkarni, P., et al. (2011). Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. Journal of the American Medical Informatics Association, 18: 540-543. Albright, D, Lanfranchi, A, et al. (2013). Towards comprehensive syntactic and semantic annotations of the clinical narrative. Journal of the American Medical Informatics Association, 20: 922-930. Friedman, C, Rindflesch, TC, et al. (2013). Natural language processing: state of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine. Journal of Biomedical Informatics. 46: 765-773. |
9.1.2 |
An overview of opinions on "the
way forward" was published in 2008 by a number of "leading
scientists," including this author (Altman et al., 2008).
Another overview of challenges was published by Dai et al.
(2010). Some newer overviews include a new book (Jurafsky
and Martin, 2008), a review focused on genomics and systems biology (Harmston et al., 2010), a review on biomedical research and integrative biology (Rebholz-Schuhmann et al., 2012), and another focused on neuroscience (Ambert
and Cohen, 2012). Neves (2014) described the various
corpora available for biomedical text mining research,
finding them more developed for genes, proteins, and
chemicals than diseases, genomic variations, and
mutations. Another overview summarizes the major challenge evaluations in biomedical text mining (Huang and Lu, 2016). One concern of many scientists is publishers' reluctance to provide access to their content for text-mining activities (Jha, 2012). One new resource that has been developed to aid biomedical text mining is the BioLexicon, a resource of terms with part-of-speech tagging, synonyms, and other information to help in biomedical text mining (Thompson et al., 2011). Another tool used MeSH indexing to build a resource aiding in word sense disambiguation (Jimeno-Yipes et al, 2011). Another new corpus, the Colorado Richly Annotated Full-Text (CRAFT) corpus provides great semantic diversity (Bada et al., 2012) and has uncovered many differences in commonly used NLP tools (Verspoor et al., 2012). A variety of other work has pushed the field forward. One recent study found that nouns are not the only parts of speech that vary in natural language (Cohen et al., 2008). Verbs do as well, as they are both nominalized as well as alternation (i.e., changes in surface form with the same underlying meaning). Noting a disconnect between the biomedical literature and the data published in gene sequence databases, Baran et al. (2011) developed a tool called pubmed2ensembl that integrated gene sequences in the Ensembl resource with literature describing those sequences in PubMed and PubMed Central. Haeussler et al. (2011) describe a similar system that identifies gene sequences within articles and maps them to records in GenBank. |
Altman, R., Bergman, C., et al.
(2008). Text mining for biology - the way forward:
opinions from leading scientists. Genome Biology, 9(Suppl 2): S7. http://genomebiology.com/2008/9/S2/S7. Dai, H., Chang, Y., et al. (2010). New challenges for biological text-mining in the next decade. Journal of Computer Science and Technology, 25: 169-179. http://jcst.ict.ac.cn:8080/jcst/EN/article/downloadArticleFile.do?attachType=PDF&id=9217. Jurafsky, D and Martin, JH (2008). Speech and Language Processing (2nd Edition). Upper Saddle River, NJ, Pearson Prentice Hall. Harmston, N, Filsell, W, et al. (2010). What the papers say: text mining for genomics and systems biology. Human Genomics. 5: 17-29. Rebholz-Schuhmann, D, Oellrich, A, et al. (2012). Text-mining solutions for biomedical research: enabling integrative biology. Nature Reviews Genetics. 13: 829-839. Ambert, KH and Cohen, AM (2012). Text-mining and neuroscience. International Review of Neurobiology. 103: 109-132. Neves, M (2014). An analysis on the entity annotations in biological corpora. F1000Research. 3: 96. https://f1000research.com/articles/3-96/v1. Huang, CC and Lu, Z (2016). Community challenges in biomedical text mining over 10 years: success, failure and the future. Briefings in Bioinformatics. 17: 132-144. Jha, A. (2012). Text mining: what do publishers have against this hi-tech research tool? Secondary Text mining: what do publishers have against this hi-tech research tool? The Guardian. http://www.guardian.co.uk/science/2012/may/23/text-mining-research-tool-forbidden. Thompson, P., McNaught, J., et al. (2011). The BioLexicon: a large-scale terminological resource for biomedical text mining. BMC Bioinformatics, 12: 397. http://www.biomedcentral.com/1471-2105/12/397. Bada, M, Eckert, M, et al. (2012). Concept annotation in the CRAFT corpus. BMC Bioinformatics. 13: 161. http://www.biomedcentral.com/1471-2105/13/161. Verspoor, K, Cohen, KB, et al. (2012). 
A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. BMC Bioinformatics. 13: 207. http://www.biomedcentral.com/1471-2105/13/207. Cohen, K., Palmer, M., et al. (2008). Nominalization and alternations in biomedical language. PLoS ONE, 3(9): e3. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0003158. Baran, J., Gerner, M., et al. (2011). pubmed2ensembl: a resource for mining the biological literature on genes. PLoS ONE, 6(9): e24716. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0024716. Haeussler, M., Gerner, M., et al. (2011). Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics, 27: 980-986. |
9.1.2 |
The BioCreative initiative has continued. BioCreative focuses mostly on developing tools for curation of the literature, but some of its tasks cover text mining. Some recent text-mining tasks include: BioCreative IV (2013)
|
Leitner, F., Mardis, S., et al.
(2010). An overview of BioCreative II.5. IEEE Transactions on
Computational Biology and Bioinformatics, 7: 385-399. Arighi, C., Lu, Z., et al. (2011). Overview of the BioCreative III Workshop. BMC Bioinformatics, 12(Suppl 8): S1. http://www.biomedcentral.com/1471-2105/12/S8/S1. Mao, Y, VanAuken, K, et al. (2014). Overview of the gene ontology task at BioCreative IV. Database. 2014: bau086. https://academic.oup.com/database/article-lookup/doi/10.1093/database/bau086. Krallinger, M, Leitner, F, et al. (2015). CHEMDNER: The drugs and chemical names extraction challenge. Journal of Chemoinformatics. 19(7(Suppl 1 Text mining for chemistry and the CHEMDNER track)): S1. https://jcheminf.springeropen.com/articles/10.1186/1758-2946-7-S1-S1. Krallinger, M, Rabal, O, et al. (2015). The CHEMDNER corpus of chemicals and drugs and its annotation principles. Journal of Chemoinformatics. 7((Suppl 1 Text mining for chemistry and the CHEMDNER track)): S2. https://jcheminf.springeropen.com/articles/10.1186/1758-2946-7-S1-S2. Krallinger, M, Rabal, O, et al. (2017). Information retrieval and text mining technologies for chemistry. Chemical Reviews: Epub ahead of print. Wei, CH, Peng, Y, et al. (2016). Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task. Database. 2016: baw032. https://academic.oup.com/database/article-lookup/doi/10.1093/database/baw032. |
9.1.2 |
Another challenge evaluation
in biological information extraction has been the BioNLP
Shared Task (BioNLP-ST). There have been four BioNLP-ST events focused on event extraction from the scientific literature, mostly using the GENIA corpus:
|
|
9.2 |
There continued to be many interesting applications of text categorization both in and out of health and biomedicine. Certainly a major biomedical application is selection of content categories. Ruau et al. (2011) showed that automated annotations of molecular datasets provided much more comprehensive (i.e., higher recall) assignments of MeSH terms than manual annotation. Automated annotation tools have also been shown to help humans doing manual annotation (Shatkay et al., 2008). | Ruau, D., Mbagwu, M., et al.
(2011). Comparison of automated and human assignment of
MeSH terms on publicly-available molecular datasets. Journal of Biomedical Informatics,
44(Suppl 1): S39-S43. Shatkay, H., Pan, F., et al. (2008). Multi-dimensional classification of biomedical text: toward automated, practical provision of high-utility text to diverse users. Bioinformatics, 24: 2086-2093. |
9.2 |
Other work shows applicability of text categorization beyond categorizing content. Categorization approaches have also been investigated as a means to identify "claims" in scientific papers (Blake, 2010). An important finding of this study was that most claims were made in the body of the paper and not the abstract, indicating that systems showing only the abstract (e.g., PubMed) do not retrieve all the claims made in papers. | Blake, C. (2010). Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles. Journal of Biomedical Informatics, 43: 173-189. |
9.2 |
Text categorization has also been used to signal articles likely to need inclusion in updates of systematic reviews. Since publication of the book, Cohen et al. (2009) have continued to refine work in this area and most recently have shown the ability to identify about 70% of all new publications warranting inclusion in systematic drug category reviews while maintaining an overall low alert rate (Cohen et al., 2012). Likewise, Dalal et al. (2013) have shown similar success with two other drug review categories. | Cohen, AM, Ambert, K, et al.
(2009). Cross-topic learning for work prioritization in
systematic review creation and update. Journal of the
American Medical Informatics Association. 16:
690-704. Cohen, AM, Smalheiser, NR, et al. (2015). Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. Journal of the American Medical Informatics Association. 22: 707-717. Dalal, SR, Shekelle, PG, et al. (2013). A pilot study using machine learning and domain knowledge to facilitate comparative effectiveness review updating. Medical Decision Making. 33: 343-355. |
9.2 |
Other work focuses on queries to search engines, with the major application being syndromic surveillance. A more recent development since the applications described in the book is a system developed by Google called Flu Trends (Carneiro and Mylonakis, 2009). Early work found the system to perform well retroactively in predicting H1N1 influenza in the US (Cook et al., 2011) as well as rates of influenza and patient utilization in emergency departments (Dugas et al., 2012). However, the system performed less well in the US 2012-2013 flu season (Butler, 2013). This led Lazer et al. (2014) to issue a warning against "big data hubris." Subsequent research found, however, that other approaches to flu prediction can outperform Google's search-query-based approach, drawing on additional data such as flu surveillance data itself (Martin et al., 2014), selection of specific queries (Santillana et al., 2014), and EHR data (Yang et al., 2017). |
Carneiro, HA and Mylonakis, E
(2009). Google Trends: a web-based tool for real-time
surveillance of disease outbreaks. Clinical Infectious Diseases. 49: 1557-1564. Cook, S, Conrad, C, et al. (2011). Assessing Google flu trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic. PLoS ONE. 6(8): e23610. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0023610. Dugas, A., Hsieh, Y., et al. (2012). Google Flu Trends: correlation with emergency department influenza rates and crowding metrics. Clinical Infectious Diseases, 54: 463-469. Butler, D (2013). When Google got flu wrong. Nature. 494: 155-156. Lazer, D, Kennedy, R, et al. (2014). Big data. The parable of Google Flu: traps in big data analysis. Science. 343: 1203-1205. Martin, LJ, Xu, B, et al. (2014). Improving Google Flu Trends Estimates for the United States through Transformation. PLoS ONE. 10(4): e0122939. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0109209. Santillana, M, Zhang, DW, et al. (2014). What can digital disease detection learn from (an external revision to) Google Flu Trends? American Journal of Preventive Medicine. 47: 341-347. Yang, S, Santillana, M, et al. (2017). Using electronic health records and Internet search information for accurate influenza forecasting. BMC Infectious Disease. 17: 332. https://bmcinfectdis.biomedcentral.com/articles/10.1186/s12879-017-2424-7. |
9.2 |
Outside of medicine, a number of
other categorization techniques have been shown to be
beneficial. Going beyond detection of spam in email,
Cormack et al. (2011) have also shown the ability to identify "spam" in Web search engine output. Other
research has focused on social media. For example,
analysis of Twitter feeds has been found retrospectively
to predict stock market trends (Bollen et al., 2010).
Likewise, Facebook "Likes" have been shown to predict many
personal attributes from gender and geographic location to
sexual orientation and political views (Kosinski et al.,
2013). This analysis also found an association between scores on intelligence tests and "liking" of the
television show, The Colbert Report (something I
Facebook-Liked before this study came out!). Another study of Facebook users who also sought medical care found that more prolific Facebook posters tended to post about health conditions more frequently (Smith et al., 2017). There was also an association in this sample with Facebook posting and diagnosis of depression. Attributes of Twitter users have also been found to correlate with health conditions. Eichstead et al. (2015) found that language patterns reflecting negative social relationships, disengagement, and negative emotions were risk factors for atherosclerotic heart disease while positive emotions and psychological engagement were found to be protective factors, even after controlling for income and education. Hawkins et al. (2015) found that many patients tweeted about their experiences in hospitals but few strong associations with process or outcomes measures was discovered. A more controversial application of text categorization is the automated grading of student essays on standardized tests. Shermis and Hammer (2012) claimed that automated methods were found to perform highly accurately on a gold standard of papers graded by humans. Perelman (2013), however, took strong exception to those claims, and it is likely that further research will need to sort out the specific role for automated high-stakes grading of writing. |
Cormack, G., Smucker, M., et al.
(2011). Efficient and effective spam filtering and
re-ranking for large web datasets. Information Retrieval, 14: 441-465. Bollen, J., Mao, H., et al. (2010). Twitter mood predicts the stock market. Journal of Computational Science, 2: 1-8. Kosinski, M, Stillwell, D, et al. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences. 110: 5802-5805. Smith, RJ, Crutchley, P, et al. (2017). Variations in Facebook posting patterns across validated patient health conditions: a prospective cohort study. Journal of Medical Internet Research. 19(1): e7. http://www.jmir.org/2017/1/e7/. Eichstaedt, JC, Schwartz, HA, et al. (2015). Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science. 26: 159-169. Hawkins, JB, Brownstein, JS, et al. (2015). Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Quality & Safety. 25: 404-413. Shermis, MD and Hamner, B (2012). Contrasting State-of-the-Art Automated Scoring of Essays: Analysis. National Council on Measurement in Education, Vancouver, BC http://www.scoreright.org/NCME_2012_Paper3_29_12.pdf. Perelman, LC (2013). Critique (Ver. 3.4) of Mark D. Shermis & Ben Hammer, “Contrasting State-of-the-Art Automated Scoring of Essays: Analysis”. Cambridge, MA, Massachusetts Institute of Technology. http://graphics8.nytimes.com/packages/pdf/science/Critique_of_Shermis.pdf. |
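Many of the categorization systems described in this section rest on supervised learning over bag-of-words features. A toy naive Bayes sketch of the idea follows; the training examples are invented, and real systems use far richer features and models:

```python
from collections import Counter
import math

def train(docs):
    """Train a naive Bayes text categorizer from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training documents
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            # Laplace smoothing avoids zero probabilities for unseen words
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy invented training data: categorize notes as flu-related or not
docs = [("fever cough influenza", "flu"), ("influenza outbreak fever", "flu"),
        ("knee fracture xray", "other"), ("fracture cast followup", "other")]
model = train(docs)
print(classify("patient with fever and cough", model))  # flu
```

The Laplace smoothing term is what lets the classifier score documents containing words never seen in training, a routine concern with the vocabulary of clinical and biomedical text.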
9.3 |
Of all the topics in this
chapter, clearly question-answering has received the most
attention from the media, mostly revolving around the IBM
Watson system. Part of that attention stems from its
potential role in medicine. Watson was actually developed
out of IBM's participation in the TREC Question-Answering
Track (Voorhees, 2005). A technical overview of the system
was provided by Ferrucci et al. (2010), and a great deal
of further detail appears in an entire issue of the IBM Journal of Research and Development (Ferrucci, 2012). As most know,
Watson beat humans at the Jeopardy! television game show
(Markoff, 2011) and is now being applied to healthcare
(Lohr, 2012). Watson is built around a system called DeepQA, which uses massively parallel computing to acquire knowledge from the resources of a given domain (Ferrucci et al., 2010). Its learning process is built around sample questions from the domain. One key step is to identify lexical answer types (LATs) in the domain. Among general questions, common LATs include he, country, city, man, film, state, she, author, group, here, and company. From these LATs, natural language processing (NLP) is applied to text and knowledge representation and reasoning (KRR) to structured knowledge. Machine learning is then applied to questions and their answers. When questions are entered into Watson at run-time, the system performs question classification, aiming to detect LATs and the focus of the question. This process is aided by detection of relationships stated in the question as well as decomposition of the question into subparts. Watson then generates hypotheses for answers, performs a step called "soft filtering" to prune the candidate list, and sorts the remainder to rank answers and assign confidence to them. The parallel nature of its algorithms makes it highly scalable. Since winning at Jeopardy!, Watson has "graduated medical school" (Cerrato, 2012). To apply Watson to any new domain, including medicine, three areas of adaptation are required: content adaptation, training adaptation, and functional adaptation (Ferrucci et al., 2012).
Watson was evaluated on an additional 188 unseen questions. The primary outcome measure was recall at 10 answers, and the results varied from 0.49 for the core system to 0.77 for the fully adapted and trained system (Ferrucci, 2012). It would have been interesting to see Watson compared against other systems, such as Google or PubMed, as well as assessed using other measures, such as mean reciprocal rank (MRR). A future use case for Watson is to apply the system to data in EHR systems, with the ultimate aim of serving as a clinical decision support system (Cerrato, 2012). Since the original published study, very little other peer-reviewed research has been published (Kim, 2015), although Watson appears regularly in IBM television commercials and other marketing materials. A number of researchers, including a long-time artificial intelligence researcher, have been critical of the claims made for it (Schank, 2016). |
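The run-time stages described above (question classification with LAT detection, hypothesis generation, "soft filtering," and confidence-ranked answers) can be sketched in outline. Everything below — the function names, the toy LAT list, and the hard-coded candidate scores — is a hypothetical illustration of the pipeline shape, not IBM's implementation:

```python
# Toy sketch of a DeepQA-style pipeline: classify the question,
# generate candidate answers, soft-filter, then rank by confidence.
# All names, data, and scores here are hypothetical illustrations.

COMMON_LATS = {"he", "she", "country", "city", "man", "film",
               "state", "author", "group", "company"}

def classify_question(question: str) -> set:
    """Detect lexical answer types (LATs) present in the question."""
    tokens = {t.strip("?.,").lower() for t in question.split()}
    return tokens & COMMON_LATS

def generate_hypotheses(question: str) -> list:
    """Stand-in for massively parallel candidate generation."""
    # A real system would search text and structured knowledge here.
    return [("Toronto", 0.2), ("Chicago", 0.7), ("Illinois", 0.4)]

def soft_filter(candidates, threshold=0.3):
    """Cheaply prune low-scoring hypotheses before deeper scoring."""
    return [c for c in candidates if c[1] >= threshold]

def answer(question: str):
    """Classify, generate, filter, and rank; return (answer, confidence)."""
    lats = classify_question(question)  # would focus later scoring
    ranked = sorted(soft_filter(generate_hypotheses(question)),
                    key=lambda c: c[1], reverse=True)
    return ranked[0] if ranked else (None, 0.0)

print(answer("What city hosted the 1893 World's Fair?"))
# → ('Chicago', 0.7)
```

The point of the soft-filtering step in the sketch mirrors the description above: discard clearly weak hypotheses with a cheap score so that the expensive ranking and confidence estimation run on a smaller candidate list.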
Voorhees, E. (2005). Question Answering in TREC,
233-257, in Voorhees, E. and Harman, D., eds. TREC - Experiment and Evaluation in Information
Retrieval. Cambridge, MA. MIT Press. Ferrucci, D., Brown, E., et al. (2010). Building Watson: an overview of the DeepQA Project. AI Magazine, 31(3): 59-79. Ferrucci, DA (2012). Introduction to "This is Watson". IBM Journal of Research and Development. 56(3/4): 1-15. Markoff, J. (2011). Computer Wins on ‘Jeopardy!’: Trivial, It’s Not. New York Times. February 16, 2011. http://www.nytimes.com/2011/02/17/science/17jeopardy-watson.html. Lohr, S. (2012). The Future of High-Tech Health Care - and the Challenge. New York Times. February 13, 2012. http://bits.blogs.nytimes.com/2012/02/13/the-future-of-high-tech-health-care-and-the-challenge/. Cerrato, P (2012). IBM Watson Finally Graduates Medical School. Information Week, October 23, 2012. http://www.informationweek.com/healthcare/clinical-systems/ibm-watson-finally-graduates-medical-sch/240009562. Ferrucci, D, Levas, A, et al. (2012). Watson: Beyond Jeopardy! Artificial Intelligence. 199-200: 93-105. Kim, C (2015). How much has IBM’s Watson improved? Abstracts at 2015 ASCO. Health + Digital. http://healthplusdigital.chiweon.com/?p=83. Schank, R (2016). The fraudulent claims made by IBM about Watson and AI. They are not doing "cognitive computing" no matter how many times they say they are. Roger Schank. http://www.rogerschank.com/fraudulent-claims-made-by-IBM-about-Watson-and-AI. |
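The evaluation measures discussed above — recall at 10 answers and mean reciprocal rank (MRR) — are simple to compute over ranked answer lists. The ranked lists and relevant-answer sets below are made up for illustration, not drawn from the Watson evaluation:

```python
# Recall@k and mean reciprocal rank (MRR) over ranked answer lists.
# The example data below are invented for illustration only.

def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant answers found in the top k results."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def mrr(runs):
    """Mean of 1/rank of the first relevant answer across questions."""
    total = 0.0
    for ranked, relevant in runs:
        for rank, ans in enumerate(ranked, start=1):
            if ans in relevant:
                total += 1.0 / rank
                break  # only the first relevant answer counts
    return total / len(runs)

# Two hypothetical questions with ranked candidates and known answers.
runs = [
    (["a", "b", "c", "d"], {"b"}),  # first relevant at rank 2
    (["x", "y", "z"], {"x"}),       # first relevant at rank 1
]
print(recall_at_k(["a", "b", "c", "d"], {"b"}, k=10))  # → 1.0
print(mrr(runs))                                       # → 0.75
```

MRR rewards systems for placing a correct answer near the top of the list, which is why it would have complemented the recall-at-10 figure reported for Watson: recall@10 treats a correct answer at rank 1 and rank 10 identically.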
9.3 |
Some other question-answering systems for
biomedicine have been developed, one that attempts to parse
and map questions into facts determined from journal
articles (Neves and Leser, 2015) and another that aims to
find sentences that likely contain the answer (Hristovski et
al., 2015). Question-answering has also been part of the BioASQ initiative (Tsatsaronis et al., 2015). |
Neves, M and Leser, U (2015). Question
answering for Biology. Methods. 74: 36-46. Hristovski, D, Dinevski, D, et al. (2015). Biomedical question answering using semantic relations. BMC Bioinformatics. 16: 6. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-014-0365-3. Tsatsaronis, G, Balikas, G, et al. (2015). An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics. 16: 138. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0564-6. |
9.4 |
There continues to be work in
text summarization in the biomedical domain, as
exemplified by recent systematic reviews focused on its
use with biomedical literature (Mishra et al., 2014) as
well as EHRs (Pivovarov and
Elhadad, 2015). Among individual systems, Workman et al. (2010) used a semantic retrieval system to develop a summarization tool for a consumer genetics reference. Rebholz-Schuhmann and colleagues (2010) developed PaperMaker, an application that helps authors write manuscripts by suggesting controlled vocabulary terms, more consistent language, and correct and appropriate references. Plaza (2014) assessed the value of different terminology systems for assisting summarization, finding that individual systems, as opposed to the UMLS Metathesaurus, provide the most value. Del Fiol et al. (2016) and Slager et al. (2017) evaluated a system designed to summarize reports of evidence for physicians, finding that summarized reports leveraging the PICO format were preferred over the original formats of the literature. Text summarization work (along with question-answering and knowledge base population) has also continued in the Text Analysis Conference (TAC) sponsored by NIST. The 2017 cycle includes a task on information extraction of adverse drug reactions (ADRs). |
Mishra, R, Bian, J, et al.
(2014). Text summarization in the biomedical domain:
a systematic review of recent research. Journal of
Biomedical Informatics. 52: 457-467. Pivovarov, R and Elhadad, N (2015). Automated methods for the summarization of electronic health records. Journal of the American Medical Informatics Association. 22: 938–947. Workman, T., Fiszman, M., et al. (2010). Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information. Journal of the Medical Library Association, 98: 273-281. Rebholz-Schuhmann, D., Kavaliauskas, S., et al. (2010). PaperMaker: validation of biomedical scientific publications. Bioinformatics, 26: 982-984. Plaza, L (2014). Comparing different knowledge sources for the automatic summarization of biomedical literature. Journal of Biomedical Informatics. 52: 319-328. Del Fiol, G, Mostafa, J, et al. (2016). Formative evaluation of a patient-specific clinical knowledge summarization tool. International Journal of Medical Informatics. 86: 126-134. Slager, SL, Weir, CR, et al. (2017). Physicians' perception of alternative displays of clinical research evidence for clinical decision support - a study with case vignettes. Journal of Biomedical Informatics: Epub ahead of print. |