||The nature of IR research continues to change
with the growing ubiquity of search tools such as PubMed and
Google. It is difficult, if not impossible, to stand up a
research system of comparable scale, so most newer research
builds on top of these or other large-scale systems.
Another major change in IR research is the move beyond system-oriented evaluation of ad hoc retrieval systems, i.e., the traditional approach of measuring how well batch queries run against a test collection perform. Some have argued that the ad hoc search problem is for the most part "solved," even though incremental improvements can probably still be made.
A number of recent general publications are noteworthy. Sanderson and Croft (2012) published a broad history of IR research from literally the beginning, i.e., from the use of mechanical devices to modern-day advances. More specific to biomedical and health informatics, but not limited to IR, is the Statement on the Reporting of Evaluation Studies in Health Informatics (STARE-HI), which provides recommendations on the reporting of evaluative research in informatics (Talmon et al., 2009). A more theoretical and mathematical framework for evaluative research in IR has also been published by Carterette (2011).
Some new books summarize the state of the art. A book covering both text retrieval (search) and text mining (extraction) has been published by Zhai and Massung (2016). White (2016) has also published a book on the dynamic nature of Web search.
Several dozen computer science-oriented IR researchers came together in 2012 to inventory challenges and opportunities for the field (Allan, 2012). Six cross-cutting themes emerged across the many topics deemed to be critical:
TREC recently celebrated its 25th anniversary. The entire event was captured on video, including a talk on the various Biomedical Tracks by this author (starting at about 50 minutes into the Part 3 video).
Another way to see how IR evaluation has evolved is to
look at the tracks
from recent TREC conferences, which show
incorporation of interactive searching as well as new
tasks and content types.
|Allan, J., Croft, B., et al. (2012).
Frontiers, challenges and opportunities for information
retrieval: report from SWIRL 2012. SIGIR Forum, 46(1): 2-32. http://www.sigir.org/forum/2012J/2012j_sigirforum_A_allanSWIRL2012Report.pdf.
Carterette, B. (2011). System effectiveness, user models, and user utility: a conceptual framework for investigation. The 34th Annual ACM SIGIR Conference, Beijing, China. ACM Press. 903-912.
Sanderson, M. and Croft, W. (2012). The History of Information Retrieval Research. Proceedings of the IEEE, 100: 1444-1451.
Talmon, J., Ammenwerth, E., et al. (2009). STARE-HI--Statement on reporting of evaluation studies in health informatics. International Journal of Medical Informatics, 78: 1-9.
Zhai, C and Massung, S (2016). Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. New York, NY, Association for Computing Machinery.
White, RW (2016). Interactions with Search Systems. Cambridge, England, Cambridge University Press.
||With the emergence of large-scale commercial IR systems as well as new technologies, such as the Web and mobile devices, the nature of system-oriented IR research has changed from a focus on basic search systems and tasks (e.g., ad hoc retrieval) to systems for modern search tasks (e.g., Web retrieval, incremental search, contextual search, etc.).|
||A good deal of the research focus on pure
lexical-statistical systems in recent years has been on
statistical language models (Zhai, 2009) and machine
learning methods for optimal ranking of system output (Li,
2011). For the latter, Microsoft Research has described (Qin
et al., 2010) and made available for research a data
collection called LETOR (Learning to Rank).
The language model approach is mathematically based on calculating the probability of a word appearing in a query given that it occurs in a relevant document (Zhai and Massung, 2016). A "smoothing" function provides weighting for previously unseen words in queries.
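As a concrete illustration of the language model approach, here is a minimal sketch (not any particular system's implementation) of query-likelihood scoring with Jelinek-Mercer smoothing, one common smoothing choice:

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.1):
    """Score a document by the smoothed log probability of generating the query.

    Jelinek-Mercer smoothing mixes the document language model with the
    collection-wide model, so query words unseen in the document do not
    zero out the score. Illustrative only; real systems compute these
    statistics over an inverted index rather than raw token lists.
    """
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_doc = doc_tf[w] / len(doc)
        p_coll = coll_tf[w] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p > 0:  # skip words absent from the entire collection
            score += math.log(p)
    return score
```

A document containing the query words receives a higher (less negative) score than one that relies entirely on the collection model.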
Learning-to-rank approaches apply machine learning to word-level and document-level (e.g., link-based) features (Zhai and Massung, 2016). Despite the successes of "deep learning" approaches in areas outside IR, minimal headway has been made in improving results of document retrieval (Cohen, 2016). Recently, however, Dehghani et al. (2017) used the large AOL query log, with rankings of its queries generated by BM25 serving as "weakly supervised" training data, and evaluated on the Robust and ClueWeb09 test collections. The resulting system achieved better results than straight BM25.
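For reference, the BM25 ranking function that served as the weak supervision signal can be sketched as follows; this is an illustrative implementation of the standard Okapi BM25 formula (using a common non-negative IDF variant), not Dehghani et al.'s code:

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` for `query`.

    `docs` is the full corpus as a list of token lists; k1 controls term
    frequency saturation and b controls document length normalization.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```

Scoring every document in a corpus with this function and sorting by score yields the kind of ranked list used as weak supervision.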
Can we still learn from failure analysis? A major analysis done in 2003 but not widely published until 2009 looked at a variety of high-performing systems from TREC to determine why they failed on various topics (Buckley, 2009). It found that most systems failed on a given topic for mostly the same reasons, the most common being failure to recognize or promote all aspects of a topic. Rarely was the relationship between terms important; instead, coverage of the aspects of topic and document terms was critical.
A number of well-known older research systems are no longer maintained:
|Zhai, CX (2009). Statistical Language Models for Information Retrieval, Morgan & Claypool Publishers.
Li, H (2011). Learning to Rank for Information Retrieval and Natural Language Processing, Morgan & Claypool Publishers.
Qin, T., Liu, T., et al. (2010). LETOR: a benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 13: 346-374.
Cohen, D, Ai, Q, et al. (2016). Adaptability of neural networks on varying granularity IR tasks. Neu-IR'16 SIGIR Workshop on Neural Information Retrieval, Pisa, Italy https://arxiv.org/abs/1606.07565.
Dehghani, M, Zamani, H, et al. (2017). Neural ranking models with weak supervision. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2017), Tokyo, Japan https://arxiv.org/abs/1704.08803.
Buckley, C. (2009). Why current IR engines fail. Information Retrieval, 12: 652-665.
Kuc, R (2013). Apache Solr 4 Cookbook. Birmingham, England, Packt Publishing.
Gormley, C and Tong, Z (2015). Elasticsearch: The Definitive Guide. Sebastopol, CA, O'Reilly & Associates.
||The NLM's Lexical Systems Group maintains a
site that supports the SPECIALIST Lexicon and natural
language processing (NLP) tools for it (http://specialist.nlm.nih.gov).
A historical overview of the MetaMap system that maps text
to Unified Medical Language System (UMLS) Metathesaurus
controlled terms for a variety of IR (and other)
applications has been published (Aronson and Lang, 2010).
The introductory text by Jackson and Moulinier (2007) has been updated. A more recent overview of text processing for Web applications has been published by Ingersoll et al. (2013). An overview of NLP for health and biomedical applications was published by Friedman and Elhadad (2014). The book referenced above by Zhai and Massung (2016) also provides an overview of NLP and its use in search.
A sample parser is available from the Stanford Natural Language Processing Group.
|Aronson, A. and Lang, F. (2010). An overview
of MetaMap: historical perspective and recent advances. Journal of the American
Medical Informatics Association, 17: 229-236.
Jackson, P and Moulinier, I (2007). Natural Language Processing for Online Applications: Text retrieval, Extraction and Categorization, Second Revised Edition. Amsterdam, Holland, John Benjamins Publishing Company.
Ingersoll, GS, Morton, TS, et al. (2013). Taming Text - How to Find, Organize, and Manipulate It. Shelter Island, NY, Manning Publications.
Friedman, C and Elhadad, N (2014). Natural Language Processing in Health Care and Biomedicine. Biomedical Informatics: Computer Applications in Health Care and Biomedicine (Fourth Edition). E. Shortliffe and J. Cimino. London, England, Springer: 255-284.
||New applications of IR attract research interest, especially with development of new technologies (e.g., mobile devices) and new types of information (e.g., social media).|
||The CLEF initiative has been renamed the Conference and Labs of the Evaluation Forum (from the Cross-Language Evaluation Forum), although it maintains its focus on multilingual IR. It has a new URL: http://www.clef-initiative.eu/. The ImageCLEF track also continues to thrive, including its medical image search task, which is described in the medical image retrieval section below. An update on the state of the art in cross-language IR was recently provided by Nie (2010).||Nie, JY (2010). Cross-Language Information Retrieval, Morgan & Claypool Publishers.|
||The TREC Web Track and a number of derived
TREC tracks have focused on various aspects of Web
searching. These tracks have benefited from the
creation of new large test collections built by extensive
Web crawling. After a hiatus for a number of years, the TREC
Web Track reemerged in 2009 with the development of the
ClueWeb09 collection (http://lemurproject.org/clueweb09/).
This collection consists of 1.04 billion Web pages in ten
different languages. Its size is 5 TB compressed and 25 TB
uncompressed. The collection contains about 4.8 billion
unique URLs (some mapping to the same page) and 7.9 billion links.
The TREC 2011 (Clarke et al., 2011) and 2012 Web Tracks (Clarke et al., 2012) featured two tasks. One was a standard ad hoc retrieval task. Relevance judging was carried out on the following scale (Clarke et al., 2011):
The second task of the Web Track has been a diversity task, where the retrieval goal has been for the search output to provide complete coverage of the topic without excessive redundancy. Each topic is divided into subtopics, with each document judged for relevance to each subtopic. A variant of expected reciprocal rank (ERR), intent-aware ERR (ERR-IA), is the primary performance measure, although NDCG is used as well.
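The graded-relevance measures mentioned here can be sketched in a few lines. The following illustrative Python computes NDCG and basic ERR (per Chapelle et al., 2009) from a ranked list of relevance grades; ERR-IA additionally averages ERR over subtopic intents, which is omitted for brevity:

```python
import math

def dcg(gains):
    """Discounted cumulative gain: graded relevance discounted by log of rank."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """NDCG: DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def err(gains, max_grade=3):
    """Expected reciprocal rank: expected reciprocal of the position at which
    a user is satisfied, given per-rank stopping probabilities derived from
    the relevance grades (Chapelle et al., 2009)."""
    total, p_continue = 0.0, 1.0
    for i, g in enumerate(gains):
        r = (2 ** g - 1) / (2 ** max_grade)  # probability of stopping here
        total += p_continue * r / (i + 1)
        p_continue *= 1 - r
    return total
```

A perfectly ordered ranking scores NDCG of 1.0, and both measures reward placing highly relevant documents early.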
There have also been some derivations of the Web track using the same collection. TREC 2011 featured the Session Track, which focused on retrieval over multiple sessions. A set of sessions were provided that contained the current query along with the set of past queries; the ranked list of URLs for each past query; and the set of clicked URLs and snippets, along with time spent reading, for each clicked URL (Kanoulas et al., 2011). TREC 2012 featured the Contextual Suggestion Track, where the task was to retrieve documents relative to a user's context (location, date, time of year, etc.) and interests (Dean-Hall et al., 2012).
|Clarke, CLA, Craswell, N, et al. (2011).
Overview of the TREC 2011 Web Track. The Twentieth Text
REtrieval Conference (TREC 2011) Proceedings,
Gaithersburg, MD. National Institute of Standards and Technology.
Clarke, CLA, Craswell, N, et al. (2012). Overview of the TREC 2012 Web Track. The Twenty-First Text REtrieval Conference (TREC 2012) Proceedings, Gaithersburg, MD. National Institute of Standards and Technology http://trec.nist.gov/pubs/trec21/papers/WEB12.overview.pdf.
Chapelle, O, Metlzer, D, et al. (2009). Expected reciprocal rank for graded relevance. 18th ACM Conference on Information and Knowledge Management, Hong Kong, China. 621-630.
Kanoulas, E, Hall, M, et al. (2011). Overview of the TREC 2011 Session Track. The Twentieth Text REtrieval Conference (TREC 2011) Proceedings, Gaithersburg, MD. National Institute of Standards and Technology http://trec.nist.gov/pubs/trec20/papers/SESSION.OVERVIEW.2011.pdf.
Dean-Hall, A, Clarke, CLA, et al. (2012). Overview of the TREC 2012 Contextual Suggestion Track. The Twenty-First Text REtrieval Conference (TREC 2012) Proceedings, Gaithersburg, MD. National Institute of Standards and Technology http://trec.nist.gov/pubs/trec21/papers/CONTEXTUAL12.overview.pdf.
||A new version of the Web test collection was
created to supersede ClueWeb09. ClueWeb12 (http://www.lemurproject.org/clueweb12.php/)
contains about 870 million English-language Web pages. The crawl included Web content (e.g., HTML and CSS pages) and images, but ignored multimedia (e.g., Flash) and compressed (e.g., Zip) files. It also included all URLs in Twitter feeds. It has been used in a number of subsequent TREC tracks.
Another large new collection that has been developed is the Knowledge Base Acceleration (KBA) Stream Corpus (http://trec-kba.org/kba-stream-corpus-2014.shtml), which contains a variety of Web-based resources that are meant to be used in KBA tasks that aim to accelerate the discovery of knowledge.
||Some new TREC tracks in recent years
demonstrate applications of IR systems and include:
||Another type of retrieval that has become
commercially important is recommender systems, which can be
viewed in some ways as an outgrowth of filtering. These are
important for many e-commerce Web sites, such as Amazon and
Netflix. There are two types of filtering: content-based, which recommends items similar in content to those a user has liked, and collaborative, which recommends items liked by similar users.
||Several new biomedical challenge evaluations
have emerged since publication of the third edition of the
book. One is the TREC Medical Records Track that was
introduced in Chapter 1 (Voorhees and Tong, 2011; Voorhees
and Hersh, 2012; Voorhees, 2013). The use case for the
track was identifying patients from a
collection of medical records who might be candidates for
clinical studies. This is a real-world task for which
automated retrieval systems could greatly aid in ability to
carry out clinical research, quality measurement and
improvement, or other "secondary uses" of clinical data
(Safran et al., 2007). The metric used to measure systems was inferred normalized discounted cumulative gain (infNDCG), which takes into account factors such as incomplete judging of all documents retrieved by all systems.
The data for the track was a corpus of de-identified medical records developed by the University of Pittsburgh Medical Center. Records containing data, text, and ICD-9 codes are grouped by "visits" or patient encounters with the health system. (Due to the de-identification process, it was impossible to know whether one or more visits might emanate from the same patient.) There were 93,551 documents mapped into 17,264 visits.
In the 2012 track, the best manual results were reported by Demner-Fushman et al. (2012; infNDCG = 0.680) and Bedrick et al. (2012; infNDCG = 0.526), while the best automated results were reported by Zhu and Carterette (2012; infNDCG = 0.578) and Qi and Laquerre (2012; infNDCG = 0.547).
A number of research groups used a variety of techniques, such as synonym and query expansion, machine learning algorithms, and matching against ICD-9 codes, but still had results that were no better than the manually constructed queries employed by groups from NLM or OHSU (although the NLM system had a number of advanced features, such as document field searching). Although the performance of systems in the track was "good" from an IR standpoint, the results also showed that identification of patient cohorts is a challenging task even for automated systems. Automated features with variable success included document section focusing, term expansion, and term normalization (mapping into controlled terms).
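The synonym and query expansion techniques mentioned above can be illustrated with a toy sketch. The synonym table here is hypothetical; the actual track participants drew expansions from resources such as the UMLS Metathesaurus or large clinical corpora:

```python
# Hypothetical synonym table for illustration only; real systems drew
# expansions from resources such as the UMLS Metathesaurus.
SYNONYMS = {
    "heart attack": ["myocardial infarction", "mi"],
    "kidney failure": ["renal failure"],
}

def expand_query(query):
    """Synonym-based query expansion: return the original query plus
    known variants of any phrase it contains."""
    expanded = [query]
    for phrase, synonyms in SYNONYMS.items():
        if phrase in query.lower():
            expanded.extend(synonyms)
    return expanded
```

The expanded term list is then submitted to the search engine in place of the original query, trading some precision for improved recall.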
Follow-on studies reported benefit from various approaches to query expansion:
|Voorhees, EM and Tong, RM (2011). Overview of
the TREC 2011 Medical Records Track. The Twentieth Text
REtrieval Conference Proceedings (TREC 2011),
Gaithersburg, MD. National Institute of Standards and Technology.
Voorhees, E and Hersh, W (2012). Overview of the TREC 2012 Medical Records Track. The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute of Standards and Technology http://trec.nist.gov/pubs/trec21/papers/MED12OVERVIEW.pdf.
Voorhees, EM (2013). The TREC Medical Records Track. Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, Washington, DC. 239-246.
Safran, C, Bloomrosen, M, et al. (2007). Toward a national framework for the secondary use of health data: an American Medical Informatics Association white paper. Journal of the American Medical Informatics Association. 14: 1-9.
Demner-Fushman, D, Abhyankar, S, et al. (2012). NLM at TREC 2012 Medical Records Track. The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute for Standards and Technology http://trec.nist.gov/pubs/trec21/papers/NLM.medical.final.pdf.
Bedrick, S, Edinger, T, et al. (2012). Identifying patients for clinical studies from electronic health records: TREC 2012 Medical Records Track at OHSU. The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute for Standards and Technology http://trec.nist.gov/pubs/trec21/papers/OHSU.medical.final.pdf.
Zhu, D and Carterette, B (2012). Exploring evidence aggregation methods and external expansion sources for medical record search. The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute for Standards and Technology http://trec.nist.gov/pubs/trec21/papers/udel.medical.final.pdf.
Qi, Y and Laquerre, PF (2012). Retrieving medical records with "sennamed": NEC Labs America at TREC 2012 Medical Record Track. The Twenty-First Text REtrieval Conference Proceedings (TREC 2012), Gaithersburg, MD. National Institute for Standards and Technology http://trec.nist.gov/pubs/trec21/papers/sennamed.medical.final.pdf.
Amini, I, Martinez, D, et al. (2016). Improving patient record search: a meta-data based approach. Information Processing & Management. 52: 258-272.
Zhu, D, Wu, ST, et al. (2014). Using large clinical corpora for query expansion in text-based cohort identification. Journal of Biomedical Informatics. 49: 275-281.
Martinez, D, Otegi, A, et al. (2014). Improving search over electronic health records using UMLS-based query expansion through random walks. Journal of Biomedical Informatics. 51: 100-106.
Edinger, T, Cohen, AM, et al. (2012). Barriers to retrieving patient information from electronic health record data: failure analysis from the TREC Medical Records Track. AMIA 2012 Annual Symposium, Chicago, IL. 180-188.
||There are some unique challenges for IR
research with medical records. One of these, not limited to
IR (e.g., also including NLP, machine learning, etc.) is the
privacy of the patients whose records are being searched
(Friedman, 2013). Given the growing concern over privacy and
confidentiality, how can informatics (including IR)
researchers carry out this work while assuring no
information about patients is revealed? It turns out that
this problem is not limited to patient records. There are
many private collections of information over which we might
like to search, such as email or corporate repositories. The
TREC Total Recall Track addressed this issue and developed
an architecture that involved sending systems to the data
(Roegiest, 2016). Hanbury et al. (2015) expanded on this
notion for medical IR, dubbing it Evaluation as a Service
(EaaS). One limitation of this approach is that researchers
never see the data itself; their systems run on it securely
elsewhere, and they receive only the results of their runs.
||Friedman, C, Rindflesch, TC, et al. (2013).
Natural language processing: state of the art and prospects
for significant progress, a workshop sponsored by the
National Library of Medicine. Journal of Biomedical
Informatics. 46: 765-773.
Roegiest, A and Cormack, GV (2016). An architecture for privacy-preserving and replicable high-recall retrieval experiments. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, Pisa, Italy. 1085-1088.
Hanbury, A, Müller, H, et al. (2015). Evaluation-as-a-Service: Overview and Outlook, arXiv. http://arxiv.org/pdf/1512.07454v1.
||The most recent biomedical track in TREC is
the Clinical Decision Support (CDS) Track (http://www.trec-cds.org/)
(Roberts, Simpson, et al., 2016; Roberts, Demner-Fushman, et al., 2016). In this track, the topic is
a clinical case and the task is to retrieve full-text
journal articles that provide relevant information on
diagnosis, tests, or treatments. The task is ad hoc
searching and the collection has been a snapshot of 733K
(2014-2015) and 1.25M (2016) full-text articles from PubMed
Central. Starting in 2017, this track is morphing into the
Precision Medicine Track, which will use MEDLINE
abstracts and documents from ClinicalTrials.gov.
||Roberts, K, Simpson, M, et al. (2016).
State-of-the-art in biomedical literature retrieval for
clinical cases: a survey of the TREC 2014 CDS track. Information
Retrieval Journal. 19: 113-148.
Roberts, K, Demner-Fushman, D, et al. (2016). Overview of the TREC 2016 Clinical Decision Support Track. The Twenty-Fifth Text REtrieval Conference (TREC 2016) Proceedings, Gaithersburg, MD http://trec.nist.gov/pubs/trec25/papers/Overview-CL.pdf.
||Another biomedical challenge evaluation
growing out of CLEF is the CLEF eHealth Evaluation Lab (https://sites.google.com/site/clefehealth/).
This challenge evaluation has focused on three tasks:
||Research in biomedical text searching has not been limited to challenge evaluations. Sometimes users wish to find documents that report negative results, i.e., that use negation. Agarwal et al. (2010) have developed an algorithm, available on a Web site (http://bionot.askhermes.org/integrated/BioNot.uwm), that aims to single out negated sentences.||Agarwal, S., Yu, H., et al. (2010). BioNØT: a searchable database of biomedical negated sentences. BMC Bioinformatics, 12: 420. http://www.biomedcentral.com/1471-2105/12/420.|
||Another area of focus has been on tools to
assist those performing systematic reviews, which require
the retrieval and analysis of evidence (typically from
randomized controlled trials, RCTs). A search engine focused on this
task has been developed by Smalheiser et al. (2014) and is
called Metta. Cohen et al. (2015) have developed machine
learning approaches that improve the tagging and retrieval
of RCTs. Others have also used machine learning to
recognize high-quality clinical evidence in the stream of
new literature (Kilicoglu, 2009) as well as articles about
modalities of molecular medicine (Wehbe, 2009). Paynter et
al. (2016) recently looked at evaluation studies of text
mining approaches to assist systematic reviews.
||Smalheiser, NR, Lin, C, et al. (2014). Design
and implementation of Metta, a metasearch engine for
biomedical literature retrieval intended for systematic
reviewers. Health Information Science and Systems.
2014(2): 1. https://link.springer.com/article/10.1186/2047-2501-2-1.
Cohen, AM, Smalheiser, NR, et al. (2015). Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. Journal of the American Medical Informatics Association. 22: 707-717.
Kilicoglu, H., Demner-Fushman, D., et al. (2009). Towards automatic recognition of scientifically rigorous clinical research evidence. Journal of the American Medical Informatics Association, 16: 25-31.
Wehbe, F., Brown, S., et al. (2009). A novel information retrieval model for high-throughput molecular medicine modalities. Cancer Informatics, 8: 1-17.
Paynter, R, Bañez, LL, et al. (2016). EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews. Rockville, MD, Agency for Healthcare Research and Quality. https://www.ncbi.nlm.nih.gov/pubmedhealth/PMH0092849/.
||Some research has focused on improving the
ranking of documents output by the IR system. Essie is a
concept-based system developed at the NLM whose main feature
is to expand user queries by mapping them into concepts in
the UMLS Metathesaurus (Ide et al., 2007). Evaluation with
the TREC 2006 Genomics Track test collection showed results
equal to the best-performing systems from the challenge
evaluation. Another NLM research group compared a variety of
document ranking strategies for TREC Genomics Track data,
finding that TF*IDF ranking outperformed sentence-level
co-occurrence of words and other approaches (Lu et al.,
2009). Additional work with the TREC Genomics Track data
found that language model approaches increased MAP by 174%
over standard TFIDF (MAP = 0.381) (Abdou and Savoy, 2008).
Including MeSH terms in the documents was found to increase
MAP by 8.4%, while query expansion also gave small
additional benefit. The TREC Genomics Track archive is now
||Ide, N., Loane, R., et al. (2007). Essie: a
concept-based search engine for structured biomedical text.
Journal of the American
Medical Informatics Association, 14: 253-263.
Lu, Z., Kim, W., et al. (2009). Evaluating relevance ranking strategies for MEDLINE retrieval. Journal of the American Medical Informatics Association, 16: 32-36.
Abdou, S and Savoy, J (2008). Searching in MEDLINE: query expansion and manual indexing evaluation. Information Processing & Management. 44: 781-799.
||Other recent research has focused on methods
for improving the assignment of Medical Subject Headings
(MeSH) terms by automated means. Trieschnigg et al. (2009)
introduced an approach called MeSH Up that uses a
machine-learning classification approach called k-nearest
neighbor (KNN). Using MEDLINE records from the TREC Genomics
Track, the authors show this technique performs better than
MetaMap, the NLM's Medical Text Indexer (MTI), and other
concept-oriented approaches. Other approaches to this task
have focused on identifying similar articles to leverage
their MeSH terms. Aljaber et al. (2011) devised and evaluated
an approach that gathers MeSH terms from articles cited by
the paper being indexed and ranks them based on whether and
where they occur in that paper. Experiments
showed improved performance over MTI and MeSH Up. Huang et
al. (2011) used a "learning to rank" approach applied to
similar documents. Another approach demonstrated an
improvement with inserting a graph-based ranking approach of
MeSH terms into the MTI process (Herskovic, 2011).
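The k-nearest-neighbor idea behind several of these indexing approaches can be sketched as follows. This is an illustrative toy version, not the MeSH Up or MTI implementation: it recommends MeSH terms for a new article by voting over the most textually similar already-indexed articles.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_mesh(new_tokens, indexed, k=3):
    """Recommend MeSH terms for a new article by voting over its k nearest
    neighbors. `indexed` is a list of (token_list, mesh_term_list) pairs
    representing articles that already carry human-assigned MeSH terms."""
    vec = Counter(new_tokens)
    neighbors = sorted(indexed,
                       key=lambda item: cosine(vec, Counter(item[0])),
                       reverse=True)[:k]
    votes = Counter(term for _, terms in neighbors for term in terms)
    return [term for term, _ in votes.most_common()]
```

Real systems replace the raw term-frequency vectors with weighted features and learn how to rank the candidate terms, but the neighbor-voting structure is the same.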
The European Commission and NLM have organized a challenge evaluation to assess semantic indexing of literature called BioASQ (http://www.bioasq.org/) (Tsatsaronis et al., 2015). An additional task in BioASQ focuses on question-answering and is covered in Chapter 9. One analysis from this effort by the NLM found that the ten-year-old MTI was still useful and could perhaps be augmented with machine learning approaches (Mork et al., 2017). The study also found that assisted indexing using MTI tended to perform better on precision-oriented tasks than recall-oriented tasks. An updated overview of the MTI has been published by Mork et al. (2013).
|Trieschnigg, D., Pezik, P., et al. (2009).
MeSH Up: effective MeSH text classification for improved
document retrieval. Bioinformatics,
Aljaber, B., Martinez, D., et al. (2011). Improving MeSH classification of biomedical articles using citation contexts. Journal of Biomedical Informatics, 44: 881-896.
Huang, M., Neveol, A., et al. (2011). Recommending MeSH terms for annotating biomedical articles. Journal of the American Medical Informatics Association, 18: 660-667.
Herskovic, J., Cohen, T., et al. (2011). MEDRank: using graph-based concept ranking to index biomedical texts. International Journal of Medical Informatics, 80: 431-441.
Tsatsaronis, G, Balikas, G, et al. (2015). An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics. 16: 138. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0564-6.
Mork, J, Aronson, A, et al. (2017). 12 years on - is the NLM medical text indexer still useful and relevant? Journal of Biomedical Semantics. 2017(8): 8. https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0113-5.
Mork, J, JimenoYepes, A, et al. (2013). The NLM medical text indexer system for indexing biomedical literature. BioASQ Workshop, Valencia, Spain http://bioasq.org/sites/default/files/Mork.pdf.
||Another important area of biomedical IR that
has spawned a new challenge evaluation concerns data set
retrieval. This work is based on the biomedical and
healthCAre Data Discovery Index Ecosystem (bioCADDIE), a
database of metadata about data sets available online.
The bioCADDIE 2016 Dataset Retrieval Challenge used a snapshot of bioCADDIE and 30 queries to form a challenge evaluation. The best results came from term-based query expansion, usually employing aspects of MeSH. A number of groups used machine learning approaches, although these may have been limited by the small amount of training data that had been made available.
||The notion of the TREC Contextual
Suggestion Track has been applied to medicine, with a mobile
app making suggestions in the context of a user's health,
such as healthy activities and eating (Wing and Yang, 2014).
||Wing, C and Yang, H (2014). FitYou:
integrating health profiles to real-time contextual
suggestion. Proceedings of the 37th Annual International
ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR 2014), Gold Coast, Australia.
||Another line of work has focused on analysis
of user search logs to understand aspects of health-related
searching. Early work focused on processing search logs,
mostly from Microsoft Bing, to understand users'
characteristics and intentions (Cartright et al., 2011;
White and Horvitz, 2012). This approach has uncovered
"cyberchondria," defined as unnecessary escalation of
health-related concern when searching (White and Hrovitz,
2009), and "Web-scale pharmacovigilance," the uncovering of
drug interactions from search logs (White et al., 2013;
Nguyen et al., 2016). This approach has also been used to
identify patients who have a higher likelihood to develop
pancreatic carcinoma (Paparrizos et al., 2016) and lung
carcinoma (White and Horvitz, 2017). Some limitations of
these approaches are the retrospective nature of the data
and the inference of user actions and intent solely from the search logs.
||Cartright, MA, White, RW, et al. (2011).
Intentions and attention in exploratory health search. Proceedings
of the 34th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR
2011), Beijing, China. 65-74.
White, RW and Horvitz, E (2012). Studies of the onset and persistence of medical concerns in search logs. Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), Portland, OR. 265-274.
White, RW and Horvitz, E (2009). Cyberchondria: studies of the escalation of medical concerns in Web search. ACM Transactions on Information Systems. 4: 23-37.
White, RW, Tatonetti, NP, et al. (2013). Web-scale pharmacovigilance: listening to signals from the crowd. Journal of the American Medical Informatics Association. 20: 404-408.
Nguyen, T, Larsen, ME, et al. (2017). Estimation of the prevalence of adverse drug reactions from social media. Journal of Biomedical Informatics. 102: 130-137.
Paparrizos, J, White, RW, et al. (2016). Screening for pancreatic adenocarcinoma using signals from web search logs: feasibility study and results. Journal of Oncology Practice. 12: 737-744.
White, RW and Horvitz, E (2017). Evaluation of the feasibility of screening patients for early signs of lung carcinoma in Web search logs. JAMA Oncology. 3: 398-401.
||Work in medical image retrieval continues to
attract interest. An overview paper was published by Hersh
et al. (2009) describing the consolidation of the
ImageCLEFmed test collections from 2005-2007. A book
describing all of ImageCLEF, not just the medical task, has
also been published (Müller et al., 2010). More recently, a
ten-year overview of lessons learned from the ImageCLEFmed
tasks was published (Kalpathy-Cramer, 2015). These lessons included:
Since 2009, ImageCLEFmed has added two tasks beyond the basic ad hoc retrieval task. One of these is modality classification, where systems attempt to identify the image modality (e.g., radiologic image, computerized tomography, publication figures, photographs, etc.). In the early years of this task, a small number (6-8) of modality categories were used. In 2012, however, a larger classification was developed that included over 20 items (Müller, 2012). With the smaller, earlier classification set, mixed text and visual retrieval methods worked best; with the newer and larger classification, however, visual retrieval methods have worked best, with text-based approaches alone performing poorly (Müller et al., 2012).
A second additional task has been case-based retrieval. In this task, given a case description with patient demographics, limited symptoms and test results including imaging studies (but not the final diagnosis), systems must retrieve cases that include images that best suit the provided case description (Kalpathy-Cramer et al., 2010). Similar to ad hoc retrieval, best results have come from textual and not visual queries (Kalpathy-Cramer et al., 2011; Müller et al., 2012).
Other image retrieval work outside ImageCLEF has focused on retrieval of journal images based on captions (Yu et al., 2009; Kahn and Rubin, 2009) and on annotation with the goal of improving retrieval (Demner-Fushman et al., 2009).
|Hersh, W., Müller, H., et al. (2009). The
ImageCLEFmed medical image retrieval task test collection. Journal of Digital Imaging, 22:
Müller, H., Clough, P., et al., eds. (2010). ImageCLEF: Experimental Evaluation in Visual Information Retrieval. Heidelberg, Germany. Springer.
Kalpathy-Cramer, J, Seco de Herrera, AG, et al. (2015). Evaluating performance of biomedical image retrieval systems - an overview of the medical image retrieval task at ImageCLEF 2004–2013. Computerized Medical Imaging and Graphics. 39: 55-61.
Müller, H, Seco de Herrera, AG, et al. (2012). Overview of the ImageCLEF 2012 medical image retrieval and classification tasks. CLEF 2012 Working Notes, Rome, Italy http://www.clef-initiative.eu/documents/71612/ec58b0bf-b68f-423c-abd9-ede306a69cc0.
Müller, H, Kalpathy-Cramer, J, et al. (2012). Creating a classification of image types in the medical literature for visual categorization. Medical Imaging 2012: Advanced PACS-based Imaging Informatics and Therapeutic Applications, San Diego, CA. SPIE http://publications.hevs.ch/index.php/attachments/single/396.
Kalpathy-Cramer, J., Bedrick, S., et al. (2010). Retrieving similar cases from the medical literature - the ImageCLEF experience. MEDINFO 2010, Cape Town, South Africa. 1189-1193.
Kalpathy-Cramer, J, Müller, H, et al. (2011). Overview of the CLEF 2011 medical image classification and retrieval tasks. CLEF 2011 Labs and Workshops Notebook Papers, Amsterdam, Netherlands http://publications.hevs.ch/index.php/attachments/single/331.
Yu, H., Agarwal, S., et al. (2009). Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension. Journal of Biomedical Discovery and Collaboration, 4: 1. http://www.j-biomed-discovery.com/content/4/1/1.
Kahn, C. and Rubin, D. (2009). Automated semantic indexing of figure captions to improve radiology image retrieval. Journal of the American Medical Informatics Association, 16: 380-386.
Demner-Fushman, D., Antani, S., et al. (2009). Annotation and retrieval of clinically relevant images. International Journal of Medical Informatics, 78: e59-e67.
||In 2007, veteran Microsoft IR researcher
Susan Dumais famously said, "If in 10 years we are still
using a rectangular box and a list of results, I should be
fired" (Markoff, 2007). She provided a vision for "thinking
outside the (search) box" in 2009 (Dumais, 2009). Another
user-oriented researcher offered a vision of what "natural"
search interfaces might look like in the future, with the
user speaking rather than typing, viewing video rather than
reading text, and interacting socially rather than alone
(Hearst, 2011).
Since the book was published, other books on the state of the art and research in search user interfaces have appeared (Hearst, 2009; Wilson, 2012). In addition, the volume by Shneiderman and colleagues (2009) is now in its fifth edition. Nielsen's famous all-time list of Web design mistakes has a new URL: http://www.nngroup.com/articles/top-10-mistakes-web-design/. The page with links to his prolific writings is now at http://www.nngroup.com/articles/.
|Markoff, J (2007). Searching for Michael
Jordan? Microsoft Wants a Better Way. New York, NY. New York
Times. March 7, 2007. http://www.nytimes.com/2007/03/07/business/07soft.html.
Dumais, S. (2009). Thinking outside the (search) box. User Modeling, Adaptation, and Personalization, 17th International Conference, UMAP 2009 Proceedings, Trento, Italy. 2.
Hearst, M. (2009). Search User Interfaces. Cambridge, England, Cambridge University Press.
Hearst, M. (2011). 'Natural' search user interfaces. Communications of the ACM, 54(11): 60-67.
Shneiderman, B, Plaisant, C, et al. (2009). Designing the User Interface: Strategies for Effective Human-Computer Interaction, 5th Edition. Reading, MA, Addison-Wesley.
Wilson, ML (2012). Search User Interface Design, Morgan & Claypool Publishers.
||As noted above, NLM has enhanced the Medical
Text Indexer (MTI) (Mork et al., 2013). More recent
additions include a machine learning module for selection of
check tags and a "first-line" status for journals where the
automated process is the only process used. NLM recently
reassessed the system after 12 years of use, finding
that it still provided value as a tool to aid MeSH term
selection by indexers (Mork et al., 2017). The inter-indexer
consistency of MTI is comparable to that of human indexers.
||Mork, J, Jimeno-Yepes, A, et al. (2013). The
NLM medical text indexer system for indexing biomedical
literature. BioASQ Workshop, Valencia, Spain http://bioasq.org/sites/default/files/Mork.pdf.
Mork, J, Aronson, A, et al. (2017). 12 years on - is the NLM medical text indexer still useful and relevant? Journal of Biomedical Semantics. 8: 8. https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0113-5.
||A number of systems over the years have
attempted to cluster search results, but few have been
evaluated with real users. Mu et al. (2011) did so,
finding that users needed fewer clicks to navigate to
relevant information with the clustering system and also
rated their satisfaction with it higher.
||Mu, X., Ryu, H., et al. (2011). Supporting effective health and biomedical information retrieval and navigation: a novel facet view interface evaluation. Journal of Biomedical Informatics, 44: 576-586.|
||Both CLEF and TREC have incorporated
interactive retrieval evaluation among their recent tracks.
CLEF began with a Living Labs Track that served commonly
asked queries to users in real time, with different systems
and their features substituted as part of the study protocol
(Schuth et al., 2015). The CLEF track has focused on product
search and Web search, while the TREC Open Search Track
has focused on academic search.
||Schuth, A, Balog, K, et al. (2015). Overview
of the Living Labs for Information Retrieval Evaluation
(LL4IR) CLEF Lab 2015. CLEF Proceedings 2015 http://vps46235.public.cloudvps.com/living-labs/wp-content/uploads/sites/10/2015/06/clef2015-ll4ir.pdf.