Homepage of Professor Aaron Michael Cohen, M.D., M.S.


Current contact information:

head shot
Recent-ish Photo

Aaron M. Cohen, Professor
School of Medicine
Department of Medical Informatics and Clinical Epidemiology (DMICE) 
Oregon Health & Science University
3181 S.W. Sam Jackson Park Road
Mail Code: BICC
Portland, Oregon, USA 97239-3098
5th grade head shot
Future Informatics Professor

Professional Information


My research interests center around the development, application, and evaluation of text mining, machine learning, and classification techniques and tools for biomedical researchers. Here are some significant papers I have published in this area:
  1. Cohen AM, Hersh W. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics 2005;6(1):57-71. [pdf]
  2. Cohen AM, Hersh WR, Bhupatiraju RT. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In: Proceedings of the Text Retrieval Conference (TREC) 2004; Gaithersburg, MD. [pdf]
  3. Cohen AM. Unsupervised gene/protein entity normalization using automatically extracted dictionaries. In: Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, Proceedings of the BioLINK2005 Workshop; Detroit, MI: Association for Computational Linguistics; 2005. p. 17-24. [pdf]
  4. Cohen AM, Yang J, Hersh WR. A Comparison of Techniques for Classification and Ad Hoc Retrieval of Biomedical Documents. In: Proceedings of the Fourteenth Annual Text REtrieval Conference, TREC 2005, Gaithersburg, MD. [pdf]
  5. Cohen AM, Hersh WR, Peterson K, Yen PY. Reducing Workload in Systematic Review Preparation Using Automated Citation Classification. JAMIA 2006;13(2):206-219. [pdf]
  6. Cohen AM. An Effective General Purpose Approach for Automated Biomedical Document Classification. In: Proceedings of the American Medical Informatics Association (AMIA) 2006 Annual Symposium; 2006. [pdf]
  7. Cohen AM. Five-way Smoking Status Classification using Text Hot-spot Identification and Error-Correcting Output Codes. J Am Med Inform Assoc, 2007. [pdf] [pdf of full paper and data supplement]
  8. Cohen AM. Optimizing feature representation for automated systematic review work prioritization. AMIA Annu Symp Proc 2008:121-5. [pdf]
  9. Cohen AM, Ambert K, McDonagh MS. Cross-topic Learning for Work Prioritization in Systematic Review Creation and Update. J Am Med Inform Assoc 2009. [pdf]
  10. Ambert KH, Cohen AM. k-Information Gain Scaled Nearest Neighbors: A Novel Approach to Classifying Protein-Protein Interaction-Related Documents. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011. [pdf]
  11. Cohen AM, Ambert K, McDonagh M. Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Medical Informatics and Decision Making. 2012 Apr 19;12(1):33. [pdf]
  12. Edinger T, Cohen AM. A Large-Scale Analysis of the Reasons Given for Excluding Articles that are Retrieved by Literature Search During Systematic Review. AMIA Annu Symp Proc. 2013. [pdf]
  13. Jiang Y, Lin C, Meng W, Yu C, Cohen AM, Smalheiser NR. Rule-based deduplication of article records from bibliographic databases. Database: the journal of biological databases and curation. 2014;2014. [pdf]
  14. Cohen AM, Smalheiser NR, McDonagh MS, Yu C, Adams CE, Davis JM, et al. Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. J Am Med Inform Assoc. 2015 Feb 5. [pdf]
  15. Lugli G, Cohen AM, Bennett DA, Shah RC, Fields CJ, Hernandez AG, et al. Plasma Exosomal miRNAs in Persons with and without Alzheimer Disease: Altered Expression and Prospects for Biomarkers. PLoS ONE. 2015 Oct 1;10(10):e0139233. [pdf]
  16. Wallace BC, Noel-Storr A, Marshall IJ, Cohen AM, Smalheiser NR, Thomas J. Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach. J Am Med Inform Assoc. 2017 May 25.

Curriculum Vitae:

Data Sets:




Personal Information

I like (in no particular order) programming the iPhone and the Arduino, experimenting with programming languages, building and playing ukuleles and guitars, exploring ham radio (as AF7OI), folding origami, listening to podcasts, reading (to myself and my kids), playing board games, taking photographs, and watching Doctor Who (when I have the time).



Web Resources

Here are a few of my favorite programming and biomedical text mining resources on the Web:

  • The ORBIT Project. An Online Registry of Biomedical Informatics Tools.
  • Alex Morgan's page on BioNLP resources, although it has been a while since it has been updated.
  • BLIMP (Biomedical Literature and Text Mining Publications), a forum for collection, compilation, and exchange of publications on biomedical text mining. Sadly, this is no longer kept up and has gone out of date.
  • TREC Genomics Track home page.
  • TREC Medical Records Track home page.
  • BioPython project home page. Although not geared specifically to text mining, BioPython contains lots of tools for working with resources useful in biomedical text mining.
  • Python.org, the home of the Python language. You do know about Python, right? After spending years getting paid to program in Assembly, C, C++, and Java, I now do almost all of my text mining research using Python. It's my favorite language. But that GIL is going to prevent Python from being a great multicore language.
  • Clojure.org, the home of the Clojure programming language. LISP on the JVM, really? But it's got immutable data and software transactional memory. It may be the language best-prepared to take advantage of 21st century CPUs (read: multicore).

Last updated 9/12/2017