Homepage of Professor Aaron Michael Cohen, M.D., M.S.

Current contact information:

Recent-ish Photo

Aaron M. Cohen, Professor
School of Medicine
Department of Medical Informatics and Clinical Epidemiology (DMICE)

Oregon Health & Science University

3181 S.W. Sam Jackson Park Road

Mail Code: BICC

Portland, Oregon, USA 97239-3098

Future Informatics Professor
Circa 1970

Professional Information

Publications:

My research interests center around the development, application, and evaluation of text mining, machine learning, and classification techniques and tools for biomedical researchers. Here are some significant papers I have published in these areas:

Cohen AM, Hersh W. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics 2005;6(1):57-71. [pdf]
Cohen AM, Hersh WR, Bhupatiraju RT. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In: Proceedings of the Text Retrieval Conference (TREC) 2004; Gaithersburg, MD. [pdf]
Cohen AM. Unsupervised gene/protein entity normalization using automatically extracted dictionaries. In: Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, Proceedings of the BioLINK2005 Workshop; Detroit, MI: Association for Computational Linguistics; 2005. p. 17-24. [pdf]
Cohen AM, Yang J, Hersh WR. A Comparison of Techniques for Classification and Ad Hoc Retrieval of Biomedical Documents. In: Proceedings of the Fourteenth Annual Text REtrieval Conference, TREC 2005, Gaithersburg, MD. [pdf]
Cohen AM, Hersh WR, Peterson K, Yen PY. Reducing Workload in Systematic Review Preparation Using Automated Citation Classification. JAMIA 2006;13(2):206-219. [pdf]
Cohen AM. An Effective General Purpose Approach for Automated Biomedical Document Classification. In: Proceedings of the American Medical Informatics Association (AMIA) 2006 Annual Symposium; 2006. [pdf]
Cohen AM. Five-way Smoking Status Classification using Text Hot-spot Identification and Error-Correcting Output Codes. J Am Med Inform Assoc, 2007. [pdf] [pdf of full paper and data supplement]
Cohen AM. Optimizing feature representation for automated systematic review work prioritization. AMIA Annu Symp Proc 2008:121-5. [pdf]
Cohen AM, Ambert K, McDonagh MS. Cross-topic Learning for Work Prioritization in Systematic Review Creation and Update. J Am Med Inform Assoc 2009. [pdf]
Ambert KH, Cohen AM. k-Information Gain Scaled Nearest Neighbors: A Novel Approach to Classifying Protein-Protein Interaction-Related Documents. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011. [pdf]
Cohen AM, Ambert K, McDonagh M. Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Medical Informatics and Decision Making. 2012 Apr 19;12(1):33. [pdf]
Edinger T, Cohen AM. A Large-Scale Analysis of the Reasons Given for Excluding Articles that are Retrieved by Literature Search During Systematic Review. AMIA Annu Symp Proc. 2013. [pdf]
Jiang Y, Lin C, Meng W, Yu C, Cohen AM, Smalheiser NR. Rule-based deduplication of article records from bibliographic databases. Database: the journal of biological databases and curation. 2014;2014. [pdf]
Cohen AM, Smalheiser NR, McDonagh MS, Yu C, Adams CE, Davis JM, et al. Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. J Am Med Inform Assoc. 2015 Feb 5. [pdf]
Lugli G, Cohen AM, Bennett DA, Shah RC, Fields CJ, Hernandez AG, et al. Plasma Exosomal miRNAs in Persons with and without Alzheimer Disease: Altered Expression and Prospects for Biomarkers. PLoS ONE. 2015 Oct 1;10(10):e0139233. [pdf]
Wallace BC, Noel-Storr A, Marshall IJ, Cohen AM, Smalheiser NR, Thomas J. Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach. J Am Med Inform Assoc. 2017 May 25.
Edinger T, Demner-Fushman D, Cohen AM, Bedrick S, Hersh W. Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval. AMIA Annu Symp Proc. 2017;2017:660-9.
Smalheiser NR, Cohen AM. Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database. Data and Information Management. 2018;2(1):27-36.
Cohen AM, Dunivin ZO, Smalheiser NR. A probabilistic automated tagger to identify human-related publications. Database (Oxford). 2018 01;2018:1-8.
Smalheiser NR, Cohen AM, Bonifield G. Unsupervised Low-Dimensional Vector Representations for Words, Phrases and Text that are Transparent, Scalable, and produce Similarity Metrics that are not Redundant with Neural Embeddings. J Biomed Inform. 2019 Jan 14;103096.
Weiskopf NG, Cohen AM, Hamman J, Jarmon T, Dorr D. Towards augmenting structured EHR data: a comparison of manual chart review and patient self-report. AMIA Annu Symp Proc; 2019.
Leung ET, Raboin MJ, McKelvey J, Graham A, Lewis A, Prongay K, Cohen AM, Vinson A. Modelling disease risk for amyloid A (AA) amyloidosis in non-human primates using machine learning. Amyloid. 2019 Jun 18;0(0):1-9.
Chamberlin SR, Bedrick SD, Cohen AM, Wang Y, Wen A, Liu S, Liu H, Hersh WR. A query taxonomy describes performance of patient-level retrieval from electronic health record data. Health Search and Data Mining (HSDM) Workshop, Web Search and Data Mining (WSDM) Conference, medRxiv 19012294, 2020.
Chamberlin SR, Bedrick SD, Cohen AM, Wang Y, Wen A, Liu S, et al. Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task. JAMIA Open [Internet]. Available from: https://academic.oup.com/jamiaopen/article/doi/10.1093/jamiaopen/ooaa026/5876567
Cohen AM, Chamberlin S, Deloughery T, Nguyen M, Bedrick S, Meninger S, et al. Detecting rare diseases in electronic health records using machine learning and knowledge engineering: Case study of acute hepatic porphyria. PLOS ONE. 2020 Jul 2;15(7):e0235574.
Schneider J, Hoang L, Kansara Y, Cohen AM, Smalheiser NR. Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews. JAMIA Open. 2021 Mar 12 (in press).

Curriculum Vitae:

View my full Curriculum Vitae here.

Data Sets:

Gold standard files for work described in my master's thesis and in the paper Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.
Systematic drug class review data extracted from Endnote files created by the OHSU Evidence-based Practice Center.

Software:

RMEQ: A tool for Computing Equivalence Groups in Repeated Measures Studies

Personal Information

I like (in no particular order) programming the iPhone and the Arduino, experimenting with programming languages, building and playing ukuleles and guitars, exploring ham radio (as AF7OI), folding origami, listening to podcasts, reading (both fiction and science), playing board games, taking photographs, and launching mid-power rockets.

My COVID-19 one-a-day photo project is at: https://www.instagram.com/aaron.michael.cohen/

Web Resources

Here are a few of my favorite programming and biomedical text mining resources on the Web:

The ORBIT Project. An Online Registry of Biomedical Informatics Tools.
Alex Morgan's page on BioNLP resources, although it has not been updated in quite a while.
TREC Genomics Track home page.
TREC Medical Records Track home page.
BioPython project home page. Although not geared specifically to text mining, BioPython contains lots of tools for working with resources useful in biomedical text mining.
Python.org, the home of the Python language. You do know about Python, right? After spending years getting paid to program in Assembly, C, C++, and Java, I now do almost all of my text mining research using Python. It's my favorite language. But that GIL is going to prevent Python from being a great multicore language.
Anaconda.com, the multi-platform, basically-everything-you-need way to set up and maintain your Python installation. It's not perfect, but it is better, more complete, and more consistent than anything that came before it (I'm looking at you, PyPI.org). Jupyter notebooks and matplotlib graphing are included right out of the box as well.

Last updated 3/18/2021