First published June 28, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2435
J Am Med Inform Assoc. 2007;14:564-573. DOI 10.1197/jamia.M2435.
© 2007 American Medical Informatics Association


Research Paper

Rapidly Retargetable Approaches to De-identification in Medical Records

Ben Wellner (a,c), Matt Huyck (b), Scott Mardis (a), John Aberdeen (a,*), Alex Morgan (a,d), Leonid Peshkin (b), Alex Yeh (a), Janet Hitzeman (a) and Lynette Hirschman (a)

a The MITRE Corporation, Bedford, MA
b Center for Biomedical Informatics, Harvard Medical School, Boston, MA
c Department of Computer Science, Brandeis University, Waltham, MA
d Stanford Biomedical Informatics, Palo Alto, CA.

* Correspondence and reprints: John Aberdeen, 202 Burlington Road, Bedford, MA 01730 (Email: aberdeen{at}mitre.org).

Received for publication: 03/13/07; accepted for publication: 06/11/07.

Objective: This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation.

Method: Our approach focused on rapidly adapting two existing named entity recognition toolkits, Carafe and LingPipe, to the de-identification task.

Results: The "out of the box" Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736.
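To make the reported scores easier to interpret, the following minimal sketch shows how an F-measure is computed from precision and recall, and how a relative error reduction can be derived from two F-scores. It assumes that the error term is defined as 1 - F, which the abstract does not state, and the function names are hypothetical. Note that the 36% figure above is a token-level measurement; the phrase-level F-measures quoted in the abstract yield a different, smaller value under this assumed definition.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def relative_error_reduction(baseline_f: float, tuned_f: float) -> float:
    """Fraction of the baseline error removed, assuming error = 1 - F."""
    baseline_error = 1.0 - baseline_f
    tuned_error = 1.0 - tuned_f
    return (baseline_error - tuned_error) / baseline_error


# Phrase-level F-measures reported in the abstract.
baseline_f = 0.9664  # "out of the box" Carafe
tuned_f = 0.9736     # after feature engineering and adding a lexicon

reduction = relative_error_reduction(baseline_f, tuned_f)
print(f"Phrase-level relative error reduction: {reduction:.1%}")
```

Because the baseline error is already small, a change of less than one point in F-measure corresponds to a sizeable share of the remaining errors.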

Conclusions: We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score.




This article has been cited by other articles:


M. Bloomrosen and D. Detmer
Advancing the Framework: Use of Health Data--A Report of a Working Conference of the American Medical Informatics Association
J. Am. Med. Inform. Assoc., November 1, 2008; 15(6): 715-722.

F. J. Friedlin and C. J. McDonald
A Software Tool for Removing Patient Identifying Information from Clinical Documents
J. Am. Med. Inform. Assoc., September 1, 2008; 15(5): 601-610.

O. Uzuner, Y. Luo, and P. Szolovits
Evaluating the State-of-the-Art in Automatic De-identification
J. Am. Med. Inform. Assoc., September 1, 2007; 14(5): 550-563.



