
First published June 28, 2007 as JAMIA PrePrint; doi:10.1197/jamia.M2441
J Am Med Inform Assoc. 2007;14:574-580. DOI 10.1197/jamia.M2441.
© 2007 American Medical Informatics Association


Research Paper

State-of-the-art Anonymization of Medical Records Using an Iterative Machine Learning Framework

György Szarvas (a, *), Richárd Farkas (b) and Róbert Busa-Fekete (b)

a Department of Informatics, University of Szeged, Szeged, Hungary
b Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary.

* Correspondence and reprints: György Szarvas, University of Szeged, Department of Informatics, 6720, Szeged, Árpád tér 2., Hungary (Email: szarvas{at}inf.u-szeged.hu).

Received for publication: 03/16/07; accepted for publication: 06/11/07.

Objective: Anonymizing medical records is important in the human life sciences because de-identified text can be made publicly available to researchers outside the hospital, facilitating research on human diseases. The authors developed a de-identification model that removes personal health information (PHI) from discharge records so that they conform to the guidelines of the Health Insurance Portability and Accountability Act (HIPAA).

Design: We introduce a novel, machine learning-based, iterative Named Entity Recognition approach intended for semi-structured documents such as discharge records. The method identifies PHI in several steps: it first labels all entities whose tags can be inferred from the structure of the text, and then uses this information to find further PHI phrases in the free-text parts of the document.
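The two-step idea in the Design paragraph can be illustrated with a small sketch. This is not the authors' system: it replaces the machine-learned tagger with hand-written regular expressions, and the field names (`Patient:`, `Attending:`), the tag names, and all function names are hypothetical choices made only for illustration. It shows the general pattern of labeling entities from document structure first and then reusing those labels in the free text.

```python
import re

# Hypothetical header patterns; the published system learns its labels,
# whereas this sketch hard-codes two structured fields.
FIELD_PATTERNS = {
    "PATIENT": re.compile(r"^Patient:\s*(.+)$", re.MULTILINE),
    "DOCTOR": re.compile(r"^Attending:\s*(.+)$", re.MULTILINE),
}


def label_structured(text):
    """Step 1: collect phrase -> tag pairs from header-like lines."""
    known = {}
    for tag, pattern in FIELD_PATTERNS.items():
        for match in pattern.finditer(text):
            known[match.group(1).strip()] = tag
    return known


def find_spans(text, known):
    """Step 2: locate known phrases, and their single words, anywhere in the text."""
    spans = []
    for phrase, tag in known.items():
        # A surname or first name alone is treated as a variant of the full name.
        for variant in {phrase, *phrase.split()}:
            for match in re.finditer(r"\b" + re.escape(variant) + r"\b", text):
                spans.append((match.start(), match.end(), tag))
    return spans


def redact(text, spans):
    """Replace non-overlapping spans with [TAG], preferring the longest span."""
    chosen = []
    for start, end, tag in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
        if all(end <= c_start or start >= c_end for c_start, c_end, _ in chosen):
            chosen.append((start, end, tag))
    # Substitute from right to left so earlier offsets stay valid.
    for start, end, tag in sorted(chosen, reverse=True):
        text = text[:start] + "[" + tag + "]" + text[end:]
    return text


def deidentify(text):
    return redact(text, find_spans(text, label_structured(text)))


if __name__ == "__main__":
    record = (
        "Patient: Jane Doe\n"
        "Attending: John Smith\n"
        "\n"
        "Ms. Doe was seen by Dr. Smith on the ward.\n"
    )
    print(deidentify(record))
```

Matching single words of a known name is a deliberately crude heuristic; a name that is also a common word would be over-redacted, which is one reason a learned model is preferable in practice.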

Measurements: Following the standard evaluation method of the first Workshop on Challenges in Natural Language Processing for Clinical Data, we evaluated the system with token-level Precision, Recall and F-measure (β = 1).
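For readers unfamiliar with token-level scoring, the following minimal sketch computes Precision, Recall and the F-measure over per-token labels. The function name `token_prf`, the `"O"` label for non-PHI tokens, and the rule that a predicted token counts as correct only when its label equals the gold label are assumptions made here; the challenge's exact scoring script is not described in this abstract and may differ in detail.

```python
def token_prf(gold, pred, negative="O", beta=1.0):
    """Token-level precision, recall and F-beta.

    gold and pred are equal-length sequences of per-token labels;
    `negative` marks tokens that are not PHI (an assumed convention).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")

    # A true positive is a token predicted as PHI with the correct label.
    true_pos = sum(1 for g, p in zip(gold, pred) if p != negative and g == p)
    pred_pos = sum(1 for p in pred if p != negative)
    gold_pos = sum(1 for g in gold if g != negative)

    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0

    # F-beta weights recall beta times as much as precision; beta = 1 is the
    # harmonic mean of the two.
    b2 = beta * beta
    denom = b2 * precision + recall
    f_score = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    gold = ["O", "PATIENT", "PATIENT", "O", "DATE", "O"]
    pred = ["O", "PATIENT", "O", "O", "DATE", "DATE"]
    print(token_prf(gold, pred))
```

With β = 1 the score reduces to 2PR / (P + R), so a system must keep both missed PHI tokens and false alarms low to score well.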

Results: Our system achieved outstanding accuracy on the standard evaluation dataset of the de-identification challenge, with an F measure of 99.7534% for the best submitted model.

Conclusion: Our system is competitive with the current state-of-the-art solutions, and several of the techniques described here may be beneficial in other tasks that handle structured documents such as clinical records.




This article has been cited by other articles:


F. P. Morrison, L. Li, A. M. Lai, and G. Hripcsak
Repurposing the Clinical Record: Can an Existing Natural Language Processing System De-identify Clinical Notes?
J. Am. Med. Inform. Assoc., January 1, 2009; 16(1): 37 - 39.


M. Bloomrosen and D. Detmer
Advancing the Framework: Use of Health Data--A Report of a Working Conference of the American Medical Informatics Association
J. Am. Med. Inform. Assoc., November 1, 2008; 15(6): 715 - 722.



