First published June 25, 2008 as JAMIA PrePrint; doi:10.1197/jamia.M2702
J Am Med Inform Assoc. 2008;15:601-610. DOI 10.1197/jamia.M2702.
© 2008 American Medical Informatics Association


Application of Information Technology

A Software Tool for Removing Patient Identifying Information from Clinical Documents

F. Jeff Friedlin, DO* and Clement J. McDonald, MD1

Regenstrief Institute, Indianapolis, IN

* Correspondence: Jeff Friedlin, DO, Regenstrief Institute, Inc., Medical Informatics, Health Information and Translational Sciences (HITS) Building, 410 West 10th Street, Suite 2000, Indianapolis, IN 46202 (Email: jfriedlin{at}regenstrief.org).

Received for publication: 12/20/07; accepted for publication: 05/30/08.

To accurately remove patient identifying information from various kinds of clinical documents, including laboratory and narrative reports, we created the Medical De-identification System (MeDS), a software tool that de-identifies clinical documents, and performed 2 evaluations of it. The first evaluation used 2,400 Health Level Seven (HL7) messages from 10 different HL7 message producers. After modifying the software based on the results of this first evaluation, we performed a second evaluation using 7,190 pathology report HL7 messages. In each evaluation, we compared the results of the MeDS de-identification process with a gold standard of human review for identifying strings. For both evaluations, we counted the successful scrubs, missed identifiers, and over-scrubs committed by MeDS and evaluated the readability and interpretability of the scrubbed messages. We categorized all missed identifiers into 3 groups: (1) complete HIPAA-specified identifiers, (2) HIPAA-specified identifier fragments, and (3) non-HIPAA–specified identifiers (such as provider names and addresses). In the first evaluation, MeDS scrubbed 11,273 (99.06%) of the 11,380 HIPAA-specified identifiers and 38,095 (98.26%) of the 38,768 non-HIPAA–specified identifiers. In the second evaluation (after the software modifications), MeDS scrubbed 79,993 (99.47%) of the 80,418 HIPAA-specified identifiers and 12,689 (96.93%) of the 13,091 non-HIPAA–specified identifiers. Approximately 95% of scrubbed messages were both readable and interpretable. We conclude that MeDS successfully de-identified a wide range of medical documents from numerous sources and created scrubbed reports that retained their interpretability, thereby maintaining their usefulness for research.
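The scrub percentages reported above can be checked directly from the stated counts. The following is a minimal Python sketch of that arithmetic only; the function name `scrub_rate` and the `REPORTED` table are illustrative names introduced here, not part of MeDS, and the code says nothing about how MeDS itself detects identifiers.

```python
# Counts copied from the abstract: (identifiers scrubbed, identifiers in the gold standard).
# The dictionary layout and all names here are illustrative assumptions.
REPORTED = {
    ("first evaluation", "HIPAA-specified"): (11_273, 11_380),
    ("first evaluation", "non-HIPAA-specified"): (38_095, 38_768),
    ("second evaluation", "HIPAA-specified"): (79_993, 80_418),
    ("second evaluation", "non-HIPAA-specified"): (12_689, 13_091),
}


def scrub_rate(scrubbed: int, total: int) -> float:
    """Return the percentage of gold-standard identifiers that were scrubbed."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= scrubbed <= total:
        raise ValueError("scrubbed must be between 0 and total")
    return 100.0 * scrubbed / total


if __name__ == "__main__":
    for (evaluation, kind), (scrubbed, total) in REPORTED.items():
        rate = scrub_rate(scrubbed, total)
        print(f"{evaluation}, {kind}: {scrubbed} of {total} scrubbed ({rate:.2f}%)")
```

Each rate is simply scrubbed identifiers divided by identifiers found in human review, so it measures how completely identifiers were removed; it does not capture over-scrubs or readability, which the abstract reports separately.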