
First published May 19, 2005 as JAMIA PrePrint; doi:10.1197/jamia.M1757
J Am Med Inform Assoc. 2005;12:576-586. DOI 10.1197/jamia.M1757.
© 2005 American Medical Informatics Association


Research Paper

ALICE: An Algorithm to Extract Abbreviations from MEDLINE

Hiroko Ao, MSc and Toshihisa Takagi, PhD

Affiliations of the authors: Department of Computational Biology, University of Tokyo, Chiba, Japan (HA, TT); Basic Research Laboratory, Kanebo Cosmetics, Inc., Kanagawa, Japan (HA).

Correspondence and reprints: Hiroko Ao, MSc, Department of Computational Biology, University of Tokyo CB01, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, 277-8561, Japan; e-mail: <aohiroko{at}hgc.jp>.

Received for publication: 12/02/04; accepted for publication: 04/23/05.

Objective: To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly.

Methods: ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. The system consists of three phases and can potentially identify 320 valid abbreviation-expansion patterns as combinations of the rules.
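
As an illustration of what a heuristic pattern-matching rule for abbreviations can look like, the following is a minimal Python sketch. It is not ALICE's rule set, which this abstract does not spell out. It assumes a single, much simpler heuristic: a short form in parentheses is paired with the immediately preceding words if their initial letters spell it out. The function name `extract_abbreviations` and the regular expression are hypothetical.

```python
import re

# Hypothetical pattern (an assumption, not taken from ALICE): a parenthesized
# token of 2 to 10 characters that starts with a letter.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def extract_abbreviations(text):
    """Return (abbreviation, expansion) pairs found by a first-letter heuristic."""
    pairs = []
    for match in PAREN.finditer(text):
        abbr = match.group(1)
        # Ignore hyphens when counting the characters to be matched.
        letters = [c.lower() for c in abbr if c.isalnum()]
        # Candidate expansion: as many preceding words as there are characters.
        words = text[:match.start()].split()
        candidate = words[-len(letters):]
        if len(candidate) < len(letters):
            continue
        initials = [w[0].lower() for w in candidate]
        if initials == letters:
            pairs.append((abbr, " ".join(candidate)))
    return pairs


print(extract_abbreviations("Levels of tumor necrosis factor (TNF) were measured."))
```

A single rule like this misses many real cases, for example expansions that contain function words or abbreviations built from letters inside a word, which is presumably why a system such as ALICE combines many rules.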

Results: It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database.
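
For readers unfamiliar with the two measures, the sketch below shows how recall and precision are conventionally computed for an extraction task, assuming a manually annotated gold standard of abbreviation-expansion pairs. The pairs in the example are invented for illustration and are not data from the study; the function name `precision_recall` is hypothetical.

```python
def precision_recall(extracted, gold):
    """Compute (precision, recall) for extracted pairs against gold-standard pairs."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted pairs that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of gold-standard pairs that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example pairs, for illustration only.
gold = {("TNF", "tumor necrosis factor"), ("IL", "interleukin")}
extracted = {("TNF", "tumor necrosis factor"), ("DNA", "wrong expansion")}
print(precision_recall(extracted, gold))
```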

Conclusion: ALICE extracted abbreviations and their expansions from the literature efficiently. The carefully compiled heuristics enabled it to extract abbreviations with high recall without significantly reducing precision. ALICE not only facilitates recognition of an undefined abbreviation in a paper by constructing an abbreviation database or dictionary, but also makes biomedical literature retrieval more accurate. This system is freely available at http://uvdb3.hgc.jp/ALICE/ALICE_index.html.







