
First published June 25, 2008 as JAMIA PrePrint; doi:10.1197/jamia.M2265
Journal of the American Medical Informatics Association 2008;15(5):654-660
© 2008 American Medical Informatics Association


A more recent version of this article appeared on September 1, 2008

Submitted on September 5, 2006
Accepted on April 25, 2008

Ignoring dependency between linking variables and its impact on the outcome of probabilistic record linkage studies

Miranda Tromp MSc1*, Nora Méray PhD1, Anita C. J. Ravelli PhD1, Johannes B. Reitsma PhD2, and Gouke J. Bonsel MD, PhD3

Affiliation of the authors: 1 Academic Medical Center, Department of Medical Informatics, University of Amsterdam, Amsterdam, The Netherlands; 2 Academic Medical Center, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, The Netherlands; 3 Academic Medical Center, Department of Public Health Methods, University of Amsterdam, Amsterdam, The Netherlands

* To whom correspondence should be addressed.

Objective: To examine how ignoring dependency among linkage variables (the naïve approach), as opposed to incorporating it (the non-naïve approach), affects the outcome of a probabilistic record linkage study.

Methods: We used the outcomes of a previously developed probabilistic linkage procedure for different registries in perinatal care, which assumed independence among linkage variables. We estimated the impact of ignoring dependency by re-estimating the linkage weights after constructing a single variable that combines the comparison outcomes of two correlated linking variables. The results of the original naïve strategy and the new non-naïve strategy were systematically compared for three scenarios: the empirical dataset using 9 variables, the empirical dataset using 5 variables, and a simulated dataset using 5 variables.
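As an illustration of the combined-variable idea described in the Methods, the following minimal sketch compares a naïve weight (the sum of two per-variable weights) with a weight computed on the joint agreement pattern. It assumes the conventional Fellegi-Sunter form of a linkage weight, log2 of P(pattern | match) / P(pattern | non-match). All probabilities and the names `m_joint`, `u_joint`, `naive_weight`, and `combined_weight` are hypothetical and are not taken from the study.

```python
import math

# Hypothetical joint probabilities of the agreement pattern (A agrees, B agrees)
# among true matches (m_joint) and among non-matches (u_joint).
# These numbers are invented for illustration; they are not the study's estimates.
m_joint = {(True, True): 0.90, (True, False): 0.04,
           (False, True): 0.04, (False, False): 0.02}
u_joint = {(True, True): 0.005, (True, False): 0.005,
           (False, True): 0.015, (False, False): 0.975}


def marginal(joint, index):
    """P(variable `index` agrees), summed over the other variable."""
    return sum(p for pattern, p in joint.items() if pattern[index])


def naive_weight(pattern):
    """Sum of per-variable log2 weights, treating A and B as independent."""
    total = 0.0
    for i, agrees in enumerate(pattern):
        m, u = marginal(m_joint, i), marginal(u_joint, i)
        if not agrees:
            m, u = 1 - m, 1 - u
        total += math.log2(m / u)
    return total


def combined_weight(pattern):
    """log2 weight of the single combined variable (the joint pattern)."""
    return math.log2(m_joint[pattern] / u_joint[pattern])


for pattern in m_joint:
    print(pattern, round(naive_weight(pattern), 2), round(combined_weight(pattern), 2))
```

In this made-up example, agreement on A and B is positively correlated among non-matches, so the naïve weight for "both agree" comes out higher than the combined weight. The size of the gap depends entirely on the assumed probabilities.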

Results: The linking weight for agreement on two correlated variables among non-matches was estimated to be considerably higher in the naïve strategy than in the non-naïve strategy (16.87 vs. 13.55). Ignoring dependency therefore overestimates the amount of identifying information when both correlated variables agree. The number of pairs classified differently by the two approaches was modest when many linking variables were available but grew substantially with fewer variables. The simulation study confirmed the results of the empirical study and suggests that ignoring dependency can substantially increase the number of misclassifications under less favorable linking conditions.
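The direction of this difference can be read off the conventional Fellegi-Sunter weight formula. The notation below is an assumption, not taken from this abstract: m and u denote agreement probabilities among matches and non-matches, subscripts 1 and 2 index the two correlated variables, and subscript 12 denotes agreement on both. The logarithm base used by the authors is not stated here.

```latex
\begin{align*}
w_{\text{naive}} &= \log\frac{m_1}{u_1} + \log\frac{m_2}{u_2}
                  = \log\frac{m_1 m_2}{u_1 u_2}, \\
w_{\text{non-naive}} &= \log\frac{m_{12}}{u_{12}}, \\
w_{\text{non-naive}} - w_{\text{naive}}
  &= \log\frac{m_{12}}{m_1 m_2} - \log\frac{u_{12}}{u_1 u_2}.
\end{align*}
```

Under these assumptions, the non-naïve weight is the lower of the two whenever joint agreement exceeds the independence prediction by a larger factor among non-matches than among matches, that is, when $u_{12}/(u_1 u_2) > m_{12}/(m_1 m_2)$.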

Conclusions: Dependency often exists between linking variables and has the potential to bias the outcome of a linkage study. The non-naïve approach is a straightforward method for creating linking weights that accommodate dependency. The impact on the number of misclassifications depends on the quality and number of linking variables relative to the number of correlated linking variables.






