Submitted on September 5, 2006
Accepted on April 25, 2008
Affiliation of the authors: 1 Academic Medical Center, Department of Medical Informatics, University of Amsterdam, Amsterdam, The Netherlands; 2 Academic Medical Center, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, The Netherlands; 3 Academic Medical Center, Department of Public Health Methods, University of Amsterdam, Amsterdam, The Netherlands
* To whom correspondence should be addressed.
Objective To examine how ignoring (naïve) versus incorporating (non-naïve) dependency among linkage variables affects the outcome of a probabilistic record linkage study.
Methods We used the outcomes of a previously developed probabilistic procedure for linking different perinatal care registries, which assumed independence among linkage variables. We estimated the impact of ignoring dependency by re-estimating the linkage weights after constructing a variable that combines the comparison outcomes of two correlated linking variables. The results of the original naïve strategy and the new non-naïve strategy were systematically compared in three scenarios: the empirical dataset using 9 variables, the empirical dataset using 5 variables, and a simulated dataset using 5 variables.
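The combined-variable idea described in the Methods can be illustrated with a minimal Python sketch. It assumes Fellegi-Sunter agreement weights of the form log2(m/u), where m and u are the agreement probabilities among matches and non-matches, and estimates them by simple counting from a small set of labelled record pairs. The data, the variable names A and B, and the helpers `rate` and `weight` are hypothetical illustrations, not taken from the study. Only the "both agree" category of the combined variable is shown; a full combined variable would also carry weights for the partial-agreement and non-agreement patterns.

```python
import math

# Each record pair: (agree_a, agree_b, is_match); agree_* is 1 if the two
# records agree on that linking variable, else 0.
# Hypothetical toy data, NOT from the study: variables A and B are
# correlated among non-matches (they tend to agree together).
PAIRS = (
    [(1, 1, True)] * 9 + [(0, 0, True)] * 1
    + [(1, 1, False)] * 8 + [(1, 0, False)] * 2
    + [(0, 1, False)] * 2 + [(0, 0, False)] * 88
)


def rate(pairs, is_match, pred):
    """Share of pairs with the given match status for which pred holds."""
    group = [p for p in pairs if p[2] == is_match]
    return sum(1 for p in group if pred(p)) / len(group)


def weight(pairs, pred):
    """Fellegi-Sunter agreement weight log2(m/u) for the pattern pred."""
    m = rate(pairs, True, pred)   # P(pattern | match)
    u = rate(pairs, False, pred)  # P(pattern | non-match)
    return math.log2(m / u)


# Naive: weights for A and B estimated separately and summed (independence assumed).
naive = weight(PAIRS, lambda p: p[0] == 1) + weight(PAIRS, lambda p: p[1] == 1)

# Non-naive: one combined variable whose "both agree" category gets its own weight.
non_naive = weight(PAIRS, lambda p: p[0] == 1 and p[1] == 1)

print(f"naive: {naive:.2f}  non-naive: {non_naive:.2f}")
```

Running it prints `naive: 6.34  non-naive: 3.49`. Summing the separate weights counts the shared agreement twice, because in this toy data A and B agree together far more often among non-matches (8 in 100) than independence would predict (1 in 100).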
Results The linking weight for agreement on two correlated variables among non-matches was estimated considerably higher in the naïve strategy than in the non-naïve strategy (16.87 vs. 13.55). Ignoring dependency therefore overestimates the amount of identifying information when both correlated variables agree. The number of pairs classified differently by the two approaches was modest when many linking variables were available but grew substantially when fewer variables were used. The simulation study confirmed the results of the empirical study and suggests that ignoring dependency can substantially increase the number of misclassifications under less favorable linking conditions.
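Why the naïve weight comes out larger can be seen by writing both weights out. The derivation below is a sketch that assumes Fellegi-Sunter agreement weights $w = \log_2(m/u)$. The symbols $m_A$, $u_A$ (agreement probabilities for variable A among matches and non-matches), $m_B$, $u_B$, and the joint probabilities $m_{AB}$, $u_{AB}$ of agreement on both variables are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
w_{\text{naive}} &= \log_2\frac{m_A}{u_A} + \log_2\frac{m_B}{u_B} = \log_2\frac{m_A m_B}{u_A u_B},\\
w_{\text{joint}} &= \log_2\frac{m_{AB}}{u_{AB}},\\
w_{\text{naive}} - w_{\text{joint}} &= \log_2\frac{u_{AB}}{u_A u_B} - \log_2\frac{m_{AB}}{m_A m_B}.
\end{aligned}
```

Since $m_{AB} \le \min(m_A, m_B)$, the second term is at most $\log_2\bigl(1/\max(m_A, m_B)\bigr)$, which is close to zero when both variables nearly always agree among true matches. The first term is positive whenever A and B tend to agree together among non-matches ($u_{AB} > u_A u_B$) and grows as that correlation strengthens, so under these conditions the naïve weight exceeds the joint weight.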
Conclusions Dependency often exists between linking variables and has the potential to bias the outcome of a linkage study. The non-naïve approach is a straightforward method for creating linking weights that accommodate dependency. The impact on the number of misclassifications depends on the quality and number of linking variables relative to the number of correlated linking variables.