1 Department of Public Health, The University of Western Australia, 35 Stirling Highway, Crawley, Western Australia, 6009.
2 Current affiliation: TVW Telethon Institute for Child Health Research, Western Australia.
Correspondence: Kate Brameld, Centre for Health Services Research, Department of Public Health, The University of Western Australia, 35 Stirling Highway, Crawley, Western Australia, 6009. E-mail: kate{at}dph.uwa.edu.au
Abstract
Method A retrograde survival model was implemented to calculate the level of over-ascertainment of incidence, and the corresponding correction factors, according to the number of years of linked data on which the estimates were based. The method is illustrated using linked hospital morbidity data on diabetes mellitus and then acute myocardial infarction, for which it was validated against the Perth MONICA database for cardiovascular disease.
Results Corrected estimates of the incidence of diabetes and acute myocardial infarction were produced. Consistent with prevalence estimates, the incidence of diabetes was lower than in North America, whereas the incidence of acute myocardial infarction was overestimated by approximately 10% relative to the MONICA register.
Conclusion A new method is presented for estimating incidence trends in disease from linked hospital morbidity data. The advantages of this method are its ease of use with routinely collected data and the relatively low cost of applying it in comparison with community surveys or maintaining formal disease registers. The method has other applications using linked data, such as the study of trends in first-time health care procedures and pharmaceutical prescriptions.
Accepted 4 April 2003
Examples of medical record linkage studies have appeared with increasing frequency in the literature since the 1960s, partly due to the inception of the Oxford Record Linkage Study at that time.1 Record linkage involves bringing together records derived from different sources but relating to the same individual.2 The three basic steps are blocking of records that have a potential relationship; matching to determine whether records within a block are likely to be related; and linking matched records so they can be analysed as composite or longitudinal information for the one individual.3 The process was relatively slow and cumbersome at first, but more recently affordable computing technology and the ability to process large numbers of records in a short time have meant that medical record linkage is no longer limited by processing power. There are now six comprehensive population-based medical record linkage systems around the world that routinely link administrative health data.1,4–8 In addition, there are numerous examples of ad hoc record linkage studies.9–12 The current capacity for data linkage means that linked data sets can now potentially be used to answer a diverse range of public health surveillance and health services research questions. At the same time, the use of record linkage in these studies requires corresponding developments in methods of analysis to take full advantage of the research potential of linked administrative data.
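The blocking, matching and linking steps can be illustrated with a deliberately simple deterministic sketch. It is not the method used by the WA Linked Database or any of the systems cited above: the record layout, the blocking key (birth year and sex), the use of Python's `difflib.SequenceMatcher` as the matcher and the 0.8 threshold are all assumptions chosen for illustration. Operational linkage systems generally use more elaborate, often probabilistic, matching on many fields.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy records: (record_id, surname, given_name, birth_year, sex)
RECORDS = [
    (1, "Smith", "John", 1940, "M"),
    (2, "Smyth", "John", 1940, "M"),
    (3, "Smith", "Joan", 1952, "F"),
    (4, "Brown", "Mary", 1952, "F"),
]


def block(records):
    """Blocking: group records that share a cheap key (birth year, sex)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[(rec[3], rec[4])].append(rec)
    return blocks


def is_match(r1, r2, threshold=0.8):
    """Matching: compare full names within a block by string similarity."""
    name1 = f"{r1[1]} {r1[2]}".lower()
    name2 = f"{r2[1]} {r2[2]}".lower()
    return SequenceMatcher(None, name1, name2).ratio() >= threshold


def link(records):
    """Linking: merge matched records under one person id (union-find)."""
    parent = {rec[0]: rec[0] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in block(records).values():
        for i, r1 in enumerate(members):
            for r2 in members[i + 1:]:
                if is_match(r1, r2):
                    parent[find(r2[0])] = find(r1[0])
    return {rec[0]: find(rec[0]) for rec in records}


print(link(RECORDS))  # {1: 1, 2: 1, 3: 3, 4: 4}
```

Records 1 and 2 ("Smith" and "Smyth") fall in the same block and are linked to one person; records 3 and 4 share a block but are not similar enough to match.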
Linked hospital morbidity data can be used to estimate the incidence of serious chronic disease, provided that patients with the condition are admitted to hospital at least once. When incidence rates are estimated from first-time hospital admissions and the hospital morbidity data are not used in conjunction with another data source, it is common to employ a clearance period to overcome the problem of overestimation of incident cases.13–19 The problem results from the erroneous inclusion of prevalent cases that had hospital admissions before the study observation period (the prevalent pool effect). However, enforcing a clearance period is not an ideal method: there is no guarantee that it will remove all prevalent cases, and it results in the loss of data from the early years of observation. To address this problem, we have developed the backcasting method. It implements a retrograde survival model to calculate the levels of over-ascertainment, and the corresponding correction factors, according to the number of years of linked data on which the estimates of incidence rate are based.
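The first-time admission count, and the clearance period used to protect it, can be sketched in a few lines. In this toy example (hypothetical people and years, with data beginning in 1980), person A is admitted in 1980 and again in 1986. If the data began in 1982 instead, A's 1986 admission would be miscounted as incident, which is the prevalent pool effect; `first_admission_counts` is a hypothetical helper, not part of the original study.

```python
from collections import Counter

# Hypothetical linked admissions: (person_id, calendar year of admission)
ADMISSIONS = [("A", 1980), ("A", 1986), ("B", 1981), ("C", 1985), ("D", 1986), ("E", 1986)]


def first_admission_counts(admissions, study_start, clearance_years=0):
    """Count first-time admissions per year, optionally after a clearance period.

    Years before study_start + clearance_years are used only to identify
    previously admitted patients and are not reported.
    """
    first_year = {}
    for person, year in admissions:
        first_year[person] = min(year, first_year.get(person, year))
    counts = Counter(first_year.values())
    report_from = study_start + clearance_years
    return {year: n for year, n in sorted(counts.items()) if year >= report_from}


print(first_admission_counts(ADMISSIONS, 1980))                     # {1980: 1, 1981: 1, 1985: 1, 1986: 2}
print(first_admission_counts(ADMISSIONS, 1980, clearance_years=5))  # {1985: 1, 1986: 2}
# With data starting in 1982, A's 1986 readmission is wrongly counted as incident:
print(first_admission_counts([r for r in ADMISSIONS if r[1] >= 1982], 1982))  # {1985: 1, 1986: 3}
```

The clearance period removes A from 1986 only because the 1980 admission lies inside the retained data; it discards the 1980–1984 counts to do so.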
In this paper we explain the backcasting method for estimating incidence trends from linked hospital morbidity data. We use the example of linked hospital morbidity data on diabetes mellitus and myocardial infarction to illustrate and validate the method.
Backcasting methods for estimation of incidence rates
Retrograde hazard and survival curves are produced from the data using the Kaplan-Meier estimator.20 The underlying hazard rate of a previous hospital admission at time point t, going backwards in calendar time, is given by:
| Equation (1) |
| Equation (2) |
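Assuming Equations (1) and (2) take the standard discrete-time Kaplan-Meier forms, the retrograde hazard at reverse-time $t$ is the number of previous admissions found at $t$ divided by the number of records still under retrograde follow-up at $t$, and retrograde survival is the running product of one minus that hazard. A minimal sketch under that assumption; the function name `retrograde_km` and whole-year time units are illustrative choices.

```python
def retrograde_km(times, events, max_t):
    """Discrete-time Kaplan-Meier hazard and survival in reverse calendar time.

    times[i]: whole years looked back from admission i to its most recent
              previous admission (events[i] == 1) or to the start of the
              data (events[i] == 0, censored).
    Returns two lists, hazard[t] and survival[t], for t = 0..max_t.
    """
    hazard, survival = [], []
    s = 1.0
    for t in range(max_t + 1):
        at_risk = sum(1 for ti in times if ti >= t)  # still under retrograde follow-up
        found = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        h = found / at_risk if at_risk else 0.0
        s *= 1.0 - h  # product-limit step
        hazard.append(h)
        survival.append(s)
    return hazard, survival


hazard, survival = retrograde_km([1, 1, 2, 3, 3, 3], [1, 0, 1, 0, 0, 1], max_t=3)
```

Censored records remain in the risk set at their own censoring time, following the usual Kaplan-Meier convention.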
For a chronic disease for which the retrograde hazard is monotonic and decreasing with reverse-time, and the survival function is decreasing with reverse-time, it is assumed that there is a point at which $\hat{h}(t) = 0$, i.e. the duration of retrograde follow-up at which all previous admissions have been accounted for and there is no further hazard. We denote this point in reverse-time, where $\hat{h}(t) = 0$ and the patient is risk-free, as $t_f$.

To estimate $t_f$, we fit a fractional polynomial regression curve to the estimates of $\hat{h}(t)$ at $t = 0, 1, 2, \ldots, k - 1$ and solve $\hat{h}(t) = 0$ for $t$, or take the earliest reasonable approximation to zero if the fractional polynomial is asymptotic in form.21 For practical purposes, $t_f$ was taken to be the first time point at which $\hat{h}(t) \le 0.005$. Retrograde survival at the point when the risk of admission is 0, $\hat{S}(t_f)$, is evaluated from the retrograde survival curve at the equivalent point in time.
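The search for $t_f$ can be sketched as below. This is an illustration under stated assumptions, not the authors' fitting code: it fits only first-degree fractional polynomials (one power from the Royston-Altman candidate set), shifts time to $t + 1$ so that $t = 0$ can be log- or power-transformed, chooses the power by least squares, and scans whole years up to a hypothetical `t_max` for the first fitted hazard at or below the threshold (0.005 for annual files; 0.00001 is used for the individual method).

```python
import numpy as np

# Candidate powers for a first-degree fractional polynomial (Royston and Altman)
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """FP basis term: log(x) when p == 0, otherwise x**p (x must be positive)."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if p == 0 else np.power(x, p)


def fit_fp1(t, h):
    """Least-squares FP1 fit h ~ b0 + b1 * FP(t + 1; p), choosing p by RSS."""
    x = np.asarray(t, dtype=float) + 1.0  # shift so that t = 0 is allowed
    y = np.asarray(h, dtype=float)
    best = None
    for p in FP_POWERS:
        design = np.column_stack([np.ones_like(x), fp_transform(x, p)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, p, coef)
    return best[1], best[2]


def estimate_tf(t, h, threshold=0.005, t_max=50):
    """First whole year at which the fitted hazard is <= threshold (None if none)."""
    p, coef = fit_fp1(t, h)
    for tt in range(t_max + 1):
        fitted = coef[0] + coef[1] * fp_transform(tt + 1.0, p)
        if fitted <= threshold:
            return tt
    return None


# Synthetic hazard that follows an exact FP1 curve with power -1
years = list(range(7))
hazard = [0.12 / (t + 1) - 0.008 for t in years]
print(estimate_tf(years, hazard))  # 9
```

A return value of None corresponds to the asymptotic case described above, where the fitted curve levels off above the threshold and a judgement about the earliest reasonable approximation to zero is needed.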
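A correction factor for over-ascertainment of incident cases due to the prevalent pool effect is then given by the probability that an admission is the true first admission, conditional on no previous admission since the start of the study observation period, $t_j$, measured in reverse-time from period $j$ (see Appendix 1):

$$C_j = \frac{\hat{S}(t_f)}{\hat{S}(t_j)}, \qquad t_j < t_f \qquad (3a)$$

$$C_j = 1, \qquad t_j \ge t_f \qquad (3b)$$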
The corrected number of first admissions is then obtained by multiplying the observed number of first admissions, $I_{oj}$, by the proportion of patients estimated to be having their true first admission, $C_j$, given a specified number of years of data. Thus, if $I_{oj}$ is the observed number of first-time hospital admissions in period $j$, a corrected estimate of the number of incident cases in period $j$ is:

$$\hat{I}_{cj} = C_j \, I_{oj} \qquad (4)$$
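Equations (3a), (3b) and (4) reduce to a few lines of arithmetic. The sketch below assumes that Equation (3b) sets $C_j = 1$ once $t_j \ge t_f$ (no prevalent pool remains to correct for); the function names and the count of 500 observed first admissions are hypothetical. With the Appendix 3 values, $\hat{S}(t_f) = 0.403$ and $\hat{S}(t_j) = 0.461$, it reproduces $C_j \approx 0.874$.

```python
def correction_factor(s_tf, s_tj, t_j, t_f):
    """C_j from retrograde survival at t_f and t_j (Equations 3a/3b as assumed here)."""
    if t_j >= t_f:
        return 1.0  # enough years of data: no prevalent pool left to remove
    return s_tf / s_tj


def corrected_incidence(observed_first_admissions, c_j):
    """Equation (4): corrected incident cases = C_j * observed first admissions."""
    return c_j * observed_first_admissions


c = correction_factor(0.403, 0.461, t_j=5, t_f=13)
print(round(c, 3))                            # 0.874
print(round(corrected_incidence(500, c), 1))  # 437.1
```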
Correction weights for individual patients
A further refinement of the backcasting method may be used to derive correction weights for individual patients with first-time hospital admissions in a file of linked hospital morbidity data. The correction weight, Cx, is the probability that the individual case x is truly incident, as distinct from a member of the prevalent pool of previously admitted patients.
In this analysis the structure of annual files of linked hospital morbidity data is ignored. All hospital admissions are included in the model, rather than just the first admission per year as in the annual method. For each admission in the study, let $a$ denote the date of admission, $b$ the date of the most recent previous hospital separation, and $c$ the first date of the study observation period. The retrograde follow-up time is $a - b$ for cases with previous events, or $a - c$ for censored cases. Whether or not $a$, $b$ or $c$ occur within the same annual period of the study makes no difference.
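The retrograde follow-up time for the individual method can be computed directly from the three dates. This sketch assumes time is measured in days (the unit is not stated in the text) and returns an event indicator of 1 when a previous separation was observed and 0 for a censored case; `retrograde_followup` is a hypothetical name.

```python
from datetime import date
from typing import Optional, Tuple


def retrograde_followup(a: date, b: Optional[date], c: date) -> Tuple[int, int]:
    """Retrograde follow-up (days, event) for one admission.

    a: date of admission
    b: date of the most recent previous separation, or None if none observed
    c: first date of the study observation period
    """
    if b is not None:
        return (a - b).days, 1  # previous event found: time is a - b
    return (a - c).days, 0      # censored at the start of the data: time is a - c


print(retrograde_followup(date(1995, 3, 1), date(1994, 3, 1), date(1980, 1, 1)))  # (365, 1)
print(retrograde_followup(date(1981, 1, 1), None, date(1980, 1, 1)))               # (366, 0)
```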
Retrograde hazard and survival curves are generated, and $t_f$ and $\hat{S}(t_f)$ are estimated using fractional polynomial regression as above. In this case, $t_f$ was taken to be the first time point at which $\hat{h}(t) \le 0.00001$, given the different scaling in comparison with the method for annual files.
| Equation (5a) |
| Equation (5b) |
Corrected estimates of incident cases in the study, or in different annual periods within the study, are then obtained from:
| Equation (6) |
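Assuming Equations (5a), (5b) and (6) mirror (3a), (3b) and (4), one plausible reading, stated here as an assumption rather than the authors' exact formulas, is $C_x = \hat{S}(t_f)/\hat{S}(t_x)$ for $t_x < t_f$ (otherwise 1), where $t_x = a - c$ is the retrograde time available for case $x$, and $\hat{I}_c = \sum_x C_x$ over the first-time admissions in the period of interest. The survival curve and times below are hypothetical.

```python
from bisect import bisect_right


def step_survival(change_times, values):
    """Right-continuous step function: S(t) = values[i] from change_times[i], 1.0 before."""
    def s(t):
        i = bisect_right(change_times, t)
        return 1.0 if i == 0 else values[i - 1]
    return s


def corrected_total(lookback_times, surv, t_f):
    """Sum of assumed weights C_x = S(t_f) / S(t_x), or 1 when t_x >= t_f."""
    s_tf = surv(t_f)
    return sum(1.0 if t_x >= t_f else s_tf / surv(t_x) for t_x in lookback_times)


# Hypothetical retrograde survival curve (time in days) and three first-time admissions
surv = step_survival([10, 100, 1000], [0.9, 0.7, 0.6])
total = corrected_total([50, 500, 2000], surv, t_f=1000)
print(total)
```

Summing the weights over cases in a given calendar year, rather than over the whole study, gives the corrected count for that year.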
Variance estimation
Variance of the correction factor, $C_j$, and the corrected incidence, $\hat{I}_{cj}$, for the annual method
| Equation (7) |
| Equation (8) |
Variance of the correction factor, $C_x$, and the corrected incidence, $\hat{I}_c$, for the individual method
| Equation (9) |
| Equation (10) |
Example of linked hospital morbidity data for diabetes mellitus
The incidence of hospitalized diabetes was estimated using the linked hospital morbidity data and then corrected using the backcasting method, first with the method for annual files and then with the method for individual records. Incident cases of hospitalized diabetes were defined as the first-ever admission of a patient in which a diagnosis of diabetes was mentioned in any of 19 diagnosis fields. The backcasting method was applied as described above, by age group and for all ages combined. The annual method requires a hospital morbidity file containing the first diabetic admission per person per year, whereas the individual method requires a file containing all diabetic admissions.
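Both input files can be derived from one extract of linked diabetic admissions. A minimal sketch, assuming each record is a (person identifier, admission date) pair: the hypothetical helper `annual_first_admissions` keeps the earliest admission per person per calendar year, giving the annual-method file, while the full list serves as the individual-method file.

```python
from datetime import date


def annual_first_admissions(all_admissions):
    """Keep the earliest admission per person per calendar year (annual-method file)."""
    first = {}
    for person, adm_date in all_admissions:
        key = (person, adm_date.year)
        if key not in first or adm_date < first[key]:
            first[key] = adm_date
    return sorted((person, d) for (person, _year), d in first.items())


# Hypothetical individual-method file: every diabetic admission
all_admissions = [
    ("A", date(1990, 5, 1)),
    ("A", date(1990, 2, 1)),
    ("A", date(1991, 1, 1)),
    ("B", date(1990, 7, 7)),
]
print(annual_first_admissions(all_admissions))
```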
The retrograde hazard and survival curves used in the backcasting method are shown in Figures 1 and 2. The powers and coefficients for the fractional polynomials fitted to the hazard curves are given in Appendix 2. At time zero, all admissions appear to be first admissions, because there are no data from the following year (1996, i.e. −1 year in reverse-time) that would indicate whether somebody had been admitted previously in reverse-time. The figures show that the prevalent pool persisted for 13 years according to both the method for annual files and the method for individual patients. The hazard and survival curves for both methods have the same shape but are on different scales. This difference is mainly due to the large number of past events in the first year of retrograde follow-up, all of which are counted in the individual method, whereas the annual method counts only zero or one event per person. The correction factors for both methods are given separately in Table 1; they were calculated using Equations 3a and 3b for the annual method and Equations 5a and 5b for the individual method. They are similar and thus result in similar incidence estimates. An example of calculation of the correction factor using the annual method is given in Appendix 3.
Validation of the backcasting method using the Perth MONICA data for myocardial infarction
Linked data containing all records of hospital admissions for myocardial infarction (ICD-9 and ICD-9-CM codes 410.x)22 during 1980–1996 for residents of the Perth Statistical Division in Western Australia aged 35–64 years were selected from the WA Linked Database. After admissions involving transfers had been concatenated into single admissions, any non-fatal admission lasting less than 3 days was excluded. Non-fatal admissions with a length of stay of less than 3 days are generally readmissions within 8 weeks for follow-up investigations and revascularization procedures, and are unlikely to be genuine cases of myocardial infarction. The backcasting method was then applied.
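The case-selection rules can be sketched as follows. The record layout, the helper names and the transfer rule (a same-person admission beginning on the day of the previous separation) are assumptions for illustration; the WA Linked Database has its own transfer conventions. Only admissions coded 410.x are considered, so in this sketch a transfer to a record with a different code would not be concatenated.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Admission(NamedTuple):
    person: str
    admitted: date
    separated: date
    icd9: str
    died: bool


def select_mi_cases(records: List[Admission]) -> List[Admission]:
    # 1. Keep admissions coded 410.x (acute myocardial infarction).
    mi = sorted((r for r in records if r.icd9.startswith("410")),
                key=lambda r: (r.person, r.admitted))
    # 2. Concatenate transfers (assumed rule: same person, admitted on the
    #    day the previous admission separated) into single episodes.
    episodes: List[Admission] = []
    for r in mi:
        prev: Optional[Admission] = episodes[-1] if episodes else None
        if prev is not None and prev.person == r.person and r.admitted == prev.separated:
            episodes[-1] = prev._replace(separated=r.separated, died=r.died)
        else:
            episodes.append(r)
    # 3. Drop non-fatal episodes with a length of stay under 3 days.
    return [e for e in episodes if e.died or (e.separated - e.admitted).days >= 3]
```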
The age-standardized incidence rate of myocardial infarction per year from 1984 to 1993, as determined using the backcasting method, was compared against the age-standardized rate from the MONICA register.
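The standard population used for age standardization is not stated here. The sketch below shows direct standardization in general: age-specific rates weighted by the age distribution of a standard population. The function name and all inputs are hypothetical.

```python
def direct_age_standardized_rate(cases, person_years, standard_pop, per=100_000):
    """Directly age-standardized rate per `per` person-years.

    cases, person_years: age-specific counts and person-years in the study population
    standard_pop: standard population counts in the same age groups
    """
    total_std = sum(standard_pop)
    return per * sum(
        (c / py) * (w / total_std)
        for c, py, w in zip(cases, person_years, standard_pop)
    )


# Hypothetical two age groups: rates of 1 and 2 per 1000, weighted 60:40
rate = direct_age_standardized_rate([10, 40], [10_000, 20_000], [60_000, 40_000])
print(rate)
```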
The retrograde hazard and survival curves used in the backcasting method for myocardial infarction show that the prevalent pool persisted for 13 years according to the method for individual patients. The powers and coefficients for the fractional polynomial fitted to the hazard curve are given in Appendix 2. The correction factors are given in Table 2. The trend over time in the observed and corrected incidence rates of new cases of hospitalized myocardial infarction in Perth, Western Australia, compared with the incidence rate determined from the MONICA database, is shown in Figure 4.
Discussion
The problems with the first two methods, the first-time admission method and the clearance period method, are illustrated in Figure 3. The observed line represents the results of the first-time admission method; a clearance period of 5 years corresponds to ignoring the results before 1985. The corrected lines show the results obtained from the backcasting method. Comparison of the observed and corrected lines shows that the first-time admission method substantially overestimates the incidence rate of new hospital admissions if data covering a sufficiently long period are unavailable. The clearance period method is reasonably accurate, but it results in the loss of information from the early years of observation and depends on choosing the right duration for the clearance period.
The advantages of the backcasting method are that it can be used on hospital discharge data alone, it provides more accurate estimates of incidence rates than the first-time admission method or the use of a clearance period, and it does not require any data to be discarded, which is particularly important when measuring trends over time. The backcasting method also avoids the delays and costs associated with searching through medical records and dependence on alternative data sets. The backcasting methods using annual files and individual data produced very similar results, and thus either could be used, although the method for annual files is simpler.
Data on the incidence rate of diabetes in Australia and around the world are scarce. The crude community-based incidence rate in the US in 1994 was reported to be 3.7 per 1000 person-years (py), compared with 5.6/1000py in the Canadian province of Manitoba in 1991 and 2.8/1000py in 1991–1995 in Skaraborg, Sweden.29–31 Our crude estimate of the hospital-based incidence rate rose from 1.3/1000py in 1991 to 2.5/1000py in 1995. (The prevalence of diabetes in Australia is below the average for developed countries.32) Inevitably, the Western Australian figures obtained from linked hospital morbidity data do not include patients with mild diabetes mellitus who were never admitted to hospital for their condition. Nevertheless, the results suggest that either the incidence rate of moderate to severe diabetes has increased in the population, or the risk of admission to hospital, or the propensity for the diagnosis of diabetes to be recorded on hospital discharge abstracts, has increased.
The validity of the backcasting method for removing the prevalent pool effect is demonstrated by the myocardial infarction data in Figure 4. The hospital morbidity data tend to overestimate the incidence of myocardial infarction relative to the MONICA register. This is due to the more conservative definition of myocardial infarction under the MONICA criteria compared with clinical practice. The percentage of cases coded by MONICA as not acute myocardial infarction is less than 10% and remains fairly constant over time. Nevertheless, the prevalent pool effect is clearly visible in the earlier years. Application of the backcasting method removed the prevalent pool effect, such that the percentage overestimation of the incidence of myocardial infarction when using the hospital morbidity system became approximately constant at around 15% until 1991–1993, when it dropped to 10%. In monitoring trends, the validity of relative changes in rate is also important.
The accuracy of the incidence estimates presented will be affected by migration out of and into WA. Migration averaged 3.5% out of and 2.6% into WA during 1990–1998 and was most common in the younger age groups (25–40 years).33 In the future, linkage of the State Electoral Roll to the WA Linked Database should enable more accurate estimates of the effects of migration into and out of Western Australia. The accuracy of incidence measures from hospital morbidity data is also affected by trends in the hospitalization patterns of patients and by the propensity to record accurate information on discharge abstracts. It is important that the results of this study are interpreted in the light of changing patterns of hospital care and the changing quality and completeness of morbidity data. Nevertheless, provided patterns of hospital care and the performance of information systems can be assumed to have remained fairly steady, incidence rates based on linked administrative data can reasonably be construed as a reflection of underlying disease occurrence in the population.
We quantified the effect of the introduction of case-mix funding on diabetic admissions by applying a Cox regression model to examine the effect of calendar time on time to readmission, controlling for age and sex. This showed that the risk of readmission in the 1990s was up to 15% higher than in the 1980s, suggesting that some of the increase in the incidence of hospitalization was due to an increasing proportion of patients being admitted or identified compared with earlier years. It follows that the backcasting method is likely to overestimate the size of the prevalent pool effect and underestimate incidence in the earlier years. The size of this effect can be estimated by running the backcasting model on data for 1980–1990 only, which suggests that the level of underestimation is less than 3%.
The advantages of this method are its ease of use with routinely collected data and the relatively low cost of applying it in comparison with community surveys or maintaining formal disease registers. The availability of primary medical care data or data on pharmaceutical prescriptions would make a further substantial contribution to improving our estimates of the numbers of incident cases. In this paper we have illustrated the method using the example of diabetes mellitus, but it can be generalized to other chronic conditions that require hospitalization, for example, severe mental illness such as schizophrenia, cerebrovascular disease, peripheral vascular disease, chronic obstructive pulmonary disease, Parkinson's disease, multiple sclerosis and rheumatoid arthritis, and to other applications of linked data in health services research, such as the study of trends in first-time health care procedures and pharmaceutical prescriptions.
KEY MESSAGES
Appendix 1
$$
\begin{aligned}
1 - S(t) &= \Pr(\text{admission between } 0 \text{ and } t) \\
\Pr(\text{admission between } 0 \text{ and } t_j) &= 1 - S(t_j), \quad \text{where } t_j \text{ is the start of the study observation period} \\
\Pr(\text{admission between } 0 \text{ and } t_f) &= 1 - S(t_f) \\
\Pr(\text{admission between } t_j \text{ and } t_f) &= [1 - S(t_f)] - [1 - S(t_j)] = S(t_j) - S(t_f) \\
\Pr(\text{admission between } t_j \text{ and } t_f \text{ but not between } 0 \text{ and } t_j) &= S(t_j) - S(t_f) \\
C_j = \Pr(\text{true first admission}) &= 1 - \Pr(\text{admission between } t_j \text{ and } t_f \mid \text{no admission between } 0 \text{ and } t_j) \\
&= 1 - \frac{\Pr(\text{admission between } t_j \text{ and } t_f \text{ but not between } 0 \text{ and } t_j)}{\Pr(\text{no admission between } 0 \text{ and } t_j)} \\
&= 1 - \frac{S(t_j) - S(t_f)}{1 - [1 - S(t_j)]} \\
&= 1 - \frac{S(t_j) - S(t_f)}{S(t_j)} \\
C_j &= \frac{S(t_f)}{S(t_j)}, \qquad t_j < t_f \qquad (3a)
\end{aligned}
$$
Appendix 2
Appendix 3
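$$\hat{S}(t_f) \text{ at } t_f = 13 \text{ years} = 0.403, \qquad \hat{S}(t_j) \text{ at } t_j = 5 \text{ years} = 0.461$$

$$C_j = \frac{\hat{S}(t_f)}{\hat{S}(t_j)} = \frac{0.403}{0.461} = 0.874$$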
Acknowledgments
References
2 Hobbs M, McCall M. Health statistics and record linkage in Australia. J Chronic Dis 1970;23:375–81.
3 Gill L, Goldacre M, Simmons H, Bettley G, Griffith M. Computerised linking of medical records: methodological guidelines. J Epidemiol Community Health 1993;47:316–19.
4 Melton LJ III. History of the Rochester Epidemiology Project. Mayo Clin Proc 1996;71:266–74.
5 Kendrick S, Clarke J. The Scottish Record Linkage System. Health Bull 1993;51:72–79.
6 Roos NP, Black CD, Frohlich N et al. A population-based health information system. Medical Care 1995;33(12 Suppl.):DS13–20.
7 Chamberlayne R, Green B, Barer ML, Hertzman C, Lawrence WJ, Sheps SB. Creating a population-based linked health database: a new resource for health services research. Can J Public Health. Revue Canadienne de Sante Publique 1998;89:270–73.
8 Holman CDJ, Bass AJ, Rouse IL, Hobbs MS. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust NZ J Public Health 1999;23:453–59.
9 Smith M, Newcombe H. Automated follow-up facilities in Canada for monitoring delayed health effects. Am J Public Health 1980;70:1261–68.
10 Brown MH, Weinberg M, Chong N, Levine R, Holowaty E. A cohort study of breast cancer risk in breast reduction patients. Plastic & Reconstructive Surgery 1999;103:1674–81.
11 Morgan CL, Ahmed Z, Kerr MP. Health care provision for people with a learning disability. Record-linkage study of epidemiology and factors contributing to hospital care uptake. Br J Psychiatry 2000;176:37–41.
12 Sorensen HT, Thulstrup AM, Blomqvist P, Norgaard B, Fonager K, Ekbom A. Risk of primary biliary liver cirrhosis in patients with coeliac disease: Danish and Swedish cohort data. Gut 1999;44:736–38.
13 Brameld KJ, Holman CDJ, Thomas MAB, Bass AJ. Use of a state data bank to measure incidence and prevalence of a chronic disease: end-stage renal failure. Am J Kidney Dis 1999;34:1033–39.
14 Goldacre M, Shiwach R, Yeates D. Estimating incidence and prevalence of treated psychiatric disorders from routine statistics: the example of schizophrenia in Oxfordshire. J Epidemiol Community Health 1994;48:318–22.
15 The Nova Scotia–Saskatchewan Cardiovascular Disease Epidemiology Group. Trends in incidence and mortality from acute myocardial infarction in Nova Scotia and Saskatchewan 1974 to 1985. Can J Cardiology 1992;8:253–58.
16 McLean M, Duclos P, Jacob P, Humphreys P. Incidence of Guillain-Barre Syndrome in Ontario and Quebec, 1983–1989, using hospital service databases. Epidemiology 1994;5:443–48.
17 Huff L, Bogdan G, Burke K et al. Using hospital discharge data for disease surveillance. Public Health Rep 1996;111:78–81.
18 Primatesta P, Goldacre MJ. Crohn's disease and ulcerative colitis in England and the Oxford record linkage study area: a profile of hospitalized morbidity. Int J Epidemiol 1995;24:922–28.
19 Toniolo P, Pisani P, Vigano C, Gatta G, Repetto F. Estimating incidence of cancer from a hospital discharge reporting system. Rev Epidemiol Sante Publique 1986;34:23–30.
20 Kaplan EL, Meier P. Non-parametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457–81.
21 Royston P, Altman D. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with discussion). Appl Stat 1994;43:429–67.
22 National Coding Centre. Australian Version of the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). Tabular List of Diseases. First edn. Sydney, Australia: National Coding Centre, 1995.
23 Chancellor AM, Swingler RJ, Fraser H, Clarke JA, Warlow CP. Utility of Scottish morbidity and mortality data for epidemiological studies of motor neuron disease. J Epidemiol Community Health 1993;47:116–20.
24 Koobatian TJ, Birkhead GS, Schramm MM, Vogt RL. The use of hospital discharge data for public health surveillance of Guillain-Barre syndrome. Ann Neurol 1991;30:618–21.
25 Tyndall RM, Clarke JA, Shimmins J. An automated procedure for determining patient numbers from episode of care records. Med Inform 1987;12:137–46.
26 Glatthaar C, Whittall DE, Welborn TA et al. Diabetes in Western Australian children: descriptive epidemiology. Med J Aust 1988;148:117–23.
27 Heliovaara M, Reunanen A, Aromaa A, Knekt P, Aho K, Suhonen O. Validity of hospital discharge data in a prospective epidemiological study on stroke and myocardial infarction. Acta Med Scand 1984;216:309–15.
28 Kokmen E, Beard CM, O'Brien PC, Kurland LT. Epidemiology of dementia in Rochester, Minnesota. Mayo Clin Proc 1996;71:275–82.
29 Blanchard J, Ludwig S, Wajda A et al. Incidence and Prevalence of Diabetes in Manitoba, 1986–1991. Diabetes Care 1996;19:807–11.
30 Berger B, Stenstrom G, Sundkvist G. Incidence, prevalence and mortality of diabetes in a large population. A report from the Skaraborg Diabetes Registry. Diabetes Care 1999;22:773–78.
31 Centers for Disease Control and Prevention. Trends in the prevalence and incidence of self-reported diabetes mellitus: United States, 1980–1994. MMWR 1997;46:1014–18.
32 King H, Aubert RE, Herman WH. Global Burden of Diabetes, 1995–2025. Diabetes Care 1998;21:1414–31.
33 Australian Bureau of Statistics. Migration. Canberra: Australian Bureau of Statistics, 2000.