Estimation of the incidence of stroke using a capture-recapture model including covariates

Kate Tilling (a), Jonathan AC Sterne (b) and Charles DA Wolfe (a)

a Department of Public Health Sciences, King's College London, Capital House, 42 Weston Street, London SE1 3QD, UK.
b Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Bristol BS8 2PR, UK.

Dr Kate Tilling, Department of Public Health Sciences, 5th Floor, Capital House, 42 Weston Street, London SE1 3QD, UK. E-mail: kate.tilling@kcl.ac.uk

Abstract

Background Capture-recapture is often used to assess completeness of a register. However, the usual two-source model relies on assumptions of independence of sources and equality of capture probability which are rarely satisfied in epidemiology. An alternative is to include covariates in capture-recapture models.

Methods We use capture-recapture models including covariates to estimate incidence of stroke in South London. We estimate ascertainment-adjusted age-standardized incidence rates, and calculate confidence intervals for incidence which allow for the uncertainty in estimation of the total number of cases.

Results The crude capture-recapture model (including no covariates) underestimated the number of non-fatal strokes. Demographic and stroke severity variables were associated with the probability of capture. Including covariates led to more plausible results for fatal and non-fatal strokes, and suggested that the stroke register was 88% complete. Adjusting for under-ascertainment increased the estimated incidence from 1.31 (95% CI : 1.21–1.42) to 1.49 (95% CI : 0.38–2.60) per 1000 people.

Conclusions Incidence and age-standardized incidence can be calculated using data from an incomplete register. However, sparse strata can lead to wide confidence intervals for adjusted rates. Cost-effectiveness of routine registers might be increased by using the combination of sources and covariates which most accurately estimates the total number of cases, rather than by aiming for 100% completeness.

Keywords Stroke, incidence, registries, epidemiological methods

Accepted 11 April 2001

Registers have been used to study the incidence of diseases, particularly cancer, for which national registers provide routine data. They may be used to inform health policy and purchasing, to predict the public health impact of a disease, and to identify factors associated with incidence. Comparisons between different areas may also be used to identify ecological differences that might be associated with incidence.

These uses require that a register accurately estimates the number of individuals with the disease (cases). It should either be 100% complete, or provide an estimate of the number and type of cases missed. One method for estimating the number of cases missed is capture-recapture,1 a technique originally developed to estimate the size of animal populations. At one time as many animals as possible in a given area are caught, tagged and released (‘capture’), then at a later time this is repeated (‘recapture’). The number of animals in each sample, and the number common to both are used to estimate the population size. This simple technique was first used to estimate the size of a human population in the 1940s,2 and became more widespread following the work of Wittes in the 1970s.3–5 In epidemiology the two sources are often lists, for instance hospital records and death certificates. The methodology has been extended to include more than two sources.1

The two-source capture-recapture estimate is appealing because of its simplicity and ease of calculation (most often using log-linear models). However, estimating the total number of cases relies on two important and related assumptions: independence of sources and equal catchability. These assumptions are unlikely to be satisfied in epidemiological applications. For example, consider a cancer registry with death certificates and hospital records as the two sources of notification. Severe cases are more likely to have cancer recorded on both death certificates and hospital records, so there will be positive dependence between these sources. Also, severe cases are more likely to be captured than less severe cases.

Solutions include stratifying by the variable(s) which may cause dependence between sources (e.g. case severity) and performing a separate capture-recapture analysis for each stratum,2 or including the covariate in a log-linear model.6,7 If there is one categorical covariate then these two approaches are identical. Stratification by many variables decreases sample size and thus increases variation of the estimate for each stratum, especially if strata are sparse. A recently developed approach is to use a multinomial logit model to relate patient characteristics to probability of capture.8 In simulations, inclusion of capture-related covariates improved the accuracy of the estimate of the total number of cases.9

In this paper we use likelihood ratio tests to select the covariates to be included in a multinomial logit capture-recapture model. We use the parameter estimates from the model to estimate the sizes of population subgroups and thus calculate age-standardized incidence rates. We also derive confidence intervals for incidence which allow for the uncertainty in estimating the total number of cases.

Methods

The South London Stroke Register
The South London Stroke Register (SLSR) has recorded first strokes in a geographically defined area of South London since January 1995. The methods used to register cases have been described elsewhere,10 and include the use of 14 different sources of notification of cases.

The research team made an active search (‘hot’ pursuit) for cases by: twice weekly telephone contact with all wards (e.g. acute medical, elderly care) in St Thomas', Guy's, King's and St George's Hospitals likely to admit stroke patients (hereafter referred to as notification source ‘Research Team’); 2-monthly examination of death certificates and coroners' reports (‘Death Certificate’); searching records from the brain imaging departments and contacting the Bereavement Officer. Notifications to the research team were also encouraged. These included notification by those carrying out ward rounds in any of the study hospitals (‘Hospital Staff’) and by general practitioners (‘GP’).

The database was searched as each new stroke was registered, to ensure that there were no duplicate registrations. No difficulties in matching between sources were encountered. Initial assessments of each patient and confirmation of an incident stroke were performed by one of the study doctors.

The data analysed here consist of all cases from the first 2 years of the register (1 January 1995 to 31 December 1996). Ethnicity was defined as ‘White’, ‘Black’ (including African-Caribbean, black African) and ‘Other’ (including Asian, Pakistani, Indian, Bangladeshi, Chinese, and other). Only ‘White’ and ‘Black’ subgroups were used to examine incidence rate ratios by ethnicity, due to small numbers of ‘Other’ cases. The population denominators used to calculate incidence of stroke were the 1996 adjusted estimates of the 1991 census data from the Office for National Statistics. Incidence rates were adjusted for age using the standard World population.
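The age adjustment described above is direct standardization: age-specific rates are averaged using the age distribution of a standard population as weights. The following is a minimal sketch of that calculation, not the authors' code. The function name, the two age bands and all numbers in the usage note are hypothetical, and the weights shown are illustrative rather than the actual standard World population.

```python
def directly_standardized_rate(cases, person_years, std_weights, per=1000):
    """Directly age-standardized incidence rate.

    cases, person_years and std_weights are parallel sequences with one
    entry per age band. Weights need not sum to 1; they are normalized here.
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age band")
    total_weight = sum(std_weights)
    weighted = sum(
        w * c / py for c, py, w in zip(cases, person_years, std_weights)
    )
    return per * weighted / total_weight
```

For example, with hypothetical inputs `cases=[10, 30]`, `person_years=[10000, 5000]` and `std_weights=[70, 30]`, the age-specific rates are 1 and 6 per 1000 and the standardized rate is 2.5 per 1000. For a 2-year registration period, person-years would be roughly twice the population denominator. The same function could be applied to ascertainment-adjusted case counts in place of registered counts.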

Capture-recapture methods
Analyses used the multinomial logit model. Where there are only categorical covariates, this is equivalent to the usual log-linear model including appropriate interaction terms. However, continuous covariates can only be included in the logit model.9

Three-source analyses assumed dependence between all pairs of sources, as this has been shown in both simulation11 and empirical12 studies to result in confidence intervals with the best coverage.

Covariate selection
The selection of covariates to be included in a capture-recapture model has received little attention. One strategy is to include all possible covariates—however, this is impractical unless there are many cases. We used a backwards stepwise procedure (using likelihood ratio tests, with a P-value of >0.2 as the criterion for removing variables from the model13) to eliminate covariates, starting with a full model including all potential covariates.
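The backwards stepwise procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `fit_loglik` is a hypothetical callable that fits the capture-recapture model with the given covariates and returns its maximized log-likelihood; at each step the covariate with the largest P-value is dropped if that P-value exceeds 0.2 (the text does not say which covariate is removed first); and the chi-squared tail probability uses the closed form that holds only for even degrees of freedom, which fits a two-source multinomial logit model where each covariate term adds parameters in pairs.

```python
import math


def chi2_sf_even_df(stat, df):
    """Upper-tail probability of a chi-squared variable (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires positive even df")
    half = stat / 2.0
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


def backward_eliminate(covariates, fit_loglik, df_per_covariate, p_remove=0.2):
    """Drop covariates one at a time while the largest LR-test P > p_remove."""
    current = list(covariates)
    while current:
        ll_full = fit_loglik(current)
        p_values = {}
        for cov in current:
            reduced = [c for c in current if c != cov]
            # Likelihood ratio statistic for removing this covariate.
            stat = max(0.0, 2.0 * (ll_full - fit_loglik(reduced)))
            p_values[cov] = chi2_sf_even_df(stat, df_per_covariate[cov])
        weakest = max(p_values, key=p_values.get)
        if p_values[weakest] <= p_remove:
            break  # every remaining covariate meets the retention criterion
        current.remove(weakest)
    return current
```

The function returns the covariates retained in the final model. Refitting after every removal matters because P-values change as other covariates leave the model.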

Table 1 shows the demographic and stroke severity data considered for inclusion in the capture-recapture analysis. For continuous covariates, we compared estimates of the total number of cases using the linear effect with those derived from the model using quantiles of the covariate and used the quantile with fewest categories consistent with our criterion for change in the log-likelihood. An exception was made for ‘time to notification’ (56% of cases notified within a week, maximum length of time 713 days) for which the categorization chosen a priori was: <2 weeks; 2 weeks–3 months; >3 months.


Table 1 Variables considered for inclusion in the capture-recapture models for non-fatal and fatal cases
 
Estimating the size of population subgroups
As described elsewhere,8 the final model was used to estimate probability of capture for each registered case. The total number of cases was estimated by summing the inverse of the probability of capture across all registered cases. Similar logic was used to estimate the sizes of population subgroups. For example, to estimate the total number of men with an incident stroke, the inverse of the probability of capture was summed across all registered men. The estimated sizes of age- and sex-defined subgroups of the population were used to calculate age- and sex-standardized ascertainment-adjusted incidence rates.
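The summation of inverse capture probabilities can be sketched for the two-source case. The sketch assumes the two sources are independent given the covariates, so that a case's probability of being missed by both sources follows from its three fitted pattern probabilities (source A only, source B only, both), each conditional on being captured at all. Under that assumption the inverse probability of capture is 1 + q_A q_B / q_AB. The function names and the tuple layout are hypothetical, and this is not the authors' code.

```python
def inverse_capture_weight(q_a_only, q_b_only, q_both):
    """1 / P(captured by at least one source) for one registered case.

    Inputs are the fitted probabilities of each capture pattern, conditional
    on capture (they sum to 1). Assumes independence of the two sources
    given the case's covariates.
    """
    return 1.0 + q_a_only * q_b_only / q_both


def estimate_sizes(cases):
    """Estimate the total and subgroup sizes.

    cases: iterable of (subgroup_label, q_a_only, q_b_only, q_both),
    one tuple per registered case.
    Returns (estimated_total, {subgroup_label: estimated_size}).
    """
    sizes = {}
    for label, q_a, q_b, q_ab in cases:
        weight = inverse_capture_weight(q_a, q_b, q_ab)
        sizes[label] = sizes.get(label, 0.0) + weight
    return sum(sizes.values()), sizes
```

With no covariates, every case has the same pattern probabilities and the total reduces to the usual two-source estimate. With covariates, each registered case stands in for a different number of missed cases, which is why subgroup sizes can be estimated by summing within the subgroup.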

Examining the plausibility of the model
Usual estimates of model goodness-of-fit are of little use in capture-recapture because they assess the fit of the model to the observed data rather than to the whole population.

Although several sources of notification are used in the SLSR, the capture-recapture analyses used only those sources notifying substantial numbers of cases. In this case, where the total number of cases registered is greater than the number used to develop the capture-recapture model, model plausibility can be examined by comparing the capture-recapture estimate to the total number of cases registered. If the capture-recapture estimate is less than the total number of cases registered, this indicates that the estimate is implausibly small. Knowledge about the likely range for the total number of cases (e.g. from other studies of incidence, fieldwork information about the likely completeness of the register, etc) was used to assess whether the model estimate of total number of cases was implausibly large. The population was also divided into subgroups (by both demographic and case-severity variables, including variables which defined subgroups of interest but were not included in the model) and the same checks on model plausibility repeated.

Confidence intervals for incidence rates
The usual confidence intervals for incidence rates assume that the number of cases is known. Where the total number of cases is estimated using a capture-recapture model, confidence intervals for incidence rates should include the uncertainty in this estimate. We used imputation methods,14 developed for the analysis of datasets with missing data, to derive more appropriate confidence intervals as follows:

  1. use bootstrap methods to make a new dataset by sampling with replacement
  2. fit the final capture-recapture model to this new dataset
  3. estimate the total number of cases using the capture-recapture model. Use this to estimate the incidence and its standard error, $\sigma_i$
  4. repeat the above bootstrap procedure M times, giving M estimates of the incidence. Let the variance of these M estimates of the incidence be $V_B$
  5. the total variance, $V_T$, is given by the usual multiple-imputation combining rule:14

\[
V_T = \frac{1}{M}\sum_{i=1}^{M}\sigma_i^{2} + \left(1 + \frac{1}{M}\right)V_B
\]

We used 1000 bootstrap samples to derive confidence intervals.
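The bootstrap procedure can be sketched for the simplest case, a two-source model with no covariates. Several things here are assumptions rather than the authors' method: the within-sample variance treats the estimated count as Poisson, the total variance is taken as the mean within-sample variance plus (1 + 1/M) times the between-sample variance (the standard multiple-imputation rule), and the function names, capture-history labels and `person_years` argument are hypothetical.

```python
import random
import statistics


def two_source_estimate(n_a_only, n_b_only, n_both):
    """Two-source capture-recapture estimate of the total number of cases."""
    if n_both == 0:
        raise ValueError("no overlap between sources; estimate undefined")
    return n_a_only + n_b_only + n_both + n_a_only * n_b_only / n_both


def bootstrap_total_variance(histories, person_years, m=1000, seed=0):
    """histories: list of 'A', 'B' or 'AB', one entry per registered case.

    Returns (mean_rate, mean_within_variance, between_variance, total_variance)
    for the incidence rate per person-year.
    """
    rng = random.Random(seed)
    rates, within = [], []
    for _ in range(m):
        # Step 1: resample registered cases with replacement.
        sample = rng.choices(histories, k=len(histories))
        # Steps 2-3: refit (here, recompute) the estimate and the incidence.
        n_hat = two_source_estimate(
            sample.count("A"), sample.count("B"), sample.count("AB")
        )
        rates.append(n_hat / person_years)
        # Assumed Poisson variance of the rate, treating n_hat as known.
        within.append(n_hat / person_years ** 2)
    v_between = statistics.variance(rates)        # step 4
    w_mean = statistics.fmean(within)
    v_total = w_mean + (1 + 1 / m) * v_between    # step 5
    return statistics.fmean(rates), w_mean, v_between, v_total
```

A 95% confidence interval would then be the point estimate plus or minus 1.96 times the square root of the total variance. For a model with covariates, the call to `two_source_estimate` would be replaced by refitting the full model to each resampled dataset.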

Results

Of the 616 stroke patients registered during 1995 and 1996, 416 survived to 3 months post-stroke. There were 949 separate notifications of the 616 strokes (Table 2), with the research team notifying the largest proportion of patients. The sources of notification differ for non-fatal and fatal strokes (Table 2), so fatal and non-fatal cases were analysed separately.


Table 2 Number of cases identified by each source of notification
 
Non-fatal strokes
Of the 416 non-fatal strokes, 173 (42%) were identified by two, 8 (2%) by three and 1 (0.2%) by four distinct sources. Figure 1 shows a Venn diagram of capture by the three most common sources of notification: GP, research team (R) and hospital staff (H), including the eight non-fatal cases registered by other sources. There is very little overlap between notification by the GP and the other two main sources of notification. Those notified by the GP tended to be older and not admitted to hospital.



Figure 1 Capture of 416 non-fatal cases by GP, research team (R) and hospital staff (H)

 
Three-source analysis
A capture-recapture analysis of the 408 cases notified by either the GP, the research team or hospital staff was carried out, using a model including all two-way interactions. This model estimated the total number of non-fatal strokes as 495 (95% CI : 422–825). This estimate is plausible, as it is greater than the known minimum number of non-fatal strokes (416). Because of the lack of overlap between notification by GP and the other two sources, inclusion of covariates in the model led to very small numbers of cases in some population subgroups, and thus unstable estimates of the total number of cases.

Two-source analysis
Only 11 stroke patients were notified by both the GP and the hospital staff, and only 15 by both the GP and the research team. The two-source analysis therefore concentrated on the 350 cases captured by either the hospital staff or the research team. Of these, 30 (9%) patients were notified by hospital staff alone, 181 (52%) by the research team alone, and 139 (40%) by both hospital staff and research team.

The usual capture-recapture method (including no covariates) with these two sources estimates the number of non-fatal strokes as 389 (bootstrap 95% CI : 375–411); a clear underestimate of the true number since 416 non-fatal strokes are known to have occurred. This implies positive dependence between the two sources (i.e. patients notified by the hospital staff are likely to be notified also by the research team).
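The crude estimate can be reproduced from the three counts given above. The sketch below uses the standard two-source formula (often called the Lincoln-Petersen estimator): the product of the two source totals divided by the number of cases found by both. The function and argument names are illustrative.

```python
def two_source_estimate(n_a_only, n_b_only, n_both):
    """Estimated total cases from two sources assumed to be independent."""
    n_a = n_a_only + n_both   # all cases notified by source A
    n_b = n_b_only + n_both   # all cases notified by source B
    return n_a * n_b / n_both


# Hospital staff only = 30, research team only = 181, both = 139.
non_fatal = two_source_estimate(30, 181, 139)
print(round(non_fatal))  # 389
```

Because 389 is below the 416 non-fatal strokes actually registered, the independence assumption behind this formula cannot hold for these two sources.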

The relationship between all covariates considered for inclusion in the model and the combination of notification sources is shown in Table 3. There was no evidence of a univariate association between any of the demographic variables and the sources of notification. Those notified by the hospital staff only were more likely to be: fully conscious; notified by the GP; independent in ADL (activities of daily living) at 7 days; not in hospital; or have a longer gap from stroke to notification than those notified by the research team.


Table 3 Relationships between covariates and sources of notification for 350 non-fatal cases captured by hospital staff or research team (numbers are per cent unless stated otherwise)
 
Using the capture-recapture model including all covariates, the estimated total number of non-fatal strokes was 432 (bootstrap 95% CI : 370–708). The parameters for the final model (after removal of covariates as described in Methods) are shown in Table 4. Independence in ADL pre-stroke, no admission to hospital, notification by a GP, short hospital stay and long delay before notification to the register were all associated with decreased likelihood of being notified by the research team. The estimated number of non-fatal strokes from this final model was 477 (bootstrap 95% CI : 379–1144). This estimate is greater than the known number of non-fatal strokes (416), so appears more plausible than the crude estimate of the total number of cases.


Table 4 Final model for capture of non-fatal cases (figures shown are the ratios of probability of capture by different sources of notification compared to capture by hospital staff only)
 
To examine model plausibility, we estimated the number of cases in each of the subgroups formed by the demographic and stroke-severity measures considered for inclusion in the model (variables listed in Table 1). For all subgroups except patients who were not fully conscious, the estimated number of cases is greater than or equal to the known minimum. For this subgroup, the estimated number of cases (116) is smaller than the known minimum (119).

Fatal strokes
Of the 200 fatal strokes (patients not surviving to 3 months after the stroke), 91 (46%) were identified by one, 78 (39%) by two, 30 (15%) by three and 1 (0.5%) by four distinct sources of notification. Death certificates (DC), research team (R) and hospital staff (H) were the most common sources of notification (Table 2). Figure 2 shows a Venn diagram of capture by these three sources, including 18 cases notified only by sources other than these.



Figure 2 Capture of 200 fatal cases by death certificates (DC), research team (R) and hospital staff (H)

 
Three-source analysis
A capture-recapture analysis of the 182 cases notified by either death certificates, the research team or hospital staff was carried out, using the model including all two-way interactions. This model estimated the total number of fatal strokes as 283 (95% CI : 186–985). This is a plausible estimate, as it is greater than the known minimum number of fatal strokes (200). However, this estimate would imply that registration of fatal strokes was only 71% complete, which is low given the extensive efforts at case-finding made by this register.

Two-source analysis
Because there was little overlap between any three sources of notification, a two-source capture-recapture analysis was performed using the two most common sources of notification: death certificates and the research team.

Of the 177 cases identified by either of these two sources, 54 (31%) were identified by the research team alone, 53 (30%) by death certificates alone and 70 (40%) by both sources. Using simple capture-recapture methods with these two sources estimates the number of fatal strokes as 218 (bootstrap 95% CI : 203–238). This is larger than the known minimum number of fatal strokes (200), so is a plausible estimate.

The relationship between each of the variables considered for inclusion in the model and the combination of notification sources is shown in Table 5. None of the demographic variables was associated with source of notification. All stroke severity variables (except level of consciousness) were associated with source of notification. The research team was more likely to capture patients surviving longer; without stroke as the first cause of death on the death certificate; who were not first notified at death; who were admitted to hospital; or who had a short delay before notification.


Table 5 Relationships between covariates and sources of notification for 177 fatal cases notified by research team or death certificate
 
Using the capture-recapture model including all covariates, the estimated number of fatal strokes was 198 (bootstrap 95% CI : 69–255). The parameters for the final model are shown in Table 6. Those with a long survival time were more likely to be captured by the research team, while those with stroke as the first cause of death or a long delay to notification were more likely to be notified by death certificate only. The estimated number of fatal strokes from this final model was 222 (bootstrap 95% CI : 165–331).


Table 6 Final model for capture of fatal cases (figures shown are the ratios of probability of capture by different sources of notification compared to capture by death certificate only)
 
For all subgroups defined by the variables considered for inclusion in the model (Table 1), except non-hospitalized cases, the estimated number of cases was greater than or equal to the known minimum. The estimated number of non-hospitalized cases (10) is smaller than the true number (18). These cases seem to have a different relationship with probability of capture than hospitalized cases (Table 5). In particular, 71% of non-hospitalized cases were captured by death certificate only, compared to 28% of hospitalized cases.

Two-source versus three-source models
For both fatal and non-fatal strokes, use of any three sources of notification led to small numbers of cases being captured by some combinations of sources. The estimate of the total number of non-fatal strokes from the three-source model (495) was similar to that from the two-source model including covariates (477). For fatal strokes, the estimate from the three-source model (283) was implausibly high.

Models with no covariates assume that all cases have equal probability of being captured. Thus, even if models with and without covariates estimate similar total numbers of cases, the estimated numbers in specific subgroups of the population will differ. Prior knowledge about the type of patients likely to be missed by the register suggests that all patients do not have an equal probability of capture. We therefore use the two-source model, including covariates, to estimate the incidence of stroke, as this relies on accurately estimating the numbers in age- and sex-defined subgroups of the population.

Incidence of stroke
The incidence rate in the SLSR population was calculated using the 616 cases registered over the 2-year period. This is shown in Table 7, together with the age-standardized rate, and crude and age-standardized incidence rates for men and women, and white and black people. Table 7 also shows corresponding estimates adjusted for under-ascertainment using the capture-recapture model. Adjusting for under-ascertainment led to a negligible decrease in the incidence rate ratio for men compared to women from the uncorrected estimate of 1.24 (95% CI : 1.1–1.5) to the ascertainment-corrected estimate of 1.23 (95% CI : 0.69–2.19). However, the incidence rate ratio for black compared to white people decreased from the crude estimate of 2.18 (95% CI : 1.8–2.7) to the ascertainment-corrected estimate of 1.93 (95% CI : 1.05–3.54).


Table 7 Number of strokes and annual incidence of stroke (per 1000 population) in different subgroups
 
Discussion

Our results imply that the SLSR is approximately 88% complete, and that both demographic and stroke severity characteristics are associated with the probability of notification to the register. For non-fatal stroke, hospitalized cases are more likely to be notified than non-hospitalized cases. This agrees with the a priori view that some of the most difficult cases to identify are ‘non-fatal cases in the community’.10 Increased length of stay in hospital (non-fatal cases) and longer survival time (fatal cases) are associated with increased chance of notification—probably because of the increased chance of the patient being identified by the research team telephone ward enquiry.
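As a check on the arithmetic, the 88% figure appears consistent with dividing the 616 registered cases by the sum of the final-model estimates for non-fatal (477) and fatal (222) strokes. The same ratio, applied to the crude incidence of 1.31 per 1000 reported in the Abstract, gives approximately the adjusted crude rate of 1.49 per 1000. This is a reader's reconstruction from the reported figures, not a calculation stated by the authors:

```latex
\[
\text{completeness} \approx \frac{616}{477 + 222} = \frac{616}{699} \approx 0.88,
\qquad
1.31 \times \frac{699}{616} \approx 1.49 \ \text{per 1000}.
\]
```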

Accounting for the under-ascertainment by the register increased the age-standardized incidence rate from 0.83 to 0.96 per 1000 people. However, these methods do not merely estimate a higher number of cases, but cases with a different distribution of characteristics. For example, more white people than black people are estimated as being missed by the register, leading to a slight decrease in the incidence rate ratio for black compared to white people. These methods also provide a sensitivity analysis for the elevated risk in black people, which still remains after adjustment for under-ascertainment.

Model parameters can be used to highlight deficiencies in the register by identifying particular subgroups with low registration rates. For example, the likelihood of notification to the SLSR is lower for those fatal cases where stroke is not the primary cause of death. Registration could perhaps be improved by examining the common causes of death for a sample of such individuals, and extending the search of death certificates to include these additional primary causes of death. Notification of cases by GPs could be increased—in contrast with other incidence studies,15 the percentage of cases notified by GP was low (14%), with only 55 (30%) of 182 practices notifying any patients to the register over the 2-year period. This could be due to the large number of practices in this inner-city area and the difficulty of keeping all practices informed about the register. A regular newsletter is now sent out to all practices, and ‘reminder’ stickers for computer screens are being supplied.

Previous epidemiological applications of capture-recapture methods have mostly been simple two- or three-source examples (using log-linear models), with stratification used to overcome dependence between sources. Models including the effects of covariates on the probability of capture have been used only rarely.16,17 Other epidemiological applications including the effects of covariates have used linear models fitted to the logarithm of the number of cases in each combination of sources7 and log-linear, linear product models.6

Here we examined the effect of inclusion of covariates on the estimate of the number of strokes in a population, using two-source models. For both fatal and non-fatal cases there was evidence of dependence between the two sources. In each case, inclusion of covariates appeared to reduce the bias and resulted in plausible estimates of the total number of cases. Availability of covariate information also allowed us to check that the model produced reasonable estimates of the size of different subgroups.

We selected the final model using backwards stepwise regression based on the likelihood ratio. Standard theory justifying this would be applicable if we had the data on all cases, including the uncaptured cases. Use of this procedure on the incomplete data may give different results, because of either loss of power or different characteristics of those not captured. An alternative (motivated by a recommendation of Sekar and Deming2) would be to base model selection on changes to the estimated total number of cases made by covariate exclusion. Such a strategy performed best in selecting covariates for inclusion in a Poisson model when the cut-off for the estimate change was low.13 We have used simulated data to compare the two approaches in the capture-recapture context; results suggested that the likelihood ratio method was a little less biased, and gave rise to confidence intervals with higher coverage, than the method based on changes in the estimated number of cases (details available from the authors).

Because several sources registered very few cases, the capture-recapture models were developed using fewer cases than the actual number registered. This allowed us to compare model estimates and actual numbers in the whole population and in different subgroups of the population to check model plausibility. This procedure might also identify subgroups of the population for whom the model does not fit well, possibly leading to the inclusion of further covariates or interactions in the model.

The confidence intervals surrounding the estimates of the total number of cases from models including covariate effects were generally wide. Large sample sizes may be needed to estimate covariate effects and therefore number of cases with precision. Conventional models will lead to narrower confidence intervals, because they are based on the (often unrealistic) assumption that probability of capture is homogeneous across the entire population. There may be unobserved covariates with stronger relationships to probability of capture, inclusion of which could have increased the precision of the estimates.

Epidemiologists have mostly used capture-recapture only to estimate completeness of a register: ascertainment-adjusted estimates of incidence have been calculated only occasionally. We have shown that by estimating the size of population subgroups, ascertainment-corrected standardized estimates of incidence can be calculated. This is particularly relevant where probability of capture depends on patient characteristics (such as severity of disease), or where the age distribution in the captured cases may be very different to that in uncaptured cases.

We have shown that age-standardized incidence rates and incidence rate ratios can be calculated using an incomplete register. Losses in precision from using an incomplete register could be balanced against the decrease in resource use. It may thus be possible for routine monitoring to take place using a register which does not aim to be exhaustive, but which uses carefully planned sources and covariates to estimate the incidence rates of interest.


KEY MESSAGES

  • Simple capture-recapture methods, often used to estimate completeness of a register, make two unrealistic assumptions.
  • Inclusion of covariates can overcome these assumptions and identify under-represented subgroups of the population.
  • Using these methods, the South London Stroke Register was estimated to be 88% complete, with under-registration of non-hospitalized cases.
  • These methods allow age- and sex-standardized incidence rates to be calculated, even with a register that is not 100% complete.

 

Acknowledgments

Funding was from the Northern and Yorkshire Region Research and Development Programme and a Special Training Fellowship from the Medical Research Council. We thank the South London Stroke Register team for their important contributions.

References

1 Hook EB, Regal RR. Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995;17:243–64.

2 Sekar CC, Deming WE. On a method of estimating birth and death rates and the extent of registration. J Am Statist Assoc 1949;44:101–15.

3 Wittes JT, Colton T, Sidel VW. Capture-recapture methods for assessing the completeness of case ascertainment when using multiple information sources. J Chron Dis 1974;27:25–36.

4 Wittes JT. Applications of a multinomial capture-recapture model to epidemiological data. J Am Statist Assoc 1974;69:93–97.

5 Wittes J. A generalization of the simple capture-recapture model with applications to epidemiological research. J Chron Dis 1968;21:287–301.

6 Hilsenbeck SG, Kurucz C, Duncan RC. Estimation of completeness and adjustment of age-specific and age-standardized incidence rates. Biometrics 1992;48:1249–62.

7 Robles SC, Marrett LD, Clarke EA, Risch HA. An application of capture-recapture methods to the estimation of completeness of cancer registration. J Clin Epidemiol 1988;41:495–501.

8 Alho JM. Logistic regression in capture-recapture models. Biometrics 1990;46:623–35.

9 Tilling K, Sterne JAC. Capture-recapture models including covariate effects. Am J Epidemiol 1999;149:392–400.

10 Stewart JA, Dundas R, Howard RS, Rudd AG, Wolfe CD. Ethnic differences in incidence of stroke: prospective study with stroke register. Br Med J 1999;318:967–71.

11 Regal RR, Hook EB. The effects of model selection on confidence intervals for the size of a closed population. Stat Med 1991;10:717–21.

12 Domingo-Salvany A, Hartnoll RL, Maguire A et al. Analytical considerations in the use of capture-recapture to estimate prevalence: case studies of the estimation of opiate use in the metropolitan area of Barcelona, Spain. Am J Epidemiol 1998;148:732–40.

13 Maldonado G, Greenland S. Simulation study of confounder-selection strategies. Am J Epidemiol 1993;138:923–36.

14 Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: John Wiley, 1987.

15 Bamford J, Sandercock P, Dennis M et al. A prospective study of acute cerebrovascular disease in the community: the Oxfordshire Community Stroke Project 1981–86. 1. Methodology, demography and incident cases of first-ever stroke. J Neurol Neurosurg Psychiatr 1988;51:1373–80.

16 Alho JM, Mulry MH, Wurdeman K, Kim J. Estimating heterogeneity in the probabilities of enumeration for dual-system estimation. J Am Statist Assoc 1993;88:1130–36.

17 van Charante AW, Mulder PG. Reporting of industrial accidents in The Netherlands. Am J Epidemiol 1998;148:182–90.