Efficiency of two-phase designs for prevalence estimation

R McNamee

Biostatistics Group, School of Epidemiology and Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK. E-mail: rmcnamee@man.ac.uk


    Abstract
Background Unbiased estimation of the prevalence of diseases and other conditions is important but can be expensive, especially for conditions which do not necessarily lead to contact with health services. A two-phase population survey may seem an attractive option when a relatively cheap, although fallible, test for disease status is available: the test is used in the first phase of the survey, but in the second only a subsample is classified by the relatively expensive gold standard. Previously, the cost efficiency of such studies compared with simple, one-phase random sample designs was investigated empirically, and some questions remain unclear.

Methods A simple formula for the maximum reduction in cost or standard error that can be achieved by two-phase sampling compared with simple random sampling is derived mathematically. A formula for the minimum reduction is also given, and the influence of prevalence on efficiency is explained.

Results The main result shows that the sensitivity and specificity of the first stage test set an absolute limit on the efficiency of two-phase designs; in particular, two-phase sampling can never be justified on efficiency grounds alone if the test is not accurate enough.


Keywords Two-phase designs, double sampling, two-stage designs, prevalence estimation, survey design, epidemiology, cost efficiency, statistical efficiency

Accepted 7 May 2003

Where occurrence of a disease or condition does not necessarily lead to clinical diagnosis, a population survey may be considered for prevalence estimation; however, this can be expensive if diagnosis requires clinical assessment. A two-phase survey may seem an economic alternative; first, a random sample of the population is classified by a relatively cheap, but fallible indicator of disease status; then, in the second phase, (stratified) random samples of first phase subjects undergo the ‘gold standard’, diagnostic procedure. Two-phase studies for prevalence estimation are popular in psychiatric research1,2 but have also been employed in studies of asthma,3 disablement,4 and Parkinson’s disease.5

A two-phase study might be considered for reasons besides cost. For example, in the second phase it is possible to target people who are more likely to have the disease; hence such studies may seem to make better use of clinical time1 and/or to be more ethical.6 A two-phase study can also arise by capitalizing on a previously free-standing survey of symptoms (say), adding a second phase based on gold standard diagnosis. This paper is concerned only with the efficiency of two-phase designs where the study budget encompasses both phases; it investigates the advantage, in terms of money saved or in terms of increased precision of estimation, of a two-phase design compared with a simple, ‘one-phase’, random sample design.

For a given study budget, there are many two-phase options depending on the balance between first and second phase total sample sizes and the sampling fractions for phase two. The most statistically efficient scheme, for a given cost, will be the one that results in the smallest standard error (SE) for the prevalence estimate. Alternatively, one could fix the SE required and then seek the most cost efficient, i.e. cheapest, two-phase design to achieve this. Formulae specifying the solutions to these problems, which are closely related, have been derived.7 From these, it has long been recognized that even the best (whether in statistical or cost terms) two-phase design can be less efficient than a one-phase design.1,2,8,9 This tends to occur when the cost per subject for the gold standard investigation (c2 monetary units) is not much higher than that for the first phase (c1 monetary units); as a rough rule, when c2/c1 < 5, the design will tend to be inefficient.

In many epidemiological applications, c2/c1 will be larger than this—for example the cost of a clinical interview might be 20 times or more the cost of administering a postal questionnaire—and so it is important to clarify the gains to be made in such cases. However, on this point the literature is not very clear; for example, according to Pickles and Dunn,2 an accurate test is needed (although they do not say how accurate) while others suggest that, if c2/c1 is large enough, the two-phase design ‘can be highly efficient’, even when the first phase indicator is ‘not especially accurate’.8 Deming9 specified a critical value for the negative predictive value of the first phase indicator, below which the two-phase design was not worthwhile, but no proof was given. Most authors conclude that the method is most efficient when prevalence is ‘low’. A problem with all these recommendations is that they appear to be based on a limited number of examples, which may restrict the generality of the advice.

This paper aims to provide easily understood, general insights into the efficiency gains from two-phase designs, compared with simple one-phase designs of the same cost. The main result shows that the maximum gain in efficiency is a simple function of the sensitivity and specificity of the first phase indicator; it emerges that, no matter how high the value of c2/c1, a two-phase design can never be very efficient if sensitivity and specificity are not high.


    The most efficient two-phase design
Consider the following two-phase design. In phase one, n subjects are chosen at random from the target population and classified into J (≥ 2) ordered categories using the first phase indicator. The second phase indicator (gold standard) is assumed to be completely accurate; it classifies each subject as either disease present or absent. The total number of subjects in phase two is m < n; these are chosen by stratified random sampling of first phase subjects, with the sampling fraction for the jth category being vj, j = 1,...,J. The costs per subject of first and second phase classifications are c1 and c2 monetary units respectively, with c2 > c1; the total study cost is C = nc1 + mc2.

Let

P = true prevalence of the disease in the target population,

Wj = probability that a subject is classified into the jth category in the first phase, and

Pj = probability that disease is present, given that the first phase category is j.

Then P = ΣjWjPj. An estimate of P, P̂ say, can be found by estimating {Wj} and {Pj} from the first and second phase data sets respectively.2,7,10
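The mechanics of the estimator can be illustrated with a short simulation. This is a minimal sketch only; the prevalence, test accuracy, first phase size, and second phase sampling fractions below are arbitrary illustrative values, not figures taken from the paper.

```python
import random

random.seed(1)

# Illustrative (not the paper's) values: P = 0.10, sensitivity 0.9, specificity 0.9,
# n = 5000 first-phase subjects, second-phase fractions 0.1 (test negative), 1.0 (test positive).
P_true, sens, spec = 0.10, 0.9, 0.9
n, v = 5000, (0.10, 1.00)

# Phase 1: classify every subject by the fallible test.
phase1 = []
for _ in range(n):
    diseased = random.random() < P_true
    test_pos = random.random() < (sens if diseased else 1 - spec)
    phase1.append((diseased, test_pos))

# Estimated category probabilities W_j from phase 1.
W_hat = [sum(1 for d, t in phase1 if t == j) / n for j in (False, True)]

# Phase 2: gold standard on a stratified random subsample; estimate P_j per category.
p_hat = []
for j, frac in zip((False, True), v):
    stratum = [d for d, t in phase1 if t == j]
    m_j = max(1, round(frac * len(stratum)))
    gold = random.sample(stratum, m_j)          # gold standard applied here
    p_hat.append(sum(gold) / m_j)

P_hat = sum(W * p for W, p in zip(W_hat, p_hat))
print(f"two-phase estimate of prevalence: {P_hat:.3f} (true value {P_true})")
```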

Now consider the problem where there is a fixed study budget available, CF, and we want to find the design which gives the smallest SE for P̂ within that budget; Cochran7 has shown that this is achieved by setting:

vj = √{c1Pj(1 − Pj) / [c2(P(1 − P) − ΣkWkPk(1 − Pk))]}   (1)

and

n = CF/(c1 + c2ΣjWjvj).

The standard error of P̂ is then7:

min SE2-phase = (1/√CF){√[c1(P(1 − P) − ΣjWjPj(1 − Pj))] + √c2 ΣjWj√[Pj(1 − Pj)]}   (2)

An alternative but related problem is to fix the size of the SE required, SEF say, in advance and then look for the design which achieves this at minimum cost. The solution to this problem has the same sampling fractions (1); the minimum study cost is

min C2-phase = {√[c1(P(1 − P) − ΣjWjPj(1 − Pj))] + √c2 ΣjWj√[Pj(1 − Pj)]}²/SEF²

and the corresponding first phase sample size is

n = [P(1 − P) − ΣjWjPj(1 − Pj) + ΣjWjPj(1 − Pj)/vj]/SEF².
Frequently the first phase indicator will have only two categories, ‘test negative’ (j = 1) and ‘test positive’ (j = 2). For simplicity, this is assumed in the remainder of the paper. Then P2 is the probability of disease being present given a positive test result, i.e. the positive predictive value of the test, while 1 – P1 is the negative predictive value. Both these parameters depend on prevalence and the specificity, S1, and sensitivity, S2, of the test, where:

S1 = specificity = probability that the test is negative, given that disease is absent, and

S2 = sensitivity = probability that the test is positive, given that disease is present.

In what follows, use of S1 and S2, instead of P1 and P2, will tend to lead to simpler results.

There are circumstances where the ‘optimal’ design described above cannot be implemented because (1) would suggest that one, or both, of the sampling fractions be greater than 1. To complete the specification of optimal two-phase design, additional rules are needed for what to do if this occurs. An extension of the rules for optimal design, covering this situation, is given in the Appendix. These additional rules will lead to either one, or both, fractions being fixed at unity, and to new formulae for min SE2-phase. To simplify the presentation here, the results of the next section are initially derived assuming that (1) always leads to 0 < v1 < 1 and 0 < v2 < 1. A discussion of how the results can be generalized beyond this restriction then follows, but the detailed justification is confined to the Appendix. It is also assumed hereafter that P ≤ 0.5 and S1 + S2 > 1. This last condition ensures that there is a positive correlation, ρ, between test and gold standard classifications. Tests for which this is true are said to be ‘legitimate’;11 an ‘illegitimate’ test is unlikely to be of interest.
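As a concrete illustration of the scheme just described, the sketch below applies the optimal fractions (1), crudely capped at one, together with the standard double-sampling (Cochran7) variance expression from which (2) follows, for a two-category test. The numerical inputs are arbitrary examples chosen for illustration, not values used in the paper.

```python
from math import sqrt

def optimal_two_phase(P, S1, S2, c1, c2, CF):
    """Optimal second-phase fractions (capped at 1), first-phase size and the
    resulting SE for a two-category first-phase test, for budget CF."""
    # Category probabilities and P(disease | category) implied by P, S1, S2.
    W = [(1 - P) * S1 + P * (1 - S2),            # test negative
         (1 - P) * (1 - S1) + P * S2]            # test positive
    Pj = [P * (1 - S2) / W[0], P * S2 / W[1]]

    between = P * (1 - P) - sum(w * p * (1 - p) for w, p in zip(W, Pj))
    # Equation (1), with the crude cap v <= 1 (the Appendix refines this rule).
    v = [min(1.0, sqrt(c1 * p * (1 - p) / (c2 * between))) for p in Pj]

    n = CF / (c1 + c2 * sum(w * vj for w, vj in zip(W, v)))
    var = (between + sum(w * p * (1 - p) / vj
                         for w, p, vj in zip(W, Pj, v))) / n
    return v, n, sqrt(var)

# Arbitrary illustrative inputs: P = 0.10, S1 = S2 = 0.95, c1 = 1, c2 = 20, CF = 10 000.
v, n, se = optimal_two_phase(0.10, 0.95, 0.95, 1, 20, 10_000)
se_one_phase = sqrt(20 * 0.10 * 0.90 / 10_000)   # simple random sample, same budget
print(f"v = {v}, n = {n:.0f}, SE = {se:.4f} (one-phase SE {se_one_phase:.4f})")
```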


    Efficiency compared with one-phase design
Suppose that, instead of a two-phase study, the budget, CF, is devoted to a ‘one-phase’, simple random sample design based on the gold standard classification only. This budget would allow a sample size of CF/c2 and prevalence would be estimated as a simple proportion whose SE would be √[c2P(1 − P)/CF]. The advantage of an optimally designed two-phase study with 0 < v1, v2 < 1 over a one-phase design is found by comparing (2) with this formula. The ratio of the smallest two-phase SE to the one-phase SE based on the same budget is:

min SE2-phase/SE1-phase = {ΣjWj√[Pj(1 − Pj)] + √[(c1/c2)(P(1 − P) − ΣjWjPj(1 − Pj))]}/√[P(1 − P)]   (3)

Alternatively, one could compare the minimum cost of a two-phase design chosen to achieve a given degree of precision, SEF, with the cost of a one-phase design giving the same SE. It can be shown that this ratio, minC2-phase/C1-phase, is equal to the square of (3).

Equation (3) is much simpler if the following substitutions are made: W1P1 = P(1 − S2), W1(1 − P1) = (1 − P)S1, W2P2 = PS2, W2(1 − P2) = (1 − P)(1 − S1). As shown in the Appendix, this gives

min SE2-phase/SE1-phase = √[S1(1 − S2)] + √[S2(1 − S1)] + ρ√(c1/c2)   (4)

where ρ is the Pearson correlation between the true classification and the test classification. This formula could be used directly to calculate the benefits from two-phase design for a given combination of prevalence, cost ratio c2/c1, sensitivity, and specificity. To facilitate this, a formula for ρ in terms of P, S1, and S2 is given in the Appendix. However, a general understanding of the benefits of two-phase design can be gained by examining, first an upper bound, and then a lower bound, for (4), as is now shown.
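For readers who do want the direct calculation, a minimal sketch of (4) is given below, using the expression for ρ in terms of P, S1, and S2 from the Appendix; the inputs shown are arbitrary illustrative values.

```python
from math import sqrt

def se_ratio(P, S1, S2, cost_ratio):
    """Equation (4): minimum two-phase SE divided by the one-phase SE for the
    same budget, with cost_ratio = c2/c1."""
    W1 = (1 - P) * S1 + P * (1 - S2)
    rho = (S1 + S2 - 1) * sqrt(P * (1 - P) / (W1 * (1 - W1)))   # Appendix (A2)
    return sqrt(S1 * (1 - S2)) + sqrt(S2 * (1 - S1)) + rho / sqrt(cost_ratio)

# Arbitrary example: prevalence 10%, S1 = S2 = 0.95, c2/c1 = 20.
print(f"SE ratio = {se_ratio(0.10, 0.95, 0.95, 20):.2f}")   # about 0.61
```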

The ratio (4) decreases as c2/c1 increases, provided P, S1, and S2 stay fixed; this confirms the well-known observation that the advantage of two-phase over one-phase designs with the same budget increases with the relative cost of the gold standard. Furthermore, the last term in (4) tends to zero as c2/c1 tends to infinity; therefore the SE ratio cannot be less than

√[S1(1 − S2)] + √[S2(1 − S1)].
This result can be expressed as an upper bound on the SE reduction:

(SE1-phase − min SE2-phase)/SE1-phase ≤ 1 − {√[S1(1 − S2)] + √[S2(1 − S1)]}   (5)

In the Appendix it is proven that this result also holds true for optimal designs where either v1 = 1 or v2 = 1. (The extreme case where v1 = 1 and v2 = 1 is discussed separately later.) The expression on the right of (5) is the maximum reduction in SE that can be achieved by two-phase design; no two-phase study can ever improve on this. It is shown in Table 1 for various values of sensitivity and specificity. The reduction is slight in many cases; for example when S1 = S2 = 0.7, it is only 8%. In these cases, inspection of this Table may be all that is needed to show that there is little to be gained from two-phase sampling.


Table 1 Maximum reduction in standard error of prevalence using an optimally designed two-phase study compared with a one-phase design of the same cost
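The entries of such a table are straightforward to generate; the sketch below tabulates the right-hand side of (5) over an illustrative grid of sensitivity and specificity values (the grid itself is arbitrary and need not match Table 1).

```python
from math import sqrt

# Maximum possible SE reduction, 1 - [sqrt(S1(1-S2)) + sqrt(S2(1-S1))], from (5).
grid = [0.70, 0.80, 0.90, 0.95, 0.99]
print("S2 \\ S1" + "".join(f"{s1:>7.2f}" for s1 in grid))
for S2 in grid:
    cells = [1 - (sqrt(S1 * (1 - S2)) + sqrt(S2 * (1 - S1))) for S1 in grid]
    print(f"{S2:7.2f}" + "".join(f"{100 * c:6.0f}%" for c in cells))
```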
 
Where this initial calculation is encouraging, more specific information about the expected reduction, taking account of the cost ratio c2/c1, will be useful. The reduction in SE will also depend on the unknown prevalence, P, but further insights can be gained without having to estimate P. This is done by calculating an upper bound for (4) which does not depend on P. Consider the correlation ρ in the second term of (4); for fixed S1 and S2, this varies with P but its theoretical maximum, ρmax, across the range 0 < P < 1 can be deduced mathematically. For example, when sensitivity and specificity are equal, ρmax = S1 + S2 − 1. (The more general formula for when S1 ≠ S2 is given by (A3) in the Appendix.) Hence the maximum of (4) for a given test and costs, but regardless of P, is

√[S1(1 − S2)] + √[S2(1 − S1)] + ρmax√(c1/c2).
This upper bound is based on the assumption that v1 < 1 and v2 < 1. Where this assumption is relaxed to include situations where optimal design leads to either v1 = 1 or v2 = 1, the result is not valid in all cases. However, empirical work (discussed later) suggests that it is valid in all cases where it predicts a reduction in SE from two-phase sampling. Therefore we suggest the following rule for predicting the ‘worst case’ reduction from two-phase sampling: calculate

√[S1(1 − S2)] + √[S2(1 − S1)] + ρmax√(c1/c2).
If this is less than one, then it is the maximum value of (4). This result can also be expressed as a minimum reduction in SE from two-phase sampling for a given test and costs:

(SE1-phase − min SE2-phase)/SE1-phase ≥ 1 − {√[S1(1 − S2)] + √[S2(1 − S1)] + ρmax√(c1/c2)}   (6)

If it is greater than one, two-phase design may increase the SE compared with a one-phase study of the same cost; however, the formula is not accurate for predicting the maximum increase.
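Taken together, (5) and the rule above give quick best-case and worst-case figures for a proposed test and cost ratio. A minimal sketch, using the formula for ρmax given as (A3) in the Appendix; the first call reproduces the approximately 56% and 36% figures quoted below for Table 2a, and the second shows a poor test for which the worst case is an SE increase.

```python
from math import sqrt

def bounds_on_reduction(S1, S2, cost_ratio):
    """Best-case SE reduction (5) and, where the rule preceding (6) gives a
    value below one, the guaranteed minimum reduction (6), for any prevalence."""
    base = sqrt(S1 * (1 - S2)) + sqrt(S2 * (1 - S1))
    rho_max = (S1 + S2 - 1) / (sqrt(S1 * S2) + sqrt((1 - S1) * (1 - S2)))  # (A3)
    best = 1 - base
    worst_ratio = base + rho_max / sqrt(cost_ratio)
    worst = 1 - worst_ratio if worst_ratio < 1 else None   # None: SE may increase
    return best, worst

print(bounds_on_reduction(0.95, 0.95, 20))   # roughly (0.56, 0.36)
print(bounds_on_reduction(0.70, 0.70, 20))   # best case about 8%, worst case an increase
```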

To illustrate the validity and use of the lower and upper bounds, SE reductions were found from (4)—or its equivalents in the Appendix for cases where v1 = 1 or v2 = 1—for three hypothetical tests and a range of prevalences and costs (Table 2). The most favourable result for each test was calculated from (5). The worst case reduction for each test and a given cost ratio c2/c1 was found from (6); this is shown on the last line of each sub-Table. If the worst case is an increase in SE from two-phase design, only the fact of an increase is stated.


Table 2 Percentage reduction in standard error of prevalence from an optimally designed two-phase study compared with a one-phase design; also maximum and minimum reductions
 
In Table 2a, where S1 = S2 = 0.95, the maximum possible reduction in SE is an encouraging 56%; further calculation, taking into account costs, is warranted. If the cost ratio for this test were 20 then, from (6), a minimum reduction of 36% is expected regardless of prevalence. One could go on to calculate the expected reduction for a specific estimate of P—as has been done for the Table—but these bounds may be informative enough. In Table 2b, the test is less sensitive (0.7) and the maximum reduction is now 28%. However, equation (6) suggests that with a cost ratio as low as five, for example, two-phase sampling could lead to a higher SE than one-phase sampling with the same budget. This is borne out by the exact calculations that show increases at higher prevalences. The final case is of a test where both sensitivity and specificity are low (0.7). No further calculation beyond this ‘best case’ figure of 8% may be warranted here, but if (6) were used nevertheless, one would find, for example, that even when c2/c1 = 20, two-phase sampling can be less efficient than a single phase design.

Equation (6) and related guidance have not been proven mathematically for the case where v1 = 1 or v2 = 1; however, they are supported by 2970 calculations of (4) and (6) in which c2/c1 ranged from 2 to 500, P from 0.001 to 0.6 and S1 and S2 from 0.1 to 0.95, with S1 + S2 > 1. The guidance was correct in all cases. Its validity is discussed further in the Appendix.

Inspection of the columns in Table 2 gives some insight into how two-phase SE reductions vary with prevalence, but does not tell the whole story. In general, the SE reduction diminishes as P increases up to a certain point, but then increases as P increases beyond this point (Appendix). When S1 and S2 are approximately equal (Tables 2a and 2c), the point of change is around P = 0.5. However, if specificity (S1) is greater than sensitivity (S2), but both are in the range 0.5 to 1, the change point will be at a lower value of P; e.g. it is at P = 0.32 for the data in Table 2b. The most extreme case is when S1 is close to one and S2 is close to 0.5; e.g. if S1 = 0.99, S2 = 0.5, the change point is around P = 0.17. The SE reduction will increase with P beyond this point. Formulae (5) and (6) are valid regardless of these subtleties.

Two extreme scenarios complete this overview. The first is the important case where ‘optimal’ two-phase design leads to v1 = v2 = 1, implying that every subject is assessed by both the gold standard and the test. This outcome can arise from a combination of a poor test, P close to 0.5 and low c2/c1; one example occurs in Table 2c. Of course a two-phase prevalence study in which everyone is measured by test and gold standard is wasteful; the proper action would be to revert to a one-phase design based on the gold standard only. If the two-phase design were followed nevertheless, it can be shown (Appendix) that this would result in an SE which is √(1 + c1/c2) times that from a one-phase study with the same budget.

The second extreme is the improbable case of a perfect test with S1 = S2 = 1, and hence ρ = 1. Application of formula (1) would lead to the correct conclusion that no second phase data are needed, i.e. v1 = v2 = 0: in effect, the ‘two-phase’ study becomes a one-phase study using only the perfect test. Equation (4) leads to the correct conclusion that the ratio of SEs based on the ‘test’ alone versus the ‘gold standard’ alone, given the same budget, is √(c1/c2).


    Discussion
Formulae have been derived, based on a simple model of study costs, whereby researchers can rapidly and easily derive the maximum and minimum benefits of two-phase designs for efficient prevalence estimation. More sophisticated two-phase models, allowing for different refusal rates and incorporating other study costs, might have been used, but it seems unlikely that this would change the main message: a two-phase design can never be very efficient if sensitivity and specificity are low. For example, Deming,9 commenting on the administrative burdens of two-phase and one-phase surveys, noted that the former might be more difficult to carry out and have greater refusal rates, since some subjects have to be approached twice. Furthermore, if the disease is serious, non-response in the second phase might be due to illness or death of cases, so that response bias is also more likely. Such factors would tend to undermine any purely economic benefit.

Since implementation of an optimal scheme requires estimates of prevalence, sensitivity, and specificity, inaccurate estimates may lead to a less than optimal scheme being implemented.9 A small theoretical reduction in cost or SE might therefore evaporate in practice. To offset this problem, one might insist on a large theoretical benefit before considering a two-phase design as likely to be efficient in practice. For example, if we insist on a theoretical reduction in SE of at least 30% then, roughly speaking, only tests where the sum of sensitivity and specificity exceeds 1.7 qualify (Table 1); an exception is when one index is very high, say 0.99 or more, when a sum of 1.6 is adequate. The General Health Questionnaire (GHQ) has sensitivity and specificity in the ranges 0.55–0.92 and 0.80–0.99 respectively, depending on the health outcome;12 from these figures alone one would expect the efficiency of a two-phase study using the GHQ to vary widely from one outcome to another.

Deming’s view, that the critical factor for two-phase efficiency is the negative predictive value,9 can be partly explained by the fact that he worked with predictive values rather than specificity and sensitivity. Furthermore, in all his examples specificity, although not calculated, was always in the range 0.97–0.99, while prevalence was either 0.1 or 0.2. If specificity and prevalence have fixed values, then high negative predictive value is synonymous with high sensitivity. Also, it is easy to show that Deming’s rule of thumb—that for efficiency, negative predictive value must be greater than 1 – P/4—is equivalent to insisting on sensitivity > 0.75. Deming’s rule, deduced from his limited range of data, could therefore be construed as insistence on sensitivity + specificity being at least 1.7. This is compatible with the present work which, however, has more general validity.
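A brief sketch of that equivalence, using the additional approximation W1 ≈ 1, which holds for Deming's examples of high specificity and low prevalence:

```latex
\text{NPV} = 1 - \frac{P(1-S_2)}{W_1} > 1 - \frac{P}{4}
\iff 4(1-S_2) < W_1 \approx 1
\iff S_2 > 0.75 .
```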

Pickles and Dunn2 have noted that the appeal of two-phase studies may lie not in efficiency but in the ability to sample all ‘test positive’ subjects while sampling only a small fraction of those who are ‘test negative’. Another advantage of a two-phase design is that it can simultaneously measure prevalence and the performance (i.e. sensitivity and specificity) of a new test for disease status.10 This was done in a study of childhood asthma3 in which a symptom questionnaire of unknown sensitivity and specificity for asthma was used in the first phase, and the second phase provided estimates of prevalence as well as of sensitivity and specificity.

If efficient estimation of prevalence is the main objective in design, then other options may be considered. Firstly, if there is a cheap test of disease status available with known S1 and S2, one could use this test alone to estimate prevalence, since

P = (W2 + S1 − 1)/(S1 + S2 − 1)
where W2 is the proportion who are ‘test positive’. To employ this approach, one would have to be sure that estimates of S1 and S2 made in one context could be safely carried over to another; also allowance would have to be made for any sampling error in the estimates of S1 and S2. Thus this approach may not be as efficient as it seems, and may have very limited applicability. Secondly, if several (preferably three or more) population-wide but imperfect systems for recording cases exist already, it may be possible to use capture-recapture methods13 to allow for under-ascertainment and thus to estimate prevalence.
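As a sketch of this first alternative, the estimator and a standard error reflecting sampling error in W2 only (S1 and S2 treated as known constants, which is exactly the caveat noted above) might look as follows; the input figures are invented for illustration.

```python
from math import sqrt

def prevalence_from_test_only(n_positive, n, S1, S2):
    """Prevalence from a test-only survey via P = (W2 + S1 - 1)/(S1 + S2 - 1),
    treating S1 and S2 as known without error."""
    W2_hat = n_positive / n
    P_hat = (W2_hat + S1 - 1) / (S1 + S2 - 1)
    se = sqrt(W2_hat * (1 - W2_hat) / n) / (S1 + S2 - 1)   # error in W2_hat only
    return P_hat, se

# Invented example: 230 positives among 1000 tested, S1 = 0.9, S2 = 0.8.
print(prevalence_from_test_only(230, 1000, 0.9, 0.8))
```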


KEY MESSAGES

  • Regardless of costs, two-phase designs are not justifiable on the grounds of economy or cost efficiency alone, unless the sensitivity and specificity of the first phase classification are sufficiently high.
  • In practice, a simple random sample design will usually be superior when the sum of sensitivity and specificity is less than 1.6.

 


    Appendix
Proof of (4) and evaluation of ρmax
The first part of (4), not involving costs, is easily shown by making the substitutions given in the text. For the second, consider a random sample of subjects classified by the test (X) and by the gold standard (Y) into categories 0 or 1. The Pearson correlation, ρ, between X and Y is

ρ = [Pr(X = 1, Y = 1) − Pr(X = 1)Pr(Y = 1)]/√[Pr(X = 1)Pr(X = 0)Pr(Y = 1)Pr(Y = 0)]   (A1)

In equation (3) the second term contains

P(1 − P) − ΣjWjPj(1 − Pj) = ΣjWj(Pj − P)² = ρ²P(1 − P),

hence the whole of the second term in (3) is

√[(c1/c2)ρ²P(1 − P)]/√[P(1 − P)] = ρ√(c1/c2).
The following formula for ρ, in terms of P, S1 and S2 only, may be more convenient than (A1):

ρ = α√[P(1 − P)]/√{[PS2 + (1 − P)(1 − S1)][P(1 − S2) + (1 − P)S1]}   (A2)

where

α = S1 + S2 − 1.
For fixed S1 and S2, ρ varies with P; to find its maximum value, ρmax, for fixed S1 and S2, standard calculus methods are applied to (A2). In the case where S1 = S2, it turns out that ρmax = α, which occurs when P = 0.5. If S1 ≠ S2, it can be shown that

ρmax = (S1 + S2 − 1)/{√(S1S2) + √[(1 − S1)(1 − S2)]}   (A3)

which occurs when

P = √[S1(1 − S1)]/{√[S1(1 − S1)] + √[S2(1 − S2)]}.
For fixed S1 and S2, ρ increases with P up to ρmax, then decreases. Unless S1(1 − S1) and S2(1 − S2) are very different, the value of P at the maximum will be in the range 0.3–0.7.
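A small numerical check of (A2) and (A3), scanning ρ over a fine grid of prevalences rather than relying on the closed form; the test characteristics used are arbitrary illustrative values.

```python
from math import sqrt

def rho(P, S1, S2):
    """Correlation between test and gold standard classifications, as in (A2)."""
    W2 = P * S2 + (1 - P) * (1 - S1)
    return (S1 + S2 - 1) * sqrt(P * (1 - P) / (W2 * (1 - W2)))

def rho_max_numeric(S1, S2, steps=100_000):
    """Maximum of rho over 0 < P < 1 by grid search."""
    P_best = max(((i + 1) / (steps + 1) for i in range(steps)),
                 key=lambda P: rho(P, S1, S2))
    return rho(P_best, S1, S2), P_best

def rho_max_closed(S1, S2):
    """Closed form (A3)."""
    return (S1 + S2 - 1) / (sqrt(S1 * S2) + sqrt((1 - S1) * (1 - S2)))

# Arbitrary examples: equal indices (maximum at P = 0.5) and unequal indices.
print(rho_max_numeric(0.95, 0.95), rho_max_closed(0.95, 0.95))
print(rho_max_numeric(0.99, 0.70), rho_max_closed(0.99, 0.70))
```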

(5) and (6) when either v1 = 1 or v2 = 1 (not both)
Cochran7 suggested that where (1) leads to vj > 1 for some j, those vj be set equal to 1; new formulae for the optimal values of the remaining fractions, subject to this restriction, then have to be derived. If J = 2, and we set v2 = 1, it can be shown6 that the optimum value of v1 is

v1 = √{P1(1 − P1)(c1 + c2W2)/[c2(P(1 − P) − W1P1(1 − P1))]}   (A4)

and the corresponding minimum SE, given a fixed budget CF, is:

min SE2-phase = (1/√CF){√[(c1 + c2W2)(P(1 − P) − W1P1(1 − P1))] + W1√[c2P1(1 − P1)]}   (A5)

If instead, v1 is fixed at 1, the optimum value of v2 and the corresponding minimum SE can be found from (A4) and (A5) respectively, but with W1 and W2, and P1 and P2, interchanged.
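These constrained formulae can be cross-checked numerically. The sketch below evaluates the two-phase SE for arbitrary sampling fractions, assuming the standard double-sampling variance used throughout and the whole budget spent, and grid-searches over v1 with v2 held at one; the parameter values are chosen for illustration so that the unconstrained rule (1) would give v2 > 1.

```python
from math import sqrt

def two_phase_se(P, S1, S2, c1, c2, CF, v1, v2):
    """SE of the two-phase estimate for given sampling fractions, spending
    the whole budget CF."""
    W = [(1 - P) * S1 + P * (1 - S2), (1 - P) * (1 - S1) + P * S2]
    Pj = [P * (1 - S2) / W[0], P * S2 / W[1]]
    n = CF / (c1 + c2 * (W[0] * v1 + W[1] * v2))
    between = P * (1 - P) - sum(w * p * (1 - p) for w, p in zip(W, Pj))
    within = sum(w * p * (1 - p) / v for w, p, v in zip(W, Pj, (v1, v2)))
    return sqrt((between + within) / n)

# Illustrative values for which (1) would give v2 > 1, so v2 is fixed at one.
P, S1, S2, c1, c2, CF = 0.02, 0.8, 0.8, 1, 5, 10_000
best_v1 = min((k / 1000 for k in range(1, 1001)),
              key=lambda v1: two_phase_se(P, S1, S2, c1, c2, CF, v1, 1.0))
print(best_v1, two_phase_se(P, S1, S2, c1, c2, CF, best_v1, 1.0))
# best_v1 should lie close to the value given by (A4) for these inputs.
```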

If (1) leads initially to both v1 > 1 and v2 > 1, the following modification to Cochran’s suggestion is recommended: set the larger v equal to one, and use formula (A4), or its equivalent, to find a new value for the other fraction. If this second step also leads to a fraction of one or more, then both fractions are set equal to one. In empirical work by the author, this modification always led to a smaller two-phase SE than when it was not used. The two-phase optimality process tends to result in v1 < 1, v2 = 1 when P is very low, or c2/c1 is low, or the test accuracy is low. This pattern of occurrence can be seen in Table 2 and can be deduced by rewriting (1), for j = 2, as

v2 = √{(c1/c2)S2(1 − S1)W1/[(S1 + S2 − 1)²P(1 − P)W2]}.
In such cases, the ratio of the two-phase SE to that for a one-phase design is, from (A5):

min SE2-phase/SE1-phase = √[(c1/c2 + W2)(1 − S1(1 − S2)/W1)] + √[S1(1 − S2)] ≥ √[S1(1 − S2)] + √[S2(1 − S1)] + ρ√(c1/c2)   (A6)

The latter inequality follows from application of the Cauchy-Schwarz inequality. If instead, v1 = 1, v2 < 1, the relevant formula is (A6'), which is (A6) with S1 and S2 interchanged and W1 and W2 interchanged. In practice this outcome is very unlikely when P < 0.5.

Thus a full description of the ratio of SE from optimal two-phase and one-phase designs—excluding the case where v1 = 1, v2 = 1 which is treated separately later—is as follows:

min SE2-phase/SE1-phase is given by (4) when v1 < 1 and v2 < 1, by (A6) when v1 < 1 and v2 = 1, and by (A6') when v1 = 1 and v2 < 1.   (A7)

Since both (A6) and (A6') are greater than (4), the SE ratio can never be smaller than the lower bound,

√[S1(1 − S2)] + √[S2(1 − S1)],

derived for (4) alone. Hence equation (5), which describes the maximum reduction in SE using two-phase studies, is valid for the whole range in (A7).

A simple algebraic expression for the maximum of (A7) with respect to P, for a given test and c2/c1, has not been found. It was initially conjectured that the maximum of (4) might also be the maximum of (A7). Consider (A6) and (4): while it is true that (A6) ≥ (4) for all P, (A6) tends to be applicable only for small values of P, and both functions are smallest when P is small. Thus it was conceivable that the highest applicable value of (A6) might be less than the highest value of (4). Empirical work (see main text) supported the conjecture in all cases where the true maximum is less than one, but not always otherwise—hence the proviso in (6) for these cases. In examples where v1 = 1, v2 < 1, (A6') > 1 always, but so was the maximum of (4) for test/cost combinations where this arose; hence the proviso in (6) also allowed for this.

Ratio of minimum two-phase and one-phase SE when v1 = 1 and v2 = 1
Application of the rules for optimal two-phase design may give v1 = 1 and v2 = 1, implying that the first and second phases have the same size (m = n). This corresponds to a study with a cost per subject of c1 + c2; the corresponding SE, given a fixed budget CF, would be:

SE2-phase = √[(c1 + c2)P(1 − P)/CF].

Comparing this with a one-phase design, we have:

SE2-phase/SE1-phase = √[(c1 + c2)/c2] = √(1 + c1/c2).

    References
1 Dunn G, Pickles A, Tansella M, Vazquez-Barquero J. Two-phase epidemiological surveys in psychiatric research. Br J Psychiatry 1999;174:95–100.

2 Pickles A, Dunn G. Prevalence of Disease, Estimation from Screening Data. Encyclopaedia of Biostatistics. Chichester: Wiley, 1998.

3 Frank TL, Frank PI, McNamee R et al. Assessment of a simple scoring system applied to a screening questionnaire for asthma in children aged 5–15 years. Eur Respir J 1999;14:1190–97.

4 Tennant A, Badley EM. Investigating non-response bias in a survey of disablement in the community—implications for survey methodology. J Epidemiol Community Health 1991;45:247–50.

5 Bermejo F, Gabriel R, Vega S, Morales JM, Rocca WA, Anderson DW. Problems and issues with door-to-door, two-phase surveys: an illustration from Spain. Neuroepidemiology 2001;20:225–31.

6 Shrout P, Newman SC. Design of two-phase prevalence surveys of rare disorders. Biometrics 1989;45:549–55.

7 Cochran WG. Sampling Techniques. New York: Wiley, 1977.

8 Newman S, Shrout P, Bland R. The efficiency of two-phase designs in prevalence surveys of mental disorders. Psychol Med 1990;20:183–93.

9 Deming WE. An essay on screening or on two-phase sampling, applied to surveys of a community. Int Statist Rev 1977;45:29–37.

10 McNamee R. Optimal designs of two-stage studies for estimation of sensitivity, specificity and positive predictive value. Stat Med 2002;21:3609–25.

11 Kraemer HC. Evaluating Medical Tests. Objective and Quantitative Guidelines. California: Sage Publications, Inc, 1992.

12 Bowling A. Measuring Health. Buckingham: Open University Press, 1997.

13 McCarty DJ, Tull ES, Moy CS, Kent Kwoh C, Laporte RE. Ascertainment corrected rates: application of capture-recapture methods. Int J Epidemiol 1993;22:559–65.