Biostatistics Group, School of Epidemiology and Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK. E-mail: rmcnamee@man.ac.uk
Abstract
Methods A simple formula for the maximum reduction in cost or standard error that can be achieved by two-phase sampling compared with simple random sampling is derived mathematically. A formula for the minimum reduction is also given and the influence of prevalence on efficiency explained.
Results The main result shows that the sensitivity and specificity of the first stage test set an absolute limit on the efficiency of two-phase designs; in particular, two-phase sampling can never be justified on efficiency grounds alone if the test is not accurate enough.
Accepted 7 May 2003
Where occurrence of a disease or condition does not necessarily lead to clinical diagnosis, a population survey may be considered for prevalence estimation; however, this can be expensive if diagnosis requires clinical assessment. A two-phase survey may seem an economic alternative: first, a random sample of the population is classified by a relatively cheap but fallible indicator of disease status; then, in the second phase, (stratified) random samples of first phase subjects undergo the gold standard diagnostic procedure. Two-phase studies for prevalence estimation are popular in psychiatric research1,2 but have also been employed in studies of asthma,3 disablement,4 and Parkinson's disease.5
A two-phase study might be considered for reasons besides cost. For example, in the second phase it is possible to target people who are more likely to have the disease; hence such studies may seem to make better use of clinical time1 and/or to be more ethical.6 Also a two-phase study can arise by capitalizing on a previously free-standing survey of symptoms (say) by adding a second phase based on gold standard diagnosis. This paper is concerned only with the efficiency of two-phase designs where the study budget encompasses both phases; it investigates the advantage, in terms of money saved or in terms of increased precision of estimation, of a two-phase design compared with a simple, one-phase, random sample design.
For a given study budget, there are many two-phase options depending on the balance between first and second phase total sample sizes and the sampling fractions for phase two. The most statistically efficient scheme, for a given cost, will be the one that results in the smallest standard error (SE) for the prevalence estimate. Alternatively, one could fix the SE required and then seek the most cost efficient, i.e. cheapest, two-phase design to achieve this. Formulae specifying the solutions to these problems, which are closely related, have been derived.7 From these, it has long been recognized that even the best (whether in statistical or cost terms) two-phase design can be less efficient than a one-phase design.1,2,8,9 This tends to occur when the cost per subject for the gold standard investigation (c2 monetary units) is not much higher than that for the first phase (c1 monetary units); as a rough rule, when c2/c1 < 5, the design will tend to be inefficient.
In many epidemiological applications, c2/c1 will be larger than this; for example, the cost of a clinical interview might be 20 times or more the cost of administering a postal questionnaire. It is therefore important to clarify the gains to be made in such cases. However, on this point the literature is not very clear; for example, according to Pickles and Dunn,2 an accurate test is needed (although they do not say how accurate), while others suggest that, if c2/c1 is large enough, the two-phase design can be highly efficient even when the first phase indicator is not especially accurate.8 Deming9 specified a critical value for the negative predictive value of the first phase indicator, below which the two-phase design was not worthwhile, but no proof was given. Most authors conclude that the method is most efficient when prevalence is low. A problem with all these recommendations is that they appear to be based on a limited number of examples, which may restrict the generality of the advice.
This paper aims to provide easily understood, general insights into the efficiency gains from two-phase designs, compared with simple one-phase designs of the same cost. The main result shows that the maximum gain in efficiency is a simple function of the sensitivity and specificity of the first phase indicator; it emerges that, no matter how high the value of c2/c1, a two-phase design can never be very efficient if sensitivity and specificity are not high.
The most efficient two-phase design
Let P = true prevalence of the disease in the target population, Wj = proportion of the population falling into category j (j = 1, ..., J) of the first phase indicator, and Pj = prevalence of disease among those in category j.

Then P = \sum_j W_j P_j. An estimate of P, P̂ say, can be found by estimating {Wj} and {Pj} from the first and second phase data sets respectively.2,7,10
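In practice the estimation is straightforward (a sketch of the standard construction; the symbols n, n_j, m_j and d_j are introduced here only for illustration): if n first phase subjects are classified, n_j of them fall into category j, m_j of these are selected for the second phase, and d_j of those are found to be diseased on the gold standard assessment, then

\hat{W}_j = \frac{n_j}{n}, \qquad \hat{P}_j = \frac{d_j}{m_j}, \qquad \hat{P} = \sum_j \hat{W}_j \hat{P}_j.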
Now consider the problem where there is a fixed study budget available, CF, and we want to find the design which gives the smallest SE for P̂ within that budget. Cochran7 has shown that this is achieved by setting the second phase sampling fractions, vj (the proportion of first phase subjects in category j who go on to the second phase), as:
v_j = \sqrt{\frac{c_1\, P_j(1 - P_j)}{c_2 \sum_k W_k (P_k - P)^2}}   (1)

and choosing the first phase sample size, n, so that the whole budget is used:

n = \frac{C_F}{c_1 + c_2 \sum_j W_j v_j}
The standard error of P̂ is then7:

\min SE_{2\text{-phase}} = \frac{\sqrt{c_1 \sum_j W_j (P_j - P)^2} + \sqrt{c_2}\, \sum_j W_j \sqrt{P_j (1 - P_j)}}{\sqrt{C_F}}   (2)
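As a numerical illustration of (1) (hypothetical values, not taken from any of the cited studies, and anticipating the binary-test notation introduced below): suppose P = 0.1, specificity S1 = 0.9, sensitivity S2 = 0.8 and c2/c1 = 20. Then the proportions testing negative and positive are W1 = 0.83 and W2 = 0.17, with P1 ≈ 0.024, P2 ≈ 0.47 and \sum_j W_j (P_j - P)^2 ≈ 0.028, so that

v_1 = \sqrt{\frac{0.024 \times 0.976}{20 \times 0.028}} \approx 0.20, \qquad v_2 = \sqrt{\frac{0.47 \times 0.53}{20 \times 0.028}} \approx 0.67;

the optimal design re-examines about two-thirds of the test positives but only about a fifth of the test negatives.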
An alternative but related problem is to fix the size of the SE required, SEF say, in advance and then look for the design which achieves this at minimum cost. The solution to this problem has the same sampling fractions (1); the minimum study cost is
\min C_{2\text{-phase}} = \frac{\left[\sqrt{c_1 \sum_j W_j (P_j - P)^2} + \sqrt{c_2}\, \sum_j W_j \sqrt{P_j (1 - P_j)}\right]^2}{SE_F^{\,2}}

and the corresponding first phase sample size is

n = \frac{\min C_{2\text{-phase}}}{c_1 + c_2 \sum_j W_j v_j}
Frequently the first phase indicator will have only two categories, test negative (j = 1) and test positive (j = 2). For simplicity, this is assumed in the remainder of the paper. Then P2 is the probability of disease being present given a positive test result, i.e. the positive predictive value of the test, while 1 − P1 is the negative predictive value. Both these parameters depend on prevalence and on the specificity, S1, and sensitivity, S2, of the test, where:

S_1 = \Pr(\text{test negative} \mid \text{disease absent}), \qquad S_2 = \Pr(\text{test positive} \mid \text{disease present}).
In what follows, use of S1 and S2, instead of P1 and P2, will tend to lead to simpler results.
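For reference, the dependence just mentioned can be written out explicitly (a standard application of Bayes' theorem, added here for completeness):

P_2 = \frac{P S_2}{P S_2 + (1 - P)(1 - S_1)}, \qquad 1 - P_1 = \frac{(1 - P) S_1}{(1 - P) S_1 + P(1 - S_2)}.

These relations are equivalent to the substitutions used later to pass from equation (3) to equation (4).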
There are circumstances where the optimal design described above cannot be implemented because (1) would suggest that either, or both, sampling fractions be greater than 1. To complete the specification of optimal two-phase design, additional rules are needed for what to do if this occurs. An extension of the rules for optimal design, covering this situation, is given in the Appendix. These additional rules will lead to either one, or both, fractions being fixed at unity, and to new formulae for min SE2-phase. To simplify the presentation here, the results of the next section are initially derived assuming that (1) always leads to 0 < v1 < 1 and 0 < v2 < 1. A discussion of how the results can be generalized beyond this restriction then follows, but the detailed justification is confined to the Appendix. It is also assumed hereafter that P ≤ 0.5 and S1 + S2 > 1. This last condition ensures that there is a positive correlation, ρ, between the test and gold standard classifications. Tests for which this is true are said to be legitimate;11 an illegitimate test is unlikely to be of interest.
Efficiency compared with one-phase design
Using (2), the minimum SE from a two-phase design with budget CF can be compared with the SE of a one-phase design of the same cost, in which CF/c2 subjects are assessed by the gold standard alone, so that SE_{1\text{-phase}} = \sqrt{P(1-P)c_2/C_F}. The ratio of the two standard errors is:

\frac{\min SE_{2\text{-phase}}}{SE_{1\text{-phase}}} = \frac{\sum_j W_j \sqrt{P_j (1 - P_j)}}{\sqrt{P(1 - P)}} + \sqrt{\frac{\sum_j W_j (P_j - P)^2}{P(1 - P)}}\,\sqrt{\frac{c_1}{c_2}}   (3)
Alternatively, one could compare the minimum cost of a two-phase design chosen to achieve a given degree of precision, SEF, with the cost of a one-phase design giving the same SE. It can be shown that this ratio, minC2-phase/C1-phase, is equal to the square of (3).
Equation (3) is much simpler if the following substitutions are made: W1P1 = P(1 - S2), W1(1 - P1) = (1 - P)S1, W2P2 = PS2, W2(1 - P2) = (1 - P)(1 - S1). As shown in the Appendix, this gives
\frac{\min SE_{2\text{-phase}}}{SE_{1\text{-phase}}} = \sqrt{S_1(1 - S_2)} + \sqrt{S_2(1 - S_1)} + \rho\sqrt{\frac{c_1}{c_2}}   (4)

where ρ = Pearson correlation between the true classification and the test classification. This formula could be used directly to calculate the benefits from two-phase design for a given combination of prevalence, c2/c1, sensitivity, and specificity. To facilitate this, a formula for ρ in terms of P, S1, and S2 is given in the Appendix. However, a general understanding of the benefits of two-phase design can be gained by examining, first, an upper bound, and then a lower bound, for the reduction in SE implied by (4), as is now shown.
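Readers who wish to evaluate these quantities for their own test and cost assumptions could code them directly. The following is a minimal sketch in Python; it assumes the forms of equations (4) and (5) given here, the worst-case rule preceding equation (6), and the formula (A2) for ρ given in the Appendix, and the function and variable names are illustrative only.

    import math

    def rho(p, sp, se):
        # Correlation between test and true disease status, equation (A2):
        # rho = (S1 + S2 - 1) * sqrt(P(1-P)) / sqrt(W1 * W2).
        w2 = p * se + (1 - p) * (1 - sp)   # proportion expected to test positive
        w1 = 1 - w2                        # proportion expected to test negative
        return (sp + se - 1) * math.sqrt(p * (1 - p)) / math.sqrt(w1 * w2)

    def se_ratio(p, sp, se, cost_ratio):
        # Equation (4): min two-phase SE / one-phase SE for the same budget,
        # valid when the optimal sampling fractions satisfy v1 < 1 and v2 < 1.
        base = math.sqrt(sp * (1 - se)) + math.sqrt(se * (1 - sp))
        return base + rho(p, sp, se) / math.sqrt(cost_ratio)

    def max_reduction(sp, se):
        # Equation (5): largest possible proportional reduction in SE,
        # whatever the prevalence or the cost ratio c2/c1.
        return 1 - math.sqrt(sp * (1 - se)) - math.sqrt(se * (1 - sp))

    def worst_case_ratio(sp, se, cost_ratio):
        # Rule preceding equation (6): replace rho by its maximum over P.
        rho_max = math.sqrt(sp * se) - math.sqrt((1 - sp) * (1 - se))
        return (math.sqrt(sp * (1 - se)) + math.sqrt(se * (1 - sp))
                + rho_max / math.sqrt(cost_ratio))

    # S1 = S2 = 0.7: at best about an 8% reduction in SE (compare Table 1).
    print(max_reduction(0.7, 0.7))        # about 0.083
    # P = 0.1, S1 = 0.9, S2 = 0.8, c2/c1 = 20: SE ratio about 0.83.
    print(se_ratio(0.1, 0.9, 0.8, 20))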
The ratio (4) decreases as c2/c1 increases, provided P, S1, and S2 stay fixed; this confirms the well-known observation that the advantage of two-phase over one-phase designs with the same budget increases with the relative cost of the gold standard. Furthermore, the last term in (4) tends to zero as c2/c1 increases without limit; therefore the SE ratio can never be less than

\sqrt{S_1(1 - S_2)} + \sqrt{S_2(1 - S_1)}
This result can be expressed as an upper bound on the SE reduction:
\text{Reduction in SE from two-phase sampling} \;\le\; 1 - \sqrt{S_1(1 - S_2)} - \sqrt{S_2(1 - S_1)}   (5)
In the Appendix it is proven that this result also holds true for optimal designs where either v1 = 1 or v2 = 1. (The extreme case where v1 = 1 and v2 = 1 is discussed separately later.) The expression on the right of (5) is the maximum reduction in SE that can be achieved by two-phase design; no two-phase study can ever improve on this. It is shown in Table 1 for various values of sensitivity and specificity. The reduction is slight in many cases; for example, when S1 = S2 = 0.7 it is only 8%. In such cases, inspection of this Table may be all that is needed to show that there is little to be gained from two-phase sampling.
Table 1 Maximum reduction in SE achievable by two-phase sampling, by sensitivity and specificity of the first phase test
The upper bound (5) is based on the assumption that v1 < 1 and v2 < 1. A bound in the opposite direction can be obtained by noting that, for fixed S1 and S2, the correlation ρ cannot exceed a maximum value, ρmax say, whatever the prevalence (a formula for ρmax is given in the Appendix); replacing ρ by ρmax in (4) therefore gives the largest, i.e. least favourable, value that (4) can take. Where the assumption v1 < 1 and v2 < 1 is relaxed to include situations where optimal design leads to either v1 = 1 or v2 = 1, this result is not valid in all cases. However, empirical work (discussed later) suggests that it is valid in all cases where it predicts a reduction in SE from two-phase sampling. Therefore we suggest the following rule for predicting the worst case reduction from two-phase sampling: calculate

\sqrt{S_1(1 - S_2)} + \sqrt{S_2(1 - S_1)} + \rho_{\max}\sqrt{\frac{c_1}{c_2}}

If this is less than one, then it is the maximum value of (4). The result can also be expressed as a minimum reduction in SE from two-phase sampling for a given test and costs:

\text{Reduction in SE from two-phase sampling} \;\ge\; 1 - \sqrt{S_1(1 - S_2)} - \sqrt{S_2(1 - S_1)} - \rho_{\max}\sqrt{\frac{c_1}{c_2}}   (6)

If it is greater than one, two-phase design may increase the SE compared with a one-phase study of the same cost; however, the formula is not accurate for predicting the maximum increase.
To illustrate the validity and use of the lower and upper bounds, SE reductions were found from (4) (or its equivalents in the Appendix for cases where v1 = 1 or v2 = 1) for three hypothetical tests and a range of prevalences and costs (Table 2). The most favourable result for each test was calculated from (5). The worst case reduction for each test, for a given cost ratio c2/c1, was found from (6); this is shown on the last line of each sub-Table. If the worst case is an increase in SE from two-phase design, only the fact of an increase is stated.
Table 2 Reduction in SE from optimal two-phase design relative to a one-phase design of the same cost, for three hypothetical tests (a, b, c) over a range of prevalences and cost ratios c2/c1
Equation (6) and related guidance have not been proven mathematically for the case where v1 = 1 or v2 = 1; however, they are supported by 2970 calculations of (4) and (6) in which c2/c1 ranged from 2 to 500, P from 0.001 to 0.6 and S1 and S2 from 0.1 to 0.95, with S1 + S2 > 1. The guidance was correct in all cases. Its validity is discussed further in the Appendix.
Inspection of the columns in Table 2 gives some insight into how two-phase SE reductions vary with prevalence, but does not tell the whole story. In general, the SE reduction diminishes as P increases up to a certain point, but then increases as P increases beyond this point (Appendix). When S1 and S2 are approximately equal (Tables 2a and 2c), the point of change is around P = 0.5. However, if specificity (S1) is greater than sensitivity (S2), but both are in the range 0.5 to 1, the change point will be at a lower value of P; for example, it is at P = 0.32 for the data in Table 2b. The most extreme case is when S1 is close to one and S2 is close to 0.5; for example, if S1 = 0.99 and S2 = 0.5, the change point is around P = 0.17. The SE reduction will increase with P beyond this point. Formulae (5) and (6) are valid regardless of these subtleties.
Two extreme scenarios complete this overview. The first is the important case where optimal two-phase design leads to v1 = v2 = 1, implying that every subject is assessed by both the gold standard and the test. This outcome can arise from a combination of a poor test, P close to 0.5, and low c2/c1; one example occurs in Table 2c. Of course a two-phase prevalence study in which everyone is measured by test and gold standard is wasteful; the proper action would be to revert to a one-phase design based on the gold standard only. If the two-phase design were followed nevertheless, it can be shown (Appendix) that this would result in an SE which is √(1 + c1/c2) times that from a one-phase study with the same budget.
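For example, taking the rough rule quoted earlier that c2/c1 < 5 makes two-phase sampling unattractive: if c2/c1 = 5, then √(1 + c1/c2) = √1.2 ≈ 1.10, so persisting with the two-phase design would give an SE about 10% larger than that of a one-phase study of the same cost.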
The second extreme is the improbable case of a perfect test with S1 = S2 = 1, and hence ρ = 1. Application of formula (1) would lead to the correct conclusion that no second phase data are needed, i.e. v1 = v2 = 0: in effect, the two-phase study becomes a one-phase study using only the perfect test. Equation (4) leads to the correct conclusion that the ratio of SEs based on the test alone versus the gold standard alone, given the same budget, is √(c1/c2).
Discussion
Since implementation of an optimal scheme requires estimates of prevalence, sensitivity, and specificity, inaccurate estimates may lead to a less than optimal scheme being implemented.9 A small theoretical reduction in cost or SE might therefore evaporate in practice. To offset this problem, one might insist on a large theoretical benefit before considering a two-phase design as likely to be efficient in practice. For example, if we insist on a theoretical reduction in SE of at least 30% then, roughly speaking, only tests where the sum of sensitivity and specificity exceeds 1.7 qualify (Table 1); an exception is when one index is very high, say 0.99 or more, when a sum of 1.6 is adequate. The General Health Questionnaire (GHQ) has sensitivity and specificity in the ranges 0.55–0.92 and 0.80–0.99 respectively, depending on the health outcome;12 from these figures alone one would expect the efficiency of a two-phase study using the GHQ to vary widely from one outcome to another.
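The 30% figure can be checked directly against the upper bound (5). For example, with S1 = S2 = 0.85 (sum 1.7),

1 - 2\sqrt{0.85 \times 0.15} \approx 1 - 0.71 = 0.29,

while with S1 = 0.99 and S2 = 0.61 (sum 1.6),

1 - \sqrt{0.99 \times 0.39} - \sqrt{0.61 \times 0.01} \approx 1 - 0.62 - 0.08 = 0.30.

(The particular values 0.85/0.85 and 0.99/0.61 are chosen here purely for illustration.)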
Deming's view, that the critical factor for two-phase efficiency is the negative predictive value,9 can be partly explained by the fact that he worked with predictive values rather than specificity and sensitivity. Furthermore, in all his examples specificity, although not calculated, was always in the range 0.97–0.99, while prevalence was either 0.1 or 0.2. If specificity and prevalence have fixed values, then high negative predictive value is synonymous with high sensitivity. Also, it is easy to show that Deming's rule of thumb (that, for efficiency, the negative predictive value must be greater than 1 − P/4) is equivalent to insisting on sensitivity > 0.75. Deming's rule, deduced from his limited range of data, could therefore be construed as insistence on sensitivity + specificity being at least 1.7. This is compatible with the present work which, however, has more general validity.
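The equivalence can be sketched as follows (an approximate argument, assuming, as in Deming's examples, that specificity is close to one and prevalence is low, so that the proportion testing negative, W1, is close to one). Using the substitution W1 P1 = P(1 − S2),

1 - P_1 > 1 - \frac{P}{4} \;\Longleftrightarrow\; \frac{P(1 - S_2)}{W_1} < \frac{P}{4} \;\Longleftrightarrow\; 4(1 - S_2) < W_1 \approx 1 \;\Longleftrightarrow\; S_2 \gtrsim 0.75.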
Pickles and Dunn2 have noted that the appeal of two-phase studies may lie not in efficiency, but in the ability to sample all those who test positive but only a small fraction of test negative subjects. Another advantage of a two-phase design is that it can simultaneously measure prevalence and the performance (i.e. sensitivity and specificity) of a new test for disease status.10 This was done in a study of childhood asthma,3 where a symptom questionnaire of unknown sensitivity and specificity for asthma was used in the first phase of a two-phase study that then estimated prevalence together with the sensitivity and specificity of the questionnaire.
If efficient estimation of prevalence is the main objective in design, then other options may be considered. Firstly, if there is a cheap test of disease status available with known S1 and S2, one could use this test alone to estimate prevalence, since
P = \frac{W_2 + S_1 - 1}{S_1 + S_2 - 1}
where W2 is the proportion who are test positive. To employ this approach, one would have to be sure that estimates of S1 and S2 made in one context could be safely carried over to another; also allowance would have to be made for any sampling error in the estimates of S1 and S2. Thus this approach may not be as efficient as it seems, and may have very limited applicability. Secondly, if several (preferably three or more) population-wide but imperfect systems for recording cases exist already, it may be possible to use capture-recapture methods13 to allow for under-ascertainment and thus to estimate prevalence.
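Returning to the first of these options, a hypothetical illustration of the formula above: if 17% of a large sample test positive (W2 = 0.17) and the test is believed to have S1 = 0.9 and S2 = 0.8, the implied prevalence is P = (0.17 + 0.9 − 1)/(0.9 + 0.8 − 1) = 0.07/0.7 = 0.10.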
KEY MESSAGES

- The sensitivity and specificity of the first phase test set an absolute limit on the reduction in cost or standard error that a two-phase prevalence survey can achieve compared with a simple random sample of the same cost.
- If the first phase test is not accurate enough, a two-phase design can never be justified on efficiency grounds alone, however cheap the test is relative to the gold standard assessment.
Appendix
Derivation of equation (4)

For a binary test and a binary gold standard classification, the Pearson correlation between the two classifications is

\rho = (P_2 - P_1)\sqrt{\frac{W_1 W_2}{P(1 - P)}}   (A1)

from which it follows that \sum_j W_j (P_j - P)^2 = W_1 W_2 (P_2 - P_1)^2 = \rho^2 P(1 - P).

In equation (3) the second term contains

\sqrt{\frac{\sum_j W_j (P_j - P)^2}{P(1 - P)}} = \rho;

hence the whole of the second term in (3) is

\rho\sqrt{\frac{c_1}{c_2}}.
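The first term in (3) simplifies in the same way under the substitutions given in the main text:

\frac{\sum_j W_j \sqrt{P_j(1 - P_j)}}{\sqrt{P(1 - P)}} = \frac{\sqrt{P(1 - S_2)(1 - P)S_1} + \sqrt{P S_2 (1 - P)(1 - S_1)}}{\sqrt{P(1 - P)}} = \sqrt{S_1(1 - S_2)} + \sqrt{S_2(1 - S_1)},

which, combined with the expression for the second term, gives equation (4).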
The following formula for ρ, in terms of P, S1, and S2 only, may be more convenient than (A1):

\rho = \frac{(S_1 + S_2 - 1)\sqrt{P(1 - P)}}{\sqrt{W_1 W_2}}   (A2)

where

W_1 = (1 - P)S_1 + P(1 - S_2), \qquad W_2 = P S_2 + (1 - P)(1 - S_1).
For fixed S1 and S2, ρ varies with P; to find its maximum value, ρmax, for fixed S1 and S2, standard calculus methods are applied to (A2). In the case where S1 = S2, it turns out that

\rho_{\max} = S_1 + S_2 - 1,

which occurs when P = 0.5. If S1 ≠ S2, it can be shown that

\rho_{\max} = \sqrt{S_1 S_2} - \sqrt{(1 - S_1)(1 - S_2)}   (A3)

which occurs at the value of P for which

\frac{W_1}{W_2} = \sqrt{\frac{S_1 S_2}{(1 - S_1)(1 - S_2)}}.

(Note that (A3) reduces to S1 + S2 − 1 when S1 = S2.)
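As a worked check against the example quoted in the main text (S1 = 0.99, S2 = 0.5):

\rho_{\max} = \sqrt{0.99 \times 0.5} - \sqrt{0.01 \times 0.5} \approx 0.70 - 0.07 = 0.63,

attained at a prevalence of about P = 0.17, in agreement with the change point quoted there.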
(5) and (6) when either v1 = 1 or v2 = 1 (not both)
Cochran7 suggested that where (1) leads to vj > 1 for some j, those vj be set equal to 1; new formulae for the optimal values of the remaining fractions, subject to this restriction, then have to be derived. If J = 2, and we set v2 = 1, it can be shown6 that the optimum value of v1 is
v_1 = \sqrt{\frac{P_1(1 - P_1)\,(c_1 + c_2 W_2)}{c_2\left[W_2 P_2(1 - P_2) + \sum_j W_j (P_j - P)^2\right]}}   (A4)
and the corresponding minimum SE, given a fixed budget CF, is:
\min SE_{2\text{-phase}} = \frac{\sqrt{\left[W_2 P_2(1 - P_2) + \sum_j W_j (P_j - P)^2\right](c_1 + c_2 W_2)} \;+\; W_1\sqrt{c_2 P_1(1 - P_1)}}{\sqrt{C_F}}   (A5)
If instead v1 is fixed at 1, the optimum value of v2 and the corresponding minimum SE can be found from (A4) and (A5) respectively, but with W1 and W2, and P1 and P2, interchanged.
If (1) leads initially to both v1 > 1 and v2 > 1, the following modification to Cochran's suggestion is recommended: set the larger v equal to one, and use formula (A4), or its equivalent, to find a new value for the other fraction. If this second step also leads to a fraction of one or more, then both fractions are set equal to one. In empirical work by the author, this modification always led to a smaller two-phase SE than if it was not used. The two-phase optimality process tends to result in v1 < 1, v2 = 1 when P is very low, or c2/c1 is low, or the test accuracy is low. This pattern of occurrence can be seen in Table 2 and can be deduced by rewriting (1), for j = 2, as

v_2 = \sqrt{\frac{c_1\,P_2(1 - P_2)}{c_2\,\rho^2 P(1 - P)}}
In such cases, the ratio of the two-phase SE to that for a one-phase design is, from (A5):
\frac{\min SE_{2\text{-phase}}}{SE_{1\text{-phase}}} = \sqrt{\left[\frac{S_2(1 - S_1)}{W_2} + \rho^2\right]\left(\frac{c_1}{c_2} + W_2\right)} + \sqrt{S_1(1 - S_2)} \;\ge\; \sqrt{S_1(1 - S_2)} + \sqrt{S_2(1 - S_1)} + \rho\sqrt{\frac{c_1}{c_2}}   (A6)
The latter inequality follows from application of the Cauchy-Schwarz inequality. If instead, v1 = 1, v2 < 1, the relevant formula is (A6'), which is (A6) with S1 and S2 interchanged and W1 and W2 interchanged. In practice this outcome is very unlikely when P < 0.5.
Thus a full description of the ratio of SE from optimal two-phase and one-phase designs (excluding the case where v1 = 1, v2 = 1, which is treated separately later) is as follows:

\frac{\min SE_{2\text{-phase}}}{SE_{1\text{-phase}}} =
\begin{cases}
\text{expression (4)} & \text{if } v_1 < 1 \text{ and } v_2 < 1,\\
\text{expression (A6)} & \text{if } v_1 < 1 \text{ and } v_2 = 1,\\
\text{expression (A6')} & \text{if } v_1 = 1 \text{ and } v_2 < 1.
\end{cases}   (A7)
Since both (A6) and (A6') are greater than (4), the SE ratio can never be smaller than the lower bound,

\sqrt{S_1(1 - S_2)} + \sqrt{S_2(1 - S_1)},

derived for (4) alone. Hence equation (5), which describes the maximum reduction in SE using two-phase studies, is valid for the whole range in (A7).
A simple algebraic expression for the maximum of (A7) with respect to P, for a given test and c2/c1, has not been found. It was initially conjectured that the maximum of (4) might also be the maximum of (A7). Consider (A6) and (4): while it is true that (A6) ≥ (4) for all P, (A6) tends to be applicable only for small values of P, and both functions are smallest when P is small. Thus it was conceivable that the highest applicable value of (A6) might be less than the highest value of (4). Empirical work (see main text) supported the conjecture in all cases where the true maximum is less than one, but not always otherwise; hence the proviso in (6) for these cases. In examples where v1 = 1, v2 < 1, (A6') was always greater than one, but so was the maximum of (4) for the test/cost combinations where this arose; hence the proviso in (6) also allows for this.
Ratio of minimum two-phase and one-phase SE when v1 = 1 and v2 = 1
Application of the rules for optimal two-phase design may give v1 = 1 and v2 = 1, implying that the first and second phases have the same size (m = n). This corresponds to a study with a cost per subject of c1 + c2; the corresponding SE, given a fixed budget CF, would be:
SE_{2\text{-phase}} = \sqrt{\frac{P(1 - P)(c_1 + c_2)}{C_F}}
Comparing this with a one-phase design, we have:
\frac{SE_{2\text{-phase}}}{SE_{1\text{-phase}}} = \sqrt{\frac{P(1 - P)(c_1 + c_2)/C_F}{P(1 - P)\,c_2/C_F}} = \sqrt{1 + \frac{c_1}{c_2}}
References
2 Pickles A, Dunn G. Prevalence of disease, estimation from screening data. In: Encyclopaedia of Biostatistics. Chichester: Wiley, 1998.
3 Frank TL, Frank PI, McNamee R et al. Assessment of a simple scoring system applied to a screening questionnaire for asthma in children aged 5–15 years. Eur Respir J 1999;14:1190–97.
4 Tennant A, Badley EM. Investigating non-response bias in a survey of disablement in the community: implications for survey methodology. J Epidemiol Community Health 1991;45:247–50.
5 Bermejo F, Gabriel R, Vega S, Morales JM, Rocca WA, Anderson DW. Problems and issues with door-to-door, two-phase surveys: an illustration from Spain. Neuroepidemiology 2001;20:225–31.
6 Shrout P, Newman SC. Design of two-phase prevalence surveys of rare disorders. Biometrics 1989;45:549–55.
7 Cochran WG. Sampling Techniques. New York: Wiley, 1977.
8 Newman S, Shrout P, Bland R. The efficiency of two-phase designs in prevalence surveys of mental disorders. Psychol Med 1990;20:183–93.
9 Deming WE. An essay on screening, or on two-phase sampling, applied to surveys of a community. Int Statist Rev 1977;45:29–37.
10 McNamee R. Optimal designs of two-stage studies for estimation of sensitivity, specificity and positive predictive value. Stat Med 2002;21:3609–25.
11 Kraemer HC. Evaluating Medical Tests: Objective and Quantitative Guidelines. California: Sage Publications, 1992.
12 Bowling A. Measuring Health. Buckingham: Open University Press, 1997.
13 McCarty DJ, Tull ES, Moy CS, Kent Kwoh C, Laporte RE. Ascertainment corrected rates: application of capture-recapture methods. Int J Epidemiol 1993;22:559–65.