Institute of Psychiatry, De Crespigny Park, London SE5 8AF, UK. E-mail: m.prince@iop.kcl.ac.uk
Roseanne McNamee1 has demonstrated elegantly that the efficiency of a two-phase design is critically dependent upon the sensitivity and specificity of the phase one screening assessment. More importantly, regardless of the ratio of costs of phase one and phase two assessments, two-phase designs will never be justified on the grounds of efficiency unless the screening assessment is unusually accurate. She provides an evidence-based rule of thumb: sensitivity and specificity summed together need to exceed 1.6, or, allowing room for doubt as to whether cited validity coefficients will generalize to other settings, 1.7. Thus, McNamee has performed a service for psychiatric and neuroepidemiology, deftly inserting a stiletto between the ribs of a design that has proved enduringly popular but that is all too often incorrectly applied, analysed, and interpreted.2,3
At first sight, McNamee's rule of thumb would still rule into contention many commonly used screening assessments. For dementia, the Mini-Mental State Examination (MMSE) in a community setting delivered a sensitivity of 0.86 with specificity of 0.92, summing to 1.78.4 For schizophrenia and other functional psychoses, the Psychosis Screening Questionnaire was originally validated in a clinical setting with sensitivity of 0.97 and specificity of 0.93, summing to 1.90.5 Only the 12-item General Health Questionnaire would seem to be struggling; when tested in the Australian National Mental Health Survey against the criterion of any Composite International Diagnostic Interview (CIDI)-defined neurotic disorder, it achieved a sensitivity of 0.83 with specificity of 0.69, summing to just 1.52.6

However, our experience from the UK National Psychiatric Morbidity Survey (NPMS) should provide a warning to the wise. NPMS is for the most part a one-phase survey; most diagnoses are generated from the fully structured, lay-interviewer Clinical Interview Schedule–Revised,7 administered to all participants, but psychosis and personality disorder are screened for in the first phase, with definitive second-phase diagnostic interviews for selected participants. This is because of the low prevalence of these disorders and the perceived difficulty in arriving at valid diagnoses using non-clinical interviewers. All respondents screening positive for psychosis (using the Psychosis Screening Questionnaire), half of those screening positive for borderline or antisocial personality disorders, and one in 14 of all others were invited for second-phase interview using the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) (for psychosis) and the Structured Clinical Interview for DSM-IV (for personality disorder). Of 375 people screening positive for psychosis, 203 (54%) completed second-phase interviews.8 In this group the prevalence of psychosis according to SCAN was 13.3%. Of 732 people screening negative for psychosis and selected for second-phase interview, 420 (57%) completed interviews, and the prevalence of psychosis was 0.63%. As expected, the positive predictive value (PPV) was much lower (0.13 versus 0.91) than that originally estimated in a high-prevalence clinical population,5 while the negative predictive value (NPV) was a little higher (0.99 versus 0.98). My rough calculation from data provided in the NPMS report (with no allowance made for sampling weights) suggests a sensitivity of 0.49 and specificity of 0.96, summing to only 1.45. The decrement in sensitivity from that originally reported (0.92)5 is all the more surprising given that in NPMS additional screening-in criteria (e.g. admission to a mental hospital, or taking anti-psychotic medication) were used in an attempt to boost sensitivity after the experience of the first NPMS survey. The lesson here is that validity coefficients derived (as is usually the case) in clinical populations by the research group that developed the screening assessment may not generalize. In population-based research the rigour with which screening assessments are administered is arguably less, while the degree of masking to caseness is inevitably greater.
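To make the arithmetic behind that rough calculation explicit, the minimal sketch below reconstructs the validity coefficients from the figures quoted above. The screen-positive total (375) and the two stratum prevalences are as reported; the screen-negative total (roughly 8,200) is an assumed round figure, since it is not quoted here, and sampling weights are again ignored.

```python
# Back-calculating screen performance from the NPMS second-phase results quoted
# above. The screen-positive total (375) and the stratum prevalences (13.3% and
# 0.63%) are given in the text; the screen-negative total is an ASSUMED round
# figure for illustration only.

n_pos, prev_pos = 375, 0.133       # screen positives; SCAN prevalence among them
n_neg, prev_neg = 8200, 0.0063     # assumed screen-negative total; prevalence as quoted

tp = n_pos * prev_pos              # cases detected by the screen (weighted back to all positives)
fp = n_pos * (1 - prev_pos)        # false positives
fn = n_neg * prev_neg              # cases missed by the screen
tn = n_neg * (1 - prev_neg)        # true negatives

sensitivity = tp / (tp + fn)       # ~0.49
specificity = tn / (tn + fp)       # ~0.96
ppv, npv = prev_pos, 1 - prev_neg  # 0.13 and ~0.99, read directly from the two strata

print(f"Se={sensitivity:.2f}  Sp={specificity:.2f}  "
      f"Se+Sp={sensitivity+specificity:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```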
In the first NPMS, psychosis screen negatives were not assessed in the second phase, and were assumed to be free of psychosis.9 The tacit assumption was that the screen was perfectly sensitive. In the second NPMS, this was acknowledged to be an unsatisfactory assumption likely to produce a biased underestimate of the true prevalence, hence the decision to include one in 14 of screen negatives in the second phase.8 As we have seen, the screen was far from perfectly sensitive. The final weighted prevalence of psychosis, taking account of the different sampling fractions for screen negative and positive, was 1.11% (0.521.70%), more than double that which would have been obtained under the assumption, used in the earlier NPMS, that screen negatives were disease free0.53 (0.370.63%).8 The correctly weighted prevalence estimate has a much higher standard error. It is the weighted product of a precise prevalence estimate of a common condition among the screen positives and a very imprecise estimate of a rare condition among screen negatives. The coefficient of variation was nearly two and a half times larger for the latter group. In an interesting and closely argued review of the options the NPMS decided to use the second figure (0.53%) as its estimate of the national prevalence of psychosis (conceding that it is an underestimate) on the basis that it allowed direct comparison with earlier estimates and was precise enough to be used to inform policy, while the first figure (1.11%), because of the associated sampling error, might have been a considerable overestimate.8 I must declare an interest here; I am a member of the expert advisory group for NPMS, a signatory to the report, and a party to the decision. On reflection, that decision was wrong. We had decided to use a two-phase design with what turned out to be an inefficient screen. The consequence was an imprecise estimate and this should have been accepted. With the benefit of hindsight, and armed now with McNamees rule of thumb, we would have been better off investing the available clinical interviewer resources into a smaller, nested, one-phase psychosis study. This will remain the case until or unless a more effective screening procedure for psychosis can be identified.
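The imprecision we reluctantly traded away is easy to quantify. The sketch below combines the two stratum estimates as described above, again treating the screen-negative total of roughly 8,200 as an assumption and ignoring the survey's non-response and post-stratification weights, which is why the figures do not reproduce the published 1.11% and 0.53% exactly.

```python
from math import sqrt

# Weighted prevalence from the two screening strata, as described above.
# Second-phase interview numbers (203 and 420) and stratum prevalences are
# quoted in the text; the screen-negative total is assumed, and the simple
# binomial variances below ignore the survey's weighting and clustering.

n_pos, n_neg = 375, 8200                  # first-phase stratum sizes (n_neg assumed)
m_pos, m_neg = 203, 420                   # second-phase interviews completed
p_pos, p_neg = 0.133, 0.0063              # prevalence of psychosis within each stratum

w_pos = n_pos / (n_pos + n_neg)           # share of the sample screening positive
w_neg = 1 - w_pos

prev_weighted = w_pos * p_pos + w_neg * p_neg   # ~1.2%, cf. the reported 1.11%
prev_naive = w_pos * p_pos                      # screen negatives assumed disease free, ~0.6%

var_pos = p_pos * (1 - p_pos) / m_pos     # binomial variance of each stratum estimate
var_neg = p_neg * (1 - p_neg) / m_neg
se_weighted = sqrt(w_pos**2 * var_pos + w_neg**2 * var_neg)

print(f"weighted {prev_weighted:.2%} (SE {se_weighted:.2%}), naive {prev_naive:.2%}")
print(f"share of variance from screen negatives: "
      f"{w_neg**2 * var_neg / (w_pos**2 * var_pos + w_neg**2 * var_neg):.0%}")
```

Under these assumptions, almost all of the variance of the combined estimate comes from the small screen-negative subsample, which is the nub of the problem described above.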
There are other powerful arguments for favouring one-phase over two-phase designs.
Firstly, data analysis with two-phase designs is greatly complicated. Prevalence estimates should be weighted back to the composition of the base population, taking account of the sampling fractions. This is not straightforward. Although point estimates can be calculated accurately by simple algebra (the Horvitz-Thompson estimator), standard errors will be underestimated unless special techniques are applied. Dunn provides an excellent overview.3 It is not appropriate to test for the statistical significance of observed differences between proportions in sub-groups (for example, the prevalence of depression in men and women) using standard χ2 tests. SPSS will not help; only more specialized statistical software packages (e.g. STATA10) provide appropriate techniques.
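As an illustration of the trap, the sketch below computes the Horvitz-Thompson point estimate from the figures used earlier (the screen-negative total again assumed), then contrasts a naive standard error, calculated as though the weighted-up cases had been observed in all first-phase participants, with a design-based one that recognizes that only 623 second-phase interviews were actually completed. In practice one would hand this to a package with proper survey-estimation facilities, as the text notes, rather than calculate by hand.

```python
from math import sqrt

# Horvitz-Thompson point estimate: each interviewed subject is weighted by the
# inverse of the realized second-phase sampling fraction for their stratum.
# Figures as in the sketches above; n_neg is assumed, and the case counts are
# the approximate numbers implied by the quoted 13.3% and 0.63%.

n_pos, n_neg = 375, 8200
m_pos, m_neg = 203, 420
cases_pos, cases_neg = 27.0, 2.6              # implied second-phase case counts

w_pos, w_neg = n_pos / m_pos, n_neg / m_neg   # inverse sampling fractions
N = n_pos + n_neg

p_ht = (cases_pos * w_pos + cases_neg * w_neg) / N

# Wrong: treat the weighted-up data as if N subjects had been diagnosed
se_naive = sqrt(p_ht * (1 - p_ht) / N)

# Better: combine the stratum-specific binomial variances (as above)
p_pos, p_neg = cases_pos / m_pos, cases_neg / m_neg
se_design = sqrt((n_pos / N)**2 * p_pos * (1 - p_pos) / m_pos
                 + (n_neg / N)**2 * p_neg * (1 - p_neg) / m_neg)

print(f"HT prevalence {p_ht:.2%}; naive SE {se_naive:.2%} vs design-based SE {se_design:.2%}")
```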
Secondly, more serious problems arise from non-response in the burdensome second phase. NPMS is typical in that the proportion responding in the second phase is lower than that in the first phase (69%), and lower among screen positives (54%) than screen negatives (57%). For older people in dementia surveys the attrition between first and second phase is generally even more striking.11 Non-response in the second phase arises from death, frailty, refusal, or loss to follow-up. As such, second-phase data will not be missing completely at random, the assumption of random sampling within screening strata will be violated, and estimates of prevalence and association will be biased. Surprisingly, there has been very little treatment of this problem within the methodological literature; however, it is effectively a variant of the well-recognized problem of informative censoring in longitudinal studies. Gao et al.12 propose adjusted weighting, modelling, or regression estimators as partial remedies, each of which is fairly robust where data can be considered at least to be missing at random, i.e. non-response is non-random with respect to covariates measured in the first phase but, given those covariates, random with respect to the outcome of the second-phase assessment, had it occurred. No method has been shown to deal effectively with the more plausible pattern of non-response known as informative censoring, where response is non-random with respect both to the known characteristics of participants and to the diagnostic outcome.
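Of these partial remedies, adjusted weighting is the most transparent: the second-phase design weight is inflated by the inverse of the response rate within cells defined by information collected at phase one. The sketch below uses entirely hypothetical cells and counts (loosely echoing the NPMS response figures) to show the mechanics; it is not the estimator of Gao et al. as published, and it removes bias only under the missing-at-random assumption discussed above.

```python
import pandas as pd

# Weighting-class adjustment for second-phase non-response: within cells defined
# by phase-one covariates (here, hypothetical sex and age groups), the design
# weight is divided by the observed response rate. All counts are illustrative.

phase2 = pd.DataFrame({
    "stratum":   ["screen_pos"] * 4 + ["screen_neg"] * 4,
    "sex":       ["M", "F", "M", "F"] * 2,
    "age_group": ["16-44", "16-44", "45-74", "45-74"] * 2,
    "selected":  [110, 95, 90, 80, 200, 190, 180, 162],   # invited to phase two
    "responded": [62, 52, 48, 41, 118, 110, 102, 90],     # completed interviews
    "design_wt": [1.0] * 4 + [14.0] * 4,                  # inverse sampling fractions (illustrative)
})

phase2["resp_rate"] = phase2["responded"] / phase2["selected"]
phase2["adj_wt"] = phase2["design_wt"] / phase2["resp_rate"]

print(phase2[["stratum", "sex", "age_group", "resp_rate", "adj_wt"]])
```

The adjusted weights then simply replace the design weights in the prevalence calculation; under informative censoring, where response depends on the diagnostic outcome itself even within such cells, no adjustment of this kind is adequate.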
If psychiatric epidemiology is not quite ready to renounce its addiction to two-phase surveys, let us hope at least that awareness of some of the more obvious pitfalls is increasing. Failure to assess screen negatives is still far too common, with many researchers apparently heedless of the damage caused to the validity of their work: prevalence estimates can then only signify a minimum level for the population surveyed,13 and measures of association are likely to be biased. For example (one of many in the literature), a study that reported a prospective association between hypercholesterolaemia and later onset of dementia was seriously flawed because only those scoring less than 25 on the MMSE proceeded to the second-phase dementia diagnostic assessments;14 dementia was likely to be selectively under-ascertained in better-educated, higher-status participants, who may also have had healthier lipid profiles. Graham Dunn has waged, and largely won, a campaign against underestimation of standard errors in two-phase surveys.3 One would hope also that the problem of missing data in the second phase will increasingly be recognized and dealt with more appropriately.12 I confess, though, that my main hope is that Roseanne McNamee will have convinced many not to embark on this course in the first place. For this to occur, we need valid one-phase diagnostic assessments that are convenient for use in epidemiological studies. In psychiatric epidemiology, comprehensive structured assessments for the common mental disorders mean that most studies in this area now use one-phase designs. For dementia, the goal of one-phase assessment also seems attainable.15 Psychosis remains a problem, but Terry Brugha's effort to adapt clinimetric approaches for use by social survey interviewers shows promise.16
References
2 Deming WE. An essay on screening, or two-phase sampling, applied to surveys of a community. Int Statistical Rev 1978;45:28–37.
3 Dunn G, Pickles A, Tansella M, Vazquez-Barquero JL. Two-phase epidemiological surveys in psychiatric research. Br J Psychiatry 1999;174:95–100.
4 O'Connor DW, Pollitt PA, Hyde JB et al. The reliability and validity of the Mini-Mental State in a British community survey. J Psychiatr Res 1989;23:87–96.
5 Bebbington PE, Nayani T. The Psychosis Screening Questionnaire. Int J Methods Psychiatr Res 1995;5:11–20.
6 Donath S. The validity of the 12-item General Health Questionnaire in Australia: a comparison between three scoring methods. Aust N Z J Psychiatry 2001;35:231–35.
7 Lewis G, Pelosi AJ, Araya R, Dunn G. Measuring psychiatric disorder in the community: a standardized assessment for use by lay interviewers. Psychol Med 1992;22:465–86.
8 Singleton N, Bumpstead R, O'Brien M, Lee A, Meltzer H. Psychiatric Morbidity Among Adults Living in Private Households, 2000. London: The Stationery Office, 2001.
9 Jenkins R, Bebbington P, Brugha T et al. The National Psychiatric Morbidity Surveys of Great Britain: strategy and methods. Psychol Med 1997;27:765–74.
10 StataCorp. Stata Statistical Software: Release 5.0. College Station, TX: Stata Corporation, 1997.
11 The 10/66 Dementia Research Group. Methodological issues in population-based research into dementia in developing countries. A position paper from the 10/66 Dementia Research Group. Int J Geriatr Psychiatry 2000;15:21–30.
12 Gao S, Hui SL, Hall KS, Hendrie HC. Estimating disease prevalence from two-phase surveys with non-response at the second phase. Stat Med 2000;19:2101–14.
13 Prince M. Dementia in developing countries. (Guest Editorial) Int Psychogeriatr 2001;13:389–93.
14 Kivipelto M, Helkala EL, Laakso MP et al. Midlife vascular risk factors and Alzheimer's disease in later life: longitudinal, population based study. BMJ 2001;322:1447–51.
15 Prince M, Acosta D, Chiu H, Scazufca M, Varghese M. Dementia diagnosis in developing countries: a cross-cultural validation study. Lancet 2003;361:909–17.
16 Brugha TS, Nienhuis F, Bagchi D, Smith J, Meltzer H. The survey form of SCAN: the feasibility of using experienced lay survey interviewers to administer a semi-structured systematic clinical assessment of psychotic and non-psychotic disorders. Psychol Med 1999;29:703–11.