1 MRC Laboratories, The Gambia
2 Department of State for Health and Social Welfare, Government of The Gambia
3 MRC Tropical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
Correspondence: Paul Milligan, MRC Laboratories, PO Box 273, Fajara, The Gambia. E-mail: pmilligan{at}mrc.gm
Abstract
Methods We conducted two surveys within 3 months of each other in 2000–2001 to estimate vaccination coverage in the Western Region of The Gambia, one using the EPI scheme and one using compact segment sampling.
Results Point estimates for vaccination coverage from the two surveys rarely differed by more than 2%. Any differences were more likely to be due to household selection than to population movement. A simple mathematical model showed that even in extreme situations, ignoring population movement since the last census is unlikely to have any appreciable effect. Rates of homogeneity did not differ systematically between the surveys.
Conclusions In situations where quality of fieldwork can be guaranteed, the EPI random walk method can give accurate and precise results. However, compact segment sampling is generally to be preferred as it ensures objectivity in household selection and permits the estimation of population totals (such as the number of unvaccinated children), which are helpful for planning service provision.
Accepted 15 December 2003
Household surveys of health in developing countries are carried out frequently to estimate burden of disease, attitudes to health care, use of health services, and coverage of interventions. These surveys are often undertaken by teams from health ministries, disease control programmes or non-governmental agencies with limited resources, without specialized sampling skills, and without the use of a comprehensive and up-to-date sampling frame of people or households.
The World Health Organization's Expanded Program on Immunization (EPI) first recognized the need to carry out an approximately valid sample survey in such a situation. The EPI 30 × 7 cluster survey method1 was developed to meet the needs of health managers for reliable estimates of vaccine coverage. It has been used for thousands of such surveys, and, with or without adaptation, for many other purposes. The method is standardized, quick to implement, and approximately self-weighting (and therefore simple to analyse), but it has important limitations. Firstly, communities are selected with probability proportional to size (pps) according to the most recent census data, but these data can be inaccurate and out of date, particularly with respect to fast-growing peri-urban areas. This will often mean that such areas, which may have the poorest access to health care, will be under-represented in the sample, and the overall estimate of vaccine coverage will be biased upwards. Secondly, the method does not select households from a sampling frame, but instructs the interviewer to follow a random procedure in the field, resulting in a cluster of households being selected within the community. This procedure is open to conscious or unconscious bias of the interviewer, and does not lead to a sample selected with known probability. Thirdly, in case of non-response, one simply goes on to select the next household, leading to bias if non-responders differ systematically from those who do participate.
The EPI methodology has sometimes been adopted for purposes other than vaccine coverage, in situations where the sample size (30 clusters of 7 children) and the method of household selection (based on children aged 12–23 months) are quite inappropriate. Bennett et al.2 presented a number of extensions and adaptations of the EPI approach, including alternative household selection methods and appropriate sample size and analysis methods, retaining its simplicity while extending its applicability. However, these did not adequately address the problems of outdated population estimates and possible subjectivity in household selection mentioned above.
More recently, Turner and colleagues3 have proposed an improved cluster sampling method for resource-poor situations without a household sampling frame. By segmenting clusters and visiting all households in one randomly chosen segment, it allows households to be selected objectively, sampling probabilities to be calculated, and non-responders to be revisited, without incurring the cost of complete enumeration. A further potential advantage of this method is that, since all selections are made with known probability, population totals may be estimated, which may be useful for resource planning.
The EPI method and its adaptations have been evaluated for precision and possible bias by computer simulation.4,5 Both segmenting and the EPI random walk method were used by a number of countries in the UNICEF Multiple Indicator Cluster Surveys.6,7 However, we were unable to find any published data comparing their use in the same field situation. We used both methods in separate surveys to measure childhood vaccination coverage in the same region of The Gambia at approximately the same time. In this paper we describe the design and analysis of the two surveys and compare their results. We also use a simplified mathematical model to predict the situations in which the EPI method will exhibit substantial bias.
Methods
Compact segment sampling
In compact segment sampling,3,8 clusters are still selected with probability proportional to size at the last census. A sketch map showing dwellings is then drawn of each selected cluster, and the cluster is split into a small number of segments such that the number of dwellings per segment is always roughly the same. One segment is then chosen at random from each selected cluster and all households in the segment are included in the sample. This method removes the subjectivity and possible bias inherent in allowing the interviewer to select households in the field. It also facilitates calling back at households where there is no response or an incomplete response. Kish8 gives detailed practical guidance. Turner et al.3 claim that this design is self-weighting, allowing easily for changes in cluster population since the last census. This is only true if the census is up to date: because segments contain roughly equal numbers of dwellings, the number of segments created in a cluster reflects its current size, so households in clusters that have grown faster than average since the census have a lower probability of selection. Using their notation, if $M_i$ is the census population of the ith selected cluster, M the total population of all clusters in the sampling frame, m the number of clusters selected in the survey and $S_i$ the number of segments created in the ith cluster, then the probability of selection of any household in the ith cluster is
$$P_i = \frac{m M_i}{M} \times \frac{1}{S_i} \qquad (1)$$
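Equation (1) can be made concrete with a few hypothetical clusters. The sketch below is illustrative only: the function name `selection_probability` and the cluster sizes are assumptions, and only m = 60 and the census total are taken from the text. It shows that households are sampled with equal probability only when the number of segments is proportional to census size, which is the self-weighting condition discussed above.

```python
import math

def selection_probability(m: int, M: float, M_i: float, S_i: int) -> float:
    """Equation (1): P_i = (m * M_i / M) * (1 / S_i)."""
    cluster_prob = m * M_i / M      # stage 1: cluster drawn pps on census size M_i
    segment_prob = 1 / S_i          # stage 2: one of S_i segments drawn at random
    return cluster_prob * segment_prob  # every household in that segment is taken

m, M = 60, 574_026  # clusters selected and census total (from the text)

# Hypothetical clusters: (census population M_i, segments S_i created at survey time)
clusters = {
    "A": (600, 2),   # segments proportional to census size
    "B": (1200, 4),  # twice A's census size, twice the segments: same P_i as A
    "C": (600, 3),   # grew faster since the census: more segments, lower P_i
}

probs = {k: selection_probability(m, M, Mi, Si) for k, (Mi, Si) in clusters.items()}
weights = {k: 1 / p for k, p in probs.items()}  # analysis weights 1/P_i
for k in clusters:
    print(k, f"P={probs[k]:.5f}", f"weight={weights[k]:.1f}")
```

Clusters A and B receive the same probability (the self-weighting case), while households in the faster-growing cluster C carry 1.5 times the weight of those in A.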
Our surveys
In November 2000 to January 2001 we undertook a survey of vaccination coverage in the Western Region of The Gambia using the compact segment method, as part of a study to evaluate the introduction of Haemophilus influenzae type b (Hib) vaccine into the country's immunization schedule.9 In February 2001, the Ministry of Health carried out its routine survey using the EPI method.
Study area
Approximately half of the population of The Gambia (just over 1 million in 1993) live in Western Region, which includes the capital Banjul, rapidly growing urban areas of Kanifing and Brikama, as well as rural areas (Figure 1). The average annual growth rate between the 1983 and 1993 censuses was 6% in urban areas and 3% in rural areas9 (Table 1).
First survey, using compact segments: sample size and planning
The most recent census (1993) defined 909 enumeration areas (EA) of median population 617 (range: 208–2779) in Western Region, with a total population of 574 026. The median number of households per EA was 76 (range: 2–348) and the median household size was 8 (range: 1–79). We estimated that by the time of our survey there would be an average of 26 to 28 children aged 12–23 months per EA.
We were primarily interested in the coverage of the jointly administered diphtheria-pertussis-tetanus (DPT) and Hib vaccines. The total number of children required2 is given by

$$n = \frac{z_{1-\alpha/2}^{2}\, p(1-p)}{d^{2}} \times \mathrm{deff}$$

where p is the proportion of children vaccinated, d the desired margin of error, $z_{1-\alpha/2}$ the $1-\alpha/2$ quantile of a standard normal distribution, and deff the design effect, the loss of information (increase in variance) in the sample due to the clustering of observations. The value of deff increases with cluster size according to the relationship10

$$\mathrm{deff} = 1 + (b-1) \times \mathrm{roh}$$

where b is the average number of individuals sampled in each cluster and roh is the rate of homogeneity, which can range from just below zero up to 1. We took roh to be 0.15, typical of values seen for immunization coverage.2,12
If we segmented EA to give a cluster size of about 40 households, we would expect to find 14 children in each cluster, and 60 clusters could be completed in the time available. This gave a predicted design effect of 2.95, and an expected sample size of 60 x 14 = 840, which would give an estimate of vaccination coverage within ±6% of the true value if the coverage was 50%, and within ±5% if the coverage was 80%.
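The planning arithmetic above can be checked numerically. This is a minimal sketch assuming a 95% confidence level (z = 1.96), which the text does not state explicitly; the function names are hypothetical. It reproduces the design effect of 2.95 and the stated margins of error for a sample of 840 children.

```python
import math

def design_effect(b: float, roh: float) -> float:
    """deff = 1 + (b - 1) * roh, with b the average number sampled per cluster."""
    return 1 + (b - 1) * roh

def required_n(p: float, d: float, deff: float, z: float = 1.96) -> float:
    """Children needed: n = z^2 * p(1 - p) * deff / d^2 (z = 1.96 assumed)."""
    return z ** 2 * p * (1 - p) * deff / d ** 2

def margin_of_error(p: float, n: int, deff: float, z: float = 1.96) -> float:
    """Inverse of required_n: half-width of the confidence interval for a given n."""
    return z * math.sqrt(p * (1 - p) * deff / n)

deff = design_effect(b=14, roh=0.15)   # 2.95, as in the text
n = 60 * 14                            # 840 children in 60 clusters
for p in (0.5, 0.8):
    # about 0.058 (within +/-6%) at p = 0.5 and 0.046 (within +/-5%) at p = 0.8
    print(p, round(margin_of_error(p, n, deff), 3))
```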
First survey, using compact segments: sample selection
EA were selected with probability proportional to the number of households recorded in the 1993 census. Small EA were grouped with adjacent larger EA before selection. In all, 60 EA were selected systematically, yielding an implicit geographical stratification. In a preliminary exercise, each selected EA was sketch-mapped to show the location of each compound (a multi-household dwelling, frequently housing an extended family). In each compound, we determined the number of households and made a rough estimate of the number of resident children aged 12–23 months. The latter was not essential and added significantly to the workload, but allowed us to reduce the variability in the number of children per segment.
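Systematic selection with probability proportional to size can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `systematic_pps` and the household counts are hypothetical, and the grouping of small EA is assumed to have been done beforehand. Listing EA in geographic order before cumulating is what produces the implicit stratification.

```python
import random
from itertools import accumulate

def systematic_pps(sizes: list[int], m: int, rng: random.Random) -> list[int]:
    """Select m units with probability proportional to size, systematically.

    `sizes` are measures of size (e.g. census households) listed in geographic
    order. Returns the indices of selected units; a unit larger than the
    sampling interval may be selected more than once.
    """
    cum = list(accumulate(sizes))        # cumulative totals
    interval = cum[-1] / m               # sampling interval
    start = rng.random() * interval      # random start in [0, interval)
    selected, j = [], 0
    for k in range(m):
        point = start + k * interval
        # advance to the first unit whose cumulative total exceeds the point
        while j < len(cum) - 1 and cum[j] <= point:
            j += 1
        selected.append(j)
    return selected

# Hypothetical household counts for 10 EA in map order; select 3 of them.
households = [76, 48, 120, 30, 348, 61, 90, 23, 77, 55]
print(systematic_pps(households, m=3, rng=random.Random(1)))
```

Each EA no larger than the interval is selected with probability m × size / total, which is the first factor in equation (1).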
Maps were returned to the office for segmenting. For each EA, we calculated the number of segments by dividing the estimated number of children aged 12–23 months in that EA by 14 and rounding to the nearest integer. Segments were defined using natural boundaries (roads, rivers) and avoiding splitting blocks of compounds where possible. If the estimated number of children in an EA was less than 14, all households in the EA were included in the survey. The median number of segments per EA was 2 (range: 1–12). One segment was selected from each selected EA by simple random sampling, and marked clearly on the map.
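The segmenting rule above can be expressed in a few lines. In this sketch the function names are hypothetical, and rounding halves upwards is an assumption, since the text says only "rounding to the nearest integer" (Python's built-in `round` would send 2.5 to 2).

```python
import math
import random

def n_segments(est_children: float, target: int = 14) -> int:
    """Segments for an EA: estimated children aged 12-23 months / 14, rounded
    to the nearest integer. EA with fewer than 14 estimated children are not
    split (the whole EA is surveyed)."""
    if est_children < target:
        return 1
    # floor(x + 0.5) rounds halves up (assumed tie-breaking rule)
    return math.floor(est_children / target + 0.5)

def choose_segment(n_seg: int, rng: random.Random) -> int:
    """Simple random choice of one segment, numbered 1..n_seg on the sketch map."""
    return rng.randint(1, n_seg)

rng = random.Random(2000)  # arbitrary seed for a reproducible illustration
for est in (9, 20, 21, 40, 170):  # hypothetical EA-level estimates
    s = n_segments(est)
    print(est, s, choose_segment(s, rng))
```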
The interviewer teams returned to the area and visited each household in the selected segment. Age was confirmed from the health card and, for each child aged 12–23 months on 20 December 2000, vaccination history, date of birth, and other details were transcribed from the card. If the health card was missing, the mother's estimate of the child's age and details of vaccinations were recorded using a questionnaire developed by the EPI programme. Up to two re-visits were scheduled within 2 days for each mother or child who was absent. There were no refusals or non-responses.
Second survey, using EPI method
A nationwide survey was conducted in seven strata, three of which (Banjul, Western Division, and North Bank West) comprised Western Region. Within each stratum, 30 settlements (villages) were selected with probability proportional to the 1993 estimates of the number of households. The same personnel conducted these surveys in February 2001. In all, seven or eight children aged 12–23 months were selected in each cluster as described in the section on the EPI method above, but using a random start point obtained from the list of local tax-payers. Only children with health cards were included in the survey.
Analysis of data
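For any sampling scheme, both the number of children vaccinated and the number of children selected in a cluster are random variables, so the estimated proportion of children vaccinated is a ratio of random variables. Let $y_i$ and $n_i$ be the numbers of vaccinated and sampled children in the ith selected cluster. The total number of vaccinated children in the region, Y, and the current total number of children in the region, N, are estimated by weighting each observation by $1/P_i$:

$$\hat{Y} = \sum_{i=1}^{m} \frac{y_i}{P_i}, \qquad \hat{N} = \sum_{i=1}^{m} \frac{n_i}{P_i}, \qquad \hat{R} = \frac{\hat{Y}}{\hat{N}} \qquad (2)$$

The variance of the estimated coverage $\hat{R}$ can be estimated by:13

$$v(\hat{R}) = \frac{1}{\hat{N}^{2}}\left[v(\hat{Y}) + \hat{R}^{2}\,v(\hat{N}) - 2\hat{R}\,\mathrm{cov}(\hat{Y},\hat{N})\right] \qquad (3)$$

where, treating clusters as selected with replacement,

$$\mathrm{cov}(\hat{Y},\hat{N}) = \frac{m}{m-1}\sum_{i=1}^{m}\left(\frac{y_i}{P_i} - \frac{\hat{Y}}{m}\right)\left(\frac{n_i}{P_i} - \frac{\hat{N}}{m}\right) \qquad (4)$$

$v(\hat{Y})$ is obtained from (4) by replacing $n_i$ with $y_i$ and $\hat{N}$ with $\hat{Y}$, and $v(\hat{N})$ is given in equation (5). The design effect may be estimated as:

$$\widehat{\mathrm{deff}} = \frac{v(\hat{R})}{\hat{R}(1-\hat{R})/n}, \qquad n = \sum_{i=1}^{m} n_i$$

The variance of the estimated total number of children, $\hat{N}$, is estimated by:

$$v(\hat{N}) = \frac{m}{m-1}\sum_{i=1}^{m}\left(\frac{n_i}{P_i} - \frac{\hat{N}}{m}\right)^{2} \qquad (5)$$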
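The weighted estimators can be computed directly from cluster-level counts. The sketch below assumes the standard with-replacement (Cochran-type) ratio estimator, which is our reading of equations (2)–(5) rather than the authors' own code; the function name and the example counts are hypothetical. It uses the fact that the residuals $(y_i - \hat{R} n_i)/P_i$ sum to zero, so that $m/(m-1)$ times their sum of squares equals the bracketed term in the variance of $\hat{R}$.

```python
import math

def weighted_coverage(y, n, P):
    """Weighted coverage estimate, its variance, estimated total and deff.

    y[i]: vaccinated children in selected cluster i
    n[i]: children sampled in cluster i
    P[i]: selection probability of households in cluster i (equation (1))
    Clusters are treated as drawn with replacement (assumed variance form).
    """
    m = len(y)
    Y = sum(yi / pi for yi, pi in zip(y, P))   # estimated number vaccinated
    N = sum(ni / pi for ni, pi in zip(n, P))   # estimated number of children
    R = Y / N                                  # estimated coverage
    # linearised residuals; m/(m-1) * sum(u^2) = v(Y) + R^2 v(N) - 2R cov(Y, N)
    u = [(yi - R * ni) / pi for yi, ni, pi in zip(y, n, P)]
    var_R = m / (m - 1) * sum(ui ** 2 for ui in u) / N ** 2
    var_N = m / (m - 1) * sum((ni / pi - N / m) ** 2 for ni, pi in zip(n, P))
    deff = var_R / (R * (1 - R) / sum(n))      # relative to simple random sampling
    return {"R": R, "var_R": var_R, "N": N, "var_N": var_N, "deff": deff}

# Hypothetical data for four clusters; the third has a lower selection probability.
res = weighted_coverage(y=[10, 12, 8, 14], n=[14, 15, 12, 16],
                        P=[0.05, 0.05, 0.025, 0.05])
print({k: round(v, 4) for k, v in res.items()})
```

With equal $P_i$ the weights cancel and $\hat{R}$ reduces to the unweighted proportion, which is why comparing weighted and unweighted estimates isolates the effect of uneven growth.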
Predicted bias of EPI sampling
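Using data from the surveys, we can predict mathematically the expected bias in EPI sampling due to not weighting for uneven population growth (the weighted segment method is unbiased). We neglect other sources of bias (e.g. non-random sampling) and make the simplifying assumption that at the time of the census all clusters were the same size and had the same number of children, $N_0$, so that we have simple random sampling of clusters. Let cluster i now contain $N_0 g_i$ children, so that $g_i$ is its proportionate increase in population since the census, and let $p_i$ be its vaccination coverage. The expected bias is $E(\hat{R}_{EPI}) - R$, where $\hat{R}_{EPI}$ is the point estimate of coverage from EPI sampling and R is the true population coverage. Taking expectations over all possible samples for the EPI method, and writing E(·), V(·) and Cov(·) for means, variances and covariances over all clusters in the population, we have:

$$E(\hat{R}_{EPI}) = E(p), \qquad R = \frac{\sum_i N_0 g_i p_i}{\sum_i N_0 g_i} = \frac{E(gp)}{E(g)}$$

$$E(\hat{R}_{EPI}) - R = \frac{E(g)E(p) - E(gp)}{E(g)} = -\frac{\mathrm{Cov}(g,p)}{E(g)}$$

Defining $\rho_{g,p}$ to be the correlation between $p_i$ and $g_i$ in the population, $\rho_{g,p} = \mathrm{Cov}(g,p)/\sqrt{V(p)V(g)}$, where V(p) and V(g) are the variances of p and g respectively, we see that the bias is equal to

$$E(\hat{R}_{EPI}) - R = -\rho_{g,p}\,\sqrt{V(p)}\;\mathrm{CV}(g) \qquad (6)$$

where $\mathrm{CV}(g) = \sqrt{V(g)}/E(g)$ is the coefficient of variation of g.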
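As a numerical check on the algebra leading to equation (6), the sketch below builds a hypothetical population of clusters in which faster-growing clusters tend to have lower coverage, and computes the bias both directly (equal weight per cluster versus weighting by current size) and from the correlation form of equation (6). The population, seed and function names are all assumptions for illustration; under the model the two quantities should agree to rounding error.

```python
import math
import random

def epi_bias_direct(p, g):
    """E(p) (equal weight per cluster, as EPI sampling gives under the model)
    minus the true coverage, which weights each cluster by its current size g."""
    K = len(p)
    return sum(p) / K - sum(gi * pi for gi, pi in zip(g, p)) / sum(g)

def epi_bias_eq6(p, g):
    """Equation (6): -rho * sqrt(V(p)) * CV(g), using population moments."""
    K = len(p)
    mp, mg = sum(p) / K, sum(g) / K
    vp = sum((x - mp) ** 2 for x in p) / K
    vg = sum((x - mg) ** 2 for x in g) / K
    rho = sum((a - mp) * (b - mg) for a, b in zip(p, g)) / K / math.sqrt(vp * vg)
    return -rho * math.sqrt(vp) * (math.sqrt(vg) / mg)

# Hypothetical population of 500 clusters: growth ratio g between 1 and 2.5,
# coverage p falling with growth plus noise, clipped to [0, 1].
rng = random.Random(1993)
g = [rng.uniform(1.0, 2.5) for _ in range(500)]
p = [min(1.0, max(0.0, 0.95 - 0.2 * (gi - 1) + rng.gauss(0, 0.1))) for gi in g]
print(round(epi_bias_direct(p, g), 4), round(epi_bias_eq6(p, g), 4))
```

A positive value means the unweighted EPI estimate overstates coverage, as expected when growth and coverage are negatively correlated.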
Results
Using equation (6) and data from these surveys, we can model the potential bias of the EPI method. We use the number of households as a measure of population size (and neglect the effects of variation among EA in the number of children per household). From our data we had $\rho_{g,p} = -0.28$ for three doses of DPT/Hib. The observed sample variance of cluster-level coverage of three doses of DPT/Hib was approximately 0.05, and the population variance was estimated from this by subtracting an estimate of the within-cluster variance14 as V(p) ≈ 0.03 (so $\sqrt{V(p)}$ ≈ 0.17). The coefficient of variation of the proportionate increase in population (estimated for each EA from the number of households) was CV(g) = 0.45 (V(g) = 0.42), so the expected bias = +0.28 × 0.45 × 0.17 ≈ 0.022, indicating that EPI sampling will overestimate vaccine coverage by about 2%. For all other vaccines the predicted bias was much smaller, around 0.5%. Now consider an extreme scenario: suppose vaccine coverage $p_i$ is very variable between clusters (uniformly distributed from 0 to 1), so that V(p) = 1/12 ≈ 0.083, and that the correlation between vaccine coverage and population growth is almost twice as strong as observed ($\rho_{g,p} = -0.5$). If CV(g) = 0.5, equation (6) gives the bias of the EPI method as +0.5 × 0.5 × 0.29 ≈ +0.07. Thus this extreme scenario results in EPI sampling overestimating vaccine coverage by 7%. In practice the bias is likely to be much smaller.
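The two bias calculations above can be reproduced in a few lines. This sketch assumes equation (6) has the form bias = −ρ × √V(p) × CV(g), with the correlations taken as negative (−0.28 observed, −0.5 in the extreme scenario) so that −ρ is the positive factor in the arithmetic; the function name is hypothetical.

```python
import math

def predicted_bias(rho: float, var_p: float, cv_g: float) -> float:
    """Expected bias of the EPI estimate: -rho * sqrt(V(p)) * CV(g) (assumed form of (6))."""
    return -rho * math.sqrt(var_p) * cv_g

# Three doses of DPT/Hib, inputs from the text
observed = predicted_bias(rho=-0.28, var_p=0.03, cv_g=0.45)
# Extreme scenario: p ~ Uniform(0, 1) so V(p) = 1/12, with a stronger correlation
extreme = predicted_bias(rho=-0.5, var_p=1 / 12, cv_g=0.5)
print(f"{observed:.3f} {extreme:.3f}")  # 0.022 0.072
```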
Discussion
The differences between the two methods in point estimates and confidence intervals were in most cases negligible. There was no clear pattern to differences in rate of homogeneity. To some extent these findings may be explained by the very high coverage for individual vaccines, but full vaccine coverage was not high and EPI sampling overestimated it by only 2%.
The EPI method falls short in two main respects: the dependence on an out-of-date census for selection of clusters and the lack of objectivity of household sampling. Since the former is accounted for in segment sampling by the weighting, its effect may be evaluated by comparing the weighted and unweighted segment sampling estimates. In our survey, differences between these are almost non-existent. One would expect EPI sampling to perform badly in situations where there has been uneven population growth since the last census and that growth is highly correlated with the variable of interest. However, our calculations demonstrate that even in an extreme situation the bias is unlikely to be severe.
The lack of objectivity of household selection may be evaluated by comparing the unweighted segment sampling estimate with the EPI estimate. The difference between these is usually of the order of 1–2% in our study. However, our survey was conducted by an experienced team, and there may be more of a problem where staff are less well trained or supervised.
The EPI survey, and the interview phase of the segmented survey, each took a team of 16 interviewers about 5 weeks to complete. The mapping of households and preliminary enumeration for the segmented survey added a further 5 weeks to the fieldwork, but if segments were defined in terms of compounds, a simpler mapping exercise could be used.
The theoretically ideal approach to surveys in the absence of a household sampling frame would be complete enumeration of all households in selected clusters, followed by random or systematic sampling of households,15 but such schemes demand carefully supervised field work and the weighting may become complex. Kish8 discusses in detail the advantages and disadvantages of compact segment designs versus complete enumeration. In standard (as opposed to compact) segment sampling, as used by the Demographic and Health Surveys,16 households in a selected segment are listed and a systematic sample taken. Compact segment sampling, in which all children in a segment are selected, avoids the need for accurate mapping and listing, but at the cost of a less precise estimate because of increased homogeneity within the sample.
Although we have shown that the effect of an out-of-date census is likely to be small, the EPI approach is still inferior to any of the other approaches mentioned, in that it is not possible to ensure objectivity of household selection, to deal appropriately with non-response, or to estimate total numbers of those vaccinated (useful for planning). Whether to adopt the security of a probability design will depend on time and budget, but mapping need not add substantially to costs, particularly if existing maps can be updated. However, it is encouraging to see that, at least when carried out by an experienced team, the EPI method can give accurate results.
References
2 Bennett S, Woods T, Liyanage WM, Smith DL. A simplified general method for cluster-sample surveys of health in developing countries. World Health Stat Q 1991;44:98–106.
3 Turner AG, Magnani RJ, Shuaib M. A not quite as quick but much cleaner alternative to the expanded programme on immunization (EPI) cluster survey design. Int J Epidemiol 1996;25:198–203.
4 Lemeshow S, Tserkovnyi AG, Tulloch JL, Dowd JE, Lwanga SK, Keja J. A computer simulation of the EPI survey strategy. Int J Epidemiol 1985;14:473–81.
5 Bennett S, Radalowicz A, Vella V, Tomkins A. A computer simulation of household sampling schemes for health surveys in developing countries. Int J Epidemiol 1994;23:1282–91.
6 UNICEF. Evaluation of Multiple Indicator Cluster Surveys. 1997. http://www.unicef.org/reseval/pdfs/MICSrpt.pdf (Accessed 12 December 2002).
7 UNICEF. Evaluation of the UNICEF Multiple Indicator Cluster Surveys, Supplement G: Sampling for Multiple Indicator Cluster Surveys. New York: UNICEF, Division of Evaluation, Policy and Planning, 1998.
8 Kish L. Survey Sampling. New York: Wiley, 1965. Reprinted as Wiley Classics Library Edition, 1995.
9 Adegbola RA, Usen S, Lloyd-Evans N et al. Haemophilus influenzae type b meningitis in The Gambia after introduction of a conjugate vaccine. Lancet 1999;354:1091–92.
10 Gambia Government. National Report on Population and Development in The Gambia. Banjul: National Population Council, 1996.
11 Gambia Government. Missed Opportunities for Immunization in Talinding Kunjang. Banjul: Department of State for Health, 1999.
12 Bennett S. The EPI cluster sampling method: a critical appraisal. Proceedings of the International Statistical Institute 49th Session, Firenze 1993, pp. 21–35.
13 Cochran WG. Sampling Techniques. 3rd edn. New York: Wiley, 1977.
14 Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. Int J Epidemiol 1999;28:319–26.
15 Brogan D, Flagg E, Deming M, Waldman R. Increasing the accuracy of the expanded programme on immunization's cluster survey design. Ann Epidemiol 1994;4:302–11.
16 Macro International Inc. Sampling Manual. DHS-III Basic Documentation No. 6. Calverton, MD: Macro International Inc., 1996.