Use of Generalized Linear Mixed Models in the Spatial Analysis of Small-Area Malaria Incidence Rates in KwaZulu Natal, South Africa

I. Kleinschmidt1, B. L. Sharp1, G. P. Y. Clarke2, B. Curtis1 and C. Fraser1

1 Medical Research Council (South Africa), Durban, South Africa.
2 Department of Statistics and Biometry, University of Natal, Pietermaritzburg, South Africa.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 DATA
 ANALYSIS AND RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Spatial statistical analysis of 1994–1995 small-area malaria incidence rates in the population of the northernmost districts of KwaZulu Natal, South Africa, was undertaken to identify factors that might explain very strong heterogeneity in the rates. In this paper, the authors describe a method of adjusting the regression analysis results for strong spatial correlation in the rates by using generalized linear mixed models and variograms. The results of the spatially adjusted, multiple regression analysis showed that malaria incidence was significantly positively associated with higher winter rainfall and a higher average maximum temperature and was significantly negatively associated with increasing distance from water bodies. The statistical model was used to produce a map of predicted malaria incidence in the area, taking into account local variation from the model prediction if this variation was supported by the data. The predictor variables showed that even small differences in climate can have very marked effects on the intensity of malaria transmission, even in areas subject to malaria control for many years. The results of this study have important implications for malaria control programs in the area.

malaria; models; statistical; spatial analysis; variogram

Abbreviations: GIS, geographic information system; GLMM, generalized linear mixed model; SD, standard deviation


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 DATA
 ANALYSIS AND RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 
The districts of Ngwavuma and Ubombo in the northern part of the province of KwaZulu Natal have the highest malaria incidence rates in South Africa (1Go). Very large variation in the rates in these districts has not been properly accounted for, although it has been ascribed as much to human factors such as cross-border migration as to natural forces such as climate and environment.

The purpose of this study was to undertake a spatial statistical analysis of malaria incidence to identify important predictor variables and to produce an incidence map of the area illustrating the variation in malaria risk. A secondary, but important aim was to advance methodology for the spatial analysis and modeling of malaria transmission data in the context of the MARA/ARMA project (2Go), which, among other objectives, seeks to produce maps of malaria risk for the continent of Africa (3Go, 4Go).

In Africa, the predominant species of malaria-causing parasite is Plasmodium falciparum. When a person is bitten by an infected anopheles mosquito, the parasite, called a sporozoite during this stage of its cycle, enters the body via the mosquito's saliva, which is injected into the person's blood. The parasites multiply in the liver and reinvade the blood, via the red blood cells, as merozoites. The merozoites multiply sexually, some becoming gametocytes. Uninfected anopheles mosquitoes become infected if they feed on a person whose blood contains gametocytes. In the insect, the gametocytes undergo another phase of reproduction called the "sporogony" cycle. At the end of this cycle, the mosquito becomes infective as a new generation of sporozoites is able to infect another human host.

Therefore, malaria is bound closely to conditions that favor survival of the anopheles mosquito and the life cycle of the parasite. These conditions are determined predominantly by climatic factors, vegetation cover, and the vector's access to water surfaces for breeding (5GoGo–7Go). Human population movement from areas in which malaria is endemic to those in which the disease has been at least partially eradicated can also contribute to malaria transmission (8Go).

Accurate knowledge regarding the distribution of malaria is a crucial tool in planning and evaluating malaria control (9Go). Explaining this distribution is also important, since it provides a rationale for interventions and makes it possible to predict transmission intensity in places in which it has not been measured. The houses in the area under consideration have been sprayed with insecticide over several decades (1Go), resulting in reduced malaria incidence rates compared with the era before control measures were introduced (10Go). In recent years, incidence rates have again risen steeply. The study area is situated on the southern fringe of climatic suitability for endemic malaria distribution in Africa (11Go). Summer rainfall (6-month average, 68 mm per month) and temperatures (average maximum daily temperature, 29.4°C) are generally suitable for malaria transmission. On the other hand, winter conditions are suboptimal and could limit transmission of the disease (5Go) (average rainfall, 19 mm per month; average maximum daily temperature, 25.9°C).

We sought to test the hypothesis that the spatial distribution of climatic conditions in winter accounts for much of the variation in the intensity of malaria transmission in this region. To this end, we undertook a spatial statistical analysis of small-area malaria incidence rates in relation to rainfall and temperature. Because of their potential confounding effects, we included proximity to permanent water bodies and distance to the Mozambique border. The latter was used as a proxy for migration from Mozambique, where routine malaria control had not been implemented. Water bodies included all permanent freshwater surfaces except rivers.

In both the Ngwavuma and Ubombo districts, a small-area malaria incidence reporting system has been in operation for a number of years as part of the provincial malaria control program (12Go). A component of the control strategy is to identify and treat all infected persons. Therefore, both passive and active case finding is practiced. The latter consists of screening measures in which teams visit the community to encourage those persons suspected of having malaria to be tested. While this system may not achieve 100 percent coverage, it is thought to identify the vast majority of cases. Low levels of exposure to P. falciparum in the past have made it unlikely that the host population, apart from recent immigrants from Mozambique, has naturally acquired immunity. Therefore, persons are unlikely to possess clinical tolerance to parasite infection and to recover from it without treatment (5Go).

As part of the control program, a homestead-level population census was conducted in 1994, thus making it possible, with the use of a geographic information system (GIS), to derive population counts for the same small areas for which cases were being reported. It was decided to base the analysis on the July 1, 1994–June 30, 1995 malaria season, since the denominator data for this time period were the most reliable available for calculating incidence rates.


    DATA
 TOP
 ABSTRACT
 INTRODUCTION
 DATA
 ANALYSIS AND RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Cases of malaria (ascertained by passive and active detection) in the mid-1994 to mid-1995 season in the districts of Ngwavuma and Ubombo, northern KwaZulu Natal, were extracted from the malaria control program database and were allocated to 220 magisterial subdivisions referred to as sections. Population totals for each section, based on a census conducted during the 1994–1995 year, were obtained from the same source.

The total number of cases was 2,418, including 400 patients whose place of residence was not known and who therefore were excluded from the spatial analysis. Many of these latter persons were probably from Mozambique, with no fixed residence in the study area. The total population for the study area was 241,397 persons, resulting in a crude overall malaria incidence rate of 8.4 per 1,000 person-years.

Population per section ranged from 11 to 5,482 persons (median, 858; mean, 1,102; standard deviation (SD), 886). Area per section varied from 1.1 km2 to 263.2 km2 (median, 19.0; mean, 25.9; SD, 27.5).

For each section, a crude incidence rate for 1994–1995 was calculated. To calculate rates, we assumed that the entire population of an area was exposed to the risk of malaria during the period studied, that is, that each person contributed exactly 1 person-year of exposure. Crude incidence rates per section ranged from 0 to 306 cases of malaria per 1,000 person-years (mean, 11.2; SD, 32.0; interquartile range, 0–8 per 1,000). In all but 14 sections, the rate was less than 50 per 1,000 person-years. In 66 sections, there were no cases. Figure 1 shows the distribution of incidence rates by section.



View larger version (52K):
[in this window]
[in a new window]
 
FIGURE 1. Water bodies, malaria incidence rates, and smoothed malaria incidence rates for the population of the districts of Ngwavuma and Ubombo, KwaZulu Natal, South Africa, July 1994–June 1995 (Source: KwaZulu Natal Malaria Control Programme).

 
Long-term average climatic data (rainfall, daily maximum temperature) by calendar month were obtained from climate databases for Africa (13Go). By using GIS (14Go), we calculated the average value of each variable for each section by averaging the pixel values of the variable over the area in the section. The monthly averages were combined into 6-month averages to coincide with the winter (April–September) and summer (October–March) seasons. The average daily maximum temperature in winter was highly correlated with the average daily maximum temperature in summer (correlation coefficient = 0.995). Therefore, we combined these two variables into a single average daily maximum temperature for the whole year. We also calculated the average monthly rainfall and the average maximum daily temperature for the midwinter months of June and July combined.

The centroid of each section was also derived by using GIS. For each of these centroids, distance to water bodies (lakes, reservoirs, and dams; figure 1) and distance to the Mozambican border were calculated.


    ANALYSIS AND RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 DATA
 ANALYSIS AND RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Smoothed incidence map
Relatively small population totals per section result in incidence rates subject to considerable random error (15Go). The map of smoothed rates (figure 1) was produced by using the smoothing method proposed by Kafadar (16Go). The smoothing effect of the incidence rate in one area on another was limited to those areas whose centroids were less than 15 km apart.

Heterogeneity test
A Potthoff-Whittinghill heterogeneity test (17Go) was applied to the observed rates by using Monte Carlo simulation to distribute cases randomly among areas but in proportion to the populations of each area. This test showed that the observed distribution of cases was very unlikely to have resulted from chance alone (p < 0.0001) given a null hypothesis of no differences in underlying risk between areas.

Test for spatial pattern
The nonparametric D statistic (18Go) was used to test the observed incidence rates in the sections for a significant spatial pattern. The statistic is calculated as a weighted average of rank differences in incidence rates for all pairs of sections; the weights favor pairs of areas in close proximity, which results in a low D value if there is marked spatial correlation. We used binary mutual neighborhood weights (i.e., a weight of 1 if two areas shared a common boundary, 0 otherwise), which were obtained by using a GIS program written for this purpose. The calculated value of D for the observed incidence rates was 34.3, while the expected value of D given the adjacency configuration of the areas was 194 (SD, 5.67), indicating that there was strong evidence of a spatial pattern in these rates (p < 0.0001), as might be expected.

Association between malaria incidence and climatic and environmental factors
To determine the climatic and environmental covariates that are associated with observed incidence rates while allowing for spatial correlation of the data, we used an iterative approach of variograms and generalized linear mixed models (GLMM) (19Go). We chose this approach since we expected it to include the optimal properties (best linear unbiased prediction (20Go)) of the mixed model.

The Appendix provides details of how the spatial correlation of the data can be derived from a variogram of model residuals and how this spatial correlation can be taken into account by implementing the GLMM using SAS software (21Go). The overall strategy for modeling spatially correlated incidence data is described in the paragraphs that follow.

The counts of malaria cases in each section were represented by a GLMM with a Poisson distribution, a logarithmic link function, and a correlated error structure as described in the Appendix, and with the population of the section as an offset. (The offset is a term added to the Poisson model on the logarithmic scale so the count of cases is adjusted for population size.) Potential explanatory variables were the average values, for the winter and summer seasons separately, of monthly rainfall, annual daily maximum temperature, distance to the nearest water body, and distance to the Mozambique border. Average monthly rainfall and average daily maximum temperature combined for the midwinter months of June and July also were included as candidate explanatory variables. Specification of the correlated error structure in the GLMM due to a spatial pattern in the data was improved iteratively by examining the model residuals and respecifying the covariance matrix, as detailed in the Appendix.

The final model contained three variables that were significantly associated with malaria incidence: average daily maximum temperature, average monthly rainfall during March–September, and distance to water bodies. Once these variables were selected, we investigated, for each of the three variables in turn, whether any of a set of transformations of the variable would result in a significant reduction in deviance in the multiple regression Poisson model, following a procedure suggested by Royston et al. (22Go). The transformation producing the biggest reduction in residual deviance was chosen if this reduction was at least 3.84 compared with the untransformed variable. Transformations that were tried for each variable x were 1/x2, 1/x, 1/x0.5, ln(x), x0.5, and x2, without simultaneous inclusion of the untransformed variable (x1) in the model. Transformations are useful in representing relations in which the log incidence rate increases more rapidly than a straight line at low values of x and more slowly at high values, or vice versa. Any resulting improvement in the fit of the model enhances its utility for the purpose of prediction.

The variogram of residuals of the final model (figure 2) shows that there was residual spatial dependence after fitting the model. Specification of a covariance structure in the GLMM based on the spatial correlation of residuals ensures that the results are adjusted for this spatial correlation.



View larger version (10K):
[in this window]
[in a new window]
 
FIGURE 2. Variogram of deviance residuals of the final model of malaria incidence rates for the population of the districts of Ngwavuma and Ubombo, KwaZulu Natal, South Africa, for the July 1994–June 1995 season.

 
The incidence rate ratios for the final model are shown in table 1, together with the estimated effect that these variables by themselves would have on incidence rates. Distance to water and winter rainfall were fitted as square root and logarithmic transformations, respectively, whereas maximum temperature fit best without any transformation. To facilitate reporting and interpretation of transformed variables, we calculated the model-based incidence rate ratio for the transformed variable for two separate values of the variable relative to a third (referent) value of the variable (22Go). The referent value chosen was the mean of the untransformed variable minus one standard deviation; the other two values were the mean itself and the mean plus one standard deviation. Therefore, the two incidence rate ratios demonstrating the association between the transformed variable and malaria incidence were one and two standard deviations removed from the referent value, and they provided some indication of the nonlinear nature of the association.


View this table:
[in this window]
[in a new window]
 
TABLE 1. Final model of the climatic and topographic factors and their effect on malaria incidence in the districts of Ngwavuma and Ubombo, KwaZulu Natal, South Africa, 1994–1995*

 
Throughout this analysis, there was evidence of considerable overdispersion of the Poisson model, with a variance greater than the mean (dispersion factor in the final model, 11.5). This overdispersion was accounted for by inflating the standard errors in the model by the square root of the overdispersion factor (19Go). Overdispersion indicated other unmeasured sources of variation that could not be taken into account.

We grouped observed and predicted malaria incidence rates per 1,000 into categories of less than 1, 1–5, 5–10, 10–50, and more than 50. Doing so provided a weighted kappa statistic of 83 percent for agreement between observed and predicted categories.

Model-based prediction map
Using the model derived above, we produced a map of predicted malaria incidence rates by following a two-stage approach developed previously to map malaria prevalences (4Go). Kriged residuals from the final model were added to the model predictions and exponentiated (exp(linear prediction + kriged residual)) to produce a map of incidence rates that took into account local deviation from the model predictions, when supported by the data (figure 3).



View larger version (51K):
[in this window]
[in a new window]
 
FIGURE 3. Map showing the model-based prediction of malaria incidence for the population of the districts of Ngwavuma and Ubombo, KwaZulu Natal, South Africa, for the July 1994–June 1995 season.

 

    DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 DATA
 ANALYSIS AND RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Historical comparisons of malaria incidence with the period prior to introduction of malaria control (1Go) and comparisons with neighboring countries suggest that house spraying has resulted in a significant reduction in the intensity of transmission in KwaZulu Natal. Our analysis found that even with this relatively low level of transmission, the overwhelming factors related to variation in malaria incidence were the same climatic factors that are also associated with malaria distribution in endemic areas (5Go, 11Go). The prediction map (figure 3) shows that the combination of proximity to water bodies, winter rainfall, and maximum temperature was closely associated with high malaria risk in the northwest portion of the study area and with a moderately high intensity of transmission in the border areas along the northern and western parts of the area. Although cross-border migration of infected persons and the proximity of uncontrolled areas across the border may further add to the intensity of transmission in border areas, our model showed that natural factors help to explain the higher transmission levels found in these areas. Distance to the Mozambique border (the surrogate measure of cross-border migration), summer rainfall, and average rainfall and average daily maximum temperature in June and July were not significant once other variables were included in the model.

Our model underlines the importance of adjusting for confounding effects when investigating the association between malaria incidence and climatic factors. Our study area was marked by a significantly higher winter rainfall along the coast than in the interior. On the other hand, the coastal areas had lower daily maximum temperatures than the inland areas. Both of these factors were closely associated with malaria incidence, but, because of the negative correlation between winter rainfall and maximum temperature (correlation coefficient = -0.89), the former by itself was negatively associated with malaria incidence (table 1). The same negative confounding effect between rainfall and temperature was responsible for the large difference between the sizes of the adjusted and unadjusted rate ratios for maximum temperature. The multiple regression model showed that, after adjustment for maximum temperature, winter rainfall was positively associated with malaria incidence. Similarly, the effect of temperature, after adjustment for the effect of rainfall and, to a lesser extent, distance to water, was much larger than the single-predictor relation between temperature and malaria incidence suggests. The large magnitude in the rate ratios should be viewed in the context of very low incidence rates in much of the study area; incidence rates in the areas in which risk was higher were many tens of times higher. For this reason, the incidence rate ratios cannot be extrapolated to situations of higher endemicity levels, found in the areas further north. Furthermore, the likely absence of immunity in the study population has the effect that any increase in transmission intensity is directly translated into higher incidence rates.

The very high collinearity between the average daily maximum temperatures in summer and in winter means that we were unable to test the hypothesis that variation in the average daily maximum temperature in winter is a factor in determining variation in malaria risk. In the study area, there is virtually no variation in average daily maximum temperature in winter independent of average daily maximum temperature in summer.

According to our results, average daily maximum temperature has a large effect on malaria incidence, with a rate ratio of 135.6 per degree centigrade. Our study area is at the southern limit of malaria distribution, and an association with temperature, particularly winter temperature, would not be surprising because of its effect on both parasite and vector development (11Go). In the study area, annual average daily maximum temperatures vary from 26.5°C to 29°C; some places have much lower maxima during some of the winter months. The temperature of small water bodies around which the vectors overwinter is likely to be well below the maximum daily air temperature. Low water temperatures have been shown to have a significant negative impact on mosquito abundance because of long larval duration (23Go), while fairly high air temperatures for at least part of the day keep adult mosquito life spans reasonably short. At the same time, any reduction in air temperature leads to an increased incubation period in the vector (the time needed for production of sporozoites in the female mosquito that infect humans) (5Go), thereby reducing the chances that adult vectors will survive long enough to become infective. Therefore, this result is in accordance with the Macdonald model, which expresses the dependence of the basic reproduction rate of malaria in terms of the daily survival probability of the vector and the length of the incubation period (5Go, 24Go).

Winter rainfall obviously affects winter vegetation and sustains smaller water bodies in which vector populations can survive. The range in winter rainfall is quite large in the study area, and some places are fairly dry, averaging less than 15 mm per month during the winter half of the year. This fact would suggest a lengthy period during which no breeding sites are available in these areas. The nonlinear nature of the relation between winter rainfall and malaria incidence suggests that some places are below the threshold needed before additional rain can be associated with a large increase in malaria incidence. The distribution of winter rainfall in this area is a plausible constraint on vector survival in winter and hence on malaria transmission.

Under these circumstances, mosquitoes may be forced to overwinter around permanent water bodies from which populations can spread out as more transient water bodies form after early summer rains. Permanent water bodies may be a last-resort breeding habitat for Anopheles gambiae because of the presence of predators (25Go). However, these water bodies are often surrounded by smaller, less permanent ones that are more suitable for breeding. Whatever the dynamics of survival around these sites, our model shows that their proximity is an additional risk factor for malaria. As a result of the nonlinearity of the relation, the effect of a unit increase in distance from water bodies attenuates with the distance from them.

Spatial correlation in the model residuals up to 20 km between pairs of sections (figure 2) suggests that there are unmeasured spatially structured sources of variation not accounted for by the model. The range of this spatial pattern in residuals is in excess of the flight distances of mosquitoes (6Go). Some of this unexplained variation could be due to deviation of climatic effects during the particular study year from the pattern of the long-term averages used in the analysis. Other possible sources of spatially dependent variation in incidence rates may be the diligence with which insecticide spraying was carried out by teams in particular sections, emergence of insecticide and drug resistance, and differences in the host populations.

The two significant climatic factors in our analysis both influence survival of vector populations in winter. Therefore, an important conclusion is that in this area, control activities can become more effective by focusing on the eradication of winter populations of vectors. Any areas with a combination of high average daily maximum temperature and high winter rainfall are at risk. The additional risk posed by water bodies should be considered when new irrigation schemes are envisaged. The highly uneven distribution of malaria in the Ngwavuma and Ubombo districts justifies an uneven application of control activities, with more effort being concentrated in the higher risk areas.

This study had a number of limitations that could have affected the results. First, only 83 percent of the cases could be allocated to their geographic areas. We did not include the remaining 17 percent of the cases in the spatial analysis because we assumed that they were either transients or constituted a random sample of all cases. Second, incidence rates could not be adjusted for age and sex, since demographic data were not available for the population counts. Previous studies have shown that the age distribution of cases in this area closely follows the age distribution of the population, so adjustment for age would have had a negligible effect on the results (10Go). Third, large databases of reporting data are always prone to inaccuracies, omissions, and duplicates (26Go). As long as these errors are distributed randomly, any significant associations are attenuated (27Go). Fourth, potential bias caused by the system of active case finding cannot be ruled out, but we have no reason to think that such underreporting favored some areas over others. A recent health survey conducted in this area showed that health-seeking behavior in general does not appear to be affected by the distance of a residence from health facilities (Joyce Tsoka, Medical Research Council, Durban, personal communication, 1999).

Our study accounted for spatial correlation in the data by iteratively improving the measurement of residual correlation and subsequently specifying the covariance matrix of the data in a GLMM. Ignoring spatial correlation can result in explanatory variables apparently being associated with incidence as a result of overstatement of the degrees of freedom in the data and consequent underestimation of the sizes of standard errors. In our analysis, the standard errors of the regression coefficients were underestimated by as much as 35 percent in the model that ignored spatial effects compared with the model that adjusted for spatial effects. The advantage of using the GLMM approach is that it can be extended to incorporate additional data requiring further random effects to be specified. For example, we would like to extend this analysis to consider several years of data for the same small areas by specifying the year of each annual count as a random effect and allowing for temporal correlation. Such spatial-temporal analysis would show whether unusual weather conditions in a given winter predict unusual malaria incidence during the ensuing malaria season.


    APPENDIX
 TOP
 ABSTRACT
 INTRODUCTION
 DATA
 ANALYSIS AND RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Iterative Procedure for Specifying Spatial Dependence in the Generalized Linear Mixed Model
The SAS procedure MIXED (SAS Institute, Inc., Cary, North Carolina) is an implementation of mixed-model methodology for Gaussian response variables. For non-Gaussian responses, such as incidence rates, the macro GLIMMIX implements the generalized linear mixed model. GLIMMIX produces estimates via the PROC MIXED procedure (SAS Institute, Inc.) and hence provides similar functionality. We used GLIMMIX to adjust for spatial dependence in the regression analysis

Standard generalized linear model
With the classical generalized linear model (28Go), a vector of observations y is assumed to have uncorrelated elements. In our particular application, the model assumes that the yi are Poisson distributed; if mean(y) = µ, then log(µ) = , a linear function of the explanatory variables X; var(y) = V, a diagonal matrix with unequal elements, since the variance of a Poisson variate depends on the mean.

Linear mixed model allowing for a correlated error structure
In the linear mixed model, as implemented in the SAS procedure MIXED, it is possible for observations, y, which are assumed to be distributed normally, to have a spatial correlation structure. In particular, if dij denotes the distance between points i and j, where observations yi and yj were made, V = I{sigma}12 + F{sigma}2, where Fij = exp(-dij/{rho}). The unknown parameters in this model, namely, {sigma}12 (the nuggett), {sigma}2 (the sill), and {rho} (the range), can be obtained from a variogram of the data, as described below. They are jointly referred to as . This particular spatial model is known as the exponential model, and it effectively postulates that the correlation increases as points occur closer together, whereas points at distances greater than the range from one another are uncorrelated. Fij can be defined by other spatial models (19Go). In the PROC MIXED implementation, only the variogram parameters, {theta}, are specified so that the SAS procedure can be used to calculate V. For nonnormally distributed data, {theta} is specified in the macro call to enable GLIMMIX to estimate the appropriate correlation matrix for a spatially correlated Poisson model. However, since the Poisson model does not satisfy the requirement of constant variance, estimation of {theta} is carried out on standardized residuals, as explained below.

Variogram approach to estimating {theta}
In classical kriging (29Go), the following approach is adopted: If it can be assumed that the mean value of y does not change from point to point (the characteristic of stationarity), a variogram can be constructed. The pairs of observations are arranged so that all pairs a given distance of h±h/2 apart are pooled into one class, and the semivariance {gamma}(h) = 1/2 average(yiyj)2 is calculated. The vari-ogram is a plot of {gamma}(h) against distance for distances h, 2h, 3h, 4h . . . etc. (19Go). Estimates of {theta} are obtained graphically. We used the software package GS-Windows (30Go) for this purpose.

In our application, the variogram was not constructed from y because of the nonstationarity and the Poisson distribution (implying nonconstant variance). Instead, we used the signed deviance residuals calculated from r = sign(y - µ){2(ylog(y/µ) - y + µ)}1/2, where y and µ are the observed and fitted values of the response variable, respectively (28Go).

Fitting the GLMM and iteration between the variogram and GLMM
The mixed-model estimates of ß and {theta} (19Go) can be obtained jointly via the SAS procedure MIXED. However, in the context of the GLIMMIX implementation (i.e., non-Gaussian models), interpretation of the output relating to the spatial parameters {theta} is not straightforward, and convergence problems can arise. Our approach was to fix {theta}; use GLIMMIX to estimate ß; and then, from the variogram of residuals, r, revise the estimate of {theta}. In other words, accommodation of correlation between elements of y due to spatial effects was achieved by converting from the observed y to standardized residuals r, which have the property of constant variance and mean. The following steps were carried out:

  1. An initial estimate 0 of ß was made by assuming no spatial correlation, that is, with V as a diagonal matrix. The deviance residuals r0 were calculated by using 0, and a variogram was constructed by using GS- Windows. Initial estimates 0 of {theta} were made as described above.
  2. GLIMMIX was then used with {theta} being fixed at the values 0.
  3. From GLIMMIX, a new set of estimates was found; hence, a new set of deviance residuals was calculated. These were used in a new cycle to redraw the variogram and thus derive fresh estimates of 0.

Steps 2 and 3 form an iterative cycle that continued until there was no further change in the estimates. Assuming that any spatial correlation was positive rather than negative, standard errors of 0 might have been underestimated rather than overestimated. Adjustment for spatial correlation may therefore lead to removal of some variables from the model whose contributions were overstated initially. The criterion for dropping a variable from the model was p > 0.05.

Producing predicted values by kriging
Figure 2 shows that the residuals of the final model remained spatially correlated. Therefore, map predictions could be improved by kriging the residuals. We did so by calculating the log(mean y) using and then adding to this (on the log scale) the kriged predictor for the deviance residual, r, at that point (4Go). These kriged predictions were then transformed back into predicted incidence rates (figure 3).


    ACKNOWLEDGMENTS
 
The authors thank Dr. Tom Smith for his critical comments on an earlier draft of this paper and Carrin Martin for providing the digitized data on water bodies.


    NOTES
 
Reprint requests to Immo Kleinschmidt, Medical Research Council (South Africa), P.O. Box 17120, Congella, Durban 4013, South Africa (e-mail: kleinsci{at}mrc.ac.za).


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 DATA
 ANALYSIS AND RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 

  1. Sharp BL, Le Sueur D. Malaria in South Africa—the past, the present and selected implications for the future. S Afr Med J 1996;86:83–9.[ISI][Medline]
  2. Towards an atlas of malaria risk in Africa: first technical report of the MARA/ARMA collaboration. Durban, South Africa: MARA/ARMA, December 1998. (http://www.mara.org.za).
  3. Snow RW, Gouws E, Omumbo JA, et al. Models to predict the intensity of Plasmodium falciparum transmission: applications to the burden of disease in Kenya. Trans R Soc Trop Med Hyg 1998;92:601–6.[ISI][Medline]
  4. Kleinschmidt I, Bagayako M, Clarke GPY, et al. A spatial statistical approach to malaria mapping. Int J Epidemiol 2000;29:355–61.[Abstract/Free Full Text]
  5. Molineaux L. The epidemiology of human malaria as an explanation of its distribution, including some implications for its control. In: Wernsdorfer WH, McGregor I, eds. Malaria: principles and practice of malariology. Vol 2. London, United Kingdom: Churchill Livingstone, 1988:445, 936–8.
  6. Gillies MT, De Meillon B. The Anophelinae of Africa south of the Sahara. Johannesburg, South Africa: The South African Institute for Medical Research, 1968:212–19.
  7. Ghebreyesus TA, Haile M, Witten KH, et al. Incidence of malaria among children living near dams in northern Ethiopia: community based incidence survey. BMJ 1999;319:663–6.[Abstract/Free Full Text]
  8. Martens P, Hall L. Malaria on the move: human population movement and malaria transmission. Emerg Infect Dis 2000March–April:6. (Online serial, World Wide Web URL: http://www.cdc.gov/ncidod/EID/eid.htm).
  9. Snow RW, Marsh K, LeSeur D. The need for maps of transmission intensity to guide malaria control in Africa. Parasitol Today 1996;12:455–7.[ISI]
  10. Sharp BL, Ngxongo S, Botha MJ, et al. An analysis of 10 years of retrospective malaria data from the KwaZulu areas of Natal. S Afr J Sci 1988;84:102–6.[ISI]
  11. Craig MH, Snow RW, Le Sueur D. A climate-based distribution model of malaria transmission in Africa. Parasitol Today 1999;15:105–11.[ISI][Medline]
  12. Sharp B, Fraser C, Naidoo K, et al. Computer assisted health information system for malaria control. Presented at the MIM (Multilateral Initiative on Malaria) African Malaria Conference, Durban, South Africa, March 14–19, 1999.
  13. Hutchinson MF, Nix HA, McMahon JP, et al. Climate data: a topographic and climate database. (CD-ROM). Canberra, Australia: Centre for Resource and Environmental Studies, The Australian National University, 1995.
  14. Clark Labs. Idrisi for Windows version 2.008. Worcester, MA: The Idrisi Project, Clark University, 1998.
  15. Cuzick J, Elliott P. Small-area studies: purpose and methods. In: Cuzick J, Elliott P, eds. Geographical and environmental epidemiology: methods for small area studies. Oxford, United Kingdom: Oxford University Press, 1992:14–21.
  16. Kafadar K. Smoothing geographical data, particularly rates of disease. Stat Med 1996;15:2539–60.[ISI][Medline]
  17. Potthoff RF, Whittinghill M. Testing for homogeneity. II. The Poisson distribution. Biometrika 1966;53:167–82.[ISI][Medline]
  18. Walter SD. A simple test for spatial pattern in regional health data. Stat Med 1994;13:1037–44.[ISI][Medline]
  19. Littell RC, Milliken GA, Stroup WW, et al. SAS system for mixed models. Cary, NC: SAS Institute Inc, 1996:229–251, 305, 307, 423–60.
  20. Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York, NY: Springer, 2000:80.
  21. SAS Institute Inc. SAS system version 6.12. Cary, NC: SAS Institute Inc, 1996.
  22. Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk variables in epidemiology. Int J Epidemiol 1999;28:964–74.[Abstract]
  23. Le Sueur D. The ecology, over-wintering and population dynamics of the pre-imaginal stages of the Anopheles gambiae Giles complex (Diptera: Culicidae) in northern Natal, South Africa. Doctoral thesis. Natal, South Africa: University of Natal, Pietermaritzburg, 1991:134–85.
  24. Bruce-Chwatt LJ. Essential malariology. London, United Kingdom: Heineman Medical Books Ltd, 1980:149–59.
  25. Le Sueur D, Sharp BL. The breeding requirements of three members of the Anopheles gambiae Giles complex in the endemic malaria area of Natal, South Africa. Bull Entomol Res 1988;78:549–60.[ISI]
  26. Kleinschmidt I, Pattenden P, Walls P, et al. A national health statistics database: data quality requirements for small area health studies. In: Frederiksen P, ed. Proceedings of Eurocarto XII Conference on Geo related databases, Copenhagen, Denmark, October 1994.
  27. MacMahon S, Peto R, Cutler J, et al. Blood pressure, stroke, and coronary heart disease. Part 1. Prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet 1990;335:765–74.[ISI][Medline]
  28. McCullagh P, Nelder JA. Generalised linear models. 2nd ed. New York, NY: Chapman and Hall, 1989:37–40.
  29. Cressie NAC. Statistics for spatial data. New York, NY: John Wiley & Sons, 1991.
  30. GS+ geostatistics for the environmental sciences, version 3.10.2 beta. Plainwell, MI: Gamma Design Software, 1998.
Received for publication September 21, 1999. Accepted for publication October 20, 2000.