1 Medical Research Council (South Africa), Durban, South Africa.
2 Department of Statistics and Biometry, University of Natal, Pietermaritzburg, South Africa.
![]() |
ABSTRACT |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
malaria; models; statistical; spatial analysis; variogram
Abbreviations: GIS, geographic information system; GLMM, generalized linear mixed model; SD, standard deviation
![]() |
INTRODUCTION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
The purpose of this study was to undertake a spatial statistical analysis of malaria incidence to identify important predictor variables and to produce an incidence map of the area illustrating the variation in malaria risk. A secondary, but important aim was to advance methodology for the spatial analysis and modeling of malaria transmission data in the context of the MARA/ARMA project (2), which, among other objectives, seeks to produce maps of malaria risk for the continent of Africa (3
, 4
).
In Africa, the predominant species of malaria-causing parasite is Plasmodium falciparum. When a person is bitten by an infected anopheles mosquito, the parasite, called a sporozoite during this stage of its cycle, enters the body via the mosquito's saliva, which is injected into the person's blood. The parasites multiply in the liver and reinvade the blood, via the red blood cells, as merozoites. The merozoites multiply sexually, some becoming gametocytes. Uninfected anopheles mosquitoes become infected if they feed on a person whose blood contains gametocytes. In the insect, the gametocytes undergo another phase of reproduction called the "sporogony" cycle. At the end of this cycle, the mosquito becomes infective as a new generation of sporozoites is able to infect another human host.
Therefore, malaria is bound closely to conditions that favor survival of the anopheles mosquito and the life cycle of the parasite. These conditions are determined predominantly by climatic factors, vegetation cover, and the vector's access to water surfaces for breeding (57
). Human population movement from areas in which malaria is endemic to those in which the disease has been at least partially eradicated can also contribute to malaria transmission (8
).
Accurate knowledge regarding the distribution of malaria is a crucial tool in planning and evaluating malaria control (9). Explaining this distribution is also important, since it provides a rationale for interventions and makes it possible to predict transmission intensity in places in which it has not been measured. The houses in the area under consideration have been sprayed with insecticide over several decades (1
), resulting in reduced malaria incidence rates compared with the era before control measures were introduced (10
). In recent years, incidence rates have again risen steeply. The study area is situated on the southern fringe of climatic suitability for endemic malaria distribution in Africa (11
). Summer rainfall (6-month average, 68 mm per month) and temperatures (average maximum daily temperature, 29.4°C) are generally suitable for malaria transmission. On the other hand, winter conditions are suboptimal and could limit transmission of the disease (5
) (average rainfall, 19 mm per month; average maximum daily temperature, 25.9°C).
We sought to test the hypothesis that the spatial distribution of climatic conditions in winter accounts for much of the variation in the intensity of malaria transmission in this region. To this end, we undertook a spatial statistical analysis of small-area malaria incidence rates in relation to rainfall and temperature. Because of their potential confounding effects, we included proximity to permanent water bodies and distance to the Mozambique border. The latter was used as a proxy for migration from Mozambique, where routine malaria control had not been implemented. Water bodies included all permanent freshwater surfaces except rivers.
In both the Ngwavuma and Ubombo districts, a small-area malaria incidence reporting system has been in operation for a number of years as part of the provincial malaria control program (12). A component of the control strategy is to identify and treat all infected persons. Therefore, both passive and active case finding is practiced. The latter consists of screening measures in which teams visit the community to encourage those persons suspected of having malaria to be tested. While this system may not achieve 100 percent coverage, it is thought to identify the vast majority of cases. Low levels of exposure to P. falciparum in the past have made it unlikely that the host population, apart from recent immigrants from Mozambique, has naturally acquired immunity. Therefore, persons are unlikely to possess clinical tolerance to parasite infection and to recover from it without treatment (5
).
As part of the control program, a homestead-level population census was conducted in 1994, thus making it possible, with the use of a geographic information system (GIS), to derive population counts for the same small areas for which cases were being reported. It was decided to base the analysis on the July 1, 1994June 30, 1995 malaria season, since the denominator data for this time period were the most reliable available for calculating incidence rates.
![]() |
DATA |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
The total number of cases was 2,418, including 400 patients whose place of residence was not known and who therefore were excluded from the spatial analysis. Many of these latter persons were probably from Mozambique, with no fixed residence in the study area. The total population for the study area was 241,397 persons, resulting in a crude overall malaria incidence rate of 8.4 per 1,000 person-years.
Population per section ranged from 11 to 5,482 persons (median, 858; mean, 1,102; standard deviation (SD), 886). Area per section varied from 1.1 km2 to 263.2 km2 (median, 19.0; mean, 25.9; SD, 27.5).
For each section, a crude incidence rate for 19941995 was calculated. To calculate rates, we assumed that the entire population of an area was exposed to the risk of malaria during the period studied, that is, that each person contributed exactly 1 person-year of exposure. Crude incidence rates per section ranged from 0 to 306 cases of malaria per 1,000 person-years (mean, 11.2; SD, 32.0; interquartile range, 08 per 1,000). In all but 14 sections, the rate was less than 50 per 1,000 person-years. In 66 sections, there were no cases. Figure 1 shows the distribution of incidence rates by section.
|
The centroid of each section was also derived by using GIS. For each of these centroids, distance to water bodies (lakes, reservoirs, and dams; figure 1) and distance to the Mozambican border were calculated.
![]() |
ANALYSIS AND RESULTS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Heterogeneity test
A Potthoff-Whittinghill heterogeneity test (17) was applied to the observed rates by using Monte Carlo simulation to distribute cases randomly among areas but in proportion to the populations of each area. This test showed that the observed distribution of cases was very unlikely to have resulted from chance alone (p < 0.0001) given a null hypothesis of no differences in underlying risk between areas.
Test for spatial pattern
The nonparametric D statistic (18) was used to test the observed incidence rates in the sections for a significant spatial pattern. The statistic is calculated as a weighted average of rank differences in incidence rates for all pairs of sections; the weights favor pairs of areas in close proximity, which results in a low D value if there is marked spatial correlation. We used binary mutual neighborhood weights (i.e., a weight of 1 if two areas shared a common boundary, 0 otherwise), which were obtained by using a GIS program written for this purpose. The calculated value of D for the observed incidence rates was 34.3, while the expected value of D given the adjacency configuration of the areas was 194 (SD, 5.67), indicating that there was strong evidence of a spatial pattern in these rates (p < 0.0001), as might be expected.
Association between malaria incidence and climatic and environmental factors
To determine the climatic and environmental covariates that are associated with observed incidence rates while allowing for spatial correlation of the data, we used an iterative approach of variograms and generalized linear mixed models (GLMM) (19). We chose this approach since we expected it to include the optimal properties (best linear unbiased prediction (20
)) of the mixed model.
The Appendix provides details of how the spatial correlation of the data can be derived from a variogram of model residuals and how this spatial correlation can be taken into account by implementing the GLMM using SAS software (21). The overall strategy for modeling spatially correlated incidence data is described in the paragraphs that follow.
The counts of malaria cases in each section were represented by a GLMM with a Poisson distribution, a logarithmic link function, and a correlated error structure as described in the Appendix, and with the population of the section as an offset. (The offset is a term added to the Poisson model on the logarithmic scale so the count of cases is adjusted for population size.) Potential explanatory variables were the average values, for the winter and summer seasons separately, of monthly rainfall, annual daily maximum temperature, distance to the nearest water body, and distance to the Mozambique border. Average monthly rainfall and average daily maximum temperature combined for the midwinter months of June and July also were included as candidate explanatory variables. Specification of the correlated error structure in the GLMM due to a spatial pattern in the data was improved iteratively by examining the model residuals and respecifying the covariance matrix, as detailed in the Appendix.
The final model contained three variables that were significantly associated with malaria incidence: average daily maximum temperature, average monthly rainfall during MarchSeptember, and distance to water bodies. Once these variables were selected, we investigated, for each of the three variables in turn, whether any of a set of transformations of the variable would result in a significant reduction in deviance in the multiple regression Poisson model, following a procedure suggested by Royston et al. (22). The transformation producing the biggest reduction in residual deviance was chosen if this reduction was at least 3.84 compared with the untransformed variable. Transformations that were tried for each variable x were 1/x2, 1/x, 1/x0.5, ln(x), x0.5, and x2, without simultaneous inclusion of the untransformed variable (x1) in the model. Transformations are useful in representing relations in which the log incidence rate increases more rapidly than a straight line at low values of x and more slowly at high values, or vice versa. Any resulting improvement in the fit of the model enhances its utility for the purpose of prediction.
The variogram of residuals of the final model (figure 2) shows that there was residual spatial dependence after fitting the model. Specification of a covariance structure in the GLMM based on the spatial correlation of residuals ensures that the results are adjusted for this spatial correlation.
|
|
We grouped observed and predicted malaria incidence rates per 1,000 into categories of less than 1, 15, 510, 1050, and more than 50. Doing so provided a weighted kappa statistic of 83 percent for agreement between observed and predicted categories.
Model-based prediction map
Using the model derived above, we produced a map of predicted malaria incidence rates by following a two-stage approach developed previously to map malaria prevalences (4). Kriged residuals from the final model were added to the model predictions and exponentiated (exp(linear prediction + kriged residual)) to produce a map of incidence rates that took into account local deviation from the model predictions, when supported by the data (figure 3).
|
![]() |
DISCUSSION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Our model underlines the importance of adjusting for confounding effects when investigating the association between malaria incidence and climatic factors. Our study area was marked by a significantly higher winter rainfall along the coast than in the interior. On the other hand, the coastal areas had lower daily maximum temperatures than the inland areas. Both of these factors were closely associated with malaria incidence, but, because of the negative correlation between winter rainfall and maximum temperature (correlation coefficient = -0.89), the former by itself was negatively associated with malaria incidence (table 1). The same negative confounding effect between rainfall and temperature was responsible for the large difference between the sizes of the adjusted and unadjusted rate ratios for maximum temperature. The multiple regression model showed that, after adjustment for maximum temperature, winter rainfall was positively associated with malaria incidence. Similarly, the effect of temperature, after adjustment for the effect of rainfall and, to a lesser extent, distance to water, was much larger than the single-predictor relation between temperature and malaria incidence suggests. The large magnitude in the rate ratios should be viewed in the context of very low incidence rates in much of the study area; incidence rates in the areas in which risk was higher were many tens of times higher. For this reason, the incidence rate ratios cannot be extrapolated to situations of higher endemicity levels, found in the areas further north. Furthermore, the likely absence of immunity in the study population has the effect that any increase in transmission intensity is directly translated into higher incidence rates.
The very high collinearity between the average daily maximum temperatures in summer and in winter means that we were unable to test the hypothesis that variation in the average daily maximum temperature in winter is a factor in determining variation in malaria risk. In the study area, there is virtually no variation in average daily maximum temperature in winter independent of average daily maximum temperature in summer.
According to our results, average daily maximum temperature has a large effect on malaria incidence, with a rate ratio of 135.6 per degree centigrade. Our study area is at the southern limit of malaria distribution, and an association with temperature, particularly winter temperature, would not be surprising because of its effect on both parasite and vector development (11). In the study area, annual average daily maximum temperatures vary from 26.5°C to 29°C; some places have much lower maxima during some of the winter months. The temperature of small water bodies around which the vectors overwinter is likely to be well below the maximum daily air temperature. Low water temperatures have been shown to have a significant negative impact on mosquito abundance because of long larval duration (23
), while fairly high air temperatures for at least part of the day keep adult mosquito life spans reasonably short. At the same time, any reduction in air temperature leads to an increased incubation period in the vector (the time needed for production of sporozoites in the female mosquito that infect humans) (5
), thereby reducing the chances that adult vectors will survive long enough to become infective. Therefore, this result is in accordance with the Macdonald model, which expresses the dependence of the basic reproduction rate of malaria in terms of the daily survival probability of the vector and the length of the incubation period (5
, 24
).
Winter rainfall obviously affects winter vegetation and sustains smaller water bodies in which vector populations can survive. The range in winter rainfall is quite large in the study area, and some places are fairly dry, averaging less than 15 mm per month during the winter half of the year. This fact would suggest a lengthy period during which no breeding sites are available in these areas. The nonlinear nature of the relation between winter rainfall and malaria incidence suggests that some places are below the threshold needed before additional rain can be associated with a large increase in malaria incidence. The distribution of winter rainfall in this area is a plausible constraint on vector survival in winter and hence on malaria transmission.
Under these circumstances, mosquitoes may be forced to overwinter around permanent water bodies from which populations can spread out as more transient water bodies form after early summer rains. Permanent water bodies may be a last-resort breeding habitat for Anopheles gambiae because of the presence of predators (25). However, these water bodies are often surrounded by smaller, less permanent ones that are more suitable for breeding. Whatever the dynamics of survival around these sites, our model shows that their proximity is an additional risk factor for malaria. As a result of the nonlinearity of the relation, the effect of a unit increase in distance from water bodies attenuates with the distance from them.
Spatial correlation in the model residuals up to 20 km between pairs of sections (figure 2) suggests that there are unmeasured spatially structured sources of variation not accounted for by the model. The range of this spatial pattern in residuals is in excess of the flight distances of mosquitoes (6). Some of this unexplained variation could be due to deviation of climatic effects during the particular study year from the pattern of the long-term averages used in the analysis. Other possible sources of spatially dependent variation in incidence rates may be the diligence with which insecticide spraying was carried out by teams in particular sections, emergence of insecticide and drug resistance, and differences in the host populations.
The two significant climatic factors in our analysis both influence survival of vector populations in winter. Therefore, an important conclusion is that in this area, control activities can become more effective by focusing on the eradication of winter populations of vectors. Any areas with a combination of high average daily maximum temperature and high winter rainfall are at risk. The additional risk posed by water bodies should be considered when new irrigation schemes are envisaged. The highly uneven distribution of malaria in the Ngwavuma and Ubombo districts justifies an uneven application of control activities, with more effort being concentrated in the higher risk areas.
This study had a number of limitations that could have affected the results. First, only 83 percent of the cases could be allocated to their geographic areas. We did not include the remaining 17 percent of the cases in the spatial analysis because we assumed that they were either transients or constituted a random sample of all cases. Second, incidence rates could not be adjusted for age and sex, since demographic data were not available for the population counts. Previous studies have shown that the age distribution of cases in this area closely follows the age distribution of the population, so adjustment for age would have had a negligible effect on the results (10). Third, large databases of reporting data are always prone to inaccuracies, omissions, and duplicates (26
). As long as these errors are distributed randomly, any significant associations are attenuated (27
). Fourth, potential bias caused by the system of active case finding cannot be ruled out, but we have no reason to think that such underreporting favored some areas over others. A recent health survey conducted in this area showed that health-seeking behavior in general does not appear to be affected by the distance of a residence from health facilities (Joyce Tsoka, Medical Research Council, Durban, personal communication, 1999).
Our study accounted for spatial correlation in the data by iteratively improving the measurement of residual correlation and subsequently specifying the covariance matrix of the data in a GLMM. Ignoring spatial correlation can result in explanatory variables apparently being associated with incidence as a result of overstatement of the degrees of freedom in the data and consequent underestimation of the sizes of standard errors. In our analysis, the standard errors of the regression coefficients were underestimated by as much as 35 percent in the model that ignored spatial effects compared with the model that adjusted for spatial effects. The advantage of using the GLMM approach is that it can be extended to incorporate additional data requiring further random effects to be specified. For example, we would like to extend this analysis to consider several years of data for the same small areas by specifying the year of each annual count as a random effect and allowing for temporal correlation. Such spatial-temporal analysis would show whether unusual weather conditions in a given winter predict unusual malaria incidence during the ensuing malaria season.
![]() |
APPENDIX |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Standard generalized linear model
With the classical generalized linear model (28), a vector of observations y is assumed to have uncorrelated elements. In our particular application, the model assumes that the yi are Poisson distributed; if mean(y) = µ, then log(µ) = Xß, a linear function of the explanatory variables X; var(y) = V, a diagonal matrix with unequal elements, since the variance of a Poisson variate depends on the mean.
Linear mixed model allowing for a correlated error structure
In the linear mixed model, as implemented in the SAS procedure MIXED, it is possible for observations, y, which are assumed to be distributed normally, to have a spatial correlation structure. In particular, if dij denotes the distance between points i and j, where observations yi and yj were made, V = I12 + F
2, where Fij = exp(-dij/
). The unknown parameters in this model, namely,
12 (the nuggett),
2 (the sill), and
(the range), can be obtained from a variogram of the data, as described below. They are jointly referred to as
. This particular spatial model is known as the exponential model, and it effectively postulates that the correlation increases as points occur closer together, whereas points at distances greater than the range from one another are uncorrelated. Fij can be defined by other spatial models (19
). In the PROC MIXED implementation, only the variogram parameters,
, are specified so that the SAS procedure can be used to calculate V. For nonnormally distributed data,
is specified in the macro call to enable GLIMMIX to estimate the appropriate correlation matrix for a spatially correlated Poisson model. However, since the Poisson model does not satisfy the requirement of constant variance, estimation of
is carried out on standardized residuals, as explained below.
Variogram approach to estimating
In classical kriging (29), the following approach is adopted: If it can be assumed that the mean value of y does not change from point to point (the characteristic of stationarity), a variogram can be constructed. The pairs of observations are arranged so that all pairs a given distance of h±h/2 apart are pooled into one class, and the semivariance
(h) =
average(yi yj)2 is calculated. The vari-ogram is a plot of
(h) against distance for distances h, 2h, 3h, 4h . . . etc. (19
). Estimates of
are obtained graphically. We used the software package GS-Windows (30
) for this purpose.
In our application, the variogram was not constructed from y because of the nonstationarity and the Poisson distribution (implying nonconstant variance). Instead, we used the signed deviance residuals calculated from r = sign(y - µ){2(ylog(y/µ) - y + µ)}, where y and µ are the observed and fitted values of the response variable, respectively (28
).
Fitting the GLMM and iteration between the variogram and GLMM
The mixed-model estimates of ß and (19
) can be obtained jointly via the SAS procedure MIXED. However, in the context of the GLIMMIX implementation (i.e., non-Gaussian models), interpretation of the output relating to the spatial parameters
is not straightforward, and convergence problems can arise. Our approach was to fix
; use GLIMMIX to estimate ß; and then, from the variogram of residuals, r, revise the estimate of
. In other words, accommodation of correlation between elements of y due to spatial effects was achieved by converting from the observed y to standardized residuals r, which have the property of constant variance and mean. The following steps were carried out:
Steps 2 and 3 form an iterative cycle that continued until there was no further change in the estimates. Assuming that any spatial correlation was positive rather than negative, standard errors of 0 might have been underestimated rather than overestimated. Adjustment for spatial correlation may therefore lead to removal of some variables from the model whose contributions were overstated initially. The criterion for dropping a variable from the model was p > 0.05.
Producing predicted values by kriging
Figure 2 shows that the residuals of the final model remained spatially correlated. Therefore, map predictions could be improved by kriging the residuals. We did so by calculating the log(mean y) using and then adding to this (on the log scale) the kriged predictor for the deviance residual, r, at that point (4
). These kriged predictions were then transformed back into predicted incidence rates (figure 3).
![]() |
ACKNOWLEDGMENTS |
---|
![]() |
NOTES |
---|
![]() |
REFERENCES |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|