1 Research Unit in Epidemiology, Information Systems, and Modelisation (INSERM U707), National Institute of Health and Medical Research, Paris, France
2 Department of Community Medicine, Malmö University Hospital, Lund University, Malmö, Sweden
3 Department of Society, Human Development and Health, Harvard School of Public Health, Boston, MA
4 Department of Epidemiology, Center for Social Epidemiology and Population Health (CSEPH), University of Michigan, Ann Arbor, MI
Correspondence to Dr. Basile Chaix, INSERM U707, Faculté de Médecine Saint-Antoine, 27, rue Chaligny, 75571 Paris cedex 12, France (e-mail: chaix{at}u707.jussieu.fr).
Received for publication October 29, 2004. Accepted for publication March 7, 2005.
ABSTRACT |
---|
epidemiologic methods; logistic models; mental disorders; social environment; spatial analysis; substance-related disorders
INTRODUCTION |
---|
This methodological question was motivated by an epidemiologic investigation of spatial variations in mental disorders in Malmö, Sweden, using data on all 65,830 residents aged 40-59 years in 2001, geocoded at their exact place of residence. Several previous studies that investigated neighborhood variations in mental health as a general category reported only weak variations between neighborhoods (9-12). Such variations were usually explained by differences in neighborhood composition (9-11), but some analyses found that neighborhood deprivation was weakly but significantly associated with deteriorated mental health (12-14). However, we hypothesized that the absence of important neighborhood variations in mental health as a general category may conceal situations of strong context dependence for specific mental health outcomes. Therefore, we investigated spatial variations in mental and behavioral disorders due to psychoactive substance use, which, it was assumed, were particularly dependent on the social context.
After describing the spatial distribution of these mental disorders, our objective was to examine whether they were independently associated with the socioeconomic status of the context. Beyond spatial filtering of mentally ill individuals to places considered deprived (15), this relation may result from a direct influence of neighborhood deprivation on mental health (14). Describing spatial variations and investigating whether contextual poverty is a marker for places of high prevalence aid in determining whether the intensity of intervention programs for mental health should vary over space.
Original to our approach is an emphasis on the public health relevance of determining, beyond the magnitude of spatial effects, the spatial scale on which these effects operate, both when describing the spatial distribution of mental disorders and when investigating the impact of contextual characteristics. For example, the existence of clusters of increased prevalence that extend beyond administrative neighborhoods may indicate that public health interventions should be coordinated on a larger scale than that of the neighborhood.
To describe the spatial distribution of disorders, we did not use cluster recognition techniques (16, 17) but instead used regression approaches to identify individual and contextual characteristics contributing to clustering. We compared three modeling approaches (geoadditive, multilevel, and hierarchical geostatistical), all of which build upon different notions of space for investigating the spatial distribution of mental disorders.
Regression models that rely on a space fragmented into administrative neighborhoods are affected by the modifiable areal unit problem (18-20): their results depend on the particular size and shape of the administrative neighborhoods (21-23). To obtain precise cartographic information on mental disorders independent of neighborhood boundaries, we used a geoadditive model that captures spatial variations with a two-dimensional (longitude/latitude) smooth term (24, 25). Whereas spatial random effect approaches are often computationally unable to process the spatial coordinates for individuals, geoadditive models enabled us to use this accurate locational information to produce smoothed maps of prevalence (26, 27). However, this approach does not provide parametric information that would enable us to make inferences about the magnitude and scale of spatial variations. For example, it does not permit assessment of whether a similar prevalence noted in surrounding neighborhoods corresponds to real patterns of variation or simply results from the smoothing of data.
To make these inferences, we first used the multilevel model, considered the "gold standard" for contextual analysis. Taking into account the neighborhood affiliation of individuals using independent random effects that ignore spatial connections between neighborhoods, the multilevel model assumes that all spatial correlation can be reduced to within-neighborhood correlation. This model is not spatial because it does not incorporate any notion of space. Accordingly, it allows quantifying the magnitude of neighborhood variations, but it fails to provide information on their scale and does not indicate whether neighborhood variability follows a spatially organized pattern or consists of unstructured variations.
Many ecologic studies have explicitly modeled the spatial correlation of outcomes (28-31). However, there has been less effort to do so with individual data (32-35). To obtain information on the scale of spatial variations, we used a hierarchical geostatistical model (32) that geocodes individuals at the neighborhood level and splits neighborhood variability into a spatially structured component and an unstructured component (35, 36). Doing so allowed us to make statistical inferences on not only the magnitude of correlation within neighborhoods but also the range of correlation in space (33, 34).
Regarding the association between contextual deprivation and mental disorders, measuring deprivation within administrative neighborhoods may be overly restrictive because individual locational information was available. For one thing, the administrative neighborhood scale may be too broad to capture the effect of contextual deprivation. Moreover, using fixed boundary areas may not enable capture of contextual information in surrounding space for those individuals residing on the margins of administrative neighborhoods. Therefore, we measured contextual deprivation within small circular areas centered on the residences of individuals (i.e., within moving-window areas). Beyond quantifying the strength of association, this approach allowed us to examine whether contextual deprivation was related to mental health on the administrative neighborhood level commonly used in epidemiologic research or on a smaller or larger scale.
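The moving-window measurement described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `knn_mean_income`, the toy coordinates, and the choice to count the index person among his or her own nearest inhabitants are all hypothetical. The brute-force search is quadratic in the number of individuals, so a spatial index would be needed for a population of the size analyzed here.

```python
import math

def knn_mean_income(residences, incomes, k):
    """Mean income among the k inhabitants nearest to each residence.

    residences: list of (x, y) coordinates in meters, one per individual.
    incomes:    list of individual incomes, same order as residences.
    Assumption: the index individual counts as one of the k inhabitants.
    """
    means = []
    for x0, y0 in residences:
        # Distance from the index residence to every individual; ties are
        # broken by position in the list (an arbitrary, hypothetical rule).
        by_distance = sorted(
            (math.hypot(x - x0, y - y0), idx)
            for idx, (x, y) in enumerate(residences)
        )
        nearest = [idx for _, idx in by_distance[:k]]
        means.append(sum(incomes[i] for i in nearest) / len(nearest))
    return means

# Toy data: three residences clustered together, two far away.
toy_residences = [(0, 0), (10, 0), (20, 0), (1000, 0), (1010, 0)]
toy_incomes = [100, 200, 300, 10, 20]
toy_context = knn_mean_income(toy_residences, toy_incomes, k=2)
print(toy_context)
```

Because the window is defined by population size rather than by radius, it widens automatically in sparsely populated areas, which is the property the text relies on.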
In summary, we aimed to describe precisely the spatial distribution of substance-related disorders (geoadditive model), make inferences of relevance to public health on the magnitude and scale of spatial variations (multilevel and hierarchical geostatistical models), and investigate the strength and spatial scale of the association between contextual deprivation and mental disorders.
MATERIALS AND METHODS |
---|
As individual variables, we considered age, gender, marital status, education, and income. Age was divided into two categories (40-49 years, 50-59 years). Marital status was coded as "married or cohabiting" and "others" (single, divorced, widowed). Educational attainment was dichotomized (9 years or less vs. more than 9 years of education). Since household income was not available, we used individual income (dichotomized, with the median value as a cutoff) as a proxy for individual socioeconomic position.
Malmö is divided into 100 administrative neighborhoods. The Regional Office of Skåne used street addresses to geocode individuals' places of residence. Figure 1 indicates the spatial distribution of the 65,830 individuals aged 40-59 years over 13,730 locations. These locations correspond to houses or buildings. Figure 1 also provides basic information on the neighborhood structure.
Statistical analyses
To produce a precise smoothed map of prevalence based on individual locational information, we estimated a geoadditive model (24-27) by using a two-dimensional (latitude/longitude) smooth term for the spatial effect and simple regression coefficients for individual and contextual factors (refer to the Appendix). To obtain easily interpretable information on the magnitude of spatial variations, we propose an indicator on the odds ratio scale, the interquartile spatial odds ratio, defined as the odds ratio between an individual residing in a location in the first quartile and one from a location in the fourth quartile of spatial risk, as estimated from the geoadditive model (Appendix).
To make inferences on the magnitude of spatial variations, we estimated a multilevel logistic model (3, 4) with individuals nested within administrative neighborhoods (refer to the Appendix). The neighborhood variance $\sigma_u^2$ indicated the amount of variability between neighborhoods regarding substance-related disorders. Using Moran's I statistic (Appendix), we sought spatial autocorrelation in the neighborhood residuals of the model (41, 42). To investigate whether spatial correlation decreased with increasing distance, we computed Moran's I separately for those neighborhoods less than 1,000 m apart, 1,000-1,999 m apart, 2,000-2,999 m apart, and so forth.
To assess the spatial scale of variations, we estimated a hierarchical geostatistical model (33, 34) with two sets of neighborhood random effects, including the usual unstructured effects $u_j$ of variance $\sigma_u^2$ and an additional set of spatially correlated random effects $s_j$ of variance $\sigma_s^2$ (refer to the Appendix) (32, 35, 36). Whereas $u_j$ takes independent values in each neighborhood and therefore captures unstructured neighborhood variations, $s_j$ adopts more similar values for neighborhoods close to each other than for those further apart, thereby reflecting spatially organized variations. We computed the proportion of total neighborhood variance attributable to the spatially structured component of variability as $\sigma_s^2/(\sigma_s^2 + \sigma_u^2)$ (36). The parameter $\phi$, which quantifies the rate of correlation decay with increasing distance between neighborhoods, was used to assess the spatial scale of variations in mental disorders: we computed the range of spatial correlation ($3/\phi$), defined as the distance beyond which the correlation between neighborhoods is below 5 percent, that is, beyond which neighborhood risk levels are no longer correlated (33, 34).
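The quantities defined in this paragraph follow directly from an exponential correlation function, and the short sketch below makes the arithmetic explicit. It is an illustration only: the function names are hypothetical, the symbols `phi`, `sigma2_s`, and `sigma2_u` stand for the decay parameter and the two variance components, and the variance values used in the example are invented. Only the 3,424 m range is taken from the text (Appendix).

```python
import math

def exp_correlation(distance_m, phi):
    """Exponential correlation between two neighborhoods distance_m apart."""
    return math.exp(-phi * distance_m)

def spatial_range(phi):
    """Distance at which the exponential correlation falls to exp(-3), about 5 percent."""
    return 3.0 / phi

def structured_proportion(sigma2_s, sigma2_u):
    """Share of neighborhood variance carried by the spatially structured effects."""
    return sigma2_s / (sigma2_s + sigma2_u)

def spatial_covariance(centroids, sigma2_s, phi):
    """Covariance matrix V with V[k][l] = sigma2_s * exp(-phi * d_kl)."""
    return [
        [sigma2_s * exp_correlation(math.dist(a, b), phi) for b in centroids]
        for a in centroids
    ]

# Decay parameter implied by a 3,424 m range (value reported in the Appendix).
phi_example = 3.0 / 3424.0
corr_at_range = exp_correlation(spatial_range(phi_example), phi_example)
print(round(corr_at_range, 4))

# Hypothetical variance components, for illustration only.
print(structured_proportion(sigma2_s=0.6, sigma2_u=0.2))

# Three made-up centroids (meters) to show the shape of V.
V_example = spatial_covariance([(0, 0), (1000, 0), (5000, 0)], 0.6, phi_example)
```

The factor 3 in the range comes from exp(-3) being just under 0.05, which is why the range is read as the distance beyond which correlation is below 5 percent.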
As detailed in the Appendix, we performed a simulation to verify that the hierarchical geostatistical model was able to disentangle spatially structured from unstructured neighborhood variations. We disorganized the spatial structure of the data without modifying the multilevel structure (spatial connections between neighborhoods were modified, but the same individuals were still grouped together within neighborhoods) and examined the resulting changes in neighborhood variance parameters. We found that the model was able to distinguish between spatially structured and unstructured neighborhood variations but that the proportion of spatially structured variations needs to be interpreted jointly with the spatial range of correlation $3/\phi$.
Multilevel and hierarchical geostatistical models were estimated with Markov chain Monte Carlo simulation (refer to the Appendix) (43). We used the deviance information criterion to compare their goodness of fit (the smaller the deviance information criterion, the better the fit of the model) (44). We could not compare them with the geoadditive model in this way because of differences in the estimation procedures. For each modeling option, we first estimated an empty model (without explanatory variables), then we introduced the individual covariates and finally the contextual variable.
RESULTS |
---|
When individual locational information was used, the empty geoadditive model provided a precise smoothed map of the prevalence of disorders independent of administrative boundaries. The map in figure 2 shows increased prevalence in a large area of northern Malmö, including two local subareas of particularly high prevalence. Based on the spatial smooth term for the 65,830 individuals, the interquartile spatial odds ratio was 3.96, which approximately quantifies the odds ratio between individuals in the lowest and highest quartiles of spatial risk.
A higher prevalence of disorders was found in deprived administrative neighborhoods, after adjustment for individual factors (tables 1, 2, and 3). When the mean income was measured in spatially adaptive areas smaller than administrative neighborhoods, the strength of association between contextual deprivation and prevalence increased markedly with decreasing size of the areas considered (table 3). In geoadditive models, the risk of substance-related disorders was 1.97 times higher (95 percent confidence interval: 1.39, 2.79) in the highest versus the lowest quartiles of contextual deprivation when the contextual factor was measured in administrative neighborhoods, but it was 4.12 times higher (95 percent confidence interval: 3.01, 5.64) when the 100 nearest inhabitants aged 25 years or older were considered. The raw data showed that 38 percent of those with a substance-related disorder (362 of 956 cases) resided in the highest quartile of contextual deprivation when the contextual factor was defined in administrative neighborhoods, whereas 51 percent of the cases (n = 485) were in the highest quartile when the factor was defined in local areas comprising the 100 nearest inhabitants.
DISCUSSION |
---|
Investigating the magnitude and scale of spatial variations in outcomes
To investigate the spatial distribution of disorders, we initially sought precise cartographic information, independent of neighborhood boundaries. Our second objective was to make inferences about the magnitude and scale of spatial variations. Beyond knowing whether the magnitude of neighborhood variations justifies including a contextual dimension in public health programs (6), it is relevant to assess the spatial scale on which programs should be coordinated. To obtain this information, we compared three modeling approaches that, building on different notions of space, provided different insights into the spatial distribution of mental disorders.
A flexible way to obtain precise cartographic information was to fit a geoadditive model (24, 25). Working with continuous space, this model was able to process the spatial coordinates of individuals to produce a smoothed map of prevalence independent of neighborhood boundaries (26, 27), a result far more precise than maps obtained by using poor locational information at the neighborhood level. However, this approach provides only visual information and no parameters with which to make statistical inferences about the spatial distribution of mental disorders. The only quantitative information we obtained concerned the magnitude of spatial variations, expressed on the odds ratio scale with an interquartile odds ratio based on prevalence estimates at the 13,730 individual locations.
To make inferences, we considered analytical techniques such as the multilevel model (3) and the hierarchical geostatistical model (32). Since it is computationally intractable to estimate a parametric spatial correlation structure for a considerable number of locations (refer to the Appendix), these analyses were based on the 100 neighborhood locations, thus representing a dramatic underuse of information available.
The multilevel model showed important neighborhood variations in the prevalence of substance-related disorders, a mental health outcome that may be more context dependent than others. However, using multilevel models leads one to hypothesize that spatial correlation is reducible to within-neighborhood correlation, that is, that the distribution of neighborhoods at risk in space is completely random (35). Multilevel models do not incorporate any notion of space and, as such, may be described as nonspatial: they consider the neighborhood affiliation of individuals but neglect spatial connections between neighborhoods. Therefore, measures of variation/correlation in such models provided only partial insight into the spatial distribution of disorders, allowing us to make statistical inferences on the magnitude but not the spatial scale of variations.
To obtain this additional information, we used the hierarchical geostatistical model (33, 34). Georeferencing individuals at the administrative neighborhood level, our specific model splits neighborhood variability into a spatially organized component and an unstructured component (32, 35, 36). We found it more informative to use a geostatistical rather than a lattice formulation (45) of the spatial correlation structure (34): defining the correlation between neighborhoods as a decreasing function of the spatial distance between them enabled us to estimate the spatial range of correlation (34).
First, the hierarchical geostatistical model is of heuristic interest. Since many contextual factors have a strong spatial structure, disentangling spatially structured variability from other more chaotic sources of neighborhood variation may allow researchers to generate hypotheses on contextual mechanisms (35). Following recommendations in the literature (46), we compared the spatially structured variations in mental health (figure 4) with the geographic distribution of neighborhood income (figure 1) to gain preliminary insight into the association between deprivation and mental disorders.
Second, rather than being a nuisance parameter, the parameter for correlation decay allowed us to make inferences about the scale of spatial variations, showing that variations in substance-related disorders occurred on a larger scale than that of the neighborhood. As a public health implication, coordinating interventions between administrative neighborhoods close to each other may be an effective strategy. If recent developments in local regression techniques are used (47), one possible analytical refinement may consist of moving from a global to a local perspective in which the spatial autocorrelation parameter could vary over space.
Measuring contextual factors across continuous space around residences of individuals
In many instances, relying on administrative boundaries to define contextual factors may be restrictive. We found a much stronger relation between contextual deprivation and prevalence of disorders when we measured the factor in local areas of smaller size than administrative neighborhoods. Therefore, this association may operate on a more local scale than the neighborhood scale commonly used in contextual studies.
Contextual income was measured within spatially adaptive areas, that is, circles of variable width and fixed population size centered on residences of individuals. Using these areas appeared to be the only way to investigate whether contextual deprivation operated on a local scale, since measuring contextual income within areas having a small, fixed radius results in missing values or unreliable measurements in sparsely populated areas (39, 40). Theoretically, this approach that considers surrounding population rather than surrounding space may be particularly appropriate when considering contextual factors aggregating individual characteristics (e.g., income) rather than features of the physical environment.
In our cross-sectional study, causal mechanisms for the association between contextual deprivation and substance-related disorders may operate in both directions. On the one hand, although not yet definitely confirmed by quantitative studies, selective migration processes may contribute to the clustering of substance-related disorders in the most-deprived neighborhoods (15). On the other, deprivation may have an independent negative impact on mental well-being (14). Despite such uncertainty, an issue that has yet to be addressed in a longitudinal study, our findings show that interventions focused on individuals with substance-related disorders may be particularly useful in hot spots of contextual deprivation identified on a smaller scale than that of administrative neighborhoods.
In conclusion, beyond important geographic variations in the prevalence of substance-related disorders, our spatial analytical perspective showed that the spatial scale of variations was much larger than that of administrative neighborhoods. However, apart from such large-scale variations due to the clustering of poor socioeconomic circumstances in northern Malmö, we also found more local variations in prevalence that were attributable to differences in the intensity of deprivation.
We are aware that multilevel models may be appropriate when the context is defined in a way that is not strictly geographic (e.g., as hospitals (48), workplaces, or schools) or when spatial correlation can be reduced to within-area correlation. Similarly, it may be adequate to measure contextual factors within administrative boundaries when investigating effects operating on those scales (e.g., in relation to public policies). However, in many neighborhood studies, a deeper understanding of spatial variations in health outcomes may be gained by building notions of space into statistical models and measuring contextual factors across continuous space.
APPENDIX |
---|
To express the magnitude of spatial variations in an easily interpretable way, we propose an indicator on the odds ratio scale, the interquartile spatial odds ratio. We define it as the odds ratio between an individual located in the first quartile and one located in the fourth quartile of spatial risk, as estimated from the geoadditive model. The median odds for individuals residing in the first and last quartiles of spatial risk are equal to $\exp(\beta X + t_{12.5})$ and $\exp(\beta X + t_{87.5})$, where $t_{12.5}$ and $t_{87.5}$ are the 12.5th and 87.5th quantiles in the distribution of the spatial smooth term in the study population. Accordingly, the interquartile spatial odds ratio was computed as $\exp(t_{87.5} - t_{12.5})$.
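As a minimal sketch of this indicator, the function below takes one fitted value of the spatial smooth term per individual (on the logit scale) and returns the exponentiated difference between its 87.5th and 12.5th quantiles. The function name, the toy input, and the choice of quantile interpolation rule (`method="inclusive"` in Python's `statistics.quantiles`) are assumptions; the source does not say which quantile estimator was used.

```python
import math
import statistics

def interquartile_spatial_odds_ratio(smooth_values):
    """exp(t_87.5 - t_12.5) for a list of spatial smooth term values.

    smooth_values: one value per individual, on the log-odds scale.
    """
    # With n=8, the cut points are the 12.5th, 25th, ..., 87.5th quantiles,
    # so the first and last cut points are the two quantiles needed here.
    octiles = statistics.quantiles(smooth_values, n=8, method="inclusive")
    t_low, t_high = octiles[0], octiles[-1]
    return math.exp(t_high - t_low)

# Toy smooth term: evenly spread between 0 and 1 on the logit scale.
toy_smooth = [i / 100 for i in range(101)]
toy_iqsor = interquartile_spatial_odds_ratio(toy_smooth)
print(round(toy_iqsor, 3))
```

The 12.5th and 87.5th quantiles are the medians of the first and fourth quartiles, so the ratio compares a typical individual in the lowest-risk quarter of locations with a typical individual in the highest-risk quarter. The covariate term cancels in the ratio, which is why only the smooth term is needed.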
Multilevel logistic model
To model variations in the probability $p_{ij}$ of individual $i$ in neighborhood $j$ having a substance-related disorder, we fitted a multilevel logistic model to the data as $\operatorname{logit}(p_{ij}) = \beta_0 + X_{ij}\beta + u_j$, where $X_{ij}$ is the vector of explanatory variables and $u_j$ is a normally distributed random intercept of variance $\sigma_u^2$ (3, 4). The $u_j$ for the different neighborhoods are independent of one another.
To assess spatial autocorrelation in the neighborhood residuals, we computed Moran's I statistic (41, 42).
Multilevel models were estimated with Markov chain Monte Carlo simulation (see below) (43). In this Bayesian perspective, we do not need to make specific assumptions to obtain the standard error of the Moran's I statistic (42): by computing Moran's I for each set of sampled values of the neighborhood residuals, we obtain its posterior distribution and report the median, as well as the 2.5th and 97.5th quantiles, to construct a 95 percent credible interval. In the absence of spatial autocorrelation, the Moran's I statistic has a small negative expectation when applied to regression residuals (42, 53). In comparing the 95 percent credible interval with the value 0, we have therefore applied a conservative test.
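The procedure just described can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses the textbook form of Moran's I, I = (n / W) * sum_kl w_kl z_k z_l / sum_k z_k^2, with z the mean-centered residuals and W the sum of the weights, and it assumes binary weights equal to 1 when two neighborhood centroids fall in a given distance band. The weighting actually used in the study is not recoverable from the text, and all function names are hypothetical.

```python
import math
import statistics

def band_weights(centroids, lower_m, upper_m):
    """Binary weights: 1 if lower_m <= distance < upper_m (assumed scheme)."""
    n = len(centroids)
    w = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            if k != l and lower_m <= math.dist(centroids[k], centroids[l]) < upper_m:
                w[k][l] = 1.0
    return w

def morans_i(residuals, w):
    """Textbook Moran's I for one set of neighborhood residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    z = [r - mean for r in residuals]
    total_weight = sum(sum(row) for row in w)
    cross = sum(w[k][l] * z[k] * z[l] for k in range(n) for l in range(n))
    return (n / total_weight) * cross / sum(v * v for v in z)

def posterior_morans_i(residual_draws, w):
    """Median and 95 percent credible interval of Moran's I across MCMC draws."""
    values = [morans_i(draw, w) for draw in residual_draws]
    # With n=40, the first and last cut points are the 2.5th and 97.5th quantiles.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return statistics.median(values), cuts[0], cuts[-1]

# Four made-up centroids on a line, 500 m apart; band: less than 1,000 m.
toy_centroids = [(0, 0), (500, 0), (1000, 0), (1500, 0)]
toy_w = band_weights(toy_centroids, 0, 1000)
clustered = morans_i([1.0, 1.0, -1.0, -1.0], toy_w)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], toy_w)
print(clustered, alternating)
```

Calling `band_weights` with successive bands (0 to 1,000 m, 1,000 to 2,000 m, and so on) gives one Moran's I per distance class, which is how a decay of autocorrelation with distance can be inspected.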
Hierarchical geostatistical logistic model
We used a logistic model including independent neighborhood random effects $u_j$ and neighborhood spatially correlated random effects $s_j$ (32, 35, 36). For an individual $i$ in neighborhood $j$, the model was defined as $\operatorname{logit}(p_{ij}) = \beta_0 + X_{ij}\beta + u_j + s_j$. The $u_j$ are mutually independent and Gaussian, with mean 0 and variance $\sigma_u^2$. Let $S = (s_1, s_2, \ldots, s_{100})$ be the vector of spatial effects for the 100 neighborhoods. The distribution of $S$ is expressed as $S \sim N(0, V)$, with $V_{kl}$ defined as a parametric function of the distance $d_{kl}$ in meters between the population-weighted centroids of neighborhoods $k$ and $l$. We assumed an isotropic spatial process (in which spatial correlation does not depend on direction). $V_{kl}$ was defined as $V_{kl} = \sigma_s^2 \rho_{kl}$, with an exponential correlation function $\rho_{kl} = \exp(-\phi d_{kl})$ (33). The spatial range of correlation (beyond which the correlation is below 5 percent) was computed as $3/\phi$. The proportion of neighborhood variance that is spatially structured was computed as $\sigma_s^2/(\sigma_s^2 + \sigma_u^2)$.
We examined whether the hierarchical geostatistical model was really able to disentangle spatially structured variations from the neighborhood unstructured variability. In six successive simulations, we randomly selected 10, 25, 50, 75, 90, and 100 neighborhoods out of 100 and randomly assigned all individuals from each of these neighborhoods as a group to one other selected neighborhood, while no changes were made for the other nonselected neighborhoods. We therefore did not modify the multilevel structure of the data, since the same individuals were still grouped together within neighborhoods, but progressively disorganized the neighborhood spatial structure. Fitting a hierarchical geostatistical model to each data set, we observed that the proportion of neighborhood variations attributable to the spatially structured component [$\sigma_s^2/(\sigma_s^2 + \sigma_u^2)$] decreased as the number of neighborhoods selected for random reassignment of inhabitants increased (appendix table 1). However, spatially structured variations still constituted an important part of neighborhood variability when we completely disorganized the spatial structure of the data. To explain this result, one notes that the spatial range of correlation ($3/\phi$) regularly decreased with increasing disorganization of the neighborhood spatial structure (table 4). When inhabitants were randomly reassigned among the 100 neighborhoods, the spatial range of correlation was equal to 475 m (vs. 3,424 m in the real data), indicating that the spatially structured and unstructured components of variability were no longer intrinsically different. This simulation confirms a certain ability of the hierarchical geostatistical model to disentangle spatially structured variations from unstructured neighborhood variations, and it indicates that the proportion of spatially structured variations $\sigma_s^2/(\sigma_s^2 + \sigma_u^2)$ and the spatial range of correlation $3/\phi$ need to be interpreted jointly.
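The data manipulation behind this simulation amounts to relabeling neighborhoods, which the sketch below illustrates. It makes assumptions the text does not confirm: that each selected neighborhood receives exactly one group of inhabitants (a one-to-one reassignment), and that the reassignment is drawn as a single random cycle so that no selected group stays in place. The function name and the toy population are hypothetical, and model fitting is not shown.

```python
import random

def reassign_neighborhoods(neighborhood_ids, n_selected, rng):
    """Move whole groups of inhabitants between randomly selected neighborhoods.

    neighborhood_ids: one neighborhood label per individual.
    n_selected:       number of neighborhoods to disorganize (at least 2).
    rng:              a random.Random instance, for reproducibility.
    Returns the new label of each individual and the old-to-new mapping.
    """
    labels = sorted(set(neighborhood_ids))
    selected = rng.sample(labels, n_selected)
    mapping = {label: label for label in labels}
    # Assumption: shift the shuffled selection by one position, so every
    # selected group lands in a different selected neighborhood.
    for position, label in enumerate(selected):
        mapping[label] = selected[(position + 1) % n_selected]
    return [mapping[j] for j in neighborhood_ids], mapping

# Toy population: 100 neighborhoods with 3 inhabitants each.
toy_ids = [j for j in range(100) for _ in range(3)]
new_ids, mapping = reassign_neighborhoods(toy_ids, 25, random.Random(0))
moved = [label for label in mapping if mapping[label] != label]
print(len(moved))
```

Because individuals who shared a label before still share one afterward, the multilevel grouping is untouched. Only the position of each group in space changes, which is what lets the simulation isolate the spatially structured component.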
APPENDIX TABLE 1. Results of the empty hierarchical geostatistical models when randomly reassigning all individuals from one neighborhood as a group to another neighborhood for 10, 25, 50, 75, 90, and 100 neighborhoods out of 100 (data on all inhabitants aged 40-59 years in Malmö, Sweden, 2001)
* $\sigma_s^2/(\sigma_s^2 + \sigma_u^2)$ is the proportion of neighborhood variations that is spatially structured.
CI, credible interval.
$3/\phi$ is the spatial range of correlation, defined as the distance beyond which the correlation in risk level between neighborhoods is below 5 percent.
Multilevel models and hierarchical geostatistical models were estimated with a Markov chain Monte Carlo approach by using the WinBUGS version 1.4 program (43). We used noninformative uniform priors for all parameters. We ran a single chain with a burn-in period of 100,000 iterations. After convergence, we retained every 10th iteration until a sample size of 10,000 was attained. For each parameter, we report the median of the posterior distribution and provide a 95 percent credible interval.
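The sampling scheme above implies a fixed amount of computation, and the posterior summaries are simple order statistics. The sketch below spells both out. It is not WinBUGS code: the function names are hypothetical, the stand-in chain is just a list of integers, and the quantile interpolation rule is an assumption.

```python
import statistics

BURN_IN = 100_000   # iterations discarded before sampling
THIN = 10           # keep every 10th iteration after burn-in
KEPT = 10_000       # size of the retained posterior sample

def total_iterations(burn_in=BURN_IN, thin=THIN, kept=KEPT):
    """Number of iterations the chain must run under this scheme."""
    return burn_in + thin * kept

def thin_chain(chain, burn_in=BURN_IN, thin=THIN):
    """Drop the burn-in, then keep every thin-th draw that follows."""
    return chain[burn_in + thin - 1::thin]

def posterior_summary(sample):
    """Posterior median with a 95 percent credible interval."""
    cuts = statistics.quantiles(sample, n=40, method="inclusive")
    return statistics.median(sample), cuts[0], cuts[-1]

# Stand-in chain: iteration indices instead of real parameter draws.
fake_chain = list(range(total_iterations()))
retained = thin_chain(fake_chain)
print(total_iterations(), len(retained))
```

Under these settings each model requires 200,000 iterations in total, which is worth keeping in mind when reading the per-1,000-iteration computation times reported for the geostatistical model.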
To illustrate that the hierarchical geostatistical model cannot take into account the 13,730 different locations of individuals, we successively grouped individuals within squares of different sizes (1,000, 750, 500, 250, or 125 m on a side), resulting in data sets with a different number of locations for geocoding individuals (154, 235, 422, 1,025, and 2,612 locations, respectively). We estimated an empty hierarchical geostatistical model for each data set. The computation time for 1,000 iterations was 0.04 hour, 0.13 hour, 0.66 hour, 8.96 hours, and 145 hours (or 6 days) in these five different cases. Far more than 1,000 iterations would have been needed to fit the models. Furthermore, models including covariates require much longer computation times. Therefore, it is obvious that our hierarchical geostatistical model could not take into account the 13,730 different locations.
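The timings reported above can be summarized by fitting a straight line on the log-log scale, which gives a rough idea of how computation time grows with the number of locations. The sketch below does this with ordinary least squares. The five data points come from the text; the power-law form, the function names, and the extrapolation to 13,730 locations are assumptions made for illustration, and the extrapolated figure should be read as an order of magnitude at best.

```python
import math

# Reported values: number of locations and hours per 1,000 iterations.
n_locations = [154, 235, 422, 1025, 2612]
hours_per_1000 = [0.04, 0.13, 0.66, 8.96, 145.0]

def loglog_fit(xs, ys):
    """Least-squares slope and intercept of log(y) on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predicted_hours(n, slope, intercept):
    """Hours per 1,000 iterations implied by the fitted power law."""
    return math.exp(intercept + slope * math.log(n))

slope, intercept = loglog_fit(n_locations, hours_per_1000)
print(f"fitted exponent: {slope:.2f}")
print(f"extrapolated hours per 1,000 iterations at 13,730 locations: "
      f"{predicted_hours(13_730, slope, intercept):.0f}")
```

The fitted exponent is close to 3, that is, time grows roughly with the cube of the number of locations. One plausible reading, not stated in the source, is that each iteration is dominated by operations on the full location-by-location covariance matrix.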
ACKNOWLEDGMENTS |
---|
The authors express their gratitude to Thor Lithman, Dennis Noreen, and Åke Boalt from the County Council of Skåne, Sweden, for their indispensable help and support regarding this research project.
Conflict of interest: none declared.
References |
---|