Prior Information in Behavioral Capture-Recapture Methods: Demographic Influences on Drug Injectors' Propensity to Be Listed in Data Sources and Their Drug-related Mortality

Ruth King1, Sheila M. Bird2,3, Steve P. Brooks4, Sharon J. Hutchinson5,6 and Gordon Hay7

1 Centre for Research into Ecological and Environmental Modelling, University of St. Andrews, St. Andrews, United Kingdom
2 Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
3 Department of Statistics and Modelling Science, University of Strathclyde, Glasgow, United Kingdom
4 Statistical Laboratory, University of Cambridge, Cambridge, United Kingdom
5 Health Protection Scotland, Glasgow, United Kingdom
6 Public Health and Health Policy Section, University of Glasgow, Glasgow, United Kingdom
7 Centre for Drug Misuse Research, University of Glasgow, Glasgow, United Kingdom

Correspondence to Dr. Sheila M. Bird, MRC Biostatistics Unit, University of Cambridge, Robinson Way, Cambridge CB2 2SR, United Kingdom (e-mail: sheila.bird{at}mrc-bsu.cam.ac.uk).

Received for publication December 21, 2004. Accepted for publication April 29, 2005.


    ABSTRACT
The authors present findings from a Bayesian analysis of Scotland's four primary capture-recapture data sources for 2000 that was carried out to estimate numbers of current injecting drug users by region (Greater Glasgow vs. elsewhere in Scotland), sex (male vs. female), and age group (15–34 years vs. ≥35 years). A secondary goal of the analysis was to obtain Bayesian estimates and credible intervals for the demographic influences on Scotland's drug-related death rate per 100 current injectors. Incorporation of informative priors altered the models with highest posterior probability. Expert opinion on how demography influenced Scottish drug injectors' propensity to be listed in different data sources was taken into account, along with external information about European injectors' drug-related death rates and male:female ratios. Higher drug-related mortality was confirmed in older drug injectors and those outside of Greater Glasgow. Female injectors' lower drug-related death rate was not sustained beyond 34 years of age. The authors recommend that demographic influences be accommodated in behavioral capture-recapture estimation, especially when it is a prelude to secondary analysis, such as the analysis of drug-related death rates presented here.

Bayes theorem; data collection; epidemiologic methods; models, statistical; mortality; prevalence; substance abuse, intravenous


Abbreviations: DMD, Drug Misuse Database; HCV, hepatitis C virus; HPDI, highest probability density interval; IDU, injecting drug user


    INTRODUCTION
Cormack's mark-recapture estimation method for closed wildlife populations (1) has been adapted by epidemiologists (2), demographers concerned about underregistration (3), and infectious disease specialists (4). For example, Mastro et al. (4) estimated the number of human immunodeficiency virus-infected injecting drug users (IDUs) in Bangkok, Thailand, from the literal recapture of persons listed as having received methadone between April 17 and May 17, 1991, among persons subsequently held at police stations whose urine tested positive for opiates or methadone. When there are only two data sources, estimation relies on a strong assumption of independence to guarantee that marked and unmarked subjects are equally likely to be recaptured. Use of multiple data sources versus two data sources (3) relaxes the need for independence, because list-dependencies can be modeled to avoid underestimation (5). Recourse to stratified capture-recapture estimation is often made when the homogeneity assumption that all persons in the study population have equal probabilities of being listed seems untenable (6, 7).
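As a minimal sketch of the two-source case described above, the classical estimator commonly called Lincoln-Petersen divides the product of the two list sizes by their overlap. It is shown only to illustrate why independence matters; it is not necessarily the exact estimator used in the cited studies, and the counts below are hypothetical.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Two-source estimate of total population size.

    Assumes the two sources are independent, so that the share of the
    second list already seen on the first list estimates the share of
    the whole population captured by the first list.
    """
    if n_both <= 0:
        raise ValueError("at least one recaptured person is needed")
    return n_first * n_second / n_both


# Hypothetical counts, not taken from the paper
estimate = lincoln_petersen(n_first=400, n_second=300, n_both=60)
print(estimate)  # 2000.0
```

If the two lists were positively dependent, the overlap would be inflated and this estimate would be too low, which is the underestimation that modeling list-dependencies is meant to avoid.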

In Scotland, Frischer et al. (6) were the first investigators to use capture-recapture methods to estimate the 1989 prevalence of current IDUs in Glasgow. Subsequently, Hay et al. (7) used stratified capture-recapture methods to estimate numbers of current IDUs in Scotland in 2000, separately for 11 health boards. Four data sources were available: lists of current IDUs known to Scotland's Drug Misuse Database (DMD), obtained via reports made by 1) drug treatment agencies or 2) family practitioners; 3) social inquiry reports about IDUs; and 4) reported diagnoses of hepatitis C virus (HCV) infection among persons who had ever injected drugs. Health board-specific overlaps among the data sources (captures) were the basis for estimating the uncaptured or hidden IDUs in each region.

Hay et al.'s (7) decision to apply capture-recapture methods separately by health board reflected not only the fact that regional estimates were their primary concern but also an appreciation of potential source heterogeneity by region. For example, the reporting of IDUs to Scotland's DMD may be more comprehensive in some regions than in others, and likewise IDUs' utilization of HCV testing. Demographic characteristics of injectors, notably sex and age, may also differentially determine IDUs' propensity to be featured in data sources, as transpired in England (8).

Bird et al. (9) used Hay et al.'s capture-recapture estimates for Scotland's current IDUs in 2000 (7) and other published data on Scottish IDUs' sex and age distribution to draw inferences about drug-related deaths in 2000 + 2001 by region, sex, and age group per 100 IDUs.

In this paper, we present a Bayesian approach to the problem of differential propensity to be featured in data sources by explicitly modeling the extent to which region (summarized as Greater Glasgow vs. elsewhere in Scotland), sex, and age group (15–34 years vs. ≥35 years in 2000) influence injectors' propensity to be listed in Hay et al.'s four capture-recapture data sources (7). Resulting posterior distributions (for example, for Greater Glasgow's number of current IDUs by sex and age group) then serve as denominators for the city's drug-related deaths in 2000 + 2001 + 2002, from which we derive credible intervals for demographic influences on Scotland's drug-related death rate per 100 IDUs. (See appendix table 1 for a glossary of the Bayesian terms used in this paper.)


    MATERIALS AND METHODS
In this paper, we outline a Bayesian approach to capture-recapture modeling with covariates (10), which can be generalized to incorporate prior information on European IDUs' drug-related death rates and male:female IDU ratios and on the direction (sign) of covariate or data-source interactions within Scotland. Bayesian estimation (of IDU totals) allows for interactions not only between data sources (source interdependencies) but also between covariates and data sources (capture heterogeneities), which have potentially important sociologic interpretations. Here we report posterior mean values and 95 percent credible intervals or highest probability density intervals (HPDIs) from Bayesian capture-recapture modeling. Subsequently, we compute HPDIs for the influence of covariates on Scotland's drug-related deaths in 2000 + 2001 + 2002 per 100 IDUs.

For illustration, we also include classical results for two sets of capture-recapture models incorporating stratification by sex or age group. Unreliable estimates are risked when the stratum-specific data are sparse in some cells. The usual solution is to pool sources or covariate levels.

Data sources
The four capture-recapture data sources for persons who reported injecting drug use, accessed for the period January 1, 1999–December 31, 2000, and described in detail by Hay et al. (7), were: 1) Scotland's database of HCV diagnoses; 2) social inquiry reports; 3) general practitioners' reports to Scotland's DMD; and 4) new drug treatment agency contacts reported to Scotland's DMD.

Bayesian capture-recapture modeling with covariates
From the four data sources given above and three potential covariates (region, sex, age group), each with two levels, we constructed a 2^7 contingency table (11) of observed counts by data sources and covariate values. The number of persons who are unobserved by all data sources is unknown for each combination of covariate values. A log-linear model describes the relations among cell probabilities, data sources, and covariates (2, 6). First-order (or main-effect) log-linear parameters represent the effect of the corresponding data source or covariate on the underlying capture rate. Higher-order terms correspond to interaction effects between different data sources or covariates. We assume that all main-effect terms are present, but we use Bayesian model discrimination techniques (12, 13) to determine which, if any, interactions are supported by the data. Allowing only second-order interaction terms, we have 2^21 (approximately 2 million) distinct models corresponding to different presence/absence combinations of the 21 distinct second-order interaction terms. Conditioning on any particular model, we can estimate the numbers of unobserved persons for each combination of covariates.
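The counts in this paragraph follow from treating the four data sources and three covariates as seven binary factors. The short sketch below reproduces the arithmetic; the variable names are illustrative only.

```python
from math import comb

n_sources = 4        # HCV database, social inquiry reports, GP reports, agency reports
n_covariates = 3     # region, sex, age group (two levels each)
n_factors = n_sources + n_covariates

n_cells = 2 ** n_factors              # cells in the full contingency table
n_unobserved_cells = 2 ** n_covariates  # cells where a person is on none of the four lists
n_pairwise = comb(n_factors, 2)       # possible second-order (pairwise) interactions
n_models = 2 ** n_pairwise            # each interaction is either present or absent

print(n_cells, n_unobserved_cells, n_pairwise, n_models)
# 128 8 21 2097152
```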

For IDUs outside Greater Glasgow (per combination of age group and sex), we multiply the estimated population sizes by 1.15 (9). This multiplier takes into account the fact that the four health service areas outside Greater Glasgow which lacked capture-recapture data had 58 drug-related deaths in 2000 + 2001, when the rest of Scotland had 566 drug-related deaths and, Hay et al. (7) estimated, 22,805 IDUs. The health service areas with missing data were assumed to have 2,337 IDUs (58/566 × 22,805 = 2,337), an addition of 15 percent to Hay et al.'s capture-recapture total of 15,618 IDUs outside Greater Glasgow (9); this gave rise to a multiplier of 1.15 (1 + 2,337/15,618). (Note that all Bayesian estimates presented in this paper include this factor of 1.15.)
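The multiplier can be checked directly from the figures quoted in this paragraph. The sketch below simply repeats that arithmetic; the variable names are illustrative.

```python
deaths_missing_areas = 58      # drug-related deaths, 2000 + 2001, areas without capture-recapture data
deaths_rest_of_scotland = 566  # drug-related deaths, 2000 + 2001, areas with data
idus_rest_of_scotland = 22805  # Hay et al.'s estimate for areas with data
idus_outside_glasgow = 15618   # capture-recapture total outside Greater Glasgow

# IDUs assumed for the missing areas, in proportion to their deaths
assumed_missing_idus = deaths_missing_areas / deaths_rest_of_scotland * idus_rest_of_scotland

multiplier = 1 + assumed_missing_idus / idus_outside_glasgow

print(round(assumed_missing_idus))  # 2337
print(round(multiplier, 2))         # 1.15
```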

Estimates can vary substantially, even between models that fit the data well (14, 15). An alternative to basing inference on a single selected model is to associate weights, in the form of Bayesian posterior model probabilities, with each model and produce model-averaged inferences by taking, as the estimated population size (here, for IDUs), the weighted average of the corresponding estimates under each model. Known as Bayesian model-averaging (16), the resulting estimate and its HPDI reflect both parameter and model uncertainty. Posterior model probabilities within the Bayesian approach allow a formal quantitative comparison among different competing models. Bayes factors are also often used to compare two models, where the Bayes factor equals the posterior odds ratio for the pair of models divided by their prior odds ratio. A Bayes factor greater than 3 provides positive support for one model over another (17).
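A minimal sketch of the model-averaged point estimate is given below. The per-model estimates and posterior model probabilities are hypothetical, and the function covers only the weighted mean; the full approach averages entire posterior distributions, which is what yields the model-averaged HPDI.

```python
def model_average(estimates, posterior_probs):
    """Weighted average of per-model estimates.

    Weights are the posterior model probabilities, renormalised in case
    only a subset of models is supplied.
    """
    if len(estimates) != len(posterior_probs):
        raise ValueError("one probability is needed per estimate")
    total = sum(posterior_probs)
    return sum(p / total * e for e, p in zip(estimates, posterior_probs))


# Hypothetical population-size estimates under three models
averaged = model_average([25000, 27000, 26000], [0.5, 0.3, 0.2])
print(round(averaged))  # 25800
```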

Summarizing the posterior distribution for any statistic of interest
Posterior distributions are formed by combining the likelihood of the data (for a given model) with prior distributions which reflect our beliefs about the model and its parameters before observing data (17). Here, the parameters of interest are the log-linear parameters together with the unknown population sizes in the unobserved cells of the contingency table, that is, the numbers of persons who remain unobserved by all four data sources. To discriminate between competing models, we treat the model itself as an unknown parameter which is to be estimated. The resulting posterior distribution is typically both high-dimensional and complex, but it can be obtained via a computational device known as the Markov chain Monte Carlo method (18), which samples from the posterior distribution. It is by this very sampling that posterior means and variances for all parameters of interest, and indeed empirical estimates for any statistics of interest, can be obtained. The reversible jump Markov chain Monte Carlo method is used to obtain the corresponding posterior model probabilities (10, 19).

Incorporation of prior information
Initially, we consider uninformative (but conventional) priors so that information from the data (represented by the likelihood) dominates the posterior. In particular, we use Jeffreys' prior (20) for the unobserved population sizes and independent normal priors with zero mean (for neutral influence) and variance σ² for the log-linear parameters present in each model. To reflect prior uncertainty about the values to expect for the log-linear parameters, we place a vague gamma prior (to be wide-ranging) on the prior variance parameter σ². Finally, since we have model uncertainty, we place a prior on the models themselves, with regard to the interactions present. We specify equal prior probability (for agnosticism) for each possible model, which corresponds to a prior probability of 0.5 that each second-order interaction is present. In the Results section, we describe the incorporation of informative priors into this framework.


    RESULTS
Classical approach with stratification: influential interactions between data sources
In this subsection, which is intended for illustration only, we set aside regional differences, and likewise the 1.15 multiplier, to consider just two simple one-way stratifications: stratification by sex and by age group (15–24, 25–34, and 35–54 years). Table 1 displays overlaps among the four capture-recapture data sources of Hay et al. (7).


TABLE 1. Classical capture-recapture analysis within stratum: overlaps among four sources* of data on injecting drug use in Scotland and within-stratum fitted interactions between data sources pertaining to the lowest Akaike Information Criterion model, 2000

 
Only one IDU is present in all four data sources: a male in the age group 15–24 years. Relative to the other data sources, table 2 shows that females may be underrepresented among IDUs who have been subject to a social inquiry report but represent one third of IDUs diagnosed with HCV. HCV diagnoses relate to ever injectors, more of whom are aged 35–54 years than persons listed in any of the other three data sources. Table 1 reveals an apparent strength of stratified capture-recapture estimation, namely that different interaction terms can be applicable according to covariate-defined stratum. Table 3 reveals its disconcerting aspect, namely that estimates for the total number of current IDUs may differ uncomfortably (based on the lowest Akaike Information Criterion: by over 2,500 IDUs) according to whether stratification is by sex or by age group. Of course, wide uncertainty (6) qualifies all central estimates reported in table 3.


TABLE 2. Total numbers of injecting drug users identified by each of four data sources* and frequency distribution (%) of injecting drug users by sex or age group, 2000

 

TABLE 3. Approximate total numbers of injecting drug users in Scotland as estimated from five separate capture-recapture models, 2000

 
Bayesian approach
To avoid the above weakness but capitalize on an apparent strength, table 4 adopts a Bayesian perspective and shows the set of influential interaction terms for the log-linear models which have the three highest individual posterior probabilities. However, model 2 in table 4 is at variance with published estimates of numbers of current IDUs in Scotland (7), and its central estimate of approximately 17,000 injectors would mean that Scotland had a much higher drug-related death rate among its IDUs than had previously been considered, either against a European backdrop (21) or as evidenced from the limited cohort studies conducted in Scotland (22–24).


TABLE 4. Interaction terms included in three models with high posterior probability using the initial, vague priors, Scotland, 2000*

 
The Bayesian approach allows us to incorporate directly into the analytical framework such extra information as may be available independently of the data observed. This is preferable to making ad hoc postanalysis adjustments or interpretations. Here, inclusion of external information has a direct impact on the models identified as most probable a posteriori and reduces the support for model 2.

External information.
Bayesian models incorporate expert opinion into analyses via the priors specified on the parameters in their models. We have additional but indirect external information about Scotland's total number of IDUs, which derives from an informative prior for the drug-related death rate of Europe's IDUs (21–24). In addition, there is prior information, independent of the data collected here, concerning the ratio of male IDUs to female IDUs in European countries (21) and, for Scotland, expert opinion about the direction of possible interactions between data sources and covariates. Within a Bayesian analysis, we are able to include this information by simply specifying prior distributions on the corresponding parameters to reflect these prior beliefs. The influence that the priors have on the posterior distribution is dependent on the relative amount of information contained in the prior versus in the data (via the likelihood). We discuss first the prior information that we have (21–24) and then how to construct priors which we can use to represent these beliefs.

Incorporating prior information.
Initially, we consider the total population size. We have additional, independent information from the General Register Office for Scotland about the annual number of drug-related deaths in Scotland, which totaled 1,006 in 2000–2002 (an average of 335.33 per annum), of whom the vast majority (but not all) would be deaths of IDUs. We can couple this information with our European preconception of IDUs' annual drug-related death rate to obtain a prior estimate of the total population size for Scotland's IDUs. The drug-related death rate for Europe's IDUs is generally taken to be 0.5–2 percent (21–24), which, to allow for uncertainty, we take to represent the lower and upper fifth percentiles of our prior distribution. A corresponding prior on Scotland's total number of IDUs should then have lower and upper fifth percentiles of 16,767 and 67,066, respectively (335.33/2 percent = 16,767; 335.33/0.5 percent = 67,066), which can be translated conveniently onto the required multiplicative scale by a lognormal distribution (log(33,533), 0.1776), whose median of 33,533 is the geometric mean of the two percentiles and whose log-scale variance is 0.1776.

European prior information (21) is that the male:female IDU ratio most often ranges from 60:40 to 90:10, which are interpreted as lower and upper 10th percentiles. To represent this prior belief about the male:female IDU ratio, we again specify the corresponding prior to be an appropriate lognormal distribution (log(3.6742), 0.489). We have no additional information about the effect of age or region within Scotland on this ratio. Thus, conditional on the number of male IDUs (likewise female), we specify a Dirichlet(1) prior (i.e., a uniform prior over the simplex) on the proportion of male IDUs in each combination of age group and region. This completes our prior on the unknown cell entries.
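The same percentile-matching arithmetic reproduces the lognormal prior for the male:female ratio. The sketch below again assumes that the second parameter is the log-scale variance; the implied share of males at the prior median is added only as an illustration.

```python
from math import exp, log
from statistics import NormalDist

ratio_low = 60 / 40     # lower 10th percentile of the male:female ratio
ratio_high = 90 / 10    # upper 10th percentile

z90 = NormalDist().inv_cdf(0.90)                      # about 1.2816
mu = (log(ratio_low) + log(ratio_high)) / 2
sigma = (log(ratio_high) - log(ratio_low)) / (2 * z90)

median_ratio = exp(mu)                                # geometric mean of 1.5 and 9
variance = sigma ** 2
implied_share_male = median_ratio / (1 + median_ratio)

print(round(median_ratio, 4), round(variance, 3))     # 3.6742 0.489
print(round(implied_share_male, 3))                   # 0.786
```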

We now consider interactions between the data sources and covariates. There is no strong prior information about the presence of different interaction terms, and so we once again specify an equal prior weighting for each model. However, there is prior information on the direction (or sign) of some of the interaction terms, if they are present: namely, that females are relatively less likely to appear in social inquiry reports; older injectors are relatively more likely to live in Greater Glasgow; and older injectors are relatively more likely to appear in the HCV diagnoses database (S. M. B.). We represent this prior information by using a mixture distribution of half-normals on the corresponding log-linear parameters: one positive and one negative, each with common variance parameter σ². The mixture weight for each half-normal represents the strength of prior belief associated with the corresponding sign for the interaction term. Weights of 0.5 for each half-normal simply reproduce the normal distribution (which is without preference of sign) and are used for all log-linear terms where there is no directional prior information. Conversely, mixture weights of 0 and 1 reduce the distribution to a single half-normal distribution so that only a negative (or positive) interaction is possible.

Consistent with three strong, directional prior beliefs, we set weights of 0.95 and 0.05 on the half-normals, with the larger weight on the direction specified above for the interactions where there is robust expert opinion. Thus, this does not, a priori, place zero probability (i.e., impossibility) on the opposite effect's being present, but it is heavily weighted against. To specify the prior uncertainty on the variance, we use a vague gamma distribution.
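The directional prior can be sketched as a draw of a half-normal magnitude followed by a random sign. In the illustration below, the function name, the fixed value of sigma, and the seed are assumptions; in the analysis itself the variance carries its own gamma prior rather than being fixed.

```python
import random


def sample_signed_prior(rng, sigma, weight_positive):
    """Draw from a mixture of a positive and a negative half-normal.

    `weight_positive` is the prior probability that the interaction is
    positive; 0.5 recovers an ordinary zero-mean normal distribution.
    """
    magnitude = abs(rng.gauss(0.0, sigma))           # half-normal magnitude
    sign = 1.0 if rng.random() < weight_positive else -1.0
    return sign * magnitude


rng = random.Random(0)                               # fixed seed for repeatability
draws = [sample_signed_prior(rng, sigma=1.0, weight_positive=0.95)
         for _ in range(10000)]
share_positive = sum(d > 0 for d in draws) / len(draws)
print(round(share_positive, 2))                      # close to 0.95
```

For an interaction believed to be negative, such as females appearing in social inquiry reports, the same function would be called with weight_positive=0.05.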

Using informative priors.
Estimates obtained under the informative priors are given in table 5. Repetition of the Markov chain Monte Carlo simulations obtained mean estimates of the parameters within 1 percent of their reported values, so estimates have converged sufficiently for our purposes. Models 1–3 in table 5 are close neighbors of each other, but the model which was identified as inconsistent with national preconceptions in the initial analysis using vague priors (model 2 in table 4) is out of the top three and has much lower support (0.012). This is a direct result of incorporating these preconceptions as prior beliefs within the analysis. The model with the largest posterior support is the same, irrespective of the prior placed on the parameters. However, its central estimate for total population size is slightly higher under the informative prior because of more prior support for larger population sizes.


TABLE 5. Interaction terms included in the three top models using the informative priors, Scotland, 2000

 
Demographic influences.
There is substantial agreement on interactions across models. Interactions between data sources suggest that being reported to Scotland's DMD as a new client by a drug treatment agency is in positive equilibrium both with being the subject of a social inquiry report and with DMD registration via the patient's general practitioner. Model 2 in table 5 also suggests a positive association between social inquiry reportage and HCV diagnosis, for which severity or duration of injecting may offer a weakly plausible explanation. Alternatively, the observation could be explained by a disproportionately high 9 percent of HCV diagnoses' having prison as the place where HCV testing was conducted.

Table 5 highlights the fact that relative to covariate main effects, within the age group 15–34 years, both males and residents of Greater Glasgow are relatively underrepresented; likewise, the sex differential appears to be largely tilted away from males in Greater Glasgow, where the phenomenon of injection drug use has had a tenaciously long hold (25) in comparison with newer epidemics elsewhere in Scotland. The latter interaction is not present in model 2 in table 5, but it has an overall posterior probability of 0.75, or equivalently a Bayes factor of 3.
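The equivalence between a posterior probability of 0.75 and a Bayes factor of 3 follows from the prior probability of 0.5 placed on each interaction. A short sketch of that conversion, with a hypothetical function name:

```python
def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds divided by prior odds for presence of a term."""
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds


print(bayes_factor(posterior_prob=0.75, prior_prob=0.5))  # 3.0
```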

The propensity to be listed in data sources 1 (the HCV diagnosis database) and 2 (social inquiry reports) is also covariate-dependent. Thus, table 5 shows that younger injectors are less likely to be diagnosed with HCV but more likely to be subject to social inquiry reports. Male injectors are also less likely to be diagnosed with HCV. Whether this is part of a general male tendency toward later diagnosis or is due to the fact that HCV testing is specifically offered more often to female injectors, particularly if they are pregnant (26), is unclear. It appears that social inquiry reports on IDUs are relatively less likely to be made in Greater Glasgow than elsewhere.

Bayesian inference about IDU total.
The within-model estimates in table 6 have much greater precision than the corresponding model-averaged results, which reflect both parameter uncertainty and model uncertainty. The posterior mean and 95 percent HPDI bounds are higher than corresponding estimates in table 4 under the vague prior, which is a clear consequence of the informative prior's having placed more weight (a priori) on a larger number of IDUs.


TABLE 6. Posterior mean values for estimates of the number of injecting drug users per stratum* from the three most probable models a posteriori, with averages over models, Scotland, 2000

 
The three top models in table 5, which share 20 percent posterior probability, consistently put the mean IDU total in the range 25,000–27,600, with a standard deviation of 2,000–2,700. Table 6, which also presents the model-averaged results for our eight cross-classifications by sex, age group, and region, shows that even incorporation of prior information did not rule out the possibility that Scotland's current IDUs could number as few as 16,000. Thus, the model-averaged posterior credible intervals tend to be negatively skewed, as can be seen by comparing table 6's 95 percent HPDI with the corresponding symmetric (2.5 percent, 97.5 percent) and (5 percent, 95 percent) credible intervals of (16,692, 34,675) and (17,542, 33,165), respectively. Notice that the posterior fifth percentile is larger than the prior of 16,767 (which corresponded to a high overall drug-related death rate of 2 percent).
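The difference between an HPDI and a symmetric credible interval can be illustrated on posterior samples. The sketch below uses a common sample-based approximation (the shortest window containing the required share of sorted draws), which is an assumption rather than the computation used for table 6, and the sample values are hypothetical.

```python
from math import ceil


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = ceil(mass * n)                       # number of draws inside the interval
    if k < 2 or k > n:
        raise ValueError("need enough samples for the requested mass")
    # Slide a window of k consecutive draws and keep the narrowest one
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Hypothetical skewed draws: the shortest 50 percent window sits in the dense part
draws = [0, 10, 11, 12, 13, 14, 30, 50, 70, 100]
print(hpd_interval(draws, mass=0.5))  # (10, 14)
```

For a skewed posterior, this interval is shifted toward the denser region relative to the equal-tailed interval, which is why the two kinds of interval are compared in the text.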

Bayesian denominators.
We also obtained posterior estimates for the number of IDUs with each combination of covariate values (see table 6). Younger male injectors (aged 15–34 years) predominated to a greater extent outside of Greater Glasgow, whereas among females there was lesser age disparity by region. In Greater Glasgow, females accounted for 30 percent (95 percent HPDI: 27.5, 33.3) of IDUs aged 15–34 years but for only 27 percent (95 percent HPDI: 24.7, 29.4) of young IDUs elsewhere. Among older IDUs, only 20 percent (95 percent HPDI: 17.5, 23.4) were female.

Secondary analysis: covariate influences on Scotland's annual drug-related death rate in 2000–2002 per 100 current IDUs
By sampling from the posterior distribution averaged over all models, which was obtained via the Markov chain Monte Carlo algorithm and the use of informative priors, we derived the posterior means and 95 percent HPDIs for the annual drug-related death rate per 100 IDUs shown in table 7 for eight major cross-classifications of injectors by sex, age group, and region.
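As a minimal sketch of this secondary analysis, each posterior draw of the number of IDUs can be converted into an annual death rate per 100 IDUs. The draws below are hypothetical stand-ins for Markov chain Monte Carlo output, and the national total of 1,006 deaths in 2000–2002 is used only for illustration; stratum-specific rates would use stratum-specific deaths and denominators.

```python
def annual_death_rate_per_100(total_deaths, n_years, n_idus):
    """Average annual deaths per 100 current injectors."""
    return 100.0 * (total_deaths / n_years) / n_idus


# Hypothetical posterior draws of Scotland's IDU total
posterior_draws = [20000, 25000, 30000]
rates = [annual_death_rate_per_100(1006, 3, n) for n in posterior_draws]
print([round(r, 2) for r in rates])  # [1.68, 1.34, 1.12]
```

Because not every drug-related death occurs among IDUs, rates computed this way may overstate the rate among current injectors.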


TABLE 7. Estimated numbers of current injecting drug users and drug-related deaths in Scotland in 2000–2002 and model-averaged posterior mean values for the annual drug-related death rate per 100 current injecting drug users, by region, sex, and age group

 
Table 7 suggests that drug-related death rates differ by region as well as by age group. For all four combinations of sex and region, the lower limit of the HPDI for the drug-related death rate in the age group ≥35 years exceeds or nearly exceeds the upper limit of the HPDI of the corresponding rate for injectors aged 15–34 years. Systematically, and significantly when pooled across the four strata, drug-related death rates are higher elsewhere in Scotland than in Greater Glasgow. Regionally, older injectors appear to have a common drug-related death rate irrespective of sex. Overall, the male:female drug-related death rate ratio was 1.7 (95 percent HPDI: 1.56, 1.87). However, the male:female drug-related death rate ratio was estimated at 1.0 (95 percent HPDI: 0.86, 1.24) for the older age group but was 1.9 (95 percent HPDI: 1.75, 2.09) for 15- to 34-year-olds. Thus, posterior probability is 1 (i.e., certainty) that the male:female drug-related death rate ratio is lower for IDUs in the older age group.


    DISCUSSION
In this paper, we have illustrated strengths (such as insights for capturing propensities and incorporating prior information) as well as limitations of classical and Bayesian capture-recapture methods in application to estimating numbers of current IDUs. Injectors' high drug-related death rate is a major public health concern and was therefore a goal of secondary analysis. Limitations included model uncertainty, which can lead to such wide confidence or posterior intervals that implications for public health action are hopelessly diverse. Epidemiologists are frequently concerned with rates or ratios, for which numerators and denominators have been estimated in separate analyses. Thus, the methodological issues raised here, including incorporation of prior information on a rate when estimating denominators, generalize beyond our application to IDUs.

Interactions within and between covariates and data sources, whether incorporated in a classical framework or a Bayesian capture-recapture framework with vague priors, can still give rise to preferred models which, because of model uncertainty, give markedly different answers for total numbers of IDUs. Yet, they all agree tolerably well in terms of how those IDUs are apportioned relatively across major cross-classifications—for example, as defined by sex, age group, and region.

Incorporation of informative priors displaced from among the models with the highest posterior probability any model which suggested that Scotland could have significantly fewer than 17,000 drug injectors. This contingency was also ruled out by a symmetric 90 percent (but not by 95 percent) credible interval around the model-averaged central estimate. We must enter a reservation here: No formal prior account was taken that some drug-related deaths occurred among non-IDUs. The more this is so, the more table 7's annual drug-related death rates per 100 current IDUs could be overestimates.

Important insights into data source-by-covariate interactions were gleaned, and in secondary analyses, there was confirmation of a higher drug-related death rate among IDUs aged ≥35 years (9) and (a novel finding) IDUs resident elsewhere in Scotland than Greater Glasgow. The observation that female IDUs' lower drug-related death rate was not sustained into the age group ≥35 years has clear consequences for watchfulness by drug treatment agencies. We also note that females represented only 20 percent of IDUs in this older age group.

Methodologically, this paper breaks new ground in its use of an informative international prior about a rate when deriving the local denominator for that rate, the local numerator for which is known. This problem is not uncommon in epidemiology. Notice, in particular, that our informative prior on the overall drug-related death rate did not constrain drug-related death rates for individual cross-classifications to be within the same range of 0.5–2 percent: See, for example, the higher rates for IDUs aged ≥35 years in table 7.

In public health terms, we have demonstrated demographic influences on injectors' propensity to be listed in data sources, on their number and make-up in Greater Glasgow versus elsewhere in Scotland, and on injectors' drug-related death rate. A lower proportion of females among older IDUs may have concealed the fact that female IDUs' lower vulnerability to drug-related death is not sustained beyond 34 years of age. Health officials need to examine why drug-related death rates among IDUs in Scotland seem to be higher outside of Greater Glasgow.


APPENDIX TABLE 1. Glossary of Bayesian terminology


Bayesian term: Explanation

Bayesian analysis: A statistical philosophy in which model parameters are assumed to have fixed but unknown probability distributions. Contrasts with frequentist philosophy, which assumes that parameters have a fixed but unknown value. Bayesian inference is based upon the posterior distribution, which represents the analyst's beliefs about the model parameters after having observed the data. The posterior therefore represents an updating of the analyst's prior beliefs before having observed the data. Within the Bayesian paradigm, it is necessary to specify a prior distribution, which is combined with the likelihood of the current data, in order to obtain the posterior distribution of the parameters.
Bayes factor: Quantitative comparison of the evidence provided by the data for one model against another. Defined as the ratio of posterior odds to prior odds between the two competing models. Bayes factors in excess of 3 are deemed significant.
Credible interval: Under the Bayesian paradigm, parameters have (posterior) distributions rather than fixed values. A 95% credible interval (lower bound, upper bound) is such that the posterior probability that the parameter of interest lies between the lower and upper bounds is 95%.
Dirichlet prior: A prior distribution on the parameters (x1, ..., xn) which imposes the condition that xi ∈ [0, 1] for i = 1, ..., n and x1 + ... + xn = 1. Thus, the Dirichlet distribution is often used for a set of probabilities. We write (x1, ..., xn) ~ Dir(α1, ..., αn), where the αi's denote the parameters characterizing the properties of the corresponding distribution. If αi = 1 for all i, the Dirichlet distribution reduces to the uniform distribution.
Gamma prior: A prior distribution for a single parameter x, which imposes the condition that x is both real and nonnegative. It is often used for variance and/or precision parameters. We write x ~ Γ(α, β), where α determines the shape and β the scale of the distribution.
Half-normal prior: A prior distribution for parameter x, which imposes the condition that x is both real and either positive (positive half-normal) or negative (negative half-normal). It is constructed by taking a normal distribution with mean (and also mode) zero and ignoring values that fall on the wrong side of the mode.
Highest probability density interval (HPDI): The shortest possible credible interval for a given probability level. Unlike the credible interval, the HPDI is unique.
Informative prior: A prior distribution that represents strong opinion concerning the corresponding model parameter(s).
Jeffreys' prior: A vague prior that is invariant to one-to-one transformations of the associated parameter. It is defined to be proportional to the square root of the Fisher information and is used to represent as vague a priori knowledge as possible.
Lognormal prior: A prior distribution for parameter x imposing the condition that x is both real and positive. If x has a lognormal distribution, then log(x) has a normal distribution.
Markov chain Monte Carlo (MCMC) method: A computationally intensive method for sampling from complex distributions, such as a Bayesian posterior. These samples can then be used to obtain empirical estimates of statistics of interest, such as the posterior mean or variance.
Mixture distribution: A distribution that is a linear combination of a finite number of distinct probability distributions.
Model-averaging: An inferential process that accounts for both model and parameter uncertainty. A model-averaged estimate of a parameter is a weighted average of the corresponding estimates from each model under consideration, in which the weights are taken to be the corresponding posterior model probabilities.
Model probability: The (prior/posterior) probability that a particular model represents the true underlying mechanism driving an observed stochastic process. Reversible jump MCMC methods are often used to estimate posterior model probabilities.
Prior (or posterior) odds: The ratio of the prior (or posterior) probabilities associated with two different models.
Prior (beliefs/distributions): A prior distribution represents the analyst's beliefs about the values that a particular model parameter might take, before the data are observed.
Reversible jump MCMC: An extension to the MCMC algorithm that facilitates estimation of posterior model probabilities.
Symmetric credible interval: A credible interval (lower bound, upper bound) such that the probability that the parameter lies above the upper limit is equal to the probability that the parameter lies below the lower limit. Like the HPDI, the symmetric credible interval is unique.
Vague prior: A prior distribution that represents a very weak opinion concerning the corresponding model parameter(s), thus allowing the information from the data to dominate the posterior.
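Several glossary entries describe quantities that are computed directly from MCMC output or from model probabilities. The sketch below shows one way these definitions might be turned into code: a symmetric credible interval, an HPDI, a Bayes factor, and a model-averaged estimate. All function names and all numbers are hypothetical illustrations, not the software or results of this analysis, and the HPDI routine assumes a unimodal posterior.

```python
import random


def symmetric_interval(samples, level=0.95):
    """Equal-tailed interval: the same number of draws is cut from each tail."""
    ordered = sorted(samples)
    n = len(ordered)
    k = round((1.0 - level) / 2.0 * n)  # draws excluded from each tail
    k = min(k, (n - 1) // 2)            # keep lower bound <= upper bound
    return ordered[k], ordered[n - 1 - k]


def hpd_interval(samples, level=0.95):
    """Shortest window holding about `level` of the draws.

    This matches the HPDI only when the posterior is unimodal (an assumption).
    """
    ordered = sorted(samples)
    n = len(ordered)
    m = min(n, max(1, round(level * n)))  # draws kept inside the window
    start = min(range(n - m + 1),
                key=lambda i: ordered[i + m - 1] - ordered[i])
    return ordered[start], ordered[start + m - 1]


def bayes_factor(prior_p1, prior_p2, post_p1, post_p2):
    """Ratio of posterior odds to prior odds for model 1 against model 2."""
    return (post_p1 / post_p2) / (prior_p1 / prior_p2)


def model_average(estimates, posterior_probs):
    """Average of per-model estimates weighted by posterior model probability."""
    total = sum(posterior_probs)
    return sum(e * p for e, p in zip(estimates, posterior_probs)) / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for MCMC draws from a skewed posterior (hypothetical).
    draws = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
    print("symmetric 95%:", symmetric_interval(draws))
    print("HPDI 95%:     ", hpd_interval(draws))
    # Hypothetical prior and posterior model probabilities.
    print("Bayes factor: ", bayes_factor(0.5, 0.5, 0.75, 0.25))
    print("model average:", model_average([16000, 18000], [0.25, 0.75]))
```

For a skewed posterior the two intervals generally differ, with the HPDI shifted toward the mode; for a symmetric unimodal posterior they roughly coincide.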


    ACKNOWLEDGMENTS
 
The referenced work by Hay et al. (7) was commissioned by the Information Services Division of the National Health Service in Scotland on behalf of the Scottish Executive.

Professor Sheila M. Bird holds stock in GlaxoSmithKline but is not currently conducting any research sponsored by the company.


    References
 

  1. Cormack RM. Interval estimation for mark-recapture studies of closed populations. Biometrics 1992;48:567–76.
  2. Hook EB, Regal RR. Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995;17:243–64.
  3. Nanan DJ, White F. Capture-recapture: reconnaissance of a demographic technique in epidemiology. Chronic Dis Can 1997;18:1–6.
  4. Mastro TD, Kitayaporn D, Weniger BG, et al. Estimating the number of HIV-infected injection drug users in Bangkok: a capture-recapture method. Am J Public Health 1994;84:1094–9.
  5. Hook EB, Regal RR. Accuracy of alternative approaches to capture-recapture estimates of disease frequency: internal validity analysis of data from five sources. Am J Epidemiol 2000;152:771–9.
  6. Frischer M, Bloor M, Finlay A, et al. A new method for estimating prevalence of injecting drug use in an urban population: results from a Scottish city. Int J Epidemiol 1991;20:997–1000.
  7. Hay G, McKeganey N, Hutchinson S. Estimating the national and local prevalence of problem drug misuse in Scotland: executive report. Glasgow, United Kingdom: University of Glasgow and Scottish Centre for Infection and Environmental Health, NHS Scotland, 2001.
  8. Beynon C, Bellis MA, Millar T, et al. Hidden need for drug treatment services: measuring levels of problematic drug use in the North West of England. J Public Health Med 2001;23:286–91.
  9. Bird SM, Hutchinson SJ, Goldberg DJ. Drug-related deaths by region, sex, and age group per 100 injecting drug users in Scotland, 2000–01. Lancet 2003;362:941–4.
  10. King R, Brooks SP. On the Bayesian analysis of population size. Biometrika 2001;88:317–36.
  11. Fienberg SE. The multiple recapture census for closed populations and incomplete 2^k contingency tables. Biometrika 1972;59:591–603.
  12. King R, Brooks SP. Model selection for integrated recovery/recapture data. Biometrics 2002;58:841–51.
  13. King R, Brooks SP. Bayesian model discrimination for multiple strata capture-recapture data. Biometrika 2002;89:785–806.
  14. Madigan D, York JC. Bayesian methods for estimation of the size of a closed population. Biometrika 1997;84:19–32.
  15. Fienberg SE, Johnson MS, Junker BW. Classical multilevel and Bayesian approaches to population size estimation using multiple lists. J R Stat Soc Ser A 1999;162:383–405.
  16. Hoeting JA, Madigan D, Raftery AE. Bayesian model averaging: a tutorial. Stat Sci 1999;14:382–401.
  17. Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc 1995;90:773–95.
  18. Brooks SP. Markov chain Monte Carlo method and its application. Statistician 1998;47:69–100.
  19. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 1995;82:711–32.
  20. Bernardo JM, Smith AF. Bayesian theory. (Wiley series in probability and statistics). New York, NY: John Wiley and Sons, Inc, 1994:357–62.
  21. European Monitoring Centre for Drugs and Drug Addiction. Annual report 2003: the state of the drugs problem in the European Union and Norway. Lisbon, Portugal: European Monitoring Centre for Drugs and Drug Addiction, 2003. (http://ar2003.emcdda.eu.int/en/home-en.html).
  22. Frischer M, Goldberg D, Rahman M, et al. Mortality and survival among a cohort of drug injectors in Glasgow, 1982–1994. Addiction 1997;92:419–27.
  23. Hutchinson SJ, Taylor A, Gruer L, et al. One-year follow-up of opiate injectors treated with oral methadone in a GP-centred programme. Addiction 2000;95:1055–68.
  24. Copeland L, Budd J, Robertson JR, et al. Changing patterns in causes of death in a cohort of injecting drug users, 1980–2001. Arch Intern Med 2004;164:1214–20.
  25. Ditton J, Frischer M. Computerized projection of future heroin epidemics: a necessity for the 21st century? Subst Use Misuse 2001;36:151–66.
  26. Hutchinson SJ, Goldberg DJ, King M, et al. Hepatitis C virus among childbearing women in Scotland: prevalence, deprivation, and diagnosis. Gut 2004;53:593–8.



