1 School of Public Health, University of Sydney, 2 Drug Health Services, Royal Prince Alfred Hospital, Departments of Psychological Medicine and Medicine and School of Public Health, University of Sydney, 3 Screening and Test Evaluation Program, School of Public Health, University of Sydney, and 4 Department of Biochemistry, Royal Prince Alfred Hospital and Department of Medicine, University of Sydney, Sydney, New South Wales, Australia
(Received 2 August 2002; first review notified 7 October 2002; in revised form 6 June 2003; accepted 30 June 2003)
* Author to whom correspondence should be addressed at: Drug Health Services, RPAH, Missenden Road, Camperdown, NSW 2050, Australia. Tel.: +61 2 9515 8650; Fax: +61 2 9515 8970; E-mail: katec{at}med.usyd.edu.au
INTRODUCTION
Several studies have shown that use of a combination of tests such as gamma-glutamyltransferase (GGT) and carbohydrate-deficient transferrin (CDT) may increase diagnostic accuracy. Often the tests have been combined in a qualitative manner (i.e. whether one or more results are above the reference range). In addition, several researchers have developed rules or algorithms for combining the quantitative results of multiple biochemical tests to increase diagnostic accuracy for chronic excessive alcohol consumption (Shaper et al., 1985; Sillanaukee, 1992; Hartz et al., 1997; Huseby et al., 1997). Recently, Sillanaukee and Olsson (2001) provided an algorithm for combining CDT and GGT, and the resulting combination showed higher accuracy than either test alone. Accuracy was evaluated as the area under the receiver operating characteristic (ROC) curve. Based on discriminant analysis, the authors concluded that the combination of CDT and GGT provided a powerful tool in discriminating between problem drinkers and social drinkers.
That study raises several practical and methodological questions regarding the use of a combination of biochemical markers in the diagnosis of problem drinking:
Can the predictive function derived from the study (i.e. 0.8*lnGGT + 1.3*lnCDT) be usefully applied to other settings?
How consistent are the results obtained if logistic regression is used to derive the predictive function instead of discriminant analysis (Su and Liu, 1993)?
Does the addition of information on non-alcohol related clinical factors such as obesity and smoking improve the accuracy of the predictive function?
To what extent does the addition of information on past history of alcohol dependence improve the accuracy of the predictive function?
Should different combination rules be used for population subgroups such as men and women?
Does a nonlinear combination of markers provide greater predictive accuracy than a linear combination?
These issues can best be addressed by a large-scale study with sufficient sample size and population heterogeneity. The multicentre study of biological markers of alcohol use initiated by the World Health Organization (WHO) and the International Society for Biomedical Research on Alcoholism (ISBRA) provides such an opportunity (Helander and Tabakoff, 1997; World Health Organization, 1999; Conigrave et al., 2002). This report uses the data from this study to address the above questions.
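The predictive function quoted in the first question above can be written as a small function. This is only a minimal sketch: the coefficients 0.8 and 1.3 are those quoted in the text, while the function name and the example marker values are hypothetical and chosen for illustration.

```python
import math

def combined_score(ggt, cdt):
    """Linear combination of log-transformed markers quoted in the text:
    0.8*ln(GGT) + 1.3*ln(CDT). Both inputs must be positive."""
    if ggt <= 0 or cdt <= 0:
        raise ValueError("marker values must be positive")
    return 0.8 * math.log(ggt) + 1.3 * math.log(cdt)

# Hypothetical marker values, for illustration only.
low = combined_score(ggt=20, cdt=12)
high = combined_score(ggt=67, cdt=30)
print(round(low, 2), round(high, 2))
```

A higher score indicates a result more suggestive of problem drinking; the score has no meaning on its own until it is compared with a cut-off or entered into a fitted model.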
MATERIAL AND METHODS
Interview and blood sample
All patients were interviewed using the WHO/ISBRA schedule. This is closely based on the AUDADIS interview (Grant et al., 1995) and takes 70–90 min to complete. The questionnaire covers demographic characteristics, amount of alcohol consumption in the past 30 days, symptoms of harmful alcohol use, abuse or dependence (ICD-10 and DSM criteria), affective changes associated with drinking, illicit drug use, and medical and psychiatric history. Alcohol-dependent inpatients were interviewed within 72 h of entering the treatment facilities and, where possible, were reinterviewed approximately 2 weeks later. Following the Universal Precautions Guidelines for biological samples, two 20-ml blood samples were obtained from each individual at the completion of the interview. Serum samples were then sent packed in dry ice to the Helsinki sample processing centre, from which aliquots were distributed to the relevant assay centre. A detailed description of the sampling and testing procedure has been presented elsewhere (World Health Organization, 1999). Questionnaires were dispatched to the data coordinating centre in Denver.
GGT and aspartate aminotransferase (AST) were assayed by reflectance spectrophotometry using a Vitros 250 Analyser (Ortho Clinical Diagnostics). Serum CDT determinations were carried out using the Pharmacia CDTect™ test. This test employs separation of transferrin isoforms on an anion exchange chromatography microcolumn followed by quantification by a double antibody radioimmunoassay. It provides test performance that correlates strongly with the Bio-Rad %CDT method (r = 0.84) (Anton et al., 2001).
Statistical methods
The distributions of the markers (CDT, GGT and AST) were examined and summary statistics computed on both the original and natural logarithmic (lnCDT, lnGGT and lnAST) scales. The logarithmic transformation was used because the marker distributions were positively skewed.
To allow direct comparison with the results of Sillanaukee and Olsson (2001), problem drinking was defined as an average daily alcohol consumption of 60 g/day or more for both men and women. Lower levels of consumption were classified as social drinking. Discriminant analysis was used to carry out stepwise selection of the three biochemical markers (lnCDT, lnGGT and lnAST) to determine the best algorithm for differentiating between problem drinkers and social drinkers/abstainers. A split-sample strategy was adopted to examine the stability of the results. Half of the total sample for each recruitment site was chosen at random to form a calibration sub-sample for estimating the model parameters. This differed from the approach of Sillanaukee and Olsson (2001), who used two of their six study sites as the calibration sample, then validated the findings using data from the other four sites. The models were then applied to the whole sample. The analysis was repeated using logistic regression to determine if the results were comparable with those obtained by discriminant analysis.
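The split-sample step can be sketched as follows. This is an illustrative outline only, not the procedure actually run for the study: the function name, the roster and the seed are hypothetical, and the rule of rounding down at sites with an odd number of subjects is an assumption.

```python
import random
from collections import defaultdict

def split_by_site(subjects, seed=0):
    """Randomly assign half of the subjects at each site to calibration.

    `subjects` is a list of (subject_id, site) pairs. Returns the set of
    subject ids chosen for the calibration sub-sample.
    """
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for subject_id, site in subjects:
        by_site[site].append(subject_id)
    calibration = set()
    for site in sorted(by_site):
        ids = by_site[site]
        # Half of each site, rounded down for odd site sizes (assumption).
        calibration.update(rng.sample(ids, len(ids) // 2))
    return calibration

# Hypothetical roster: three sites with 10, 8 and 6 subjects.
roster = [(f"{site}-{i}", site)
          for site, n in [("A", 10), ("B", 8), ("C", 6)]
          for i in range(n)]
calib = split_by_site(roster, seed=42)
print(len(calib))  # 12
```

Sampling within each site keeps every site represented in the calibration sub-sample, in contrast to holding out whole sites.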
Further analyses explored the contribution of clinical variables to the predictive function. For these we used logistic regression in preference to discriminant analysis. Logistic regression may be more appropriate for deriving predictive functions when the clinical variables are categorical (e.g. sex and study site), because the multivariate normality assumptions of discriminant analysis are then less likely to be met. Also, the logistic regression procedures available in many standard software packages can be used to assess the diagnostic performance of a prediction rule by estimating the ROC curve and the area under the curve (AUC). For simple models with a single marker only, the ROC curve and AUC are the same whether untransformed data (CDT, GGT or AST) or log-transformed data are used, because the logarithm is a monotone transformation and leaves the ranking of subjects unchanged. However, for complex models, where markers were combined with clinical variables, logarithmic transformation produced statistically significantly better discriminating power (i.e. significantly higher AUC) than algorithms using untransformed data. For the purposes of consistency, and to allow assessment of the incremental benefit of adding variables, log transformation is used for the markers throughout.
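The statement that a single marker gives the same AUC on the original and logarithmic scales can be checked with a short sketch. The AUC here is the nonparametric (Mann-Whitney) estimate, and the GGT values are hypothetical.

```python
import math

def auc(cases, controls):
    """Nonparametric AUC: probability that a randomly chosen case has a
    higher value than a randomly chosen control (ties count one half)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical GGT values (U/l), for illustration only.
ggt_cases = [35, 60, 120, 48, 300]
ggt_controls = [18, 25, 40, 22, 55]

raw = auc(ggt_cases, ggt_controls)
logged = auc([math.log(v) for v in ggt_cases],
             [math.log(v) for v in ggt_controls])
print(raw, logged)  # 0.88 0.88
```

The two values agree because the AUC depends only on the ordering of subjects. Once a marker enters a linear combination with other variables, the transformation changes the ordering of the combined score, so the AUC can differ.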
The relative performance of alternative prediction rules was assessed by comparing their AUC using roccomp and rocgold in Stata (Statacorp, 2001). These procedures implement a nonparametric test developed by DeLong et al. (1988), which takes into account the paired nature of the analysis (i.e. alternative prediction rules are compared within the same patient). Listwise deletion of missing values was used to ensure that model comparisons were based on the same set of subjects. After deletion of cases with missing values, models that included covariate information were based on 1552 subjects.
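For readers unfamiliar with the paired comparison of AUCs, the idea behind the DeLong et al. (1988) test can be sketched in plain Python. This is a simplified illustration for two markers measured on the same subjects, not a reproduction of the Stata routines; the function names and the data are hypothetical.

```python
import math

def _psi(x, y):
    """Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 if tied."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0

def _components(cases, controls):
    """Structural components: V10 (one per case), V01 (one per control)."""
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return v10, v01

def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def delong_test(cases_a, controls_a, cases_b, controls_b):
    """Compare two correlated AUCs measured on the same subjects.
    Returns (auc_a, auc_b, z, two_sided_p)."""
    m, n = len(cases_a), len(controls_a)
    v10a, v01a = _components(cases_a, controls_a)
    v10b, v01b = _components(cases_b, controls_b)
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Variance of the AUC difference, including the covariance term that
    # arises because both markers come from the same subjects.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p

# Hypothetical paired data: position i is the same subject for both markers.
ggt_cases, ggt_controls = [35, 60, 120, 48, 300], [18, 25, 40, 22, 55]
cdt_cases, cdt_controls = [22, 13, 30, 18, 40], [12, 16, 14, 20, 10]
auc_ggt, auc_cdt, z, p = delong_test(ggt_cases, ggt_controls,
                                     cdt_cases, cdt_controls)
print(round(auc_ggt, 2), round(auc_cdt, 2))  # 0.88 0.84
```

With so few subjects the p-value is not meaningful; the sketch is intended only to show where the pairing enters the variance.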
Interaction effects between gender, body mass index (BMI) and markers were examined. Because a significant interaction between gender and markers was found, separate analyses were then undertaken for men and women to derive predictive functions based on both clinical and laboratory data. Backward elimination of nonsignificant variables was used to select the best predictive model. Nonlinear relationships of the markers to the diagnosis of excessive consumption were also considered using cubic polynomial regression. Both statistical significance and the improvement in the AUC were used as criteria for including non-linear terms.
To assist clinical application of the results, a plot of the predicted post-test probability of problem drinking was created for the best marker or combination of markers separately for females and males. For both women and men, three post-test probability curves were produced: for the whole sample with only marker(s) included, and for higher risk and lower risk groups. These groups were derived from the complex models with clinical information included. As results indicated (see below) that both CDT and GGT contribute significantly to diagnostic accuracy in men, we re-estimated the complex model in men using ln(CDT² × GGT) as a single independent variable. This was because the ratio of the regression coefficient of lnCDT to that of lnGGT was close to 2 in both Sillanaukee's and our analyses, and 2 lnCDT + lnGGT = ln(CDT² × GGT). The post-test probabilities were estimated using the formula:
[Formula image not reproduced.]
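The formula itself is not available in this copy. As a hedged sketch, the standard way to obtain a predicted probability from a fitted logistic regression model is p = 1 / (1 + exp(-(b0 + b1·x))), and the code below assumes that form with x = ln(CDT² × GGT). The intercept and slope are hypothetical placeholders, not the fitted values from this study.

```python
import math

def composite_marker(cdt, ggt):
    """ln(CDT^2 x GGT), which equals 2*ln(CDT) + ln(GGT)."""
    return 2 * math.log(cdt) + math.log(ggt)

def post_test_probability(intercept, slope, cdt, ggt):
    """Predicted probability from a one-predictor logistic model,
    p = 1 / (1 + exp(-(intercept + slope * ln(CDT^2 x GGT))))."""
    linear_predictor = intercept + slope * composite_marker(cdt, ggt)
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients, NOT the fitted values from this study.
INTERCEPT, SLOPE = -9.0, 0.8
p = post_test_probability(INTERCEPT, SLOPE, cdt=30, ggt=67)
print(round(p, 2))
```

In a model that also includes clinical variables, each variable would add its own term to the linear predictor, which is how the higher risk and lower risk curves come to differ for the same marker values.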
RESULTS
AUC based on different models for the whole sample
The AUC was estimated for each individual marker, giving 0.74 for CDT, 0.74 for GGT and 0.70 for AST. Table 2 shows the logistic regression results for each possible two-marker combination (lnCDT + lnGGT, lnGGT + lnAST, lnCDT + lnAST) with and without adjustment for gender, age group, body mass index (BMI), ethnicity, smoking status and past alcohol dependence.
Does the diagnostic value of markers vary with sex or body mass index?
There were significant interaction effects between sex and markers (P < 0.001) but no significant interaction effects between BMI and markers. Stratifying by sex, lnCDT + lnGGT resulted in the highest AUC (0.82) for men. The AUC for lnGGT + lnAST and lnCDT + lnAST were 0.73 and 0.80, respectively. For women, the model with lnGGT as the only marker yielded an AUC of 0.78. This did not differ significantly (P = 0.42) from the AUC for lnCDT + lnGGT (0.80). The AUC for the other combinations, lnGGT + lnAST and lnCDT + lnAST, were 0.78 and 0.77, respectively. The above AUC are for biochemical markers alone without clinical information. Based on these results, all subsequent analyses are stratified by sex.
The contribution of clinical information to the AUC
Table 3 presents three different models for men based on lnCDT + lnGGT: (1) markers without any clinical information; (2) markers with non-alcohol clinical variables; and (3) the above plus past alcohol dependence. The site effect was not included in the model as the aim was to examine the contribution of clinical information given the pooled average site effect.
Figure 2a shows that for a female smoker with a history of alcohol dependence, with a pre-test probability of current problem drinking of 53%, a GGT value of 40 U/l corresponds to a 60% chance of being a problem drinker. For an ex-smoker with no history of alcohol dependence, with a pre-test probability of current problem drinking of 2%, the same GGT value corresponds to a 3% chance of being a problem drinker. As expected, more extreme values of GGT led to a greater difference between the pre- and post-test probabilities.
Figure 2b shows that for an 18–29-year-old normal-weight man with a GGT of 67 U/l and a CDT of 30 U/l (GGT × CDT² ≈ 60 000) the chance of being a problem drinker is 90% for a smoker with a history of alcohol dependence and 37% for a non-smoker with no history of alcohol dependence. Figure 2b also illustrates the variation in probability of problem drinking in both the high-risk and low-risk groups as values for the CDT- and GGT-based algorithm vary.
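The arithmetic behind this example can be checked directly. The sketch below also expresses the gap between the two patients on the log-odds scale, under the assumption (ours, for illustration) that both probabilities come from a logistic model with the same marker term, so that the difference reflects the clinical variables alone.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-x))

ggt, cdt = 67, 30
index = ggt * cdt ** 2
print(index)  # 60300

# Difference in log-odds between the two patients quoted in the text.
shift = logit(0.90) - logit(0.37)
print(round(shift, 2))  # 2.73
```

On the odds scale this shift corresponds to a multiplicative factor of about exp(2.73), i.e. roughly 15, between the lower risk and higher risk patient at the same marker values.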
DISCUSSION
It should be noted that even when combining markers to maximize diagnostic accuracy, the blood tests still achieved no higher than 75% sensitivity for specificities of >75%. However, in situations where clinical history is unreliable or unobtainable, laboratory markers provide a valuable aid to diagnosis. The current analysis has shown that whatever clinical information is available (e.g. age, BMI, ethnicity and smoking status in men, and smoking alone in women), when combined with the test results, can significantly improve diagnostic accuracy. In clinical practice, an estimate of the probability of problem drinking will usually precede the laboratory test results, and this generally helps determine the pre-test probability. If tests are used as a screening instrument without assessment by a clinician, then further information about the patient will improve upon the performance of tests alone. In our study, we produced several post-test probability plots which demonstrated how clinical information can contribute to the predicted probabilities given the same marker values, and how marker values can change the clinically based pre-test probability estimates. The method presented here for computing the post-test probability of problem drinking provides a formula that can be adopted by clinicians. Alternatively, our graphs (Figs 2a,b) allow clinicians to estimate post-test probabilities given their own estimate of pre-test probability for a particular patient and the marker results.
Although the regression coefficients estimated by discriminant analysis in this study differ from those given by Sillanaukee and Olsson (2001), and also differ from those estimated using logistic regression, the AUC associated with these three predictive functions using CDT + GGT were almost indistinguishable when applied to the current sample. This suggests stability for the combination rules derived from these two studies and derived using these two methods. The stability is explained by the fact that the ratio of the two regression coefficients for CDT and GGT was similar for the three models, and it is this ratio which determines the AUC of an ROC curve (Pepe and Thompson, 2000). The similarity of the ratio of the coefficients across studies indicates stability in the relative contribution of the two markers to the predictive function. This may, in part, be due to the similarity of the sampling strategy used in the two studies. Both used stratified recruitment targeting certain levels of alcohol consumption, rather than random or consecutive sampling.
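The point that only the ratio of the two coefficients matters for the AUC can be illustrated with a small sketch. Multiplying both coefficients by the same positive constant rescales every subject's score without changing the ordering, so the AUC is unchanged. The coefficients 0.8 and 1.3 are those quoted earlier in the text; the (GGT, CDT) pairs are hypothetical.

```python
import math

def auc(case_scores, control_scores):
    """Nonparametric AUC with ties counted as one half."""
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(case_scores) * len(control_scores))

def score(ggt, cdt, b_ggt, b_cdt):
    """Linear combination of the log-transformed markers."""
    return b_ggt * math.log(ggt) + b_cdt * math.log(cdt)

# Hypothetical (GGT, CDT) pairs, for illustration only.
cases = [(35, 22), (60, 13), (120, 30), (48, 18), (300, 40)]
controls = [(18, 12), (25, 16), (40, 14), (22, 20), (55, 10)]

def auc_for(b_ggt, b_cdt):
    return auc([score(g, c, b_ggt, b_cdt) for g, c in cases],
               [score(g, c, b_ggt, b_cdt) for g, c in controls])

k = 2.0  # any positive constant preserves the ranking
original = auc_for(0.8, 1.3)
rescaled = auc_for(k * 0.8, k * 1.3)
print(original == rescaled)  # True
```

Two models with different absolute coefficients but a similar ratio will therefore rank subjects in nearly the same order and yield nearly the same AUC.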
It is important to note that the reference standard used in this study (self-reported alcohol consumption) is itself subject to error. This is the case for most studies on drinking. If different studies are based on the same reference standard (e.g. a set level of excessive alcohol consumption) but have different degrees of measurement error and differences in underlying population prevalence of excessive consumption, the regression coefficients and AUC from these studies could differ considerably (Irwig et al., 2002). The effect is likely to be an underestimation of the AUC (Walter et al., 1999), although the relative contribution of the markers may be similar. Each regression coefficient will be downward-biased, with the extent of this bias depending on the degree of the misclassification of the reference variable. This may partly explain why the absolute values of regression coefficients and AUC differed between the Sillanaukee study and this one, but the combination rules (understood as the ratio of the two coefficients) were remarkably similar.
Although Sillanaukee and Olsson (2001) considered including gender as a covariate in the discriminant function, the possible modification effect of gender when using markers in combination was not examined. Our results indicate that gender is indeed an effect modifier and the advantage of combining the markers was more evident among men than among women. For women, GGT alone provided the most efficient biochemical method overall for diagnosing problem drinking. This finding is important as it may avoid unnecessary cost and inconvenience in performing multiple tests in women. In contrast to the current results, Anton and Moak (1994) found that a result above the reference range for either CDT or GGT was more effective in detecting alcohol dependence in women than using either marker alone (sensitivity 72%, specificity 90% for the combined test). In that study 18 female alcohol-dependent subjects were compared against 21 controls who were either minimal alcohol users or complete abstainers (Anton and Moak, 1994). In the current study there is a wider range of intermediate drinking patterns. The study of Huseby et al. (1997) on the combination of CDT and GGT results did not have sufficient numbers of female subjects to allow firm conclusions on gender differences. It is important to note that in the current study the area under the whole ROC curve was used as the criterion for selecting the best algorithm. Further research may be needed to ascertain whether the partial AUC (e.g. the area underneath the ROC curve when specificity is larger than a certain value, say, 0.8) would be a better criterion for deriving the algorithm (Pepe and Thompson, 2000).
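The partial AUC mentioned here can be sketched as the area under the empirical ROC curve restricted to false positive rates of at most 0.2 (specificity of at least 0.8). This is one simple way to compute it, using trapezoids on the empirical curve; the function names and the data are hypothetical, and other definitions (for example, standardized partial AUC) exist.

```python
def roc_points(cases, controls):
    """Empirical ROC curve as (fpr, tpr) points from (0, 0) to (1, 1).
    A subject is called positive when the value is >= the threshold."""
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(x >= t for x in cases) / len(cases)
        fpr = sum(y >= t for y in controls) / len(controls)
        points.append((fpr, tpr))
    return points

def partial_auc(cases, controls, max_fpr=0.2):
    """Area under the ROC curve for false positive rates up to max_fpr."""
    area = 0.0
    pts = roc_points(cases, controls)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical GGT values (U/l), for illustration only.
ggt_cases = [35, 60, 120, 48, 300]
ggt_controls = [18, 25, 40, 22, 55]
print(round(partial_auc(ggt_cases, ggt_controls, max_fpr=0.2), 2))  # 0.12
print(round(partial_auc(ggt_cases, ggt_controls, max_fpr=1.0), 2))  # 0.88
```

Setting `max_fpr=1.0` recovers the full AUC, which gives a simple consistency check on the restricted calculation.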
Adding clinical data greatly increased the predictive accuracy of the final diagnostic algorithm. However, it is important to note that these results may have limited generalizability. The sampling frame used in this study targeted community based and alcohol treatment services patients. Subjects were volunteers, in many cases recruited by advertisement or convenience sampling, whereas the ideal would be patients in whom the issue of chronic problem drinking is a clinically relevant question. A stratified sampling frame was employed to ensure adequate numbers of subjects across age, sex and alcohol consumption categories. Hence, the incremental value of the various combinations of clinical information may not be applicable in a population based screening programme or other clinical settings (Irwig et al., 2002). Similarly, one would expect the diagnostic accuracy of individual markers to vary across settings. However, the utility of combining markers is likely to be generalizable from one setting to another under the assumption that the correlation between markers remains constant in both the diseased and non-diseased groups.
The assay used for measurement of CDT (CDTect) has recently become unavailable. However, the commercial kit which is currently widely used, Bio-Rad %CDT, has been reported to have very similar performance (r = 0.85) to that of CDTect (Scouller et al., 2000; Anton et al., 2001). We do not have data on total transferrin and so cannot comment on effects of unusually high or low transferrin levels. As analyses were restricted to those without self-reported liver disease, and all subjects were apparently healthy (apart from alcohol withdrawal in the dependent patients), extreme values of transferrin are unlikely. Furthermore, we note that our findings of lesser benefit of the addition of CDT to GGT in women than in men are consistent with findings of reduced accuracy of the %CDT methods in women (Helander et al., 2001). Accordingly, the findings of the present study on the principle of combining values of GGT and CDT, and considering other patient characteristics, are likely to be transferable to other assay methods.
We have provided results for all possible pairs of markers for the whole sample because the best combination may depend on factors other than statistical significance. For example, cost, ease of implementation, and individual clinical factors may affect the clinical utility of each of these combinations. There were relatively small differences in diagnostic accuracy (as measured by AUC) among three different linear combinations of markers (0.81, 0.77 and 0.75 for lnCDT + lnGGT, lnCDT + lnAST, lnGGT + lnAST, respectively). Although the lnCDT + lnGGT combination was statistically the best combination, the increase in diagnostic accuracy was relatively small, and this choice of tests should be open to reconsideration if other clinical or cost factors outweigh this limited benefit. As an increasing amount of clinical data is entered into the model, the differences between the test combinations become smaller. For example, if past history of alcohol dependence is entered along with sex, age, BMI, ethnicity and smoking, then all three possible combinations of markers have an AUC of 0.89–0.90. It is important to recall that this study excluded persons with self-reported liver disease. The use of biological markers, or their combinations, to detect drinking in patients with known liver disease requires further study.
In summary, we found that the value of combining lnCDT and lnGGT was particularly evident in men. A simple linear combination of the markers (in logarithmic values) worked well and the inclusion of clinical information significantly contributed to an accurate diagnosis. The combination rules were consistent across two studies and the three methods used. We present a simple graphical method which can be used by the clinician to integrate clinical and biochemical data to estimate the probability of drinking 60 g ethanol or more per day. This method warrants evaluation in other populations.
FOOTNOTES
*** Other WHO/ISBRA study group members: B. Tabakoff (Project Coordinator), S. Borg, M. Dongier, H. Edenberg, C. J. P. Eriksson, M. L. O. S. Formigoni, B. F. Grant, P. L. Hoffman, K. Kiianmaa, T. Koyama, L. Legault, T-K. Li, M. G. Monteiro, M. Ogata, T. Saito, M. Salaspuro, S. Tufik.
REFERENCES
Anton, R. F., Dominick, C., Bigelow, M., Westby, C. with the CDTect Research Group (2001) Comparison of Bio-Rad %CDT TIA and CDTect as laboratory markers of heavy alcohol use and their relationships with gamma-glutamyltransferase. Clinical Chemistry 47, 1769–1775.
Conigrave, K. M., Saunders, J. B. and Whitfield, J. B. (1995) Diagnostic tests for alcohol consumption. Alcohol and Alcoholism 30, 13–26.
Conigrave, K., Degenhardt, L., Whitfield, J. B., Saunders, J. B., Helander, A. and Tabakoff, B. et al. (2002) CDT, GGT and AST as markers of alcohol use: The WHO/ISBRA collaborative project. Alcoholism: Clinical and Experimental Research 26, 332–339.
DeLong, E., DeLong, D. and Clarke-Pearson, D. (1988) Comparing the areas under two or more correlated receiver operating characteristics curves: a non-parametric approach. Biometrics 44, 837–845.
Grant, B. F., Harford, T. C., Dawson, D. A., Chou, P. S. and Pickering, R. P. (1995) The Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS): reliability of alcohol and drug modules in a general population sample. Drug and Alcohol Dependence 39, 37–44.
Hartz, A., Guse, C. and Kajdacsy-Balla, A. (1997) Identification of heavy drinkers using a combination of laboratory tests. Journal of Clinical Epidemiology 50, 1357–1368.
Helander, A. and Tabakoff, B. (1997) Biochemical markers of alcohol use and abuse: experiences from the Pilot Study of the WHO/ISBRA Collaborative Project on state and trait markers of alcohol. International Society for Biomedical Research on Alcoholism. Alcohol and Alcoholism 32, 133–144.
Helander, A., Fors, M. and Zakrisson, B. (2001) Study of Axis-Shield new %CDT immunoassay for quantification of carbohydrate-deficient transferrin (CDT) in serum. Alcohol and Alcoholism 36, 406–412.
Huseby, N.-E., Nilssen, O. and Kanitz, R.-D. (1997) Evaluation of two biological markers combined as a parameter of alcohol dependency. Alcohol and Alcoholism 32, 731–737.
Irwig, L., Bossuyt, P., Glasziou, P., Gatsonis, C. and Lijmer, J. (2002) Evidence base of clinical diagnosis: Designing studies to ensure that estimates of test accuracy are transferable. British Medical Journal 324, 669–671.
Pepe, M. S. and Thompson, M. (2000) Combining diagnostic test results to increase accuracy. Biostatistics 1, 123–140.
Scouller, K., Conigrave, K. M., Macaskill, P., Irwig, L. and Whitfield, J. B. (2000) Should we use carbohydrate deficient transferrin instead of gamma-glutamyltransferase for detecting problem drinkers? A systematic review and meta-analysis. Clinical Chemistry 46, 1894–1902.
Shaper, A. G., Pocock, S. J., Ashby, D., Walker, M. and Whitehead, T. P. (1985) Biochemical and haematological response to alcohol intake. Annals of Clinical Biochemistry 22, 50–61.
Sillanaukee, P. (1992) The diagnostic value of a discriminant score in the detection of alcohol abuse. Archives of Pathology and Laboratory Medicine 116, 924–929.
Sillanaukee, P. and Olsson, U. (2001) Improved diagnostic classification of alcohol abusers by combining carbohydrate-deficient transferrin and gamma-glutamyltransferase. Clinical Chemistry 47, 681–685.
Statacorp (2001) Stata Statistical Software. College Station, Stata Corporation, Texas.
Su, J. Q. and Liu, J. S. (1993) Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association 88, 1350–1355.
Walter, S. D., Irwig, L. and Glasziou, P. (1999) Meta-analysis of diagnostic tests with imperfect reference standards. Journal of Clinical Epidemiology 52, 943–951.
World Health Organization (1999) WHO/ISBRA Study on biological state and trait markers of alcohol use and dependence: progress report. Social Change and Mental Health. Substance Abuse Department, World Health Organization, Geneva.