1 School of Finance and Applied Statistics, Australian National University, Canberra, Australia
2 Division of Clinical Epidemiology, Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas
3 CNR Institute of Clinical Physiology, Pisa, Italy
4 Division of Diabetes, Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, Texas
5 Clinical Diabetes and Nutrition Section, National Institute of Diabetes and Digestive and Kidney Diseases/National Institutes of Health, Phoenix, Arizona
ABSTRACT
There is abundant evidence that insulin resistance is a precursor of type 2 diabetes (1,2) and perhaps of cardiovascular disease as well (3–5). The latter association, which is independent of diabetes, may be partially a consequence of the relationship between insulin resistance and the "metabolic syndrome," which consists of obesity, particularly abdominal obesity; impaired glucose regulation; dyslipidemia of the high-triglyceride/low-HDL cholesterol type; and hypertension (4,6).
A number of techniques are available for making definitive measurements of insulin resistance, including the hyperinsulinemic-euglycemic clamp technique (7), the frequently sampled intravenous glucose tolerance test (8), and the insulin suppression test (9,10). These techniques, however, are complicated, cumbersome, and, in general, not suitable for large-scale population studies or routine clinical work. For that reason a wide variety of indexes based on simpler, clinical measurements have been proposed for assessing insulin resistance. We recently reviewed a number of these indexes (11). Most have been validated with either the euglycemic clamp or the frequently sampled intravenous glucose tolerance test, but the populations in which these validations have been carried out typically have been small to moderate, ranging from <50 to 650. An exception is IRAS (the Insulin Resistance Atherosclerosis Study), which validated several indexes, including the homeostasis model assessment of insulin resistance (HOMA-IR), in 1,460 individuals (12). Although the HOMA-IR (13) has been the most widely used of these indexes, neither it nor any of the others has become the standard for diagnosing insulin resistance.
Indexes of insulin resistance have acquired new salience with the development of various pharmaceutical agents, specifically metformin and the thiazolidinediones, that sensitize the body to the action of endogenous insulin. Although initially developed for the treatment of diabetes, these agents also have a potential role in reducing the risk of diabetes and perhaps also of cardiovascular disease in insulin-resistant nondiabetic individuals. Moreover, the potential public health impact of such treatment could be large because it has been estimated that in developed countries as many as 25% of the nondiabetic population are as insulin resistant as patients with type 2 diabetes (3). Clinical trials would of course be needed to document the benefits of treating insulin-resistant nondiabetic individuals with insulin-sensitizing agents. Efforts to document the benefits of such treatment, however, have been hampered by the lack of an accepted method for assessing insulin resistance based on routine clinical measurements. Although a clinical trial could conceivably be performed based on enrolling insulin-resistant patients as defined by one of the definitive tests, translation of the results of such a trial into ordinary clinical practice would be problematic, given the lack of a clinical test for identifying the target population for treatment.
In the current study we have assembled what we believe to be the largest collection of euglycemic clamp data in the world from numerous research centers, and we have used recursively partitioned classification trees to develop decision rules for identifying insulin-resistant individuals based on routinely available clinical measurements.
RESEARCH DESIGN AND METHODS
The response variable was insulin-stimulated whole-body glucose disposal (µmol/min · kg lean body mass). Predictor variables included sex, weight (kg), BMI (kg/m²), lean body mass (kg), waist and hip circumferences (cm), fasting glucose (mmol/l), fasting insulin (pmol/l), total cholesterol (mmol/l), LDL and HDL cholesterol (mmol/l), free fatty acids (µmol/l), triglycerides (mmol/l), systolic and diastolic blood pressure (mmHg), and family history of a first-degree relative with diabetes. We also evaluated certain combined variables, namely, the triglyceride-to-HDL ratio and the HOMA-IR (fasting insulin × fasting glucose/22.5, with fasting insulin expressed in µU/ml and fasting glucose expressed in mmol/l) (13).
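As a minimal sketch of the two combined variables just defined, the following Python functions compute HOMA-IR and the triglyceride-to-HDL ratio. The function and parameter names are illustrative and do not come from the study. Inputs are assumed to be expressed already in the units stated in the text, so a fasting insulin value recorded in pmol/l would first have to be converted to µU/ml with whatever factor the assay in use specifies.

```python
def homa_ir(fasting_insulin_uU_ml, fasting_glucose_mmol_l):
    """HOMA-IR = fasting insulin (uU/ml) x fasting glucose (mmol/l) / 22.5."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5


def tg_hdl_ratio(triglycerides_mmol_l, hdl_mmol_l):
    """Triglyceride-to-HDL ratio, with both lipids in mmol/l."""
    return triglycerides_mmol_l / hdl_mmol_l


# Hypothetical subject: insulin 10 uU/ml, glucose 4.5 mmol/l.
example_homa = homa_ir(10.0, 4.5)
```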
Statistical methods
Bimodal normal mixture models.
Based on the appearance of the histogram of the euglycemic clamp values, we elected to fit a bimodal normal mixture model to the distribution. Maximum likelihood methodology was used to estimate the means and standard deviations of the two hypothesized modes (17,18). In addition, the suitability of the bimodal normal mixture model was assessed using likelihood ratio testing and graphic diagnostics. The optimal cut point for distinguishing insulin-resistant from insulin-sensitive individuals based on clamp measurements was then set so as to maximize the sum of theoretical sensitivity and specificity, as determined from the fitted bimodal normal mixture distribution. The distribution of clamp values among diabetic subjects was used to further assess the suitability of the cut point suggested by the bimodality analysis.
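The cut-point step can be illustrated with a short sketch. It assumes the two component means and standard deviations have already been estimated, and it uses a simple grid search between the two means in place of whatever optimizer was actually used. All names and the numeric inputs in the usage line are hypothetical. Because lower glucose disposal indicates insulin resistance, values at or below the cut are called insulin resistant.

```python
from statistics import NormalDist


def optimal_cut_point(mean_ir, sd_ir, mean_is, sd_is, step=0.01):
    """Grid-search the cut point that maximizes sensitivity + specificity.

    mean_ir, sd_ir: fitted component for insulin-resistant subjects (lower mode).
    mean_is, sd_is: fitted component for insulin-sensitive subjects (upper mode).
    Assumes mean_ir < mean_is and searches only between the two means.
    """
    resistant = NormalDist(mean_ir, sd_ir)
    sensitive = NormalDist(mean_is, sd_is)

    def sens_plus_spec(cut):
        sensitivity = resistant.cdf(cut)        # P(X <= cut | resistant)
        specificity = 1.0 - sensitive.cdf(cut)  # P(X > cut | sensitive)
        return sensitivity + specificity

    n_steps = int(round((mean_is - mean_ir) / step))
    candidates = [mean_ir + i * step for i in range(n_steps + 1)]
    return max(candidates, key=sens_plus_spec)


# Hypothetical fitted parameters, not values from the study.
cut = optimal_cut_point(20.0, 5.0, 40.0, 5.0)
```

With equal standard deviations, as in the hypothetical call, the cut point falls at the midpoint of the two means. Note that the sum of theoretical sensitivity and specificity does not depend on the mixing proportion of the two components.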
Classification trees.
Recursively partitioned classification trees were used to model the relationship between insulin resistance and various combinations of clinical covariates. Sequential partitioning or "splitting" of the covariate space produces an ever-expanding number of "compartments" or "nodes." Each successive split is chosen to minimize the associated binomial deviance, so that nodes become increasingly homogeneous with respect to the proportion of individuals within them who are either insulin resistant or insulin sensitive (19,20). In theory this process can continue until all nodes are completely homogeneous, although this would result in an inordinately large number of nodes. Moreover, such an approach would inevitably lead to overfitting the data at the expense of generalizability to other datasets. Using cross-validation, however, it is possible to appropriately assess the balance between the discrimination of the two classes of individuals and the risk of overfitting the trees (20). A random 10-fold cross-validation was performed by dividing the dataset into 10 random subsets and then using all possible combinations of 9 subsets (development subsets) to develop sequences of trees of different sizes that were then tested on the remaining 10th subset (validation subset). The results of the cross-validation provide an unbiased assessment of the predictive accuracy of tree models of different sizes. We used "cost-complexity pruning" (19) to identify the tree size that optimized the trade-off between two competing aims: internal discrimination and external validity. Cost-complexity pruning measures the adequacy of any given tree model using a penalized version of its degree of node homogeneity (as measured by the binomial deviance), where the penalty is proportional to the size of the tree, i.e., its number of nodes. 
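To make the splitting criterion concrete, the sketch below evaluates the binomial deviance of a node, picks the single cut point on one covariate that minimizes the summed deviance of the two child nodes, and computes a size-penalized deviance of the kind used in cost-complexity pruning. It is an illustration under simplifying assumptions (one covariate, one split, no missing values), not the software used in the study, and the names `node_deviance`, `best_split`, `penalized_deviance`, and `alpha` are hypothetical.

```python
import math


def node_deviance(labels):
    """Binomial deviance of a node; labels are 1 (resistant) or 0 (sensitive)."""
    n = len(labels)
    k = sum(labels)
    if n == 0 or k == 0 or k == n:
        return 0.0  # a pure (or empty) node is perfectly homogeneous
    p = k / n
    return -2.0 * (k * math.log(p) + (n - k) * math.log(1.0 - p))


def best_split(values, labels):
    """Return (cut, deviance) for the cut minimizing the children's deviance."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between tied values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        deviance = node_deviance(left) + node_deviance(right)
        if best is None or deviance < best[1]:
            best = (cut, deviance)
    return best


def penalized_deviance(total_deviance, n_nodes, alpha):
    """Deviance plus a penalty proportional to tree size."""
    return total_deviance + alpha * n_nodes
```

Applying `best_split` recursively within each child node yields the tree; larger values of `alpha` favor smaller trees.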
In addition, the random 10-fold cross-validation results allow for investigation of the consistency of split choices in the 10 development subsets, as well as the sensitivities, specificities, and areas under the receiver operating characteristic curves (aROCs) on each of the validation subsets.
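The fold construction described above can be sketched as follows. This shows only how the development and validation subsets might be formed; the tree fitting itself is omitted. The function name and the fixed random seed are assumptions made for the illustration.

```python
import random


def cross_validation_splits(n_subjects, k=10, seed=0):
    """Return k (development, validation) index pairs from a random partition."""
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k subsets of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        # The development subset is the union of the other k - 1 folds.
        development = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((development, validation))
    return splits
```

Each subject appears in exactly one validation subset, so every prediction used to assess accuracy comes from a tree that was developed without that subject.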
Subjects for whom certain covariate data were missing were retained in the analyses using the method of "surrogate splits" (21,22). This method assigns a secondary, "surrogate" covariate to each split in the tree model, allowing classification of individuals with missing values for the primary covariate to be made on the basis of the associated surrogate covariate. The choice of surrogate covariates is made by identifying the covariate split that most closely matches the actual split among those individuals for whom both the actual and the surrogate covariates are available. There were no missing values for BMI. For the other covariates that figured in the ultimate decision rules, namely HOMA-IR, family history, and triglycerides, the percentages of missing values were 6.1, 28.1, and 34.8%, respectively.
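A simplified sketch of surrogate selection is given below. For one primary split it searches candidate covariates for the cut point whose left/right assignment agrees most often with the primary split, using only subjects for whom both values are available. The published implementations cited above may rank and apply surrogates differently; the function name, the covariate names, and the data in the usage note are hypothetical.

```python
def best_surrogate(primary_goes_right, candidates):
    """Pick the candidate split that best mimics the primary split.

    primary_goes_right: list of True/False/None (None = primary value missing).
    candidates: dict mapping covariate name -> list of values (None = missing).
    Returns (agreement, name, cut, reverse), or None if nothing can be compared.
    `reverse` is True when values at or below the cut should go right.
    """
    best = None
    for name, values in candidates.items():
        both = [(v, d) for v, d in zip(values, primary_goes_right)
                if v is not None and d is not None]
        for cut in sorted({v for v, _ in both}):
            agree = sum((v > cut) == d for v, d in both)
            for reverse, score in ((False, agree), (True, len(both) - agree)):
                fraction = score / len(both)
                if best is None or fraction > best[0]:
                    best = (fraction, name, cut, reverse)
    return best
```

For example, if the primary split is on BMI and a subject has no BMI recorded, that subject would be sent left or right according to the returned covariate and cut point instead.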
Covariate selection.
Initially, a screening tree was fit using all available predictors except ethnicity and age. Exclusion of ethnicity allowed construction of models for insulin resistance based on physiological and biochemical covariates with which ethnicity may have been associated. Such models may be more generalizable, particularly to ethnic groups not represented in the dataset. We also did not use age in the modeling process because age is minimally, if at all, related to insulin resistance (14). Moreover, in the current dataset, Pima Indians tended to be younger than the other ethnic groups; none of the Pima Indians, for example, was aged >50 years, whereas 21.8% of San Antonio subjects and 31.8% of EGIR subjects were aged >50 years. Thus, age tended to act as a surrogate marker for Pima Indians. The covariates that significantly contributed to the screening tree were used as a covariate subset for further, more focused model fitting. Because of the recursive, binary splitting structure of tree model construction, there is no need to explicitly include covariate interactions or transformations.
In addition to the tree model incorporating all available covariates, two additional tree models were fit based on predictor subsets chosen to reflect various practical considerations, such as the ease of obtaining and the degree of standardization of the covariate measurements. The first of these two additional models was based on routine clinical measurements, excluding any that required obtaining a blood specimen. The second model was fit using these same clinical measurements, but it also incorporated the lipid measurements, though not the insulin measurement.
Development of decision rules.
Once subjects have been assigned to nodes, a decision rule for classifying them can be developed by labeling certain of the nodes as "test positive" (i.e., insulin resistant, in the current instance) and the remaining nodes as "test negative." The choice of which nodes to label as "test positive" is based on the proportion of "true positive" individuals (i.e., insulin resistant by clamp) in that node. The proportions of true positives in each terminal node can be thought of as an insulin resistance "score" for that node, and the decision as to which scores correspond to "test positive" or "test negative" can be chosen with a view to balancing sensitivity and specificity considerations, depending on program needs.
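The labeling step can be sketched in a few lines. The node counts in the example are invented purely for illustration and are not taken from the study's trees; `label_nodes` and `threshold` are hypothetical names.

```python
def label_nodes(node_counts, threshold):
    """Label each terminal node test positive (True) or test negative (False).

    node_counts: dict mapping node id -> (n insulin resistant by clamp, n total).
    A node is test positive when its score, the proportion of true positives,
    is at least `threshold`.
    """
    labels = {}
    for node, (n_resistant, n_total) in node_counts.items():
        score = n_resistant / n_total
        labels[node] = score >= threshold
    return labels


# Hypothetical counts for a three-node tree.
hypothetical_counts = {1: (2, 100), 2: (20, 80), 3: (45, 60)}
labels_at_25 = label_nodes(hypothetical_counts, 0.25)
labels_at_50 = label_nodes(hypothetical_counts, 0.50)
```

Raising the threshold moves nodes from test positive to test negative, which trades sensitivity for specificity.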
RESULTS
Tree models for predicting insulin resistance using all predictors.
In the tree model based on all of the predictors, the following predictors were statistically significant: HOMA-IR, fasting plasma insulin, BMI, waist circumference, and LDL cholesterol. Thus, subsequent, more-focused tree model analysis considered only these variables. HOMA-IR was included in the final model rather than fasting plasma insulin because, in the tree model construction process, it tended to be selected ahead of the latter by the recursive partitioning algorithm. However, because HOMA-IR and fasting plasma insulin were highly correlated (r² = 0.969), the fasting plasma insulin cut points corresponding to the HOMA-IR cut points can be calculated from the regression equation: fasting plasma insulin (in pmol/l) = 6.786 + (25.314 × HOMA-IR). Figure 2 depicts the classification tree model using the predictors HOMA-IR, BMI, waist circumference, and LDL. The aROC for this model is 90.0%. The number of insulin-resistant individuals and the total number of individuals (insulin resistant + non-insulin resistant) in each node are presented in Fig. 2. The proportion of insulin-resistant individuals is shown for each of the eight terminal nodes (labeled 1 through 8).
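The regression equation above translates directly into a small helper for converting a HOMA-IR cut point into the corresponding fasting plasma insulin cut point. The function name is illustrative; the coefficients are those quoted in the text.

```python
def insulin_cut_from_homa_ir(homa_ir_cut):
    """Fasting plasma insulin (pmol/l) corresponding to a HOMA-IR cut point."""
    return 6.786 + 25.314 * homa_ir_cut


# A HOMA-IR cut point of 4.65 maps to roughly 124.5 pmol/l.
insulin_cut = insulin_cut_from_homa_ir(4.65)
```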
The above prediction rule has an estimated sensitivity and specificity of 84.9 and 78.7%, respectively, obtained by summing the insulin-resistant and the non-insulin-resistant individuals in the nodes declared to be test positive (nodes 4–8, true positives and false positives) and similarly obtaining the true negatives and the false negatives from the remaining nodes (nodes 1–3). Other choices for the predictive cutoff value (i.e., other than 0.25) will lead to different prediction rules and different associated sensitivities and specificities. The results of the random 10-fold cross-validation showed a strong degree of consistency in the splitting choices for the 10 development subsets and also satisfactory external validity (as judged by the sensitivities, specificities, and aROCs at the 0.25 prediction cutoff) in the 10 validation subsets (online appendix [available at http://diabetes.diabetesjournals.org]).
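The summation just described can be written out as a short sketch. The node counts in the test values below are made up for illustration, since the per-node counts from Fig. 2 are not reproduced here, and the function name is hypothetical.

```python
def sensitivity_specificity(node_counts, positive_nodes):
    """Compute (sensitivity, specificity) from terminal-node counts.

    node_counts: dict mapping node id -> (n insulin resistant by clamp, n total).
    positive_nodes: set of node ids declared test positive.
    """
    tp = fp = fn = tn = 0
    for node, (n_resistant, n_total) in node_counts.items():
        n_sensitive = n_total - n_resistant
        if node in positive_nodes:
            tp += n_resistant   # resistant subjects in a test-positive node
            fp += n_sensitive   # sensitive subjects in a test-positive node
        else:
            fn += n_resistant   # resistant subjects in a test-negative node
            tn += n_sensitive   # sensitive subjects in a test-negative node
    return tp / (tp + fn), tn / (tn + fp)
```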
Models using clinical variables not requiring blood specimens.
Figure 3 shows the fitted tree based on only the basic clinical predictors. The aROC for this tree was 85.0%. Again, when nodes in which the proportion of insulin-resistant individuals is ≥0.25 (i.e., nodes 4–9 in Fig. 3) are declared to be test positive, the decision rule for insulin resistance is to predict an individual to be insulin resistant if either of the following conditions is met: 1) BMI >28.7 kg/m² or 2) BMI >27.0 kg/m² and family history of diabetes is positive. This decision rule has an estimated sensitivity and specificity of 78.7 and 79.6%, respectively.
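Expressed as code, the rule that requires no blood specimen might look like the following sketch. The thresholds are those stated above; the function and parameter names are illustrative.

```python
def insulin_resistant_no_blood(bmi_kg_m2, family_history_of_diabetes):
    """Decision rule based on BMI and family history only.

    Test positive if BMI > 28.7 kg/m2, or if BMI > 27.0 kg/m2 and there is
    a positive family history of diabetes.
    """
    if bmi_kg_m2 > 28.7:
        return True
    return bmi_kg_m2 > 27.0 and family_history_of_diabetes
```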
DISCUSSION
Using recursively partitioned classification trees, we have developed simple decision rules for identifying individuals deemed insulin resistant by the euglycemic insulin clamp technique. These decision rules are based on routine clinical measurements and appear to have acceptable sensitivity and specificity. Also, by performing a random 10-fold cross-validation and by focusing on physiological and biochemical variables, rather than study-specific variables such as ethnicity, we hoped to enhance the ultimate generalizability of the decision rules. Nevertheless, it must be acknowledged that certain populations, such as Asians, may be more insulin resistant for a given BMI than Caucasians (25,26). Thus, the performance of our decision rules may be suboptimal in these populations.
The most accurate decision rule is based on HOMA-IR (which requires a measurement of fasting insulin concentration) and BMI. However, only slightly less accurate rules can be derived that do not require insulin measurements or that do not require obtaining a blood specimen at all. In view of the lack of standardization of insulin assays, these latter rules may be preferred.
The sensitivities and specificities of the decision rules we have presented flow from our decision to declare nodes to be test positive if the proportion of insulin-resistant individuals in them was ≥0.25. Applying other thresholds to our classification trees leads to different decision rules with different sensitivities and specificities. If, for example, one desires a highly specific rule, one might choose to declare as test positive only individuals in nodes in which the proportion of truly insulin-resistant individuals (by clamp) was at least 40% (nodes 5–8 in the classification tree depicted in Fig. 2). This would lead to the following decision rule: declare an individual to be test positive if HOMA-IR >4.65 or if HOMA-IR >3.60 and BMI >27.5 kg/m²; otherwise, declare that individual to be test negative. This rule has a sensitivity and specificity of 77.0 and 88.4%, respectively. Alternatively, if one accepted as insulin resistant those in nodes where the proportion of individuals with the condition was 50% or better (nodes 6–8 in the classification tree depicted in Fig. 2), the decision rule would be to declare the individual to be test positive if HOMA-IR >4.65; otherwise, declare the individual to be test negative. This rule has a sensitivity and specificity of 71.4 and 92.0%, respectively.
Highly specific decision rules, such as those just discussed, might be useful for a clinical trial where one wished to be highly certain that the enrolled participant actually had the condition in question and where sensitivity was a lesser concern. Of course, such a strategy might mean that when the results of the clinical trial, assuming a positive outcome, were translated into clinical practice, the public health impact might be compromised because, owing to the reduced sensitivity of the entry criteria, many who might have benefited from the treatment would not have been deemed eligible for the trial.
Classification trees are a nonparametric alternative to classical statistical discrimination techniques, such as logistic regression. Their advantages include the lack of a required a priori choice of model structure for the predictor scales and interactions, the ease of incorporation of observations with missing covariate values, and the simplicity and interpretability of the resultant prediction rules. Because the partitioning scheme is recursive, each successive split of the predictor space is conditional on all previous splits. Thus, interactions or nonlinear structures in the relationship between the predictors and the response variable can be captured automatically. For example, if a split occurs based on a particular cut point for BMI, it may be that the subsequent split for those above the BMI cut point might be based on HOMA-IR, whereas the subsequent split for those below the BMI cut point might be based on triglycerides. Such a three-way interaction term would almost never be detected by traditional regression techniques such as multiple logistic regression analysis, where it would rarely, if ever, even be sought in the absence of a powerful prior hypothesis. In addition, because the splits are simple bifurcations along predictor axes, the process is invariant to monotonic rescalings of the predictors. Thus, whether a predictor or, say, its logarithm is used has no effect on the resultant analysis.
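The invariance to monotonic rescaling can be checked with a toy example: splitting a predictor at a cut point and splitting its logarithm at the logarithm of that cut point place the same individuals on each side. The values below are hypothetical, and `partition` is an illustrative helper.

```python
import math


def partition(values, cut):
    """Indices of the individuals whose value lies above the cut point."""
    return {i for i, v in enumerate(values) if v > cut}


# Hypothetical HOMA-IR values for five individuals.
homa = [1.2, 2.8, 3.9, 5.1, 7.4]
cut = 3.60

above_raw = partition(homa, cut)
above_log = partition([math.log(v) for v in homa], math.log(cut))
same_partition = above_raw == above_log
```

Because the tree-growing criterion depends only on which individuals fall on each side of a split, the fitted tree is the same on either scale; only the numeric label of the cut point changes.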
In the current instance, multiple logistic regression analyses were also performed and gave results, in terms of sensitivities, specificities, and aROCs, that were generally similar to those obtained with the tree-based models (27). Application of logistic regression models, however, requires computing scores that are typically less readily interpretable than decision rules based on tree models. Moreover, the structure of a decision tree often leads to insights into the data that are not as easily gleaned from classic parametric analyses without a more detailed mathematical understanding of their model structure. In addition, decision trees permit some individuals to be classified on the basis of only one, or at most a few, measurements, whereas scores derived from multiple logistic regression models require that all covariates be available.
Tree-based models typically make use of a greater percentage of the available data. Logistic regression models, on the other hand, are limited to individuals for whom none of the covariates are missing, unless one imputes values for missing covariates. This practice is often discouraged, however. A disadvantage of imputed values is that they are not "real" data, whereas surrogate splits, used to maximize the amount of the data being utilized, are based on genuine observations.
Recently, McLaughlin et al. (28) reported that among 258 overweight individuals, fasting insulin, triglycerides, and the triglyceride-to-HDL ratio were the best predictors of insulin resistance, as defined by the insulin suppression test. The sensitivities of their cutoff points ranged from 57 to 67% and the specificities from 68 to 85%. These results are not necessarily incompatible with our results because they pertain specifically to overweight individuals (≥25 kg/m²), among whom the effect of BMI itself would presumably be attenuated, permitting the emergence of other predictors, in this case triglycerides and the triglyceride-to-HDL ratio. Moreover, it should be noted that our models allow for the identification of insulin-resistant patients who are of normal weight, but who might nevertheless be candidates for insulin-sensitizing treatment.
In conclusion, we have shown that it is possible to identify individuals who are insulin resistant using routine clinical measures, thereby improving the likelihood that recognition of this important harbinger of serious diseases (i.e., diabetes and cardiovascular disease) will be incorporated into clinical trials and ordinary clinical practice.
ACKNOWLEDGMENTS
FOOTNOTES
Address correspondence and reprint requests to Michael P. Stern, MD, Division of Clinical Epidemiology, Department of Medicine, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr., San Antonio, TX 78229-3900. E-mail: stern{at}uthscsa.edu
Received for publication May 4, 2004 and accepted in revised form October 27, 2004
aROC, area under the receiver operator characteristic curve; EGIR, European Group for the Study of Insulin Resistance; HOMA-IR, homeostasis model assessment of insulin resistance
REFERENCES