1 Research Division, Joslin Diabetes Center, Boston, Massachusetts
2 Department of Medicine, Harvard Medical School, Boston, Massachusetts
3 Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
ABSTRACT |
---|
INTRODUCTION |
---|
For this perspective, diabetes duration is defined as the period of time from diabetes onset until complication onset. We illustrate the potential importance of this variable by exploring several genetic models that fit the epidemiologic characteristics of diabetic nephropathy as diagnosed by persistent proteinuria. We concentrate on this stage of diabetic nephropathy because duration-related mortality should play a relatively minor role before the onset of proteinuria (1), thereby allowing us to focus solely on the impact of precomplication duration of diabetes.
Two classes of genetic studies are considered, one based on case-control analysis and the other based on family-based trio analysis. These two designs, the workhorses for evaluating whether specific genetic variants are associated with a disease end point, are emerging as the most commonly used tools for studying the genetics of diabetic complications. Case-control studies are attractive because they do not require identifying families with multiple occurrences of diabetes and because power per individual sampled typically is high. Unfortunately, case-control studies are also susceptible to bias if case and control subjects are not drawn from genetically similar populations. Family-based trio analysis, as proposed by Spielman et al. (7,8), overcomes this problem by eliminating the need for a control group. Most commonly, each trio comprises an affected offspring together with both parents. Studies based on such trios ("affected offspring trios") are logistically demanding, because parents must be identified and enrolled, and typically less powerful than case-control studies. However, they are more robust in the sense that matching of case and control subjects is not problematic. This follows from the fact that genetic variants (i.e., alleles) in parents serve as a reference set for comparison with the alleles in the offspring. Families with unaffected offspring ("unaffected offspring trios") can sometimes serve as a useful alternative to affected offspring trios, and we consider this design as well.
In both case-control and family-based trio studies, phenotypic dichotomy is usually assumed, implying, for example, that a case of diabetes with 10 years of duration before onset of complication is treated the same as a case of diabetes with 25 years of duration. In a general context, Morton and Collins (9) recommended various ways to define "hypernormal controls," such as excluding unaffected individuals who are still young, but much more can be done in the area of late diabetic complications to improve on the standard yes/no definition of disease. To show how diabetes duration before onset of complication can be useful for this purpose, we adopt the approach that Li and Hsu (10) proposed to study age at onset in affected offspring trio studies and extend their work to accommodate both unaffected offspring trio and case-control designs. By evaluating a number of genetic models consistent with epidemiologic data on the occurrence of proteinuria in type 1 diabetes, we show how power and, in some instances, validity could depend on duration of diabetes before onset of complication. Finally, we suggest several ways to incorporate duration data into genetic association analyses of late diabetic complications.
MODELING DURATION IN DIABETIC COMPLICATIONS |
---|
GENETIC MODELS CONSIDERED |
---|
To explore these two extremes, we consider two dominantly acting genes, one with a major impact on susceptibility and one with a minor effect. Parameters are chosen so that carriers of the major gene risk allele have a lifetime risk of persistent proteinuria of 70% compared with 12% for noncarriers (Fig. 1A). Carriers also tend to develop disease more quickly than noncarriers. In contrast, the minor gene model considered here acts primarily by accelerating disease onset. Consequently, both carriers and noncarriers of the minor gene risk allele have a lifetime risk of 35% (Fig. 1D). In both cases, we assume a risk allele frequency of 20%, although other frequencies are considered in sensitivity analyses along with other modes of inheritance. Environmental factors such as glycemic control and other genetic effects need not be explicitly modeled in this approach.
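The population-level consequences of these parameter choices can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes Hardy-Weinberg genotype proportions (an assumption introduced here, not stated in the text), and the function names are hypothetical.

```python
# Sketch: population quantities implied by a dominant model.
# ASSUMPTION: Hardy-Weinberg proportions; the text gives only the allele
# frequency (20%) and the carrier/noncarrier lifetime risks.

def carrier_frequency(q: float) -> float:
    """Probability of carrying at least one risk allele (dominant model)."""
    return 1.0 - (1.0 - q) ** 2


def population_lifetime_risk(q: float, risk_carrier: float,
                             risk_noncarrier: float) -> float:
    """Lifetime risk averaged over carriers and noncarriers."""
    c = carrier_frequency(q)
    return c * risk_carrier + (1.0 - c) * risk_noncarrier


def carrier_fraction_among_cases(q: float, risk_carrier: float,
                                 risk_noncarrier: float) -> float:
    """Share of lifetime cases who carry the risk allele."""
    c = carrier_frequency(q)
    return c * risk_carrier / population_lifetime_risk(
        q, risk_carrier, risk_noncarrier)


q = 0.20  # risk allele frequency from the text
major = population_lifetime_risk(q, 0.70, 0.12)  # major gene model
minor = population_lifetime_risk(q, 0.35, 0.35)  # minor gene model

print(round(carrier_frequency(q), 4))  # about 0.36
print(round(major, 4))                 # about 0.33
print(round(minor, 4))                 # 0.35
print(round(carrier_fraction_among_cases(q, 0.70, 0.12), 4))
```

Under this assumption the two models imply similar overall lifetime risks (roughly one in three), so they differ mainly in how that risk is distributed between carriers and noncarriers and over duration.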
EFFECT OF DURATION ON FAMILY-BASED TRIO STUDIES |
---|
For the major gene model (Fig. 1A), power to detect excess transmission of the risk allele to affected offspring is inversely correlated with diabetes duration before onset of proteinuria. Power curves for samples of case subjects with 17, 22, 25, and 27 years of diabetes duration before onset of proteinuria illustrate this point (Fig. 1B). Although not much efficiency is lost as duration increases from 17 to 22 years, substantial power loss does occur by 25 years of duration. By 27 years of duration, transmission of the risk allele is essentially 50%, leaving essentially no power. For the minor gene model (Fig. 1D), a similar pattern emerges (Fig. 1E), except that substantial power loss begins at 15 years of duration and accelerates until 17.5 years of duration, when power vanishes completely.
As duration reaches even higher levels, the risk allele is actually transmitted to affected offspring less frequently than the nonrisk allele. Strikingly, this deviation from expected transmission can reach statistical significance with samples of moderate size if a two-sided test is used. Figure 2 demonstrates this for the minor gene model by showing how power to detect excess transmission of the nonrisk allele using affected offspring trios with 20 years of duration is only slightly lower than the power to detect excess transmission of the risk allele using affected offspring trios with 10 years of duration. Therefore, a sample of case subjects with onset of proteinuria occurring after long duration of diabetes would misidentify the nonrisk allele as being causative. A combined sample of case subjects with short and long duration of diabetes may have essentially no power because of the mixture of transmission rates.
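The cancellation that occurs when short- and long-duration trios are pooled can be illustrated with the standard McNemar-type transmission/disequilibrium statistic, (b - c)^2 / (b + c), where b and c count transmissions and nontransmissions of the risk allele from heterozygous parents. The sketch below is a toy illustration and not the power calculation used for the figures; all counts are hypothetical and chosen only to make the arithmetic transparent.

```python
# Sketch: TDT statistic on hypothetical transmission counts.
# b = heterozygous parents transmitting the risk allele to the affected child
# c = heterozygous parents transmitting the nonrisk allele

CHI2_CRIT_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def tdt_statistic(b: int, c: int) -> float:
    """McNemar-type TDT chi-square with 1 degree of freedom."""
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (b - c) ** 2 / (b + c)


# HYPOTHETICAL counts: 200 informative transmissions per subsample.
short_duration = (116, 84)   # risk allele over-transmitted
long_duration = (84, 116)    # nonrisk allele over-transmitted
combined = (short_duration[0] + long_duration[0],
            short_duration[1] + long_duration[1])

for label, (b, c) in [("short", short_duration),
                      ("long", long_duration),
                      ("combined", combined)]:
    stat = tdt_statistic(b, c)
    print(label, round(stat, 2), stat > CHI2_CRIT_1DF_05)
```

With these made-up counts each subsample is significant on a two-sided test, in opposite directions, while the pooled sample shows no deviation from 50% transmission at all.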
We readily acknowledge that the findings for both types of trios depend in part on the characteristics of the genetic models evaluated, and we are by no means advocating universal duration-specific guidelines for defining ideal trios. Nevertheless, consistency of the basic trends for many permutations of disease allele frequency and mode of inheritance (results not shown) indicates that, as a general rule, affected offspring trio studies gain efficiency by selecting affected offspring with short duration before onset of complication. In contrast, unaffected offspring trio studies can often benefit by requiring the unaffected offspring to have long-duration diabetes. On a broader level, however, these results highlight the need to be cognizant of diabetes duration before onset of proteinuria or other late diabetic complications in family-based trio studies.
EFFECT OF DURATION ON CASE-CONTROL STUDIES |
---|
Extrapolation of the findings from the previous section suggests that case-control studies should focus on short-duration case subjects and long-duration control subjects. In general, this turns out to be true for our illustrative models, but there is a slight twist related to the fact that case and control subjects are being considered simultaneously. Specifically, given an equal number of case and control subjects, power is not highly sensitive to duration among control subjects. For example, if all case subjects have 10 years of duration, then power under the major gene model is virtually unchanged regardless of whether control subjects have 8 or 22 years of duration (Fig. 3A). This follows because excessive carriage of the risk allele among the case subjects is driving power far more than deficient carriage among the control subjects. Because this excess carriage dissipates slowly with duration among case subjects for the major gene scenario, similar results are found even as case duration increases to 15 (Fig. 3B) or 17 (Fig. 3C) years. Similarly, in the minor gene scenario, control duration plays a very small role when 10-year-duration case subjects (Fig. 3D) are considered and only a slightly more pronounced role for 15-year-duration case subjects (Fig. 3E). What matters more in the minor gene case is the duration among case subjects, as illustrated by considering Fig. 3D–F. Analogous to our findings from the family-based trio models, case subjects with duration longer than some model-dependent value will actually tend to carry nonrisk alleles in excess. For sufficiently long duration, this tendency can result in a higher, possibly significant, frequency of nonrisk alleles in case compared with control subjects.
TREATMENT OF DURATION IN RECENT DIABETES ARTICLES |
---|
In terms of data quality, some studies reported duration at examination, some reported duration at onset of diabetic complications, and some did not comment on duration at all. Moreover, descriptions of duration estimation procedures were sparse. Many of the details concerning the assessment of diabetes onset (i.e., duration start date) and complication onset (i.e., duration end date) were omitted, making it difficult to gauge the precision of duration estimates.
Two basic analytic strategies were used to deal with duration. The first involved restricting entry of control subjects (but not case subjects) on the basis of duration. In most situations, this meant requiring control subjects to have long duration, but an alternative, choosing control subjects with case-matched duration, was also used. The second strategy involved incorporating duration information at the stage of analysis. In some studies, duration was treated as an independent variable. Variations included using duration as a stratification variable for case-control analysis (e.g., late-onset case versus long-duration control subjects and early-onset case versus short-duration control subjects) and using duration as an explanatory variable in logistic regression. Elsewhere, duration was treated as a dependent variable for testing whether certain genotypes corresponded to longer/shorter average duration. No studies attempted to use survival analysis to determine whether duration until complication varied among genotype groups.
Discussion of the possible impact of duration on results occurred primarily when inclusion of duration strengthened the significance of results. It was not uncommon for duration among case subjects to be longer than duration among control subjects, but this fact was not highlighted even when negative results could have been due in part to this aspect of the data.
IMPLICATIONS FOR FUTURE STUDIES |
---|
First, more attention could be paid to the quality of duration information. It would be useful if authors reported summary measures of duration among case and control subjects as well as the protocols for obtaining this information. Relevant topics would include methods for ascertaining diabetes onset and complication onset as well as any duration-related exclusion/inclusion criteria. Such information would provide important context for accompanying results and conclusions and would facilitate more meaningful comparisons across studies.
Second, when reliable data are available, researchers could explore whether it is possible to use duration to improve power and/or reduce the potential for bias. At ascertainment, a reasonable, albeit imperfect, rule of thumb is to focus on early-onset case and long-duration control subjects. Admittedly, implementation of this simple idea is complicated by the fact that optimal duration cut-offs are dependent on unknown underlying models. Nevertheless, excluding at least some proportion of late-onset case and short-duration control subjects could have a dramatic impact. For instance, although the ideal affected offspring trios for our minor gene model would have no more than 12 years of duration before onset of proteinuria, misjudging this cut-off point by even 5 years would still result in exclusion of the most counterproductive trios, those in which the nonrisk allele is expected to be transmitted preferentially.
Flexible definitions could help alleviate some of the burden imposed by a restrictive ascertainment scheme. Normoalbuminuria after 15 years of diabetes, for instance, may be a sensible alternative to an entry criterion requiring 20 years without proteinuria (and it may also lessen the impact of mortality due to other complications). Moreover, a somewhat less restrictive duration cut-off for control subjects may be acceptable in case-control studies, because excess carriage of risk alleles among case subjects will likely be the primary determinant of power. When collecting short-duration case subjects, extra care should be taken to rule out kidney disease not related to diabetes.
Restricting ascertainment to early-onset case and long-duration control subjects is a simple but perhaps not optimal way to incorporate information on duration. This approach assumes that later-onset case subjects and control subjects with short-duration diabetes are dispensable, and this may not be so in all situations. In fact, there are plausible genetic models that fit the reported incidence data on proteinuria in which duration is irrelevant for either case or control subjects (although probably not both simultaneously). Moreover, the ability to address other important research areas, such as determining how genes interact with drug-based interventions to delay disease onset, may depend on availability of case subjects with late as well as early onset. Therefore, a more sensible approach may be to use analytic methods that are able to accommodate duration data. One easy alternative, stratification by duration group, can be carried out to either produce descriptive data or test for trends with increasing/decreasing duration. More sophisticated approaches include conditional logistic regression with duration as an independent variable or survival analysis. Mokliatchouk et al. (21) provided a detailed discussion of these two methods as they apply to both family-based trio studies and case-control studies. Major considerations in choosing an appropriate method include the type of duration data available (duration-until-onset versus duration-at-ascertainment) and the ascertainment scheme used (population-based versus trait-based sampling). An additional benefit of these statistical approaches is that other covariates such as sex, parental blood pressure, and level of glycemic control can be easily incorporated. Known genes can also be accommodated in a similar manner, as Mokliatchouk et al. (21) demonstrated using an example based on Alzheimer's disease.
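As a rough illustration of the survival-analysis option, the sketch below computes Kaplan-Meier estimates of the proportion remaining free of proteinuria, separately by genotype group. It is a minimal sketch under stated assumptions rather than the method of Mokliatchouk et al. (21): the data are hypothetical, the function name is invented for illustration, and a formal comparison would require a log-rank test or regression model that also accounts for the ascertainment scheme.

```python
# Sketch: Kaplan-Meier estimate of time from diabetes onset to proteinuria.
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: years from diabetes onset to proteinuria or to last exam
    events:    1 if proteinuria onset was observed, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # events plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# HYPOTHETICAL data for two genotype groups (not from any study).
carriers = kaplan_meier([10, 12, 15, 20, 25], [1, 1, 1, 0, 1])
noncarriers = kaplan_meier([14, 18, 22, 25, 30], [1, 0, 1, 0, 0])

for label, curve in [("carriers", carriers), ("noncarriers", noncarriers)]:
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Because censored subjects contribute until their last examination, this kind of analysis can use late-onset case subjects and short-duration control subjects instead of discarding them.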
Our final comment pertains to interpreting results in a way that thoughtfully considers the potential impact of duration. For some studies, this may involve entertaining the idea that negative or positive results could be due in part to suboptimal duration profiles of study participants (e.g., duration being too long in case subjects).
OTHER CONSIDERATIONS AND FUTURE RESEARCH |
---|
Future work could also address special considerations relevant to particular aspects of late diabetic complications. For example, the extremely high lifetime risk of proliferative retinopathy among those with type 1 diabetes (11) may increase the importance of long diabetes duration in control subjects for case-control studies. It would also be instructive to examine all of the above issues on each of the successive stages of any given complication. For diabetic nephropathy, this would range from the onset of microalbuminuria to progression from proteinuria to end-stage renal disease. Toward the later stages of disease, this would necessarily involve a detailed look into how increased mortality as a result of end-stage renal disease or cardiovascular disease could affect genotype distributions for genes involved in disease susceptibility as well as genes involved in modifying survival. Failure to appreciate such effects could result in false-positive or false-negative results. Finally, all of these topics must be reconsidered specifically in the context of type 2 diabetes, for which issues such as pinpointing diabetes onset will assume greater prominence.
In addition to incorporation of duration information, Morton and Collins (9) suggested several other ways to improve the efficiency of case and control subjects, and some of these may be relevant for late diabetic complications. For instance, there may be a benefit to choosing case subjects with a positive family history and control subjects with a heavy environmental load (e.g., very poor glycemic control). The Morton and Collins article also goes to great lengths to compare the relative efficiency of case-control and family-based trio studies (9). In general, our results echo their sentiment that case-control studies are more powerful (compare Fig. 1 with Fig. 3), but it would be interesting to pursue this point further in light of other factors, such as survival bias, which, if pertinent in parents, could partially determine which families are available for family-based trio analysis.
The coming years promise to be exciting ones in the field of genetics of diabetic complications. Many laboratories throughout the world are taking part in the search for susceptibility genes, and several large initiatives are currently under way to establish large data resources for genetic studies. These activities will provide a tremendous opportunity to improve our understanding of the genetic basis of kidney disease, heart disease, and eye disease among those with diabetes. Without proper attention to issues such as duration of diabetes, the full potential of these opportunities may not be realized.
APPENDIX |
---|
![]() |
![]() |
By applying Bayes' rule, the conditional probabilities pr(DD|xi) and pr(Dd|xi) can be expressed as functions of the known quantities pr(xi|DD), pr(xi|Dd), and pr(xi|dd). After applying the same argument to find E(T) and Var(T), we can then calculate E(U) = E(S) - E(T) and Var(U) = Var(S) + Var(T), and estimate the two-sided power of the test at a given significance level as:
![]() |
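The Bayes' rule step can be sketched numerically. The example below is illustrative only: it assumes Hardy-Weinberg genotype frequencies as the prior, and the duration likelihoods are hypothetical values chosen for a dominant model (equal for DD and Dd). The appendix's own definitions of S, T, and U are not reproduced here.

```python
# Sketch: genotype probabilities given an observed duration xi, via Bayes' rule.
# ASSUMPTIONS: Hardy-Weinberg prior; likelihood values are hypothetical.


def genotype_posterior(likelihood: dict, q: float) -> dict:
    """Return pr(genotype | xi) from pr(xi | genotype) and allele frequency q."""
    prior = {"DD": q * q, "Dd": 2.0 * q * (1.0 - q), "dd": (1.0 - q) ** 2}
    joint = {g: likelihood[g] * prior[g] for g in prior}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("duration has zero probability under every genotype")
    return {g: joint[g] / total for g in joint}


# HYPOTHETICAL densities for onset at a short duration; under a dominant
# model DD and Dd share the same value.
short_onset = {"DD": 0.06, "Dd": 0.06, "dd": 0.01}

posterior = genotype_posterior(short_onset, q=0.20)
for genotype, prob in posterior.items():
    print(genotype, round(prob, 4))
```

With these made-up inputs, most subjects with early onset are expected to be carriers, which is the enrichment that short-duration sampling exploits.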
ACKNOWLEDGMENTS |
---|
FOOTNOTES |
---|
Received for publication 25 September 2001 and accepted in revised form 31 January 2002.
REFERENCES |
---|