1 Biostatistics Section, Division of Clinical Pharmacology, Thomas Jefferson University, Philadelphia, PA.
2 Department of Biostatistics, Harvard School of Public Health, Boston, MA.
3 Department of Psychiatry, Harvard Medical School and Massachusetts General Hospital, Boston, MA.
4 Department of Epidemiology, Harvard School of Public Health, Boston, MA.
ABSTRACT |
---|
data interpretation, statistical; depression; epidemiologic methods; longitudinal studies; multivariate analysis
Abbreviations: CI, confidence interval; DIS, Diagnostic Interview Schedule; DPAX, DePression and AnXiety schedule; GEE, generalized estimating equations; OR, odds ratio
INTRODUCTION |
---|
With multiple-source data, there are two broad types of research aims: risk factor questions and association (agreement) questions. Risk factor questions ask, for example, whether the level of the outcome of interest varies by source and/or by time and what effect suspected risk factors have. Although risk factor results may be similar across sources, this does not necessarily indicate good source agreement; for example, the sources may or may not be classifying the same persons as positive. Association questions concern, for example, the agreement between instruments and their stability over time, as well as the effect of factors that may influence these associations.
In the simple case of just two sources, the outcome data can be displayed in a 2 x 2 cross-classification table (table 1). Risk factor questions can be answered by analyzing the marginal distributions of the outcomes (i.e., a + b, c + d, a + c, and b + d; table 1). Traditionally, such risk factor analyses have been conducted either after combining the outcomes into a single diagnosis (pooling) or by performing separate analyses for each outcome, in both cases using univariate analytical methods (i.e., methods for a single outcome).
Although the risk factor questions can be answered via the analysis of the outcomes' marginal distributions, answers to association or agreement questions rely on the analysis of the outcomes' joint distribution (i.e., the cell counts a, b, c, and d; table 1). Agreement analyses are typically conducted separately from the risk factor analyses of the same data. In psychiatric epidemiology, agreement analyses often use the kappa statistic. Effects of covariates on it are investigated via stratification, but this approach has severe limitations because it quickly encounters sparse-data problems.
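The distinction between marginal and joint quantities can be illustrated with a small sketch. It assumes the cell layout implied later in the text (c + d positives for source A, b + d positives for source B); the class name `TwoByTwo` and the two example tables are hypothetical and are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts for two dichotomous sources (assumed layout).

    a: A negative, B negative    b: A negative, B positive
    c: A positive, B negative    d: A positive, B positive
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def prevalence_a(self) -> float:
        # Marginal quantity: uses only the row total c + d.
        return (self.c + self.d) / self.n

    def prevalence_b(self) -> float:
        # Marginal quantity: uses only the column total b + d.
        return (self.b + self.d) / self.n

    def kappa(self) -> float:
        # Joint quantity: depends on the individual cells a and d.
        observed = (self.a + self.d) / self.n
        pa, pb = self.prevalence_a(), self.prevalence_b()
        expected = pa * pb + (1 - pa) * (1 - pb)
        return (observed - expected) / (1 - expected)


# Two hypothetical tables with identical marginal prevalences (0.10 each)
# but very different agreement between the sources.
discordant = TwoByTwo(a=80, b=10, c=10, d=0)
concordant = TwoByTwo(a=90, b=0, c=0, d=10)

for label, table in (("discordant", discordant), ("concordant", concordant)):
    print(label, table.prevalence_a(), table.prevalence_b(), round(table.kappa(), 3))
```

In this sketch a marginal analysis cannot tell the two tables apart, whereas kappa ranges from slightly negative to 1, which is why agreement questions need the joint distribution.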
In this paper, we present a new multivariate regression approach for the simultaneous analysis of multiple categorical outcomes. This approach allows us to model risk factor and association effects within a single framework, and it appropriately uses all available data (even from subjects who may have some missing outcomes). We apply this method to analyze data from a study of depression, involving a group of subjects who were assessed twice in the 1990s, each time with two different instruments.
MATERIALS AND METHODS |
---|
As the study has grown in size, the fieldwork has taken longer to complete. To have the majority of assessments closer in time, 631 subjects who were first interviewed in 1991 (the first year of the follow-up fieldwork) were sought for reinterview later on. Reinterviews were completed for 476 subjects (75 percent), on average 3 years after the initial interview. A reinterview was not obtained for 155 subjects: 51 had died, 22 were not reassigned (too old or in nursing homes), 21 were unavailable or unable to complete the interview, and 61 refused. The subjects' assessments during the initial interview and their subsequent reinterview are the focus of this paper.
Depression assessments
The outcome of interest in this paper is depression. During both interviews, depression was assessed with two different diagnostic schedules, the DePression and AnXiety schedule (DPAX) and the Diagnostic Interview Schedule (DIS). The DPAX schedule was developed for the Stirling County Study and focuses on depression and anxiety, while the DIS was developed for the Epidemiologic Catchment Area Study and directly applies the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (13, 14). In our study, both schedules were administered by lay interviewers during structured interviews in the subjects' homes.
Although the schedules are similar in that their depression components involve questions regarding essential features, associated symptoms, and duration or timing, they do have some important differences (11). The DIS covers all the criteria for the associated symptoms of depression as listed in the Diagnostic and Statistical Manual of Mental Disorders, Third Edition, while the DPAX covers only a portion of them. On the other hand, the DPAX takes into account impairment in everyday functioning as in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (15), while the DIS does not. The DPAX interview tends to identify both chronic and acute depression; thus, for compatibility, the DIS definition of depression in this paper includes both major depression and dysthymia. We report results on current or prevalent depression, that is, cases that meet criteria during the month preceding the interview.
Statistical methods and analyses
Building on previous work (16–18), we have developed a multivariate (i.e., multiple-outcomes) logistic regression approach to analyze all four observations per subject simultaneously (Daskalakis and Laird, Harvard University, unpublished manuscript, 2000), modeling the marginal distribution and the association of the outcomes within a single framework. To clarify this approach, consider that each subject contributes four separate observations, one for each assessment (schedule-by-timepoint combination). Observations of the same subject are distinguished from each other by a schedule and a time variable. Although time could have been used as a continuous variable (i.e., the exact time lag between interviews for each subject), we report results with time only as a dichotomous variable (i.e., first or second interview), because the time between interviews was similar for most subjects. Each observation also includes subject-specific covariates (study cohort, sex, age, and education), as well as the interviewer's sex (which could be different for the two interviews or timepoints of the same subject).
We modeled prevalent depression via logistic regression that included all of the above variables. We also considered interactions for schedule by time, schedule by risk factor, and time by risk factor, as well as covariate interactions that involved the subject's sex or age or the interviewer's sex. We illustrate the regression equation with a single covariate, subject's sex:
![]() | (1) |
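As a hedged sketch only, a marginal logistic model of the kind described above, with subject's sex as the single covariate, could be written as follows. The notation is an assumption introduced here for illustration and is not necessarily the authors' parameterization: $Y_{ijt}$ is the depression indicator for subject $i$ on schedule $j$ at timepoint $t$, $\mathrm{DIS}_j$ is 1 for the DIS and 0 for the DPAX, $\mathrm{TIME2}_t$ is 1 for the second interview, and $\mathrm{FEMALE}_i$ is 1 for women.

```latex
\operatorname{logit}\,\Pr(Y_{ijt}=1)
  = \beta_0
  + \beta_1\,\mathrm{DIS}_j
  + \beta_2\,\mathrm{TIME2}_t
  + \beta_3\,\mathrm{FEMALE}_i
  + \beta_4\,(\mathrm{DIS}_j \times \mathrm{TIME2}_t)
  + \beta_5\,(\mathrm{DIS}_j \times \mathrm{FEMALE}_i)
  + \beta_6\,(\mathrm{TIME2}_t \times \mathrm{FEMALE}_i)
```

Under this assumed coding, $\exp(\beta_1)$ would be the odds ratio comparing the DIS with the DPAX among men at the first interview, and $\exp(\beta_5)$ would be the ratio of the schedule odds ratios for women versus men.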
The logistic regression model can answer questions regarding the prevalence of depression for each schedule and timepoint and how it is affected by the risk factors. It is referred to as the marginal model because it analyzes the outcomes' marginal distribution (in a sense, reflecting separate but simultaneous regressions for each outcome). Fitting this model via ordinary logistic regression yields valid parameter estimates but incorrect estimated standard errors, because the observations that are contributed by each subject are correlated.
One solution to this problem is to use a robust variance estimator via the generalized estimating equations (GEE) approach (17). The GEE approach has the advantage that only a "working" correlation matrix between the outcomes needs to be specified rather than the full joint distribution of the outcomes. Furthermore, even if this association is misspecified, the results for the marginal model are still valid. However, the association between the outcomes is not easily modeled within the GEE framework. In this paper, we considered risk factor and association questions of similar importance. For this reason, we used a different approach, maximum likelihood estimation, which requires more work on the specification of the full multivariate likelihood but is more flexible in incorporating both risk factor and association modeling in the same framework (Daskalakis and Laird, Harvard University, unpublished manuscript, 2000).
Association between dichotomous outcomes can be measured with a number of different measures, including various types of odds ratios, the phi (correlation) coefficient, and the kappa statistic. In our analyses, we used odds ratios of association. To illustrate the difference between these odds ratios and those of the marginal model of equation 1, consider the 2 x 2 cross-classification table of the DPAX and the DIS at the first interview (table 1), where source A is DPAX1 and source B is DIS1. Note that the marginal proportions (c + d)/n and (b + d)/n are the depression prevalences for DPAX1 ($\pi_{\mathrm{DPAX1}}$) and DIS1 ($\pi_{\mathrm{DIS1}}$), respectively. Thus, the marginal odds ratio given by
![]() |
![]() |
![]() | (2) |
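For reference, the two kinds of odds ratio can be written in terms of the cell counts of table 1, using the cell meanings stated above ((c + d)/n for DPAX1 and (b + d)/n for DIS1). The first two lines follow directly from those definitions. The third line is only an assumed illustration of how an association odds ratio might be related to a covariate $x_i$; the symbols $\psi$, $\alpha_0$, and $\alpha_1$ are introduced here and are not taken from the text.

```latex
% Marginal odds ratio: compares the prevalence of depression by the two schedules
\mathrm{OR}_{\text{marginal}}
  = \frac{\pi_{\mathrm{DIS1}} / (1 - \pi_{\mathrm{DIS1}})}
         {\pi_{\mathrm{DPAX1}} / (1 - \pi_{\mathrm{DPAX1}})}
  = \frac{(b + d)/(a + c)}{(c + d)/(a + b)}

% Odds ratio of association: uses the joint distribution (individual cells)
\mathrm{OR}_{\text{association}} = \frac{a\,d}{b\,c}

% Assumed illustration of a covariate effect on the association
\log \psi_{\mathrm{DPAX1},\mathrm{DIS1}}(x_i) = \alpha_0 + \alpha_1 x_i
```

The marginal odds ratio depends only on the row and column totals, whereas the association odds ratio changes with how subjects are distributed across the four cells for fixed totals.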
With four outcomes, there are six possible pairwise associations. Two of them pertain to the association between the two schedules, when used at the same timepoint (i.e., DPAX1/DIS1 and DPAX2/DIS2). Two others pertain to the association between the two schedules when used at different times (i.e., DPAX1/DIS2 and DIS1/DPAX2). Conceptually, one expects the former to be somewhat stronger than the latter. The final two pairwise associations refer to the stability of a single schedule over time (i.e., DPAX1/DPAX2 and DIS1/DIS2). Following equation 2, each of these six odds ratios can be modeled separately as functions of covariates. With appropriate parameterization, we can then assess the similarity of the association odds ratios, as well as the similarity of a covariate's effect on each of those association odds ratios. Thus, this association model can answer questions regarding the association among the different outcomes and the effect of covariates on it.
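The grouping of the six pairwise associations described above can be enumerated mechanically. The following is a minimal sketch; the function name `classify` and the category labels are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

# The four outcomes: (schedule, timepoint).
outcomes = [("DPAX", 1), ("DIS", 1), ("DPAX", 2), ("DIS", 2)]


def classify(first, second):
    """Label a pair of outcomes by the kind of association it represents."""
    (schedule_a, time_a), (schedule_b, time_b) = first, second
    if schedule_a == schedule_b:
        return "stability of one schedule over time"
    if time_a == time_b:
        return "between schedules, same timepoint"
    return "between schedules, different timepoints"


# Map each pair label (e.g. "DPAX1/DIS1") to its category.
pairs = {
    f"{s1}{t1}/{s2}{t2}": classify((s1, t1), (s2, t2))
    for (s1, t1), (s2, t2) in combinations(outcomes, 2)
}

for name, kind in pairs.items():
    print(f"{name}: {kind}")

# Each of the three categories contains exactly two pairs.
print(Counter(pairs.values()))
```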
The multivariate (i.e., multiple-outcomes) regression approach is well suited to handle missing data. In our analyses, we included the 476 subjects who contributed four assessments each (DPAX and DIS at both first and second interviews), as well as the 155 subjects who contributed only the two assessments at the first interview (for a total of 2,214 assessments). Both the complete-case analysis (using only the 476 subjects with complete data) and the GEE approach (using all available data) would be valid only under the "missing completely at random" assumption; that is, the probability that the second interview is missing may depend only on covariates and not on depression outcomes at either timepoint (19). However, assuming that the model is correctly specified and that relevant covariates are included, the likelihood-based approach is valid under the broader "missing at random" situation; that is, missingness may depend on covariates and depression outcomes at the first timepoint but again not on the values of the missing outcomes of the second timepoint (19).
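The data layout, with one record per assessment and fewer records for subjects without a reinterview, can be sketched as follows. The record type `Assessment`, the helper `assessments_for`, and the subject identifiers are hypothetical; the records carry no outcome values, and only the subject counts come from the text.

```python
from typing import List, NamedTuple


class Assessment(NamedTuple):
    """One observation: a schedule-by-timepoint combination for a subject."""
    subject_id: int
    schedule: str  # "DPAX" or "DIS"
    time: int      # 1 = first interview, 2 = reinterview


def assessments_for(subject_id: int, reinterviewed: bool) -> List[Assessment]:
    """Return the records a subject contributes (four, or two if not reinterviewed)."""
    times = (1, 2) if reinterviewed else (1,)
    return [
        Assessment(subject_id, schedule, time)
        for time in times
        for schedule in ("DPAX", "DIS")
    ]


N_COMPLETE = 476    # subjects with both interviews
N_FIRST_ONLY = 155  # subjects with the first interview only

records: List[Assessment] = []
for subject_id in range(N_COMPLETE + N_FIRST_ONLY):
    # Hypothetical numbering: the first 476 identifiers are the reinterviewed subjects.
    records.extend(assessments_for(subject_id, reinterviewed=subject_id < N_COMPLETE))

total = len(records)
print(total)  # 476 * 4 + 155 * 2 = 2214
```

A complete-case analysis would keep only the 1,904 records from the 476 reinterviewed subjects, whereas the likelihood-based approach uses all 2,214.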
We wrote a general SAS (PROC IML) (SAS Institute, Inc., Cary, North Carolina) macro that fits the multivariate logistic regression via maximum likelihood, incorporating a version of the expectation-maximization algorithm to handle subjects with missing outcomes. In our final model, we included all main effects regardless of their statistical significance to improve the validity of the model specification (a necessary requirement in the presence of missing data), the control of confounding, and the resulting fit of the estimated depression prevalences within various subgroups. Interaction terms were retained only if statistically significant (at $\alpha = 0.05$). Model selection and testing (p values) were based on the likelihood ratio test, and Wald-type confidence intervals were constructed for the odds ratios.
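The two inferential tools mentioned above can be sketched in a few lines. This is a generic illustration, not the authors' SAS macro: the function names are hypothetical, the likelihood ratio test shown is for a single added parameter (1 degree of freedom), and the numeric inputs are made up.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_ci_for_odds_ratio(log_or: float, se: float, z: float = Z_975):
    """Return (OR, lower, upper): a Wald interval built on the log odds scale."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


def lrt_p_value_1df(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood ratio test p value for nested models differing by one parameter.

    For a chi-square variable with 1 degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))


# Hypothetical inputs: an estimated log odds ratio with its standard error,
# and the maximized log-likelihoods of two nested models.
odds_ratio, lower, upper = wald_ci_for_odds_ratio(log_or=math.log(2.0), se=0.35)
p_value = lrt_p_value_1df(loglik_full=-500.0, loglik_reduced=-502.5)
print(f"OR = {odds_ratio:.2f}, 95% CI: {lower:.2f}, {upper:.2f}; LRT p = {p_value:.3f}")
```

Because the interval is constructed on the log scale, it is symmetric around the estimate only after taking logarithms, which is why reported odds ratio intervals look asymmetric.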
RESULTS |
---|
Table 4 shows estimated odds ratios and 95 percent confidence intervals from the final marginal model for current depression. Strictly speaking, of course, the schedule-by-sex interaction term is not really an odds ratio because it represents a ratio of odds ratios. In analyzing current depression, we identified a significant interaction between schedule and the subject's sex (p = 0.021). This implies that the effect of schedule depends on sex and that the effect of sex depends on schedule. Specifically, among men, the DIS identified significantly fewer depressed subjects than did the DPAX (odds ratio (OR) = 0.55, 95 percent confidence interval (CI): 0.35, 0.87), but among women, there was no difference between the two schedules (OR = 1.10, 95 percent CI: 0.76, 1.58). Consequently, the sex ratios were different for the two instruments. With the DPAX, women had a slightly but nonsignificantly lower prevalence than did men (OR = 0.79, 95 percent CI: 0.44, 1.44). With the DIS, on the other hand, women were more likely to be depressed than were men (OR = 1.57, 95 percent CI: 0.83, 2.98), but the difference was again not statistically significant.
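The statement that the interaction term is a ratio of odds ratios can be checked against the stratum-specific estimates quoted above. The sketch below uses only the four rounded odds ratios from the text; the variable names are hypothetical.

```python
# Stratum-specific odds ratios as quoted in the text (rounded to two decimals).
or_dis_vs_dpax_men = 0.55
or_dis_vs_dpax_women = 1.10
or_women_vs_men_dpax = 0.79
or_women_vs_men_dis = 1.57

# The schedule-by-sex interaction can be read in either direction:
# how the schedule effect differs by sex ...
interaction_from_schedule = or_dis_vs_dpax_women / or_dis_vs_dpax_men
# ... or how the sex effect differs by schedule.
interaction_from_sex = or_women_vs_men_dis / or_women_vs_men_dpax

# Both ratios estimate the same interaction term; they agree up to rounding
# of the published odds ratios (both are close to 2).
print(f"{interaction_from_schedule:.2f} vs {interaction_from_sex:.2f}")
```

The agreement of the two ratios reflects the symmetry noted in the text: the effect of schedule depends on sex to exactly the same multiplicative degree that the effect of sex depends on schedule.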
We also found that subjects with a lower educational level were twice as likely to be depressed compared with subjects with a higher education (OR = 2.14, 95 percent CI: 1.06, 4.35). On the other hand, time and the subject's age at the first interview were not associated with depression. Following standard modeling recommendations (20), we also investigated age as a discrete variable, with no material change in the results. Finally, we found no effect of the study cohort or the interviewer's sex, and no interactions involving the latter were significant.
In addition to the risk factor results discussed above, our approach allowed us to investigate the association between the outcomes. Table 5 shows the results from the final association model. We found the association between the DPAX and the DIS at the first and second interviews (i.e., DPAX1/DIS1 and DPAX2/DIS2) to be similar. Furthermore, these same-time odds ratios of association between the two schedules were not significantly different from the cross-time odds ratios (i.e., DPAX1/DIS2 and DIS1/DPAX2), possibly because of the very short time elapsed between the two interviews. Therefore, we estimated a single odds ratio of association between the DPAX and the DIS that does not vary over time and also applies to measurements taken at different times (OR = 6.0, 95 percent CI: 3.8, 9.6). Regarding the association of the schedules over time, we found that the stability of the DPAX over time (i.e., DPAX1/DPAX2: OR = 14.2, 95 percent CI: 5.3, 38.4) was significantly higher than the stability of the DIS over time (i.e., DIS1/DIS2: OR = 2.9, 95 percent CI: 0.8, 10.0) (p value for their difference = 0.041). None of these association odds ratios was found to be significantly affected by any covariates.
DISCUSSION |
---|
The multivariate logistic regression approach can have broad application beyond psychiatric or behavioral-social epidemiology, with outcomes obtained from multiple sources (instruments, informants, methods) and/or timepoints. In this paper, we have shown how both can be handled by the method. Research is also currently under way to extend the methodology to "two-stage designs," which involve a brief screening instrument for all subjects at the first stage and a more elaborate diagnostic procedure for a small subset of them at the second stage.
The main disadvantage of our approach is that it is not implemented in standard software. However, we have written a general SAS (PROC IML) macro with accompanying documentation that is available upon request (c_daskalakis{at}lac.jci.tju.edu). Additional details on this approach and related multiple-informant research can be found on the Harvard Multiple Informant Web site (http://www.biostat.harvard.edu/multinform).
Our substantive results should be evaluated in the context of previous reports from both the Stirling County Study and other epidemiologic investigations of depression. For example, the inverse relation between education and the prevalence of depression in our analyses is consistent with previous reports regarding the effect of socioeconomic indices (21–23).
Gender differences regarding depression are a topic that has drawn much attention in psychiatric epidemiology (1, 5, 24–28). Recent studies have usually suggested a 2:1 female:male ratio. Where the DIS is concerned, our findings pointed in the same direction but not to the same magnitude. However, our estimated relative risk of 1.57 (women compared with men) was adjusted for other risk factors and is similar to the 1.37 reported by Blazer et al. (24), who also controlled for a number of similar variables. In contrast, our DPAX results indicated that men and women did not have significantly different prevalence. This fits with other Stirling County reports from 1952 onward, with the recent exception of the 1992 cross-sectional sample (10, 12). Although the overall current prevalence of depression remained stable at about 5 percent throughout the years, a redistribution by age and gender occurred in the 1992 sample due to an increased rate among women under 45 years of age (10). This change may account for a significant gender difference in the DPAX results for the 1992 sample, but it was absent in the results reported here, reflecting the fact that this sample did not include younger women.
As suggested above, we found that the schedule-by-gender interaction was significant, illustrated by the fact that prevalence by the DPAX and the DIS was similar for women, while the DPAX yielded an estimate for men that was almost double that of the DIS. A similar pattern was present among older adults of the Stirling County's most recent cross-sectional sample (DPAX and DIS prevalences of 6.3 and 6.4 percent among women and 5.3 and 2.7 percent among men) (10). Thus, the evidence of the schedule-by-gender interaction was consistent in both the cohort survivors and the cross-sectional sample when the focus was on the same age range, but the methods of the present analysis gave it explicit recognition.
Our analyses confirmed the low overall agreement between the DPAX and DIS, already reported for the 1992 cross-sectional sample (11). Examining disagreements between different data collection methods can be useful to the process of improving interview schedules. Although the earlier report found the DPAX-DIS disagreement to be higher among subjects with lower education (11), our present analyses did not confirm that result. This may be due to the older age distribution of our study compared with the 1992 cross-sectional sample. In fact, when analysis of the 1992 sample was restricted to older subjects, the association between education and the DPAX-DIS agreement was attenuated (unpublished results).
Our results indicate that the DPAX may be more stable over time than the DIS. This suggests that the DPAX may be more sensitive to stability of chronic disorders than the DIS, even though the DIS definition of depression used in our analyses combined the episodic and chronic forms. The DIS emphasizes the episodic nature of depression by asking respondents to remember specific periods of their lives when they experienced specified durations of depressed feelings (5). The DPAX, however, is oriented toward the current state of depression and its onset according to the respondent's own assessment (9). Evidence from longer follow-ups in the Stirling County Study has pointed to chronicity as a prominent feature of depression (8, 26), and our results suggest that the schedules may differ in their ability to capture this aspect of the disorder.
ACKNOWLEDGMENTS |
---|
The authors thank Arthur Sobol for data preparation, Dr. Leighton as the instigator of the Stirling County Study, and both Drs. Leighton and Monson for longstanding consultations.
REFERENCES |
---|