Treatment comparisons in HIV infection: the benefits and limitations of observational cohort studies

Caroline A. Sabin* and Andrew N. Phillips

Royal Free Centre for HIV Medicine, Department of Primary Care and Population Sciences, Royal Free and University College Medical School, Rowland Hill Street, London, UK

Background

Before the availability of highly active antiretroviral therapy, it was relatively easy to carry out randomized controlled trials (RCTs) of new antiretroviral (ARV) treatments for HIV infection that made use of clinical endpoints.1–4 As the rates of new AIDS-defining diseases and death were high, conclusions from RCTs could often be reached within a few years, a relatively short period compared with the many years currently required for such a trial. Only a limited number of ARV drugs were available, which had two major implications. First, specific treatment comparisons were fairly straightforward, and secondly, although patients did switch to new drugs,5 this occurred less frequently than today. However, as the number of ARV drugs grew, and patients were able to switch drugs more frequently, the results of long-term RCTs were sometimes thought to be irrelevant by the time they were published. For this reason, and because of a desire to rapidly evaluate new drugs, there was a demand for trials to be carried out in a more timely manner.

RCTs that used short-term surrogate marker responses to draw conclusions about the likely long-term clinical benefits of treatment were proposed as a possible solution to this problem. This type of trial had been carried out in other disease areas6,7 and so was not a new concept. Initially, the CD4 count, being the best prognostic marker available, was evaluated as a possible surrogate marker.8 Since the mid-1990s, however, the availability of routine viral-load testing has meant that RCTs utilizing changes in plasma HIV RNA levels as endpoints are now the norm.9,10 Currently, in the USA, new ARV drugs can receive accelerated approval on the basis of 16 week RNA data, and full approval on 48 week data. However, problems remain: viral rebound can occur some years after starting treatment in patients with suppressed viral load, and the long-term toxicities of some of these drugs are now becoming apparent. Therefore, there is again a need for long-term RCTs to address these issues.

Whether clinical or surrogate endpoints are required, most clinical questions of relevance are now centred on treatment strategies (which may involve more than one change of treatment) rather than on specific treatment comparisons, and the development of drug toxicities is a key issue. Many questions relating to general treatment strategies (e.g. sequencing of drugs) and long-term toxicities cannot be addressed using 48 week data. However, long-term RCTs are fraught with problems that should be acknowledged. In addition to the fact that the results may not be relevant by the time the trial has finished, the longer period of follow-up required means that measurements of laboratory markers may change. As with RNA levels, new surrogate markers may become available over the course of a trial, which may or may not be measurable in stored samples. In addition, RCTs, whether short- or long-term, suffer from a number of well-documented problems. Patients recruited to RCTs are almost always unrepresentative of the whole patient population,11,12 with patients often being restricted in terms of their CD4 count, HIV RNA level and/or previous ARV drug experience at entry to the trial (see, for example, refs 9 and 10). Patients are often more motivated and, thus, more adherent to therapy than their routine clinical counterparts.13 In addition, trial follow-up is usually more frequent than that of patients in a routine clinical setting. Thus, there are a number of reasons why the results from RCTs may not generalize to the majority of the patient population.

Given these limitations, the question arises as to whether non-randomized, i.e. observational, follow-up studies can play a potentially useful additional role in making treatment comparisons. There are many large observational cohort studies of HIV-infected individuals that have been following patients, in some cases since the early years of the epidemic. Patients in these studies receive ARV therapy according to protocols in place at their treatment centres and tend to be less selected. Thus, they appear to offer such an opportunity. We now discuss the benefits and limitations of such an approach.

Observational cohort studies in HIV infection are usually one of two types: traditional epidemiological cohorts and observational clinical cohorts or databases. Patients recruited to the former type of cohort visit a study centre on a regular basis (e.g. every 6 months). At each visit, patients are examined for clinical signs of progression, CD4 counts and HIV RNA levels are measured, and information on clinical events and ARV history over the previous 6 month period is recorded. Treatment is provided at the individual's own treatment centre, separate from the site where they are seen as part of the study. This type of cohort is epitomized by some of the larger cohorts in the USA, such as the Multicenter AIDS Cohort Study14 and others.15,16 In contrast, observational clinical cohorts follow patients in their own clinical centre, and data are usually obtained directly from a clinic database or patient records. In some instances, data collection may be retrospective (again, usually every 6 months) from a remote site, but all relevant information recorded on the patient over the previous period is collected. Thus, data values (especially CD4 counts and RNA levels) are often measured more frequently but may also be measured more sporadically than in traditional epidemiological cohorts, as information will be recorded as and when a patient attends the clinic, rather than at regular study visits. This type of cohort is perhaps more common in Europe.17–20

Patients recruited to an observational clinical cohort study are usually representative of the clinic population from which they are drawn (indeed, often the cohort contains the entire clinic population). Patients generally have a level of adherence to treatment that is consistent with patients at that centre, and, as a result, information from such studies is thought to reflect more accurately what happens in practice. Using data from such cohorts it is possible to make comparisons of outcomes for many different treatments; theoretically, the number of such comparisons is only limited by the number of patients receiving each combination of specific drugs or drug classes in the cohort. For observational clinical cohorts, long-term follow-up is generally easier to maintain, provided patients remain at their regular clinic or attempts are made to obtain follow-up information on patients who transfer their care elsewhere. It is also often perceived that decisions about treatment effectiveness can be obtained much more quickly using observational cohort data than from an RCT.

Treatment comparisons in observational studies

Treatment comparisons in observational studies may be made by comparing those starting the treatment strategies/regimens of interest with those starting a comparative treatment strategy/regimen. Munoz et al.21 describe these comparisons as measures of ‘individual effectiveness’. For simplicity, we will consider the situation where we wish to compare two different treatment regimens, A and B. Patients are followed from some specified baseline date (often the date of starting the regimens A and B) until some endpoint, which may be either a clinical or a surrogate endpoint. The groups are then compared in terms of the time to, or presence of, this endpoint. This comparison is unbiased if the groups have the same prognosis at baseline and are treated similarly during follow-up (except for differences that are a direct consequence of whether the patient started regimen A or B). However, if there are systematic differences between the groups at the time of starting the regimen in terms of how they are expected to respond over time, then any comparison will be biased.

In an RCT, we would expect that the treatment groups would have a similar prognosis, on average, at the start of the study; this is one of the benefits of randomization. However, this may not be the case for observational studies; patient prognosis may differ between the groups and this will usually depend on the reasons why some patients were treated with regimen A and others with regimen B. For example, patients receiving early formulations of protease inhibitors (PIs) were at advanced stages of disease with a poor prognosis, whereas those receiving newer PIs are at less advanced stages, with better prognosis. Thus, any comparison of different types of PI would probably appear to show that these early PIs were less effective than the newer ones; this may indeed be true, but it may also reflect the discrepancies between the two groups at baseline.

The statistical approach to removing these differences involves adjusting for known prognostic factors at baseline. Assuming that we can capture all of the differences between the groups at baseline, then the resulting treatment comparison should be unbiased. However, it may not be possible to fully adjust for all factors, possibly because the measures we currently have available are not sufficiently sensitive. In this case, we would like to know whether the residual bias, after adjusting for the known factors, is large enough to give rise to seriously misleading conclusions. For example, it is very difficult to measure adherence. If one of the regimens in the treatment comparison is known to be difficult to adhere to, patients thought by their clinicians to have good adherence may be selectively chosen to receive this regimen. Thus, it may be hard to assess whether any difference in outcome between the treatment groups is genuine or can be explained simply by poorer adherence generally in the group receiving the less difficult regimen.
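The effect of adjusting for a baseline prognostic factor can be illustrated with a minimal sketch. The counts below are entirely hypothetical (they are not taken from any cohort or trial), and the Mantel-Haenszel stratified risk ratio is used here only as a simple stand-in for the regression adjustment a real analysis would employ. Regimen A is assumed to have been given mainly to patients with advanced disease, and regimen B mainly to patients with less advanced disease; within each stratum the two regimens have identical event risks.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Event counts for regimens A and B within one baseline prognostic stratum."""
    label: str
    events_a: int
    total_a: int
    events_b: int
    total_b: int


# Hypothetical counts, chosen for illustration only.
STRATA = [
    Stratum("advanced disease", events_a=40, total_a=80, events_b=10, total_b=20),
    Stratum("less advanced disease", events_a=2, total_a=20, events_b=8, total_b=80),
]


def crude_risk_ratio(strata):
    """Risk ratio (A versus B) ignoring the baseline stratum."""
    risk_a = sum(s.events_a for s in strata) / sum(s.total_a for s in strata)
    risk_b = sum(s.events_b for s in strata) / sum(s.total_b for s in strata)
    return risk_a / risk_b


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio (A versus B), adjusted for the baseline stratum."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.total_a + s.total_b
        numerator += s.events_a * s.total_b / n
        denominator += s.events_b * s.total_a / n
    return numerator / denominator


if __name__ == "__main__":
    print(f"Crude risk ratio:    {crude_risk_ratio(STRATA):.2f}")
    print(f"Adjusted risk ratio: {mantel_haenszel_risk_ratio(STRATA):.2f}")
```

With these made-up counts the crude comparison suggests that regimen A carries more than twice the risk of regimen B (risk ratio about 2.33), whereas the stratified estimate is 1.00, i.e. no difference. The adjustment succeeds here only because the single factor driving the choice of regimen has been measured; an unmeasured factor such as adherence would leave residual bias that no such calculation could remove.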

In a recent study, Phillips et al.22 considered whether the results from a number of major RCTs of HIV-infected individuals could be duplicated by three large observational cohort studies. For two of the three treatment comparisons considered the results were highly consistent between the trials and the observational studies. However, when considering the comparison of indinavir + zidovudine + lamivudine versus zidovudine + lamivudine, the results were inconsistent. The largest observational study gave statistically significant results that were in the opposite direction to those both from the other two observational studies and from the trial. In this case, it was felt that the regression models had not been able to adjust fully for the large differences in prognostic variables between the two treatment groups at baseline.

Unfortunately, for many treatment comparisons, results from an RCT may not be available. The results outlined above raise some concern about whether we can ‘believe’ the results on individual effectiveness from observational cohort studies. Munoz et al.21 argue that the real possibility of residual confounding means that observational studies are unlikely to mimic clinical trial results fully, and that such studies could be used to greater benefit by concentrating on measures of the effectiveness of therapies at the population level (‘population effectiveness’), which would then complement the results from clinical trials. However, the pressure to use observational study data to address individual effectiveness issues is likely to increase in the future, and thus some guidance on whether or when this is appropriate would be helpful.

When can we use observational cohort studies for treatment decisions?

Before considering whether observational cohort studies can be used to address questions relating to the treatment of HIV infection, two points should be noted. First, observational cohort studies will be of limited use if the strategy or treatment to be tested is not currently in routine clinical use (for example, it would not be possible to compare the effects of intermittent ‘pulse’ therapy versus continual therapy on the development of drug toxicities in an observational cohort study, as the intermittent strategy is not routinely used). A second, and related, point is that whilst observational studies are often perceived to be quicker than RCTs, this will only be the case if the data are already available, and patients can easily be identified from databases (for example, comparisons of scheduled treatment interruptions are sometimes difficult to perform as it may be hard to identify from a database whether a patient stopped his/her drug for short periods of time, and the reasons for these stoppages). In these cases, an observational study will have to be set up prospectively and will take the same amount of time to accrue evidence as an RCT, and in fact may even be slower, as treatment strategies may not become routine until positive results have been reported from initial RCTs.

Observational studies can provide useful additional information to that provided by an RCT. For example, when trying to study the effects of early versus late therapy on clinical events, RCTs may be extremely difficult to carry out, owing to the number of patients needed, the lack of equipoise in the view of clinicians and patients and the fact that the ‘delay time’ (the difference in the actual times of starting therapy in the early and deferred arms) may be small. This may be thought of as an ideal opportunity to use data from an observational cohort study: patients with a given CD4 count on a particular calendar date could be stratified according to whether they had or had not already received ARV treatment (early and late starters, respectively) and subsequent event rates compared in the two groups. If such a study were run in parallel with an RCT, while obviously susceptible to bias because of the non-randomized design, it would provide a useful piece of evidence in addition to the trial, because it may well have a much greater ‘delay time’ between the early and deferred groups. One point to note, however, which will apply equally to both RCTs and observational studies, is that patients treated later, especially over the last 10 years of the epidemic, are more likely to have received more intensive and more effective combination therapies than those treated earlier. Thus, any treatment comparison is unlikely to address the rather abstract question of early versus late therapy given a fixed set of drugs available throughout the trial, but is more likely to address the question of early therapy starting with the drugs of today versus later therapy starting with the drugs of tomorrow. For example, had such an RCT or observational study been performed over the previous 10 years, the treatment comparison made would essentially be monotherapy with a relatively inactive drug versus later combination therapy including PIs or non-nucleoside reverse transcriptase inhibitors (NNRTIs). However, the use of more advanced treatment combinations is an almost guaranteed consequence of later treatment and thus, the two issues cannot be divorced.

However, given that there may be some comparisons where data seem to be readily available from observational cohort studies, under what circumstances can we believe the results? Phillips et al.22 concluded that observational studies could provide useful insights if the results from several databases were considered jointly, and if imbalances between the prognostic variables were either small or acted in different directions. If the confounding goes in the same direction in each study, then it is unreasonable to expect the statistical methods to be able to adjust fully for this. However, even if this is the case, it is likely that substantial bias can only be ruled out if the treatment effect of interest is large. In practice, given that currently most novel treatment strategies may be expected to reduce the event rate by small amounts and that, more frequently, especially when considering drug-sparing regimens, we may be interested in treatment equivalence rather than difference, observational cohort studies may only allow us to rule out large differences between the treatments.

Conclusions

Long-term studies, whether RCTs or observational cohort studies, are essential for obtaining information on clinical events and toxicities in HIV infection, as well as for obtaining information on long-term surrogate marker data. RCTs are the gold standard for making treatment comparisons, and should be carried out where at all possible. However, this may not always be feasible, for either ethical or practical reasons, and few RCTs are powered for clinical endpoints. Even when trials are underway, there is often a desire to obtain additional information on treatment comparisons from a population that is more representative of the general clinic population. Thus, observational cohort studies are increasingly being called upon to make comparisons of specific treatment combinations or strategies. Whilst the analyses appear to be relatively straightforward to perform, the difficulty comes when trying to correctly interpret the results obtained.

Clearly, the comments made in this paper may also be relevant to other disease areas. Currently, HIV is somewhat unique in that the last few years have seen the rapid development of large numbers of new drugs. Thus, whilst the potential number of different treatment combinations has increased dramatically, the rapidity of these changes has meant that the efficacy of most of these different combinations has not been formally tested in RCTs. Although there are other disease areas where treatments are used in combination, and the number of available drugs is reasonably large (e.g. tuberculosis), the introduction of these drugs has generally been more gradual. Thus, the efficacy of different treatment strategies has often been assessed using evidence from RCTs. It is likely that in the future we may well see the rapid development of new drugs for other conditions (e.g. hepatitis C virus infection). There may be a similar call to use observational data (where they exist) to assess the efficacy of these new treatments and therefore, in these situations, the comments made in this paper will be relevant.

Notes

* Correspondence address. Department of Primary Care and Population Sciences, Royal Free and University College Medical School, Rowland Hill Street, London NW3 2PF, UK. Tel: +44-20-7830-2239 ext. 4752; Fax: +44-20-7794-1224; E-mail: c.sabin{at}pcps.ucl.ac.uk

References

1. Fischl, M. A., Richman, D. D., Grieco, M. H., Gottlieb, M. S., Volberding, P. A., Laskin, O. L. et al. (1987). The efficacy of azidothymidine (AZT) in the treatment of patients with AIDS and AIDS-related complex. New England Journal of Medicine 317, 185–91.

2. Volberding, P. A., Lagakos, S. W., Koch, M. A., Pettinelli, C., Myers, M. W., Booth, D. K. et al. (1990). Zidovudine in asymptomatic human immunodeficiency virus infection. New England Journal of Medicine 322, 941–9.

3. Concorde Coordinating Committee. (1994). Concorde: MRC/ANRS randomised double-blind controlled trial of immediate and deferred zidovudine in symptom-free HIV infection. Lancet 343, 871–81.

4. Delta Coordinating Committee. (1996). Delta: a randomised double-blind controlled trial comparing combinations of zidovudine plus didanosine or zalcitabine with zidovudine alone in HIV-infected individuals. Lancet 348, 283–91.

5. White, I. R., Walker, S., Babiker, A. G. & Darbyshire, J. H. (1997). Impact of treatment changes on the interpretation of the Concorde trial. AIDS 11, 999–1006.

6. Wittes, J., Lakatos, E. & Probstfield, J. (1989). Surrogate endpoints in clinical trials: cardiovascular diseases. Statistics in Medicine 8, 415–25.

7. Hillis, A. & Seigel, D. (1989). Surrogate endpoints in clinical trials: ophthalmologic disorders. Statistics in Medicine 8, 427–30.

8. Lin, D. Y., Fischl, M. A. & Schoenfeld, D. A. (1993). Evaluating the role of CD4-lymphocyte counts as surrogate endpoints in human immunodeficiency virus clinical trials. Statistics in Medicine 12, 835–42.

9. Cameron, D. W., Japour, A. J., Xu, Y., Hsu, A., Mellors, J., Farthing, C. et al. (1998). Ritonavir and saquinavir combination therapy for the treatment of HIV infection. AIDS 13, 213–24.

10. Hirsch, M., Steigbigel, R., Staszewski, S., Mellors, J., Scerpella, E., Hirschel, B. et al. (1999). A randomized, controlled trial of indinavir, zidovudine, and lamivudine in adults with advanced human immunodeficiency virus type 1 infection and prior antiretroviral therapy. Journal of Infectious Diseases 180, 659–65.

11. Moore, D. A. J., Goodall, R. L., Ives, N. J., Hooker, M., Gazzard, B. G. & Easterbrook, P. J. (2000). How generalizable are the results of large randomized controlled trials of antiretroviral therapy? HIV Medicine 1, 149–54.

12. Madge, S., Mocroft, A., Wilson, D., Youle, M., Lipman, M. C. I., Tyrer, M. et al. (2000). Participation in clinical studies amongst patients with HIV 1 in a single treatment centre over 12 years. HIV Medicine 1, 212–8.

13. Lovato, L. C., Hill, K., Hertert, S., Hunninghake, D. B. & Probstfield, J. I. (1997). Recruitment for controlled clinical trials: literature summary and annotated bibliography. Controlled Clinical Trials 18, 328–52.

14. Kaslow, R. A., Ostrow, D. G., Detels, R., Phair, J. P., Polk, B. F. & Rinaldo, C. R., Jr (1987). The Multicenter AIDS Cohort Study: rationale, organization, and selected characteristics of the participants. American Journal of Epidemiology 126, 310–8.

15. Lang, W., Anderson, R. E., Perkins, H., Grant, R. M., Lyman, D., Winkelstein, W. et al. (1987). Clinical, immunologic, and serologic findings in men at risk for acquired immunodeficiency syndrome. The San Francisco Men's Health Study. Journal of the American Medical Association 257, 326–30.

16. Barkan, S. E., Melnick, S. L., Preston-Martin, S., Weber, K., Kalish, L. A., Miotti, P. et al. (1998). The Women's Interagency HIV Study. WIHS Collaborative Study Group. Epidemiology 9, 117–25.

17. Lundgren, J. D., Phillips, A. N., Vella, S., Katlama, C., Ledergerber, B., Johnson, A. M. et al. (1997). Regional differences in the use of antiretrovirals and primary prophylaxis in 3122 European HIV-infected patients. Journal of Acquired Immune Deficiency Syndromes 16, 153–60.

18. Tassie, J.-M., Gasnault, J., Bentata, M., Deloumeaux, J., Boué, F., Billaud, E. et al. (1999). Survival improvement of AIDS-related progressive multifocal leukoencephalopathy in the era of protease inhibitors. AIDS 13, 1881–7.

19. Egger, M., Hirschel, B., Francioli, P., Sudre, P., Wirz, M., Flepp, M. et al. (1997). Impact of new antiretroviral combination therapies in HIV infected patients in Switzerland: prospective multicentre study. British Medical Journal 315, 1194–9.

20. Staszewski, S., Miller, V., Sabin, C., Schlecht, C., Gute, P., Stamm, S. et al. (1999). Determinants of sustainable CD4 lymphocyte count increases in response to antiretroviral therapy. AIDS 13, 951–6.

21. Munoz, A., Gange, S. J. & Jacobson, L. P. (2000). Distinguishing efficacy, individual effectiveness and population effectiveness of therapies. AIDS 14, 754–6.

22. Phillips, A. N., Grabar, S., Tassie, J.-M., Costagliola, D., Lundgren, J. D. & Egger, M. (1999). Use of observational databases to evaluate the effectiveness of antiretroviral therapy for HIV infection: comparison of cohort studies with randomized trials. AIDS 13, 2075–82.




