Estimating the number of Cubans infected sexually by human immunodeficiency virus using contact tracing data

Ying-Hen Hsieh (a), Hector de Arazoza (b), Shen-Ming Lee (c) and Cathy WS Chen (c)

(a) Department of Applied Mathematics, National Chung-Hsing University, Taichung, Taiwan.
(b) Departamento Ecuaciones Diferenciales, Facultad Mathematica-Computacion, Universidad de la Habana, San Lazaro y L Habana 4, Cuba.
(c) Department of Statistics, Feng-Chia University, Taichung, Taiwan.

Ying-Hen Hsieh, Department of Applied Mathematics, National Chung-Hsing University, Taichung, Taiwan 402. E-mail: yhhsieh{at}dragon.nchu.edu.tw


    Abstract
Background To estimate the yearly number of people in Cuba who are living with human immunodeficiency virus (HIV) and were infected through sexual contact but who have not developed acquired immunodeficiency syndrome (AIDS). Estimation was made directly from the yearly HIV seroprevalence data of the Cuban Partner Notification Programme from 1991 to 2000.

Methods The generalized removal model for open populations is utilized for the estimation. The total number of known HIV-infected Cubans at each sampling time is used in the prior to provide more reasonable approximations.

Results We estimated a yearly survival rate of 93%. The median estimates for the number of all living asymptomatic HIV-positive Cubans, infected by sexual contact, tripled from 714 in 1991 to 2170 in 2000. The number of unknown HIV-positive Cubans infected sexually is estimated to range from 174 in 1991 to 401 in 2000.

Conclusions A consistent increase in the number of sexually infected HIV-positive individuals in Cuba from 1991 to 2000 is evident from the estimates. From 1996 onwards more sexually active homosexual/bisexual contacts were traced and consequently more sexually-infected HIV-positives were detected. A consequence of increased detection is the levelling off and subsequent decrease in the number of unknown HIV-positives during this time period. The estimation procedure is useful in estimating prevalent population sizes of epidemiological and public health interest.

Keywords Cuba, epidemiology, HIV/AIDS, Latin America, sexual contact, Bayes statistics, contact tracing

Accepted 11 January 2002


    Introduction
In December 2000, the United Nations AIDS Programme (UNAIDS) and the World Health Organization (WHO) Report on the Global HIV/AIDS Epidemic [1] gave an estimate of 36.1 million people living with human immunodeficiency virus (HIV) worldwide, of whom 390 000 live in the Caribbean region. Given the large number of HIV infections in the region (2.3% adult prevalence rate), it is remarkable that Cuba, with a population of 11 million, had recorded only 3230 cases of HIV-seropositivity by the end of 2000. Prominent among the many reasons for the low prevalence is the National Programme on HIV/AIDS established by the Cuban government in 1983 [2,3]. Initially the programme consisted mainly of testing blood donations and a hospital surveillance system to detect AIDS-related illness. The first HIV-seropositives were detected at the end of 1985. In 1986, when the first AIDS case was diagnosed, an HIV screening programme was initiated for travellers to countries with reported cases of HIV infection. The programme was systematically expanded to include various groups such as patients with sexually transmitted diseases (STD), pregnant women, and hospital patients. Roughly 23 million tests were performed between April 1986 and December 2000, and those with confirmed HIV-seropositivity were treated and placed in controversial sanatorium facilities (see, e.g. [2,4]). The sanatorium programme was designed to reduce the probability of transmission by HIV-positive individuals and has gone through several phases. Currently, after their initial admission to a sanatorium, most HIV-infected residents are permitted to return voluntarily to their communities following an evaluation and treatment period.

Another important aspect of the Cuban national AIDS programme is the ‘Partner Notification Programme’ (PNP), based on contact tracing and screening of the sexual partners of known HIV-positives. This also began in 1986, and the idea is to search out asymptomatic carriers before they develop AIDS. Indeed, the result is impressive given that 55% of those detected with HIV in Cuba have not developed AIDS. General screening and subsequent admission to a sanatorium have been reduced in recent years due to economic constraints and an evolving sanatorium policy, and so the PNP has gradually assumed added importance. Since approximately 90% of the reported AIDS cases in Cuba by the end of 1997 were acquired by sexual (hetero-, homo-, or bisexual) contact [5], the number of HIV-positives detected via contact tracing should give a good indicator of the size of the HIV-infected population in general. Furthermore, recent growth in tourism has led to a re-emergence of prostitution in the last few years [6]. Perhaps understandably, recent data have shown an increase in the number of HIV-positives, starting in 1996 [7,8]. Hence it is especially worthwhile from the public health point of view to focus our attention on estimating the population size of HIV-positives infected through sexual activity.

Recent AIDS data have shown that approximately 14% of AIDS cases were unknown to the Health Authority before developing AIDS symptoms. To obtain an estimate of the size of the unknown HIV-positive population, de Arazoza et al. [7] and Lounes and de Arazoza [8] applied a mathematical model to compute the theoretical numbers of the known and unknown HIV-positives infected by sexual contact in Cuba. Their results indicate that roughly 20% to 30% of the HIV asymptomatic carriers have not been detected. Recently a ‘generalized removal model for open populations’ was proposed by Hsieh et al. [9], which uses an empirical Bayes approach to estimate the number of HIV-infected people in a hidden, hard-to-count population without any knowledge of the population size. The method was employed in a preliminary study of recent trends in HIV infections in Cuba [10]. Here we use this method to estimate the known and unknown numbers of HIV-positives in Cuba infected via sexual contact. We will make use of HIV seroprevalence data from the PNP from 1991 to 2000. In contrast to Hsieh et al. [9], we have additional information in the Cuban data set, i.e. the total number of known individuals in the HIV-infected population at each sampling time. We will employ this additional knowledge in our priors to improve our estimates.

The paper is organized as follows. We briefly describe the data and the statistical method used. We then give the results of our estimates, followed by concluding remarks and comments. Statistical details are given in the Appendix.


    Data and Methods
Cuban partner notification HIV seroprevalence data
Ever since the emergence of the global HIV pandemic, contact tracing has been debated as a control measure for HIV/AIDS [11,12]. In Cuba, the PNP traces the sexual contacts of known HIV-seropositives [2,3,4,7,8]. The programme is carried out by Epidemiology Departments at all levels of the Cuban health system. Following partner notification, sexual contacts are interviewed, then tested for HIV every 3 months for a period of one year after the last sexual contact with an HIV-positive person. They are observed as long as they remain in contact. We wish to estimate the number of HIV-positives, infected through sexual contact, by using the HIV seroprevalence data of the PNP. For our purpose the data were treated statistically as if they were a random sample of all sexually active individuals in Cuba, although clearly the method of their ascertainment inevitably introduced some selection bias.

One of the main focuses of this study is to obtain an estimate for the number of unknown HIV-positives in the sexually active population in recent years who have not developed AIDS. Table 1 gives the accumulated yearly number of known HIV-positives living in Cuba at the end of each year from 1991 to 2000. Table 2 gives the seroprevalence data from 1991 to 2000, with the number of contacts tested, the number of HIV-positives detected, and the percentage of positive tests listed for each year.


Table 1 Number of known human immunodeficiency virus (HIV)-infected individuals living in Cuba at the end of the years 1991–2000
 

Table 2 The Partner Notification Programme seroprevalence data, 1991–2000
 
A recent study gives an estimate of the incubation period for HIV-infected Cubans at roughly 9 years [8]. Consequently, most of those HIV-positives who were infected in or before 1990 had already developed AIDS by 2000. Therefore we choose to use only the prevalence data from 1991 to 2000. Note that the number of known HIV-positives includes all modes of transmission. Also note the sharp increase starting in 1996 in all three sets of numbers. The increase is caused by the large numbers of homosexual and bisexual HIV-positives detected since 1996. These people tend to be more sexually active and have a larger number of sexual contacts. Consequently, more of their contacts have been traced and tested positive since 1996.

Methods
The ‘generalized removal model for open populations’ [9] proposed recently allows only recruitment (of new HIV-infected individuals) and deaths (removal of HIV-infected individuals due to development of AIDS) to occur during the sampling. In this paper, where the number to be estimated is the yearly number of HIV-infected individuals within the sexually active population in Cuba, there is no recapture of those HIV-positive individuals detected in previous samplings, since it is reasonable to assume that those who tested positive will not be tested again. Hence the removal model is the appropriate choice of model to work with. In each sample, a number of subjects (in our case, those with recent contact with known HIV-positives) are selected for testing. Moreover, the sample-taking would exclude anyone who has already developed AIDS symptoms, hence the estimate we obtain is the number of HIV-infected individuals who have not developed AIDS. This does not hinder public health assessment of the AIDS scenario because the size of the population with AIDS symptoms can be easily counted from clinical records.

Since it is not possible to obtain a valid estimate of the HIV-infected population using maximum likelihood estimation, we propose a Bayesian estimation procedure. Bayesian inference of a population size for various models has been proposed in the literature (e.g. [13–15]). The detailed derivation of the model is given in Hsieh et al. [9]. In contrast to Hsieh et al., there is extra information in this data set for Cuba. Namely, we know the total number, R_j, of known subjects in the HIV population infected by sexual contact (i.e. 90% of the known number of HIV-positives) just before time t_j. Intuitively, we must have N_j ≥ R_j; i.e. the number of HIV-infected subjects known to the health authority cannot exceed the number of all HIV-infected subjects. We will make use of this additional knowledge in our priors. Using this extra information, our posterior estimates could provide more reasonable approximations.

Statistical details of the method are given in the Appendix. An empirical Bayes analysis of the model is implemented using the Gibbs sampler, a Markov chain Monte Carlo (MCMC) method. Detailed discussions can be found in Casella and George [16]. The Bayes estimates are based on Monte Carlo samples from a Gibbs sampler run of 6000 iterations after 2000 burn-in, selecting every 5th sampled value.
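The burn-in and thinning scheme described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes that "6000 iterations after 2000 burn-in" means 2000 discarded draws followed by 6000 further draws, the helper names `thin_chain` and `summarize` are hypothetical, and the chain itself is a synthetic placeholder rather than output from the sampler used in the paper.

```python
import random
import statistics

def thin_chain(chain, burn_in, step):
    """Discard the first `burn_in` draws, then keep every `step`-th draw."""
    return chain[burn_in::step]

def summarize(draws):
    """Median, mean and equal-tailed 95% interval of a list of draws."""
    # With n=40 the cut points fall at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40)
    return {
        "median": statistics.median(draws),
        "mean": statistics.fmean(draws),
        "lower": cuts[0],
        "upper": cuts[-1],
    }

# Synthetic placeholder chain (assumption): 2000 burn-in draws followed by
# 6000 further iterations. These values are NOT results from the paper.
rng = random.Random(0)
chain = [rng.gauss(1000.0, 50.0) for _ in range(8000)]

kept = thin_chain(chain, burn_in=2000, step=5)
summary = summarize(kept)
print(len(kept))  # 6000 / 5 = 1200 retained draws under this reading
```

Under this reading, each reported median and 95% interval would rest on 1200 retained draws per quantity.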


    Results
We are interested in estimating the number of HIV-positives infected sexually in Cuba during the years 1991 to 2000 using the annual HIV seroprevalence data of the PNP during the same period (as given in Table 2). In the framework of the generalized removal model, we let the sexually active population in Cuba be the hidden and hard-to-count population in question. Since the fraction of sexually active males is more than three times that of the females [5], this population group consists mostly of males aged 15–49, both heterosexual and homo-bisexual, which account for 90% of all HIV-positives in Cuba. Arazoza et al. [7] gave a mean estimate of the HIV incubation time, i.e. the time from HIV infection to symptomatic AIDS (when patients usually die within a year), of HIV-positives in Cuba of 9.2 years (SD 0.273, median 8.42 years, 95% CI: 7.5–8.83). For our work, we defined the yearly survival rate φ for HIV-infected individuals (i.e. the mean probability that an infected individual will not develop AIDS during the 12 months between samples) as 93%. A yearly survival rate of 93% implies that the survival probability after about 9.5 years is approximately 50% (0.93^9.55 ≈ 0.5), resulting in a median survival time of approximately 9.5 years. For a detailed discussion on the choice of a constant survival rate, see Hsieh et al. [9]. For the yearly number of known HIV-positives by sexual contact, R_j, we use the yearly number of known HIV-positives in Table 1, multiplied by 90%, since data have shown that approximately 90% of all HIV-positives in Cuba are infected sexually [5]. Table 3 lists the median, mean, standard error, and 95% CI for N_j obtained from the 2.5% and 97.5% quantiles.
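The link between a constant yearly survival rate of 93% and a median survival time of roughly 9.5 years can be checked with a short calculation. The Python sketch below solves φ^t = 0.5 for t; the function name `median_survival_years` is hypothetical and is introduced only for illustration.

```python
import math

def median_survival_years(yearly_survival):
    """Time t (in years) at which yearly_survival ** t equals 0.5."""
    if not 0.0 < yearly_survival < 1.0:
        raise ValueError("yearly_survival must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(yearly_survival)

phi = 0.93  # yearly survival rate used in the text
t_half = median_survival_years(phi)
print(round(t_half, 2))  # about 9.55 years
```

The same function shows how sensitive the implied median is to the assumed rate: a slightly higher or lower φ shifts the median survival time by more than a year.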


Table 3 Estimated number of human immunodeficiency virus (HIV)-infected individuals by sexual contact in Cuba, 1991–2000
 
For the purpose of comparing our result with other studies, we consider the following. In Figure 1, we plot our estimate of the total number of HIV-positives infected by sexual contact using a 93% survival rate, with the 95% CI, against 90% of the number of known HIV-positives in Table 1. The estimates are much higher, mainly due to the discrepancy caused by the unknown HIV-positives unaccounted for in the government data. Our result also shows that there is a consistent increase in the number of sexually infected HIV-positives in Cuba throughout the decade, despite the relatively unchanged number in the known data from 1992 to 1995, and the sharp increase from 1996 to 2000. To further understand our result in the context of public health concerns, we consider the estimate in terms of undetected HIV-positives.



Figure 1 The known number of human immunodeficiency virus (HIV)-positives by sexual contact in Cuba versus the estimated total (known + unknown) number of HIV-positives by sexual contact from 1991 to 2000 by generalized removal model

 
Arazoza et al. [7] used a mathematical model to compute the numbers of known and unknown HIV-positives by sexual contact in Cuba. Here we compute the estimated number of unknown HIV-positives in Cuba by subtracting 90% of the known HIV-positives (which approximates the number of known HIV-positives by sexual contact) from our estimated total number in Table 3. Estimate I in Table 4 is our estimate of unknown HIV-positives. Estimate II is the estimate by Arazoza et al. [7]. Our results show that the increase in the number of unknown HIV-positives starts to level off in 1996, resulting in a decrease in the number of unknown HIV-positives since 1998. This is consistent with the sharp increase in the number of HIV-positive contacts detected since 1996.
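The subtraction behind Estimate I can be written out as a small Python sketch. The function name `unknown_positives` is hypothetical. The inputs 714 (estimated total) and the result 174 (unknown) are the 1991 figures quoted in the abstract; the value 600 for the known count is an assumption back-calculated from those two figures (714 − 174 = 540 = 0.9 × 600) and is not a number read from Table 1.

```python
def unknown_positives(estimated_total, known_all_modes, sexual_fraction=0.9):
    """Estimated total minus the sexually infected share of known cases."""
    known_sexual = sexual_fraction * known_all_modes
    return estimated_total - known_sexual

# 1991 illustration: 714 is the median estimate quoted in the abstract;
# 600 is a back-calculated, hypothetical known count (see lead-in).
example = unknown_positives(estimated_total=714, known_all_modes=600)
print(round(example))
```

Because the known count enters with a fixed factor of 0.9, any error in that assumed sexual-transmission share carries over directly into the estimate of unknown cases.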


Table 4 Estimated number of unknown HIV-infected individuals by sexual contact living in Cuba during the years 1991–2000
 
We give the plots of our estimate of unknown HIV-positives and the theoretical estimate from Arazoza et al. [7] in Figure 2. Contrary to the theoretical estimate of Arazoza et al. [7], our estimate clearly shows that a sharp increase in the number of unknown HIV-positives starting in 1992 was brought under control by the large number of detections from 1996 onwards. Note that this change of trend is not captured by the estimate in Arazoza et al. [7], mainly because the model in Arazoza et al. is a deterministic differential equations model which yields an estimate that is essentially a smooth curve over the time period considered.



Figure 2 The number of unknown HIV-positive people by sexual contact in Cuba from 1991 to 2000. The solid line is the estimate by the generalized removal model and the broken line is the theoretical number computed by ref. 7

 

    Discussion and Concluding Remarks
Our results show that, in agreement with the seroprevalence data, there is an increase in the number of HIV-positives in Cuba from 1996 to the present. The increase is consistent with the increase in the first half of the 1990s. Among the many possible underlying causes of the sharp increase in the government data is the fact that many more homosexual/bisexual contacts were traced and subsequently more HIV-positives detected from 1996 onwards. As a consequence, the number of unknown HIV-positives levelled off and has actually decreased since 1998.

We wanted to estimate the number of HIV-positives in the sexually active population (both heterosexual and homo-bisexual). Hence we treat the traced sexual contacts of known HIV-positives as a random sample of the sexually active population, since they evidently are sexually active. However, as those traced and tested are known to have had contact with at least one HIV-positive in the past, the capture probability might be higher than it would be in a truly random sample. Consequently, this may result in some overestimation of the true numbers. On the other hand, it is generally unknown whether the contacts are made when the HIV-positive person is already infective, due to variance in infection time and progression of disease. Therefore the actual effect of this uncertainty on the estimate is not clear. In this respect, a possible future research direction is to improve the method by considering the detailed individual contact tracing data of the HIV-positives in Cuba. That would require a much more complicated and difficult model, which is beyond our scope.

A full discussion of the advantages as well as the drawbacks and limitations of the generalized removal model is given in Hsieh et al. [9]. It suffices to point out the difficulty in obtaining information regarding hidden and elusive populations such as the sexually active population in a society. In practice, the dilemma has proved to be even more challenging in the context of the HIV epidemic. This work and Hsieh et al.'s previous paper [9], in which we estimated the number of HIV-infected people in elusive, hard-to-count population groups, demonstrate the usefulness of the generalized removal model, not just in estimating HIV-infected population sizes, but for any prevalent population size of epidemiological and public health interest. As long as two or more (non-overlapping) random samplings of the prevalence data are obtained, one can use them to make inferences about the prevalent population size.


    Appendix
Statistical details
We consider a sequence of s samples taken from the seroprevalence data of the Partner Notification Programme. Let t_j be the time when the jth sample is taken and let B_j be the number of HIV-positive individuals newly infected sexually between time t_j and time t_{j+1}. Assume also that all subjects in the HIV-infected population just before time t_j who have not been detected in the first j − 1 samples have the same capture probability P_j in the jth sample. We define N_j to be the total number of subjects in the HIV-infected population just before time t_j, so that N_j = B_0 + ... + B_{j−1}.

The likelihood function can be obtained as follows:


((1))
where D = {u_1, ..., u_s}, B = (B_0, ..., B_{s−1}), P = (P_1, ..., P_s), and u_j is the number of individuals infected with HIV by sexual contact who are captured in the jth sample. Therefore, M_{j+1} = u_1 + ... + u_j is the number of observed HIV-infected individuals in the first j samples. We call this model a generalized removal model for open populations [9] because the observed HIV-infected individuals, M_j, j = 1, 2, ..., are removed from the (open) sexually active population at each sampling occasion.

Suppose that the prior distribution of (N, P), where N = (N_1, ..., N_s), is given by π(N, P) = π(N_1, ..., N_s) π(P). This asserts that N and P are a priori independent. We assume that the P_j are a priori independent and follow a Beta distribution Be(γ_1, γ_2). In addition, let

((2))
where I_{S_j}(·) is an indicator function of N_j and S_j is the set of N_j with N_j ≥ R_j.

The assumption on the prior of N in (2) is appropriate since intuitively N_j must be at least as large as R_j.

Such priors lead to conditional posteriors of the forms:


((3))


((4))
where N_{(−j)} denotes the vector N with N_j deleted. (N_j − M_{j+1}) follows a truncated negative binomial with parameters u_{j+1} and P_j, and max{N_{j−1}, R_j, M_{j+1}} ≤ N_j ≤ N_{j+1}. Moreover, we know that N_1 ≥ R_1. Subsequently one can easily implement the Gibbs sampler to generate (N_j − M_{j+1}) from the truncated negative binomial in Equation (4), and therefore the estimates of N_j can be obtained. Since there are AIDS-related deaths during the process, we define the yearly survival rate specific to an HIV-infected individual between the (j−1)th and jth sample to be φ. The conditional expectations of M_{j+1} and N_{j+1} for the (j+1)th sample given M_j (the number of distinct HIV-infected individuals captured in the first j−1 samples) and N_j (the total number of subjects in the HIV-infected population just before time t_j), respectively, are

((5))
We assume that the natural death-rate of the sexually active population during this time period is negligible compared to the AIDS-related death rate since the majority of the subjects in question are aged 15–49 when the natural mortality is low.
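The Gibbs step described above requires draws from a negative binomial distribution restricted to a bounded range. The Python sketch below shows one generic way to do this by weighting the finite support with the untruncated probabilities. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the negative binomial is parameterized as the number of failures before the r-th success with success probability p, and how r, p and the bounds map onto the quantities in Equation (4) is left open.

```python
import math
import random

def neg_binomial_log_pmf(k, r, p):
    """log P(K = k), K = failures before the r-th success, success prob p."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def sample_truncated_neg_binomial(r, p, lo, hi, rng):
    """Draw K from NegBin(r, p) conditioned on lo <= K <= hi."""
    if not (r > 0 and 0.0 < p < 1.0 and 0 <= lo <= hi):
        raise ValueError("need r > 0, 0 < p < 1 and 0 <= lo <= hi")
    support = range(lo, hi + 1)
    log_w = [neg_binomial_log_pmf(k, r, p) for k in support]
    peak = max(log_w)  # subtract the maximum to avoid underflow
    weights = [math.exp(lw - peak) for lw in log_w]
    return rng.choices(support, weights=weights, k=1)[0]

# Illustrative values only (assumptions, not taken from the paper's data).
rng = random.Random(42)
draws = [sample_truncated_neg_binomial(r=30, p=0.2, lo=100, hi=160, rng=rng)
         for _ in range(500)]
print(min(draws), max(draws))
```

Enumerating the support is practical here because the bounds in the text (the larger of the previous estimate, the known count and the cumulative detections from below, the next estimate from above) keep the range finite.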


    Acknowledgments
 
Ying-Hen Hsieh, Shen-Ming Lee and Cathy WS Chen are supported by grants from the National Science Council of Taiwan. The authors wish to thank the Sanatorium for Persons Living with HIV in Santiago de Las Vegas, Havana, Cuba for the data used in this work.


    References
1 Joint United Nations Programme on HIV/AIDS and World Health Organization. AIDS Epidemic Update: December 2000. Geneva: UNAIDS/WHO, December 2000.

2 Granich R, Jacobs B, Mermin J, Pont A. Cuba's National AIDS Program: The First Decade. West J Med 1995;163:139–44.

3 Pérez Avila J, Peña Torres R, Joanes Fiol J, Lantero Abreu M, Arazoza Rodriguez H. HIV control in Cuba. Biomed Pharmacother 1996;50:216–19.

4 Swanson JM, Gill AE, Wald K, Swanson KA. Comprehensive care and the sanatorium: Cuba's response to HIV/AIDS. J Assoc Nurses AIDS Care 1995;6:33–41.

5 Joint United Nations Programme on HIV/AIDS and World Health Organization. Epidemiological Fact Sheet on HIV/AIDS and Sexually Transmitted Diseases: Cuba. Geneva: UNAIDS/WHO, June 1998.

6 Burr C. Assessing Cuba's approach to contain AIDS and HIV. Lancet 1997;350:647.

7 de Arazoza H, Lounes R, Hoang T, Interlan Y. Modeling HIV epidemic under contact tracing: The Cuban case. J Theoretical Medicine 2000;2:267–74.

8 Lounes R, de Arazoza H. A two-type model for the Cuban national programme on HIV/AIDS. IMA J Math Appl Med Biol 1999;16:143–54.

9 Hsieh YH, Chen CWS, Lee SM. Empirical Bayes approach to estimating the number of HIV-infected individuals in hidden and elusive populations. Stat Med 2000;19:3095–108.

10 Hsieh YH, Chen CWS, Lee SM, de Arazoza H. On the recent sharp increase in HIV infections in Cuba. AIDS 2001;15:426–28.

11 Burr C. The AIDS exception: Privacy vs public. Atlant Month June 1997;279:57–67.

12 Rutherford G, Woo J. Contact tracing and the control of human immunodeficiency virus. JAMA 1988;259:3609–10.

13 Lee SM, Chen CWS. Bayesian inference of population size for behavioral response models. Statistica Sinica 1998;8:1233–47.

14 Chen CWS, Lee SM, Hsieh YH, Ungchusak K. A unified approach to estimating population size of births only model. Computational Statistics and Data Analysis 1999;32:29–46.

15 George EI, Robert CP. Capture-recapture estimation via Gibbs sampling. Biometrika 1992;79:677–83.

16 Casella G, George EI. Explaining the Gibbs sampler. American Statistician 1992;46:167–74.




