1 Division of Medical and Molecular Genetics, Guy's, King's and St Thomas' School of Medicine, London SE1 9RT, 2 ARC Epidemiology Research Unit, University of Manchester, Manchester M13 9PT, 3 Department of Applied Statistics, University of Reading, Reading RG6 2FN and 4 School of Biology, University of Leeds, Leeds LS2 9JT, UK
Abstract
Key words: genetics/mathematical modelling/misdiagnosis/preimplantation genetic diagnosis
Introduction
Previous work on modelling of errors in PGD (Navidi and Arnheim, 1991) considered three sources of error: (i) analysing an anucleate cell, (ii) contamination and (iii) non-amplification of alleles. This model was used to explore the magnitude of error rates, and the role of such errors in classifying embryos, but is now known to be a simplification of the laboratory process. We have extended this model as follows. Firstly, in each error category of Navidi and Arnheim, we use a more detailed model for the variability of cell biopsy and amplification. Secondly, our model incorporates a marker locus which is linked to the disease locus. The cell genotypes at the disease locus and the marker locus can then be used jointly to predict embryo disease genotype, and to determine whether the embryo should be replaced.
Materials and methods
The disease genotype of the embryo fully determines disease risk, but alleles from the linked marker can identify whether errors have occurred in the biopsy and amplification process. For example, if only a single marker allele is amplified, we may suspect that the cell was not diploid. If the marker genotypes are discordant with the disease genotype, contamination may have occurred. However, the marker genotype is itself subject to error through recombination between marker and disease loci or through non-amplification of marker alleles.
We have constructed a model to describe the process of cell biopsy and amplification which, for a given embryo genotype, determines the probability of each cell genotype that may arise. The model has four different components: cell chromosomes, recombination, contamination and amplification, described below and summarized in Figure 1. For each component, we give values from a core parameter set which is used throughout the paper. The examples given in this section assume a recessive disease with disease and marker genotypes as in Table II.
These probabilities are constrained to sum to 1 (= p0 + p1 + p2 + p3 + p4), and the subscript 0, 1, 2, 3, 4 denotes the total number of disease alleles present in the cell. This effectively accommodates chromosomal aneuploidy since triploid and haploid cells are equivalent at the chromosomal level to trisomic and monosomic cells, respectively. Triploids are rare, and initial modelling indicated that trisomic cells had little effect on model outcomes. For all following work, we assume that no triploid cells arise (p3 = 0). In the core parameter set, p2 = 0.8, p4 = 0.1, p3 = 0, p1 = 0.05, p0 = 0.05 are used.
Recombination
The recombination fraction, r, is the probability that a recombination has occurred between the marker and disease loci. These occur independently in each parent, and determine which marker alleles are inherited with the disease genotype. Consider a diploid cell with genotype aa for a recessive disease. If no recombination has occurred in either parent, the marker genotype will be 13, which occurs with probability (1 - r)². Similarly, the probabilities of a recombination in one or both parents are 2r(1 - r) and r², respectively. A recombination fraction of 0.001 is used in the core parameter set.
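The three recombination outcomes can be checked numerically. The following is a minimal sketch, not the authors' implementation; the function name `recombination_outcomes` and the outcome labels are chosen here for illustration only.

```python
def recombination_outcomes(r: float) -> dict:
    """Probabilities that neither, exactly one, or both parental
    chromosomes are recombinant, assuming independence between parents."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return {
        "none": (1 - r) ** 2,          # marker genotype as expected
        "one_parent": 2 * r * (1 - r),  # one of the two parents recombined
        "both_parents": r ** 2,         # both parents recombined
    }


# Core parameter set: r = 0.001
probs = recombination_outcomes(0.001)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.6f}")
```

With r = 0.001, almost all of the probability mass (about 0.998) falls on the no-recombination outcome, which is why the recombination fraction has little influence on the results at this value.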
Contamination
We allow for two sources of contamination: operator contamination and product (carry-over) contamination. Operator contamination consists of a disease genotype and a marker genotype. Genotypes are randomly selected using the population frequencies for the alleles at these loci, assuming that the operator is not affected with the disease. Thus for a dominant disease, the operator must be genotype aa, whereas for a recessive disease, the operator may be a carrier of the mutation under test at the disease locus (genotype Aa) or may be homozygous for the normal allele (AA). Product contamination consists of one disease allele and one marker allele. We assume that the product is from a previous analysis of the same disease, so the disease allele is randomly chosen from alleles present in the parents and the marker allele is randomly chosen from the population.
The contamination is controlled by three probabilities: cn for no contamination, co for operator contamination, and cp for product contamination (with cn, co and cp summing to 1). Contamination probabilities of cn = 0.95, co = 0.025, cp = 0.025 are assumed. Values are also needed for the allele frequency of the disease mutation (q = 0.01, approximating cystic fibrosis or spinal muscular atrophy) and the marker allele frequencies (f1, f2, f3, f4 for alleles 1, 2, 3, 4, set to 0.1, 0.4, 0.1 and 0.4 in the core parameter set, with the less common alleles linked to the mutation). These parameters define the probability distribution for any contamination genotype. For example, in a recessive disease, product contamination with A2 will occur with probability cp × f2/2.
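As an illustration of the product-contamination distribution, the sketch below enumerates every (disease allele, marker allele) pair. It assumes, for the recessive case, that both parents are carriers (Aa), so that the disease allele is A or a with probability 1/2 each; the function and variable names are hypothetical.

```python
# Core parameter set from the text
C_PRODUCT = 0.025                                        # cp
MARKER_FREQ = {"1": 0.1, "2": 0.4, "3": 0.1, "4": 0.4}   # f1..f4


def product_contamination(parent_alleles=("A", "a", "A", "a")):
    """Joint probability of each product-contamination genotype.

    The disease allele is drawn uniformly from the parental alleles
    (assumption: two carrier parents), and the marker allele is drawn
    from the population frequencies.
    """
    dist = {}
    for allele in parent_alleles:
        for marker, freq in MARKER_FREQ.items():
            key = allele + marker
            dist[key] = dist.get(key, 0.0) + C_PRODUCT * freq / len(parent_alleles)
    return dist


dist = product_contamination()
print(f"P(product contamination with A2) = {dist['A2']:.4f}")
```

The entry for A2 equals cp × f2/2, and the entries sum to cp, the total probability of product contamination.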
Amplification
At the next stage of the model, the genotypes from the biopsied cell and from any contamination are pooled by adding together the number of copies of each allele present. For example, a cell genotype of aa13, together with product contamination of A2, will give two copies of allele a, one copy of allele A, and one copy each of marker alleles 1, 2 and 3. Each allele has the potential to be amplified under the PCR, and we allow different amplification probabilities for disease alleles (probability d) and marker alleles (probability m). The genotype of the cell can only distinguish presence or absence of any allele, with no information on the number of copies present. With the alleles aaA123 above, allele A is present in the final genotype if it is amplified, which occurs with probability d. Allele a is present if at least one of the two copies is amplified, which has probability 1 - (1 - d)². The amplification of the marker alleles is dealt with similarly. In the core parameters, d = 0.9 for disease allele amplification and m = 0.7 for marker allele amplification.
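The pooling and amplification step for the aa13 plus A2 example can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and the final product assumes that alleles amplify independently of one another.

```python
from collections import Counter
from math import prod

D, M = 0.9, 0.7  # core amplification probabilities (disease, marker)


def pool(cell: str, contamination: str = "") -> Counter:
    """Add together the copies of each allele from cell and contaminant."""
    return Counter(cell) + Counter(contamination)


def detection_probability(copies: int, p: float) -> float:
    """Probability that at least one of `copies` copies is amplified."""
    return 1 - (1 - p) ** copies


pooled = pool("aa13", "A2")  # two copies of a; one each of A, 1, 2, 3
detect = {
    allele: detection_probability(n, D if allele in "Aa" else M)
    for allele, n in pooled.items()
}

# Probability that every pooled allele appears in the final genotype,
# under the independence assumption stated above.
all_detected = prod(detect.values())
print(detect, round(all_detected, 6))
```

Here allele a is detected with probability 1 - (1 - d)² = 0.99, allele A with probability d = 0.9, and each marker allele with probability m = 0.7.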
Misdiagnosis probabilities
Two types of misdiagnosis probabilities exist: an affected embryo could be classified as unaffected, and therefore replaced, or an unaffected embryo could be classified as affected, and consequently discarded. We will focus on the first, more serious, error. We assume that an embryo will only be replaced if it has an unaffected disease genotype and supporting marker genotypes. Any genotype with evidence of recombination, haploid cells, contamination or non-amplification will be discarded. For a recessive disease, embryos of genotype Aa14, Aa23, AA24 would be replaced, as the disease genotype is unaffected and the marker genotype is concordant with the disease genotype. For a dominant disease, embryos of genotype aa23 and aa24 will be replaced. A misdiagnosis error occurs if the cell genotype is unaffected but the embryo is affected. For example in a recessive disease, an Aa14 cell genotype implies that the embryo is unaffected, but it can also occur with an anucleate cell and contamination, or from a haploid cell with contamination. We classify this as a misdiagnosis error: replacing an affected embryo because the cell genotype appeared to be from an unaffected embryo.
Results
We performed a sensitivity study, varying each parameter around its value from the core parameter set, to determine the effect on the misdiagnosis error. Error probabilities were calculated assuming that both disease and marker genotypes were tested and, for comparison, assuming that only the disease genotypes were tested (Table III). Separate sections of Table III correspond to each set of parameter values, and the results from the core parameters are repeated (in bold type) in each section, for ease of comparison.
The variation in misdiagnosis error depends on whether the marker locus is tested. With an informative linked marker, the misdiagnosis error rates are stable across all parameter values tested and remain below 1% (Table III, columns 1 and 3). For a recessive disease, the largest error rate is 0.87%, which occurs when the total probability of contamination is 10% (= co + cp). The recombination fraction, disease frequency and cell probabilities have very little effect on the error rates. For a dominant disease, increasing error rates are found with an increase in haploid cells (error rate of 0.26%), or an increase in the recombination fraction (error rate 0.28%), but the misdiagnosis error rate remains below 0.5% for all tested values. In parallel with the recessive disease, doubling the contamination rate to 10% doubles the error probability to 0.2%.
Contamination has a major effect on error rates in both a dominant and recessive disease, and, if no contamination occurs, an affected embryo is very unlikely to be replaced. In the absence of contamination, errors in the cell genotype would arise from recombination, haploid or anucleate cells and lack of amplification, but they will rarely give rise to a cell genotype which is concordant with an unaffected embryo. Operator contamination is more likely than product contamination to result in an affected embryo being transferred; however, in practice, product contamination may occur more frequently.
The change in amplification rates can have a counter-intuitive effect on error rates, with a decrease in amplification leading to a lower error rate. For example, in the recessive disease, if the disease allele amplification probability is decreased from 0.9 to 0.7 and the marker allele amplification probability remains constant at 0.7, the error rate is decreased (from 0.44 to 0.36%). This occurs because contamination is less likely to be amplified, so the true genotype of the affected embryo may be obtained. This can occur for both dominant and recessive diseases, but the effect is small in each case.
When only disease genotypes are used, the misdiagnosis error rates rise substantially for all parameter values tested. The smallest error rate for a recessive disease occurs when no haploid cells arise (3.6%), or with no contamination (2.25%). The smallest misdiagnosis error rate for a dominant disease is 4%, for full amplification. These misdiagnosis error rates are high, and illustrate the dangers of relying on a single disease locus for identifying embryos which carry a high risk genotype. In a recessive disease, the cell chromosomes and contamination have the greatest effect on misdiagnosis errors. For a dominant disease, lack of amplification is the most important factor, followed by cell chromosomes.
Estimating contamination rates
Contamination levels can be estimated by testing a series of blank reactions for disease alleles or marker alleles. From the number of blanks tested, we can obtain an upper bound on the contamination level using the 95% confidence limit from a binomial distribution. A large number of blanks must be run in order to ensure that contamination levels are sufficiently low; for example, if no contamination is found in 25 blanks, this is still consistent with a contamination level of 11.3%. If one blank from 25 is contaminated, the true contamination rate could be as high as 20%. Table IV shows how the upper limit on the contamination rate depends on the number of blanks tested and on the number of contaminated reactions. The upper limit for the contamination rate falls only slowly with an increasing number of blanks: 300 blanks, all negative, are required to ensure that the contamination rate is <1%. A series of 28 negative blanks or one positive in 53 blanks would indicate the contamination rate is <10%. If the total contamination rate is ≤5%, misdiagnosis error rates (Table IV) are reasonable provided a marker is genotyped. We recommend a two-stage testing procedure to ensure that laboratory contamination remains below this level. Prior to clinical implementation, a large series of blanks (e.g. 100) should be run. Thereafter, a smaller series (e.g. 25 blanks) should be run regularly to ensure that contamination rates remain low.
One method to increase confidence in the embryo genotype is by repeat sampling, so that the final assessment of embryo genotype is based on genotypes from two cells. We assume the cells are genotyped independently, and therefore results may differ through the chromosomes present, contamination and amplification. Let the genotypes of the cells be C1 and C2, then the Bayesian analysis above becomes:
[Equation not reproduced in this copy]
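Under the stated assumption that the two cells are genotyped independently given the embryo genotype, a form consistent with Bayes' theorem would be the following. This is a reconstruction from the surrounding definitions rather than the authors' printed equation; the summation index e, ranging over the possible embryo genotypes, is notation introduced here.

```latex
P(E = aa \mid C_1, C_2)
  = \frac{P(C_1 \mid E = aa)\, P(C_2 \mid E = aa)\, P(E = aa)}
         {\sum_{e} P(C_1 \mid E = e)\, P(C_2 \mid E = e)\, P(E = e)}
```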
We will refer to P(E = aa | C) as the implantation error, which is the probability that an implanted embryo is affected. We can define a decision-making process to determine which embryos will be replaced. This classifies all cell genotypes (or pairs of genotypes for the two-cell analysis) into embryos that would be replaced or would be discarded. This requires a cut-off value: if the implantation error for a genotype is below the cut-off, the embryo will be replaced, and all other embryos will be discarded. For any replaced embryo, the probability that it is affected is therefore less than the cut-off. However, the cut-off also controls the proportion of unaffected embryos that are available for replacement. A highly stringent cut-off would give a negligible risk that a replaced embryo is affected, but could result in few unaffected embryos with sufficiently low implantation errors to be replaced.
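The decision rule can be sketched for a single cell as below. The prior probabilities are the Mendelian values for a recessive disease with two carrier parents; the likelihood values in `TABLE` are hypothetical numbers chosen for illustration and are not taken from the model or its tables. All function names are likewise assumptions.

```python
def implantation_error(likelihoods, priors, affected="aa"):
    """Posterior probability that the embryo is affected, given
    P(cell genotype | embryo genotype) for each embryo genotype."""
    evidence = sum(likelihoods[e] * priors[e] for e in priors)
    return likelihoods[affected] * priors[affected] / evidence


def decide(table, priors, cutoff=0.01):
    """Replace an embryo only if its implantation error is below the cut-off."""
    return {
        cell: "replace" if implantation_error(lik, priors) < cutoff else "discard"
        for cell, lik in table.items()
    }


# Mendelian priors for a recessive disease with two carrier parents
PRIORS = {"AA": 0.25, "Aa": 0.50, "aa": 0.25}

# HYPOTHETICAL likelihoods P(cell genotype | embryo genotype)
TABLE = {
    "Aa14": {"AA": 0.001, "Aa": 0.20, "aa": 0.002},  # disease + marker
    "Aa": {"AA": 0.05, "Aa": 0.60, "aa": 0.05},      # disease locus only
}

decisions = decide(TABLE, PRIORS, cutoff=0.01)
print(decisions)
```

Lowering the cut-off moves genotypes from "replace" to "discard", which is the trade-off between implantation error and embryo yield described above.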
Decision analysis for embryo diagnosis
This decision-making process was applied to embryos for recessive and dominant diseases, for one and two cells, using only the disease genotypes, or both marker and disease genotypes. For each analysis we have used an implantation error cut-off of 1%: only embryos with a risk of <1% of being affected will be replaced. Table V shows the associated probability that an unaffected embryo would be replaced, and the list of genotypes to be replaced is given (where possible).
For a dominant disease with marker and disease genotypes, 47% of unaffected embryos could be replaced in a single cell analysis and 85% when two cells are analysed. Using a less stringent cut-off allows us to classify aa2 genotypes as unaffected, and 66% of unaffected embryos would be replaced. However, the aa2 genotype has a 5% probability of arising from an affected embryo.
The decision analysis highlights the problems of decisions based on only the disease genotype. No embryo had a <1% probability of being affected. For a recessive disease, applying the model with the core parameter set shows that Aa and AA cell genotypes both have a 2% implantation error. Notice that the risks are similar for both genotypes: replacing embryos genotyped as AA, with no disease alleles present, does not reduce the implantation error. When two cells are analysed for the disease genotype, several genotype combinations for the two cells have an implantation error of <1%; 76% of unaffected embryos would be replaced. For single cell analysis of a dominant disease, 11% of replaced embryos would be affected. This is controlled mainly by the 10% probability that the disease allele A is present in the cell, but is not amplified. Analysing two cells can only reduce the probability of replacing an affected embryo to 1.4%. This occurs when aa genotypes are obtained for both cells, and allows 85% of unaffected embryos to be replaced.
Discussion
Modelling biological systems is a compromise between capturing the essential variability that will affect clinical decisions and maintaining parsimony so that the model can be explored under different assumptions. Our model contains several sources of variation (cell chromosomes, recombination, contamination, amplification) and each component is assumed to be independent (for example, presence of a haploid cell does not affect the amplification of the alleles present in the cell). Further refinements of the model could allow for different parental marker genotypes, or for differential amplification of disease alleles (mutation and wildtype). The current model is restricted in the marker genotypes of the parents, assuming that parents carry four different alleles, and no other alleles exist in the population. Homozygous parents can be modelled through a subset of the specified output genotypes. If further marker alleles exist, contamination would be easier to detect, and the misdiagnosis and implantation errors would fall.
The model requires detailed probabilities for all aspects of cell chromosomes, contamination and amplification. Determining accurate values for these parameters can require an unreasonably high number of experiments, and few published estimates are available. Data on the frequency of haploid, diploid or more complex mosaic cells can be obtained through FISH. Several studies using FISH have provided estimates of chromosomal abnormalities (Munné et al., 1994; Harper et al., 1995). A study on chromosome 7 using chromosome-specific probes found that at least one parental chromosome was absent in 6.5% of cells (Kuo et al., 1998). Most cells were diploid, but extensive mosaicism was seen (including chaotic and haploid cells). Our parameters assume that 90% of cells have both parental chromosomes (diploid and tetraploid cells) and 10% of cells lack at least one parental chromosome. These definitions of diploid, haploid, etc. refer only to the chromosome with the disease gene and no information is obtained on possible errors at other chromosomes.
We have used contamination rates of 5%, arising equally from the operator and from a previously amplified PGD product. This estimate was based on contamination rates of 6% in 226 (Ray et al., 1998) and of 3% in 98 samples (Wu et al., 1993). The source of contamination (operator or product) was not determined in these studies. Estimates for amplification rates are based on the ΔF508 mutation in cystic fibrosis: amplification occurred in 85% of 409 cells (Ray et al., 1998) and in 94% of 64 samples (Liu et al., 1993). Although individual laboratories may have confidence in the accuracy of their laboratory procedures, establishing that error probabilities are sufficiently low may require unreasonably large numbers of experiments.
We have investigated two types of error: a misdiagnosis error rate for the probability that an affected embryo produces an unaffected cell genotype, and an implantation error rate for the probability that a replaced embryo is affected. The second error is the more appropriate for making clinical decisions. Genotyping the disease locus in a single cell provides little information for classifying an embryo as affected or unaffected, and the implantation error rates can be high. In a recessive disease, 2% of replaced embryos would be affected. In a dominant disease, 11% of replaced embryos would be affected. These error rates could be reduced by assuming low contamination rates and high amplification rates, but these properties are difficult to measure and control in the laboratory. The implantation errors represent a substantial reduction on the probability of conceiving an affected embryo without PGD (25% for a recessive disease, 50% for a dominant disease), but may not be acceptable to clinicians or patients.
Genotypes from a second locus (a linked marker) or from a second cell can substantially reduce error rates, particularly for a dominant disease. For a recessive disease, a 1% implantation error rate is attainable from a single cell genotyped at the disease and marker loci. Genotyping two cells increases the proportion of unaffected embryos that will be transferred (from 68.5 to 75.9%), but removal of two cells may decrease the embryonic implantation rate. The optimum genotyping strategy for a recessive disease is not clear. For the dominant disease and single cell analysis with a marker, 47% of unaffected embryos would be replaced. This accounts for only one-quarter of all tested embryos and may be too low to ensure that several unaffected embryos are available from each PGD cycle. The embryo yield can only be increased by replacing embryos with a 5% probability of being affected. If marker and disease genotypes are tested in two cells, the proportion of unaffected embryos eligible for replacement increases to 85%. Analysing a dominant disease will always be more problematic than a recessive disease, since presence or absence of a single mutation defines the disease status of the embryo and, a priori, 50% of embryos will be affected. In summary, these results suggest that for a recessive disease, two genotypes are required to determine the embryo diagnosis: this can be based on marker and disease genotypes from a single cell, or disease genotypes from two cells. This may explain the apparently increased risk of a misdiagnosis in compound heterozygotes in cystic fibrosis, since independent analysis of two separate mutations alone (without a linked marker) would increase the probability of a normal result for at least one mutation. For a dominant disease, both disease and marker genotypes from two cells may be required to give sufficient unaffected embryos with a low probability of error.
Protocols for PGD of spinal muscular atrophy (SMA) have been developed (Dreesen et al., 1998; G.Daniels and A.H.Handyside, unpublished data), and error probabilities for this disease can be calculated from the model. SMA is caused by a deletion in the SMN gene and does not fit the recessive disease model since only one allele is amplified (the normal allele A). We can only distinguish between genotypes aa (when no normal allele is amplified) or Aa/AA when at least one copy of the normal allele is amplified. SMA has misdiagnosis rates that are slightly higher than those for the recessive disease with two disease alleles. It is possible to use the same stringent criteria for replacement as for the diallelic recessive disease, but fewer unaffected embryos will be available for replacement. With a linked marker, and an implantation error of 1%, 45% of unaffected embryos will be replaced in contrast to 68% for a diallelic recessive disease.
Further experience of PGD will be required to establish the adequacy of the model and the accuracy of the parameter values. However, it is clear that basing embryo diagnosis on two genotypes (from a linked marker or a second cell) ensures a high level of accuracy while providing sufficient unaffected embryos for transfer. In clinical practice, the choice to transfer a particular embryo would depend on its cell genotype, its quality, and on these properties for the other biopsied embryos. It is possible that an embryo with a slightly higher probability of being affected would be replaced in preference to a clearly unaffected embryo of low quality. The model can be used as a clinical decision-making tool. In a clinical setting, implantation errors can be calculated using laboratory- and disease-specific parameter values. Together with other relevant factors such as embryo quality, they will determine which biopsied embryos from a PGD cycle would be replaced, stored or discarded.
References
Geraedts, J., Handyside, A., Harper, J. et al. (1999) ESHRE preimplantation genetic diagnosis (PGD) consortium: preliminary assessment of data from January 1997 to September 1998. ESHRE PGD consortium steering committee. Hum. Reprod., 14, 3138–3148.
Gianaroli, L., Magli, M.C., Ferraretti, A.P. et al. (1999) Preimplantation diagnosis for aneuploidies in patients undergoing in vitro fertilization with a poor prognosis: identification of the categories for which it should be proposed. Fertil. Steril., 72, 837–844.
Handyside, A.H. and Delhanty, J.D. (1997) Preimplantation genetic diagnosis: strategies and surprises. Trends Genet., 13, 270–275.
Handyside, A.H., Kontogianni, E.H., Hardy, K. et al. (1990) Pregnancies from biopsied human preimplantation embryos sexed by Y-specific DNA amplification. Nature, 344, 768–770.
Harper, J.C., Coonen, E., Handyside, A.H. et al. (1995) Mosaicism of autosomes and sex chromosomes in morphologically normal, monospermic preimplantation human embryos. Prenat. Diagn., 15, 41–49.
Kuo, H.C., Ogilvie, C.M. and Handyside, A.H. (1998) Chromosomal mosaicism in cleavage-stage human embryos and the accuracy of single-cell genetic analysis. J. Assist. Reprod. Genet., 15, 276–280.
Liu, J., Lissens, W., Devroey, P. et al. (1993) Polymerase chain reaction analysis of the cystic fibrosis delta F508 mutation in human blastomeres following oocyte injection of a single sperm from a carrier. Prenat. Diagn., 13, 873–880.
Munné, S., Weier, H.U., Grifo, J. et al. (1994) Chromosome mosaicism in human embryos. Biol. Reprod., 51, 373–379.
Navidi, W. and Arnheim, N. (1991) Using PCR in preimplantation genetic disease diagnosis. Hum. Reprod., 6, 836–849.
Sermon, K. (1998) Diagnostic accuracy in preimplantation diagnosis single-cell PCR for mendelian disorders. In Kempers, R., Cohen, J., Haney, A.F. and Younger, J.B. (eds), Fertility and Reproductive Medicine. Proceedings of the XVI World Congress on Fertility and Sterility, San Francisco, 4–9 October 1998. Elsevier, Amsterdam, pp. 687–695.
Ray, P.F., Ao, A., Taylor, D.M. et al. (1998) Assessment of the reliability of single blastomere analysis for preimplantation diagnosis of the ΔF508 deletion causing cystic fibrosis in clinical practice. Prenat. Diagn., 18, 1402–1412.
Wu, R., Cuppens, H., Buyse, I. et al. (1993) Co-amplification of the cystic fibrosis delta F508 mutation with the HLA DQA1 sequence in single cell PCR: implications for improved assessment of polar bodies and blastomeres in preimplantation diagnosis. Prenat. Diagn., 13, 1111–1122.
Submitted on June 6, 2000; accepted on October 9, 2000.