Is quality assurance in semen analysis still really necessary? A spermatologist’s viewpoint

W.V. Holt

Institute of Zoology, Zoological Society of London, Regent’s Park, London NW1 4RY, UK

E-mail: bill.holt{at}ioz.ac.uk


    Abstract
 
In a provocative article in this Journal, Anne Jequier, an eminent andrologist who, more than 20 years ago, was a prime mover in suggesting the need for quality assurance (QA) in andrology laboratories, has now proposed that the QA schemes may no longer be needed. Here I reply to that proposition, largely by agreeing that, since the QA schemes have brought about higher technical standards in laboratories, Anne Jequier’s assertion is possibly true. However, vigilance is still needed to distinguish between investing time and energy unproductively in refining tests that may offer little information about fertility, and maintaining technical standards so that tests provide the requisite information where it is needed. Thus, although it may not matter in practice whether a sperm concentration is estimated as 100 or 200 × 10⁶/ml, distinguishing between 25 and 100 × 10⁶/ml would probably influence a clinician’s treatment decisions. Anne Jequier also suggested that sperm function tests have limited predictive value in terms of fertility assessment. While I agree that this is largely true at present, I also argue that these tests are probably not developed to their full potential. I am optimistic that tests to distinguish and quantify the population of fertilization-competent sperm within an ejaculate will eventually become available.

Key words: functional tests/quality assurance/semen analysis/sperm subpopulations


    Introduction
 
Although it may be customary in a debate to reply to a controversial proposal with a show of horrified disagreement, I find myself largely in agreement with Anne Jequier’s proposal that quality assurance (QA) is only needed in the andrology laboratory to maintain technical standards (Jequier, 2005). In saying this, I suggest that the value of ‘accuracy and precision’ in semen analysis has perhaps become confused with predictive value for diagnostic purposes. It is probably true therefore that investing a great deal of time and effort in ensuring that laboratories are able to distinguish between sperm counts of 50, 75 or 100 × 10⁶/ml does not really help the clinician to recommend a course of action or treatment. Anne Jequier based her arguments on pragmatism, recognizing that presenting a clinician with a set of accurate test results is only useful when the meaning of those results is understood. When the meaning is required in terms of fertility, it is clear from studies in a range of mammals (and humans are only another mammal) that laboratory tests have poor predictive power. The fertility of semen and sperm depends on the combined effects of many physiological attributes, some being more amenable to measurement, and indeed to treatment, than others (for review, see McLachlan, 2004). This emphasizes that semen assessment alone, as traditionally understood, has only limited prognostic value.


    Quality assurance and sperm counting
 
Quality control in the andrology laboratory typically, and most usefully, addresses the measurement of semen volume and sperm concentration. Technical standards for performing these measurements have undoubtedly improved since the national QA scheme was set up in the UK, initially by the British Andrology Society. When the scheme was established more than two decades ago, the technical shortcomings within some centres were striking; single samples of semen were given disparate assessments that could have spelt the difference between being a semen donor or a candidate for infertility treatment. Similar experiences were reported in other countries (e.g. Neuwinger et al., 1990). Higher standards are now routinely achieved and it is important that these are maintained through technician training, both in-house and through national organizations. Good research laboratories all over the world take for granted that routine laboratory procedures such as sperm counting are performed competently, and I see no good reason why technicians in andrology laboratories should be any less able in this respect than their peers elsewhere.

Sperm concentration measurements are routinely performed using one of several available designs of counting chambers or disposable slides. Of these, some have inherently higher reproducibility, and therefore better precision, than others and it is relevant to consider whether this is important in the context of the debate. Christensen et al. (2005) recently compared the accuracy and precision of four types of counting chamber: Makler chambers (Sefi Medical Instruments, Haifa, Israel), Thoma 50 and 100 µm deep chambers (Thoma haemocytometers; Hecht-Assistent, Sondheim, Germany) and Bürker-Türk haemocytometers (BT, Brand, Wertheim, Germany). The tests were performed using bull and boar semen and were replicated by including data from more than one technician. Importantly, the counts were calibrated against flow cytometric estimation, which is widely accepted as an independent and objective counting technique. These authors found that the Makler chamber showed the highest coefficients of variation (CV; 15–24%) and consistently underestimated the sperm concentrations by ~25%. In this study the CVs of the other chambers were clustered around 7–14%. An earlier study (Mahmoud et al., 1997) found that the Neubauer chamber produced a CV of ~7%.
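To make the two quantities discussed above concrete, the following minimal Python sketch computes a CV and a percentage bias from a set of replicate counts. The replicate values and the reference concentration are hypothetical and chosen only for illustration; they are not data from Christensen et al. (2005), and the function names are assumptions of this sketch.

```python
from statistics import mean, stdev


def cv_percent(counts):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100.0 * stdev(counts) / mean(counts)


def bias_percent(counts, reference):
    """Mean deviation from a reference value, as a percentage of the reference."""
    return 100.0 * (mean(counts) - reference) / reference


# Hypothetical replicate chamber counts (x 10^6/ml) of one sample.
replicates = [70, 80, 75, 65, 85]
# Hypothetical reference concentration (x 10^6/ml), e.g. from flow cytometry.
reference_concentration = 100.0

print(f"CV:   {cv_percent(replicates):.1f}%")
print(f"Bias: {bias_percent(replicates, reference_concentration):.1f}%")
```

A negative bias indicates systematic underestimation relative to the reference method, which is a separate problem from poor precision (a high CV).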

It seems obvious that if any technique is being undertaken in a laboratory then it is good practice to optimize its use and obtain the best quality data, and that therefore the lowest CV is desirable. However, what does a CV of 25% mean in practice? Given an ideal set of data that conforms to a statistically normal distribution, virtually all individual measurements (about 99.7%) fall within 3 SD above and below the mean. As CV is calculated as (SD × 100/mean), a CV of 25% for a semen sample whose actual sperm concentration is 100 × 10⁶/ml sets the SD at 25 × 10⁶/ml. The normal distribution could therefore include individual values in the range 25–175 × 10⁶/ml; compare this with a range of 79–121 × 10⁶/ml if the CV is only 7%.
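The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the calculation described above; the function name is an assumption, and the ±3 SD convention is the one used in the text.

```python
def three_sd_range(mean, cv_percent, k=3.0):
    """Return (low, high) = mean -/+ k * SD, where SD = CV * mean / 100."""
    sd = cv_percent * mean / 100.0
    return mean - k * sd, mean + k * sd


# True concentration of 100 x 10^6/ml, as in the example in the text.
for cv in (25, 7):
    low, high = three_sd_range(100.0, cv)
    print(f"CV {cv}%: {low:.0f}-{high:.0f} x 10^6/ml")
```

With these inputs the sketch gives 25–175 for a CV of 25% and 79–121 for a CV of 7%, matching the ranges quoted above.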

A number of studies have attempted to elucidate the relationship between sperm concentration and the likelihood of pregnancy, and it is widely understood that there is a positive, but complex, correlation. What seems a simple rule-of-thumb relationship was reported by Bonde et al. (1998), who conducted a prospective investigation of fertility in 430 couples to test the significance of various fertility-related parameters. They showed that a threshold concentration (40 × 10⁶ sperm/ml) marked a significant boundary; below this threshold sperm concentration correlated positively with pregnancy rate, but above it there was no particular relationship. This emphasizes that the clinical importance of carrying out sperm concentration analysis is to place the couple accurately above or below this threshold; the example above shows the impossibility of doing this correctly with a method whose CV is too high. It would be perfectly possible as long as the CV remained low; CVs of 7–10% are therefore essential and a major aim of QA should be to maintain this standard.
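The effect of CV on placing a sample on the correct side of the threshold can be estimated with a short Python sketch. It rests on assumptions that are not part of the original argument: a single measurement per sample, normally distributed measurement error, and an SD proportional to the true concentration. The function names are hypothetical.

```python
import math

THRESHOLD = 40.0  # x 10^6 sperm/ml, the boundary reported by Bonde et al. (1998)


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def misclassification_probability(true_conc, cv_percent, threshold=THRESHOLD):
    """Probability that one measurement falls on the wrong side of the threshold.

    Assumes normally distributed error with SD = CV * true_conc / 100.
    """
    sd = cv_percent * true_conc / 100.0
    p_measured_below = normal_cdf(threshold, true_conc, sd)
    if true_conc >= threshold:
        return p_measured_below        # truly above, measured below
    return 1.0 - p_measured_below      # truly below, measured above


# A sample whose true concentration is 50 x 10^6/ml, just above the threshold.
for cv in (25, 10, 7):
    p = misclassification_probability(50.0, cv)
    print(f"CV {cv}%: probability of misclassification = {p:.3f}")
```

Under these assumptions a sample with a true concentration of 50 × 10⁶/ml would be placed below the threshold in roughly one measurement in five at a CV of 25%, but in well under 1% of measurements at a CV of 7%.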


    What does morphology assessment tell us?
 
When the QA schemes include assessment of morphology or motility I begin to worry that their value is less obvious. As someone who sees that sperm within ejaculates are usually heterogeneous for many parameters including morphology, and that the extent to which this occurs is not necessarily linked to infertility, I am concerned that the concept of the ‘normal’ spermatozoon is inadequately based. The Tygerberg ‘Strict Criteria’ are mostly based on objectively derived linear measurements which, although laborious, can be universally understood; developing a quantitative basis for the analysis of any population is common practice within scientific circles and is essential if data are to be analysed and shared. Thus while I have no difficulty whatever with the use of metrically based measurements, I believe that linking them with the concept of ‘normality’ is unnecessary. It leads to several unwarranted assumptions about relationships between the aesthetic properties of sperm (subjective judgements of sperm head shape and appearance) and, by implication, their inherent genetic quality.

For the purposes of this discussion I believe we should consider the head and flagellum separately. Analysis of flagellar defects is an important and relatively straightforward component of semen assessment, as a high incidence may cause problems with several aspects of sperm motility, transport through the female reproductive tract and penetration of the zona pellucida. In this sense morphological assessment has a clear purpose, which is to screen for problems, and the ‘strict criteria’ or similar assessment methods correctly include absence of midpiece and tail defects when defining a normal spermatozoon.

Applying the concept of normality to sperm head shape is more difficult from a comparative perspective as humans and a number of other species naturally produce pleiomorphic spermatozoa. The concept that a small number of spermatozoa in an ejaculate are ‘normal’ implies that the rest are abnormal or defective, an assumption that does not withstand scrutiny. Tests to establish the relationships between the incidence of ‘normal’ spermatozoa and fertilization rate have confirmed that the significant threshold is low; in fact in their original IVF experiments Kruger et al. (1986) found that high fertilization rates were obtained provided more than 14% of spermatozoa were classified as normal. Estimates of the percentages of normal spermatozoa in ejaculates are typically low when assessed in this way (less than 6–10%; Ryu et al., 2001; Steele et al., 2000). The poor correlation between normal head shape and genetic quality was highlighted in a significant paper by Ryu et al. (2001) who showed by direct comparisons of sperm phenotypes and chromosomal complements that morphologically normal spermatozoa from infertile men contained a significantly higher incidence of genetic abnormalities than normal sperm from a cohort of fertile controls. Classifying human spermatozoa as ‘normal’ seems to be more of a problem than is the case for other species in which pleiomorphism is uncommon (e.g. bulls, boars, mice, rats, etc.). Unusual sperm head shapes in these species are indeed indicative of defective spermatogenesis, as shown by several elegant studies of specific mutational effects (Burgoyne, 1975; Russell et al., 1991). It is relevant to add that some mutational studies of this nature have shown that experimentally induced sperm abnormalities are sometimes correlated with inappropriate Sertoli cell-spermatid interactions. These may result in prolonged retention of spermatozoa by the Sertoli cells and phagocytosis within the testicular tubules (Connolly et al., 2005). Such effects undoubtedly also occur in the human testis and are possibly responsible for the correlation that sometimes occurs between poor sperm quality and low sperm output.

As objective image-analysis-based methods of sperm morphology assessment are developed and introduced there is every reason to suppose that new ways of classifying sperm head shape will be both informative and widely applicable between laboratories. Methods of sperm morphology assessment that employ Fourier descriptors of sperm head shape are already capable of discriminating shapes that the human eye cannot distinguish, and moreover it is apparent that these shape classes are physiologically meaningful and related to DNA condensation and spermatogenic processes (Ostermeier et al., 2001a,b; Thurston et al., 2001).
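For readers unfamiliar with Fourier shape descriptors, the general idea can be sketched in Python as follows. This is a generic illustration, not the procedure used in the studies cited above: the outline is an idealized ellipse with hypothetical semi-axes rather than a traced sperm head, and the function names are assumptions. Each outline point is treated as a complex number and the amplitudes of the low-order harmonics summarize the shape.

```python
import cmath
import math


def fourier_descriptors(points, n_harmonics):
    """Return {k: amplitude} for harmonics -n..n of a closed outline.

    points: list of (x, y) samples taken in order around the outline.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    amplitudes = {}
    for k in range(-n_harmonics, n_harmonics + 1):
        # Discrete Fourier coefficient of the complex outline signal.
        coeff = sum(
            z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)
        ) / n
        amplitudes[k] = abs(coeff)
    return amplitudes


def ellipse_outline(a, b, n=64):
    """Sample n points on an ellipse with semi-axes a and b (illustrative only)."""
    return [
        (a * math.cos(2 * math.pi * m / n), b * math.sin(2 * math.pi * m / n))
        for m in range(n)
    ]


# Hypothetical head outline: semi-axes of 2.5 and 1.5 (arbitrary units).
amps = fourier_descriptors(ellipse_outline(2.5, 1.5), n_harmonics=2)
for k in sorted(amps):
    print(f"harmonic {k:+d}: amplitude {amps[k]:.3f}")
```

For a perfect ellipse only harmonics +1 and -1 are non-zero, with amplitudes (a + b)/2 and (a - b)/2; departures from an elliptical outline appear as energy in the other harmonics, which is what allows subtle shape differences to be quantified.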


    Towards new sperm function tests
 
In her polemic, Anne Jequier asked whether careful testing of the ejaculate using sperm function tests would provide more information than the routine sperm assessments. Her own answer was ‘probably not’, because sperm numbers are such a major issue. It is certainly true that if I were designing an artificial insemination (AI) experiment to test the significance of sperm function tests in relation to conception rate in an animal species, the ideal design would involve using low sperm numbers. Quantitative relationships between conception rate and the number of sperm inseminated have been especially well defined in cattle (Den Daas, 1992), and it is clear that the dose–response curves reach a plateau above which conception rate never increases, despite the insemination of more and more sperm. However, one poorly explained aspect of these curves is that their shapes are specific to each bull. For some bulls the plateau is reached at low sperm numbers and vice versa; moreover, the highest attainable conception rate is also bull specific and some can never match others. These findings emphasize that the individual bulls are producing sperm with different capacities for sperm transport and fertilization. These effects have also been demonstrated in many experiments where equal numbers of sperm from two or more animals have been mixed and inseminated together (for review, see Dziuk, 1996). The outcomes of these experiments show that conception rates are usually skewed towards one of the males, whilst theoretically they are expected to be equivalent (Stewart et al., 1974; Beatty et al., 1976; Robl and Dziuk, 1988; Berger, 1995; Stahlberg et al., 2000).

The significance of these experiments is that functional differences between ejaculates are still apparent even though the numbers of sperm inseminated were controlled, implying that the female reproductive tract itself is capable of making the distinction between sperm from different males. We are currently unable to explain these effects, although they must be multifactorial in origin. Personally, I have come around to the view that heterogeneity of sperm within the ejaculate must hold the key to these effects. Assuming that the functionality of an individual spermatozoon depends ultimately on the fidelity of all steps during spermatogenesis and sperm maturation, there are many possible control points that might influence its future fertilizing ability. Its DNA must be good enough to support embryonic development, its DNA delivery mechanism (flagellum, mitochondria, acrosome, etc.) must also be fully functional, and its surface characteristics must not attract the attention of phagocytes within the female reproductive tract. Furthermore, sperm are so well endowed with molecular receptors (Meizel, 2004) to which they must respond appropriately, that the integrity of biochemical signalling pathways within individual sperm must also determine fertility. A property of this type will remain invisible unless sperm are physiologically challenged to react to their environment, much as they are in a zona-free hamster oocyte assay. Since our present generation of tests generally measures these multiple aspects of sperm function separately, performing a thoroughly comprehensive series of tests would require considerable investment of time and effort. Omitting a single test might mean failing to diagnose a specific defect, especially one that is less than obvious without the use of a special technique. Tests for DNA fragmentation (comet assay, sperm chromatin structure assay (SCSA)) are worth a special mention here; they are highly predictive of embryonic survival and development (Larson-Cook et al., 2003; Lewis et al., 2004) but require some effort to set up and use. One anonymous reviewer of this article commented that the field of andrology has gone astray over its approach to functional assessments, with various laboratories arguing exclusively in favour of their own favourite test. I agree with this sentiment and suggest that a more integrated approach is urgently required. It may be possible to define and suggest a minimum set of tests for maximum functional coverage. This approach should be more practical and cost-effective than the current practice of focusing on a small number of tests and performing them in great detail.

Given the multiple qualities that a fertilizing spermatozoon needs, one might ask whether so many sperm are produced because, as in cheap mass production systems, the incidence of faulty assembly is high. This analogy has at least two interesting consequences: (i) individuals with the highest rate of success at producing good quality sperm would gain the fertility advantage in a mixed insemination situation; (ii) all ejaculates probably contain sperm that are faulty and incapable of fertilization in vivo. The in vivo qualification is important because it is likely that many defective sperm, incapable of surviving the rigours of sperm transport, would be able to fertilize under in vitro conditions.

Evolutionary biologists would probably argue that because human social systems do not generally involve the intensive male–male competitive matings seen in other primate societies, there is less pressure to drive up the rate and quality of sperm production. Comparative analyses of testis size in humans and great apes have been used in justifying this argument (Harcourt et al., 1981), humans and gorillas having relatively small testis/bodyweight ratios in comparison to chimpanzees and orang-utans. In view of this, it is likely that the human testis need only produce a relatively low proportion of fully competent sperm, but quite enough for a reasonable conception rate most of the time.


    Conclusions
 
If a battery of functional tests were available to distinguish and quantify those sperm within an ejaculate that are actually capable of fertilization, there is little doubt that the clinical value of andrological investigations would be transformed. On the other hand, simply refining existing tests to achieve ever higher accuracy is unlikely to provide much more in the way of useful information. I suspect that future tests will need to concentrate on elucidating the heterogeneity and subpopulation structures of ejaculates, and that data will in future be expressed not as simple measures such as ‘percentage normal forms’ but as proportions of sperm with the multiple attributes needed for fertilization.

In essence I support Anne Jequier’s contention that QA schemes probably need to be revised, given the greater understanding of fertility issues today than when technical standards needed raising a quarter of a century ago. I also agree that our current generation of sperm function tests is too naïve to be useful in predicting conception rate. However, I am optimistic that better and more logical tests will ultimately be developed.


    References
 
Beatty RA, Stewart DL, Spooner RL and Hancock JL (1976) Evaluation by the heterospermic insemination technique of the differential effect of freezing at –196°C on the fertility of individual bull semen. J Reprod Fertil 47,377–379.

Berger T (1995) Proportion of males with lower fertility spermatozoa estimated from heterospermic insemination. Theriogenology 43,769–775.

Bonde JPE, Ernst E, Jensen TK, Hjollund NHI, Kolstad H, Henriksen TB, Scheike T, Giwercman A, Olsen J and Skakkebaek NE (1998) Relation between semen quality and fertility: a population-based study of 430 first-pregnancy planners. Lancet 352,1172–1177.

Burgoyne PS (1975) Sperm phenotype and its relationship to somatic and germ line genotype - study using mouse aggregation chimeras. Dev Biol 44,63–76.

Christensen P, Stryhn H and Hansen C (2005) Discrepancies in the determination of sperm concentration using Bürker–Türk, Thoma and Makler counting chambers. Theriogenology 63,992–1003.

Connolly CM, Dearth AT and Braun RE (2005) Disruption of murine Tenr results in teratospermia and male infertility. Dev Biol 278,13–21.

Den Daas N (1992) Laboratory assessment of semen characteristics. Anim Reprod Sci 28,87–94.

Dziuk PJ (1996) Factors that influence the proportion of offspring sired by a male following heterospermic insemination. Anim Reprod Sci 43,65–88.

Harcourt AH, Harvey PH, Larson SG and Short RV (1981) Testis weight, body weight and breeding system in primates. Nature 293,55–57.

Jequier AM (2005) New debate: Is quality assurance in semen analysis still really necessary? A clinician’s viewpoint. Hum Reprod, in press.

Kruger TF, Menkveld R, Stander FSH, Lombard CJ, Van der Merwe JP, Van Zyl JA and Smith K (1986) Sperm morphologic features as a prognostic factor in in vitro fertilization. Fertil Steril 46,1118–1123.

Larson-Cook KL, Brannian JD, Hansen KA, Kasperson KM, Aamold ET and Evenson DP (2003) Relationship between the outcomes of assisted reproductive techniques and sperm DNA fragmentation as measured by the sperm chromatin structure assay. Fertil Steril 80,895–902.

Lewis SEM, O’Connell M, Stevenson M, Thompson-Cree L and McClure N (2004) An algorithm to predict pregnancy in assisted reproduction. Hum Reprod 19,1385–1394.

Mahmoud AMA, Depoorter B, Piens N and Comhaire FH (1997) The performance of 10 different methods for the estimation of sperm concentration. Fertil Steril 68,340–345.

McLachlan RI (2004) New developments in the evaluation and management of male infertility. Int Cong Ser 1266,10–20.

Meizel S (2004) The sperm, a neuron with a tail: ‘neuronal’ receptors in mammalian sperm. Biol Rev Camb Phil Soc 79,713–732.

Neuwinger J, Behre HM and Nieschlag E (1990) External quality control in the andrology laboratory: an experimental multicenter trial. Fertil Steril 54,308–314.

Ostermeier GC, Sargeant GA, Yandell BS and Parrish JJ (2001a) Measurement of bovine sperm nuclear shape using Fourier harmonic amplitudes. J Androl 22,584–594.

Ostermeier GC, Sargeant GA, Yandell BS, Evenson DP and Parrish JJ (2001b) Relationship of bull fertility to sperm nuclear shape. J Androl 22,595–603.

Robl JM and Dziuk PJ (1988) Comparison of heterospermic and homospermic inseminations as measures of male fertility. J Exp Zool 245,97–101.

Russell LD, Russell JA, Macgregor GR and Meistrich ML (1991) Linkage of manchette microtubules to the nuclear-envelope and observations of the role of the manchette in nuclear shaping during spermiogenesis in rodents. Am J Anat 192,97–120.

Ryu H-M, Lin WW, Lamb DJ, Chuang W, Lipshultz LI and Bischoff FZ (2001) Increased chromosome X, Y, and 18 nondisjunction in sperm from infertile patients that were identified as normal by strict morphology: implication for intracytoplasmic sperm injection. Fertil Steril 76,879–883.

Stahlberg R, Harlizius B, Weitze KF and Waberski D (2000) Identification of embryo paternity using polymorphic DNA markers to assess fertilizing capacity of spermatozoa after heterospermic insemination in boars. Theriogenology 53,1365–1373.

Steele EK, McClure N and Lewis S (2000) A comparison of the morphology of testicular, epididymal, and ejaculated sperm from fertile men and men with obstructive azoospermia. Fertil Steril 73,1099–1103.

Stewart DL, Spooner RL, Bennett GH, Beatty RA and Hancock JL (1974) A second experiment with heterospermic insemination in cattle. J Reprod Fertil 36,107–116.

Thurston LM, Watson PF, Mileham AJ and Holt WV (2001) Morphologically distinct sperm subpopulations defined by Fourier shape descriptors in fresh ejaculates correlate with variation in boar semen quality following cryopreservation. J Androl 22,382–394.

Submitted on May 9, 2005; resubmitted on June 9, 2005; accepted on June 10, 2005.




