1 Department of Obstetrics and Gynecology, Women's Research Institute, University of Kansas School of Medicine-Wichita, Wichita, Kansas, 2 Sage BioPharma, San Clemente, California and 3 American Association of Bioanalysts Proficiency Testing Service, Brownsville, Texas, USA
Abstract
Key words: antisperm antibodies/proficiency testing/sperm count/sperm morphology/sperm vitality
Introduction
The inaccuracies and lack of standardization associated with andrology testing have made it difficult, and in many cases impossible, for physicians to compare semen analysis results among laboratories. This is especially problematic when treating infertile couples referred from other clinics, who may have had fertility testing performed in other andrology laboratories. For example, owing to disagreements between laboratories, a patient could be classified as normal by one laboratory and infertile by another (Neuwinger et al., 1990). Improvement in inter-laboratory agreement of test results is one of the hallmarks of a national proficiency testing (PT) programme. PT is a process of external, inter-laboratory quality control whereby simulated patient samples are tested by participating laboratories, and the performance of the individual laboratory (i.e. the test result) is compared with the collective performance of all participants (Stull et al., 1998). Organized PT was first introduced in the United States in the mid-1940s (Sunderman, 1992). With the advent of the Clinical Laboratory Improvement Act of 1967 and the Amendments of 1988, all clinical laboratories in the United States engaged in moderate or high complexity testing are now required to enrol in a government-approved PT programme, if such a programme is available, and failure to achieve satisfactory performance in PT may result in sanctions against the laboratory (Keel, 1998). Since the introduction of PT, numerous reports have indicated that participation in organized PT programmes has resulted in a decrease in inter-laboratory standard deviations and coefficients of variation with PT samples (Hanson, 1969; Hain, 1972; Rickman et al., 1993), and a marked improvement in PT performance over time (Taylor and Fulford, 1981; Nakamura and Rippey, 1985; Rickman et al., 1993; Tholen et al., 1995). Thus, PT has caused a dramatic improvement in the quality of clinical laboratory testing and has served to ensure better agreement of results among laboratories.
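In PT schemes of this kind, each laboratory's reported value is typically judged against the consensus of all participants. As a minimal, hypothetical sketch of that comparison (the grading rules actually applied by any particular PT programme, including the AAB programme, are not described here), a standard deviation index against the peer group could be computed as follows:

```python
import statistics

def sd_index(lab_result: float, peer_results: list[float]) -> float:
    """Standard deviation index: how far a laboratory's result lies from
    the peer-group mean, in units of the peer-group standard deviation."""
    mean = statistics.mean(peer_results)
    sd = statistics.stdev(peer_results)
    return (lab_result - mean) / sd

# Hypothetical sperm-count PT event (x10^6/ml) reported by participating labs
peer = [48, 52, 45, 60, 50, 47, 55, 49]
print(f"SDI for a lab reporting 58: {sd_index(58, peer):+.2f}")
print(f"SDI for a lab reporting 90: {sd_index(90, peer):+.2f}")
```

A result lying several peer-group standard deviations from the consensus would flag that laboratory's performance for further review.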
Attempts at developing a multicentre, external inter-laboratory PT programme in andrology testing have been limited (Neuwinger et al., 1990; Walker, 1992; Matson, 1995; Cooper et al., 1999). The American Association of Bioanalysts (AAB) Proficiency Testing Service (Brownsville, Texas, USA) began offering comprehensive external quality control PT programmes in 1949. In May of 1996, the AAB PT Service made available PT programmes for the clinical laboratory specialities of andrology and embryology. In this report, the results of this nationwide survey in andrology PT are presented.
Materials and methods
Sperm count programme
Each testing event consisted of two aliquots (vials) of pooled, stabilized (formalin) suspensions of human spermatozoa. The samples were prepared so that one contained a 'low' concentration of spermatozoa and the other a 'high' concentration (realizing that these terms are relative). The participating laboratories were instructed to remove the specimens from the refrigerator, warm them to room temperature, vortex for a minimum of 10 s or until completely in suspension, and count the spermatozoa according to the laboratory's usual method. Results were recorded as ×10^6 spermatozoa/ml in whole numbers. Only one reporting method was allowed, and the method was coded as CASA (computer-assisted semen analysis), manual or other. The laboratory was requested to indicate the type of counting chamber used, and these data were coded as haemocytometer (laboratories did not specify the type), Makler (Sefi-Medical Instruments, Haifa, Israel), Cell-VU (Millennium Sciences Inc., New York, NY, USA), or Micro-Cell (Conception Technologies, San Diego, CA, USA).
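For laboratories counting in a standard haemocytometer, the reported figure is derived from the raw chamber count. The sketch below illustrates the usual conversion, assuming an improved Neubauer chamber and a 1:20 dilution purely for illustration; neither was specified by the programme, which left chamber type and dilution to each laboratory's own method.

```python
def sperm_concentration_million_per_ml(cells_counted: int,
                                       squares_counted: int,
                                       dilution_factor: float = 20.0,
                                       square_volume_ul: float = 0.1) -> int:
    """Convert a raw haemocytometer count to whole-number x10^6 spermatozoa/ml.

    Assumes an improved Neubauer chamber in which each large square holds
    0.1 microlitre (1 mm x 1 mm x 0.1 mm depth); the dilution factor and the
    number of squares counted follow the laboratory's own procedure.
    """
    volume_counted_ul = squares_counted * square_volume_ul
    cells_per_ul_diluted = cells_counted / volume_counted_ul
    cells_per_ml = cells_per_ul_diluted * dilution_factor * 1000  # 1 ml = 1000 ul
    return round(cells_per_ml / 1e6)  # report as whole x10^6/ml

# Example: 250 spermatozoa counted over 5 large squares of a 1:20 dilution
print(sperm_concentration_million_per_ml(250, 5))  # -> 10 (i.e. 10 x10^6/ml)
```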
Sperm morphology programme
Each testing event consisted of two unstained glass slides of semen smears. The smears were fixed with CytoPrep (Fisher Scientific, Pittsburgh, PA, USA) prior to shipping. Laboratories were instructed either to stain immediately with Papanicolaou stain or, if using a Wright-Giemsa-based stain, to first remove the fixative by soaking the slides in 95% ethanol for a minimum of 20 min before staining the slides by the laboratory's usual method. The laboratory was then instructed to perform the morphological analysis of the stained smears by its usual method. Results were reported as percentage normal forms in whole numbers. Two reporting methods were allowed in the event that both a screening and a more definitive (strict) method were performed. The methods were coded as American Society of Clinical Pathologists (ASCP) (Adelman and Cahill, 1989), strict (Kruger et al., 1986, 1988), WHO2 (World Health Organization, 2nd edn; WHO, 1987), WHO3 (3rd edn; WHO, 1992), or other. The laboratory was also requested to indicate the type of stain employed.
Sperm vitality programme
Each testing event consisted of two glass slides of semen smears that were stained with eosin-nigrosin prior to shipment. The laboratories were instructed to perform sperm vitality assessment according to their usual method, and to record percentage viable in whole numbers.
Statistical analysis
The values for reported morphology were given as percentages and were subjected to arcsine transformations (arcsine of the square root of the proportion) to achieve a Gaussian distribution (Neuwinger et al., 1990) prior to calculation of the CV.
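As a minimal illustration of this step (assuming the CV is then taken as the ratio of the standard deviation to the mean of the transformed values, a detail the text does not spell out), the transformation and CV calculation might look like the following:

```python
import math
import statistics

def arcsine_sqrt(percent_normal: float) -> float:
    """Arcsine square-root transform of a percentage (0-100)."""
    return math.asin(math.sqrt(percent_normal / 100.0))

def cv_percent(values: list[float]) -> float:
    """Coefficient of variation (%) of a list of values."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical morphology results (% normal forms) reported by several labs
reported = [4, 7, 5, 12, 9, 3]
transformed = [arcsine_sqrt(v) for v in reported]
print(f"CV of raw values:         {cv_percent(reported):.1f}%")
print(f"CV of transformed values: {cv_percent(transformed):.1f}%")
```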
Results
Discussion
The routine semen analysis involves the morphological evaluation of live, motile cells. Preparing live, motile sperm samples in large enough numbers to permit wide distribution presents a unique challenge for proficiency testing and inter-laboratory comparisons. Rather than using cryopreserved specimens (Neuwinger et al., 1990), we and others chose to send stable suspensions of fixed spermatozoa for sperm counting and fixed semen smears for morphology determinations. Although the use of batched, premade semen smears eliminates the potential variation associated with individual slide preparation, it is recognized that this approach also precludes assessment of the effect that this variation may have on individual laboratory performance. Furthermore, suspensions of fixed spermatozoa can be problematic for the participating laboratories, in that settling and clumping of the spermatozoa necessitate mixing of the specimen prior to analysis, and failure to mix the specimens carefully could produce erroneous results. The use of cryopreserved semen would appear to be a viable alternative for sample preparation, in that the 'live' nature of the specimen is preserved. However, several reports have indicated a lack of consistency between aliquots of frozen semen when used for inter- and intra-laboratory variation determinations (Cooper et al., 1992; Muller, 1992; Clements et al., 1995). While the assessment of sperm motility necessitated the use of cryopreserved specimens in two previous studies (Neuwinger et al., 1990; Walker, 1992), the high cost and inconvenience of cryopreserving and shipping frozen semen make this approach unrealistic on a large scale (Cooper et al., 1999). The use of videotapes of fresh ejaculates, which can be duplicated from a master tape and distributed relatively inexpensively, is being evaluated as a suitable alternative for PT programmes in sperm motility.
The vast majority of laboratories participating in the sperm count programme employed manual, non-automated methods for counting spermatozoa. Furthermore, more laboratories used the standard haemocytometer than any other counting chamber. Baker et al. (1994) surveyed 129 acute care community hospitals in the United States and found that when these laboratories performed semen analyses, they tended to use more conventional methods, including the use of the haemocytometer for sperm counts. Only 1.6% of the hospital clinical laboratories surveyed performed automated semen analysis and only 3.1% used a Makler counting chamber (Baker et al., 1994). Thus, the preference of methods for sperm counting and the selection of counting chambers may, in many cases, reflect the type of clinical services and level of expertise provided by the participating laboratory.
Wide variations in sperm concentrations between laboratories (CV ranging from 10 to 65%) have been previously reported for both manual (Jequier and Ukombe, 1983; Neuwinger et al., 1990; Walker, 1992; Matson, 1995) and CASA (Walker, 1992) methods. The results of the sperm count PT programme presented herein, from participants representing a wide spectrum of clinical laboratory settings and expertise, demonstrated alarmingly high CVs and wide ranges in reported sperm concentrations. Indeed, reported sperm concentrations among the participating laboratories varied by as much as two orders of magnitude, with one laboratory reporting a sperm concentration of 3×10^6/ml and another 492×10^6/ml for the same sample. This variation appeared to be greater when results were compared among laboratories using manual methods versus CASA, which may reflect the expertise of the reporting laboratory. However, it should be pointed out that the type of CASA system used by the participating laboratory was not requested. Unless standardized procedures are agreed upon and strictly followed (Davis and Katz, 1992), differences may exist between CASA systems in the ability to provide accurate values for sperm concentrations (Gill et al., 1988; Mahony et al., 1988; Agarwal et al., 1992; ESHRE Andrology Special Interest Group, 1998). This may account for at least part of the observed variation. Improper specimen handling (see above) and potential clerical and data entry/recording errors (see below) notwithstanding, the data presented herein suggest gross unreliability in the results of sperm concentrations reported by some clinical laboratories. They also highlight the urgent need for thorough technician training and the use of careful, standardized procedures and regular internal quality control evaluations (Mortimer et al., 1986; Mortimer, 1994).
It has been argued that, of all individual semen parameters, sperm morphology is most closely related to fertility potential (Kruger et al., 1986, 1988; Ombelet et al., 1995). However, it is also recognized that there is considerable variation in this determination (Jequier and Ukombe, 1983; Dunphy et al., 1989; Neuwinger et al., 1990; Baker et al., 1994). Reasons for this variation include lack of standardization (Chong et al., 1983; Dunphy et al., 1989), differing techniques of smear preparation and staining procedures (Davis and Gravance, 1993), and the level of technical expertise (Dunphy et al., 1989; Neuwinger et al., 1990). Lack of standardization can make it difficult, if not impossible, to compare results from one laboratory to another. Although some investigators have supported the use of computerized morphological assessments (i.e. CASA) as a means to establish standardization and reduce variability (Barroso et al., 1999; Kruger and Coetzee, 1999), others have concluded that the current generation of CASA instruments is not capable of analysing human sperm morphology in a manner adequate for routine clinical applications (ESHRE Andrology Special Interest Group, 1998). Complicating the standardization issue is the fact that little consensus exists on the most appropriate classification system. In a recent survey of 410 fertility centres from all over the world, wide and complex variation was found in the sperm morphology classification systems employed (Ombelet et al., 1997). In the current study, approximately two-thirds of the participating laboratories utilized the more standardized World Health Organization or strict criteria for morphology determinations. However, the remainder of the participating laboratories used the much less stringent ASCP criteria, or some other undefined protocol. It is evident from the data that, overall, those laboratories using the ASCP criteria tended to classify the PT samples as normal while those using the strict criteria tended to classify the PT samples as teratozoospermic.
Considerable variation exists when comparing results of sperm morphology both between and within laboratories (Jequier and Ukombe, 1983; Ayodeji and Baker, 1986; Neuwinger et al., 1990; Clements et al., 1995; Matson, 1995; Coetzee et al., 1999). In this study, a high degree of variation among laboratories participating in the morphology programme is reported, with CVs ranging from 15 to 93%. Interestingly, the greatest variation was observed among laboratories using the most stringent criteria (strict) while the least variation was found among laboratories using the less stringent criteria (ASCP). It should be pointed out that caution is needed when computing and comparing CVs among criteria that use vastly different percentages as normal ranges. The use of strict criteria results in a significant reduction in the mean values obtained, which in turn increases the CV (Clements et al., 1995), analogous to the precision profiles of assays in which increased variation is observed at lower analyte concentrations (Cooper et al., 1999). This fact, coupled with the statistical concerns of calculating the variance of proportions, necessitated the use of data transformation prior to determining the CV. Thus, the range of CVs reported herein would be even greater if the data had not been transformed before analysis. Along these same lines, in order to obtain the same statistical confidence, more spermatozoa must be counted when using a criterion representing a low percentage of normal forms than when using a criterion that establishes a higher percentage of normal forms (Davis and Gravance, 1993; Coetzee et al., 1999). We did not determine or standardize the number of spermatozoa counted by the individual laboratories participating in this PT programme, and this may have contributed to some of the observed variation.
There appears to be only one other report which evaluated the performance of sperm vitality determinations among different laboratories (Walker, 1992). In that study, large CVs were reported, ranging from 42 to 90%. In contrast, relatively low CVs were noted among the laboratories participating in this PT programme. The previous study (Walker, 1992) collected data from relatively few laboratories, and each participant was required to prepare and stain their own semen smears. In contrast, in the current study premade and prestained smears were provided, which would have eliminated the variation associated with this process. Nevertheless, the data indicate that good agreement in the interpretation of viable versus non-viable spermatozoa can be obtained among laboratories.
There are recognized shortcomings of PT. The results reported by PT participants represent a concurrence, or agreement upon a certain value, rather than a reflection of the actual value of the analyte measured. For example, a large majority of participating laboratories could agree on a certain value, yet be incorrect in estimating the true value (i.e. PT measures precision, not accuracy). The use of reference values, determined by a small group of expert referee laboratories, may help to circumvent this shortcoming. Nevertheless, PT does represent an effective mechanism to ensure accurate comparison of reported values between laboratories. In addition, there are several variables unique to the PT process which are unrelated to normal clinical laboratory performance and can lead to unreliable PT results (Stull et al., 1998). Clerical and mathematical errors in reporting results to PT programmes are a potential source of variation that is difficult to measure. The PT report forms used herein did not allow for the use of decimals, and the ability of laboratories to apply correct mathematical calculations prior to reporting was not addressed; values were accepted as submitted and no attempt was made to verify the accuracy of the recorded results. The use of PT as an indicator of routine laboratory performance is also limited by the fact that a PT event is a non-random sample of the work performed at a given testing site and is subject to the biases inherent in such a process (Stull et al., 1998). In other words, the individual processing the PT sample is often aware that the PT sample is unique and that the results are subject to greater scrutiny, which may result in increased pressure to give the specimen 'special handling' (Boone, 1992). Such special handling may involve repeated analysis and averaging of multiple replicates, the use of the most highly trained technician or of multiple technicians, and the use of methods other than those routinely used (Boone, 1992). Thus, in many cases, PT performance, as measured through mailed-in samples that are known to the laboratory as regulatory challenges, will reflect the best performance that laboratory is capable of providing, and not necessarily its typical or routine performance (Boone, 1992). Although PT participants are instructed to use routine procedures in analysing PT specimens and to avoid 'special handling', if PT performance truly represents the best analytical work a laboratory is capable of producing (Stull et al., 1998), then the data in this study indicate that large variations exist in the results of semen analysis from the laboratories participating in this programme, even under the best of circumstances.
The data presented here support the urgent plea for standardization of semen analysis methodologies expressed by others (Chong et al., 1983; Mortimer et al., 1986; Mortimer, 1994; Ombelet et al., 1997). It is strongly recommended that laboratories performing semen analyses reduce the variation observed herein by adhering to accepted standards, such as those proposed by the World Health Organization (WHO, 1992), and by employing active programmes of internal quality control. Numerous reports have indicated that participation in organized PT programmes has resulted in a decrease in inter-laboratory SD and CV with PT samples (Hanson, 1969; Hain, 1972; Rickman et al., 1993), and a marked improvement in PT performance over time (Taylor and Fulford, 1981; Nakamura and Rippey, 1985; Rickman et al., 1993; Tholen et al., 1995). Thus, it is hoped that PT in the field of clinical laboratory andrology will help to improve the quality of clinical laboratory testing and serve to ensure better agreement of results among laboratories.
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Agarwal, A., Ozturk, E. and Loughlin, K.R. (1992) Comparison of semen analysis between the two Hamilton-Thorn semen analysers. Andrologia, 24, 327–329.
Ayodeji, O. and Baker, H.W. (1986) Is there a specific abnormality of sperm morphology in men with varicoceles? Fertil. Steril., 45, 839–842.
Baker, D.J., Paterson, M.A., Klaassen, J.M. et al. (1994) Semen evaluations in the clinical laboratory. How well are they being performed? Lab. Med., 25, 509–514.
Barroso, G., Mercan, R., Ozgue, K. et al. (1999) Intra- and inter-laboratory variability in the assessment of sperm morphology by strict criteria: impact of semen preparation, staining techniques and manual versus computerized analysis. Hum. Reprod., 14, 2036–2040.
Boone, D.J. (1992) Literature review of research related to the Clinical Laboratory Improvement Amendments of 1988. Arch. Pathol. Lab. Med., 116, 681–693.
Byrd, W. (1992) Quality assurance in the reproductive biology laboratory. Arch. Pathol. Lab. Med., 116, 418–422.
Chong, A.P., Walters, C.A. and Weinreib, S.A. (1983) The neglected laboratory test. The semen analysis. J. Androl., 4, 280–282.
Clarke, G.N., Stojanoff, A., Cauchi, M.N. et al. (1983) Detection of antispermatozoal antibodies of IgA class in cervical mucus. Am. J. Reprod. Immunol., 5, 61–65.
Clements, S., Cooke, I.D. and Barratt, C.L.R. (1995) Implementing comprehensive quality control in the andrology laboratory. Hum. Reprod., 10, 2096–2106.
Coetzee, K., Kruger, T.F., Lombard, C.J. et al. (1999) Assessment of interlaboratory and intralaboratory sperm morphology readings with the use of a Hamilton Thorne Research integrated visual optical system semen analyzer. Fertil. Steril., 71, 80–84.
Cooper, T.G., Neuwinger, J., Bahrs, S. et al. (1992) Internal quality control of semen analysis. Fertil. Steril., 58, 172–178.
Cooper, T.G., Atkinson, A.D. and Nieschlag, E. (1999) Experience with external quality control in spermatology. Hum. Reprod., 14, 765–769.
Davis, R.O. and Gravance, C.G. (1993) Standardization of specimen preparation, staining, and sampling methods improves automated sperm-head morphometry analysis. Fertil. Steril., 59, 412–417.
Davis, R.O. and Katz, D.F. (1992) Standardization and comparability of CASA instruments. J. Androl., 13, 81–86.
Dunphy, B.C., Kay, R., Barratt, C.L.R. et al. (1989) Quality control during the conventional analysis of semen, an essential exercise. J. Androl., 10, 378–385.
ESHRE Andrology Special Interest Group (1998) Guidelines on the application of CASA technology in the analysis of spermatozoa. Hum. Reprod., 13, 142–145.
Gerrity, M. (1993) Legislative efforts affecting the reproductive biology laboratory. Curr. Opin. Obstet. Gynecol., 5, 623–629.
Gill, H.S., Van Arsdalen, K., Hypolite, J. et al. (1988) Comparative study of two computerized semen motility analyzers. Andrologia, 20, 433–440.
Hain, R.F. (1972) Proficiency testing in the physician's office laboratory: an ounce of prevention. S. Med. J., 65, 608–610.
Hanson, D.J. (1969) Improvements in medical laboratory performance. Postgrad. Med., 46, 51–56.
Jager, S., Kremer, J. and van Slochteren-Draaisma, T. (1978) A simple method of screening for antisperm antibodies in the human. Detection of spermatozoal surface IgG with the direct mixed antiglobulin reaction carried out on untreated fresh human semen. Int. J. Fertil., 23, 12–21.
Jequier, A.M. and Ukombe, E.B. (1983) Errors inherent in the performance of a routine semen analysis. Br. J. Urol., 55, 434–436.
Keel, B.A. (1998) The assisted reproductive technology laboratories and regulatory agencies. Infert. Reprod. Med. Clin. N. Am., 9, 311–330.
Kruger, T.F. and Coetzee, K. (1999) The role of sperm morphology in assisted reproduction. Hum. Reprod. Update, 5, 172–178.
Kruger, T.F., Menkveld, R., Stander, F.S.H. et al. (1986) Sperm morphologic features as a prognostic factor in in vitro fertilization. Fertil. Steril., 46, 1118–1123.
Kruger, T.F., Acosta, A.A., Simmons, K.F. et al. (1988) Predictive value of abnormal sperm morphology in in vitro fertilization. Fertil. Steril., 49, 112–117.
Mahony, M.C., Alexander, N.J. and Swanson, R.J. (1988) Evaluation of semen parameters by means of automated sperm motion analyzers. Fertil. Steril., 49, 876–880.
Mortimer, D. (ed.) (1994) Practical Laboratory Andrology. Oxford University Press, New York, NY.
Mortimer, D., Shi, M.A. and Tan, R. (1986) Standardization and quality control of sperm concentration and sperm motility counts in semen analysis. Hum. Reprod., 1, 299–303.
Matson, P.L. (1995) Quality control assessment for semen analysis and sperm antibody detection: results of a pilot scheme. Hum. Reprod., 10, 620–625.
Muller, C.H. (1992) The andrology laboratory in an assisted reproductive technologies program. J. Androl., 13, 349–360.
Nakamura, R.M. and Rippey, J.H. (1985) Quality assurance and proficiency testing for autoantibodies to nuclear antigen. Arch. Pathol. Lab. Med., 109, 109–114.
Neuwinger, J., Behre, H.M. and Nieschlag, E. (1990) External quality control in the andrology laboratory: an experimental multicenter trial. Fertil. Steril., 54, 308–314.
Ombelet, W., Menkveld, R., Kruger, T.F. et al. (1995) Sperm morphology assessment: historical review in relation to fertility. Hum. Reprod. Update, 1, 543–557.
Ombelet, W., Pollet, H., Bosmans, E. et al. (1997) Results of a questionnaire on sperm morphology assessment. Hum. Reprod., 12, 1015–1020.
Rickman, W.J., Monical, C. and Waxdal, M.J. (1993) Improved precision in the enumeration and absolute numbers of lymphocyte phenotypes with long-term monthly proficiency testing. Ann. N. Y. Acad. Sci., 677, 53–58.
Stull, T.M., Hearn, T.L., Hancock, J.S. et al. (1998) Variation in proficiency testing performance by testing site. JAMA, 279, 463–467.
Sunderman, F.W. Sr (1992) The history of proficiency testing/quality control. Clin. Chem., 38, 1205–1209.
Taylor, R.N. and Fulford, K.M. (1981) Assessment of laboratory improvement by the Center for Disease Control Diagnostic Immunology Proficiency Testing Program. J. Clin. Microbiol., 13, 2–9.
Tholen, D., Lawson, N.S., Cohen, T. et al. (1995) Proficiency test performance and experience with College of American Pathologists' programs. Arch. Pathol. Lab. Med., 119, 307–311.
Walker, R.H. (1992) Pilot surveys for proficiency testing of semen analysis. Arch. Pathol. Lab. Med., 116, 432–424.
World Health Organization (1987) WHO Laboratory Manual for the Examination of Human Semen and Sperm-Cervical Mucus Interaction, 2nd edn. Cambridge University Press, Cambridge, UK.
World Health Organization (1992) WHO Laboratory Manual for the Examination of Human Semen and Sperm-Cervical Mucus Interaction, 3rd edn. Cambridge University Press, Cambridge, UK.
Submitted on September 3, 1999; accepted on December 2, 1999.