An evaluation of the inter-observer and intra-observer variability of the ultrasound diagnosis of polycystic ovaries

S.A.K.S. Amer1, T.C. Li1,5, C. Bygrave2, A. Sprigg3, H. Saravelos4 and I.D. Cooke1

1 Department of Reproductive Medicine and Surgery, The Jessop Wing, Sheffield Teaching Hospitals NHS Trust, Tree Root Walk, Sheffield S10 2SF, 2 Department of Medical Physics & Clinical Technology, Royal Hallamshire Hospital, Glossop Road, Sheffield S10 2JF, 3 Department of Radiology, Sheffield Children's Hospital NHS Trust, Western Bank, Sheffield S10 2TH, UK and 4 University Department of Obstetrics and Gynecology, Hippokration Hospital, Aristotle University of Thessaloniki, 54008 Thessaloniki, Greece


    Abstract
 
BACKGROUND: This prospective observational study was undertaken to evaluate the reliability and consistency of ultrasound diagnosis of polycystic ovarian syndrome (PCOS). METHODS: Eighteen women with clinical and biochemical features suggestive of PCOS and nine normal control women underwent transvaginal ultrasound scan by a single ultrasonographer. The 27 ovarian scans were video-recorded and the recordings were later edited and arranged randomly so that each record appeared twice at random on the tape producing a total of 54 ovarian scans. Four experienced observers independently reviewed the recordings. The observers scored each case as follows: normal, possible polycystic ovary (PCO) and definite PCO. RESULTS: The mean intra-observer agreement was 69.4% (κ = 0.54) and the mean inter-observer agreement was 51% (κ = 0.28). CONCLUSION: The results suggest that the currently used ultrasonographic criteria for the diagnosis of polycystic ovaries do have significant intra-observer and inter-observer variability and as such must be considered subjective. Transvaginal ultrasonography alone may not therefore be a reliable method of diagnosing or excluding PCOS.

Key words: observer variability/polycystic ovaries/transvaginal scan


    Introduction
 
Polycystic ovarian syndrome (PCOS) is a common endocrine disorder affecting women in the reproductive age group. Common presenting features include anovulatory infertility, oligo/amenorrhoea, hirsutism and obesity. In many cases, the condition is associated with a number of well-recognized biochemical features including raised serum LH level, high LH:FSH ratio and elevated plasma androgen levels. Ultrasonographically, the condition may be associated with bilaterally enlarged polycystic ovaries (PCO). Despite this classic picture, there is still much controversy over the diagnostic criteria for PCOS. In England and Europe, the diagnosis is primarily based on ovarian morphology as assessed by transvaginal scan (Fox et al., 1991; Balen et al., 1995; Homburg, 1996), whereas in North America, it appears that the emphasis is on biochemical features, especially hyperandrogenaemia (Lobo, 1995; Carmina et al., 1997; Lewis, 2001). In some studies, the diagnosis of PCOS was based on clinical and endocrinological features without reference to ultrasound morphology; e.g. in one study (Loucks et al., 2000) the diagnosis of PCOS was based on chronic anovulation plus one of three other features: (i) hirsutism, (ii) hyperandrogenaemia, or (iii) increased LH:FSH ratio (≥2); only 21/63 (33%) of 'PCOS' patients had ultrasound evidence of PCOS.

The classic ultrasonographic features of PCOS, which have been previously described (Swanson et al., 1981; Adams et al., 1986), include an enlarged ovary with multiple (≥10) small cysts (2–8 mm in diameter), which are typically arranged peripherally around an increased echogenic stroma. Although these are the most frequently used ultrasonographic criteria for the diagnosis of PCOS, there are several reasons why the criteria have not been universally accepted as a gold standard for diagnosis. First, considerable overlap exists between the normal ovary and the PCO in follicular number and size and ovarian volume, to the extent that a cut-off level with satisfactory sensitivity and specificity cannot be obtained for many of the parameters (Pache et al., 1992; Fox, 1999). The number of small cysts necessary to define PCO on ultrasound has been reported to vary between >5 (Yeh et al., 1987; Battaglia et al., 1999), >10 (Adams et al., 1985) and >15 (Fox et al., 1991). Furthermore, some of the criteria used to define PCO, such as the stromal echogenicity and follicular pattern, are purely subjective. Whilst some investigators believe that the ovarian volume is the most important criterion (Swanson et al., 1981), others put much emphasis on the stromal hyperechogenicity (Ardaens et al., 1991). Second, the precision of the commonly used ultrasound diagnostic criteria has never been formally evaluated. This lack of formal evaluation could well account for the significant variation in the prevalence of PCO reported by various investigators: normal population 2.5–33% (Swanson et al., 1981; Polson et al., 1988; Clayton et al., 1992; Farquhar et al., 1994; Borgfeldt and Andolf, 1999; Koivunen et al., 1999; Michelmore et al., 1999; Loucks et al., 2000); anovulatory infertility patients 57–83% (Adams et al., 1986; Hull, 1987; Kousta et al., 1999); and recurrent miscarriage population 7.8–50% (Sagle et al., 1988; Regan et al., 1990; Liddell et al., 1997; Li et al., 2000; Rai et al., 2000).

In this study we aimed to evaluate the precision of one of the more widely used ultrasound criteria for the diagnosis of PCOS by measuring inter-observer and intra-observer variability using video-taped recordings of ovarian ultrasonography.


    Materials and methods
 
Subjects
A total of 27 women were included in the study. Eighteen of the women presented with either clinical or biochemical features suggestive of PCOS, including oligo/amenorrhoea, elevated serum LH (≥10 IU/l), elevated LH:FSH ratio (≥2) and elevated androgen levels (testosterone ≥2.5 nmol/l, androstenedione ≥10 nmol/l, or free androgen index >4). The remaining nine subjects did not have any clinical or biochemical features indicative of the condition. In the latter group, all the women had normal menstrual cycles with cycle lengths of between 25 and 35 days, normal serum LH and androgen levels and normal body mass index (between 20 and 26 kg/m²), and were aged between 20 and 35 years.

Transvaginal ultrasonography
All subjects underwent a transvaginal scan by a single ultrasonographer, using a Toshiba ultrasound machine (model Sonolayer SSA-250A) with a convex 6 MHz transvaginal ultrasound probe. Women with regular menstrual cycles were scanned on days 2–5 of the cycle, whereas women with irregular cycles were not timed according to the menstrual cycle. Each ovary was localized in relation to the iliac vessels, and scanned from inner to outer margins in longitudinal cross-sections and from upper to lower ends in transverse cross-sections. The three diameters of the ovary were measured (longitudinal, anteroposterior and transverse). All the ovarian scans were video-recorded using a Panasonic PAL video system (AG-6200).
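The three recorded diameters can be converted into an approximate ovarian volume. The text does not state which formula was used, so the sketch below assumes the widely used prolate ellipsoid approximation (volume = π/6 × the product of the three diameters); the function name and the sample diameters are illustrative only.

```python
import math


def ellipsoid_volume_ml(longitudinal_mm, anteroposterior_mm, transverse_mm):
    """Approximate ovarian volume in ml from three diameters given in mm.

    Assumption: prolate ellipsoid formula (pi/6 * d1 * d2 * d3). The paper
    does not say which formula was actually applied.
    """
    # Convert mm to cm so that the product is in cm^3, which equals ml.
    d1, d2, d3 = (d / 10.0 for d in (longitudinal_mm, anteroposterior_mm, transverse_mm))
    return math.pi / 6.0 * d1 * d2 * d3


# Hypothetical ovary measuring 30 x 25 x 20 mm.
volume = ellipsoid_volume_ml(30, 25, 20)
print(f"{volume:.1f} ml")
```

Under this assumption, an ovary would need a diameter product of roughly 23 cm³ to reach the 12 ml threshold used in the scoring criteria.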

Editing and randomization of ultrasonographic records
The 27 ultrasonographic records were duplicated using a Panasonic video-editing system (AG-5700). A total of 54 ultrasonographic records were therefore derived from the 27 original records from the 27 women who participated in the study. The 54 records were each given a number and arranged randomly in a final edited videotape record. In all, four identical videotapes were produced for evaluation by each of the four observers.
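The duplication and random ordering described above can be sketched as follows. This is only an illustration of the idea: the study used a video-editing system rather than software, and the function name, record numbering and seed are assumptions.

```python
import random


def build_tape_order(n_records=27, copies=2, seed=None):
    """Return a random ordering in which each record id appears `copies` times."""
    rng = random.Random(seed)
    # Every original record (numbered 1..n_records) is duplicated...
    order = [record_id for record_id in range(1, n_records + 1) for _ in range(copies)]
    # ...and the duplicates are then placed in random positions on the tape.
    rng.shuffle(order)
    return order


# Seed chosen arbitrarily so that the illustrative ordering is reproducible.
tape = build_tape_order(seed=2002)
print(len(tape))
```

Because the position of the second copy is random, an observer cannot tell which two of the 54 records belong to the same woman.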

Evaluation of the ultrasound record by independent observers
Four individuals with experience in transvaginal ultrasonography were asked to evaluate the videotape records. Two of them were ultrasonographers, each with >15 years' experience in gynaecological ultrasonography, whereas the other two were gynaecologists with a special interest in Reproductive Medicine and transvaginal scan (who would normally perform >20 transvaginal scans per week). They were asked to examine the videotape records and score the appearance of each ovary using the following published criteria (Swanson et al., 1981; Adams et al., 1985). Definite PCO: if all the features of PCO are present (score = 2) including: (i) an increased echogenic (bright) stroma; (ii) ≥10 small (2–8 mm) peripheral cysts; (iii) an increased ovarian volume (≥12 ml). Possible PCO: if some but not all the features of PCO are present (score = 1). Normal ovarian morphology, not indicative of PCO: if none of the above features of PCO are present (score = 0).
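The three-level scoring rule can be written out explicitly. The sketch below is a hypothetical encoding of the published criteria as stated above; the function and parameter names are ours, and it assumes the cyst count already refers only to small (2–8 mm) peripheral cysts.

```python
def score_ovary(bright_stroma, small_peripheral_cysts, volume_ml):
    """Return 2 (definite PCO), 1 (possible PCO) or 0 (normal)."""
    features = [
        bool(bright_stroma),           # (i) increased echogenic (bright) stroma
        small_peripheral_cysts >= 10,  # (ii) >=10 small (2-8 mm) peripheral cysts
        volume_ml >= 12,               # (iii) increased ovarian volume (>=12 ml)
    ]
    met = sum(features)
    if met == len(features):
        return 2  # all features present
    if met == 0:
        return 0  # none of the features present
    return 1      # some but not all features present


# Hypothetical ovary: bright stroma, 7 small cysts, 13 ml.
print(score_ovary(True, 7, 13.0))
```

Note that the stromal criterion enters as a yes/no judgement, so the rule is only as reproducible as that visual assessment.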

Ethical issues
This prospective study was approved by the South Sheffield Ethics Committee. Informed consent was obtained from each of the patients participating in the study.

Statistical analysis
The results of the scoring by each of the four observers were entered into the Statistical Package for Social Science (SPSS) for PC version 10.01. κ-Statistics were used to determine the degree of intra-observer and inter-observer agreement after correction for the agreement expected by chance. A κ-value has a maximum of 1.0 when agreement is perfect. A value of 0 indicates no agreement better than chance agreement. Values between 0 and 1 are interpreted according to published guidelines (Landis and Koch, 1977), subsequently modified (Altman, 1991).
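For readers unfamiliar with the statistic, the following is a minimal sketch of an unweighted Cohen's kappa together with the agreement bands given by Altman (1991). The text does not specify whether a weighted or unweighted kappa was computed in SPSS, so the unweighted form is an assumption, and the two score lists are invented for illustration.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equal-length lists of category scores."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed proportion of identical scores.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from the marginal totals of each rating.
    first_counts, second_counts = Counter(first), Counter(second)
    expected = sum(first_counts[c] * second_counts[c] for c in first_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def altman_band(kappa):
    """Strength-of-agreement label following Altman (1991)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Invented scores (0 = normal, 1 = possible PCO, 2 = definite PCO).
first_rating = [0, 0, 1, 1, 2, 2, 2, 0]
second_rating = [0, 1, 1, 1, 2, 2, 0, 0]
kappa = cohen_kappa(first_rating, second_rating)
print(f"kappa = {kappa:.2f} ({altman_band(kappa)})")
```

In this invented example the raw agreement is 6/8 (75%), but kappa is lower because some of that agreement would be expected by chance alone.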


    Results
 
Intra-observer agreement
The intra-observer agreement for each of the 27 cases for the four observers is summarized in Tables I and II. Observer 1 agreed with himself in 20 out of 27 cases (74%, κ = 0.54), observer 2 agreed with himself in 21 out of 27 cases (78%, κ = 0.56), observer 3 agreed with himself in 17 out of 27 cases (63%, κ = 0.47) and observer 4 agreed with himself in 17 out of 27 cases (63%, κ = 0.46). The mean intra-observer agreement for all the four observers is therefore 69.4% and the corresponding κ-value is 0.54 (Figure 1).
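The mean percentage agreement follows directly from the counts quoted above, as the short calculation below shows. The variable names are ours; only the counts come from the text.

```python
# Number of cases (out of 27) in which each observer gave the same score twice.
agreements = {1: 20, 2: 21, 3: 17, 4: 17}
CASES = 27

per_observer = {obs: 100.0 * k / CASES for obs, k in agreements.items()}
# All observers scored the same number of cases, so the pooled percentage
# equals the mean of the four individual percentages.
pooled = 100.0 * sum(agreements.values()) / (CASES * len(agreements))

for obs, pct in per_observer.items():
    print(f"observer {obs}: {pct:.0f}%")
print(f"mean: {pooled:.1f}%")
```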


Table I. The intra-observer agreement for each observer
 

Table II. The intra-observer agreement and κ-values for each observer
 


Figure 1. Overall intra-observer agreement

 
Inter-observer agreement
The inter-observer agreements between each pair of observers (1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4) are summarized in Tables III and IV. The agreement ranged from 43 to 72%. The corresponding κ-values ranged from 0.11 to 0.58. The average agreement was 51% (Figure 2) and the κ-value was 0.28.


Table III. The inter-observer agreement between each two observers
 

Table IV. Inter-observer agreement and κ-values
 


Figure 2. Overall inter-observer agreement.

 
Unilaterality of ovarian morphology
The results of ovarian morphology assessment for each ovary were compared with those for the contralateral ovary. As the 27 cases were each evaluated twice by the four investigators, this produced a total of 216 pairs of results. Ten records were excluded due to suboptimal image quality of one of the two ovaries in each, leaving a total of 206 pairs of ovaries. The ovarian morphology was deemed to be the same between the right and left ovaries in 119/206 (58%). In the remaining 87 (42%) pairs of results, there was a lack of agreement in morphology of the right and left ovaries: in 12 (6%) there were two levels of disagreement, i.e. one ovary was deemed to be normal and the other ovary was deemed to be definite for PCO, and in the remaining 75 (36%) pairs there was only one level of disagreement. Among 150 ovarian video records considered to show PCO, 87 (58%) occurred as unilateral phenomena whereas the remaining 63 (42%) occurred as part of a pair.
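The pair counts and percentages quoted above can be reproduced with a few lines of arithmetic. The variable names are ours; the figures are those given in the text.

```python
CASES, VIEWINGS_PER_CASE, OBSERVERS = 27, 2, 4

total_pairs = CASES * VIEWINGS_PER_CASE * OBSERVERS  # right/left pairs assessed
usable_pairs = total_pairs - 10                      # 10 excluded for image quality

# Right/left comparison outcomes reported in the text.
counts = {"same": 119, "one_level_apart": 75, "two_levels_apart": 12}
percent = {name: round(100.0 * n / usable_pairs) for name, n in counts.items()}

print(total_pairs, usable_pairs)
print(percent)
```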


    Discussion
 
In this study, we examined the intra-observer and inter-observer agreement of commonly used ultrasound criteria for the diagnosis of PCOS between four independent observers.

To evaluate inter-observer variability/agreement, it is necessary for all the observers to be present at the time of the ultrasound examination, or they need to perform a transvaginal scan on the same patient separately, or as in this case, they could look at the same record of the original ultrasound examination. We chose the latter approach because it is less intrusive to the patient, so that they do not have to go through repeated scans by different observers or have to undergo the transvaginal scan in the presence of four observers which could be an embarrassing experience for the woman concerned. Similarly, to evaluate intra-observer variability/agreement, without the use of video-recording, the patient would have to undergo examination by the same observer twice on separate occasions, making it even more onerous for the patient.

On the other hand, the use of video records of the original transvaginal scan may be criticized because the quality of the original images may be compromised. This is less likely to affect the measurements than the brightness, which may potentially influence the assessment of stromal echogenicity. However, as all the observers were assessing the same copies of the original, any minor changes should remain constant for all the video recordings and should not therefore significantly influence the results relating to inter-observer and intra-observer variability. Each of the observers was given an opportunity to assess the quality of the imaging; the instructions to all the observers stated that they could indicate if the quality of the image was not good enough for evaluation. On 10 occasions, observers commented that the quality of the image was suboptimal in one of the two ovaries. However, the overall diagnosis in these cases was based on the appearance of the other ovary with the better image. In all other situations, the quality of the image was deemed to be sufficiently good for evaluation.

In this study, all the observers were blinded to the clinical information. In other words, they were not aware of the biochemical results of investigations including LH and androgen levels or of the pattern of menstruation in each of these subjects. It is uncertain if the provision of relevant clinical information may bias the observer into making a diagnosis of PCO. The omission of this clinical information from the evaluation ensures that the assessment is only on the basis of ovarian morphology, independent of clinical information.

Although the assessment of the ovarian morphology in the diagnosis of PCOS has been in use for 20 years, the precision of the measurement has never been previously evaluated. To the best of our knowledge, this is the first study conducted to evaluate the intra-observer and inter-observer variability of the measurement. In this study, we found that there was only modest agreement within the same observer (mean agreement 69.4%, κ-value 0.54). This means that when a particular observer is asked to assess the same ovaries on a separate occasion, the likelihood of producing the same result is only about 70%. In other words, there is about a 30% chance that the same observer will disagree with himself/herself on a subsequent occasion. The inter-observer agreement, as a general rule, is less than the intra-observer agreement. In our study, the mean agreement between observers was only 51% (κ-value 0.28), which indicates only a fair amount of agreement between the observers.

Some optimists may argue that as the ultrasound diagnosis of PCOS is primarily subjective, a level of disagreement between definite PCO and possible PCO is not unusual and so if one-place disagreement is excluded the overall intra-observer agreement will be 96% and inter-observer agreement will be 89%.
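The effect of disregarding one-place disagreements can be illustrated with a small sketch. The function name and the score lists are invented; the example only shows how the more lenient rate is computed, not how the figures above were derived.

```python
def agreement_rate(first, second, tolerance=0):
    """Proportion of cases whose two scores differ by at most `tolerance` levels."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(first, second))
    return hits / len(first)


# Invented scores (0 = normal, 1 = possible PCO, 2 = definite PCO).
first_rating = [0, 0, 1, 1, 2, 2, 2, 0]
second_rating = [0, 1, 1, 1, 2, 2, 0, 0]

exact = agreement_rate(first_rating, second_rating)
within_one = agreement_rate(first_rating, second_rating, tolerance=1)
print(exact, within_one)
```

With only three categories, a tolerance of one level counts every pair except normal versus definite PCO as agreement, which is why the lenient rate is necessarily much higher.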

Among the four observers, observers 3 and 4, who are both gynaecologists with an interest in pelvic ultrasonography, appear to have the best agreement (72%), whereas the agreement between the remaining combinations of the four observers ranged from 43 to 50%. Although it is possible that gynaecologists who have more ready access to clinical information are more likely to agree on a clinical diagnosis of PCOS, in this study the two gynaecologists were similarly blinded to the clinical information, as were the two radiologists. Nevertheless, the two gynaecologists spent 2 years working closely together in the same Reproductive Medicine Unit and it is possible that the sharing of experience in the same team produces better agreement, a phenomenon which is well recognised in pathological evaluation (Langley et al., 1983).

Overall, our results suggest that the currently used criteria for the diagnosis of PCOS have significant intra-observer and inter-observer variability and must therefore be considered subjective. Transvaginal ultrasonography alone is therefore not a reliable method of diagnosing or excluding PCOS. Indeed, several criteria for the diagnosis of PCOS have been proposed (Swanson et al., 1981; Adams et al., 1985; Yeh et al., 1987; Ardaens et al., 1991; Fox et al., 1991; Franks, 1992; Battaglia et al., 1999). None of them has been universally agreed as the diagnostic criterion of choice. Given the subjective nature of ultrasound assessment, it will be necessary to quantify the measurements to improve their diagnostic value.

The lack of precision of the commonly used ultrasonographic criteria for the definition of PCO may be due to a number of reasons. First, the criteria refer to three different aspects of ovarian morphology. Whilst the conclusion may be reached easily if all three criteria are met, or if none of the three criteria are met, the difficulties arise if only one or two of the criteria are fulfilled. It is possible that an observer may then attempt to make a decision, subconsciously, on the overall impression of the ultrasonographic appearance, or even the appearance of the other ovary.

Secondly, a more in-depth analysis of the various components of the criteria suggests that each of them may be subjective in one way or another. In general terms, the ultrasound criteria used can be classified into two types: quantitative and qualitative. The former include parameters that are obtained by measuring physical entities such as the ovarian volume, stromal volume and the number and volume of small cysts. Although they are relatively objective, their quantification is influenced by the skill and the carefulness of the examiner (Dewailly, 1997). The qualitative parameters include stromal echogenicity and follicular pattern. Their evaluation is visual and therefore more subjective and depends not only on the perception of the sonographer but also on the setting of the ultrasound machine. The most objective component of the commonly used ultrasound criteria, at first glance, appears to be ovarian volume; however, this measurement does not take into consideration the possibility of finding a follicle or cyst of significant size, i.e. >10 mm in diameter (perhaps ≥20 mm). In this situation, the validity of the ovarian volume measurement is in question; it is possible that some observers may consider it as evidence against PCOS, while others may accept it as a finding compatible with PCOS (on the assumption that ovulation occurs sporadically in anovulatory PCOS women) and so make certain allowance for the ovarian volume. In any case, there is no universal agreement on how to make allowance for such a finding; each observer is therefore left to make a subjective decision on each occasion.

Another component of the criteria relates to the finding of '>10 small (2–8 mm) peripherally arranged cysts'. Whilst it states the number of cysts to be >10, it is possible that certain observers do not actually count the number of cysts but merely form an impression that there are many small cysts. It is also possible that some observers realize that certain investigators accept a finding of more than five small follicles (Yeh et al., 1987; Battaglia et al., 1999), and so might find it difficult to be certain of the significance if the number of small cysts is >5 but <10.

The third component of the criteria relates to stromal echogenicity, which is subjective. The stromal echogenicity may be affected by the setting of the ultrasound machine. Nevertheless, many authors have emphasized the important diagnostic value of abnormal ovarian stroma (Adams et al., 1986; Conway et al., 1989; Eden et al., 1989; Ardaens et al., 1991). One study (Buckett et al., 1999) objectively measured ovarian stromal echogenicity and the stromal index (ratio of mean stromal echogenicity to mean echogenicity of the entire ovary) in normal ovaries (n = 77) and PCO (n = 46). They found no difference in the mean stromal echogenicity, but the stromal index was significantly greater in women with PCO. They concluded that the apparent subjective increase in stromal echogenicity in PCO, as exemplified by the greater stromal index, is due to a combination of the increased volume of ovarian stroma and the significantly lower mean echogenicity of the entire ovary in these women.
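The stromal index is a simple ratio, and a short sketch shows why it can rise even when stromal echogenicity itself is unchanged. The function name and the grey-level values are hypothetical and are not taken from the cited study.

```python
def stromal_index(mean_stromal_echogenicity, mean_ovarian_echogenicity):
    """Ratio of mean stromal echogenicity to mean echogenicity of the whole ovary."""
    if mean_ovarian_echogenicity <= 0:
        raise ValueError("mean ovarian echogenicity must be positive")
    return mean_stromal_echogenicity / mean_ovarian_echogenicity


# Hypothetical grey-level values: identical stroma, darker whole ovary in PCO
# (for example because numerous fluid-filled cysts lower the overall mean).
normal_index = stromal_index(60.0, 50.0)
pco_index = stromal_index(60.0, 40.0)
print(normal_index, pco_index)
```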

How may the precision and usefulness of the ultrasonographic criteria be improved?
First, the precision of the individual measurements of each of the three components should be improved. Ovarian and stromal volume could now be measured with greater precision by the use of 3-dimensional (3-D) ultrasound (Kyei-Mensah et al., 1996a,b). Kyei-Mensah et al. (1996a) compared the volume of ovarian follicles measured by transvaginal 2-D and 3-D ovarian scan carried out in 25 women immediately before follicular aspiration and the volume of follicular fluid aspirated during IVF treatment. They found that the volume of ovarian follicles measured by a 3-D ultrasound system is more accurate than that measured by 2-D ultrasound techniques. In another study, the same authors (Kyei-Mensah et al., 1996b) investigated the reproducibility of ovarian and endometrial volume measurements obtained using transvaginal 3-D ultrasound. Three observers independently measured the volume of 20 stored ovarian and endometrial scans. The intra-observer coefficient of variation for both ovarian and endometrial volume was 8%. The inter-observer coefficient of variation was 9% for ovarian volume and 11% for endometrial volume. They concluded that transvaginal 3-D ultrasound produces highly reproducible ovarian and endometrial volume measurements (Kyei-Mensah et al., 1996b).
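For reference, a coefficient of variation expresses the spread of repeated measurements as a percentage of their mean. The sketch below assumes the sample standard deviation is used; the exact computation in the cited study is not described here, and the repeated volumes are invented.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Invented repeated volume measurements (ml) of the same stored scan.
repeated_volumes = [9, 10, 11]
print(f"{coefficient_of_variation(repeated_volumes):.0f}%")
```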

Similarly, the number of follicles 2–8 mm in diameter may be readily measured by the use of 3-D ultrasound.

As far as stromal echogenicity is concerned, 3-D ultrasound will not improve upon the objectivity and hence the precision of the measurement. It is unclear if stromal volume measurement, considered important by some investigators (Dewailly, 1997; Kyei-Mensah et al., 1998; Fox, 1999), could replace stromal echogenicity as one of the morphological criteria. If so, it is possible that 3-D ultrasound may, once again, provide a more precise measurement of the ovarian stroma than conventional 2-D ultrasound.

On the other hand, it is possible that a scoring system based on the measurement of the three separate components of ovarian morphology may help in the not so clear-cut cases, which are not uncommon in day-to-day clinical practice. Consideration should then be given to whether or not each component should be weighted: a separate study will be required to investigate such a possibility.

Finally, the combined assessment of the ovarian morphology by transvaginal ultrasound and colour Doppler flow analysis of the intraovarian and uterine vessels may be a promising new approach to define PCOS. One study (Battaglia et al., 1995) carried out a transvaginal colour Doppler measurement of the uterine and intraovarian vessel variations in 22 PCOS women and in 18 normal control women. They found significantly elevated uterine artery pulsatility index values associated with a typical low resistance index of stromal ovary vascularization. The pulsatility index was positively correlated with the LH:FSH ratio, and the resistance index was negatively correlated. The elevated uterine artery resistance was correlated with androstenedione levels. They concluded that Doppler analysis could be a valuable additional tool for the diagnosis of PCOS.

The significance of finding PCO on ultrasound scan
Several previous studies have investigated the significance of ultrasound appearance of PCO in normal women and in women with PCOS. Carmina et al. reported on the significance of the ultrasonographic finding of PCO in 15 normal non-PCOS women (Carmina et al., 1997). The study found that about a third of this group of women had some evidence of hyperandrogenaemia and significantly lower insulin-like growth factor I (IGF-I) than women with normal ovaries. They concluded that the presence of PCO in apparently non-PCOS women may represent a part of the spectrum of the patients with PCOS or that these women may be susceptible to developing PCOS in the future. Furthermore, it was reported (Norman et al., 1995) that women with PCO without hyperandrogenaemia (n = 21) had disturbances in insulin and lipid profile similar to those with PCOS, i.e. those with PCO and hyperandrogenaemia (n = 97), suggesting that the ultrasonographic finding of PCO alone, independent of clinical and endocrine manifestations, is predictive of the metabolic sequelae of PCOS. In contrast, another team (Clayton et al., 1992) investigated the significance of the ultrasound diagnosis of PCO in 41 non-PCOS women. They found that the prevalence of PCO was high (41/190, 22%) in non-PCOS women, but was associated with minimal clinical manifestations and no hormonal abnormalities. They concluded that an isolated finding of PCO might be a normal variation.

To conclude, there appears to be significant intra-observer and inter-observer variability in the currently used ultrasound criteria for the diagnosis of PCOS. It remains to be seen whether or not 3-D ultrasound evaluation, by providing a more objective means of assessing ovarian morphology, could improve the diagnostic accuracy of ultrasound in the diagnosis of PCOS. It will also be interesting in future studies to directly compare the positive predictive value and negative predictive value of transvaginal ultrasonography and biochemical measurement in the diagnosis of PCOS. For now, we have identified a clear need to continue to search for a better diagnostic tool for PCOS.
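For such a comparison, the predictive values of a test are obtained from a 2 × 2 table of test result against a reference diagnosis. The sketch below uses invented counts purely to show the calculation; no such data were collected in this study.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Return (positive predictive value, negative predictive value)."""
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("need at least one positive and one negative test result")
    ppv = true_pos / (true_pos + false_pos)  # P(condition present | test positive)
    npv = true_neg / (true_neg + false_neg)  # P(condition absent | test negative)
    return ppv, npv


# Invented counts for a hypothetical test against a reference diagnosis.
ppv, npv = predictive_values(true_pos=15, false_pos=5, true_neg=20, false_neg=10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both values depend on the prevalence of the condition in the population studied, so they should be compared only between tests evaluated in the same cohort.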


    Notes
 
5 To whom correspondence should be addressed. E-mail: s.amer@sheffield.ac.uk


    References
 
Adams, J., Franks, S., Polson, D.W., Mason, H.D., Abdulwahid, N., Tucker, M., Morris, D.V., Price, J. and Jacobs, H.S. (1985) Multifollicular ovaries: clinical and endocrine features and response to pulsatile gonadotrophin releasing hormone. Lancet, ii, 1375–1378.

Adams, J., Polson, D.W. and Franks, S. (1986) Prevalence of polycystic ovaries in women with anovulation and idiopathic hirsutism. Br. Med. J., 293, 355–359.

Altman, D.G. (1991) Inter-rater agreement. In Practical Statistics for Medical Research. Chapman & Hall/CRC, London, pp. 403–409.

Ardaens, Y., Robert, Y., Lemaitre, L., Fossati, P. and Dewailly, D. (1991) Polycystic ovarian disease: contribution of vaginal endosonography and reassessment of ultrasonic diagnosis. Fertil. Steril., 55, 1062–1068.

Ardaens, Y., Robert, Y. and Dewailly, D. (1995) Polycystic ovaries: an imprecise ultrasonographic definition. [In French.] Contracept. Fertil. Sex., 23, 415–419.

Balen, A.H., Conway, G.S., Kaltsas, G., Techatrasak, K., Manning, P.J., West, C. and Jacobs, H.S. (1995) Polycystic ovary syndrome: the spectrum of the disorder in 1741 patients. Hum. Reprod., 10, 2107–2111.

Battaglia, C., Artini, P.G., D'Ambrogio, G., Genazzani, A.D. and Genazzani, A.R. (1995) The role of color Doppler imaging in the diagnosis of polycystic ovary syndrome. Am. J. Obstet. Gynecol., 172, 108–113.

Battaglia, C., Regnani, G., Petraglia, F., Primavera, M.R., Salvatori, M. and Volpe, A. (1999) Polycystic ovary syndrome: it is always bilateral? Ultrasound Obstet. Gynecol., 14, 183–187.

Borgfeldt, C. and Andolf, E. (1999) Transvaginal sonographic ovarian findings in a random sample of women 25–40 years old. Ultrasound Obstet. Gynecol., 13, 345–350.

Buckett, W.M., Bouzayen, R., Watkin, K.L., Tulandi, T. and Tan, S.L. (1999) Ovarian stromal echogenicity in women with normal and polycystic ovaries. Hum. Reprod., 14, 618–621.

Carmina, E., Wong, L., Chang, L., Paulson, R.J., Sauer, M.V., Stanczyk, F.Z. and Lobo, R.A. (1997) Endocrine abnormalities in ovulatory women with polycystic ovaries on ultrasound. Hum. Reprod., 12, 905–909.

Clayton, R.N., Ogden, V., Hodgkinson, J., Worswick, L., Rodin, D.A., Dyer, S. and Meade, T.W. (1992) How common are the polycystic ovaries in normal women and what is their significance for the fertility of the population? Clin. Endocrinol., 37, 127–134.

Conway, G.S., Honour, J.W. and Jacobs, H.S. (1989) Heterogeneity of polycystic ovarian syndrome: clinical, endocrine and ultrasound features in 556 patients. Clin. Endocrinol., 30, 459–470.

Dewailly, D. (1997) Definition and significance of polycystic ovaries. Baillière's Clin. Obstet. Gynecol., 11, 350–368.

Eden, J.A., Place, J., Carter, G.D., Jones, J., Alaghband-Zadeh, J. and Pawson, M.E. (1989) The diagnosis of polycystic ovaries in subfertile women. Br. J. Obstet. Gynaecol., 96, 809–815.

Farquhar, C.M., Birdsall, M., Manning, P., Mitchell, J.M. and France, J.T. (1994) The prevalence of polycystic ovaries on ultrasound scanning in a population of randomly selected women. Aust. NZ J. Obstet. Gynecol., 34, 67–72.

Fox, R. (1999) Transvaginal ultrasound appearances of the ovary in normal women and hirsute women with oligomenorrhoea. Aust. NZ J. Obstet. Gynaecol., 39, 63–68.

Fox, R., Corrigan, E., Thomas, P.A. and Hull, M.G. (1991) The diagnosis of polycystic ovaries in women with oligo-amenorrhoea: predictive power of endocrine tests. Clin. Endocrinol., 34, 127–131.

Franks, S. (1992) Morphology of the polycystic ovary in polycystic ovary syndrome. In Dunaif, A., Given, J.R., Haseltine, F.P. and Merriam, G.R. (eds), Polycystic Ovary Syndrome. Blackwell, Boston, 19pp.

Homburg, R. (1996) Polycystic ovary syndrome – from gynaecological curiosity to multisystem endocrinopathy. Hum. Reprod., 11, 29–39.

Hull, M.G.R. (1987) Epidemiology of infertility and polycystic ovarian disease: endocrinological and dermographical studies. Gynecol. Endocrinol., 1, 235–245.

Koivunen, R., Laatikainen, T., Tomas, C., Huhtaniemi, I., Tapanainen, J. and Martikainen, H. (1999) The prevalence of polycystic ovaries in healthy women. Acta Obstet. Gynecol. Scand., 78, 137–141.

Kousta, E., White, D.M., Cela, E., McCarthy, M.I. and Franks, S. (1999) The prevalence of polycystic ovaries in women with infertility. Hum. Reprod., 14, 2720–2723.

Kyei-Mensah, A., Zaidi, J., Pittrof, R., Shaker, A., Campbell, S. and Tan, S.L. (1996a) Transvaginal three-dimensional ultrasound: accuracy of follicular volume measurements. Fertil. Steril., 65, 371–376.

Kyei-Mensah, A., Maconochie, N., Zaidi, J., Pittrof, R., Campbell, S. and Tan, S.L. (1996b) Transvaginal three-dimensional ultrasound: reproducibility of ovarian and endometrial volume measurements. Fertil. Steril., 66, 718–722.

Kyei-Mensah, A.A., LinTan, S., Zaidi, J. and Jacobs, H.S. (1998) Relationship of ovarian stromal volume to serum androgen concentrations in patients with polycystic ovary syndrome. Hum. Reprod., 13, 1437–1441.

Landis, J.R. and Koch, G. (1977) The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.

Langley, F.A., Baak, J.P.A. and Oort, J. (1983) Diagnosis making: error sources. In Baak, J.P.A. and Oort, J. (eds), A Manual of Morphometry in Diagnostic Pathology. Springer-Verlag, Berlin, 6pp.

Lewis, V. (2001) Polycystic ovary syndrome: a diagnostic challenge. Obstet. Gynecol. Clin. N. Am., 28, 1–20.

Li, T.C., Spuijbroek, M.D.E.H., Tuckerman, E., Anstie, B., Loxley, M. and Laird, S. (2000) Endocrinological and endometrial factors in recurrent miscarriage. Br. J. Obstet. Gynaecol., 107, 1471–1479.

Liddell, H.S., Sowden, K. and Farquhar, C.M. (1997) Recurrent miscarriage: screening for polycystic ovaries and subsequent pregnancy outcome. Aust. NZ J. Obstet. Gynecol., 37, 402–406.

Lobo, R.A. (1995) A disorder without identity: 'HCA,' 'PCO,' 'PCOD,' 'PCOS,' 'SLS'. What are we to call it? Fertil. Steril., 63, 1158–1160.

Loucks, T.L., Talbott, E.O. and McHugh, P. (2000) Do polycystic-appearing ovaries affect the risk of cardiovascular disease among women with polycystic ovary syndrome? Fertil. Steril., 74, 547–552.

Michelmore, K.F., Balen, A.H., Dunger, D.B. and Vessey, M.P. (1999) Polycystic ovaries and associated clinical and biochemical features in young women. Clin. Endocrinol., 51, 779–786.

Norman, R.J., Hague, W.M., Masters, S.C. and Wang, X.J. (1995) Subjects with polycystic ovaries without hyperandrogenaemia exhibit similar disturbances in insulin and lipid profiles as those with polycystic ovary syndrome. Hum. Reprod., 10, 2258–2261.

Pache, T.D., Wladimiroff, J.W., Hop, W.C. and Fauser, B.C. (1992) How to discriminate between normal and polycystic ovaries: transvaginal US study. Radiology, 183, 421–423.

Polson, D.W., Adams, J., Wadsworth, J. and Franks, S. (1988) Polycystic ovaries – a common finding in normal women. Lancet, ii, 870–872.

Rai, R., Backos, M., Rushworth, F. and Regan, L. (2000) Polycystic ovaries and recurrent miscarriage – a reappraisal. Hum. Reprod., 15, 612–615.

Regan, L., Owen, E.J. and Jacobs, H.S. (1990) Hypersecretion of luteinising hormone, infertility and miscarriage. Lancet, 336, 1141–1144.

Sagle, M., Bishop, K., Ridley, N., Alexander, F.M., Michel, M., Bonney, R.C., Beard, R.W. and Franks, S. (1988) Recurrent early miscarriage and polycystic ovaries. Br. Med. J., 297, 1027–1028.

Swanson, M., Sauerbrei, E.E. and Cooperberg, P.L. (1981) Medical implications of ultrasonically detected polycystic ovaries. J. Clin. Ultrasound, 9, 219–222.

Yeh, H.C., Futterweit, W. and Thornton, J.C. (1987) Polycystic ovarian disease: US features in 104 patients. Radiology, 163, 111–116.

Submitted on March 15, 2001; resubmitted on October 29, 2001; accepted on January 25, 2002.