1 Department of Gynaecology and Obstetrics, 2 Department of Magnetic Resonance Imaging, 3 Research Unit for General Practice, Aarhus University and Aarhus University Hospital and 4 Department of Obstetrics and Gynaecology, Silkeborg Hospital, Denmark
Abstract
Key words: adenomyosis/hysteroscopy/hysterosonography/magnetic resonance imaging/reproducibility
Introduction
During recent decades the need for minimally invasive surgery has risen, as has the demand for effective evaluation and single-step treatment options for common benign gynaecological disorders (Baskett et al., 1996; Dueholm et al., 1999). Transvaginal sonography (TVS), hysterosonography (HSE) and hysteroscopy (HY) have therefore become integrated into common gynaecological practice, whereas magnetic resonance imaging (MRI), a developing and costly imaging technique, has not yet become generally available. The accuracy of these techniques in expert hands has been evaluated and compared in 106 patients undergoing hysterectomy for benign reasons (Dueholm et al., 2001a,b). The results were comparable for evaluation of the uterine cavity, but MRI was superior for evaluation of adenomyosis and submucous myomas. These and other evaluations of commonly used imaging techniques (Dudiak et al., 1988; Emanuel et al., 1995; Widrich et al., 1996; Schwarzler et al., 1998) form the basis for the use of imaging techniques in benign gynaecology. However, the reliability of these findings in common gynaecological practice depends on their reproducibility by different observers. The reproducibility of gynaecological imaging techniques as such has been the subject of only sparse research (Broekmans et al., 1996; Emanuel et al., 1996; Wolman et al., 1996, 1998; Delisle et al., 1998; Spandorfer et al., 1998), mostly based on measurements of endometrial thickness at TVS (Delisle et al., 1998; Spandorfer et al., 1998; Wolman et al., 1996, 1998) or HSE (Ballard et al., 2000). To our knowledge, inter-observer variation in the evaluation of the uterine cavity in pre-menopausal patients by HY, HSE or MRI has not been reported, and only a single study has evaluated the uterine cavity by TVS (Emanuel et al., 1996). Moreover, no studies have compared observer variation in the assessment of benign abnormalities of the myometrium by MRI and TVS.
The present blinded, prospective study was performed to evaluate and compare inter-observer agreement by TVS, HSE, MRI and HY in evaluating the uterus in pre-menopausal patients who underwent hysterectomy for benign diseases.
Material and methods
The study included 108 consecutive pre-menopausal patients scheduled for hysterectomy for benign diseases at a university hospital from September 1998 to February 2000. All patients gave their informed consent and the study was approved by the local ethics committee. The indications for hysterectomy were abnormal uterine bleeding in 51 (48%), symptomatic myomas in 35 (33%), lower abdominal pain or endometriosis in 17 (16%), and dysplasia or prior borderline ovarian tumour in three (3%) patients. Abnormal bleeding was present in 82 (77%) of the patients. The mean age (± SD) was 44.7 ± 5.2 years (range 28–58), parity was 1.73 ± 1.18 (range 0–4) and the number of pregnancies was 2.68 ± 1.59 (range 0–7). The mean uterine volume was 298 ± 271 ml (range 25–1290).
Two patients were excluded because their uteri were morcellated at hysterectomy (laparoscopically assisted vaginal hysterectomy), so standard pathological examination could not be performed. This left 106 patients for analysis. All underwent MRI, immediately followed by TVS and HSE. Hysterectomy was performed within 2 weeks of these examinations. The surgeons (specialized gynaecologists) performed HY under general anaesthesia immediately before hysterectomy. MRI, TVS, HSE and HY were performed independently and without knowledge of the other investigators' findings, and the results were evaluated consecutively. These results, with pathological examination as the reference standard for diagnostic accuracy, have been reported elsewhere (Dueholm et al., 2001a,b).
Observers
In this study, two observers independently and blindly evaluated consecutive patients with each imaging technique (MRI, TVS, HSE and HY), and the inter-observer variation of each technique was calculated and compared.
MRI images were evaluated by MRI specialists. One of the authors (E.L.) evaluated all the MRI images, and another examiner (J.S.) described the MRI in the first 78 patients.
Patients were evaluated consecutively with TVS and HSE by two experienced gynaecologists. One author (M.D.) performed all 106 initial TVS and HSE examinations, and another author (S.L.) performed a second TVS and HSE in the first 60 patients. The two investigators alternated in performing the first of the two examinations. The second observer performed TVS and HSE after an interval of 5–10 min, during which the patient was in the upright position to allow the uterine cavity to empty.
MRI and TVS observers were senior investigators (E.L., M.D.) and representative members of the specialist team (J.S., S.L.).
Six experienced surgeons performed the first HY observation. One of the authors (M.D. or S.L.) ensured that HY was performed according to the standard procedure and that the interpretations of the findings were correctly entered on the standard form. Another experienced gynaecologist (H.L.), one of the most senior staff members, evaluated all recorded HY videos.
Table I shows that 56 of 60 consecutive patients were evaluated by two observers with both TVS and MRI, and that 68 consecutive patients were evaluated by two observers with both HY and MRI. A total of 51 patients had double examinations including all imaging techniques. The procedure for determining the number of patients to be included is explained below (power calculation).
Pictures with measurements were taken for documentation, and a short video was recorded at sonography.
Evaluation of the uterus by TVS, HSE, MRI and HY
Examinations with poor visualization and/or difficult interpretation were considered equivocal. The quality of each examination was rated as optimal, sub-optimal or not valuable. The uterine cavity contours and endometrial structure were studied, and the presence or absence of abnormalities [polyps, submucous myomas (SM), other] was recorded. The myometrium was evaluated by MRI and TVS. All abnormal findings were described, measured and drawn on diagrams in the sagittal plane, with their localization to the right or left side.
Data analysis and statistics
Kappa statistics were used to evaluate observer agreement. Results are expressed as kappa values (95% confidence limits of kappa). Kappa is defined as the difference between observed agreement and agreement expected by chance, expressed as a fraction of the maximum possible difference:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance.
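As an illustration of this definition, the following Python sketch computes Cohen's kappa for two observers' categorical ratings. It uses the observed agreement (p_o) and the chance-expected agreement (p_e), where p_e is the sum over categories of the product of the two observers' marginal proportions. The 95% limits rely on a simple large-sample standard error, sqrt(p_o(1 - p_o) / (n(1 - p_e)^2)). That choice is an assumption, because the variance estimator used in this study is not stated. The function name and the ratings are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohens_kappa(obs1: Sequence[Hashable], obs2: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (kappa, lower, upper) for two observers rating the same cases.

    Assumption: approximate 95% limits from the large-sample standard error
    sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2)), clipped to [-1, 1].
    """
    if len(obs1) != len(obs2) or len(obs1) == 0:
        raise ValueError("ratings must be paired and non-empty")
    n = len(obs1)
    # Observed agreement: proportion of cases given the same category by both observers.
    p_o = sum(a == b for a, b in zip(obs1, obs2)) / n
    # Chance-expected agreement: sum over categories of the product of marginal proportions
    # (categories used by only one observer contribute zero).
    count1, count2 = Counter(obs1), Counter(obs2)
    p_e = sum(count1[c] * count2[c] for c in count1) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, max(-1.0, kappa - 1.96 * se), min(1.0, kappa + 1.96 * se)


# Hypothetical ratings of the uterine cavity by two observers in 10 patients.
obs1 = ["abnormal"] * 4 + ["normal"] * 6
obs2 = ["abnormal"] * 3 + ["normal"] * 7
kappa, lower, upper = cohens_kappa(obs1, obs2)
print(f"kappa = {kappa:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these made-up ratings the observers agree in 9 of 10 patients (p_o = 0.90), but p_e = 0.54, so kappa is only about 0.78. The upper limit is clipped at 1 because the sample is tiny.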
A kappa of <0.40 indicated poor agreement, 0.40–0.59 intermediate agreement, 0.60–0.74 good agreement and ≥0.75 excellent agreement (Fleiss, 1981). For each imaging technique, cases were categorized according to agreement or disagreement between the two observers. The Friedman test was used for multiple paired comparisons and the Kruskal–Wallis test for unpaired comparisons. The McNemar test was used to compare the numbers of cases with agreement/disagreement between two imaging techniques, and Wilcoxon's signed rank test was used for non-parametric, non-dichotomous data. P < 0.05 was considered statistically significant.
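The agreement bands and the paired McNemar comparison can be sketched in Python as follows. The band boundaries are the ones quoted above. For the McNemar test, the sketch assumes an exact (binomial) two-sided version computed from the two discordant counts, because the paper does not state whether the exact or the chi-square version was used. Function names and counts are hypothetical.

```python
import math


def fleiss_band(kappa: float) -> str:
    """Map a kappa value to the agreement bands quoted from Fleiss (1981)."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "intermediate"
    if kappa < 0.75:
        return "good"
    return "excellent"


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant counts.

    b: patients where observers agreed with technique X but disagreed with technique Y.
    c: patients where observers disagreed with X but agreed with Y.
    Under H0 the discordant patients split 50:50, so min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant patients: no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: observers agreed with X but not Y in 8 patients, the reverse in 1.
print(fleiss_band(0.73), mcnemar_exact(8, 1))
```

With these hypothetical counts the exact two-sided P value is 0.039. Only patients in whom the two techniques differ contribute to the test, which is why McNemar's test suits paired comparisons on the same patients.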
Power calculation
We expected disagreement in 10% of the cases with HSE and aimed to detect a 10% difference between HSE on the one hand and MRI and HY on the other (type I error 5%, power 80%). We expected a slightly smaller difference (7%) between HY and MRI. We therefore included another 10 patients for MRI beyond the 60 double examinations at TVS, and evaluated all HY videos to allow for possibly insufficient video recordings.
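The paper does not state which sample-size formula was used. The following is a minimal Python sketch that assumes a paired design analysed with McNemar's test and uses a standard normal-approximation formula for paired proportions. In that formula, psi is the expected proportion of discordant pairs and delta the difference between the two disagreement rates. The function name and the input values are illustrative assumptions, and the sketch is not intended to reproduce the study's own calculation.

```python
import math
from statistics import NormalDist


def mcnemar_sample_size(p_discordant: float, delta: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of paired patients for a two-sided McNemar test.

    Normal-approximation formula for paired proportions (assumed here):
        n = (z_{1-alpha/2} * sqrt(psi) + z_{power} * sqrt(psi - delta^2))^2 / delta^2
    psi = p_discordant (proportion of discordant pairs), delta = difference in proportions.
    """
    if not 0 < abs(delta) <= p_discordant <= 1:
        raise ValueError("need 0 < |delta| <= p_discordant <= 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    root = z_alpha * math.sqrt(p_discordant) + z_beta * math.sqrt(p_discordant - delta ** 2)
    return math.ceil(root ** 2 / delta ** 2)


# Illustrative assumption: HSE observers disagree in 10% of patients and the comparison
# technique's observers in none of those, so all discordant pairs fall in one direction.
print(mcnemar_sample_size(p_discordant=0.10, delta=0.10))
```

Under these illustrative inputs the formula gives 77 paired patients. The result is sensitive to the assumed discordance rate and to how the expected differences are specified.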
Results
In 17 cases with disagreement between the two observations at HY, the video recordings were re-evaluated for poor visualization and compared with true findings at pathology. Only one case did not have optimal visualization of abnormalities. Seven of 14 cases with normal cavities at the second observation represented true false-positive findings, and two of three cases with abnormal cavities at the second observation represented true findings. In the remaining eight cases the observed disagreement represented false interpretations of uterine cavities clearly seen on the videos.
Six different observers performed the primary hysteroscopy. There was no significant difference between their observations and those of the second hysteroscopy observer (Kruskal–Wallis test).
Submucous myomas and polyps
Reproducibility of MRI identification of SM was excellent, whereas reproducibility by TVS, HSE and HY ranged only from intermediate to good, although this was slightly better than for the evaluation of normal/abnormal cavity (Table III). MRI, HY, TVS and HSE observers differed significantly in their evaluation of SM (P = 0.02, n = 34, Friedman test). Again, MRI stood out as the technique that produced the fewest observer discrepancies, whereas observer disagreement was higher, and of comparable magnitude, with HY, TVS and HSE.
Most experienced versus senior members of the team
Table IV compares the agreement between the different techniques obtained by the most experienced observers with the results achieved by senior team members with average experience. As expected, we found significantly higher levels of concurrence by different techniques among the most experienced observers than among the other observers.
Observer agreement and reproducibility of findings of adenomyosis by MRI (Table VI) did not reach the excellent levels obtained for myomas, but kappa was nevertheless good (0.73) and disagreement was restricted to 9% of the cases. Inter-observer agreement was, however, significantly lower between the two TVS observers (kappa = 0.38, disagreement in 23% of the cases), where only intermediate agreement was obtained. All cases without signs of adenomyosis were counted as normal.
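Rearranging the kappa definition given in the Methods shows how much agreement would have been expected by chance for the adenomyosis ratings. The observed agreement is taken as one minus the reported proportion of disagreement. Because the published values are rounded, the results are approximate:

```latex
p_e = \frac{p_o - \kappa}{1 - \kappa}, \qquad
\text{MRI: } p_e \approx \frac{0.91 - 0.73}{1 - 0.73} = \frac{0.18}{0.27} \approx 0.67, \qquad
\text{TVS: } p_e \approx \frac{0.77 - 0.38}{1 - 0.38} = \frac{0.39}{0.62} \approx 0.63
```

In both cases roughly two-thirds agreement would be expected by chance alone. This is why a raw agreement of 91% by MRI corresponds to a kappa of only 0.73.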
Discussion
The differences between MRI and HSE and between HY and HSE were smaller among the most senior members of the team. This indicates that HSE is crucially dependent on observer experience, a dependence that may be critical when the method is used by team members with average experience or when inconclusive examinations are included.
Inter-observer disagreement with HY may have been biased because the HY study design was not ideal, but it was the only design that would ensure patient acceptability. The second observation was performed by a single observer as a review of the recorded hysteroscopy videos. This design may have falsely increased observer reproducibility, because it does not reflect the observer's skill in visualizing the findings, which is an important determinant of reproducibility at endoscopy. Conversely, observer agreement may have been biased downwards because the first HY observations were made by different observers. However, the two observations were equally accurate, and it seems unrealistic to expect the reproducibility of HY to be substantially different in another design. Nevertheless, the relatively high level of disagreement in the interpretation of the same HY images was surprising, as we only evaluated common benign abnormalities, not malignancies, and did not attempt a detailed assessment.
The interpretation of small irregularities in the endometrium seemed to be problematic with TVS, HSE and HY alike. To ensure patient acceptability, HSE was performed without a balloon catheter, tenaculum or anaesthesia. This may have made visualization poorer than with HY, which was performed under anaesthesia with a rigid hysteroscope.
Modern imaging techniques are high-accuracy techniques that demand close attention to detail and observer meticulousness, and differing interpretations of even small irregularities will increase inter-observer disparities. The reproducibility of TVS and of endoscopic gynaecological techniques therefore usually falls short of excellent and remains at intermediate or good levels (Adhesion Scoring Group, 1994; Bowman et al., 1995; Rock, 1995; Delisle et al., 1998).
Reproducibility also seems to depend strongly on how well a particular imaging technique recognizes and visualizes abnormalities. In endoscopy and ultrasonography this may depend on the operator's skill in producing a clear image of the entire organ. In contrast, the image produced with standard MRI settings has insufficient sensitivity for the diagnosis of endometrial polyps (Dueholm et al., 2001a), but it does not depend on investigator skill. Inter-observer reproducibility of MRI therefore depends only on differences in the interpretation of the images, which may at least partly account for its higher reproducibility. Moreover, the high level of agreement by MRI may reflect the fact that the MRI observers were members of a highly specialized staff that holds daily image conferences. The TVS, HSE and HY observers, by contrast, were members of the gynaecological staff, where these techniques were implemented as part of clinical examinations.
TVS, HSE and HY may owe much of their efficacy as diagnostic tools to the experience and skill of the investigators (Loffer, 1989; Emanuel et al., 1996; Widrich et al., 1996; Schwarzler et al., 1998). Nevertheless, the rising demand for these modern imaging techniques mandates their introduction into routine gynaecological clinical practice and the design of strategies to improve their effectiveness and reduce inter-observer disparities (Valentin, 1999). These strategies may include a decentralized organization of gynaecological ultrasound with referral to specially trained staff, where well-defined, more complicated image evaluations can be performed. Moreover, efforts must be devoted to instituting continuous, systematic education in gynaecological ultrasound and hysteroscopy, with the creation of facilities for training and education. Such efforts may, however, be thwarted by the low observer variation and high accuracy of MRI. MRI may gain ground despite its high cost and low availability, at the expense of the more difficult process of decreasing inter-observer variation with simpler and less costly techniques such as TVS, HSE and HY.
The relatively high level of inter-observer disagreement with TVS, HSE and HY reported here warrants continuous monitoring of the quality of gynaecological imaging and further studies of observer variation in outpatient settings. Such studies should ideally use similar set-ups and should aim to evaluate the reproducibility of outcomes with these imaging techniques.
In conclusion, these findings indicate that efforts should be made to reduce observer variation in the use of imaging techniques in routine gynaecological practice. The alternative is that more costly and less available imaging techniques such as MRI will replace a considerable part of gynaecological imaging in the future.
References
Ballard, P., Tetlow, R., Richmond, I. et al. (2000) Errors in the measurement of endometrial depth using transvaginal sonography in postmenopausal women on tamoxifen: random error is reduced using saline instillation sonography. Ultrasound Obstet. Gynecol., 15, 321–326.
Baskett, T.F., O'Connor, H. and Magos, A.L. (1996) A comprehensive one-stop menstrual problem clinic for the diagnosis and management of abnormal uterine bleeding. Br. J. Obstet. Gynaecol., 103, 76–77.
Bowman, M.C., Li, T.C. and Cooke, I.D. (1995) Inter-observer variability at laparoscopic assessment of pelvic adhesions. Hum. Reprod., 10, 155–160.
Broekmans, F.J., Heitbrink, M.A., Hompes, P.G. et al. (1996) Quantitative MRI of uterine leiomyomas during triptorelin treatment: reproducibility of volume assessment and predictability of treatment response. Magn. Reson. Imag., 14, 1127–1135.
Delisle, M.F., Villeneuve, M. and Boulvain, M. (1998) Measurement of endometrial thickness with transvaginal ultrasonography: is it reproducible? J. Ultrasound Med., 17, 481–484.
Dudiak, C.M., Turner, D.A., Patel, S.K. et al. (1988) Uterine leiomyomas in the infertile patient: preoperative localization with MR imaging versus US and hysterosalpingography. Radiology, 167, 627–630.
Dueholm, M., Laursen, H. and Knudsen, U.B. (1999) A simple one-stop menstrual problem clinic with use of hysterosonography for the diagnosis of abnormal uterine bleeding. Acta Obstet. Gynecol. Scand., 78, 150–154.
Dueholm, M., Lundorf, E., Hansen, E.S. et al. (2001a) Magnetic resonance imaging, transvaginal sonography, hysterosonographic examination and diagnostic hysteroscopy in evaluation of the uterine cavity. Fertil. Steril., 76, 350–357.
Dueholm, M., Lundorf, E., Hansen, E.S. et al. (2001b) Magnetic resonance imaging and transvaginal ultrasonography for diagnosis of adenomyosis. Fertil. Steril., 76, 588–594.
Emanuel, M.H., Verdel, M.J., Wamsteker, K. et al. (1995) A prospective comparison of transvaginal ultrasonography and diagnostic hysteroscopy in the evaluation of patients with abnormal uterine bleeding: clinical implications. Am. J. Obstet. Gynecol., 172, 547–552.
Emanuel, M.H., Ankum, W.M., Verdel, M.J. et al. (1996) The reproducibility of the results of transvaginal sonography of the uterus in patients with abnormal uterine bleeding. Ultrasound Obstet. Gynecol., 8, 346–349.
Fleiss, J.L. (1981) Statistical Methods for Rates and Proportions. Wiley, New York.
Loffer, F.D. (1989) Hysteroscopy with selective endometrial sampling compared with D&C for abnormal uterine bleeding: the value of a negative hysteroscopic view. Obstet. Gynecol., 73, 16–20.
Rock, J.A. (1995) The revised American Fertility Society classification of endometriosis: reproducibility of scoring. Zoladex Endometriosis Study Group. Fertil. Steril., 63, 1108–1110.
Schwarzler, P., Concin, H., Bosch, H. et al. (1998) An evaluation of sonohysterography and diagnostic hysteroscopy for the assessment of intrauterine pathology. Ultrasound Obstet. Gynecol., 11, 337–342.
Spandorfer, S.D., Arrendondo-Soberon, F., Loret de Mola, J.R. et al. (1998) Reliability of intraobserver and interobserver sonographic endometrial stripe thickness measurements. Fertil. Steril., 70, 152–154.
Valentin, L. (1999) High-quality gynecological ultrasound can be highly beneficial, but poor-quality gynecological ultrasound can do harm. Ultrasound Obstet. Gynecol., 13, 1–7.
Widrich, T., Bradley, L.D., Mitchinson, A.R. et al. (1996) Comparison of saline infusion sonography with office hysteroscopy for the evaluation of the endometrium. Am. J. Obstet. Gynecol., 174, 1327–1334.
Wolman, I., Jaffa, A.J., Sagi, J. et al. (1996) Transvaginal ultrasonographic measurements of endometrial thickness: a reproducibility study. J. Clin. Ultrasound, 24, 351–354.
Wolman, I., Amster, R., Hartoov, J. et al. (1998) Reproducibility of transvaginal ultrasonographic measurements of endometrial thickness in patients with postmenopausal bleeding. Gynecol. Obstet. Invest., 46, 191–194.
Submitted on May 23, 2001; accepted on September 21, 2001.