The Galton Laboratory, University College London, Wolfson House, 4 Stephenson Way, London NW1 2HE, UK
Walters (2000) is correct: pooling data from a set of 2×2 frequency tables is invalid except under certain conditions. Whether these conditions are fulfilled in the present instance will not be discussed, though (as will be seen) they may be suspected of being approximately met. He describes a method for simultaneously analysing a number of such tables where there is heterogeneity of unknown origin. In the present case, I suggest that the origin of the heterogeneity may be identified, and that a different form of analysis is appropriate and valid.
It may be noted that the heterogeneity of sex ratios (proportions male) exists only within the two categories of births (those reportedly conceived on 'most fertile' and on 'other' days). In contrast, the five total sex ratios (one for each study cited by Walters) show remarkably little heterogeneity (χ² = 2.2, with 4 degrees of freedom, P = 0.7). Moreover, the heterogeneity within the two categories would be expected if the hypothesis under discussion were true (i.e. that cycle day of conception is associated with offspring sex). This may be illustrated as follows. Suppose the regression of sex ratio on cycle day of conception were as in Table I. The overall sex ratio is 0.5. But the sex ratios of births associated with 'most fertile' and 'other' days would vary according to the chosen definition of 'most fertile'. If day 0 alone is chosen as 'most fertile', the sex ratios would be 0.45 and 0.53 respectively; in contrast, if days −1 to +1 are chosen as 'most fertile', the corresponding sex ratios would be 0.475 and 0.6. And it is clear that different definitions have been used in the five data sets, because the proportion of births in the 'most fertile' category varies from 0.23 in sample 1 to 0.5 in sample 3. I suggest that this is the explanation of the heterogeneity of the sex ratios within the 'most fertile' and 'other' days.
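The dependence of the two category sex ratios on the definition of 'most fertile' can be checked with a short calculation. The sketch below is illustrative only: the day-specific numbers of births and proportions male are hypothetical values, chosen to be consistent with the figures quoted in the text, and are not taken from Table I. The names `BIRTHS_BY_DAY`, `sex_ratio` and `split_ratios` are likewise introduced here for illustration.

```python
# Hypothetical data (an assumption, not the original Table I):
# cycle day of conception -> (number of births, proportion male).
BIRTHS_BY_DAY = {
    -2: (100, 0.60),
    -1: (200, 0.50),
    0: (400, 0.45),
    1: (200, 0.50),
    2: (100, 0.60),
}


def sex_ratio(days):
    """Proportion male among all births conceived on the given days."""
    births = sum(BIRTHS_BY_DAY[d][0] for d in days)
    males = sum(BIRTHS_BY_DAY[d][0] * BIRTHS_BY_DAY[d][1] for d in days)
    return males / births


def split_ratios(most_fertile_days):
    """Return (sex ratio on 'most fertile' days, sex ratio on 'other' days)."""
    fertile = set(most_fertile_days)
    other = set(BIRTHS_BY_DAY) - fertile
    return sex_ratio(fertile), sex_ratio(other)
```

With these assumed values the overall sex ratio is 0.5. `split_ratios([0])` gives about 0.45 and 0.53, whereas `split_ratios([-1, 0, 1])` gives 0.475 and 0.6. The underlying day-specific sex ratios are the same in both cases; only the definition of 'most fertile' differs.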
For this purpose, one may use a Mantel-Haenszel method (e.g. Snedecor and Cochran, 1967). For the five data sets, the Mantel-Haenszel test statistic z (a normalized deviate) was evaluated at 3.05, P < 0.005. Since Gray (1991) had also pooled data, I disaggregated the data in his presentation and analysed those six data sets (i.e. those of Guerrero, 1974; Harlap, 1979; France et al., 1984; WHO, 1984; Perez et al., 1985; Gray, 1991) together with the data sets subsequently located (von Koller and Degenhardt, 1983; France et al., 1992; Wilcox et al., 1995; Gray et al., 1998). Table II reproduces the sets pooled in Gray (1991). For the 10 data sets, the Mantel-Haenszel test statistic z = 2.87, P < 0.005. Table III gives the details of the calculation, and may provide a useful basis for additions from further data sets.
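For readers wishing to add further data sets, the form of the calculation may be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of Table III: it uses the usual hypergeometric expectation and variance for one cell of each 2×2 table, applies no continuity correction, and the function name `mantel_haenszel_z` and the cell ordering are choices made here for illustration. The exact variant used in the letter may differ in detail.

```python
from math import sqrt


def mantel_haenszel_z(tables):
    """Mantel-Haenszel normal deviate for a list of 2x2 tables.

    Each table is (a, b, c, d), assumed here to mean:
      a = males on 'most fertile' days,  b = females on 'most fertile' days,
      c = males on 'other' days,         d = females on 'other' days.
    No continuity correction is applied (an assumption of this sketch).
    """
    total_observed = 0.0
    total_expected = 0.0
    total_variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        total_observed += a
        # Expectation and variance of cell a with all margins held fixed.
        total_expected += row1 * col1 / n
        total_variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    return (total_observed - total_expected) / sqrt(total_variance)
```

Each table contributes its own observed-minus-expected difference and its own variance, so data sets are never pooled into a single 2×2 table. A new study is added simply by appending its four cell counts to the list.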
In my previous letter (James, 1999), I offered a number of qualifications which should be borne in mind when attempting to interpret the above result.
References
France, J.T., Graham, F.M., Gosling, L. and Hair, P.I. (1984) A prospective study of the preselection of the sex of offspring by timing intercourse relative to ovulation. Fertil. Steril., 41, 894–900.
France, J.T., Graham, F.M., Gosling, L. et al. (1992) Characteristics of natural conceptual cycles in a prospective study of sex preselection, fertility awareness symptoms, hormone levels, sperm survival and pregnancy outcome. Int. J. Fertil., 37, 244–255.
Gray, R.H. (1991) Natural family planning and sex selection: fact or fiction? Am. J. Obstet. Gynecol., 165, 1982–1984.
Gray, R.H., Simpson, J.L., Bitto, A.C. et al. (1998) Sex ratio associated with timing of insemination and length of the follicular phase in planned and unplanned pregnancies during use of natural family planning. Hum. Reprod., 13, 1397–1400.
Guerrero, R. (1974) Association of the type and time of insemination within the menstrual cycle and the human sex ratio at birth. N. Engl. J. Med., 291, 1056–1059.
Harlap, S. (1979) Gender of infants conceived on different days of the menstrual cycle. N. Engl. J. Med., 300, 1445–1448.
James, W.H. (1999) The status of the hypothesis that the human sex ratio at birth is associated with the cycle day of conception. Hum. Reprod., 14, 2177–2178.
Perez, A., Eger, R., Domenichini, V. et al. (1985) Sex ratio associated with natural family planning. Fertil. Steril., 43, 152–153.
Snedecor, G.W. and Cochran, W.G. (1967) Statistical Methods. 6th edn. Iowa State University Press, Ames, IA, USA, p. 256.
Von Koller, S. and Degenhardt, K.H. (1983) Risikofaktoren der Schwangerschaft. Springer Verlag, New York, USA.
Walters, D.E. (2000) The need for statistical rigour when pooling data from a variety of sources. Hum. Reprod., 15, 1205–1206.
Wilcox, A.J., Weinberg, C.R. and Baird, D.D. (1995) Timing of sexual intercourse in relation to ovulation. Effects on the probability of conception, survival of the pregnancy and sex of the baby. N. Engl. J. Med., 333, 1517–1521.
WHO Task Force on Method for the Determination of the Fertile Period (1984) A prospective multicenter study of the ovulation method of natural family planning. IV. The outcome of pregnancy. Fertil. Steril., 41, 593–598.
Thorpes, The Grip, Linton, Cambridge CB1 6NR, UK
I fear that Dr James has misinterpreted my letter as regards the word heterogeneity. I used the word to denote heterogeneity of the 'effect' being investigated, that is, the day of conception. In the five papers quoted in Table I this effect is in one direction for four of the papers and in the reverse direction for one, quite large, study. The remainder of Professor James's response does not therefore address my objections. If one uses the log-odds ratio statistic to quantify the effect, the figures for the five papers are +0.224, −0.048, +2.485, +0.129, +0.132, where positive figures indicate a preponderance of males for 'Other days'.
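For clarity, the log-odds ratio for a single 2×2 table may be computed as in the following minimal sketch. The function name and argument order are assumptions made for illustration. The sign convention follows the one stated above, so that a positive value indicates a preponderance of males for 'Other days'.

```python
from math import log


def log_odds_ratio(males_fertile, females_fertile, males_other, females_other):
    """Log-odds ratio of a male birth, 'other' days versus 'most fertile' days.

    Positive values indicate a preponderance of males on 'other' days.
    All four counts must be greater than zero.
    """
    odds_other = males_other / females_other
    odds_fertile = males_fertile / females_fertile
    return log(odds_other / odds_fertile)
```

A table with identical sex ratios in the two categories gives a value of zero. Exchanging the two categories reverses the sign but leaves the magnitude unchanged.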
Dr James then illustrates his approach with the Mantel-Haenszel test. This test is in fact a means of pooling several 2×2 tables when it is assumed that there is a CONSTANT population statistic. Many analysts test for heterogeneity of the effect before applying this pooling technique.
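One common form of such a preliminary check is an inverse-variance weighted heterogeneity statistic on the log-odds ratios (often associated with Woolf). The sketch below is offered as an illustration of the idea only. It is not necessarily the test any particular analyst would choose, and the function name `woolf_heterogeneity` and the cell ordering are assumptions.

```python
from math import log


def woolf_heterogeneity(tables):
    """Heterogeneity statistic Q for the log-odds ratios of several 2x2 tables.

    Each table is (a, b, c, d) with all cells greater than zero.
    Returns (Q, degrees of freedom). Under the assumption of a constant
    log-odds ratio, Q is approximately chi-squared with k - 1 degrees
    of freedom, where k is the number of tables.
    """
    effects = []
    weights = []
    for a, b, c, d in tables:
        effects.append(log((a * d) / (b * c)))
        # Inverse of the usual large-sample variance of a log-odds ratio.
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(tables) - 1
```

For five tables the statistic would be referred to a chi-squared distribution with 4 degrees of freedom, for which the 5% critical value is 9.49. A large value casts doubt on the assumption of a constant effect that underlies the pooled test.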
In answer to Dr James's question regarding P values (0.005 versus 0.12), the difference arises because he, implicitly, uses 'within trial' errors to make the test, whereas I have effectively used variation between trials of the effect in question. In the philosophy of hypothesis testing, it is a matter of whether we are considering the finite model or the infinite model; that is, do we wish to make inferences about the general population, based on the evidence available? I suspect that analysts are almost invariably interested in the wider, global, inferences.
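The distinction between the two sources of error may be made concrete with a small sketch. It is not a reconstruction of either calculation referred to above. It merely contrasts a test statistic whose standard error comes from the within-trial variances with one whose standard error comes from the scatter of the effects between trials. The function names and the inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def within_trial_z(effects, variances):
    """Pooled effect divided by a standard error built from within-trial variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # The standard error of the pooled effect is 1 / sqrt(sum of weights).
    return pooled * sqrt(sum(weights))


def between_trial_t(effects):
    """Mean effect divided by a standard error built from between-trial scatter.

    To be referred to a t distribution with len(effects) - 1 degrees of freedom.
    """
    k = len(effects)
    return mean(effects) / (stdev(effects) / sqrt(k))
```

When the trials are individually precise but disagree with one another, the first statistic can be large while the second is modest. For example, three hypothetical effects of 0.0, 0.2 and 0.4, each with within-trial variance 0.01, give a within-trial statistic of about 3.46 and a between-trial statistic of about 1.73.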