Reproducibility of skin characteristic measurements and reported sun exposure history

Stefano Rosso^a, Rosa Miñarro^b, Simon Schraub^c, Rosario Tumino^d, Silvia Franceschi^e and Roberto Zanetti^a

a CPO-Registro Tumori Piemonte, Turin, Italy.
b Registro de Cáncer de Granada, Escuela Andaluza de Salud Pública, Granada, Spain.
c Registre des Tumeurs du Doubs, Besançon, France.
d Registro Tumori di Ragusa—Azienda Ospedaliera ‘Civile-MP Arezzo’, Ragusa, Italy.
e Field and Intervention Studies Unit, International Agency for Research on Cancer, Lyon, France.

Stefano Rosso CPO-Registro Tumori Piemonte via San Francesco da Paola, 31 10123 Torino, Italy. E-mail: stefano.rosso{at}asl1.to.it

Abstract

Background The aim of the present study is to investigate the reproducibility of information on sun exposure, skin characteristics and sunburn collected through a standardized questionnaire used in the multi-centre South European case-control study on skin cancer, ‘Helios’. We also intended to use results from reproducibility analysis for correcting odds ratio (OR) estimates from the original study.

Methods We re-interviewed, with the same questionnaire, a sample of 115 cases of basal cell and squamous cell carcinoma of the skin and 119 population controls, 18–26 months apart, in four centres of Italy, Spain and France. The questionnaire included questions on skin characteristics, sunburns and sun exposure histories. We investigated agreement, studying the association between the difference of the two measures and a set of explanatory variables. According to the results of the reproducibility analysis we corrected OR estimates from the original study simultaneously adjusting for random measurement error.

Results Hair and eye colour showed high agreement with intra-class correlation coefficients (ICC) of 0.81 and 0.74, respectively. Lifetime sun exposure showed substantial agreement with ICC ranging from 0.68 for time spent doing outdoor work to 0.79 for time spent outdoors during holidays and holidays at the beach. The poorest agreement was found for number of lifetime sunburns (ICC = 0.25), while sunburns during childhood showed a substantial agreement (Cohen's Kappa = 0.67). Lack of reproducibility was mainly associated with subjects' education, while no significant differences were observed between cases and controls. The corrected OR showed a moderate increase with a reinforcement of the effect of sun exposure and skin reaction to sun exposure.

Conclusion Overall, there was good reproducibility, particularly in the case of sun exposure histories, between answers given on two different occasions to a questionnaire administered by trained interviewers. However, since measurement error can substantially bias OR toward the null value, it should be taken into account in estimates of the effect of sun exposure on risk of skin cancer.

Keywords Case-control, reproducibility, skin neoplasm, sun exposure

Accepted 3 September 2001

Skin neoplasm, including melanoma, has been mainly associated with particular skin phenotypes (fair complexion, tendency to sunburn, freckles) and sun exposure.1,2 The relationship between the different patterns of sun exposure and the risk of skin tumours still remains to be fully explained, in particular the relative importance of intense and intermittent exposure to sunlight in developing cutaneous malignant melanoma (CMM) and basal cell carcinoma (BCC). As far as skin carcinoma is concerned, the scientific support for these findings comes from several recent case-control studies3–7 that investigated these risk factors with the help of structured questionnaires and some objective measures of skin phenotypes.

The importance of investigating reliability and validity of information collected in this way is obvious. Unfortunately, while objective measures of some phenotypic characteristics such as hair colour are available, the majority of exposure measures do not have a clear ‘gold standard’. Yet, few studies exist on the validity or reliability of the measures of sun exposure that have been used in epidemiological studies.8–11

The aim of the present study is to investigate the reproducibility of information gathered through a standard questionnaire on sun exposure and skin characteristics within the framework of the Helios study, a case-control study on skin cancer.6,7 Also of interest were the possible sources of lack of agreement such as subject characteristics, interview conditions or the questionnaire itself, and we aimed to investigate if the lack of agreement differs among cases and controls. Furthermore, since non-differential misclassification of confounding factors can lead to biased results in multivariate analysis, as already pointed out by Greenland,12 we corrected odds ratios (OR) with random measurement error estimates from the present reproducibility study.

Materials and Methods

Subjects
The multi-centre case-control study ‘Helios’ gathered information from incident cases of BCC (1540 cases) and of squamous cell carcinoma (SCC) (228 cases) and a sample of 1795 population controls in some areas of Italy, France, and Spain between 1989 and 1992. The main results were published in 1996.6,7 After an interval of about 22 months on average, we re-interviewed a sample of 236 subjects (116 cases and 120 controls) from four centres willing to take part in this study (Torino and Ragusa [Italy], Besançon [France], and Granada [Spain]). Subjects were randomly selected and then interviewed directly. Information was collected through the same standardized questionnaire used in the main study and was administered by the same trained interviewer in each centre.

Exposure measures
Skin characteristics were measured by grading hair colour on an 11-level visual scale and eye colour on a 3-level scale (black or brown, green, blue or grey). Reaction to sun exposure was assessed on a 4-level scale and ranged from subjects who always tan and never burn to subjects who always burn when exposed to the sun. Since reaction to sun exposure varies during lifetime according to the degree of melanin protection and skin thickness, we asked subjects to report their skin reaction experience when they were 20 years old. We investigated past experience of sunburn by asking subjects the number of sunburns they had experienced and their age when the first sunburn occurred.

Sun exposure assessment was undertaken by dividing the questionnaire into several sections, according to relevant life periods: childhood (<16 years of age), adolescence (16–19 years), adulthood (20–60 years), retirement (>60 years of age), with separate sections for: places of residence for more than 6 months, work, holidays, sports or other outdoor activities. For every reported outdoor activity, we recorded the duration of exposure (number of months/years), the prevalent season of exposure, its amount (number of hours and distribution during daylight), and the type of clothes usually worn. Questions on clothes were further structured into body sections (head, trunk, upper and lower limbs, feet). From available information we computed indices of lifetime sun exposure based upon duration and intensity (amount of skin exposed during hours/seasons with high UV irradiation). The amount of exposed skin was then used to weight the duration of exposure by the proportion of skin not protected by clothes. This was estimated as follows:

Weighted index of lifelong sun exposure for specific activity =

$$\sum_{i} h_i \times d_i \times s_i \times w_{c,i} \times w_{s,i}$$

where:

i = index of year of exposure;

h = hours of exposure;

d = days of exposure;

s = months/weeks during season of activity;

w_c = weights proportional to amount of exposed skin (from questions on clothes used during activity);

w_s = weights proportional to solar intensity during season (summer versus winter) and place of activity.

These indices can reach values of about 70 000 weighted hours of exposure in a lifetime (40 years) for people with occupational exposures, generally farmers, working in Southern Europe (in the present study: Spain and Italy). By comparison, 10 years (one month a year) of sun exposure during holidays at the beach (Mediterranean coasts) usually sum to about 2000 weighted hours of sun exposure.
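
As an illustration of how such an index accumulates, the following minimal sketch reads the definitions above as a sum, over years of exposure, of the product of hours, days, season length and the two weights. The class name, the unit convention (hours per day, days per week, weeks per season) and all numerical values are assumptions made for the example; they are not taken from the Helios questionnaire.

```python
from dataclasses import dataclass


@dataclass
class YearOfExposure:
    hours_per_day: float    # h (assumed unit: hours per day of activity)
    days_per_week: float    # d (assumed unit: days per week)
    weeks_in_season: float  # s (assumed unit: weeks per year)
    skin_weight: float      # w_c, proportion of skin left uncovered (0 to 1)
    season_weight: float    # w_s, relative solar intensity of season and place


def weighted_exposure(years):
    """Sum h * d * s * w_c * w_s over all reported years of one activity."""
    return sum(
        y.hours_per_day * y.days_per_week * y.weeks_in_season
        * y.skin_weight * y.season_weight
        for y in years
    )


# Hypothetical beach holidays: about four weeks a year for ten years.
holidays = [YearOfExposure(8, 7, 4, 0.9, 1.0) for _ in range(10)]
print(weighted_exposure(holidays))
```

With these invented values the result is close to the order of magnitude quoted above for ten years of beach holidays (about 2000 weighted hours).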

Analysis
We first computed univariate agreement indices for cases, controls and the total. Then, we used regression analysis with a stepwise procedure to assess the independent association between lack of agreement and a set of explanatory variables, such as case status, education, gender, age, time lag between the two interviews and place of interview. This analysis step was also useful in suggesting different strategies for improving reproducibility of exposure measurement. Finally, we used results from reproducibility analysis for correcting original OR estimates in the main study, simultaneously adjusting for random measurement error of variables entering in the final model.

Analysis of agreement
Given the variety of measurement scales resulting from the questionnaire items, ranging from dichotomous answers to continuous data with skewed distribution, such as weighted sun exposures indices, we computed appropriate statistics to assess test-retest agreement. In the case of qualitative data, we used the simple Cohen's Kappa coefficient for dichotomous items,13 or its weighted version (Kw)14,15 for ordinal items. Their 95% CI were then computed under the hypothesis that their (not null) true underlying values were the ones observed.16,17 For the continuous data, the measure of reliability is the intraclass correlation coefficient (ICC), here traditionally computed from the ANOVA model with its 95% CI based on the F test.
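
The agreement statistics named above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it assumes two answers per subject, a one-way ANOVA form of the ICC, and quadratic disagreement weights for the weighted Kappa (the text does not state which weights were used). Confidence intervals are omitted, and all function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second, weight=None):
    """Cohen's Kappa for paired answers; pass `weight` for the weighted version."""
    if weight is None:
        # Unweighted Kappa: full credit for identical answers, none otherwise.
        weight = lambda a, b: 1.0 if a == b else 0.0
    n = len(first)
    pairs = Counter(zip(first, second))
    margin_1, margin_2 = Counter(first), Counter(second)
    observed = sum(weight(a, b) * c for (a, b), c in pairs.items()) / n
    expected = sum(
        weight(a, b) * margin_1[a] * margin_2[b]
        for a in margin_1 for b in margin_2
    ) / n ** 2
    # Undefined (division by zero) when chance agreement is already complete.
    return (observed - expected) / (1 - expected)


def quadratic_weight(levels):
    """Agreement weights for an ordinal scale coded 0 .. levels - 1 (assumed)."""
    span = levels - 1
    return lambda a, b: 1.0 - ((a - b) / span) ** 2


def icc_oneway(first, second):
    """One-way ANOVA intraclass correlation for two measurements per subject."""
    n, k = len(first), 2
    means = [(a + b) / 2 for a, b in zip(first, second)]
    grand = sum(means) / n
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(first, second, means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented answers on a 4-level ordinal scale and on a continuous index.
first = [0, 1, 2, 3, 1, 2]
second = [0, 1, 2, 3, 2, 2]
print(cohen_kappa(first, second))
print(cohen_kappa(first, second, quadratic_weight(4)))
print(icc_oneway([10.0, 20.0, 30.0], [12.0, 19.0, 31.0]))
```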

A stepwise selection procedure, with a critical value set at 0.05, was used to select those variables that could have been associated with lack of agreement. They included gender, age, education (years of courses attended), time of the first interview, place of interview (hospital, home, office), participating centre and, most notably, case/control status. In the regression model, the dependent variable was the ratio of the difference between the two measurements to half of their sum. That is, the within-subject differences were divided by a measure of the between-subject differences following the methodological suggestions of Bartko.18 This correction was needed to take into account the overall variability of the ‘true’ measure.
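
The dependent variable described above can be written out as a short function. This is only a sketch of the stated ratio; the function name and the handling of two zero answers are assumptions.

```python
def relative_difference(first, second):
    """Difference between the two interviews divided by half of their sum."""
    half_sum = (first + second) / 2
    if half_sum == 0:
        # Both answers are zero: treated here as perfect agreement (assumption).
        return 0.0
    return (first - second) / half_sum


# Hypothetical subject reporting 120 hours at the first interview and 80 at the second.
print(relative_difference(120, 80))
```

A positive value means the first interview gave the higher report, which is the direction of bias discussed in the Results.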

Correction of OR for random within-subject measurement error
Where results from agreement analyses showed that the measures taken during the main study and the reproducibility study were stable (i.e. without large within-subject variance) and there were no differences in reproducibility between cases and controls, they can be considered as simply affected by random measurement error. In this case, to evaluate the effect of the random measurement error we compared uncorrected and corrected OR multivariate estimates using the subject's averages as the true values for the correction procedures.12,19,20

To adjust OR for the effect of measurement error we applied the method suggested by Armstrong for case-control studies.21 In brief, the method applied the estimates of the measurement error (the within-subject random error from the reproducibility study) to the effect estimates in the main study in such a way that misclassification bias would be accordingly reduced. The method requires multivariate normality of the variables measured with error. As a consequence, we log-transformed continuous variables to improve normality and independence of the mean from the within-subject variance.
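
To give a feel for the direction and size of such a correction, the sketch below applies the familiar univariate attenuation approximation, in which the observed log OR is roughly the true log OR multiplied by the reliability of the exposure measure. This is a deliberately simplified stand-in, not the multivariate method of Armstrong used in the present analysis; the function name and the example values are hypothetical.

```python
import math


def correct_odds_ratio(observed_or, reliability):
    """Univariate de-attenuation: divide the observed log OR by the reliability."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return math.exp(math.log(observed_or) / reliability)


# Hypothetical observed OR of 1.5 for an exposure measured with reliability 0.75.
print(correct_odds_ratio(1.5, 0.75))
```

Under this approximation a perfectly reliable measure leaves the OR unchanged, and lower reliability moves the corrected estimate further from the null value, whichever side of 1 the observed OR lies on.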

Since estimating standard errors of corrected parameters entails very cumbersome calculations, we instead computed bootstrap estimates of the asymptotic covariance matrices from the reproducibility and the main study separately. This procedure allows calculation of the parameters' confidence limits, provided the estimates come from reasonably large samples, as in the present study. Because the bias in estimating the parameter variances can be partly reduced if the covariance matrices are independently estimated, we excluded the re-interviewed subjects from the original data when estimating such matrices. We implemented the algorithms in an SAS macro developed to correct effect measures in cohort studies,22 suitably modified to include bootstrap estimates of their standard errors.
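
The bootstrap idea can be sketched for a single scalar statistic. The study bootstrapped whole covariance matrices inside an SAS macro; the Python function below is a much simpler illustration of resampling subjects with replacement, and its name, seed and data are assumptions.

```python
import random
import statistics


def bootstrap_se(data, statistic, replicates=1000, seed=0):
    """Standard error of `statistic`, estimated by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    estimates = []
    for _ in range(replicates):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Invented within-subject relative differences for eight subjects.
differences = [0.4, -0.1, 0.0, 0.3, -0.2, 0.1, 0.5, -0.3]
print(bootstrap_se(differences, statistics.mean))
```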

The procedure for correcting the OR for random measurement error started with the definition of the base model to apply the correction to. From the published analysis of the independent effect of sun exposure indices, including skin characteristics,6,7 we selected those risk factors that remained significant after controlling for the other factors. They were: hair and eye colour, skin reaction to sun exposure, outdoor occupation for SCC, holidays and sports at the beach and sunburns before age 16 for BCC. Two unconditional logistic regression models were separately fitted to BCC and SCC cases. Furthermore, terms for age, sex and centre, considered as measured without error, were included to improve the control of design variables as already done in the original analysis. In the published results, risk factors were presented in categories, to improve their interpretation, while here we preferred to keep them in their original or log-transformed continuous scales, as the method's robustness partly relies on normal distribution assumptions.

Results

Eighty-three per cent of the subjects contacted agreed to be re-interviewed, compared to an 80% overall compliance rate in the original ‘Helios’ study.6 The demographic characteristics of the subjects in the reproducibility sample and in the original ‘Helios’ study were also similar (Table 1), with slightly older and better educated subjects in the reproducibility sample, but without any statistically significant difference between the two groups. The BCC/SCC ratio was 6.7 in the original study, and 8.6 in the reproducibility study (difference not statistically significant).


Table 1 Demographic characteristics of subjects in the reproducibility and in the main study
 
The highest agreement was reached for years of education (ICC = 0.88; 95% CI : 0.84–0.91), which can also be regarded as the upper empirical limit of attainable reproducibility in studies based on similar methods of gathering information. Indices of agreement for skin characteristics and sunburns range from a high 0.8 (hair colour) to a low 0.25 (number of sunburns) (Table 2). In general, pigmentary traits (hair and eye colour) were measured with good reproducibility, while questions on skin reaction and age at first sunburn showed a lower but still substantial reproducibility, between 0.6 and 0.7. When age at first sunburn was dichotomized, agreement increased to 0.67. However, number of sunburns in a lifetime, even aggregated in fewer categories, reached a value of only 0.43.


Table 2 Indices of agreement and their 95% CI for skin characteristics and number of sunburn episodes
 
Sun exposure was analysed according to different activities (Table 3). For each activity, we present the reproducibility of overall and weighted indices, as well as that of single items and components of the overall indices. The weighted index of sun exposure during outdoor work and its components (total number of hours and proportion of skin exposed) showed a substantial agreement above 0.65 (Table 3). However, single items on exposed skin exhibited a lower reliability than the overall index.


Table 3 Indices of agreement and their 95% CI for outdoor work
 
Outdoor exposure during holidays showed the highest agreement among sun exposure indices (Table 4). Agreement decreased to a lower but still acceptable level for those questions concerning periods of time dating back several years, such as sun exposure occurring during childhood. The place of outdoor holidays influenced agreement, with higher values for holidays at the beach than for holidays in the mountains. The measure of overall proportion of exposed skin showed a high agreement (ICC = 0.75; 95% CI : 0.67–0.82), even if agreement in single items was not above 0.57.


Table 4 Indices of agreement and their 95% CI for sun exposure during holidays
 
Only moderate reproducibility was found for questions on outdoor sports (Table 5). In particular, outdoor sports at the beach showed an agreement (ICC = 0.40; 95% CI : 0.25–0.49) lower than the other outdoor sports indices. Since this is a compound index, the relative lack of agreement can depend only on some of its components, or on the cumulative effect of a small systematic error in the same direction. Agreement analysis of the other items related to outdoor sports showed that questions on frequency of practice had low values. On the other hand, interviewed people maintained a substantial agreement on type of sports (Kw = 0.67; 95% CI : 0.62–0.72).


Table 5 Indices of agreement and their 95% confidence limits for outdoor sports
 
In the analysis of variables associated with the lack of agreement, case-control status emerged as a statistically significant variable (P < 0.05) only for the ‘outdoor beach holidays during childhood’ weighted index. The bias was towards higher reporting among controls during the first interview, while cases were more consistent with what they reported at the first interview, as recorded by the difference between an ICC of 0.72 (95% CI : 0.64–0.81) among cases and an ICC of 0.50 (95% CI : 0.32–0.65) among controls.

Higher level of education was associated with the lack of agreement in the case of skin reaction to sun exposure, number of sunburns, age at first sunburn, intensity of sun exposure at work and beach holidays during childhood. However, the bias direction is not consistent, since people with higher education tend to report fewer sunburns and older age at first sunburn during the first interview, while they tend to report more time spent on holidays at the beach during childhood. Since the reproducibility sample had a slightly higher proportion of better educated subjects than the original study population, measurement error in the original study could also be different, although its direction was not predictable.

The time interval between the two interviews decreased the agreement of answers on skin reaction to the sun and recalling information on past occupations. Finally, age played a role only in the case of number of sunburns, while gender was never associated with the lack of agreement.

The correction procedure requires estimation of the multivariate reliabilities (i.e. the ICC corrected for the effect of the other variables' measurement errors) and the within-subject random error matrices; thus it is possible to compare univariate and multivariate reliabilities and the magnitude of the random errors, and to verify whether the errors were correlated. This analysis showed that correlation among errors was modest, with the highest at 0.19 (skin reaction and hair colour), therefore allowing some simplification required by the correction procedure.21 However, the differences between univariate and multivariate reliability estimates were small, with a slight increase in the reliabilities of hair colour and sun exposure indices, and a decrease from substantial to moderate in the case of eye colour, skin reaction to sun exposure and age at first sunburn (Table 6).


Table 6 Effect of correction of logistic regression odds ratio (OR) estimates by measurement error, multivariate reliabilities of variables included in the final models and bootstrap standard errors
 
A slight increase in the OR of BCC for exposure to the sun during outdoor sports and skin reaction to sun exposure was observed after correction for measurement errors, while an increase in the bootstrap estimates of standard error for first sunburn before age 16 inflated its 95% CI with a loss of statistical significance (Table 6). Correcting SCC logistic estimates led to an increase in the OR for skin reaction to sun exposure, in addition to a general moderate increase of all estimates (Table 6). However, bootstrap standard errors widened the 95% CI, with the lower limit for eye colour falling below unity, reflecting a loss in its multivariate reliability.

We also analysed the effect of correction on the risk associated with sunburns. Even after multivariate correction, the OR estimates of number of sunburns measured as a continuous or as an ordinal variable (4 levels: 0, 1, 2, 3+) did not show a statistically significant increase. Including the sunburn indicator in the models did not significantly change the other variable estimates. Multivariate reliabilities of number of sunburns fell to 0.21 and 0.34 for continuous and ordinal indicators, respectively.

Discussion

Overall reliability indices were moderate to substantial with one factor, hair colour, showing an almost perfect agreement (ICC = 0.81); the poorest reliability was found for number of sunburns (ICC = 0.27).

In general, facts dating back many years, such as sunburns and sun exposure during childhood, are remembered less precisely, as expected, even if the agreement indices were moderate to substantial according to the definition of Landis and Koch.23 In the case of number of sunburns, its poor reproducibility, skewed distribution and non-homogeneous variance strongly suggest the need to re-formulate questions on this topic.

Concerning compound indices, single items exhibited a lower reliability than the overall index; however, this result was in a way expected since, as theories on measurement also dictate, a compounded index has a higher reliability than its single component, provided it is internally consistent.24 Furthermore, such findings support the usefulness of computing compounded exposure indices from multiple item measures.
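
The measurement-theory result referred to here is commonly expressed by the Spearman-Brown formula, which gives the reliability of a sum of parallel items. The sketch below is purely illustrative: it assumes equally reliable, parallel items, which the clothing items of the questionnaire need not be, and it was not part of the analysis reported in this paper.

```python
def composite_reliability(item_reliability, n_items):
    """Spearman-Brown reliability of an index built from n parallel items."""
    r = item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


# Hypothetical index of five items, each with a reliability of 0.57.
print(composite_reliability(0.57, 5))
```

Under these idealized assumptions the compound index is always at least as reliable as a single item, which is the qualitative pattern observed for the exposed-skin measures.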

Few other studies have directly investigated reproducibility9–11 and recall bias8 in gathering information on skin cancer risk factors. The study by Berwick and Chen9 investigated the reproducibility of reported sunburn history, skin reaction to sunlight and freckling. They found lower agreement than we did in our study, in particular for sunburn and freckling, but no evidence of differential misclassification, with the exception of freckling. In another study by Westerdahl,10 the reproducibility of a self-administered questionnaire was investigated by interviewing controls only, with encouraging results showing fair to good agreement of questions on skin characteristics, sunburns, use of sun-beds and sunscreens, with the only exception of self-reported number of raised naevi. On the other hand, correction of OR for misclassification with univariate methods did not substantially change the estimates. Weinstock et al. found evidence of recall bias in self-reported information on tanning ability collected through questionnaires mailed at first to a cohort of nurses and later to the ones who were diagnosed with a melanoma in the framework of a nested case-control study.8 On the contrary, hair colour showed good and non-differential reliability. However, as the authors themselves pointed out, the wording of the questions in the two questionnaires was not the same. A reproducibility analysis based on re-interviews, about 5 years apart, of subjects recruited in the Geraldton case-control study11 reported higher or comparable results for overall sun exposure indices. However, their results for pattern of exposure were much lower than ours for intermittent patterns, and notably, for exposure during holidays (Kw ranging from 0.29 to 0.45 according to age intervals). 
The authors pointed out that such low reproducibility could be due to the lack of reporting frequencies of exposure at the specific anatomical site as they simply asked whether the site was usually exposed or not. On the contrary, they found higher reproducibility for number of lifetime sunburns than ours, even if the highest value of Kw for sunburns was only 0.56 in cases. The authors also investigated whether the differences were affected by factors such as case status, age and sex, finding that only case-control status showed some effect. These results may be due, as underlined in the paper, to the longer time between interviews, and to the high-profile campaign to reduce sun exposure and to improve sun protection behaviours. However, similar health education campaigns were almost absent in the countries and at the time in which our study was held.

No substantial differences were found between answers of cases and controls during the two interviews, with the exception of sun exposure during holidays at the beach in childhood. However, the reproducibility sample may not be fully representative of the original study population, where measurement error could be substantially different. The bias could have led OR estimates towards the null value in the original analysis, although these results are also compatible with the hypothesis of an increase in the recall bias (tendency to report higher exposures among cases) with time. However, it is noteworthy that this was the only significant finding among 20 tests performed in this analysis. Indeed, at the 0.05 significance level about one significant result in 20 tests is expected by chance alone, so this finding may well have arisen by chance.
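
The arithmetic behind this remark can be made explicit. The sketch below assumes that the 20 tests are independent and that each is run at the 0.05 level, which is a simplification since the tested variables are correlated; the function name is hypothetical.

```python
def chance_findings(n_tests, alpha):
    """Expected number of false positives and probability of at least one."""
    expected = n_tests * alpha
    at_least_one = 1 - (1 - alpha) ** n_tests
    return expected, at_least_one


expected, at_least_one = chance_findings(20, 0.05)
print(expected, round(at_least_one, 2))
```

Under these assumptions one false positive is expected on average, and the probability of observing at least one is roughly 0.64.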

This favourable finding cannot be considered as a proof of the lack of recall bias, since a more appropriate study design should have measured exposures before patients were aware of a skin cancer diagnosis, as suggested by the published comments25–27 to the previously mentioned Berwick's analysis.9 However, we have shown that random error affected measures within subjects in a comparable way. The sequential administration of the same questions may raise the concern that the completion of the first questionnaire may influence the response to the following one, with reliability biased towards higher values. However, in our study a considerable time elapsed between the two questionnaires (between 18 and 24 months). Among case-control studies, those on diet had to deal with similar problems. However, studies on the reproducibility of questionnaires on diet in general showed a lower agreement than that found for skin characteristics and sun exposure.28–30 Levels of agreement comparable to our study were found in those studies where exposures were related to precise life events rather than to general habits, such as, for example, occupational exposure or risk factors related to pregnancy.

Correcting the OR reinforced effects of sun exposure, eye colour and skin reaction to sun for both BCC and SCC. Such an increase in the sun exposure effect is due to the fact that its reproducibility is higher than that shown by skin characteristics. Age at the first sunburn showed high instability in its multivariate reliability estimates, as shown by standard error bootstrap estimates. This finding not only inflated the confidence interval, but also showed that the direction of correction is not easily predictable. On the other hand, Greenland12 and later Armstrong20,21 showed and discussed the need for multivariate correction of relevant variables' measurement errors precisely in relation to the unpredictability of effects on risk estimates.

The method used to correct OR needs some assumptions, usually violated in case-control studies, such as estimation of the true values covariance matrix from the general population. However, Helios was a population-based case-control study with incident cases and with controls drawn from general populations not yet affected by mass health campaigns on skin cancer prevention at the time of the study. Effects of the inclusion in the model of ordinal or nominal variables measured with error have not yet been studied; however, the robustness of the estimates was also supported by applying the bootstrap procedures to both the main and the reproducibility data sets, generating 1000 twin samples.

In conclusion, the substantial reproducibility of exposure measures supported confidence in our original findings6,7 where we found that solar radiation is associated with a risk of BCC even for relatively short periods of exposure such as during holidays and outdoor sports, whereas SCC develops later if exposure continues. Skin type modulates the response and prolonged sun exposure can cause BCC even in those who tan well or SCC, as exposure increases. These results were obtained through a questionnaire structured on life events and administered by a trained interviewer. Further improvements in future case-control studies should be mainly achieved by assessing potential recall bias through properly designed studies and by setting more reliable measures of sunburns and skin sensitivity (chronic and acute) to the sun in different periods of life with more detailed questions, different wording and visual forms.


KEY MESSAGES

  • This paper reports on the reproducibility of skin characteristics and sun exposure used in a case-control study on non-melanocytic skin cancer.
  • Subjects were re-interviewed after about 2 years showing good agreement with their previous answers on sun exposure habits and pigmentary traits.
  • In contrast, facts dating back many years, such as sunburns and sun exposure during childhood, are remembered less precisely.
  • Lack of agreement was mainly associated with education, and there was a significant difference between cases and controls in their answers for only one indicator of exposure, holidays at the beach during childhood.
  • Multivariate correction of odds ratios for measurement error led to an increase in the effect of sun exposure, skin reaction to sun and eye colour.

 

Acknowledgments

This study, part of the ‘Helios Project’, was made possible by the dedication of researchers of the ‘HELIOS GROUP’ (R Zanetti, S Rosso [Torino]; S Franceschi, R Talamini [Aviano]; M Cristofolini [Trento]; L Gafà, R Tumino [Ragusa]; J Wechsler [Créteil]; H Sancho-Garnier [Montpellier]; C Martinez, R Miñarro, E Perea [Granada]; C Navarro, M Tormo [Murcia]; F Joris [Sion]; S Schraub [Strasbourg]; C Schrameck [Villejuif]), and was supported by grants from ‘Europe Against Cancer’ (Grants # 890139, 910539, 920584), from the AIRC (Associazione Italiana per la Ricerca sul Cancro) and from the FISS (Fundo de Investigacion Sanitaria de la Securidad Social) (Grant # 90/E0288). The authors would also like to thank Mrs Manuela Casale and Mrs Franca Foggetti for their valuable help in organizing and conducting the interviews and Mrs Stella Giordanengo for revising the manuscript.

References

1 Kricker A, Armstrong BK, English DR. Sun exposure and non-melanocytic skin cancer. Cancer Causes Control 1994;5:367–92.

2 Weinstock MA. Issues in the epidemiology of melanoma. Hematol Oncol Clin North Am 1998;12:681–98.

3 Kricker A, Armstrong BK, English DR, Heenan PJ. Pigmentary and cutaneous risk factors for non-melanocytic skin cancer—a case-control study. Int J Cancer 1991;48:650–62.

4 Gallagher RP, Hill GB, Bajdik CD et al. Sunlight exposure, pigmentary factors, and risk of nonmelanocytic skin cancer. I. Basal cell carcinoma. Arch Dermatol 1995;131:157–63.

5 Gallagher RP, Hill GB, Bajdik CD et al. Sunlight exposure, pigmentary factors, and risk of nonmelanocytic skin cancer. II. Squamous cell carcinoma. Arch Dermatol 1995;131:164–69.

6 Zanetti R, Rosso S, Martinez C et al. The multi-centre South European study ‘HELIOS’ I: skin characteristics and sunburns in basal-cell and squamous-cell carcinomas of the skin. Br J Cancer 1996;73:1441–46.

7 Rosso S, Zanetti R, Martinez C et al. The multi-centre South-European study ‘HELIOS’ II: different sun exposure patterns in the etiology of basal-cell and squamous-cell carcinomas of the skin. Br J Cancer 1996;73:1448–54.

8 Weinstock MA, Colditz GA, Willett WC et al. Recall (Report) bias and reliability in the retrospective assessment of melanoma risk. Am J Epidemiol 1991;133:240–45.

9 Berwick M, Chen Y-T. Reliability of reported sunburn history in a case-control study of cutaneous malignant melanoma. Am J Epidemiol 1995;141:1033–37.

10 Westerdahl J, Anderson H, Olsson H et al. Reproducibility of a self-administered questionnaire for assessment of melanoma risk. Int J Epidemiol 1996;25:245–51.

11 English DR, Armstrong BK, Kricker A. Reproducibility of reported measurements of sun exposure in a case-control study. Cancer Epidemiol Biomarkers Prev 1998;7:857–63.

12 Greenland S. The effect of misclassification in the presence of covariates. Am J Epidemiol 1980;112:564–69.

13 Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

14 Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213–20.

15 Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 1973;33:613–19.

16 Fleiss JL, Cohen J, Everitt BS. Large sample standard errors of kappa and weighted kappa. Psychol Bull 1969;72:323–27.

17 Fleiss JL, Cicchetti DV. Inference about weighted kappa in the non-null case. Appl Psychol Meas 1978;2:113–17.

18 Bartko JJ. General methodology II. Measures of agreement: a single procedure. Stat Med 1994;13:737–45.

19 Kupper LL. Effects of the use of unreliable surrogate variables on the validity of epidemiologic research studies. Am J Epidemiol 1984;120:643–48.

20 Armstrong BG. Measurement error in the generalised linear model. Commun Statist Simul Comput 1985;14:529–44.

21 Armstrong BG. Analysis of case-control data with covariate measurement error: application to diet and colon cancer. Stat Med 1989;8:1151–63.

22 Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error. Am J Epidemiol 1992;136:1400–13.

23 Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.

24 Nunnally JC. Psychometric Theory. London: McGraw-Hill, 1978.

25 Gefeller O, Brenner H. Re: ‘Reliability of reported sunburn history in a case-control study of cutaneous malignant melanoma’ (Letter). Am J Epidemiol 1996;144:707.

26 Fox BH. Re: ‘Reliability of reported sunburn history in a case-control study of cutaneous malignant melanoma’ (Letter). Am J Epidemiol 1998;148:620.

27 Gefeller O, Brenner H. Gefeller and Brenner reply (Letter). Am J Epidemiol 1998;148:621.

28 Kune S, Kune GA, Watson LF. Observation on the reliability and validity of the design and diet history method in the Melbourne Colorectal Cancer Study. Nutr Cancer 1987;9:5–20.

29 Morabia A, Moore M, Wynder EL. Reproducibility of food frequency measurement and inferences from a case-control study. Epidemiology 1990;1:305–10.

30 Lindsted KD, Kuzma JW. Reliability of eight-year diet recall in cancer cases and controls. Epidemiology 1990;1:392–401.