Analysing the difference due to risk and demographic factors for incidence or mortality

SA Bashir (a, b) and J Estève (a)

(a) Service de Biostatistique, Batiment 1M, Centre Hospitalier Lyon Sud, 165 Chemin du Grand Revoyet, 69495 Pierre-Benite, France.
(b) Current address: Amgen Ltd, 240 Cambridge Science Park, Milton Road, Cambridge CB4 0WD, UK.


    Abstract
 
Background From an epidemiological and public health perspective there is interest in quantifying differences in incidence and mortality between time points, between geographical areas, or between males and females. We propose a method for splitting such a difference in the number of cases/deaths into three components: (1) that due to risk; (2) that due to population structure (i.e. age distribution); and (3) that due to population size. We also propose graphical methods for presenting the results. Three examples are used to illustrate our methodology.

Keywords Incidence, mortality, risk, demography, rate, cancer

Accepted 24 March 2000


    Introduction
 
Temporal and geographical trends in disease incidence and mortality provide important information to epidemiologists. However, when it comes to analysing the difference between two given time points, two geographical areas or even between the sexes, there is no clear, mathematically founded reference for how this should be done in practice.

Methods have been used that do not adjust the population, leading to incorrect results [1,2]. Macfarlane et al. [2] present the percentage change in the crude rate and the percentage change in the number of deaths for cancers of the upper aerodigestive tract in a variety of countries. Their aim is to relate these changes to national alcohol consumption and smoking. However, the change in the crude rate alone does not measure the change in risk, because structural changes in the population have not been considered. Further, they state, ‘Numbers of deaths increased in some of the countries with decreasing rates, due to an ageing of the population’, but no supporting analysis is provided. Even if the rate decreases, an increase in population size (e.g. a doubling or tripling) that leaves the structure unchanged could still produce an increase in the absolute number of deaths, and this would not be a result of population ageing. In this situation we would recommend using the method we describe below to calculate the risk and demographic changes. To decide whether the structural changes are due to population ageing, one would need to study the population proportions more closely. Engeland et al. [1] use essentially the same method as ours but do not adjust the populations to the same size. If the population size and structure do not change, the lack of adjustment does not result in large errors; otherwise, the population size influences the estimated changes due to risk and structure.

In this paper we present a method for partitioning the variation in the number of cases or deaths between two groups (or chronological dates) into demographic variation on the one hand and differences in exposure to risk factors on the other. The demographic component must itself be split into that due to variation in population size, which is trivial, and that due to the change in population structure (i.e. age distribution), which needs more attention.


    Methods
 
Assume that we have the age-group specific number of cases/deaths and population statistics for two ‘groups’ that are going to be compared, and that the age groups are the same for both groups. Further, let group 1 be the baseline group (i.e. the reference group) and group 2 the comparison group (i.e. the group that will be compared to the baseline). We are interested in partitioning the difference in the total number of cases/deaths between these two groups into three components: (1) differences due to the population size; (2) differences due to the population structure (age distribution); and (3) differences due to the risks.

To eliminate the effect of the population size we start by adjusting the populations so that they comprise the same number of people (100 000, say) but keep their specific age distributions, i.e. the proportion in each age group is the same as in the original population. This is equivalent to working on the crude rate instead of the number of cases.

Then we have only to partition the difference in the crude rates into a part due to differences in the population structures and a part due to differences in the risks. This is done by comparing the rates in the two groups to an intermediate rate obtained by applying the baseline age-specific incidence/mortality rates to the age distribution of the comparison group:

$$\frac{S_2 - S_1}{S_1} = \frac{S_3 - S_1}{S_1} + \frac{S_2 - S_3}{S_1} \qquad (1)$$

where S1 and S2 are the crude rates (per 100 000 people) for group 1 and group 2, respectively. S3 is the ‘intermediate’ rate obtained when the baseline age-group specific rate is applied to the comparison group population. S3 is an age-standardized rate for the baseline group using the population of the comparison group as the standard population (direct standardization [3]). Alternatively, S3 is also the expected rate of the comparison group considering the baseline group as the standard population (indirect standardization [3]).

The first component on the right-hand side of equation (1) represents the proportional change in the crude rates due to differences in the population structure (i.e. age distribution) and the second component those due to differences in the risk.

The full algebraic formulation is given in the Appendix.
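The partition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' own code: the names `Decomposition` and `decompose_crude_rate` are hypothetical, every age group is assumed to have a non-zero population, and the two-age-group data in the usage lines are invented purely to exercise the function.

```python
from typing import NamedTuple, Sequence


class Decomposition(NamedTuple):
    s1: float         # crude rate of the baseline group (per 100 000)
    s2: float         # crude rate of the comparison group (per 100 000)
    s3: float         # baseline rates applied to the comparison age structure
    structure: float  # (S3 - S1) / S1
    risk: float       # (S2 - S3) / S1
    net: float        # (S2 - S1) / S1


def decompose_crude_rate(
    cases1: Sequence[float],
    pop1: Sequence[float],
    cases2: Sequence[float],
    pop2: Sequence[float],
    per: float = 100_000.0,
) -> Decomposition:
    """Split the relative change in the crude rate into structure and risk."""
    if not (len(cases1) == len(pop1) == len(cases2) == len(pop2)):
        raise ValueError("both groups must use the same age groups")
    total1, total2 = sum(pop1), sum(pop2)
    s1 = per * sum(cases1) / total1
    s2 = per * sum(cases2) / total2
    # S3: baseline age-specific rates weighted by the comparison group's
    # age proportions.
    s3 = per * sum((c / p) * (q / total2) for c, p, q in zip(cases1, pop1, pop2))
    return Decomposition(s1, s2, s3, (s3 - s1) / s1, (s2 - s3) / s1, (s2 - s1) / s1)


# Invented data: two age groups (young, old) for each group.
d = decompose_crude_rate(
    cases1=[6, 40], pop1=[60_000, 40_000],
    cases2=[20, 300], pop2=[100_000, 100_000],
)
print(f"S1={d.s1:.1f} S2={d.s2:.1f} S3={d.s3:.1f}")
print(f"structure={d.structure:.1%} risk={d.risk:.1%} net={d.net:.1%}")
```

Because both components are divided by the same baseline rate S1, they add up to the net relative change by construction.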


    Examples
 
Differences between time points
To illustrate the idea more clearly we use data extracted from the WHO mortality database for lung cancer in French males. We aim to analyse the change between, say, 1970 and 1980, in terms of differences in demography and risk.

Table 1 shows the traditional age-specific data (grouped into 5-year intervals) for the number of observed deaths and population. It can be seen that the total number of deaths increases from 9876 to 15 258 between 1970 and 1980 (i.e. an increase of 5382 deaths [54.5%]). However, looking at the absolute number of deaths can be misleading as the population increased, but not uniformly across age groups. The total population increased by 6.1%; the 80–84 age group grew by about 37%, whereas the 35–49 age group decreased.


 
Table 1 Observed number of deaths due to lung cancer in French males with population numbers and the populations adjusted to 100 000 people
 
Table 2 gives the calculations needed for quantifying the differences due to risk and demography, where R1, R2 and R3 are defined as follows.


 
Table 2 Calculations for S1, S2 and S3 as shown in the Appendix
 
R1 is the set of expected age-group specific number of deaths in 1970 for a total population of 100 000. For example, for the 50–54 age group (Table 1) we have 2.074 = (514/1 017 700) × 4105. S1 is the sum of the age-specific R1s (i.e. S1 is the crude rate for 1970).

R2 is the set of expected age-group specific number of deaths in 1980 for a total population of 100 000. For example, for the 70–74 age group (Table 1) we have 10.806 = (2843/883 300) × 3357. S2 is the sum of the age-specific R2s (i.e. S2 is the crude rate for 1980).

R3 is the set of expected age-group specific number of deaths in 1980 if the risk was the same as in 1970 for a total population of 100 000. For example, for the 65–69 age group (Table 1) we have 7.699 = (2159/1 061 700) × 3786. S3 is the sum of the age-specific R3s (i.e. S3 is the age-standardized rate for the baseline group (1970) using the comparison group (1980) population as standard).
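The three worked entries above can be re-computed directly. The sketch below uses only the figures quoted in the text; the helper name `expected_per_100k` is an assumption. Because the adjusted populations (4105, 3357, 3786) are shown rounded to whole persons, the results agree with the quoted values only to about two decimal places.

```python
def expected_per_100k(deaths: float, population: float, adjusted_population: float) -> float:
    """Age-specific rate (deaths / population) applied to an age group's
    share of a population rescaled to 100 000 people."""
    return deaths / population * adjusted_population


# R1, ages 50-54: 1970 rate applied to the 1970 adjusted population.
r1_50_54 = expected_per_100k(514, 1_017_700, 4105)
# R2, ages 70-74: 1980 rate applied to the 1980 adjusted population.
r2_70_74 = expected_per_100k(2843, 883_300, 3357)
# R3, ages 65-69: 1970 rate applied to the 1980 adjusted population.
r3_65_69 = expected_per_100k(2159, 1_061_700, 3786)

for label, value in [("R1 50-54", r1_50_54), ("R2 70-74", r2_70_74), ("R3 65-69", r3_65_69)]:
    print(f"{label}: {value:.2f}")
```

Summing each of R1, R2 and R3 over all age groups would give S1, S2 and S3, respectively.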

We are interested in splitting the difference between the total expected number of deaths in 1980 (S2) and 1970 (S1) for two populations of 100 000 (i.e. we want S2 – S1). We can break the simultaneous change in rate and structure into two steps. Our baseline (or starting point) is the total expected number of deaths in 1970 in a population of 100 000.

Step 1: hold the rate constant and change the population to that of 1980. This gives us the total expected number of deaths in 1980 (for a population of 100 000) if the risk had not changed since 1970 (S3). The difference between this and the baseline is due to structural changes in the population (i.e. S3 – S1).

Step 2: change the rate to that of 1980. This gives us the total expected number of deaths in 1980 for a population of 100 000 (S2). The difference between this and the first step is due to changes in risk (i.e. S2 – S3).

The main interest therefore lies in the change in the crude rate between 1970 and 1980 (i.e. S2 – S1). It is more informative to look at the proportional (or percentage) increase relative to the 1970 baseline, which we obtain simply by dividing by S1 (i.e. [S2 – S1]/S1).

Table 3 (using the results from Table 2) gives the breakdown of the change in lung cancer mortality in French males between 1970 and 1980. We can see that the crude rate increases by 18.2 per 100 000 population between 1970 and 1980, i.e. an increase of 45.5% ([18.2/39.8] × 100%). Of this, 17.5 (58.0 – 40.5) per 100 000 population was due to the change in risk, i.e. an increase of 43.8% ([17.5/39.8] × 100%), and 0.7 (40.5 – 39.8) was due to structural changes in the population, i.e. an increase of 1.7% ([0.7/39.8] × 100%). In terms of the absolute number of deaths, there was an increase of 5382 deaths, of which 4327 (9876 × 43.8%) were due to the increased risk, 172 (9876 × 1.7%) were due to structural changes and the remaining 882 were due to the increase in the size of the population.


 
Table 3 Analysis of the difference in lung cancer mortality for French males between 1970 and 1980 in terms of risk and demographic factors
 
This increase of 882 deaths due to the difference in population size may at first appear misleading, as it represents an increase of 9.0% whereas we stated above that the population had increased by only 6.1%. The extra 2.9% arises because the structural and risk changes also apply to the additional population: the 45.5% increase in the crude rate has to be applied to the ‘extra’ 6.1% of men in the population (i.e. 9.0% = 6.1% × 1.455).
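The split of the absolute difference into risk, structure and size can be sketched as follows. This is an illustration only: the function name `split_excess_cases` is hypothetical, and the inputs are the rounded percentages quoted in the text, so the output comes out close to, but not exactly at, the published 4327, 172 and 882 deaths (which were presumably computed from unrounded rates).

```python
def split_excess_cases(baseline_cases: float, structure: float, risk: float,
                       pop_growth: float) -> dict:
    """Split the change in the number of cases into three parts.

    structure, risk: proportional changes in the crude rate (e.g. 0.017, 0.438).
    pop_growth: proportional change in total population size (e.g. 0.061).
    """
    rate_ratio = 1.0 + structure + risk  # S2 / S1
    # The size component scales with the new crude rate, not the old one.
    size = rate_ratio * pop_growth
    parts = {
        "structure": baseline_cases * structure,
        "risk": baseline_cases * risk,
        "size": baseline_cases * size,
    }
    parts["total"] = sum(parts.values())
    return parts


# Rounded figures for lung cancer in French males, 1970 versus 1980.
parts = split_excess_cases(9876, structure=0.017, risk=0.438, pop_growth=0.061)
for name, value in parts.items():
    print(f"{name:9s} {value:8.1f}")
```

The size component is the population growth multiplied by the ratio of the crude rates, which is why it exceeds 6.1% of the baseline deaths.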

Differences between geographical areas
Table 4 shows the number of observed deaths due to lung cancer for males in 1990 for Denmark and the United Kingdom (UK). Here it can be seen more clearly that looking at the difference in the absolute number of deaths is not relevant, as the population of the UK is about 11 times the size of that of Denmark.


 
Table 4 Observed number of deaths due to lung cancer in 1990 for males in Denmark and the UK
 
Table 5 shows that there is an 11.7% difference in the crude rate between Denmark and the UK (note that Denmark is being used as the baseline group). The risk in the UK is 12.6% greater than that in Denmark, i.e. an extra 10.8 deaths per 100 000 population in the UK due to risk. However, due to differences in the population structure, there is a –0.9% difference in the crude rate between Denmark and the UK, i.e. 0.8 fewer deaths per 100 000 population in the UK. In terms of absolute numbers this means that if the male population of the UK in 1990 had been the same size as that of Denmark, there would have been an extra 274 deaths due to risk and 20 fewer deaths due to the different population structures. Hence the remaining 24 490 deaths are due to the larger population size in the UK.


 
Table 5 Analysis of the difference in lung cancer mortality for males in 1990 between Denmark and the UK in terms of risk and demographic factors
 
Difference between sexes
Table 6 shows the number of observed deaths due to lung cancer for Irish males and females in 1990. Here the populations are of similar size but the males have a much larger number of deaths.


 
Table 6 Observed number of deaths due to lung cancer in 1990 in Ireland for males and females
 
Table 7 shows that there is a 115.9% difference in the crude rate between males and females (note that females are being used as the baseline group). The risk in males is 133.5% greater than that in females, i.e. an extra 36.9 deaths per 100 000 population in males due to risk. However, due to differences in the population structure, there is a –17.6% difference in the crude rate between males and females, i.e. 4.9 fewer deaths per 100 000 population in males. In terms of absolute numbers this amounts to 648 more deaths due to risk, 85 fewer deaths due to the population structure and 4 fewer deaths due to the size of the male population.


 
Table 7 Analysis of the difference in lung cancer mortality between Irish males and females in 1990 in terms of risk and demographic factors
 

    Graphical presentation
 
The graphical presentation of these results can get quite complicated, and we propose a method which should make them easier to interpret. Firstly, we will only deal with two components: the risk component and the demographic component. If we are interested in presenting the change in the crude rate then the demographic component will represent just the change in population structure. However, when it is relevant to present the difference in numbers then the demographic component will represent the change in both population size and structure.

Figure 1 is used to illustrate how we can present the differences under all possible scenarios. There are two situations with a total of six scenarios. In the first situation the risk and demographic differences are of the same sign (leading to two possible scenarios, represented by bars 1 and 2 in Figure 1); in the second they are of opposite signs (leading to four possible scenarios, represented by bars 3–6 in Figure 1). Below we explain how to interpret the change as presented in Figure 1 (bearing in mind the comments about the demographic component above):



 
Figure 1 A bar chart for the six possible situations when describing the overall change in the crude rate or the absolute number

 
Bar 1 We have a net change of +50% in the crude rate (absolute number) of which +10% is due to demographic factors and +40% due to risk.

Bar 2 We have a net change of –50% in the crude rate (absolute number) of which –10% is due to demographic factors and –40% due to risk.

Bar 3 We have a net change of +30% in the crude rate (absolute number) of which –10% is due to demographic factors and +40% due to risk. Note that the +40% change due to risk is represented by the total length of the bar.

Bar 4 We have a net change of –10% in the crude rate (absolute number) of which +30% is due to risk and –40% is due to demographic factors. Note that the –40% change due to demographic factors is represented by the total length of the bar.

Bar 5 We have a net change of +30% in the crude rate (absolute number) of which –10% is due to risk and +40% due to demographic factors. Note that the +40% change due to demographic factors is represented by the total length of the bar.

Bar 6 We have a net change of –10% in the crude rate (absolute number) of which +30% is due to demographic factors and –40% due to risk. Note that the –40% change due to risk is represented by the total length of the bar.

Further, if the bars did not have the net change indicated, it would not be possible to work out from the graphs alone whether Bars 3 and 4, and Bars 5 and 6, represented an overall positive or negative difference. A marker must therefore be added to indicate the end point of the sum of the two components (i.e. the net change). We could use a thick line for this purpose instead of giving the net change in figures.

Bars 3 to 6 can be difficult to interpret initially but if one thinks of starting at zero and then going in the opposite direction of the net change this simplifies the interpretation. So, for example, if we look at bar 4 again: we start at zero and firstly there is a 30% increase in risk (i.e. going in the opposite direction to the net change) which is offset by a –40% change in demographic factors (i.e. now starting at +30% and going down by 40%) leading to a net change of –10%.
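The reading rule above (start at zero, move first in the direction opposite to the net change, then apply the larger component) can be turned into a small layout helper. This is a sketch of one possible reading of the text, since the figure itself is not reproduced here; the function name `bar_layout` and the returned dictionary keys are assumptions.

```python
def bar_layout(risk: float, demographic: float) -> dict:
    """Return the segments, net-change marker and overall extent of one bar.

    Components are percentage changes. Each segment is (label, start, end).
    """
    net = risk + demographic
    components = [("demographic", demographic), ("risk", risk)]
    if risk * demographic < 0:
        # Opposite signs: draw the smaller component first, so the larger
        # one spans the full length of the bar and ends at the net change.
        components.sort(key=lambda item: abs(item[1]))
    (first_label, first_value), (second_label, _) = components
    segments = [
        (first_label, 0.0, first_value),
        (second_label, first_value, net),
    ]
    extent = (min(0.0, first_value, net), max(0.0, first_value, net))
    return {"segments": segments, "net": net, "extent": extent}


# Bar 4: +30% risk offset by -40% demographic, net change -10%.
bar4 = bar_layout(risk=30.0, demographic=-40.0)
print(bar4["segments"], bar4["net"], bar4["extent"])
```

For opposite-sign components the extent has the length of the larger component, matching the note that this component is represented by the total length of the bar, while `net` gives the position of the marker.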

Example
We use the results from ‘Differences between time points’ and ‘Differences between geographical areas’ described earlier under ‘Examples’ to illustrate the use of these graphs (Tables 3 and 5). We will only look at the differences in the crude rate and hence the difference due to demographic factors will be due to the population structure only. The results are shown in Figure 2.



 
Figure 2 Bar ‘France’ represents the change in the crude rate for lung cancer in French males between 1970 and 1980. Bar ‘UK–Den’ represents the difference in the crude rate for lung cancer between Denmark and the UK in 1990

 
The first bar shows the difference in the crude rate for lung cancer in French males between 1970 and 1980 (Table 3). Here there is an overall change of 45.5% of which 43.8% is due to the increase in risk and 1.7% due to the population structure. The second bar shows the difference in the crude rate for lung cancer between Denmark and the UK in 1990 (Table 5). Here there is a net increase of 11.7% of which there is a 12.6% increase in the risk and a 0.9% decrease due to the population structure.


    Discussion
 
We have presented a method of analysing differences in disease incidence and mortality due to risk and demography between two groups. These groups may be two points in time, two geographical areas or the two sexes. We used a two-step procedure to break down the difference. Starting from a baseline, in the first step we change the baseline population (adjusted to a size of 100 000) to that of our comparison group (also adjusted to a size of 100 000), which gives us the structural differences in the population (i.e. differences in the age distribution). In the second step we change the rate to that of the comparison group, resulting in the difference due to risk. The remaining difference in numbers is due to the difference between the sizes of the two populations.

Care must be taken in presenting the differences in disease incidence or mortality as we can present this difference in terms of the crude rate or absolute number. Although we would advocate the use of the former it is very difficult to get away from the latter. From the public health perspective the absolute number is more useful. However, we must be careful not to misrepresent the true situation if there is a large difference between the two comparison populations. For example, it is quite plausible that the comparison population is many times larger than the baseline population (even between two time points, e.g. developing countries) but at the same time the risk is decreasing. Here we could still see an increase in the absolute number but we should be cautious as more importantly the risk is decreasing.

We also proposed a graphical method for the presentation of the two components (i.e. risk and demographic) using bar charts. However, we would advocate the use of tables ahead of any graphical presentation but at the same time concede there are probably situations in which one needs to present the information graphically.

In this paper we used the raw data in quantifying the difference in lung cancer mortality with respect to demographic and risk factors. However, we would recommend that the data are smoothed before doing such analysis. For example, before comparing two time points we could have done an analysis of time trends using age-period-cohort modelling [4,5]. Here the comparison would have been made using the fitted values. Similarly, one could do some form of spatial smoothing before comparing two geographical points [6].

Looking at two points may not be useful or it may not paint a clear picture of what is actually happening. We would suggest that multiple comparisons are made to see how the demographic and risk factors change. For example, looking at our example of lung cancer in French males, it would be more informative to look at the evolution of change. Figure 3 and Table 8 show the yearly change between 1970 and 1990 (using 1970 as a baseline). Here we can see that the risk increases faster between 1970 and 1978 compared to between 1979 and 1990. An important feature that would have been missed is that the gap between the net change and the change due to risk is increasing. This means that the changes due to the population structure are increasing (as can be seen from Figure 3 and Table 8). We could have also presented these results as described in ‘Graphical presentation’.
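A series of comparisons against a fixed baseline year, of the kind summarised in Table 8, could be produced along the following lines. This is only a sketch: the function name `percent_changes_vs_baseline` and the input layout are assumptions, and the two-age-group data are invented, not taken from the WHO database.

```python
def percent_changes_vs_baseline(data: dict, baseline_year: int) -> list:
    """data maps year -> (cases_by_age, population_by_age).

    Returns rows of (year, structure %, risk %, net %), each relative to
    the crude rate of the baseline year.
    """
    base_cases, base_pops = data[baseline_year]
    base_rates = [c / p for c, p in zip(base_cases, base_pops)]
    s1 = sum(base_cases) / sum(base_pops)
    rows = []
    for year in sorted(data):
        cases, pops = data[year]
        total = sum(pops)
        s2 = sum(cases) / total
        # Baseline age-specific rates applied to this year's age structure.
        s3 = sum(r * p / total for r, p in zip(base_rates, pops))
        rows.append((year,
                     100 * (s3 - s1) / s1,
                     100 * (s2 - s3) / s1,
                     100 * (s2 - s1) / s1))
    return rows


# Invented data: (cases, population) for a young and an old age group.
toy = {
    1970: ([6, 40], [60_000, 40_000]),
    1975: ([10, 50], [50_000, 50_000]),
}
rows = percent_changes_vs_baseline(toy, baseline_year=1970)
for year, structure, risk, net in rows:
    print(f"{year}: structure {structure:+.1f}%  risk {risk:+.1f}%  net {net:+.1f}%")
```

Keeping the baseline fixed makes the rows directly comparable, so a widening gap between the net and risk columns shows up as a growing structure column.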



 
Figure 3 Changes in the crude mortality rate for lung cancer in French males since 1970. • = risk changes; + = structural changes; —— = net change

 

 
Table 8 Percentage change in the crude rate for lung cancer in French males compared to the baseline year of 1970
 
One could also look at the change over time when comparing two geographical areas or even both sexes. Finally, a truncated age distribution could be used when making comparisons (e.g. 30–64 years).


    Appendix
 
Our aim is to partition the difference in the number of cases (deaths) for two given comparison groups. This difference can be split into differences in risk and those due to differences in demographic factors (i.e. population size and structure).

Assume that we have two comparison groups 1 and 2 for which we have the incidence (or mortality) and population data by age group. Further, we assume that both comparison groups use the same N age groups.

Let us assume that we have C1 and C2 cases/deaths and total populations of P1 and P2 in groups 1 and 2, respectively. The relative difference between C1 and C2 can be expressed as

$$\frac{C_2 - C_1}{C_1} = \frac{C_2}{C_1} - 1$$

which is the quantity to be analysed. Noting that C_i = S_i P_i, where S_i is the crude rate in group i (any common scaling of the rates, such as per 100 000, cancels in the ratio), it can be split into the relative difference between the rates and the relative difference between P2 and P1:

$$\frac{C_2}{C_1} - 1 = \frac{S_2 P_2}{S_1 P_1} - 1 = \left(\frac{S_2}{S_1} - 1\right) + \frac{S_2}{S_1}\left(\frac{P_2}{P_1} - 1\right)$$

It is therefore sufficient to analyse (S2/S1) – 1. Note that the second term above reflects the fact that the change in population size generates a change in the number of cases which depends on the change in the crude rate.

This quantity (i.e. [S2/S1] – 1) is split into a component due to the differences in risk and a component due to differences in the population structure (i.e. age distribution).

Let us define

$\lambda_{ix}$ = rate in age group x for group i    (2)

$\omega_{ix}$ = proportion of the population in age group x for group i    (3)

where i is 1 or 2. We will be using group 1 as the baseline group for comparison.

We are interested in analysing the difference in the crude rate between groups 1 and 2. Let S1 and S2 be the crude rates in comparison groups 1 and 2, respectively, so that $S_i = \sum_x \lambda_{ix}\omega_{ix}$. We want to analyse the relative difference in the crude rate, i.e.

$$\frac{S_2 - S_1}{S_1}$$

in terms of risk and population structure (i.e. age distribution). Using (2) and (3), let S3 be $\sum_x \lambda_{1x}\omega_{2x}$, that is, the rate in group 1 applied to the population proportions in group 2. We have

$$S_2 - S_1 = (S_3 - S_1) + (S_2 - S_3)$$

which translates into

$$\frac{S_2 - S_1}{S_1} = \frac{S_3 - S_1}{S_1} + \frac{S_2 - S_3}{S_1}$$

The first component on the right-hand side represents the proportion of the difference in the crude rate between groups 1 and 2 due to the differences in the population structure and the second component represents the proportion due to differences in the risk.
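Written out per age group using definitions (2) and (3), the two components take the following form. This is a direct expansion of S1, S2 and S3 from their definitions rather than an additional result.

```latex
\begin{align*}
S_1 &= \sum_x \lambda_{1x}\,\omega_{1x}, \qquad
S_2 = \sum_x \lambda_{2x}\,\omega_{2x}, \qquad
S_3 = \sum_x \lambda_{1x}\,\omega_{2x},\\[4pt]
\frac{S_3 - S_1}{S_1} &= \frac{1}{S_1}\sum_x \lambda_{1x}\,(\omega_{2x} - \omega_{1x})
  \quad \text{(population structure)},\\[4pt]
\frac{S_2 - S_3}{S_1} &= \frac{1}{S_1}\sum_x (\lambda_{2x} - \lambda_{1x})\,\omega_{2x}
  \quad \text{(risk)}.
\end{align*}
```

The structure component weights the baseline rates by the change in age proportions, whereas the risk component weights the change in age-specific rates by the age proportions of the comparison group.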


    Acknowledgments
 
SAB was funded by the European Union's ‘European Network of Cancer Registries’ which is part of ‘Europe Against Cancer’. This work was completed whilst SAB was at the Service de Biostatistique, Batiment 1M, Centre Hospitalier Lyon Sud, 165 Chemin du Grand Revoyet, 69495 Pierre-Benite, France.


    References
 
1 Engeland A, Haldorsen T, Tretli S et al. Prediction of cancer mortality in the Nordic countries up to the years 2000 and 2010. Acta Path Microbiol Immunol Scand 1995;103(Suppl. 49):148.

2 Macfarlane GJ, Macfarlane TV, Lowenfels AB. The influence of alcohol consumption on worldwide trends in mortality from upper aerodigestive tract cancers in men. J Epidemiol Community Health 1996;50:636–39.

3 Breslow NE, Day NE. Statistical Methods in Cancer Research, Vol. II: The Design and Analysis of Cohort Studies. Lyon: IARC Scientific Publications, 1987.

4 Clayton D, Schifflers E. Models for temporal variation in cancer rates. I: Age-period and age-cohort models. Statistics in Medicine 1987;6:449–67.

5 Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: Age-period-cohort models. Statistics in Medicine 1987;6:469–81.

6 Estève J, Benhamou E, Raymond L. Statistical Methods in Cancer Research Volume IV: Descriptive Epidemiology. Lyon: IARC Scientific Publications, 1994.




