The reliability of the three-dimensional FASTRAK measurement system in measuring cervical spine and shoulder range of motion in healthy subjects

K. Jordan1,2, K. Dziedzic3, P. W. Jones2, B. N. Ong1 and P. T. Dawes4

1 Centre for Health Planning & Management, Darwin Building, Keele University,
2 Department of Mathematics, Mackay Building, Keele University,
3 Primary Care Sciences Research Centre and Department of Physiotherapy Studies, Keele University, Keele, Staffordshire ST5 5BG and
4 Staffordshire Rheumatology Centre, The Haywood, High Lane, Burslem, Stoke-on-Trent, Staffordshire ST6 7AG, UK


    Abstract
 
Objectives. To assess the inter-observer and intra-observer reliability of a new three-dimensional measurement system, the FASTRAK, in measuring cervical spine flexion/extension, lateral flexion and rotation and shoulder flexion/extension, abduction and external rotation in healthy subjects.

Methods. The study was conducted in two parts. One part assessed inter-observer reliability with two observers measuring 40 subjects. The other part assessed intra-observer reliability with one observer measuring 32 subjects on three occasions. All subjects had unrestricted, pain-free cervical spine and shoulder movement. Reliability was measured by the intraclass correlation coefficient [ICC(2,1)].

Results. The inter-observer ICCs for the cervical spine ranged from 0.61 to 0.89 and for the shoulder from 0.68 to 0.75. After removal of outliers, all ICCs were above 0.70. Intra-observer ICCs for the cervical spine ranged from 0.54 to 0.82 and for the shoulder from 0.62 to 0.81. After removal of outliers, all ICCs were above 0.70 except for shoulder abduction (0.62).

Conclusions. Whilst all movements measured by the FASTRAK showed good reliability, the reliability of the whole movement in a plane (e.g. left plus right lateral flexion) was better than for the separate movements (e.g. left and right lateral flexion taken separately). Inter-observer reliability was generally better than intra-observer reliability for most cervical spine movements, suggesting that variability of movement within subjects (e.g. over a period of days) for these movements was greater than variability between measures on the same occasion.

KEY WORDS: Cervical spine, Shoulder, Range of motion, FASTRAK, Reliability.


    Introduction
 
Measuring the degree of joint mobility is an important guide to assessing disability and an important aid for diagnosis of many conditions [1]. This is particularly true for disorders of the cervical spine and the shoulder joint complex and conditions such as ankylosing spondylitis (AS) and rheumatoid arthritis. Measures of cervical spine range of motion have been shown to correlate to radiological change in AS [2]. Assessing range of motion in AS is an aid to diagnosis and also contributes to assessment of change over time in the disease. A review of the literature reveals a number of tools which have been promoted as capable of reliably measuring range of motion. These include various forms of inclinometer and the tape measure for the cervical spine [3–7] and different forms of the goniometer for the shoulder [8–11]. The problem with these tools is that they can only measure the maximum range of motion in one plane. They cannot build composite pictures of movement which would include combinations of planes of movement and velocity of movement and can, therefore, only produce artificial patterns of movement. Further, the reliability studies on these tools often have major flaws in design or analysis or unconvincing results [12].

The FASTRAK, developed by Polhemus Incorporated (Colchester, VT, USA), is an electromagnetic three-dimensional tracking system able to locate the position and orientation of up to four small, remote sensors placed on the relevant parts of the body. It provides dynamic, real-time six degrees of freedom measurement in that it computes the position (X,Y,Z Cartesian coordinates) and orientation (azimuth, elevation and roll) of the sensor through space relative to the source transmitter. Each sensor, therefore, can measure data in three planes of joint motion: the primary plane of movement and the two secondary planes, collecting range of motion and speed over the time period of the movement. The FASTRAK has been used to assess the reproduction of a neutral lumbopelvic position following movement into flexion [13], primary and coupled rotations of the thoracic spine [14] and to assess the reproducibility of position sense measurements of the spine [15]. Software to read angular movement in three dimensions from the FASTRAK and to derive velocity and acceleration has been designed in Staffordshire, UK. These data can either be displayed graphically (as shown in Fig. 1) or recorded as numerical data.
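How the Staffordshire software derives velocity and acceleration is not described here. As a minimal sketch only, and assuming uniformly sampled angles in the primary plane, a finite-difference scheme is one common way to obtain both quantities. The function name `finite_difference`, the sampling interval `dt` and the sample values are illustrative assumptions, not details of the actual software.

```python
def finite_difference(samples, dt):
    """Approximate the time derivative of uniformly sampled values.

    Uses central differences for interior samples and one-sided
    differences at the two ends. `dt` is the (assumed constant)
    sampling interval in seconds.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    derivative = []
    for i in range(n):
        lo = max(i - 1, 0)          # previous sample, or the first one
        hi = min(i + 1, n - 1)      # next sample, or the last one
        derivative.append((samples[hi] - samples[lo]) / ((hi - lo) * dt))
    return derivative


# Illustrative angles (degrees) sampled every 0.5 s; not study data.
angles = [0.0, 5.0, 15.0, 30.0, 40.0, 42.0]
velocity = finite_difference(angles, dt=0.5)        # degrees per second
acceleration = finite_difference(velocity, dt=0.5)  # degrees per second squared
```

Differencing amplifies measurement noise, so a real implementation would probably smooth the trace first; that step is omitted from this sketch.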



 
FIG. 1. Example of a FASTRAK trace of cervical spine range of motion.

 
Johnson et al. [16] produced a small reliability study with a limited analysis aiming to determine the reliability of the predecessor to the FASTRAK, the ISOTRAK, on shoulder movement. The electromagnetic source had to be attached to the upper arm and the ISOTRAK has only a single sensor which was attached over the sternum. The FASTRAK allows the use of multiple (up to four) sensors and the source does not have to be attached to the subject, hence removing a possible hindrance to movement.

It was hoped that if the FASTRAK could be demonstrated to be a reliable measure of normal joint motion, a three-dimensional picture of movement to aid the clinical assessment of disease could then be developed. If the FASTRAK cannot reliably measure range of motion in the primary plane, then subsequent inferences about combined movements and velocity would not be possible. Current measurement tools are limited in their ability to measure motion of the cervical spine and shoulder. If the FASTRAK can be seen to be a reliable measure of these complex joint systems, then it should be relatively straightforward to apply the FASTRAK elsewhere. This paper reports on the first part of a study on the FASTRAK. This part of the study assessed the reliability of the FASTRAK in the primary plane of movement, both between and within observers.


    Subjects and methods
 
The reliability study was conducted in two parts: one assessing the inter-observer reliability and one assessing the intra-observer reliability of the FASTRAK. Subjects were recruited into one or other of the study parts from volunteers amongst staff and postgraduate students at Keele University and the Staffordshire Rheumatology Centre. Background details of the subjects are given in Table 1. A screening questionnaire ascertained eligibility for the study. Evidence of current cervical spine or shoulder problems, receipt of medical care for cervical spine or shoulder problems within the past 12 months, operation to the cervical spine or shoulder, diagnosis of spondylosis or osteoporosis, attacks of dizziness, having ever suffered from a major illness and pregnancy were all exclusion criteria. Height and weight were recorded.


 
TABLE 1. Background of subjects

 
Movements in each plane were performed consecutively without stopping and repeated so that there were three measurements of each movement. The FASTRAK was centred at 0° before movements in a new plane began. Movements were always performed in the same order.

For the cervical spine, the planes of movement were those relating to the movements of flexion and extension, lateral flexion to left and right and rotation to left and right, making six movements in total. Subjects were seated in a wooden chair. One sensor was fixed to a pair of safety spectacles and, hence, situated on the forehead. A second sensor was fixed on the sternum (to measure secondary trunk movement) using a Velcro strap. In terms of the reliability analysis reported here, only the forehead sensor was analysed, and the maximum of the three repetitions in the primary plane of movement was taken in this analysis. Subjects were asked to move as far as they could without moving their shoulders.

For the shoulder, the movements measured were flexion and extension, abduction and external rotation from the neutral position with elbow flexed to 90°. One sensor was placed just above the elbow using a Velcro strap, one on the acromion process using double-sided tape and, again, one on the sternum. In terms of the reliability analysis, only the elbow sensor was analysed, and the maximum of the three repetitions in the primary plane only was taken. The dominant shoulder was used for each subject. For flexion/extension and abduction, subjects were instructed to stand with their arm by their side and thumb pointing forward. For external rotation, subjects were told to keep their elbow into their side. Subjects were asked to move as far as they could in the requisite plane of movement.

The position and orientation of each sensor was computed relative to the source transmitter which was placed on a wooden pedestal behind the subject.

Inter-observer reliability
Two observers were used in this part of the study. One was a research physiotherapist (KD) with 16 yr of clinical experience and familiar with measuring range of motion with a goniometer and tape measure. The other observer was a non-clinician (KJ). The observers first performed a pilot study to establish the protocol for the movement and measurement process and to ensure adequate training of both observers in utilizing the FASTRAK.

To measure inter-observer reliability, subjects were tested with the order of observers randomized. The second observer followed the first immediately but was blind to their procedure. The subject performed each movement three times prior to the sensors being attached by the first observer. This constituted the warm-up for the subject. The sensors were then affixed to the subject. Instructions were repeated before each movement and the subject asked to perform the movement. After completion, the observer removed all the sensors before the second observer came into the room. The second observer replaced the sensors and explained each movement to the subject before asking him/her to perform it.

Intra-observer reliability
Subjects were measured on three occasions, 2 weeks apart, by a single observer (KJ). As far as possible, subjects were measured at the same time of day on each occasion. On each occasion each of the movements was described and the subject performed them without sensors. This again constituted the warm-up for the subject. The sensors were then affixed to the subject. Instructions were repeated before each movement.

Analysis
The intraclass correlation coefficient (ICC) based on the two-way random effects ANOVA for a single measurement, labelled ICC(2,1) [17], was used. The ICC ranges from 0 to 1 with 1 indicating perfect reliability. The ICC assesses the proportion of the total variability that is explained by the variation between subjects. Some variation due to the subject is to be expected; a subject is unlikely to be able to replicate a movement exactly at each time of asking. However, the assumption behind the ICC is that the variation between subjects should be much greater than that within subjects either on different occasions or measured by different observers. It is a more appropriate statistic for reliability than the commonly used Pearson's correlation coefficient, the paired t-test or repeated measures ANOVA, all of which have severe flaws in the assessment of reliability [18].
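To make the statistic concrete, the following is a minimal sketch of ICC(2,1) computed from the two-way ANOVA mean squares, following the formula of Shrout and Fleiss [17]. The analysis in this study was run in SPSS; the function name `icc_2_1` and the example angles are illustrative assumptions and are not data from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of n subjects (rows) by k observers or occasions (columns).

    ICC(2,1) = (BMS - EMS) / (BMS + (k - 1) * EMS + k * (JMS - EMS) / n)
    where BMS, JMS and EMS are the between-subjects, between-observers
    and residual mean squares of the two-way ANOVA.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    observer_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_observers = n * sum((m - grand_mean) ** 2 for m in observer_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_subjects - ss_observers

    bms = ss_subjects / (n - 1)
    jms = ss_observers / (k - 1)
    ems = ss_residual / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative ranges of motion (degrees): five subjects, two observers.
example = [
    [62.0, 60.0],
    [48.0, 51.0],
    [71.0, 69.0],
    [55.0, 57.0],
    [66.0, 63.0],
]
reliability = icc_2_1(example)
```

Because the between-observers mean square appears in the denominator, a systematic difference between observers lowers ICC(2,1) even when the observers rank subjects identically.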

An inspection for outliers amongst the subjects was also made. These were based on boxplots of the differences between the observers for each subject in the inter-observer study and the maximum difference between occasions for each subject in the intra-observer study. Outliers were defined as those cases over 1.5 times the interquartile range above the upper quartile figure (the SPSS for Windows convention). Normal probability plots and limits of agreement plots [19] were also used to examine potential outliers. SPSS for Windows version 8.0 was used for this analysis.
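The two screening rules described above can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of the SPSS procedure: the quartiles come from the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from the values SPSS uses for its boxplots, and the function names `upper_outliers` and `limits_of_agreement` are hypothetical. The limits of agreement are the mean difference plus or minus 1.96 standard deviations of the differences, as in Bland and Altman [19].

```python
import statistics


def upper_outliers(values, multiplier=1.5):
    """Return the values lying more than `multiplier` x IQR above the upper quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + multiplier * (q3 - q1)
    return [v for v in values if v > upper_fence]


def limits_of_agreement(first, second):
    """Return the 95% limits of agreement for paired measurements."""
    differences = [a - b for a, b in zip(first, second)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample standard deviation
    return (mean_difference - 1.96 * sd_difference,
            mean_difference + 1.96 * sd_difference)


# Illustrative differences between observers (degrees); not study data.
differences = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
flagged = upper_outliers(differences)
```

With these illustrative values the quartiles are 3 and 7, so the fence sits at 13 and only the 30° difference is flagged.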

Sample size
Sample sizes were calculated using the formula of Donner and Eliasziw [20], as the ICC was used to measure reliability. Based on two observers, a sample size of 40 is appropriate for assessing whether the population ICC is above 0.6, given an assumed true value of 0.8. Similarly, based on these same values, a sample size of 32 is appropriate for the intra-observer reliability study based on three occasions.


    Results
 
A summary of the range of motion for each movement over all 72 subjects is given in Table 2. This is based on the measurements with KJ as an observer and, for those in the intra-observer study, the first occasion.


 
TABLE 2. Summary of range of motion (°)

 
Levene tests suggested no inequality of variances. There was some apparent lack of normality for some of the movements but transformations to correct this had very little effect on the resultant ICC values or confidence limits. Hence, the ICC values reported here were based on the untransformed data. Inter-observer ICC values for the cervical spine are shown in Table 3 and for the shoulder in Table 4. Intra-observer ICC values are shown in Tables 5 and 6.


 
TABLE 3. Inter-observer ICCs and mean difference between observers for cervical spine

 

 
TABLE 4. Inter-observer ICCs and mean difference between observers for shoulder

 

 
TABLE 5. Intra-observer ICCs for cervical spine

 

 
TABLE 6. Intra-observer ICCs for shoulder

 
Inter-observer reliability
Analysis of variance suggested no observer bias in cervical spine movements but that there was an order effect in cervical spine flexion, lateral flexion left and rotation right (P < 0.05) and for total flexion + extension and total rotation (P < 0.01). For all five of these measurements, the first observer's measurement tended to be greater than the second, regardless of who was the first observer.

ICC(2,1) values and mean differences between observers for the cervical spine are given in Table 3. All the ICC values were 0.70 or above except for left and right lateral flexion (above 0.60). There was one apparent outlier for flexion, left and right lateral flexion and right rotation, two for total lateral flexion and three for total rotation. The ICC values with these outliers removed were all above 0.70 and one-sided confidence lower limits were all above 0.60 except for right lateral flexion and right rotation (both 0.56).

Although there did not seem to be an order effect in the shoulder movements as there was for some of the cervical spine movements, there does appear to have been some observer bias for abduction, with KD's measurements being on average around 6° greater than KJ's.

The ICC(2,1) values and mean differences between observers for the shoulder are given in Table 4. All ICC values were above 0.68. When one outlier was removed for each movement (two for abduction), all ICC values were above 0.70 and one-sided confidence lower limits were all above 0.60 except for flexion (0.58).

Intra-observer reliability
There did seem to be an effect of time on the cervical spine movements with measurements for the cervical spine falling significantly (P < 0.01 except rotation left and rotation total, P < 0.05) from occasion one for all movements apart from flexion and rotation right. However, there was no time effect for the shoulder movements.

ICC(2,1) values for the cervical spine are given in Table 5. All values were above 0.60 except for left rotation (0.54). When one outlier was removed for flexion, extension and right lateral flexion, two for flexion + extension, right and left rotation and three for left lateral flexion, all ICC values were above 0.72 and one-sided confidence lower limits above 0.60 except for flexion (0.59), left lateral flexion (0.56) and left rotation (0.55).

ICC(2,1) values for the shoulder are given in Table 6. All ICC values were above 0.60. Only abduction had its one-sided confidence lower limit below 0.60. There were no outliers for abduction.


    Discussion
 
The FASTRAK has the potential to develop a three-dimensional picture of movement which can aid clinicians in their assessment of disease. The first step in evaluating the FASTRAK is to assess its reliability in measuring range of motion. This study evaluated the reliability on subjects with unrestricted, pain-free cervical spine and shoulder joint complex movement.

The use of safety spectacles and Velcro straps to fix the sensors improved standardization and reduced the possibility of markings on the skin being visible to the second observer. Unfortunately, it was not practical to use a strap for the acromion process sensor and occasionally it was possible for the second observer to detect markings left by this sensor. However, this sensor was not used in this analysis and so does not affect these results. By performing movements in the same order each time, the effect of tiredness on each movement's reliability was reduced.

All subjects in the inter-observer reliability part were measured in the morning. Attempts were made as far as possible to measure subjects in the intra-observer part at the same time of day on each occasion to remove any effects resulting from variations in range of motion at different times of day. Twenty of these 32 subjects were measured on the second and third occasions within an hour of the time on their first occasion and a further six within 2 h. Time changes for the remaining six were due to difficulties in scheduling and late unavoidable diary clashes.

An order effect was apparent for certain cervical spine movements in both the inter-observer stage and the intra-observer stage. This is being investigated by considering regression towards the mean [21]. The observation that, for some movements, subjects have greater range of motion for their first measurements than for later measurements could be due to a number of reasons. The subject may initially be more enthusiastic and, hence, move further on the first time. There may be a learning experience taking place. This learning experience idea suggests that subjects, unfamiliar with the movements initially, would on their first occasion, move to a point that is uncomfortable and on second and third occasions stop before they reach that point. Further, it may be that the cervical spine movements are less likely to be everyday movements and this explains why this ‘order’ effect seems to occur more for the cervical spine than shoulder. Subjects will perform shoulder flexion and abduction (or at least a hybrid version of them) when reaching up, for example, but are unlikely to perform pure lateral flexion of the cervical spine, or even full flexion/extension of the cervical spine in everyday life. Another possibility, which could also affect reliability, is that the most flexible individuals (i.e. those who move furthest) measure less on the second and third occasions, as they have no obvious ‘block’, or stopping point, to end their range of motion and so rely on experience of where to stop rather than a definitive block.

The values for reliability suggest that the movement over the whole plane (e.g. total rotation) has higher reliability than when separated into its parts (e.g. left and right rotation). This was not unexpected and will mainly be due to the difficulties in locating a consistent starting (neutral) point. Further, some subjects may have moved slightly from their set neutral position before zeroing of the FASTRAK, to watch the observer set the computer program running. A movement to the right of just 3° would reduce right movement by 3° and increase left by 3°, giving a ‘false’ difference of 6° between left and right movement.
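The arithmetic behind this point can be checked with a short sketch. It assumes, purely for illustration, a subject whose true left and right ranges are both 40° and who has drifted 3° to the right when the FASTRAK is zeroed; the function name `measured_ranges` and these values are hypothetical.

```python
def measured_ranges(true_left, true_right, offset_right):
    """Ranges recorded when zeroing occurs `offset_right` degrees right of true neutral.

    Starting to the right of neutral lengthens the recorded movement to the
    left and shortens the recorded movement to the right by the same amount,
    so the total over the whole plane is unchanged.
    """
    left = true_left + offset_right
    right = true_right - offset_right
    return left, right, left + right


left, right, total = measured_ranges(true_left=40, true_right=40, offset_right=3)
print(left, right, total)   # prints: 43 37 80
print(left - right)         # prints: 6
```

The separate left and right values each carry the zeroing error, whereas their sum does not, which is consistent with the higher reliability observed for movement over the whole plane.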

Interpreting ICC coefficients and defining values of ICCs which could be considered as the minimum for acceptable reliability is an arbitrary process. Various authors have made recommendations. For example, an ICC value of 0.6 is the lower level for acceptable reliability recommended by Eliasziw et al. [22] for inter-observer reliability and Landis and Koch [23] suggest 0.61–0.80 to be substantial reliability. The values of the ICC recorded here and their respective one-sided confidence limits suggest that the FASTRAK, based on these classifications, has good reliability both between observers and when measured by the same observer over a reasonable period of time. Streiner and Norman [24] recommend performing only inter-observer reliability studies, suggesting that if you can show good inter-observer reliability then intra-observer reliability can be assumed, because intra-observer reliability tends to be greater than inter-observer reliability. The values in this study suggest that reliability is good when measurements are repeated immediately, irrespective of whether the same or a different observer conducts the repeated measurements. The ICC values for cervical spine flexion/extension and rotation were slightly higher in the inter-observer study than the intra-observer study. It may be that subjects are more variable in these cervical spine movements when there is a gap of a number of days between measurements—perhaps suggesting that the slight fall in reliability is more due to natural within-subject variation. The good reliability between observers is even more encouraging given that one observer was a non-clinician with no previous experience of measuring range of motion, whilst the other observer was very experienced.

Whilst other studies have used the ICC as a method of assessing reliability for the cervical spine and shoulder they have tended to use the ICC(1,1) variation (based on a one-way ANOVA rather than the two-way random or mixed effects versions). Several authors have produced tests for the equality of two or more ICCs [25, 26]. However, different authors [17, 27] have pointed out that the different ICC models will yield different values. It could be possible to re-analyse ICCs so that they are the same version; however, then they would be inappropriate for the study design used. Müller and Büttner [27] also contend that ICCs using different study designs cannot be easily compared and would lead to unreliable inferences if done so.

Whilst some other studies have yielded ICC values slightly larger than those reported here, the smaller sample sizes of those studies would indicate that the confidence intervals around those values (although no paper reported these) would be wide and would be unlikely to show a good level of reliability [12]. It would be recommended that authors produce the one-sided lower confidence limit for their estimates of the population ICC so that inference about the level of reliability can be made.

The evidence from this study is that the FASTRAK is a reasonably reliable range of motion measuring tool in pain-free, unrestricted cervical spine and shoulder movement. Further work is currently exploring the secondary movements, such as the effect of sternum movement, and the incorporation of speed of movement. Application to a population of subjects with painful, limited movement is also being investigated.


    Acknowledgments
 
This study is part of a larger study funded by an NHS Executive (West Midlands) New Blood Research Training Fellowship.


    Notes
 
Correspondence to: K. Jordan.


    References
 

  1. Greene WB, Heckman JD (eds). The clinical measurement of joint motion. Illinois: AAOS, 1994.
  2. Viitanen JV, Kokko M-L, Heikkilä S, Kautiainen H. Neck mobility assessment in ankylosing spondylitis: a clinical study of nine measurements including new tape methods for cervical rotation and lateral flexion. Br J Rheumatol 1998;37:377–81.
  3. Hsieh C-Y, Yeung BW. Active neck motion measurements with a tape measure. J Orthopaed Sports Phys Ther 1986;8:88–92.
  4. Balogun JA, Abereoje OK, Olaogun MO, Obajuluwa VA. Inter- and intratester reliability of measuring neck motions with tape measure and Myrin gravity-reference goniometer. J Orthopaed Sports Phys Ther 1989;10:248–53.
  5. Youdas JW, Carey JR, Garrett TR. Reliability of measurements of cervical spine range of motion—comparison of three methods. Phys Ther 1991;71:98–106.
  6. Rheault W, Albright B, Byers C et al. Intertester reliability of the cervical range of motion device. J Orthopaed Sports Phys Ther 1992;15:147–50.
  7. Hole DE, Cook JM, Bolton JE. Reliability and concurrent validity of two instruments for measuring cervical range of motion: effects of age and gender. Manual Ther 1995;1:36–42.
  8. Boone DC, Azen SP, Lin C-M, Spence C, Baron C, Lee L. Reliability of goniometric measurements. Phys Ther 1978;58:1355–60.
  9. Pandya S, Florence JM, King WM, Robison JD, Oxman M, Province MA. Reliability of goniometric measurements in patients with Duchenne muscular dystrophy. Phys Ther 1985;65:1339–42.
  10. Youdas JW, Carey JR, Garrett TR, Suman VJ. Reliability of goniometric measurements of active arm elevation in the scapular plane obtained in a clinical setting. Arch Phys Med Rehab 1994;75:1137–44.
  11. Green S, Buchbinder R, Forbes A, Bellamy N. A standardized protocol for measurement of range of movement of the shoulder using the Plurimeter-V inclinometer and assessment of its intrarater and interrater reliability. Arthritis Care Res 1998;11:43–52.
  12. Jordan K. Assessment of published reliability studies for cervical spine range of motion measurement tools. J Manipulative Physiol Ther 2000;23:in press.
  13. Maffey-Ward L, Jull G, Wellington L. Toward a clinical test of lumbar spine kinesthesia. J Orthopaed Sports Phys Ther 1996;24:354–8.
  14. Willems JM, Jull GA, Ng JK-F. An in vivo study of the primary and coupled rotations of the thoracic spine. Clin Biomech 1996;11:311–6.
  15. Swinkels A, Dolan P. Regional assessment of joint position sense in the spine. Spine 1998;23:590–7.
  16. Johnson GR, Fyfe NC, Heward M. Ranges of movement at the shoulder complex using an electromagnetic movement sensor. Ann Rheum Dis 1991;50:824–7.
  17. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
  18. Haas M. Statistical methodology for reliability studies. J Manipulative Physiol Ther 1991;14:119–32.
  19. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–10.
  20. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med 1987;6:441–8.
  21. Davis CE. The effect of regression to the mean in epidemiologic and clinical studies. Am J Epidemiol 1976;104:493–8.
  22. Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther 1994;74:777–88.
  23. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
  24. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press, 1995.
  25. Konishi S, Gupta AK. Testing the equality of several intraclass correlation coefficients. J Stat Plan Infer 1989;21:93–105.
  26. Khatri CG, Pukkila TM, Rao CR. Testing intraclass correlation coefficients. Commun Stat—Sim Comput 1989;18:15–30.
  27. Müller R, Büttner P. A critical discussion of intraclass correlation coefficients. Stat Med 1994;13:2465–76.
Submitted 13 May 1999; revised version accepted 21 October 1999.