All published articles of this journal are available on ScienceDirect.
Lung Function Monitoring; A Randomized Agreement Study
Abstract
Objective:
To determine the agreement between devices and repeatability within devices of the forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), peak expiratory flow (PEF) and forced expiratory flow at 50% of FVC (FEF50) values measured using the four spirometers included in the study.
Methods:
50 (24 women) participants (20-64 years of age) completed maximum forced expiratory flow manoeuvres and measurements were performed using the following devices: MasterScreen, SensorMedics, Oxycon Pro and SpiroUSB. The order of the instruments tested was randomized and blinded for both the participants and the technicians. Re-testing was conducted on a following day within 72 hours at the same time of the day.
Results:
The devices that obtained the most comparable values for all lung function variables were SensorMedics and Oxycon Pro, and MasterScreen and SpiroUSB. For FEV1, the mean difference was 0.04 L (95% confidence interval: -0.05, 0.14) and 0.00 L (-0.06, 0.06), respectively. Using the criterion of a difference ≤ 0.150 L in FVC and FEV1 for acceptable repeatability, 67% of the comparisons between the lung function values obtained by the four devices were acceptable. Overall, Oxycon Pro most frequently obtained the lung function values with the highest precision, as indicated by the coefficients of repeatability (CR), followed by MasterScreen, SensorMedics and SpiroUSB (e.g. min-max CR for FEV1: 0.27-0.46).
Conclusion:
The present study confirms that measurements obtained by the same device at different times can be compared; however, measured lung function values may differ depending on spirometers used.
INTRODUCTION
Respiratory diseases like asthma and chronic obstructive pulmonary disease (COPD) are major health problems worldwide [1, 2]. For such respiratory diseases, lung function measurement, or spirometry, is invaluable as a diagnostic tool [3]. The diagnosis of asthma is generally based on observations of respiratory symptoms like dyspnea, wheezing and/or cough. Measurement of lung function, including demonstration of reversibility of lung function abnormalities, is, however, important for diagnostic confidence [4]. All bronchial provocation tests for suspected exercise-induced asthma (EIA) are based on accurate lung function measurements [5]. Furthermore, it has lately become more common to measure lung function in elite endurance athletes such as cross country skiers and biathlon skiers, but also in swimmers, during training camps and competitions in order to detect early signs of bronchial hyperresponsiveness (BHR) or exercise-induced bronchoconstriction (EIB) [6, 7]. Likewise, when identifying COPD, a history of dyspnea, chronic cough or sputum production, and/or a history of exposure to risk factors for the disease is important, but the diagnosis should be confirmed by lung function measurements [8].
Lung function measurements can be performed by using several different types of spirometers. Thus, the lung function values determined may depend on the device used as well as on personal factors. Since lung function measurements are essential for COPD diagnosis, differences between lung function devices may lead to misclassification of COPD, which may result in increased rates of COPD diagnosis when using some models of spirometers [9]. Furthermore, both in research and in a clinical setting, when comparing data or conducting multi-center studies or meta-analyses [10, 11], differences in the devices used may influence lung function results. There are standardized test procedures and guidelines for lung function measurements, but neither the possible challenge of agreement between different devices nor the repeatability of devices is mentioned [3]. Furthermore, few comparison studies have been carried out, and information is lacking on whether the lung function devices applied can be used interchangeably.
The aims of the present study were: 1) to determine the agreement between the forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), peak expiratory flow (PEF) and forced expiratory flow at 50% of FVC (FEF50) values obtained by four different spirometers; 2) to determine the repeatability of measurement for the four devices.
SUBJECTS AND METHODS
Study Subjects and Design
26 men and 24 women (20-64 years of age) participated in the present study. All participants were of Caucasian origin. Exclusion criteria were any disease (e.g. respiratory or lung disease), the use of respiratory medication which might influence the results, and respiratory tract infection during the last 3 weeks before inclusion in the study [12]. All participants were non-smokers. The participants were not allowed to be physically active on the test day and were only allowed to drink water during the last two hours before testing.
For all subjects, lung function was measured using maximum forced expiratory flow manoeuvres with four different lung function devices. The order of the instruments tested was randomized and blinded for both the participants and the two technicians. The devices were randomized for each subject with a computerized random number generator. There was at least one minute between each expiratory manoeuvre and five minutes between sets of manoeuvres (different lung function equipment). All testing per individual was conducted within 30 minutes. Re-testing (repeatability) was conducted on a following day within 72 hours at the same time of the day.
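The per-subject randomization of device order could be sketched as follows. This is only an illustration: the study states that a computerized random number generator was used but does not describe the software or any seed, so the function name, the seed and the subject identifiers are hypothetical.

```python
import random

DEVICES = ["MasterScreen", "SensorMedics", "Oxycon Pro", "SpiroUSB"]


def randomize_device_order(subject_ids, seed=None):
    """Map each subject to an independently shuffled order of the four devices."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    orders = {}
    for subject in subject_ids:
        order = DEVICES[:]  # copy so the master list is never mutated
        rng.shuffle(order)
        orders[subject] = order
    return orders


# Hypothetical subject identifiers 1..50
orders = randomize_device_order(range(1, 51), seed=2024)
```

Drawing a fresh permutation for every subject spreads any order effects (for example fatigue or learning) across the devices rather than attaching them to one instrument.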
The Regional Committee for Medical and Health Research Ethics southeast was informed about the study aims and methods, and had no objections. All study subjects gave their written informed consent before inclusion in the study.
Procedures
Testing was carried out in ambient conditions (temperature 18-24°C, relative humidity 30-60% and barometric pressure above 950 hPa). Prior to measurements, height and body mass of the participants were determined. Furthermore, the participants had to answer questions about themselves regarding diseases and use of medication.
Anthropometry
Age of the participants was calculated by subtracting date of birth from the date of the testing day. BMI was calculated as body mass (kg) divided by height (m) squared. Subjects were weighed (Seca 709, Germany) wearing light clothing and without shoes to the nearest 0.1 kg. Height was measured to the nearest 0.5 cm by using a stadiometer.
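As a small worked example of the BMI definition above, the sketch below applies it to the group means of body mass and height reported in Table 1. Note that BMI computed from group means is not the same as the mean of individual BMIs, so this is a plausibility check only; the function name is an assumption.

```python
def bmi(mass_kg, height_cm):
    """Body mass index: body mass (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return mass_kg / height_m ** 2


# Group means from Table 1 (not individual data)
bmi_male = bmi(81.2, 184)
bmi_female = bmi(64.6, 169)
```

Rounded to whole numbers, both values agree with the mean BMI reported in Table 1 for men and women.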
Lung Function
Forced expiratory volume in one second (FEV1), forced vital capacity (FVC), peak expiratory flow (PEF) and forced expiratory flow at 50% of FVC (FEF50) were measured by maximum forced expiratory flow-volume curves. Predicted values of Stanojevic et al. [13] were used.
The subjects wore a nose clip and were seated during all lung function measurements. Calibration, taking into consideration ambient conditions, volume, pressure transducer, time constant and pressure-volume, was conducted before each test period according to the guidelines of the manufacturer. All manoeuvres complied with the general acceptability and reproducibility criteria of the American Thoracic Society (ATS) and the European Respiratory Society (ERS) [3]. Up to eight manoeuvres were performed to obtain three acceptable and reproducible manoeuvres. All individual flow-volume curves were reviewed for technical acceptability. The largest reading of the maximum forced expiratory flow-volume curve was reported using the envelope method of reading flows, which means that the highest flow at a given lung volume was chosen, irrespective of the curve.
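The envelope method described above can be illustrated with a minimal sketch. It assumes that all acceptable curves have already been sampled at the same lung volume points, which the paper does not specify; the function name and the flow values are hypothetical.

```python
def envelope_flows(curves):
    """Return the highest flow at each volume point, irrespective of the curve.

    curves: list of flow lists (L/s), all sampled on the same volume grid.
    """
    if not curves:
        raise ValueError("at least one curve is required")
    n_points = len(curves[0])
    if any(len(curve) != n_points for curve in curves):
        raise ValueError("all curves must share the same volume grid")
    # zip(*curves) groups the flows of all curves at each volume point
    return [max(flows) for flows in zip(*curves)]


# Hypothetical flows (L/s) from three manoeuvres at five common volume points
curves = [
    [8.9, 9.1, 6.0, 3.9, 1.8],
    [9.3, 8.8, 6.2, 3.7, 1.9],
    [9.0, 9.0, 5.9, 4.1, 1.7],
]
envelope = envelope_flows(curves)
```

In this example the envelope takes its values from different manoeuvres at different volumes, which is why envelope-based flows can exceed those of any single curve.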
The following lung function devices were included in the present study: 1) MasterScreen (Erich Jaeger® GmbH & Co KG, Würzburg, Germany), a pneumotachograph; 2) SensorMedics (Cardinal Health, Respiratory Technologies, 1100 Bird Center Drive, Palm Springs, California, USA), a mass flow sensor based device; 3) SpiroUSB (Micro Medical LTD, Rochester, Kent, UK), an ultrasonic based device; 4) Oxycon Pro (BeNeLux Bv, Breda, the Netherlands), a turbine/rotating vane based device.
Statistical Analysis
Calculation of sample size was based on a standard deviation (SD) for FEV1 of 0.5 litres and a significance level of 0.05 with 80% power, and 45 subjects were needed to detect a mean difference of 0.3 litres between the spirometers.
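The paper does not state which formula was used for this calculation. As a rough plausibility check, the sketch below assumes a two-sided comparison of two means with the standard normal-approximation formula n = 2 (z_{1-α/2} + z_{1-β})² (SD/Δ)²; the function name is an assumption, and because the design is within-subject the authors' actual calculation may have differed.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


n_raw = n_per_group(sd=0.5, delta=0.3)
n = ceil(n_raw)
```

Under these assumptions the approximation gives about 44 subjects, close to the 45 reported; a calculation based on the t distribution would typically be slightly larger.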
The data were analysed using a repeated mixed models analysis of variance to investigate the agreement of measurements obtained by the four devices. The assumptions of the model were examined using studentized residuals and Cook’s D. The covratio statistic was used to assess the precision of the estimates in the final model. The coefficients of repeatability were calculated to investigate the repeatability of measurements for each device [14].
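The coefficient of repeatability is cited from [14] without an explicit formula. A minimal sketch, assuming the common Bland-Altman style definition CR = 1.96 × SD of the paired test and re-test differences, is given below; the function name and the FEV1 values are hypothetical and are not study data.

```python
from statistics import stdev


def coefficient_of_repeatability(test, retest):
    """CR = 1.96 x SD of the paired differences (assumed definition)."""
    if len(test) != len(retest):
        raise ValueError("test and re-test must be paired")
    diffs = [b - a for a, b in zip(test, retest)]
    return 1.96 * stdev(diffs)  # sample SD (n - 1 denominator)


# Hypothetical FEV1 values (L) for six subjects
test = [4.10, 3.85, 4.60, 3.20, 5.05, 4.40]
retest = [4.20, 3.80, 4.45, 3.35, 5.00, 4.55]
cr = coefficient_of_repeatability(test, retest)
```

Under this definition, roughly 95% of test and re-test differences on the same device are expected to lie within ± CR, so a smaller CR indicates higher precision.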
The analysis was performed using SPSS® (Statistical Package for Social Sciences, Version 15 for Windows, SPSS Inc. Chicago, USA, 2006), SAS (Statistical Analysis System, version 9.2, Cary, NC, USA) and R (http://www.R-project.org/).
RESULTS
Descriptive data of the participants presented by gender are given in Table 1. The means and the CRs of the lung function values as measured by the four different devices are presented in Table 2. Overall, Oxycon Pro most frequently obtained the lung function values with the highest precision, as indicated by the CRs, followed by MasterScreen, SensorMedics and SpiroUSB. However, the order of precision of the devices differed for each of the lung function variables.
Table 1. Descriptive data of the participants presented by gender.
Variable | Male (n=26), mean (SD) | Female (n=24), mean (SD) |
---|---|---|
Age (yrs), mean (min-max) | 28 (20-64) | 29 (21-58) |
Body mass (kg) | 81.2 (10.74) | 64.6 (7.71) |
Height (cm) | 184 (7) | 169 (7) |
BMI (kg/m2) | 24 (2.1) | 23 (2.3) |
Overweight, n (%)# | 7 (26.9) | 2 (8.7) |
FEV1 (% of predicted)* | 98 (9.0) | 99 (10.0) |
FVC (% of predicted)* | 103 (10.8) | 108 (6.5) |
FEV1/FVC (%)* | 78 (5.9) | 77 (7.3) |
Table 2. Mean (95% CI) lung function values and coefficients of repeatability (CR) as measured by the four devices.
Variable | SensorMedics | n | MasterScreen | n | Oxycon Pro | n | SpiroUSB | n |
---|---|---|---|---|---|---|---|---|
FEV1 (L), test | 4.12 (3.86, 4.37) | 50 | 3.99 (3.74, 4.23) | 50 | 4.16 (3.91, 4.42) | 49 | 3.99 (3.74, 4.23) | 50 |
FEV1 (L), re-test | 4.05 (3.79, 4.30) | 41 | 4.03 (3.77, 4.29) | 41 | 4.20 (3.93, 4.48) | 41 | 3.97 (3.71, 4.24) | 39 |
CR | 0.41 | | 0.42 | | 0.27 | | 0.46 | |
FVC (L), test | 5.29 (4.99, 5.59) | 50 | 5.11 (4.82, 5.41) | 50 | 5.22 (4.90, 5.53) | 49 | 5.06 (4.77, 5.36) | 50 |
FVC (L), re-test | 5.36 (5.03, 5.69) | 41 | 5.20 (4.88, 5.53) | 41 | 5.32 (4.97, 5.66) | 41 | 5.11 (4.77, 5.45) | 39 |
CR | 0.60 | | 0.41 | | 0.44 | | 0.52 | |
FEF50 (L/s), test | 4.68 (4.23, 5.13) | 50 | 4.03 (3.69, 4.37) | 50 | 4.54 (4.15, 4.93) | 49 | 4.24 (3.86, 4.62) | 50 |
FEF50 (L/s), re-test | 4.32 (3.89, 4.75) | 41 | 4.07 (3.69, 4.45) | 41 | 4.36 (3.93, 4.79) | 41 | 4.04 (3.65, 4.44) | 39 |
CR | 1.59 | | 1.68 | | 1.42 | | 1.71 | |
PEF (L/s), test | 9.24 (8.61, 9.87) | 50 | 8.97 (8.34, 9.60) | 50 | 9.42 (8.77, 10.07) | 49 | 8.83 (8.23, 9.43) | 50 |
PEF (L/s), re-test | 9.13 (8.48, 9.79) | 41 | 9.26 (8.58, 9.93) | 41 | 9.69 (8.96, 10.41) | 41 | 8.82 (8.16, 9.48) | 39 |
CR | 1.38 | | 1.22 | | 1.27 | | 1.63 | |
Table 3. Mean differences (95% CI) in lung function values between devices at test.
Comparison | SensorMedics | | MasterScreen | | Oxycon Pro | |
---|---|---|---|---|---|---|
FEV1 (L) | Diff. (95% CI) | p-value | Diff. (95% CI) | p-value | Diff. (95% CI) | p-value |
MasterScreen | 0.13 (0.04, 0.21) | <0.001 | ||||
Oxycon Pro | 0.04 (-0.05, 0.14) | 0.903 | 0.17 (0.10, 0.24) | <0.001 | ||
SpiroUSB | 0.13 (0.05, 0.21) | <0.001 | 0.00 (-0.06, 0.06) | 1.000 | 0.17 (0.10, 0.24) | <0.001 |
FVC (L) | ||||||
MasterScreen | 0.17 (0.08, 0.27) | <0.001 | ||||
Oxycon Pro | 0.08 (-0.07, 0.23) | 0.743 | 0.10 (-0.02, 0.21) | 0.136 | ||
SpiroUSB | 0.22 (0.12, 0.32) | <0.001 | 0.05 (-0.02, 0.12) | 0.427 | 0.15 (0.05, 0.24) | <0.001 |
FEF50 (L/s) | ||||||
MasterScreen | 0.65 (0.27, 1.02) | <0.001 | ||||
Oxycon Pro | 0.12 (-0.24, 0.48) | 0.968 | 0.53 (0.12, 0.94) | 0.004 | ||
SpiroUSB | 0.44 (0.22, 0.67) | <0.001 | 0.20 (-0.13, 0.54) | 0.542 | 0.33 (-0.01, 0.66) | 0.060 |
PEF (L/s) | ||||||
MasterScreen | 0.27 (-0.13, 0.67) | 0.423 | ||||
Oxycon Pro | 0.17 (-0.13, 0.48) | 0.623 | 0.43 (0.13, 0.75) | 0.001 | ||
SpiroUSB | 0.41 (0.09, 0.73) | 0.004 | 0.14 (-0.12, 0.41) | 0.686 | 0.58 (0.30, 0.86) | <0.001 |
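Table 4. Mean differences (95% CI) in lung function values between devices at re-test.
Comparison | SensorMedics | | MasterScreen | | Oxycon Pro | |
---|---|---|---|---|---|---|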
FEV1 (L) | Diff. (95% CI) | p-value | Diff. (95% CI) | p-value | Diff. (95% CI) | p-value |
MasterScreen | 0.02 (-0.03, 0.07) | 0.968 | ||||
Oxycon Pro | 0.15 (0.09, 0.22) | <0.001 | 0.17 (0.11, 0.23) | <0.001 | ||
SpiroUSB | 0.05 (-0.02, 0.12) | 0.321 | 0.03 (-0.03, 0.09) | 0.814 | 0.20 (0.15, 0.26) | <0.001 |
FVC (L) | ||||||
MasterScreen | 0.15 (0.05, 0.26) | <0.001 | ||||
Oxycon Pro | 0.04 (-0.09, 0.18) | 0.978 | 0.11 (0.02, 0.20) | 0.006 | ||
SpiroUSB | 0.24 (0.12, 0.37) | <0.001 | 0.09 (0.01, 0.17) | 0.022 | 0.20 (0.11, 0.29) | <0.001 |
FEF50 (L/s) | ||||||
MasterScreen | 0.24 (0.07, 0.42) | 0.002 | ||||
Oxycon Pro | 0.10 (-0.05, 0.25) | 0.379 | 0.34 (0.16, 0.53) | <0.001 | ||
SpiroUSB | 0.17 (0.02, 0.32) | 0.020 | 0.07 (-0.07, 0.22) | 0.731 | 0.27 (0.12, 0.42) | <0.001 |
PEF (L/s) | ||||||
MasterScreen | 0.15 (-0.12, 0.41) | 0.679 | ||||
Oxycon Pro | 0.52 (0.18, 0.86) | <0.001 | 0.37 (0.11, 0.64) | 0.001 | ||
SpiroUSB | 0.25 (0.02, 0.49) | 0.024 | 0.40 (0.19, 0.61) | <0.001 | 0.77 (0.47, 1.08) | <0.001 |
To investigate the agreement between the FEV1, FVC, PEF and FEF50 values determined by the four different lung function devices, mean differences of the respective values were analyzed for test and re-test. The results are presented in Tables 3 and 4. The devices which obtained the most comparable values for all lung function variables during testing were SensorMedics and Oxycon Pro, and MasterScreen and SpiroUSB (e.g. FEV1 mean difference (95% confidence interval (CI)): 0.04 L (-0.05, 0.14) and 0.00 L (-0.06, 0.06), respectively). However, during re-testing, this pattern was less clear, as confirmed by significant differences for two of the four lung function variables investigated when comparing SensorMedics and Oxycon Pro, and MasterScreen and SpiroUSB. In general, SensorMedics and Oxycon Pro obtained higher values than MasterScreen and SpiroUSB (e.g. FEV1 mean (95% CI): 4.12 L (3.86, 4.37) and 4.16 L (3.91, 4.42) vs. 3.99 L (3.74, 4.23) and 3.99 L (3.74, 4.23), respectively) (Table 2). Of the comparisons of the measured lung function values obtained by the four devices, 67% were ≤ 0.150 L, the acceptable repeatability criterion for FVC and FEV1 set by ATS and ERS [3]. Furthermore, 63% of the mean differences between the devices reached statistical significance.
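The share of comparisons meeting the 0.150 L criterion can be retraced from the tabulated mean differences. The paper does not state exactly which comparisons enter the percentage; the sketch below assumes it refers to the 24 pairwise FEV1 and FVC differences in Tables 3 and 4. The variable names are assumptions.

```python
# Mean differences (L) transcribed from Tables 3 (test) and 4 (re-test),
# listed row by row: MasterScreen, Oxycon Pro, SpiroUSB versus the column devices.
DIFFS = {
    ("test", "FEV1"): [0.13, 0.04, 0.17, 0.13, 0.00, 0.17],
    ("test", "FVC"): [0.17, 0.08, 0.10, 0.22, 0.05, 0.15],
    ("re-test", "FEV1"): [0.02, 0.15, 0.17, 0.05, 0.03, 0.20],
    ("re-test", "FVC"): [0.15, 0.04, 0.11, 0.24, 0.09, 0.20],
}
THRESHOLD_L = 0.150  # ATS/ERS repeatability criterion for FVC and FEV1

all_diffs = [d for values in DIFFS.values() for d in values]
n_within = sum(d <= THRESHOLD_L for d in all_diffs)
percent_within = 100 * n_within / len(all_diffs)
```

Under this assumption, 16 of the 24 comparisons fall within the criterion, which is consistent with the reported 67%.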
DISCUSSION
Comparisons of lung function results obtained with different devices should be handled carefully. In the present study, all measurements, both during testing and re-testing, were performed in the same laboratory at the same time and under similar ambient conditions. Nevertheless, not all devices obtained comparable lung function values when using the criterion set by ATS and ERS regarding acceptable repeatability [3], as also indicated by statistically significant differences in the variables FEV1, FVC, FEF50 and PEF between devices. There are some published studies comparing spirometers [9, 15-20]. Studies comparing office and portable spirometers with standard laboratory spirometers have in general suggested reasonable agreement, but the lung function values are often not interchangeable [9, 15-20]. The mean differences in lung function values between the different devices are of the same magnitude as in the present study. As examples of mean differences, Barr et al. [16] reported a mean difference of 0.12 L and 0.17 L for FVC and FEV1, respectively, when comparing an EasyOne and a SensorMedics spirometer. Swart et al. [17] reported a mean difference of 0.03 L and -0.01 L for FVC and FEV1, respectively, when comparing a Spirospec and a Jaeger spirometer. Regarding interchangeability of the values measured, statistically significant mean differences (biases) between the devices may be important to consider. Using different devices that obtain significantly different values may result in false associations in research studies and incorrect conclusions in clinics. Therefore, both in research and in a clinical setting, different lung function devices on the same patient or in the same study must be used with caution.
In the present study, the repeatability of lung function measures for each device seems acceptable, as indicated by the CRs and by the mean differences between test and re-test values of FVC and FEV1, which for all devices were ≤ 0.150 L, the criterion set by ATS and ERS regarding acceptable repeatability [3]. Thus, the present study confirms that measurements obtained by the same device at different times can be compared. It is important, however, to be aware of the influence of barometric pressure, temperature and humidity on lung function measurements when comparing lung function measurements determined by the same device under different climatic conditions. To reduce the sources of error, satisfactory calibration routines should be emphasized both in the laboratory and especially during field testing.
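The within-device check against the 0.150 L criterion can be approximated from the group means in Table 2. This is a sketch under a stated assumption: the test and re-test means are based on different numbers of participants (49-50 vs. 39-41), so the difference of means only approximates the paired mean difference the authors refer to. The variable names are assumptions.

```python
# (test mean, re-test mean) in litres, transcribed from Table 2
TABLE2_MEANS = {
    "FEV1": {
        "SensorMedics": (4.12, 4.05),
        "MasterScreen": (3.99, 4.03),
        "Oxycon Pro": (4.16, 4.20),
        "SpiroUSB": (3.99, 3.97),
    },
    "FVC": {
        "SensorMedics": (5.29, 5.36),
        "MasterScreen": (5.11, 5.20),
        "Oxycon Pro": (5.22, 5.32),
        "SpiroUSB": (5.06, 5.11),
    },
}
THRESHOLD_L = 0.150  # ATS/ERS repeatability criterion

# Absolute shift between test and re-test group means, rounded to 0.01 L
shifts = {
    (variable, device): round(abs(retest - test), 2)
    for variable, devices in TABLE2_MEANS.items()
    for device, (test, retest) in devices.items()
}
all_within = all(shift <= THRESHOLD_L for shift in shifts.values())
largest_shift = max(shifts.values())
```

With these group means, every device stays within the criterion for both FEV1 and FVC, in line with the statement above.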
The present study has several strengths. The lung function devices included are often used in clinics and in research; thus, the results obtained are relevant to both settings. Furthermore, the study sample was heterogeneous, including both women and men, and reflects the adult population. The researchers have extensive experience using the devices and the spirometers were calibrated, giving high quality data for each device. The order of the spirometers tested was randomized during testing and re-testing and blinded for both the participants and the two technicians, and the testing of all instruments was performed during the same time period. Furthermore, the repeatability of lung function measures for each device was investigated by performing the testing and re-testing on separate days, to mimic research and clinical settings, but within 72 hours at the same time of the day. Thus, differences in lung function values within and between devices due to the study design were minimized. Unfortunately, there is no gold standard for lung function measurements; however, this is a limitation regarding the lung function devices, not the present study itself.
CONCLUSION
The present study confirms that measurements obtained by the same device at different times can be compared. Even though the present study obtained high quality spirometric data, not all devices obtained sufficiently comparable lung function values for the variables FEV1, FVC, FEF50 and PEF. Using different devices that do not obtain sufficiently comparable values may result in false associations in research studies and incorrect conclusions in clinics. The present study illustrates the importance of knowing the type of spirometer being used when comparing measured lung function values, whether as repeated measures of the same individual or when conducting multicentre studies or meta-analyses.
CONFLICT OF INTEREST
The authors confirm that this article content has no conflict of interest.
ACKNOWLEDGEMENTS
Declared none.