Mapi Publications

Relationship between FEV1 and patient-reported outcomes changes: results of a meta-analysis of randomized trials in stable COPD

de La Loge C, Tugaut B, Fofana F, Lambert J, Hennig M, Tschiesner U, Vahdati-Bolouri M, Ismaila AS, Punekar YS.


Chronic obstructive pulmonary disease (COPD) is a progressive disease of the respiratory system characterized by chronic airway inflammation. The resulting airflow limitation is not fully reversible. Disease progression is associated with more severe and frequent exacerbations and declining lung function. Nevertheless, COPD is frequently under-diagnosed and under-treated. The global burden of COPD is high and by 2020 will increase to reach a rank of 5 for burden of disease and 3 for cause of death.2 According to Global Initiative for Chronic Obstructive Lung Disease (GOLD) recommendations, assessment of COPD is based on the patient’s level of symptoms, exacerbation history, severity of spirometric abnormality, and identification of comorbidities.2 Although spirometry is now required for a confident diagnosis of COPD, diagnosis and management of the disease should not be based purely on spirometric categorization. Given the evidence that the level of forced expiratory volume in 1 second (FEV1) poorly represents COPD status, revised GOLD guidelines recommend that both disease impact (symptom burden and activity limitation) and future risk of disease progression, particularly exacerbations, must be considered for adequate management of stable COPD. Patient-reported outcomes (PROs) based on symptom severity, activity limitation, or health status are highly relevant for assessing disease severity or treatment impact from the perspective of policy makers and payers.3,4 Such outcomes are routinely collected in clinical trials using fully validated and widely accepted PRO instruments such as the St. George’s Respiratory Questionnaire (SGRQ) and the Transition Dyspnea Index (TDI). However, there is limited evidence on the relationship between typical regulatory endpoints such as FEV1 and PRO endpoints, which often creates challenges for policy makers when making reimbursement decisions for specific treatments.
The primary objective of the study was to assess the relationship between changes in spirometric measurements (particularly trough FEV1) and changes in PROs (SGRQ, TDI, and exacerbation rates) after at least 6 months of follow-up, using study treatment group level data. The analysis was repeated using treatment arms with active treatments (excluding placebo groups) and using treatment effect measurements (difference over placebo) for placebo-controlled studies.


A systematic literature review was performed using a predefined search strategy to identify randomized controlled trials (RCTs) of 24 weeks’ duration or more in patients with COPD. Independent systematic bibliographic searches were conducted in April 2014 using the following databases (from inception to April 2014): MEDLINE, MEDLINE In‑Process, EMBASE, the Cochrane Library, Database of Abstracts of Reviews of Effects, and Health Technology Assessment websites. Secondary systematic searches were performed in clinical trial registries such as the U.S. National Institutes of Health clinical trial register, the World Health Organization International Clinical Trials Registry Platform, the International Standard Randomised Controlled Trial Number registry, and the European Union Clinical Trials Register. Experienced researchers developed search strategies specifically tailored for each database. As an example, the search strategy for MEDLINE and MEDLINE In-Process is provided in Appendix 1 of the online supplementary data. RCTs of at least 24 weeks’ duration conducted in adults with COPD (per GOLD guidelines) receiving long-acting muscarinic antagonists (LAMAs) and/or long-acting β2-agonists (LABAs) were included. Furthermore, only studies reporting at least 1 spirometric measurement of interest (trough FEV1, time-adjusted FEV1 AUC) and at least 1 PRO of interest (SGRQ, TDI, and exacerbation rates) at baseline and at 6 and/or 12 months were selected. The search was limited to English-language publications. The search was directed to studies of treatments with similar pharmacodynamic properties: monotherapy with LAMAs or LABAs (aclidinium bromide, formoterol, glycopyrronium, indacaterol, salmeterol, tiotropium, umeclidinium, or vilanterol) and/or the fixed-dose or free combination of both (umeclidinium/vilanterol, aclidinium/formoterol, tiotropium/olodaterol, or indacaterol/glycopyrronium). Studies with any of these treatments were included.
Studies were excluded if: (a) data were not available simultaneously for spirometric measurement and PRO endpoints at any time point of interest; (b) the reported FEV1 was measured postdose; or (c) there was no evidence that FEV1 was measured predose. Furthermore, studies limited to patients with alpha-1 antitrypsin deficiency-related COPD and to non-white populations (e.g., Chinese, Japanese patients) were excluded.
The SGRQ assesses 3 domains (symptoms, activity, and impacts), with a total score ranging between 0 and 100. Higher values of SGRQ are associated with lower health-related quality of life.5,6 TDI characterizes a change in dyspnea from baseline and provides values between −9 and 9.7 Positive values in the TDI score correspond to clinical improvement. A 4-unit change in the total score of the SGRQ,8 a 1-unit change in TDI,9 and a change of 100 mL in FEV110 are considered minimal clinically important differences (MCIDs) for these measures. There is no agreed MCID for exacerbation rates, although several estimates have been reported in the literature.
The relevance of each identified citation was assessed according to the predefined selection criteria. Selection was performed by 2 researchers (BT and JL) independently, along with standardized quality assessments of the selected studies. Any discrepancies between researchers were resolved by consensus. The selected citations were grouped per study, as 1 study could have been published in several sources such as a conference abstract, full-text article, or trial registration. Data extraction was performed by 2 researchers (BT and JL) independently. Any discrepancies were discussed and resolved by consensus. Data were primarily extracted from the text and tables of the source documents. If the data of interest were available solely as figures, these were extracted using DigitizeIt software version 2.0.3 (Digitize It, Braunschweig, Germany).
For each study, study characteristics, population characteristics, treatment groups, and spirometric and PRO endpoints of interest at selected time points (mean change from baseline [CFB], mean baseline, and mean follow-up values) were extracted. If mean CFB values were unavailable, these were calculated by subtracting the mean value at baseline from the mean value at follow-up. Study and patient characteristics, as well as outcome results (spirometric measurements and PROs at 6 or 12 months’ follow-up and at last assessment), were summarized across all studies using (1) weights proportional to the sample size of the study treatment group in relation to the total number of patients across all treatment groups (weighted approach), and (2) equal weights for each study treatment group (unweighted approach). Methods used to assess the relationship between PROs and spirometric endpoints included scatter and bubble plots (each dot representing one treatment group’s results for both endpoints considered, with the size of the dot proportional to the sample size of that treatment group), linear regressions, and Pearson correlation coefficients with 95% confidence intervals (CIs). The linear regression equations were used to estimate the mean change in FEV1 corresponding to the established MCID thresholds of the PROs and to estimate the mean change in PROs corresponding to the established MCID threshold of a 100-mL change in FEV1.10 Similarly, the rate and incidence of exacerbations corresponding to a change of 100 mL in FEV1 were calculated. The primary analysis quantified the relationship between trough FEV1 CFB and SGRQ CFB at last assessment (i.e., assessment at the 12-month follow-up if available for both endpoints considered or, if not available, at the 6-month follow-up). Further statistical analyses were conducted to facilitate interpretation of results and explore the data. The regression and correlation analyses were repeated after exclusion of the placebo groups.
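The weighted approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the treatment-group values and all function names are hypothetical, the weights are simply the group sample sizes, and the authors' own analysis was run in SAS and may differ in detail.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Mean of `values` with each entry weighted by its group sample size."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_moments(x, y, w):
    """Weighted means plus weighted sums of squares and cross-products."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    sxx = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    syy = sum(wi * (yi - my) ** 2 for yi, wi in zip(y, w))
    sxy = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    return mx, my, sxx, syy, sxy


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation coefficient."""
    _, _, sxx, syy, sxy = weighted_moments(x, y, w)
    return sxy / sqrt(sxx * syy)


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    mx, my, sxx, _, sxy = weighted_moments(x, y, w)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical treatment-group data (NOT taken from the included studies).
fev1_cfb_ml = [-20.0, 60.0, 110.0, 150.0]  # trough FEV1 change from baseline
sgrq_cfb = [-2.0, -4.5, -5.5, -7.5]        # SGRQ total score change from baseline
n_patients = [400, 420, 380, 410]          # group sizes used as weights

r = weighted_pearson(fev1_cfb_ml, sgrq_cfb, n_patients)
slope, intercept = weighted_line(fev1_cfb_ml, sgrq_cfb, n_patients)

# SGRQ change predicted for the 100 mL FEV1 MCID, and the FEV1 change
# at which the fitted line reaches the 4-unit SGRQ MCID (a reduction of 4).
sgrq_at_100_ml = intercept + slope * 100.0
fev1_at_sgrq_mcid = (-4.0 - intercept) / slope

print(f"r = {r:.2f}")
print(f"SGRQ change at +100 mL FEV1: {sgrq_at_100_ml:.1f}")
print(f"FEV1 change at -4 SGRQ units: {fev1_at_sgrq_mcid:.0f} mL")
```

Setting every weight to 1 gives the unweighted approach, so the same functions cover both summaries.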
We also conducted regression and correlation analyses between the active treatment group effect beyond placebo in FEV1 CFB and the effect beyond placebo in the various PROs (analyses conducted using data from placebo-controlled studies only, where the placebo group result is subtracted from each treatment group result). All these analyses were conducted only when data for at least 15 study treatment groups were available. Such a sample size allows detection of a correlation coefficient of 0.7 with more than 85% power and an associated type I error of 0.05.12 Interpretation of the magnitude of the absolute values of the correlation coefficients was based on Cohen’s conventions (0.1-0.3, small/weak; 0.3-0.5, medium/moderate; > 0.5, large).12 No statistical correction for multiple tests was performed. All statistical analyses were conducted based on a predefined statistical analysis plan and using SAS software for Windows (Version 9.2, SAS Institute, Inc., Cary, NC, USA).
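The sample-size statement above can be checked approximately with the Fisher z transformation. The sketch below is an assumption about how such a power figure might be obtained, not the authors' documented calculation. The function names, the handling of the boundary values 0.3 and 0.5, and the "negligible" label for values below 0.1 are illustrative choices.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation r with n points,
    using the Fisher z transformation (standard error 1 / sqrt(n - 3))."""
    z = NormalDist()
    effect = atanh(r) * sqrt(n - 3)
    crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect - crit) + z.cdf(-effect - crit)


def cohen_label(r):
    """Label |r| using the cut-offs quoted in the text (boundary handling assumed)."""
    size = abs(r)
    if size > 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # label not used in the source; added for completeness


power = correlation_power(0.7, 15)
print(f"power for r = 0.7, n = 15: {power:.2f}")
print(cohen_label(-0.68), cohen_label(-0.35))
```

Under this approximation, 15 treatment groups give a power slightly above 0.85 for a true correlation of 0.7, in line with the figure quoted in the text.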


The systematic bibliographic search identified 3006 abstracts, of which 2515 were excluded in the abstract/title screening phase. After full-text screening, a further 261 publications were excluded. The systematic registry search identified 4720 trial registrations, of which 4636 were excluded (Figure 1). Three additional recently published references were identified through conference abstracts and the registry search. Therefore, 233 full-text publications and 84 trial registrations were retained for final study selection. Overall, 118 studies were identified from the citations extracted based on the systematic literature search. Thirty-nine studies from the registry search did not have any results published or posted on the registry websites at the time of the search. The outcomes of 27 studies were outside the scope of the present meta-analysis; these studies were also excluded. In total, 52 unique studies13-62 were selected for this meta-analysis, and the data for all these studies were extracted from all available sources, including clinical trial registries. Key study characteristics are summarized in Table 1. The 52 unique studies included 163 treatment groups and 62,385 patients. The median study duration was 11.7 months. A majority of the studies (80.8%) did not allow background LABA and 57.7% allowed background ICS treatment. A majority of studies required a minimum of 10 pack-years of cigarette smoking for inclusion (82.7%) but had no inclusion criteria regarding the number of exacerbations over the past year (71.2%). The upper thresholds most commonly encountered for the percentage of FEV1 inclusion criterion were 80% (28.8%) and 70% (23.1%). The patients’ characteristics, weighted by the sample size of each group across the 163 treatment groups from the 52 selected studies, are summarized in Table 2. The number of patients in each study treatment group varied from 6 to 3006, with a median of 419.
The mean (standard deviation [SD]) age was 63.7 (25.0) years. The proportion of men across the treatment groups varied from 43.0% to 100.0% (weighted mean proportion 70.4%). Large variation in baseline characteristics was seen for disease severity with the percentage of patients classified as severe or very severe (GOLD stage III or IV) ranging from 19.7% to 100.0% (median, 53.0%) and mean baseline trough FEV1 ranging from 890 to 1681 mL (median, 1180 mL). Most treatment groups were receiving LABA (25.2%), LAMA (21.5%), placebo (20.9%), or LABA and ICS (19.6%). The online supplementary Table 1 provides treatment group-level data on endpoints of interest for all the included studies. The combinations of endpoints with at least 15 study-treatment groups (N) are described in Table 3. In combination with FEV1, SGRQ was the most reported endpoint (111 treatment groups; 38 studies) followed by TDI (68; 22), all exacerbations (24; 10) and moderate/severe exacerbations (69; 23). FEV1 AUC0-12h and SGRQ data at last assessment were available from 5 studies with 22 treatment arms. The duration between baseline and the last assessment varied across endpoint combinations. The duration was longest for the analysis of the combination of SGRQ with trough FEV1 (median, 11.1 months; 55.9% at 12 months) and shortest for the analysis of FEV1 AUC0-12h with trough FEV1 (median, 6.0 months; 81.8% at 6 months). The correlation and regression results of the primary and secondary analyses are shown in Tables 4-6 and Figures 2-4. Table 4 provides weighted and unweighted Pearson correlation coefficients and linear regression results showing values corresponding to known MCIDs for each combination of endpoints at available time points. 
Figure 2 provides a visual representation of the association between these combinations of endpoints at the last assessment using bubble plots. The primary analysis, conducted at the last assessment with weighted means of changes from baseline in trough FEV1 and SGRQ, showed a large, significant negative correlation coefficient (r [95% CI], N) of −0.68 ([−0.77, −0.57], 111) (Table 4). The weighted regression results confirmed this highly significant association (p < 0.0001), with an improvement of 100 mL in trough FEV1 corresponding to a reduction of 5.9 in SGRQ total score and, conversely, a reduction of 4 units on the SGRQ total score corresponding to a 40 mL improvement in trough FEV1 (Table 4, Figure 2A). Results of weighted analyses between trough FEV1 and the TDI score at the last assessment showed a large, significant positive correlation, with an improvement of 100 mL in trough FEV1 corresponding to an improvement of 1.9 on the TDI score, while an improvement of 1 point on TDI was equivalent to a 48 mL reduction in trough FEV1 (p < 0.0001) (Table 4, Figure 2B). A large, negative correlation coefficient was obtained using the time-adjusted FEV1 AUC0-12h and SGRQ at the last assessment. Weighted regression results also indicated a highly significant association (p = 0.0031) between FEV1 AUC0-12h and SGRQ at last assessment, with an improvement of 100 mL in FEV1 AUC0-12h corresponding to a change of −5.75 on SGRQ, while an improvement of 4 units on SGRQ corresponds to a 10 mL reduction in FEV1 AUC0-12h (Table 4, Figure 2C). Statistically significant negative correlations were obtained between trough FEV1 and the annual rate of exacerbations (overall, moderate or severe). Table 4 and Figures 2D and 2E show that improvement in FEV1 is associated with a reduction in the annual rate of exacerbations.
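The confidence interval for the primary correlation can be approximated with the Fisher z transformation. The source does not state how its intervals were computed, so the method and the function name below are assumptions; the sketch only shows that this common approach is compatible with the reported figures.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation r from n points."""
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = crit / sqrt(n - 3)  # standard error on the z scale is 1/sqrt(n-3)
    centre = atanh(r)
    return tanh(centre - half_width), tanh(centre + half_width)


# Primary analysis values quoted in the text: r = -0.68 across N = 111 groups.
low, high = fisher_ci(-0.68, 111)
print(f"[{low:.2f}, {high:.2f}]")
```

For r = −0.68 and N = 111 this rounds to [−0.77, −0.57], the interval quoted for the primary analysis. Because the published r is itself rounded, other reported intervals may differ from this approximation in the second decimal place.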
An improvement of 100 mL in trough FEV1 corresponds to an annual rate of exacerbations of 0.5, while no change in FEV1 corresponds to an annual rate of exacerbations of 2.3 (p = 0.0002). An improvement of 100 mL in trough FEV1 corresponds to an annual rate of moderate or severe exacerbations of 0.7, while no change in FEV1 corresponds to an annual rate of moderate or severe exacerbations of 0.9 (p < 0.0001). Results of the sensitivity analyses conducted at other time points (6 and/or 12 months, subject to availability of data, Table 4) were comparable. Results of the unweighted analyses (Table 4) were also consistent with the results of the weighted analyses. Further analyses conducted at the last assessment excluding the placebo groups are shown in Table 5. The weighted correlation coefficients at last assessment for the following pairs were statistically significant (all p < 0.05): trough FEV1 and SGRQ (−0.63), trough FEV1 and TDI (0.31), FEV1 AUC0-12h and SGRQ (−0.49), exacerbation rate (overall) and trough FEV1 (−0.88), and exacerbation rate (moderate/severe) and trough FEV1 (−0.67) (Table 5, Figure 3). Overall, these results limited to active treatment were similar to the main analysis. The correlations of FEV1 with PROs were significant although slightly decreased; correlations with exacerbation rates were significant and slightly increased. Further analyses conducted at the last assessment with weighted means of difference over placebo in trough FEV1 and in SGRQ showed a medium and statistically significant correlation coefficient of −0.35 ([−0.56, −0.08], 53) (Table 6).
The weighted regression results indicate a significant association between the change beyond placebo in trough FEV1 and in SGRQ at the last assessment (p < 0.05), with an improvement over placebo of 100 mL in trough FEV1 corresponding to a reduction of 2.9 in SGRQ total score and, conversely, a reduction of 4 units on the SGRQ total score corresponding to a 201 mL improvement in trough FEV1 beyond placebo (Table 6, Figure 4). Analyses of all other combinations of endpoints exploring the association of effects beyond placebo on FEV1 and on PROs, with the weighted or unweighted approach (Table 6), led to non-significant results (p > 0.05).
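Because each pair of MCID conversions is read off a single fitted line, the two reported points are enough to recover an approximate slope and intercept. The sketch below assumes that both values in each pair come from the same regression equation (SGRQ change regressed on FEV1 change) and uses the rounded figures from the text, so the results are indicative only; the function name is hypothetical.

```python
def line_through(p1, p2):
    """Slope and intercept of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


# Points are (FEV1 change in mL, SGRQ change in units), as quoted in the text.
primary = line_through((100.0, -5.9), (40.0, -4.0))
beyond_placebo = line_through((100.0, -2.9), (201.0, -4.0))

for name, (slope, intercept) in (("primary", primary),
                                 ("beyond placebo", beyond_placebo)):
    print(f"{name}: {slope * 100:.1f} SGRQ units per 100 mL, "
          f"intercept {intercept:.1f} units")
```

On these assumptions the implied slope is roughly −3.2 SGRQ units per 100 mL in the primary analysis and roughly −1.1 units per 100 mL for effects beyond placebo. The non-zero intercepts are the SGRQ changes the lines predict when FEV1 does not change, which is why 100 mL does not map onto the same number of SGRQ units in the two directions.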


Both objectively measured lung function and subjectively measured PROs are frequently assessed during COPD clinical management. Both types of endpoint remain important to decision makers, with regulators preferring to assess the benefits of new treatments on lung function and payers on PROs. However, data on the association between spirometric measurements and PROs among patients with COPD are sparse and generally limited to a single study context, with different methodologies and outcomes potentially leading to variable conclusions.63-66 A previous meta-analysis67 evaluated the association between lung function measurements and PROs in bronchodilator trials. This study further explores the relationship between spirometric measurements and PROs and includes current evidence from combination therapies in COPD trials. Our primary analysis showed a large and highly significant association between SGRQ and trough FEV1. Analyses with other pairings of spirometric measurements and PROs showed correspondingly large correlation coefficients and a similar trend: an MCID change in FEV1 corresponded to a larger than MCID change in PROs. Such trends, where significant changes in PROs are associated with subclinical changes in objective parameters (such as FEV1), are often encountered in clinical trials. Potential contributing factors to this phenomenon are the Hawthorne effect, wherein study participants change their behavior because they are observed, and the Pygmalion effect, whereby patients, wishing to meet the expectations of their clinician or the study sponsor, tend to exaggerate their symptoms and their impacts at inclusion and minimize these at follow-up,68 leading to an optimistic change over time.
As these factors are observed in both active and placebo arms, there are no consequences for treatment group comparisons, though the phenomena may result in apparent discrepancies in MCID values and regression estimates for subjective and objective measurements, as observed in the present study. Further, as each MCID has been established independently and using different methods,8-10 it is not surprising to obtain results that do not match. By including combination therapies, among them newly launched combination bronchodilators, our analysis provides a more comprehensive meta-analysis (52 studies; 62,385 patients versus 22 studies; 23,654 patients) than the Westwood et al analysis.67 The results of the analysis at 6 and 12 months’ follow-up suggest that the correlation of trough FEV1 with SGRQ and TDI strengthens with time, consistent with the previous study.67 This association decreased slightly after removal of the placebo groups from the analysis and decreased substantially when analyzing treatment effects beyond placebo. The association between FEV1 and SGRQ, however, remained significant. Overall, the results were consistent with the Westwood et al study, suggesting that the association between trough FEV1 and PROs observed in bronchodilator studies remains with combination therapies. Results of the analysis exploring the association of treatment effects beyond placebo are of particular interest. The correlation between FEV1 and SGRQ at last assessment was significant, while all other associations did not reach statistical significance. Corresponding regression results indicated that an improvement of 100 mL over placebo in trough FEV1 corresponds to a reduction of 2.9 in SGRQ total score and, conversely, a reduction of 4 units in the SGRQ total score corresponds to a 201 mL improvement in trough FEV1 beyond placebo.
These estimates are broadly consistent with the results observed in recent studies of dual bronchodilators17,69 and indicate that, after eliminating the placebo effect, a 4-point (MCID) difference on the SGRQ score represents a much larger change than the 100 mL MCID for FEV1. It must be noted that these analyses beyond placebo effect excluded 17 clinical trials that were not placebo-controlled (generally conducted in patients with more severe disease), which may have led to a selection bias. Limiting the analysis to more severe disease with limited variability is particularly detrimental to regression analyses. Further research is needed to address this conclusively. Some limitations of our meta-analysis must be acknowledged. Given the unavailability of individual-patient data, the meta-analysis was conducted using study-level data, and the precision of the results would have been greater had individual-patient data been available. Although we conducted an extensive search of the clinical trial registries and websites of the regulatory authorities to minimize publication bias, this meta-analysis is still limited by the availability of data in the public domain. Furthermore, not all endpoints of interest are available for all studies, and the endpoint definitions may differ between studies, especially for variables such as exacerbation rate and severity of exacerbation. However, given the rigorous methodology followed while ascertaining the endpoint definitions for each study, the risk of misclassification should be minimal. As the studies included are clinical trials of bronchodilators, the study populations for these trials do not usually include an exacerbating patient population, which may lead to fewer exacerbations in these trials. Furthermore, exacerbations are included as safety rather than efficacy endpoints.
Thus, these trials are not powered to assess differences in exacerbation rates of the study groups, which would affect the corresponding results of our study.


The results of this meta-analysis provide important clinically meaningful insights into the relationship between FEV1, the standard primary endpoint for COPD clinical trials, and PROs, namely SGRQ health status measure, TDI, and annual exacerbation rates. Besides including additional clinical trials published in the past few years, the study provides results on new endpoints such as the relationship between FEV1 and the annual rate of exacerbations. The strength of these associations is largely decreased when results beyond placebo effect are assessed. Overall, the results of our correlation and regression analyses demonstrate a strong association between changes in spirometric measurements and changes in PROs from their baseline values.
This study was funded by GlaxoSmithKline. All listed authors met the criteria for authorship set forth by the International Committee for Medical Journal Editors. The authors would like to acknowledge Eline Huisman for her technical support in the systematic review and data extraction and Juliette Meunier for her technical support in SAS programming. Medical writing services were provided by Vidula Bhole, MD, MHSc, of Cactus Communications and funded by GlaxoSmithKline.
YSP, MH, M V-B and ASI are employees of GlaxoSmithKline and hold stock in GlaxoSmithKline. UT was an employee of GlaxoSmithKline at the time of this study and held stock in GlaxoSmithKline. BT, FF, and JL are employed by Mapi and were paid consultants to GlaxoSmithKline. CdL works as an independent consultant and was paid by Mapi to participate in this study.
All authors contributed to the conception and design of the study. CdL, BT, FF and JL contributed to data acquisition and analysis. All authors contributed to data analysis and interpretation.
