The implications of competing risks and direct treatment disutility in cardiovascular disease and osteoporotic fracture: risk prediction and cost effectiveness analysis

Guthrie B, Rogers G, Livingstone S, Morales DR, Donnan P, Davis S, Youn JH, Hainsworth R, Thompson A, Payne K
Record ID 32018011176
English
Authors' objectives: Clinical guidelines commonly recommend preventative treatments for people above a risk threshold. Therefore, decision-makers must have faith in risk prediction tools and model-based cost-effectiveness analyses for people at different levels of risk. Two problems that arise are inadequate handling of competing risks of death and failing to account for direct treatment disutility (DTD; i.e. the hassle of taking treatments). We explored these issues using two case studies: primary prevention of cardiovascular disease using statins and prevention of osteoporotic fracture using bisphosphonates. The objectives were to externally validate three risk prediction tools [QRISK®3, QRISK®-Lifetime, QFracture-2012 (ClinRisk Ltd, Leeds, UK)]; derive and internally validate new risk prediction tools for cardiovascular disease [competing mortality risk model with Charlson Comorbidity Index (CRISK-CCI)] and fracture (CFracture), accounting for competing-cause death; quantify direct treatment disutility for statins and bisphosphonates; and examine the effect of competing risks and direct treatment disutility on the cost-effectiveness of preventative treatments. Clinical guidelines help define and disseminate best practice. Guidelines increasingly use risk prediction tools to help target primary preventative treatments at people at highest risk. In National Institute for Health and Care Excellence (NICE) guidelines, the choice of risk threshold is commonly informed by model-based cost-effectiveness analyses (CEAs) for different levels of baseline risk. Risk prediction modelling and model-based CEA are therefore increasingly important for developing guidelines that recommend long-term preventative medicines, including primary prevention of cardiovascular disease (CVD) using statins and prevention of osteoporotic fracture using bisphosphonates.
Risk prediction and competing mortality risk Most risk prediction models do not account for competing mortality risk, which is when someone dies of another condition (e.g. lung cancer) before experiencing the event being predicted (e.g. CVD or fracture). This can lead to overprediction of event rates among older people and those with multimorbidity. For CVD modelling, CPRD GOLD data were used to define a cohort aged 25–84 years without CVD or prior statin prescription. The outcome was incident CVD. Multiple imputation was used to account for missing data. The performance of the published QRISK3–2017 model was evaluated in terms of discrimination (the ability of a tool to distinguish between those with and those without an event) and calibration (whether or not predicted risk is the same as observed risk) in the whole population, stratified by age and Charlson Comorbidity Index (CCI), and in subgroups with type 1 diabetes, type 2 diabetes and chronic kidney disease (CKD). Observed risk was estimated with and without accounting for competing risk (using Aalen–Johansen and Kaplan–Meier estimators, respectively). For fracture modelling, the cohort was aged 30–99 years (prior fracture or bisphosphonate treatment were allowed) with follow-up to specified fracture, death from non-fracture causes, deregistration or end of study. Two outcomes were defined: major osteoporotic fracture (MOF) and hip fracture. QFracture-2012 performance was evaluated as for QRISK3. For both cohorts, the earliest study entry date was 1 January 2004 and the end of the study was 31 March 2016. Using the same data set as objective 1, participants were randomly allocated to derivation and test data sets in a 2 : 1 ratio. For CVD, two Fine–Gray models were derived in the derivation data set and internally validated in the test data set, alongside QRISK3. 
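The difference between the two ways of estimating observed risk described above (Kaplan–Meier, which treats competing deaths as censoring, and Aalen–Johansen, which accounts for them) can be illustrated with a minimal sketch. The function name `cumulative_incidence`, the event coding and the ten-person toy cohort are assumptions made for illustration only; they are not taken from the study, which used CPRD GOLD data.

```python
from collections import Counter


def cumulative_incidence(times, events, horizon):
    """Return (1 - Kaplan-Meier, Aalen-Johansen) risk of the event of interest.

    events: 0 = censored, 1 = event of interest, 2 = competing death.
    """
    n_at_risk = len(times)
    km_surv = 1.0   # Kaplan-Meier survival, competing deaths treated as censoring
    all_surv = 1.0  # probability of being free of ANY event so far
    aj_cif = 0.0    # Aalen-Johansen cumulative incidence of the event of interest

    interest = Counter(t for t, e in zip(times, events) if e == 1)
    competing = Counter(t for t, e in zip(times, events) if e == 2)
    leaving = Counter(times)  # everyone leaving the risk set at each time

    for t in sorted(leaving):
        if t > horizon:
            break
        d1, d2 = interest[t], competing[t]
        # Aalen-Johansen: hazard of the event weighted by overall event-free survival
        aj_cif += all_surv * d1 / n_at_risk
        km_surv *= 1 - d1 / n_at_risk
        all_surv *= 1 - (d1 + d2) / n_at_risk
        n_at_risk -= leaving[t]

    return 1 - km_surv, aj_cif


# Hypothetical toy cohort: four events of interest, four competing deaths,
# two people censored at the end of follow-up.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [2, 1, 2, 1, 2, 1, 2, 1, 0, 0]

km_risk, aj_risk = cumulative_incidence(times, events, horizon=10)
print(f"1 - Kaplan-Meier: {km_risk:.3f}")
print(f"Aalen-Johansen:   {aj_risk:.3f}")
```

In this toy cohort the Aalen–Johansen estimate equals 4/10, the raw proportion who had the event, whereas the complement of Kaplan–Meier is about 0.59. Observed risk therefore looks higher when competing deaths are ignored, and the gap widens as competing mortality becomes more common.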
The competing mortality risk model (CRISK) accounted for competing mortality only, whereas the competing mortality risk model with Charlson Comorbidity Index (CRISK-CCI) also included the modified CCI as a predictor. Model performance was examined using discrimination and calibration. For fracture, separate Fine–Gray models (CFracture) were estimated for MOF and hip fracture. The same data were used as for objective 1, but with the age range restricted to 30–84 years to match QRISK®-Lifetime (ClinRisk Ltd). As lifetime risk is not observed in this data set, model performance was evaluated at 10 years, and reclassification examined the characteristics of those recommended for treatment on the basis of a QRISK3 10-year risk of > 10%, a QRISK-Lifetime 10-year risk of > 10% and the QRISK-Lifetime highest risk, with thresholds chosen to recommend the same number of people for treatment as with QRISK3 > 10%. Two groups of participants were recruited to studies to elicit DTD of preventative statins and bisphosphonates: people with direct experience of taking one of the medicines and a sample of the general population. We described the process of taking each medicine (one tablet per day for statins, one tablet per week taken on an empty stomach with a requirement to stay upright for at least 30 minutes for bisphosphonates). Elicitation used time trade-off (TTO) (primary analysis) and best–worst scaling (BWS) (exploratory analysis) surveys iteratively developed using think-aloud interviews with 19 patients, and online pilot studies. For statins for the primary prevention of CVD, we modified the cohort-level decision-analytic model used in NICE’s lipid modification guideline [NICE. Lipid Modification: Cardiovascular Risk Assessment and the Modification of Blood Lipids for the Primary and Secondary Prevention of Cardiovascular Disease. Clinical Guideline (CG181). Methods, Evidence and Recommendations. July 2014. 
URL: https://web.archive.org/web/20220201050407/https://www.nice.org.uk/guidance/cg181/evidence/lipid-modification-update-full-guideline-pdf-243786637 (accessed 12 October 2022)]. General updates included rapid reviews to identify utility values and costs associated with CVD events, new regressions to predict baseline quality of life for people without CVD (based on Health Survey for England data) and type of first CVD event (based on data from objective 1), and inputs (costs, life expectancy) were updated to present-day values. For bisphosphonates for the prevention of fracture, we used the discrete-event simulation developed for NICE’s Technology Appraisal 464 [NICE. Bisphosphonates for Treating Osteoporosis. Technology Appraisal Guidance (TA464). London: NICE; 2017]. For both models, we explored competing risk by parameterising probability of non-cause-specific death using relative survival models adjusting for predicted risk (QRISK3 or QFracture-2012). We incorporated DTD as elicited in objective 4 under three assumptions (lifelong, time limited, diminishing over time). We explored how these factors alone or in combination affect the estimated value of the preventative medicines in terms of cost per quality-adjusted life-year (QALY). Discrimination of QRISK3 in the whole external validation cohort was excellent (Harrell’s c = 0.865 for women, 0.834 for men), and comparable to the previous internal validation. However, discrimination was worse among people with more comorbidity, and was poor to moderate among older people (e.g. c = 0.611 for women and 0.585 for men aged 75–84 years). Calibration in the whole population, ignoring competing risks, was very good, with minor overprediction. There was larger overprediction among older people, which was considerable after accounting for competing risks. Among people with type 1 diabetes, discrimination was excellent (c = 0.830 for women, 0.853 for men). 
There was evidence of overprediction at higher levels of predicted risk, which was larger after accounting for competing risks, although most overprediction happened well above the NICE 10% threshold for offering treatment. Discrimination among people with CKD was only moderate (women, c = 0.705; men, c = 0.671), but calibration was reasonable at recommended treatment thresholds. The new competing risk model (CRISK-CCI) had similar discrimination to QRISK3 in the whole population (women, c = 0.864; men, c = 0.819), with the same pattern of worse discrimination among older people and those with more comorbidity. Calibration was systematically better than QRISK3, although, as with QRISK3, there was overprediction in some subgroups with high predicted risk. Observed age-stratified incidences of both MOF and hip fracture were considerably higher in this study than in a previous external validation, which was partly explained by the use of hospital data in this study to ascertain fractures. Discrimination of QFracture-2012 in external validation was excellent among women (MOF, c = 0.813; hip fracture, c = 0.918) and good to excellent among men (MOF, c = 0.738; hip fracture, c = 0.888), similar to QFracture-2012 internal validation, but had poor to moderate discrimination among older people. Ignoring competing risks, QFracture-2012 showed serious underprediction in the whole population and in all subgroups of age and comorbidity, which was worse for hip fracture than for MOF. Accounting for competing risks reduced observed underprediction in the whole population, but there was very major overprediction among older people and at higher levels of predicted risk among people with more comorbidity. The new competing risk model (CFracture) had similar discrimination to QFracture-2012 in the internal validation cohort (women: c = 0.813 for MOF, c = 0.914 for hip fracture; men: c = 0.734 for MOF, c = 0.883 for hip fracture). 
CFracture was better calibrated than QFracture-2012 but showed overprediction at higher levels of predicted risk for MOF (both sexes) and for hip fracture (among men). CFracture calibration was poor among people aged 85–99 years for both outcomes. Evaluated at 10 years’ follow-up, QRISK-Lifetime had excellent discrimination (women, c = 0.844; men, c = 0.808) in the whole population, with the same pattern as QRISK3 and CRISK-CCI of worse discrimination among older people and those with high comorbidity. QRISK-Lifetime underpredicted 10-year risk among people at higher predicted risk, particularly older people, implying that estimated lifetime risk will be underpredicted. A total of 5.3% of participants were recommended for treatment by both QRISK3 and QRISK-Lifetime, and 27.4% by one or the other, but not both. Participants recommended for treatment by QRISK-Lifetime were younger than those recommended by QRISK3 (mean age: women, 50.5 vs. 71.3 years, respectively; men, 46.3 vs. 63.8 years, respectively), were much more likely to have a strong family history of CVD (women: 36.3% vs. 6.3%, respectively; men: 20.0% vs. 7.2%, respectively) and had many fewer observed events during the 10-year follow-up (women with a CVD event: 4.0% vs. 11.9%, respectively; men with a CVD event: 4.3% vs. 10.8%, respectively). When measured by TTO, long-term statin use was associated with mean DTD of 0.034 among people willing to take statins; the equivalent number for bisphosphonates was significantly greater, at 0.067. The findings from the BWS experiment had face validity in that inconvenience influenced preferences. However, the estimated values for DTD are implausibly large. Consistent with previous studies, these findings suggest three distinct preference phenotypes: some people would avoid taking the medicines at all costs, some people see no problem with them and some people are willing to trade length of life to avoid treatment. 
The first group are unlikely to initiate treatment and the second group do not anticipate DTD; in the third group, depending on the individual’s strength of preference to avoid treatment and the magnitude of expected QALY gains from prevention, DTD may imply that a preventative medicine’s negative characteristics outweigh its benefits. General updates to the CVD model made high-intensity statins more cost-effective for primary prevention. Introducing accurate adjustment for competing risk of non-CVD death had the expected effect: more QALYs among people with below-average CVD risk for their sex and age (who experience lower rates of other-cause mortality) and fewer QALYs among people with above-average risk (whose non-CVD life expectancy is attenuated). However, the impact on incremental cost-effectiveness is minor, and statins remain almost universally cost-effective. Incorporating DTD has a more obvious effect, especially when we assume that it applies undiminished for as long as people take statins for primary prevention. Under that circumstance, the threshold at which expected long-term benefits outweigh DTD-related harm rises with age: for a 40-year-old, a 10-year risk of ≥ 8% would be enough to make treatment net beneficial whereas, for an 80-year-old, that figure rises to 38%. The model assessing bisphosphonates for the primary prevention of osteoporotic fragility fracture shows that we overestimate value for money among people at the highest risk if we do not adjust for competing risk of non-fracture death. However, this generally affects only the magnitude of expected net benefit among people for whom some degree of benefit is expected. Even among people at highest risk of fracture, average QALY gains associated with bisphosphonates are small and swamped by DTD of any duration. 
Consequently, it is impossible to identify any group of people for whom oral bisphosphonates represent an effective use of NHS resources, if we assume population-level average DTD for everyone to whom the decision applies.
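The trade-off between expected QALY gains and DTD described above can be illustrated with a deliberately simplified sketch. It is not the cohort model or the discrete-event simulation used in the study: the function `net_qalys`, the 3.5% annual discount rate default and the QALY gain of 0.10 are assumptions chosen for illustration. Only the DTD values (0.034 for statins, 0.067 for bisphosphonates) come from the record.

```python
def net_qalys(qaly_gain, dtd, years_on_treatment, discount_rate=0.035):
    """Expected QALY gain from prevention minus discounted DTD.

    dtd is applied undiminished for each year on treatment, so a short
    `years_on_treatment` mimics a time-limited disutility assumption.
    """
    dtd_loss = sum(
        dtd / (1 + discount_rate) ** year for year in range(years_on_treatment)
    )
    return qaly_gain - dtd_loss


STATIN_DTD = 0.034          # mean DTD reported for statins (time trade-off)
BISPHOSPHONATE_DTD = 0.067  # mean DTD reported for bisphosphonates

HYPOTHETICAL_GAIN = 0.10    # illustrative QALY gain, NOT a study result

for label, dtd in [("statin", STATIN_DTD), ("bisphosphonate", BISPHOSPHONATE_DTD)]:
    for years in (1, 10):
        net = net_qalys(HYPOTHETICAL_GAIN, dtd, years)
        print(f"{label:>14}, DTD for {years:>2} years: net QALYs = {net:+.3f}")
```

Under these assumptions, a small expected gain can be outweighed once DTD persists for many years, which is the qualitative pattern described for lifelong DTD. The actual thresholds in the report come from the full decision-analytic models.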
Authors' results and conclusions: CRISK-CCI has excellent discrimination, similar to that of QRISK3 (Harrell’s c = 0.864 vs. 0.865, respectively, for women; and 0.819 vs. 0.834, respectively, for men). CRISK-CCI has systematically better calibration, although both models overpredict in high-risk subgroups. People recommended for treatment (10-year risk of ≥ 10%) are younger when using QRISK-Lifetime than when using QRISK3, and have fewer observed events in a 10-year follow-up (4.0% vs. 11.9%, respectively, for women; and 4.3% vs. 10.8%, respectively, for men). QFracture-2012 underpredicts fractures, owing to under-ascertainment of events in its derivation. However, there is major overprediction among people aged 85–99 years and/or with multiple long-term conditions. CFracture is better calibrated, although it also overpredicts among older people. In a time trade-off exercise (n = 879), statins exhibited direct treatment disutility of 0.034; for bisphosphonates, it was greater, at 0.067. Inconvenience also influenced preferences in best–worst scaling (n = 631). Updated cost-effectiveness analysis generates more quality-adjusted life-years among people with below-average cardiovascular risk and fewer among people with above-average risk. If people experience disutility when taking statins, the cardiovascular risk threshold at which benefits outweigh harms rises with age (≥ 8% 10-year risk at 40 years of age; ≥ 38% 10-year risk at 80 years of age). Assuming that everyone experiences population-average direct treatment disutility with oral bisphosphonates, treatment is net harmful at all levels of risk. Ignoring competing mortality in risk prediction overestimates the risk of cardiovascular events and fracture, especially among older people and those with multimorbidity. 
Adjustment for competing risk does not meaningfully alter cost-effectiveness of these preventative interventions, but direct treatment disutility is measurable and has the potential to alter the balance of benefits and harms. We argue that this is best addressed in individual-level shared decision-making.
Implications for healthcare Ignoring competing mortality in risk prediction overestimates the risk of CVD and fracture among older people and those with multimorbidity, which will lead to overestimation of the benefits of treatment. This affects fracture risk prediction more than CVD because CVD is a more substantial proportion of total mortality. The QFracture-2012 prediction tool simultaneously underestimates fracture risk among people without high competing mortality risk, partly because it did not include fractures recorded only in hospital data in its derivation. CVD and fracture risk prediction are improved by accounting for competing mortality risks, and transparency of the tools would be improved by fully publishing the codes used to define events and predictors. We have demonstrated an effective method of making accurate adjustment for competing risk of non-cause-specific death in decision-analytic CEAs. Although it made relatively little difference to the estimated cost-effectiveness of preventative interventions in the examples we explored, we have shown that it could potentially be important. Therefore, we recommend that modellers consider this issue when designing analyses of preventative treatments. Although we have demonstrated that DTD exists and has the potential to alter the balance of benefits and harms for preventative treatments, we do not recommend that population-level average DTD is incorporated in base-case CEAs. Rather, we recommend that decision-makers review scenarios with and scenarios without DTD and highlight its possible impact, enabling prescribers to engage in shared decision-making that gives appropriate weight to individual preferences.
Authors' methods: Discrimination and calibration of risk prediction models (Clinical Practice Research Datalink participants: aged 25–84 years for cardiovascular disease and aged 30–99 years for fractures); direct treatment disutility was elicited in online stated-preference surveys (people with/people without experience of statins/bisphosphonates); costs and quality-adjusted life-years were determined from decision-analytic modelling (updated models used in National Institute for Health and Care Excellence decision-making). Treating data as missing at random is a strong assumption in risk prediction model derivation. Disentangling the effect of statins from secular trends in cardiovascular disease in the previous two decades is challenging. Validating lifetime risk prediction is impossible without using very historical data. Respondents to our stated-preference survey may not be representative of the population. There is no consensus on which direct treatment disutilities should be used for cost-effectiveness analyses. Not all the inputs to the cost-effectiveness models could be updated.
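Discrimination in this record is reported as Harrell's c. A minimal sketch of how that statistic is computed for right-censored data follows. The function name `harrell_c` and the four-person example are hypothetical, and the sketch ignores pairs with tied follow-up times, which production implementations handle more carefully.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair is comparable when the person with the shorter follow-up had the
    event. It is concordant when that person also had the higher predicted
    risk; tied predictions count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot be the earlier event in a pair
        for j in range(n):
            if times[j] > times[i]:  # j was still event-free when i had the event
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: follow-up in years, event indicator, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]

print(f"Harrell's c = {harrell_c(times, events, risk_scores):.2f}")
```

Here four of the five comparable pairs are ranked correctly, giving c = 0.80. A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking.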
Details
Project Status: Completed
Year Published: 2024
English language abstract: An English language summary is available
Publication Type: Full HTA
Country: England, United Kingdom
MeSH Terms
  • Cardiovascular Diseases
  • Osteoporosis
  • Fractures, Bone
  • Osteoporotic Fractures
  • Risk Assessment
  • Risk Factors
  • Cost-Effectiveness Analysis
  • Primary Prevention
  • Diphosphonates
  • Drug Therapy
Contact
Organisation Name: NIHR Health Services and Delivery Research programme
Contact Address: NIHR Journals Library, National Institute for Health and Care Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK
Contact Name: journals.library@nihr.ac.uk
Contact Email: journals.library@nihr.ac.uk
This is a bibliographic record of a published health technology assessment from a member of INAHTA or other HTA producer. No evaluation of the quality of this assessment has been made for the HTA database.