Software with artificial intelligence-derived algorithms for detecting and analysing lung nodules in CT scans: systematic review and economic evaluation
Geppert J, Auguste P, Asgharzadeh A, Ghiasvand H, Patel M, Brown A, Jayakody S, Helm E, Todkill D, Madan J, Stinton C, Gallacher D, Taylor-Phillips S, Chen YF
Record ID 32018014163
English
Authors' objectives:
Lung cancer is one of the most common types of cancer and the leading cause of cancer death in the United Kingdom. Artificial intelligence-based software has been developed to reduce the number of missed or misdiagnosed lung nodules on computed tomography images. The objective was to assess the accuracy, clinical effectiveness and cost-effectiveness of using software with artificial intelligence-derived algorithms to assist in the detection and analysis of lung nodules in computed tomography scans of the chest compared with unassisted reading.
Lung nodules are found in different populations: (1) when people are referred for a computed tomography (CT) scan of the chest because they have signs and symptoms suggestive of lung cancer (symptomatic), (2) when people are investigated for conditions unrelated to lung cancer (incidental), or (3) through lung cancer screening programmes (screening). CT scans are also undertaken to assess whether the growth of previously identified nodules indicates malignancy and whether further assessment or treatment is needed (surveillance). Nodules may be challenging to detect because of their small size, varying shape and proximity to other structures. This assessment focuses on the use of software with artificial intelligence (AI)-derived algorithms to assist in the detection and analysis of lung nodules in CT chest scans. For the detection and analysis of lung nodules in symptomatic, incidental, screening or surveillance populations, the following key questions are asked.
Key question 1
What is the accuracy of CT image analysis assisted by AI software, and what are the practical implications and impacts on patient management?
Authors' results and conclusions:
Twenty-seven studies were included. All were rated as being at high risk of bias. Twenty-four of the included studies used retrospective data sets. Seventeen compared readers with and without artificial intelligence software. One reported prospective screening experiences before and after artificial intelligence software implementation. The remaining studies either evaluated stand-alone artificial intelligence or provided only non-comparative evidence.
(1) Artificial intelligence assistance generally improved the detection of any nodules compared with unaided reading (three studies; average per-person sensitivity 0.43–0.68 for unaided and 0.79–0.99 for artificial intelligence-assisted reading), with similar or lower specificity (three studies; 0.77–1.00 for unaided and 0.81–0.97 for artificial intelligence-assisted reading). Nodule diameters were similar or significantly larger with semiautomatic measurements than with manual measurements. Intra-reader and inter-reader agreement in nodule size measurement and in risk classification generally improved with artificial intelligence assistance or was comparable to that with unaided reading. However, the effect on measurement accuracy is unclear.
(2) Radiologist reading time generally decreased with artificial intelligence assistance in research settings.
(3) Artificial intelligence assistance tended to increase allocated risk categories as defined by clinical guidelines.
(4) No relevant clinical effectiveness and cost-effectiveness studies were identified.
(5) The de novo cost-effectiveness analysis suggested that for symptomatic and incidental populations, artificial intelligence-assisted computed tomography image analysis dominated the unaided radiologist in cost per correct detection of an actionable nodule. However, when relevant costs and quality-adjusted life-years from the full clinical pathway were included, artificial intelligence-assisted computed tomography reading was dominated by the unaided reader.
For screening, artificial intelligence-assisted computed tomography image analysis was cost-effective in the base case and all sensitivity and scenario analyses.
Artificial intelligence-assisted analysis of computed tomography scan images may reduce variability of and improve consistency in the measurement and clinical management of lung nodules. Artificial intelligence may increase nodule and cancer detection but may also increase the number of patients undergoing computed tomography surveillance unnecessarily. No direct comparative evidence was found, nor was any direct evidence found on clinical outcomes and cost-effectiveness. Artificial intelligence-assisted image analysis may be cost-effective in screening for lung cancer but not for symptomatic populations. However, reliable estimates of cost-effectiveness cannot be obtained with current evidence.
Key question 1
Twenty-seven studies covering eight NICE-specified AI software and evaluating nodule detection or measurement accuracy/concordance, practical implications and/or impact on patient management were identified. All studies were rated as being at high risk of bias and had multiple applicability concerns. Twenty-four studies used retrospective data sets, 17 of which compared the performance of readers seeing and not seeing the findings of AI software concurrently (‘concurrent AI’). Nine of them allowed comparison with stand-alone AI software without human input (‘stand-alone AI’). One study evaluated readers with concurrent AI only (vs. a reference standard); five studies evaluated stand-alone AI only; and one further study compared stand-alone AI with unaided readers. Only three studies reported on prospective screening experiences based on a pilot trial conducted in the Republic of Korea: two studies reported on software-assisted reading only and one study used a before-and-after design.
Accuracy and reliability
Detection of any nodules
Three studies found that AI assistance significantly increased sensitivity of detecting people with nodules. Pooled per-person sensitivity varied from 0.43 to 0.68 for unaided reading and from 0.79 to 0.99 for AI-assisted reading. Average specificity decreased slightly in two studies while it improved slightly in one study (0.77–1.00 without and 0.81–0.97 with AI assistance). A fourth study reported improved average per-nodule sensitivity from 0.72 to 0.84 with no difference in false-positive rates with AI assistance.
AI-assisted detection and analysis of lung nodules increases consistency of nodule measurement and risk classification compared with unaided reading, but its effect on measurement accuracy is unclear. AI assistance appears to improve sensitivity for lung nodule and cancer detection but can be accompanied by a decrease in specificity and an increase in false-positive findings per scan, as well as raising risk categorisation. The reported performance of AI-assisted reading varies substantially among published studies (for any nodules: per-person sensitivity 0.79–0.99, per-person specificity 0.81–0.97), possibly attributable to heterogeneous study and reader populations, other study design features and risk of bias in addition to potential differences in the performance of individual technologies. No eligible studies directly compared the performance of different AI software. Given the paucity of evidence, it is currently not possible to reliably establish the cost-effectiveness of AI-assisted reading compared with unaided reading, or the relative effectiveness and cost-effectiveness of strategies adopting different AI software to assist nodule detection and analysis. However, our preliminary results suggest that AI-assisted reading is dominant for the screening population, but reading without AI assistance dominates for symptomatic and incidental populations.
Published studies have largely been conducted retrospectively in a research rather than a clinical environment. All studies in this assessment were rated as being at high risk of bias and had multiple applicability concerns for UK settings. No studies evaluating downstream clinical outcomes were identified. Further studies are required.
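For readers less familiar with the accuracy measures quoted above, the following is a minimal sketch of how per-person sensitivity and specificity are derived from counts against a reference standard. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the report or any included study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from per-person counts.

    tp: people with nodules correctly flagged
    fn: people with nodules missed
    tn: people without nodules correctly cleared
    fp: people without nodules incorrectly flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true cases detected
    specificity = tn / (tn + fp)  # share of non-cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only; not data from the report.
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A gain in sensitivity accompanied by a fall in specificity, as described in the results, corresponds to fewer missed cases (lower fn) at the price of more false alarms (higher fp).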
Authors' methods:
Systematic review and de novo cost-effectiveness analysis. Searches were undertaken from 2012 to January 2022. Company submissions were accepted until 31 August 2022. Study quality was assessed using the revised tool for the quality assessment of diagnostic accuracy studies (QUADAS-2), the extension to QUADAS-2 for assessing risk of bias in comparative accuracy studies (QUADAS-C) and the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist. Outcomes were synthesised narratively. Two decision trees were used for cost-effectiveness: (1) a simple decision tree for the detection of actionable nodules and (2) a decision tree reflecting the full clinical pathways for people undergoing chest computed tomography scans. Models estimated incremental cost-effectiveness ratios, cost per correct detection of an actionable nodule, and cost per cancer detected and treated. We undertook scenario and sensitivity analyses.
Due to the heterogeneity, sparseness, low quality and low applicability of the clinical effectiveness evidence and the major challenges in linking test accuracy evidence to clinical and economic outcomes, the findings presented here are highly uncertain and provide indicators/frameworks for future assessment.
Data sources
Databases including MEDLINE, EMBASE, Cochrane Database of Systematic Reviews, Cochrane CENTRAL, Health Technology Assessment (HTA) database (Centre for Reviews and Dissemination), International HTA database (INAHTA), Science Citation Index Expanded (Web of Science) and Conference Proceedings – Science (Web of Science) were searched from 1 January 2012 to January 2022. Preprints, trials registries, reference lists of included studies, relevant systematic reviews and forward citations were also searched. Additional economics sources included NHS Economic Evaluation Database (NHS EED), Cost-Effectiveness Analysis registry (Tufts Medical Center), EconPapers and ScHARRHUD.
The intervention was analysis of chest CT images assisted by one of the 13 AI software specified by the National Institute for Health and Care Excellence (NICE).
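The terms "dominated", "dominant" and "incremental cost-effectiveness ratio" used in this record follow standard health-economic usage. As a rough sketch (not the report's model), the comparison of a new strategy against a comparator can be expressed as below. The function name, the return format and all numbers are assumptions made for illustration only.

```python
def compare_strategies(cost_new, qaly_new, cost_old, qaly_old):
    """Classify a new strategy against a comparator.

    Returns ("dominant", None) if the new strategy is no more costly and
    no less effective, with at least one strict improvement;
    ("dominated", None) in the mirror-image case;
    ("equivalent", None) if costs and effects are identical;
    otherwise ("icer", value), where value is the incremental cost per
    additional QALY (cost difference divided by QALY difference).
    """
    d_cost = cost_new - cost_old
    d_qaly = qaly_new - qaly_old
    if d_cost == 0 and d_qaly == 0:
        return ("equivalent", None)
    if d_cost <= 0 and d_qaly >= 0:
        return ("dominant", None)
    if d_cost >= 0 and d_qaly <= 0:
        return ("dominated", None)
    # Remaining cases: costlier and more effective, or cheaper and less
    # effective. d_qaly is non-zero here, so the division is safe.
    return ("icer", d_cost / d_qaly)


# Hypothetical inputs for illustration only; not values from the report.
print(compare_strategies(1200.0, 10.5, 1000.0, 10.0))  # costlier, more effective
print(compare_strategies(900.0, 10.2, 1000.0, 10.0))   # cheaper, more effective
print(compare_strategies(1100.0, 9.9, 1000.0, 10.0))   # costlier, less effective
```

An ICER is only meaningful when neither strategy dominates; it is then compared with a willingness-to-pay threshold to judge cost-effectiveness.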
Details
Project Status:
Completed
URL for project:
https://www.journalslibrary.nihr.ac.uk/programmes/hta/NIHR135325
Year Published:
2025
URL for published report:
https://www.journalslibrary.nihr.ac.uk/hta/JYTW8921
URL for additional information:
English
English language abstract:
An English language summary is available
Publication Type:
Full HTA
Country:
England, United Kingdom
DOI:
10.3310/JYTW8921
MeSH Terms
- Lung Neoplasms
- Multiple Pulmonary Nodules
- Tomography, X-Ray Computed
- Artificial Intelligence
- Cost-Effectiveness Analysis
Contact
Organisation Name:
NIHR Health Technology Assessment programme
Contact Address:
NIHR Journals Library, National Institute for Health and Care Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK
Contact Name:
journals.library@nihr.ac.uk
Contact Email:
journals.library@nihr.ac.uk
This is a bibliographic record of a published health technology assessment from a member of INAHTA or other HTA producer. No evaluation of the quality of this assessment has been made for the HTA database.