Artificial Intelligence technologies for assessing skin lesions for referral on the urgent suspected cancer pathway to detect benign lesions and reduce secondary care specialist appointments: early value assessment
Walton M, Llewellyn A, Uphoff E, Lord J, Harden M, Hodgson R, Simmonds M
Record ID 32018014989
English
Authors' objectives:
Skin cancers are some of the most common types of cancer. Dermatology services receive about 1.2 million referrals a year, but only a small minority are confirmed skin cancer. Artificial intelligence may be helpful in the diagnosis of skin cancer by identifying lesions that are or are not cancerous. The objective was to investigate the clinical and cost-effectiveness of two artificial intelligence technologies, DERM (Deep Ensemble for Recognition of Malignancy, Skin Analytics) and Moleanalyzer Pro (FotoFinder), as decision aids following a primary care referral.
Over 16,000 cases of melanoma and more than 210,000 cases of non-melanoma skin cancer are diagnosed every year in the UK. In current practice, patients with suspicious skin lesions are referred to secondary care through the urgent suspected skin cancer referral pathway, where they attend a secondary care dermatology department for a face-to-face appointment with a consultant dermatologist. As benign skin lesions and skin cancer are so common, this places a very high burden on dermatology clinics, which may reduce their capacity to handle other skin conditions.
Artificial intelligence (AI) may be helpful in the diagnosis of skin cancer. An AI system could potentially identify which referred lesions are not cancerous using a high-quality photograph. An AI system could be used alone, or in combination with a dermatologist looking at the photograph. People judged not to have cancer could then be quickly discharged prior to secondary care consultation, while people whose lesion may be cancerous could be seen by a specialist in person. AI systems could therefore potentially speed up the diagnostic process and reduce the burden on the health service. AI systems are already used in the NHS in a research context, but there is a need to evaluate their clinical impact and value.
This project investigated whether two such AI technologies, Deep Ensemble for Recognition of Malignancy (DERM; Skin Analytics) and Moleanalyzer Pro (FotoFinder), can produce clinically meaningful benefits for skin cancer diagnosis, and whether they have the potential to be cost-effective. The aim of the project was to investigate the clinical and cost-effectiveness of the two AI technologies, DERM and Moleanalyzer Pro, as decision aids to triage and diagnose suspicious skin lesions following a referral on the urgent suspected skin cancer pathway. To achieve this, the following objectives were proposed:
- To perform a rapid systematic review, narrative synthesis, and, where feasible, a meta-analysis, of the diagnostic accuracy, clinical impact and practical implementation of the included AI technologies.
- To perform a rapid systematic review of published cost-effectiveness studies of diagnostic strategies used to aid the diagnosis of skin cancer.
- To develop a conceptual model that will identify likely drivers of health benefit, harms and costs associated with implementing the included AI technologies in the NHS and identify areas for further research.
Authors' results and conclusions:
Four studies of DERM and two of Moleanalyzer Pro were subject to full synthesis. DERM had a sensitivity of 96.1% to detect any malignant lesion (95% confidence interval 95.4 to 96.8), at a specificity of 65.4% (95% confidence interval 64.7 to 66.1). For detecting benign lesions, the sensitivity was 71.5% (95% confidence interval 70.7 to 72.3) for a specificity of 86.2% (95% confidence interval 85.4 to 87.0). Moleanalyzer Pro had lower sensitivity, but higher specificity, for detecting melanoma than face-to-face dermatologists. DERM might lead to around half of all patients being discharged without assessment by a dermatologist, but a small number of malignant lesions would be missed. Patient and clinical opinions showed substantial resistance to using artificial intelligence without any assessment of lesions by a dermatologist. No published assessments of the cost-effectiveness of the technologies were identified; three assessments related to skin cancer more broadly in a National Health Service setting were identified. These studies employed similar model structures, but the mechanism by which diagnostic accuracy influenced costs and health outcomes differed. An unpublished cost–utility model was provided by Skin Analytics. Several issues with the modelling approach were identified, particularly the mechanisms by which value is driven and how diagnostic accuracy evidence was used. The conceptual model presents an alternative approach, which aligns more closely with the National Institute for Health and Care Excellence reference case and which more appropriately characterises the long-term consequences of basal cell carcinoma. DERM shows promising diagnostic accuracy for triage and diagnosis of suspicious cancer lesions in selected patients referred from primary care. Its impact on the diagnostic pathway and patient care is, however, uncertain. Moleanalyzer Pro shows promising accuracy for diagnosing melanoma, but its evidence base is limited.
Diagnostic accuracy and clinical impact of DERM
Six studies of DERM were identified, of which four were considered for full synthesis. Those four studies were all conducted in the UK. All studies excluded a substantial proportion of participants from assessment, which may produce biased results. Meta-analysis of diagnostic accuracy data supplied by the company suggested that DERM has a high sensitivity of 96.1% to detect any malignant lesion [95% confidence interval (CI) 95.4 to 96.8], at a specificity of 65.4% (95% CI 64.7 to 66.1). The diagnostic accuracy for detecting melanoma or squamous cell carcinoma specifically was similar. For the detection of benign lesions, the sensitivity was 71.5% (95% CI 70.7 to 72.3) for a specificity of 86.2% (95% CI 85.4 to 87.0). This appears to be comparable in diagnostic accuracy to that achieved by dermatologists without the use of DERM. The diagnostic accuracy of combining DERM with assessment by a dermatologist could not be assessed. Data on the clinical impact of using DERM were limited, and mostly unpublished. Some trial data suggested that autonomous use of DERM would lead to approximately half of patients being referred to a dermatologist for further assessment, and half being discharged. However, around 1% of people would be discharged with malignant lesions [mostly basal cell carcinomas (BCCs)]. DERM could potentially be used as part of a teledermatology service. However, use of DERM may slow progress to diagnosis. Patient and clinical opinions of DERM were generally favourable towards accepting its use as part of the diagnostic pathway. However, there was very substantial resistance, particularly among clinicians, to using DERM without any assessment of lesions by a dermatologist.
Impact on practice
The diagnostic accuracy of DERM suggests that it has potential for use within a post-primary care referral setting. This could be either alongside assessment by dermatologists or as an autonomous tool within the post-referral pathway within a subset of patients. However, the practical impact and clinical benefit of using DERM in a post-referral setting is currently unclear. In particular, the impact on referrals and secondary care appointments, the burden on clinicians and the subsequent clinical impact on patients are largely unclear. Although Moleanalyzer Pro shows promising accuracy for diagnosing melanoma, its evidence base is currently too limited to fully assess its clinical value. Evidence on the diagnostic accuracy and clinical value of AI in people with darker skin tones or with lesions that are more difficult to assess (such as when lesions are large, or obscured by scarring, tattooing or hair) was largely absent. Only a small number of people with darker skin tones were recruited to the included studies, and people with hard-to-assess lesions were often excluded. This raises concerns as to whether AI could be used in these people. Current economic evidence supporting the cost-effectiveness of DERM is limited, and it is unclear whether the plausible advantages of DERM represent value for money relative to other strategies. Company-sponsored analyses suggested that DERM used autonomously and with a second read could be highly cost-effective compared to current 2-week wait diagnostic models. However, much of this value is generated through potentially optimistic assumptions around the diagnostic accuracy of comparators, and of the surrounding pathway (confidential information has been removed). Notably, the magnitude of uncaptured non-cash-releasing benefits remains unquantified. There is currently no economic evidence supporting the use of Moleanalyzer Pro, but assuming a similar use case to DERM and appropriate data collection, the value of Moleanalyzer Pro could be assessed using the conceptual framework presented by the EAG.
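As a rough illustration of how the reported accuracy figures relate to triage outcomes, the following Python sketch splits a cohort of referrals into referred and discharged groups. The sensitivity (96.1%) and specificity (65.4%) for any malignant lesion are taken from the results above. The cohort size, the 10% prevalence, and the function and field names are hypothetical placeholders rather than values or methods from the report, so the output should not be read as an estimate of the real-world impact of DERM.

```python
from dataclasses import dataclass


@dataclass
class TriageOutcome:
    """Expected counts after an autonomous AI triage step (illustrative only)."""
    referred_malignant: float    # true positives: malignant, sent to dermatologist
    discharged_malignant: float  # false negatives: malignant, discharged (missed)
    referred_benign: float       # false positives: benign, sent to dermatologist
    discharged_benign: float     # true negatives: benign, discharged


def triage_outcomes(n_referrals: float, prevalence: float,
                    sensitivity: float, specificity: float) -> TriageOutcome:
    """Split a cohort using standard 2x2 diagnostic accuracy arithmetic."""
    malignant = n_referrals * prevalence
    benign = n_referrals - malignant
    true_pos = malignant * sensitivity
    true_neg = benign * specificity
    return TriageOutcome(
        referred_malignant=true_pos,
        discharged_malignant=malignant - true_pos,
        referred_benign=benign - true_neg,
        discharged_benign=true_neg,
    )


if __name__ == "__main__":
    # Sensitivity and specificity are the reported DERM figures for any
    # malignant lesion; n_referrals and prevalence are ASSUMED for illustration.
    result = triage_outcomes(n_referrals=1000, prevalence=0.10,
                             sensitivity=0.961, specificity=0.654)
    discharged = result.discharged_malignant + result.discharged_benign
    print(f"Discharged without dermatologist review: {discharged:.1f}")
    print(f"Malignant lesions among those discharged: "
          f"{result.discharged_malignant:.1f}")
```

Varying the assumed `prevalence` shows that the number of missed malignant lesions depends on the case-mix of the referred population as well as on accuracy, which is why a calculation of this kind cannot substitute for the trial-based estimates summarised above.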
Authors' methods:
A rapid systematic review of evidence on the two technologies was conducted. A narrative synthesis was performed, with a meta-analysis of diagnostic accuracy data. Published and unpublished cost-effectiveness evidence on the named technologies, as well as on other diagnostic technologies, was reviewed. A conceptual model was developed that could form the basis of a full economic evaluation. The rapid review approach meant that some relevant material may have been missed, and capacity for synthesis was limited. The proposed conceptual model does not capture non-cash benefits associated with demand on dermatologist time. An assessment of the likely budget impact and resource use could not be provided.
Data sources
MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials and the Association for Computing Machinery Digital Library were searched in November 2023. Clinical trial registries were searched. Unpublished material supplied by the included companies was also assessed.
Inclusion criteria
Any clinical study evaluating DERM or Moleanalyzer Pro in people with skin lesions suspicious of cancer, presenting in primary care, rapid diagnostic clinic, teledermatology or secondary care settings, was eligible for inclusion. Included studies had to report diagnostic accuracy, clinical outcomes, or evidence on implementation. The comparator was clinical judgement by dermatologists, but this did not need to be reported for a study to be eligible. The preferred reference standard for diagnosis was histology, but for unbiopsied lesions, clinical confirmation of non-malignancy was accepted. The cost-effectiveness review included any economic evaluation, including budget impact models, return on investment analyses and other cost-only analyses, of either DERM or Moleanalyzer Pro in the above population and setting. It was anticipated that no relevant studies would be identified for the named technologies; therefore, additional searches were also conducted to identify cost-effectiveness studies looking at any technology used to aid diagnosis of skin cancer in an NHS setting.
Authors' identified further research:
The diagnostic accuracy of AI in a post-primary care referral pathway is uncertain and requires further evaluation. A lack of key comparative data on diagnostic accuracy means the relative clinical and cost-effectiveness of pathways incorporating AI technologies and teledermatology remains highly uncertain. Assessments of diagnostic accuracy of AI in people with darker skin tones or with hard-to-assess lesions are urgently needed. Directly comparable evidence on the diagnostic accuracy of AI technologies and teledermatology in a post-referral setting compared with unassisted teledermatology is required to assess the potential value of AI technologies. This would require studies that compare AI with dermatologists’ assessments, recruit a representative population and case-mix, use up-to-date versions of AI and dermoscopy, and apply a robust independent reference standard to all patients. A better understanding of the clinical benefits and resource implications associated with the implementation of AI technologies will also require further research to set up AI and teledermatology services in the NHS. Further research must also be undertaken to quantify the benefits to population health within skin cancer and other dermatological indications associated with any release of NHS consultant dermatologist resource, and to understand the effects of these technologies on waiting times for final diagnosis. This could potentially be achieved through continuations and extensions of existing ongoing pilot studies of DERM, but truly comparative evidence may also be required. Moleanalyzer Pro requires evaluation within a UK teledermatology setting. The substantial resistance from both patients and clinicians to using AI without any human dermatological assessment means that if AI is to be used to direct discharge autonomously, more evidence is needed to demonstrate that it has clear benefits to patients, without sacrificing accuracy.
Details
Project Status:
Completed
URL for project:
https://www.journalslibrary.nihr.ac.uk/programmes/hta/NIHR136014
Year Published:
2026
URL for published report:
https://www.journalslibrary.nihr.ac.uk/hta/GJMS0317
URL for additional information:
English
English language abstract:
An English language summary is available
Publication Type:
Full HTA
Country:
England, United Kingdom
DOI:
10.3310/GJMS0317
MeSH Terms
- Skin Neoplasms
- Melanoma
- Artificial Intelligence
- Image Interpretation, Computer-Assisted
Contact
Organisation Name:
NIHR Health Technology Assessment programme
Contact Address:
NIHR Journals Library, National Institute for Health and Care Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK
Contact Name:
journals.library@nihr.ac.uk
Contact Email:
journals.library@nihr.ac.uk
This is a bibliographic record of a published health technology assessment from a member of INAHTA or other HTA producer. No evaluation of the quality of this assessment has been made for the HTA database.