Variation within and between digital pathology and light microscopy for the diagnosis of histopathology slides: blinded crossover comparison study
Snead DRJ, Azam AS, Thirlwall J, Kimani P, Hiller L, Bickers A, Boyd C, Boyle D, Clark D, Ellis I, Gopalakrishnan K, Ilyas M, Kelly P, Loughrey M, Neil D, Rakha E, Roberts IS, Sah S, Soares M, Tsang YW, Salto-Tellez M, Higgins H, Howe D, Takyi A, Chen Y, Ignatowicz A, Madan J, Nwankwo H, Partridge G, Dunn J
Record ID 32018014369
English
Authors' objectives:
Digital pathology refers to the conversion of histopathology slides to digital image files for examination on computer workstations as opposed to conventional microscopes. Prior to adoption, it is important to demonstrate that pathologists provide equivalent reports when using digital pathology in comparison to bright-field and immunofluorescent light microscopy, the current standard of care. The study was a multicentre comparison of digital pathology with light microscopy for reporting of histopathology slides, measuring variation within and between pathologists on both modalities. There is considerable interest in the development of digital pathology (DP) as a means of reporting histopathology samples. The flexibility that electronic distribution of the reporting workload permits is seen as an important development to improve the quality and efficiency of histopathology, which is currently a major cause of delay in many cancer pathways as well as many other chronic disease pathways. Previous studies have not reported on cancer screening samples and include few large (1000 plus cases) multisite studies. In addition, some studies have shown there may be important differences in the way pathologists report cases on DP compared with light microscopy (LM), particularly with reference to identifying bacteria, grading dysplasia, and recognising calcium oxalate crystals or small nodal metastases. Concerns over the quality of evidence supporting DP in cancer screening samples led to an embargo on the use of the technology for reporting these samples pending further data, which remains in place. Additional interest lies in understanding how transformational change of this character will be seen by pathologists and laboratory technicians and how it may impact on existing laboratory workflow.
The change to DP requires capital investment in slide scanning equipment, workstations, computer servers and networking infrastructure, all of which will place considerable strain on already overstretched information technology resources. Therefore, there is considerable interest in how these investment costs may be offset by improved efficiency in the service, particularly in whether DP provides any advantage to the speed of reporting slides over conventional LM. Finally, eye-tracking studies in radiology, a diagnostic imaging modality that shares some parallels with DP, have led to an understanding of how poor examination technique can contribute to errors in screening images, and we were interested to learn whether similar approaches may be relevant to DP. The primary objective was to estimate intra-pathologist agreement between reports issued on DP in comparison to LM. The secondary objectives were to estimate inter-pathologist agreement for LM reports, estimate inter-pathologist agreement for DP reports and compare diagnosis confidence for LM and DP reports. A qualitative study to understand the views of pathologists and technicians on the impact of DP on laboratory practice was conducted before and during the study. A health economics study analysed measurements made on how long the reporting of cases took using DP in comparison to LM, and an eye-tracking study examined different pathologists’ examination techniques using DP.
Authors' results and conclusions:
Two thousand and twenty-four cases (608 breast, 607 gastrointestinal, 609 skin, 200 renal) were recruited, with breast and gastrointestinal including screening samples [207 (34%) breast, 250 (41%) gastrointestinal]. Overall, in light microscopy versus digital pathology comparisons, clinical management concordance levels were 99.95% (95% confidence interval 99.91 to 99.97). Similar results were observed within specialties [breast: 99.40% (95% confidence interval 99.06 to 99.62); gastrointestinal: 99.96% (95% confidence interval 99.89 to 99.99); skin: 99.99% (95% confidence interval 99.92 to 100.0); renal: 99.99% (95% confidence interval 99.57 to 100.0)], and within screening cases [overall 98.96% (95% confidence interval 98.42 to 99.32), breast 96.27% (95% confidence interval 94.63 to 97.43), gastrointestinal 99.93% (95% confidence interval 99.68 to 99.98)]. Reporting time between digital pathology and light microscopy was similar, but pathologists became faster on digital pathology with familiarity. Pathologists recorded high levels of confidence in their diagnosis with both modalities; confidence was higher with light microscopy than with digital pathology, but the difference was not statistically significant. Clinical management concordance levels between the two modalities exceed the reference 98.3% in breast, gastrointestinal, skin and renal specialties, and in pooled breast and large bowel cancer screening cases. Subgroup analysis of clinically significant differences revealed a range of differences, including areas where interobserver variability is known to be high, which were distributed between reads performed with both platforms and without apparent trends to either. A total of 2024 cases were included in the study. These comprised 608 breast, 607 gastrointestinal (GI), 609 skin and 200 renal cases. Cancer screening samples from the breast cancer screening service numbered 207 (34%) and there were 250 (41%) samples from the large bowel cancer screening programme. Overall, the primary end-point LM versus DP comparisons showed clinical management concordance (CMC) levels were 99.95% [95% confidence interval (CI) 99.91 to 99.97].
Similar results were observed within specialty groups, namely, breast 99.40% (95% CI 99.06 to 99.62); GI 99.96% (95% CI 99.89 to 99.99); skin 99.99% (95% CI 99.92 to 100.0); and renal 99.99% (95% CI 99.57 to 100.0). Within cancer screening cases, overall CMC was 98.96% (95% CI 98.42 to 99.32), breast 96.27% (95% CI 94.63 to 97.43) and large bowel 99.93% (95% CI 99.68 to 99.98). Pathologists recorded high levels of confidence in all specialty groups, with higher confidence seen in LM compared with DP, although this was not statistically significant. The qualitative study showed that a range of views were expressed on the impact of DP. To achieve wide acceptance, DP needs to integrate seamlessly into the laboratory workflow. The advantages DP offers will not be realised if, on implementation, pathologists and/or technicians have to constantly move between systems to complete tasks or if networking speed impacts the system's performance. The need for accurate data on the benefits of DP is likely to be important in helping laboratories make the decision to transition to DP. Successful implementation requires careful planning to avoid the many potential pitfalls. The health economics study showed no clear advantage with either modality, but clear evidence that pathologists’ speed in reporting with DP improved over the course of the study. While there are likely to be considerable benefits in transitioning to DP, the differences in time taken to report cases between the two modalities appear very small and probably insignificant. The eye-tracking study showed that collection of data relating to slide examination is feasible and there was a clear correlation between experience and diagnostic accuracy. There were differences in examination technique between experienced and less experienced pathologists, with the former showing more efficient slide examination, and more use of low and intermediate power, with targeted use of high-power objectives.
Experienced pathologists were quicker to recognise features and move on than less experienced pathologists. This is the first study to comprehensively examine intra-pathologist and inter-pathologist variability using LM and DP compared to a consensus ground truth (GT) on the same set of slides, and the first study to examine cancer screening samples. The results show pathologists give equivalent results with either modality in all the areas studied. No trends to favour either modality were identified, even concerning the identification of small objects such as the detection of Helicobacter pylori or the grading of dysplasia. The study provides definitive evidence that pathologists provide equivalent results when using DP as they would using LM. However, pathologists did show a trend to increased confidence using LM compared with DP, which did not reach statistical significance but which may reflect the improved resolution of this modality and/or increased familiarity with LM. This is also the first study to assess DP as a means of reporting native and transplant renal biopsies, including assessing fluorescence-stained slides. This is a potentially transformational technology for this specialty. Renal biopsies represent a small-volume, highly complex area of diagnostic pathology. Providing adequately trained pathologists to serve the needs of these patients across the country is a major challenge, even more so to provide cover out of hours, which is needed to support the care of renal transplant recipients in need of urgent assessment. The results, particularly from the renal biopsy cases which demand fine resolution for interpretation, suggest DP is very likely to be suitable for other specialty areas with similar demands, such as haematopathology and neuropathology. Furthermore, immunofluorescence-stained sections are non-permanent.
DP provides a potential solution for all these challenges, enabling difficult cases to be shared with experts many miles distant from the host laboratory, thereby providing the basis for a more resilient service 24/7. Finally, DP images provide, for the first time, a permanent record of the fluorescence-stained sections performed as a routine on native biopsies.
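The comparison of the reported concordance figures with the 98.3% reference can be checked with a short script. This is a minimal sketch for the reader, not part of the study's analysis: the percentages and confidence intervals are copied from the results above, while the dictionary layout, the group labels and the function name `compare_to_reference` are illustrative assumptions.

```python
# Reference concordance level cited in the methods (Azam et al. 2021).
REFERENCE = 98.3

# (point estimate, lower 95% CI bound, upper 95% CI bound), copied from the
# reported results. The group labels are illustrative, not the authors' own.
REPORTED = {
    "overall": (99.95, 99.91, 99.97),
    "breast": (99.40, 99.06, 99.62),
    "gastrointestinal": (99.96, 99.89, 99.99),
    "skin": (99.99, 99.92, 100.0),
    "renal": (99.99, 99.57, 100.0),
    "pooled screening": (98.96, 98.42, 99.32),
    "breast screening": (96.27, 94.63, 97.43),
    "large bowel screening": (99.93, 99.68, 99.98),
}


def compare_to_reference(estimates, reference=REFERENCE):
    """Flag, per group, whether the point estimate and the lower CI bound
    exceed the reference concordance level."""
    return {
        name: {"point_above": est > reference, "lower_above": low > reference}
        for name, (est, low, _high) in estimates.items()
    }


flags = compare_to_reference(REPORTED)
not_above = [name for name, f in flags.items() if not f["point_above"]]

for name, f in flags.items():
    print(f"{name:24s} point>ref={f['point_above']}  lower CI>ref={f['lower_above']}")
print("Groups not exceeding the reference:", not_above)
```

On these figures, every specialty and the pooled screening group exceed the reference on both the point estimate and the lower confidence bound; breast screening cases taken alone are the only group that does not.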
Authors' methods:
A blinded crossover 2000-case study estimating clinical management concordance (identical diagnoses plus differences not affecting patient management). Each sample was assessed twice by four pathologists (once using light microscopy, once using digital pathology, the order randomly assigned and a 6-week gap between viewings). Random-effects logistic regression models, including crossed random-effects terms for case and pathologist, estimated percentage clinical management concordance. Findings were interpreted with reference to 98.3% concordance (Azam AS, Miligy IM, Kimani PKU, Maqbool H, Hewitt K, Rajpoot NM, Snead DRJ. Diagnostic concordance and discordance in digital pathology: a systematic review and meta-analysis. J Clin Pathol 2021;74:448–55. https://doi.org/10.1136/jclinpath-2020-206764). Sixteen consultant pathologists, four for each specialty, from six National Health Service laboratories. Experience ranged from 3 to 35 years. Some were early adopters of digital pathology, but the majority were new to digital pathology. Eight viewings per sample (four pathologists with light microscopy and with digital pathology), culminating in a consensus ground truth, enabling measurement of agreement within and between readers. Samples enrolled reflected routine practice, included cancer screening biopsies, and were enriched for areas of difficulty [e.g. dysplasia (7, 10, 11)]. State-of-the-art digital pathology equipment designed for diagnosis, and holding either Conformité Européenne or Food and Drug Administration approval, was used. Cytology cases and specialty groups outside those tested were not examined. The study used two digital pathology scanning systems. Other systems available on the market were not tested.
The main study was a multicentre validation comparison study, with a blinded crossover design measuring intraobserver variability of pathologists’ diagnoses of histopathology samples using LM and DP and interobserver variability measuring pathologists’ diagnoses on LM and DP against consensus ground truth (GT). Pathologists recorded confidence of diagnoses made on a seven-point Likert scale, and recorded the time taken to report the cases. A questionnaire survey was undertaken examining the viewpoints of a range of pathologists and laboratory technicians at the start and during the course of the study. Eye tracking of pathologists was undertaken on a subset of study cases examining the technique used by different pathologists.
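The random-effects logistic regression with crossed terms for case and pathologist, mentioned in the methods, can be written out in a generic form. The record does not give the exact specification, so the following is a sketch under stated assumptions: the symbols (Y, p, beta_0, u, v and the two variances) are illustrative, and any covariates the authors may have included are omitted.

```latex
% Assumed notation (not from the source):
%   Y_{ij} = 1 if the LM and DP reports for case i by pathologist j are
%            concordant for clinical management, 0 otherwise.
%   u_i    = random effect for case i; v_j = random effect for pathologist j.
\begin{align}
  Y_{ij} \mid p_{ij} &\sim \operatorname{Bernoulli}(p_{ij}), \\
  \operatorname{logit}(p_{ij}) &= \beta_0 + u_i + v_j, \\
  u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  v_j \sim \mathcal{N}(0, \sigma_v^2).
\end{align}
% Concordance for a typical case and pathologist (u_i = v_j = 0):
\begin{equation}
  \text{CMC}\,(\%) = 100 \times \frac{\exp(\beta_0)}{1 + \exp(\beta_0)}.
\end{equation}
```

The random effects are crossed rather than nested because each case is read by several pathologists and each pathologist reads many cases. Whether the reported percentages are conditional estimates of this kind or averaged over the random effects is not stated in the record.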
Details
Project Status:
Completed
URL for project:
https://www.journalslibrary.nihr.ac.uk/programmes/hta/17/84/07
Year Published:
2025
URL for published report:
https://www.journalslibrary.nihr.ac.uk/hta/SPLK4325
URL for additional information:
English
English language abstract:
An English language summary is available
Publication Type:
Full HTA
Country:
England, United Kingdom
DOI:
10.3310/SPLK4325
MeSH Terms
- Pathology, Clinical
- Microscopy
- Image Interpretation, Computer-Assisted
- Digital Technology
- Pathologists
- Breast Neoplasms
- Gastrointestinal Neoplasms
- Colorectal Neoplasms
- Skin Neoplasms
- Kidney Neoplasms
Contact
Organisation Name:
NIHR Health Technology Assessment programme
Contact Address:
NIHR Journals Library, National Institute for Health and Care Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK
Contact Name:
journals.library@nihr.ac.uk
Contact Email:
journals.library@nihr.ac.uk
This is a bibliographic record of a published health technology assessment from a member of INAHTA or other HTA producer. No evaluation of the quality of this assessment has been made for the HTA database.