Computer-based image analysis promising for ROP
No difference in diagnosis
Looking at 50 babies with plus disease, the researchers found no difference, on average, between diagnosis by bedside ophthalmoscopy and diagnosis by image grading in detecting plus disease.
They validated this reference standard diagnosis by sending 100 of the images to eight international experts, each of whom had more than 10 years of clinical experience in retinopathy of prematurity and more than five publications on the condition. Five were retina specialists and three were pediatric ophthalmologists.
The eight experts disagreed with one another: the number of images each of them diagnosed as plus disease ranged from 6 to 29 out of 100.
But the reference standard diagnosis fell in the middle of the range of diagnoses from these eight experts, so the researchers used the reference standard diagnosis for each image as the gold standard against which to evaluate both the computer-based algorithm and individual clinicians’ diagnoses.
The mean weighted Cohen’s kappa for agreement between seven bedside clinicians and the reference standard diagnosis was 0.49 (range, 0.13 to 0.86), where 1.0 indicates perfect agreement. For three image graders, the mean weighted kappa for agreement with the reference standard diagnosis was 0.80 (range, 0.68 to 0.91).
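Weighted kappa corrects raw agreement for chance and gives partial credit when two graders are only one category apart (for example, pre-plus versus plus). The article does not say which weighting scheme the study used, so the Python sketch below assumes linear weights by default, with quadratic weights as an option. The function name `weighted_kappa` and the three-category scale are illustrative assumptions, not the researchers' code.

```python
from typing import Sequence

# Assumed ordinal scale for plus disease grading (the order matters for the weights).
CATEGORIES = ("normal", "pre-plus", "plus")


def weighted_kappa(rater: Sequence[str], reference: Sequence[str],
                   categories: Sequence[str] = CATEGORIES,
                   weighting: str = "linear") -> float:
    """Weighted Cohen's kappa between one grader and a reference standard.

    weighting is "linear" (an assumed default) or "quadratic".
    """
    if len(rater) != len(reference) or len(rater) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater)

    # observed[i][j]: share of images the grader put in category i and the reference in j.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater, reference):
        observed[index[a]][index[b]] += 1 / n

    # Marginal shares; their products give the agreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the most distant categories.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_disagreement = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected_disagreement = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    # Undefined (division by zero) if both label sets use only one and the same category.
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    grader = ["normal", "pre-plus", "plus", "plus"]
    reference = ["normal", "plus", "plus", "pre-plus"]
    print(round(weighted_kappa(grader, reference), 3))  # 0.429
```

A kappa of 1.0 means the grader matches the reference on every image; 0 means agreement no better than chance, and negative values mean worse than chance.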
Based on these findings, they argue that a “single expert’s diagnosis” should not be the gold standard for diagnosis of plus disease. But consulting with multiple experts on every diagnosis is not feasible for most clinicians, said Dr. Campbell.
So Dr. Campbell and his colleagues compared a computer algorithm with the reference standard diagnosis. They designed the algorithm to classify images as normal, pre-plus, or plus disease from vascular features, using 11 measurements of vessel dilation and tortuosity.
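The article does not list the 11 measurements or the classifier built on them. As a rough illustration only, the Python sketch below computes two hypothetical features for a single vessel, an arc-to-chord tortuosity index and a mean width standing in for dilation, and assigns a label with a simple nearest-centroid rule. Every function name, feature choice, and number here is an assumption; this is not the algorithm Dr. Campbell's group built.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def tortuosity_index(centerline: Sequence[Point]) -> float:
    """Arc-to-chord ratio of a vessel centerline: 1.0 when straight, larger when it winds."""
    arc = sum(math.dist(p, q) for p, q in zip(centerline, centerline[1:]))
    chord = math.dist(centerline[0], centerline[-1])
    return arc / chord


def vessel_features(centerline: Sequence[Point], widths: Sequence[float]) -> List[float]:
    # Hypothetical 2-feature vector: tortuosity plus mean width as a crude dilation measure.
    return [tortuosity_index(centerline), sum(widths) / len(widths)]


def fit_centroids(examples: Sequence[Tuple[Sequence[float], str]]) -> Dict[str, List[float]]:
    """Average the feature vectors of the labeled training examples for each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(x: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    # Nearest-centroid rule; real features would normally be standardized first.
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


if __name__ == "__main__":
    # Toy feature vectors (tortuosity, mean width), not measurements from the study.
    train = [([1.05, 4.0], "normal"), ([1.10, 4.2], "normal"),
             ([1.30, 5.5], "pre-plus"), ([1.60, 7.0], "plus")]
    centroids = fit_centroids(train)
    print(classify([1.55, 6.8], centroids))  # plus
```

A nearest-centroid rule is used here only because it is short and transparent; any classifier that maps a feature vector to one of the three categories would fit the same outline.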
On a set of 73 images, they compared the diagnoses of the same eight international experts, and those of the computer algorithm, with the reference standard diagnosis. They found that the experts agreed with the reference standard diagnosis 79% to 99% of the time, with a mean of 87%.
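The agreement percentages reported for the experts are per-image match rates against the reference standard. A minimal Python sketch of that calculation, and of the mean and range across graders, follows; the function names, grader names, and labels are made-up placeholders, not study data.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def percent_agreement(labels: Sequence[str], reference: Sequence[str]) -> float:
    """Share of images (in %) on which a grader's label matches the reference standard."""
    if len(labels) != len(reference) or len(labels) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    matches = sum(a == b for a, b in zip(labels, reference))
    return 100.0 * matches / len(reference)


def summarize(graders: Dict[str, Sequence[str]],
              reference: Sequence[str]) -> Tuple[Dict[str, float], float, float, float]:
    """Per-grader agreement plus the mean and range across graders."""
    scores = {name: percent_agreement(labels, reference) for name, labels in graders.items()}
    values = list(scores.values())
    return scores, mean(values), min(values), max(values)


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the study.
    reference = ["normal", "plus", "pre-plus", "normal"]
    graders = {
        "expert_a": ["normal", "plus", "pre-plus", "normal"],
        "expert_b": ["normal", "plus", "plus", "normal"],
        "algorithm": ["normal", "pre-plus", "pre-plus", "plus"],
    }
    scores, avg, low, high = summarize(graders, reference)
    print(scores, avg, low, high)
```

Unlike kappa, percent agreement gives no credit for near misses and does not correct for chance, so the two statistics are not directly comparable.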