Deep Learning to Detect Diabetic Retinopathy: Understanding the Implications

Ehsan Rahimy, MD, and Peter Karth, MD, MBA

The terms “artificial intelligence” and “machine learning” have generated significant buzz in the media. Earlier this year, DeepMind’s AlphaGo system defeated the world champion, South Korean Lee Sedol, 4 games to 1 in the board game Go (baduk). Beyond the world of pop culture, machine learning plays a significant role in technologies we take for granted or are not aware of: spam filters for email, speech recognition on a smartphone, language translation, search engine recommendations, and internet advertising.

More recently, the use of machine learning has surged within the medical field, with commercial entities emerging to help improve pathologic detection of cancer1 and interpretation of radiology images.2 The latest breakthrough in the field occurred on Nov. 29, when JAMA published online “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs,”3 which has received nearly 57,000 hits as of Dec. 27. Google’s algorithm was able to interpret fundus photographs depicting various stages of diabetic retinopathy at least as accurately as a cohort of ophthalmologists. The potential ramifications of this study are considerable, not only from a global healthcare perspective but also locally in the United States, as this type of technology could revolutionize the way day-to-day ophthalmology is practiced. Below are five salient discussion points from the study.

1. Understanding deep learning
Deep learning is a form of machine learning in which an algorithm is able to program itself from observing a large set of labeled examples, removing the need to specify rules explicitly.4 Drawing inspiration from the structure of the human brain, artificial neural networks analyze large datasets to discover underlying patterns.
One fascinating aspect of deep learning is that “feature engineering” is not required.5 Rather than hand-coding instructions on what a microaneurysm, hemorrhage, or neovascular frond looks like, researchers input images with labels such as “severe diabetic retinopathy,” and with enough labeled data, the computer eventually learns what each label means. While it is possible that the algorithm independently recognizes the same classical features of diabetic retinopathy, it is also feasible that it has identified its own patterns of disease beyond the scope of how humans interpret and analyze it. Elucidating what exactly the machine “sees” is the subject of ongoing work.
2. Lots of data
Supervised learning with a deep learning neural network is dependent upon having a varied and large enough dataset to “train” itself. In the context of diabetic retinopathy, this study required access to tens of thousands of color fundus photographs from a diverse patient demographic (age, gender, and ethnicity) generated through various acquisition protocols (multiple clinical sites, different camera types, mydriatic/nonmydriatic image capture).
For the current study, 128,175 macula-centered fundus photographs were obtained from EyePACS in the United States and three eye hospitals in India (Aravind Eye Hospital, Sankara Nethralaya, and Narayana Nethralaya) among individuals presenting for diabetic retinopathy screening. Each image was graded three to seven times by a panel of 54 ophthalmologists, and nearly 10% of the images were randomly selected to be regraded by the same physicians to assess intragrader reliability. Images were analyzed for the degree of diabetic retinopathy based on the International Clinical Diabetic Retinopathy scale: none, mild, moderate, severe, or proliferative.6 Furthermore, referable diabetic macular edema (DME) was defined as hard exudates within 1 disc diameter of the fovea, which is a proxy for macular edema when stereoscopic views are not available.7 Once the human grading was completed, this development set was presented to the algorithm for training.
3. As good as an ophthalmologist
For the second portion of the study, the investigators used two sets of new images (EyePACS-1 set = 9,963 images, and Messidor-2 set = 1,748 images) to “test” the algorithm against a reference standard of board-certified ophthalmologists (eight in the first set, and seven in the second set). In these validation sets, when the algorithm was tuned for high sensitivity, as it would be for a screening protocol, it achieved 97.5% and 96.1% sensitivity and 93.4% and 93.9% specificity in each set, respectively. For comparison, guidelines for diabetic retinopathy screening initiatives recommend at least 80% sensitivity and specificity.
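To make the trade-off between sensitivity and specificity concrete, the following is a minimal sketch of how the two metrics shift as the decision threshold on an algorithm's output moves. The scores, labels, and thresholds are invented toy values for illustration, not data from the study, and the function name is hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called referable.

    labels: 1 = referable disease per the reference standard, 0 = not referable.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)  # diseased, flagged
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)   # diseased, missed
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)   # healthy, cleared
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: eight images, three with referable disease.
scores = [0.95, 0.80, 0.40, 0.30, 0.20, 0.10, 0.05, 0.60]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# A strict threshold misses one diseased eye but raises no false alarms.
high_specificity = sensitivity_specificity(scores, labels, 0.70)

# A lenient threshold, as a screening program might prefer, catches every
# diseased eye at the cost of one false alarm.
high_sensitivity = sensitivity_specificity(scores, labels, 0.35)

print(high_specificity, high_sensitivity)
```

Lowering the threshold trades specificity for sensitivity, which is why a screening protocol would typically favor the more lenient setting.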
These results further demonstrated that the algorithm graded at least on par with the participating ophthalmologists. For example, for the EyePACS-1 validation set, the algorithm had an F-score (metric of combined sensitivity and specificity) of 0.95 (maximum possible = 1.00), which was slightly higher than the median F-score (0.91) of the eight board-certified ophthalmologists serving as the reference standard. Altogether, these findings suggest that automated diabetic retinopathy screening, although still in its nascent stages, may someday help ophthalmologists evaluate more patients more efficiently.
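For readers unfamiliar with the F-score, here is a minimal sketch assuming the conventional F1 definition: the harmonic mean of sensitivity (recall) and positive predictive value (precision). That definition is an assumption here; the study's exact computation is not described in this article and may differ. The counts below are invented for illustration.

```python
def f_score(tp, fp, fn):
    """Conventional F1: harmonic mean of sensitivity and positive predictive value.

    tp, fp, fn are counts of true positives, false positives, and false negatives.
    True negatives do not enter this formula.
    """
    sensitivity = tp / (tp + fn)  # share of diseased eyes that were flagged
    ppv = tp / (tp + fp)          # share of flagged eyes that were truly diseased
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Hypothetical counts: 90 correct referrals, 10 false alarms, 10 misses.
example = f_score(tp=90, fp=10, fn=10)
print(round(example, 2))
```

A score of 1.00 requires no false alarms and no misses, so a grader or algorithm is penalized for errors in either direction.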
4. Augmenting — not replacing — the physician
The initial reflex reaction to machine learning advances in medicine is often one of concern that artificial intelligence systems will eventually replace physicians. In actuality, this revolutionary technology has been developed to work in tandem and synergistically with clinicians. Given that diabetes is one of the fastest growing and leading causes of blindness worldwide, the investigators at Google identified this as an area of significant unmet need.
Potential benefits of a deep learning-based diabetic retinopathy-screening program would include:
  1. increasing efficiency and coverage of screening (i.e. an algorithm is programmed to withstand repetitive image processing, can work in parallel, and does not fatigue).
  2. reducing barriers to access in areas where an eye-care provider may not be present.
  3. providing earlier detection of referable diabetic eye disease.
  4. decreasing overall healthcare costs through earlier treatment of disease rather than resorting to more costly interventions in the more advanced phases of pathology.
The introduction of such programs is likely to increase, not decrease, the volume of diabetic eye referrals into the ophthalmologist’s office, because they would capture a greater portion of patients with diabetic retinopathy who currently do not receive recommended screening.
5. The road ahead
While publication of this study has provided reasons for optimism, much work still remains. Several limitations from the study design warrant mention and serve as areas for future improvement. Notably, the authors used the majority decision among the panel of seven to eight ophthalmologists as the gold standard to validate the algorithm. Future iterations may focus on refining the gold standard and the criteria by which it is defined (e.g. ascribing weighted scores to more accurate graders). Next, this algorithm was tested only for the detection of diabetic-related eye disease, and not other conditions affecting the posterior segment, such as age-related macular degeneration and glaucoma. While both diseases are logical next targets for training the deep learning algorithm, they each present logistical challenges if labeling is limited to nonstereoscopic fundus images. Along these lines, the 2D interpretation of DME in this study may not be clinically optimal. Moving forward, applying deep learning to ancillary imaging modalities, such as optical coherence tomography (OCT), may significantly improve the true detection of DME, and is currently being investigated by DeepMind and Moorfields Eye Hospital in the United Kingdom.
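The idea of a majority-decision gold standard, and of weighting more accurate graders, can be sketched as follows. This is an illustration only, not the study's procedure: the integer coding of grades (0 = none through 4 = proliferative), the weights, and the rule that ties go to the more severe grade are all assumptions, and the function name is hypothetical.

```python
from collections import defaultdict


def consensus_grade(grades, weights=None):
    """Return the grade with the largest total weight across graders.

    grades: one integer per grader (assumed coding: 0 = none ... 4 = proliferative).
    weights: optional per-grader weights; defaults to one vote each.
    Ties are broken in favor of the more severe grade (an assumption).
    """
    if weights is None:
        weights = [1.0] * len(grades)
    totals = defaultdict(float)
    for grade, weight in zip(grades, weights):
        totals[grade] += weight
    # Compare by total weight first, then by severity to break ties.
    return max(totals, key=lambda g: (totals[g], g))


# Seven hypothetical graders: three say mild (1), four say moderate (2).
panel = [1, 1, 1, 2, 2, 2, 2]
unweighted = consensus_grade(panel)

# If the first three graders were judged more accurate and given double
# weight, the consensus shifts to mild.
weighted = consensus_grade(panel, weights=[2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0])

print(unweighted, weighted)
```

The same panel can yield a different reference grade once weights are applied, which is why the choice of weighting scheme would itself need validation.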
Looking further into the future, this technology offers promise in helping to solve a number of our overburdened healthcare system’s growing problems. As of now, Google’s algorithm has been successfully trained to diagnose and grade one disease well. However, if we can obtain enough images in sequence from the same sets of patients over an extended period of time (i.e. years), could we then start to infer patterns of disease progression, and potentially make predictions from them? If those images could then be tied in with systemic data points (e.g. blood pressure, hemoglobin A1c, renal function, and so on) from the corresponding patients, could we infer the risk of systemic morbidity/mortality from a single fundus photograph? In this emerging world of precision medicine, we may one day be able to tailor treatments and intervention to those at highest risk of disease progression at an earlier stage. For example, diabetic retinopathy could potentially be reclassified along a scale where a numeric grade denotes a patient’s risk of developing DME or progressing to proliferative disease.
Given the ongoing advances in deep learning as a field and its applications in medicine, this study is likely the first of many to follow. Because of it, though, the fundamental question has changed. Before, it was a matter of: should we, could we, can we? Now that we know it is possible, we need to start asking ourselves, how do we make this work for our specialty and our patients so we can provide them the best care possible?
Author information
Ehsan Rahimy is a vitreoretinal specialist at the Palo Alto Medical Foundation. He can be reached via SFretina on Twitter.
Peter A. Karth is a vitreoretinal specialist at Oregon Eye Consultants and serves as adjunct faculty at Stanford University. He can be reached via PeterKarthMD on Twitter. Both Dr. Rahimy and Dr. Karth serve as consultants on the DeepMind project.
References
1. PathAI. Accessed December 19, 2016.
2. Enlitic. Accessed December 19, 2016.
3. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410.
4. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.
5. Mookiah MRK, Acharya UR, Chua CK, Lim CM, Ng EYK, Laude A. Computer-aided diagnosis of diabetic retinopathy: a review. Comput Biol Med. 2013;43(12):2136-2155.
6. American Academy of Ophthalmology. International Clinical Diabetic Retinopathy Disease Severity Scale Detailed Table. Accessed December 19, 2016.
7. Bresnick GH, Mukamel DB, Dickinson JC, Cole DR. A screening approach to the surveillance of patients with diabetes for the presence of vision-threatening retinopathy. Ophthalmology. 2000;107(1):19-24.