Glaucoma diagnosis automation

Glaucoma, a leading cause of irreversible blindness worldwide, involves optic nerve damage. The optic nerve undergoes structural and functional changes, resulting in a gradual, often unnoticed, loss of vision. These changes are primarily attributed to elevated intraocular pressure (IOP), which stresses the optic nerve fibers and impairs their functioning, causing sudden degeneration in the case of closed-angle glaucoma or a more gradual loss of vision in the case of open-angle glaucoma.

The clinical diagnosis of glaucoma relies on qualitative and quantitative assessments, such as visual acuity, visual field testing, and optic nerve head evaluation using imaging techniques like optical coherence tomography (OCT). However, although these conventional diagnostic methods are helpful, they have inherent limitations, and early-stage glaucoma can often go undetected. To tackle these issues, the integration of Machine Learning (ML) approaches has emerged as a promising technique to aid in glaucoma detection.

Keep reading to learn more about the integration of Machine Learning approaches in glaucoma detection or download the full white paper.


Glaucoma, a group of progressive optic neuropathies, is a significant global health concern characterized by damage to the optic nerve, which leads to irreversible vision loss when left undiagnosed and untreated. It affects over 80 million people globally, a number projected to reach about 112 million by 2040. Glaucoma primarily affects older individuals, and its prevalence varies among ethnicities and geographical regions.

The disease’s silent and asymptomatic nature makes a timely diagnosis challenging, as individuals may remain unaware of their condition until it is too late and irreversible damage has been done. Traditional diagnostic techniques, such as intraocular pressure measurement and visual field testing, generate data with inherent limitations in terms of reliability.

Optical Coherence Tomography (OCT) and the Visual Field Test (VFT) are the exams most relied upon when checking for glaucoma. The result of the OCT is a set of high-resolution images of different parts of the eye. These images are often summarized using metrics that are aggregated and presented in reports specific to the part of the eye being evaluated. The VFT report, in turn, contains a map of the eye’s blind spots as well as other measurements that compare the state of the eye both with its own past results (follow-up) and with the general population. It is worth noting that the VFT is prone to errors, since it is somewhat subjective and exposed to many outside factors that can directly influence the results.


Example of the VFT and RNFL exams, highlighting the most relevant features


Advancements in ML techniques have shown promising potential for improving glaucoma detection and monitoring. By leveraging large datasets, ML algorithms can extract intricate patterns and features from clinical data, enabling more accurate and efficient identification of possible cases of glaucoma.

With this work, we hope to contribute to the development of robust and reliable ML solutions that can aid clinicians in the early detection, monitoring, and management of glaucoma. Ultimately, integrating ML techniques into clinical practice could revolutionize glaucoma detection, leading to better outcomes regarding the patients’ ocular health in the future.


Relevant works

The use of ML for glaucoma diagnosis has been gaining popularity in the literature over the last few decades, following two main approaches: one uses features already extracted from medical exams in tabular form, while the other uses high-quality medical scans of the eye and Deep Learning (DL) to derive its own features. The feature engineering and the tabular nature of the data used in the first approach usually lead to more explainable models from which relevant insights can be extracted. On the other hand, DL often leads to better classification results at the cost of interpretability.


Methods

Data sources and ingestion

This paper is a collaboration between Altice Labs and the Centro Cirúrgico de Coimbra (CCC), which provided different kinds of exams for healthy, suspect, and glaucomatous eyes.

Besides the anonymized exams’ data, a diagnosis was also provided, indicating non-glaucoma, suspect, or glaucoma cases. Information regarding additional pathologies was included to ensure the model distinguishes eyes with glaucoma vs. non-glaucoma, instead of healthy vs. unhealthy.

Scripts based on OCR technology, specifically Tesseract, were tailored to extract the relevant information from the image-formatted exams. The resulting tabular data was then processed with classic ML methods.
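
As an illustration of that extraction step, the sketch below reads a report image with Tesseract through the pytesseract wrapper. The file name, crop region, and field label are assumptions for the sake of the example, not the actual scripts used in this work.

  # Minimal sketch of OCR extraction from an image-formatted exam report
  from PIL import Image
  import pytesseract

  def extract_exam_text(image_path, crop_box=None):
      """Run Tesseract on a report image, optionally on a cropped region of it."""
      image = Image.open(image_path)
      if crop_box is not None:
          # crop_box = (left, upper, right, lower) around the block of interest
          image = image.crop(crop_box)
      return pytesseract.image_to_string(image)

  # Hypothetical usage: scan an RNFL report for a summary metric
  raw_text = extract_exam_text("rnfl_report.png")
  for line in raw_text.splitlines():
      if "Avg. RNFL Thickness" in line:  # assumed label printed on the report
          print(line)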

Features and dataset creation

The dataset development involved features directly extracted from the exams, as well as features derived from them. The final feature selection process included the following (a brief sketch of the correlation and PCA checks follows this list):

  • Feedback from CCC with insights on exam reliability and priority features to look at when making a diagnosis;
  • Correlation studies among the features and between the features and the target, along with statistical analysis;
  • The results of the feature importance assessments based on Principal Component Analysis (PCA).
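
A minimal sketch of those correlation and PCA checks is shown below, assuming the extracted features are available as a flat table; the file name and the "diagnosis" column are placeholders, not the actual dataset schema.

  import pandas as pd
  from sklearn.decomposition import PCA
  from sklearn.preprocessing import StandardScaler

  df = pd.read_csv("glaucoma_features.csv")        # assumed tabular export of the extracted features
  X = df.drop(columns=["diagnosis"])
  y = df["diagnosis"]

  # Pairwise correlation among features, to spot redundant ones
  feature_corr = X.corr()

  # Correlation of each feature with the (integer-encoded) target
  target_corr = X.corrwith(y.astype("category").cat.codes)
  print(target_corr.sort_values(key=abs, ascending=False).head(10))

  # PCA on standardized features: inspect the loadings to see which features dominate
  pca = PCA(n_components=5)
  pca.fit(StandardScaler().fit_transform(X))
  loadings = pd.DataFrame(pca.components_, columns=X.columns)
  print(loadings.abs().idxmax(axis=1))             # most influential feature per component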


Training

ML models

In our experiments, we relied mostly on classic ML models such as Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM), the less conventional choices being the Light Gradient Boosting Machine (LGBM) and the Explainable Boosting Machine (EBM) from the InterpretML library. While LR, RF, and SVM commonly occur in the literature, LGBM was chosen for its state-of-the-art performance on classification tasks with tabular datasets. The EBM was chosen for its explainability, although our explainability tool (ExplainerDashboard) was enough to satisfy our needs in a model-agnostic manner. All this considered, LGBM was selected as the final model.
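
For reference, fitting the final LGBM model on the tabular features could look like the sketch below. The hyperparameters shown are illustrative rather than the tuned values used in the study, and the imputed splits refer to the preparation described in the Setup section.

  from lightgbm import LGBMClassifier

  model = LGBMClassifier(
      objective="multiclass",    # three classes: non-glaucoma, suspect, glaucoma
      class_weight="balanced",   # one common way to mitigate class imbalance (an assumption here)
      random_state=42,
  )
  model.fit(X_train_imp, y_train)      # imputed training features (see the Setup sketches below)
  y_pred = model.predict(X_val_imp)    # predictions on the held-out validation set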

Setup

In preparing our machine learning model, we addressed missing values within the dataset. These missing values resulted from either the absence of individual values in an exam or the absence of the whole exam itself. To manage this, we opted to exclude cases where an entire exam was missing and used the KNN imputer method to handle the remaining missing values, imputing them based on the weighted average of the nearest neighbors. The imputer was fitted on the training data only to avoid information leakage.
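
A minimal sketch of that imputation step, assuming the train/validation split described further below and an illustrative neighbor count:

  from sklearn.impute import KNNImputer

  imputer = KNNImputer(n_neighbors=5, weights="distance")  # weighted average of the nearest neighbors
  X_train_imp = imputer.fit_transform(X_train)             # fitted on the training data only
  X_val_imp = imputer.transform(X_val)                     # reused as-is on the validation data (no refit)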

The resulting class distribution (in number of eyes) was the following:

  • Non-Glaucoma: 623 (64%)
  • Suspect: 153 (15.7%)
  • Glaucoma: 198 (20.3%)


Class distribution of the full dataset


We divided this dataset into train/test and validation subsets (70%/30%), ensuring a similar class distribution in both sets.
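
Such a split can be obtained with stratification, as in the sketch below (reusing the X and y placeholders from the feature sketch above):

  from sklearn.model_selection import train_test_split

  X_train, X_val, y_train, y_val = train_test_split(
      X, y, test_size=0.30, stratify=y, random_state=42  # preserves the class proportions in both sets
  )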

Results

We incorporated diverse classification metrics, including Accuracy, Precision, Recall, F1-score, and ROC-AUC, along with a thorough analysis of confusion matrices. Given the imbalance in class distribution, special importance was given to maximizing the F1-score, especially that of the positive class (Glaucoma).
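
Using the predictions from the model sketch above, those metrics could be computed as follows ("glaucoma" is an assumed label string for the positive class):

  from sklearn.metrics import classification_report, confusion_matrix, f1_score

  print(classification_report(y_val, y_pred))   # per-class precision, recall, and F1-score
  print(confusion_matrix(y_val, y_pred))        # rows: true classes, columns: predicted classes

  # F1-score of the glaucoma class, the metric given special importance here
  print("Glaucoma F1:", f1_score(y_val, y_pred, labels=["glaucoma"], average=None)[0])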

The F1-scores and confusion matrices from the final model suggest its ability to distinguish between eyes with glaucoma and those without while struggling with classifying “suspect” cases. This might be explained by the distributions of the feature values in the non-glaucoma and suspect classes, which tend to be more similar to each other than to the glaucoma class feature value distribution. Despite bringing up some classification difficulties, we still consider the inclusion of the suspect class to be the correct approach, since it provides valuable information in a real-life scenario.

Critical errors primarily involved misclassifying eyes with suspicion or actual glaucoma as non-glaucoma, potentially affecting patient care. However, other errors, while undesirable, still flagged non-glaucoma patients for monitoring.

Explainability

We relied on ExplainerDashboard, a tool that generates visualizations allowing healthcare professionals to assess the relevance of each feature to the final result for a particular case. It also lets the clinician gauge how the diagnosis would change if particular feature values were altered.
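
Launching the dashboard for the fitted model can be as simple as the sketch below, assuming the explainerdashboard package and the validation data from the earlier sketches:

  import pandas as pd
  from explainerdashboard import ClassifierExplainer, ExplainerDashboard

  X_val_df = pd.DataFrame(X_val_imp, columns=X.columns)  # the explainer expects a DataFrame
  explainer = ClassifierExplainer(model, X_val_df, y_val)
  ExplainerDashboard(explainer).run()                    # serves the interactive dashboard locally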


Example of the use of ExplainerDashboard to interpret the decision for one particular case (an eye with glaucoma that was correctly classified as such in this example)


ExplainerDashboard primarily uses SHAP (SHapley Additive exPlanations) values to help visualize how each feature influences the model’s final decision. SHAP values indicate the magnitude and direction of a feature’s impact on the model’s prediction. For example, a large positive SHAP value means that the feature pushes the prediction strongly towards a positive diagnosis (glaucoma in this case), while a large negative value pushes it towards a non-glaucoma diagnosis.
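
The same SHAP values can also be computed directly for the LGBM model, as in the sketch below; the class lookup assumes the labels are stored as strings such as "glaucoma".

  import shap

  shap_explainer = shap.TreeExplainer(model)
  shap_values = shap_explainer.shap_values(X_val_df)     # one set of contributions per class

  # Summary plot for the glaucoma class: features at the top have the largest impact
  glaucoma_idx = list(model.classes_).index("glaucoma")  # assumed label string
  shap.summary_plot(shap_values[glaucoma_idx], X_val_df)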


Analysis (SHAP) of the impact of each feature in the glaucoma diagnosis (ordered from most to least impactful)


After analyzing these visualizations, we arrived at conclusions corroborated by the medical community regarding glaucoma diagnosis. The visual analysis revealed the relevance of the RNFL exam, as its features consistently topped the feature importance charts across various iterations. In many cases, RNFL features alone could provide a reasonably accurate diagnosis without relying on other exams.


Authors

  • Rodrigo Ferreira
  • Rita Oliveira, Altice Labs
  • Maria Manuel Castro, Altice Labs
  • Luís Cortesão, Altice Labs
  • António Travassos, Centro Cirúrgico de Coimbra
  • Ana Travassos, Centro Cirúrgico de Coimbra
  • Robert van Velze, Centro Cirúrgico de Coimbra


Keywords: Machine Learning, Data Science, Glaucoma



Contact us if you want to engage in a deeper discussion on this topic!
