
Classification of cardiomegaly using Convolutional Neural Network

Anoop Singh

Staff Software Engineer at Twilio

Published: February 7, 2018


GitHub link

(Completed as Udacity capstone project as part of the Machine Learning Engineer Nanodegree program)

I. Definition

Project Overview

From Wikipedia:

Cardiomegaly is a medical condition in which the heart is enlarged. It is more commonly referred to as an enlarged heart. The causes of cardiomegaly may vary. Many times this condition results from high blood pressure (hypertension) or coronary artery disease. An enlarged heart may not pump blood effectively, resulting in congestive heart failure.

X-ray images help show the condition of the lungs and heart. If the heart is enlarged on an X-ray, other tests will usually be needed to find the cause. A useful measurement on X-ray is the cardio-thoracic ratio, which is the transverse diameter of the heart compared with that of the thoracic cage. These diameters are taken from PA chest x-rays using the widest point of the chest and measuring as far as the lung pleura, not the lateral skin margins. If the cardio-thoracic ratio is greater than 50%, pathology is suspected, assuming the x-ray has been taken correctly. The measurement was first proposed in 1919 to screen military recruits. A newer approach to using these x-rays for evaluating heart health takes the ratio of heart area to chest area and has been called the two-dimensional cardiothoracic ratio.
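The ratio described above is simple arithmetic. Here is a minimal sketch, assuming the two diameters have already been measured on a PA film; the function names and the measurements are hypothetical and only illustrate the 50% rule of thumb.

```python
def cardiothoracic_ratio(cardiac_width_mm: float, thoracic_width_mm: float) -> float:
    """Transverse heart diameter divided by transverse thoracic diameter."""
    if thoracic_width_mm <= 0:
        raise ValueError("thoracic width must be positive")
    return cardiac_width_mm / thoracic_width_mm


def is_enlarged(ratio: float, threshold: float = 0.5) -> bool:
    """Flag a ratio above the 50% rule of thumb quoted above."""
    return ratio > threshold


# Hypothetical measurements, for illustration only.
ratio = cardiothoracic_ratio(cardiac_width_mm=165.0, thoracic_width_mm=300.0)
print(f"CTR = {ratio:.2f}, suspected enlargement: {is_enlarged(ratio)}")
```

A CNN classifier does not compute this ratio explicitly; the sketch only makes the manual measurement concrete.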

X-ray exams are the first step in diagnosing cardiomegaly in a patient. Once the x-ray is available, a radiologist examines it and tries to diagnose the disease.

From GlobalDiagnostiX - Context: According to WHO figures, more than two thirds of the world's population does not have access to this essential x-ray imaging equipment. Too often in developing countries, patients die of trivial problems, which, due to a lack of access to diagnosis, take dramatic proportions.

From Most of the World Doesn't Have Access to X-Rays: After the 2010 earthquake in Haiti, Mendel and Partners in Health stocked the University Hospital in Mirebalais (UHM) with a CT scanner, the first in a public hospital in Haiti and the first to cost their patients nothing. But the hospital still doesn't have enough money to hire a radiologist to run the machine, Mendel says.

One solution is telemedicine. UHM uses a picture-archiving and communication system that sends CT scans to a server in Boston, which stores the images and creates an electronic medical record for volunteer radiologists in the U.S. and Canada to read. The volunteers log on twice a week to look over scans, which each take about ten minutes. In 2014, 40 volunteers read approximately 4,000 CT scans.

But telemedicine has limits, especially in an emergency. "Bus accidents happen every day in Nepal," Schwarz explains. "You have literally 25 patients all at once, who are all bleeding. That's challenging enough. You're certainly not waiting for someone in a different country or time zone to tell you what an x-ray shows."

With initiatives like GlobalDiagnostiX, there's hope that low-cost x-ray systems will become more readily available in underdeveloped countries over time. Radiologist availability, however, remains limited, and lives are lost while patients wait for a diagnosis.

The ImageNet challenge has led to successful advances in the field of computer vision and in the use of convolutional neural networks for image recognition tasks. With transfer learning, there have been several instances where a successful ImageNet architecture is modified or retrained to recognize images in a particular field with high accuracy.

Thanks to the availability of large datasets, progress in computer hardware and an active interest in using machine learning for medical diagnosis, researchers have demonstrated performance on par with, or better than, that of medical professionals on some diagnostic tasks.

The article "Radiologists will be 'obsolete' in five years" led me to take an interest in this particular problem. If we can solve it, we can drastically bring down diagnosis time (particularly in developing countries) and potentially make diagnosis cheaper.

Compared to medical professionals, we may also get better diagnoses due to:

the ability to train on tens or hundreds of thousands of images and pick up intricate details
no overworked radiologists who occasionally have to screen over 100 x-rays a day, potentially leading to errors

Research Citations

ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

Problem Statement

Develop an algorithm that can detect cardiomegaly from chest X-rays.

This is a binary classification problem: does the image exhibit Cardiomegaly or not?

Inputs for the problem: images labeled as Cardiomegaly or No Finding. We'll ignore all the other features.

Outputs for the problem: label for the image (Cardiomegaly or No Finding)

Datasets and Inputs

CXR8 dataset provided by the National Institutes of Health Clinical Center

Its labels assign each chest x-ray to one or more of the following 8 thoracic pathologies: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia and Pneumothorax. A No Finding label indicates that the image was not labeled with any of the 8 pathologies.

Data:

Image Index,Finding Labels,Follow-up #,Patient ID,Patient Age,Patient Gender,View Position,OriginalImage[Width,Height],OriginalImagePixelSpacing[x,y],

00000001_000.png,Cardiomegaly,0,1,58,M,PA,2682,2749,0.143,0.143

00000001_001.png,Cardiomegaly|Emphysema,1,1,58,M,PA,2894,2729,0.143,0.143

00000002_000.png,No Finding,0,2,81,M,PA,2500,2048,0.171,0.171

00000008_000.png,Cardiomegaly,0,8,69,F,PA,2048,2500,0.171,0.171 (for sample image)
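To turn these rows into binary labels, the Finding Labels column can be split on the `|` separator. The sketch below reuses the sample rows above with a simplified, hypothetical header. It assumes that an image counts as Cardiomegaly whenever that label appears, even alongside other findings, and that rows with neither label are skipped; the actual rule used in the project may differ.

```python
import csv
import io

# The sample rows shown above, with a simplified header (hypothetical column names).
SAMPLE = """\
image,labels,follow_up,patient_id,age,gender,view,width,height,spacing_x,spacing_y
00000001_000.png,Cardiomegaly,0,1,58,M,PA,2682,2749,0.143,0.143
00000001_001.png,Cardiomegaly|Emphysema,1,1,58,M,PA,2894,2729,0.143,0.143
00000002_000.png,No Finding,0,2,81,M,PA,2500,2048,0.171,0.171
00000008_000.png,Cardiomegaly,0,8,69,F,PA,2048,2500,0.171,0.171
"""


def binary_label(finding_labels):
    """Return 'cardiomegaly', 'no_finding', or None for rows we would skip."""
    labels = finding_labels.split("|")
    if "Cardiomegaly" in labels:  # assumption: co-occurring findings still count
        return "cardiomegaly"
    if labels == ["No Finding"]:
        return "no_finding"
    return None  # some other pathology only: not part of this binary task


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labeled = {row["image"]: binary_label(row["labels"]) for row in rows}
for image, label in labeled.items():
    print(image, label)
```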

Intended solution

As part of data pre-processing, we separate the images into two folders, cardiomegaly and no finding, using the metadata in the dataset
These are further separated into train, valid and test (e.g. train/cardiomegaly, valid/cardiomegaly and test/cardiomegaly, ...)
Images are reduced from 1024x1024 to 224x224 to bring the dataset size down for faster and more economical processing
Using Keras and TensorFlow, we create our own CNN model and assess its performance
Using transfer learning, we retrain ImageNet-trained models like Inception, ResNet50, VGG19 and Xception and assess their performance

We hope to settle on a model that does well at classifying a chest X-ray image as Cardiomegaly or No Finding.

Metrics

Our dataset has 2,776 Cardiomegaly images and 60,361 No Finding images. This means that a naive algorithm classifying everything as No Finding would give us an accuracy of 95.6%.

Since our dataset is heavily skewed towards No Finding, the F-1 score is a better metric than accuracy.
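The 95.6% figure follows directly from the class counts. A short check, using only the two counts quoted above:

```python
cardiomegaly = 2_776
no_finding = 60_361
total = cardiomegaly + no_finding

# A classifier that always answers "No Finding" is right on every negative image.
baseline_accuracy = no_finding / total
# It never finds a positive case, so its recall on Cardiomegaly is zero.
baseline_recall = 0 / cardiomegaly

print(f"{total} images, baseline accuracy {baseline_accuracy:.1%}, "
      f"Cardiomegaly recall {baseline_recall:.0%}")
```

High accuracy paired with zero recall on the class we care about is why accuracy alone is misleading here.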

We'll compare our own model's results with observations from the benchmark (ImageNet trained models). We'll be looking at:

F-1 score (we specify average as micro to take the label imbalance into account)
Precision score
Recall score
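These three scores can be computed from the counts of true positives, false positives and false negatives. The sketch below is a plain-Python illustration with hypothetical labels, not the evaluation code used in the project. One caveat worth knowing: when every image gets exactly one of two labels, pooling the counts over both classes (micro averaging) yields the same number as plain accuracy, so the scores for the Cardiomegaly class alone are the ones that expose how the minority class is handled.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def micro_f1(y_true, y_pred, classes=(0, 1)):
    """Pool TP/FP/FN over all classes before computing F1."""
    tp = fp = fn = 0
    for c in classes:
        c_tp, c_fp, c_fn = confusion_counts(y_true, y_pred, positive=c)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels: 1 = Cardiomegaly, 0 = No Finding.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"positive class: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"micro F1={micro_f1(y_true, y_pred):.2f} accuracy={accuracy:.2f}")
```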

II. Analysis

Data Exploration

(See Datasets and Inputs section too)

How are the images structured?
PNG, 1024 x 1024, 400 - 500 KB
Are there color layers?
The images are in grayscale and should not need any color transformation before they can be used
What are the dimensions (or ranges of dimensions)?
1024 x 1024
How many examples are there in the dataset that you'll be using?
The dataset contains over 100,000 anonymized chest x-rays from more than 30,000 patients and takes 45.14 GB of space when compressed
We'll be using every image
How many classes will there be (just two? or more?) and are they balanced?
Just two - Cardiomegaly and No Finding
They are not balanced: the dataset is biased towards images with the No Finding label
How will you split the data into training/validation/testing sets?
May start off with a 70 - 10 - 20 split and see how it goes
Will you do anything to maintain class balances across each subset?
Yes, will aim for a consistent ratio of images labeled Cardiomegaly to images labeled No Finding within each subset
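One way to keep the class ratio consistent across subsets is to split each class separately with the same fractions. The sketch below assumes the 70 - 10 - 20 split mentioned above; the function name, the seed and the file names are hypothetical stand-ins for the real images.

```python
import random


def stratified_split(items_by_class, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split each class separately so every subset keeps the same class ratio."""
    rng = random.Random(seed)
    splits = {"train": [], "valid": [], "test": []}
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * fractions[0])
        n_valid = round(len(shuffled) * fractions[1])
        splits["train"] += [(x, label) for x in shuffled[:n_train]]
        splits["valid"] += [(x, label) for x in shuffled[n_train:n_train + n_valid]]
        # Whatever remains goes to the test set.
        splits["test"] += [(x, label) for x in shuffled[n_train + n_valid:]]
    return splits


# Hypothetical file names standing in for the real images.
data = {
    "cardiomegaly": [f"c_{i}.png" for i in range(100)],
    "no_finding": [f"n_{i}.png" for i in range(2000)],
}
splits = stratified_split(data)
for name, items in splits.items():
    positives = sum(label == "cardiomegaly" for _, label in items)
    print(name, len(items), f"positive share = {positives / len(items):.3f}")
```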

Algorithms and Techniques

In this section, we'll go over the architecture for our own model and talk briefly about transfer learning using ImageNet trained models.

To solve the problem at hand, we use a Convolutional Neural Network (CNN).

(reference link)

In a standard neural network, every neuron in one layer is connected to every neuron in the next, and signals are passed in a single direction only. This is known as a feed-forward network. Although successful, this full connectivity drives up complexity, especially for large inputs such as images.

CNNs were developed as a better alternative and involve four main steps: convolution, subsampling, activation and full connectedness.
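To make the complexity argument concrete, we can count trainable weights. The layer sizes below (128 dense units, 32 filters of 3x3) are hypothetical and chosen only for illustration; the input is one 224x224 grayscale image.

```python
def dense_params(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Each filter has its own small kernel plus one bias, shared across the image."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


pixels = 224 * 224  # one grayscale image, flattened
print("dense layer, 128 units:", dense_params(pixels, 128))
print("conv layer, 32 filters of 3x3:", conv2d_params(3, 3, 1, 32))
```

The dense layer needs millions of weights while the convolutional layer needs a few hundred, because its filters are reused at every image position.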

Convolution

The first layers that receive an input signal are called convolution filters. Convolution is a process where the network tries to label the input signal by referring to what it has learned in the past. The resulting output signal is then passed on to the next layer.

Each convolution filter represents a feature of interest (e.g. whiskers, fur), and the CNN algorithm learns which features comprise the resulting reference (i.e. cat). The output signal strength does not depend on where the features are located, but simply on whether the features are present.
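As a rough illustration of what a single filter does, the sketch below slides a 3x3 kernel over a toy image without padding. The kernel is a hand-written vertical-edge detector; in a real CNN the kernel weights are learned during training, and a library such as Keras performs this operation far more efficiently.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hand-written vertical-edge filter; in a CNN these weights are learned.
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]

# 4x5 toy image: bright on the left, dark on the right.
image = [[9, 9, 9, 0, 0],
         [9, 9, 9, 0, 0],
         [9, 9, 9, 0, 0],
         [9, 9, 9, 0, 0]]

feature_map = conv2d_valid(image, edge_kernel)
for row in feature_map:
    print(row)
```

The response is zero over the uniform region and large wherever the window covers the bright-to-dark edge.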

Subsampling

Outputs from the convolution layer can be "smoothed" to reduce the sensitivity of the filters to noise and variations. This smoothing process is called subsampling, and can be achieved by taking averages or taking the maximum over a sample of the signal.
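Both variants of subsampling can be sketched with one helper that reduces each non-overlapping window to a single value. The 2x2 window size and the toy feature map are assumptions made for illustration.

```python
def pool2d(feature_map, size=2, reduce=max):
    """Apply `reduce` to each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            reduce([feature_map[r * size + i][c * size + j]
                    for i in range(size) for j in range(size)])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def mean(values):
    return sum(values) / len(values)


feature_map = [[1, 3, 2, 0],
               [5, 7, 1, 1],
               [0, 2, 4, 8],
               [2, 0, 6, 2]]

max_pooled = pool2d(feature_map, reduce=max)   # keeps the strongest response
avg_pooled = pool2d(feature_map, reduce=mean)  # keeps the average response
print(max_pooled)
print(avg_pooled)
```

Either way, the 4x4 map shrinks to 2x2, so later layers see a coarser and less noise-sensitive summary.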

Activation

The activation layer controls how the signal flows from one layer to the next, emulating how neurons are fired in our brain. Output signals which are strongly associated with past references would activate more neurons, enabling signals to be propagated more efficiently for identification. CNN is compatible with a wide variety of complex activation functions to model signal propagation, the most common function being the Rectified Linear Unit (ReLU), which is favored for its faster training speed.
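ReLU itself is a one-line function: it passes positive signals through unchanged and blocks everything else. A minimal sketch, together with its gradient as used during training:

```python
def relu(x):
    """Pass positive signals through unchanged, block the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


signals = [-2.0, -0.5, 0.0, 0.5, 2.0]
activated = [relu(s) for s in signals]
gradients = [relu_grad(s) for s in signals]
print(activated)
print(gradients)
```

The gradient is either 0 or 1 and needs no exponentials, which is one commonly cited reason for the faster training mentioned above.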

Fully Connected

The last layers in the network are fully connected, meaning that neurons of preceding layers are connected to every neuron in subsequent layers. This mimics high level reasoning where all possible pathways from the input to output are considered.
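A fully connected layer is a weighted sum of all its inputs for each output neuron. The sketch below uses hypothetical weights and a single output neuron followed by a sigmoid, which is one common way to produce a probability for a binary label such as Cardiomegaly vs No Finding; the architecture actually used in the project may differ.

```python
import math


def dense(inputs, weights, biases):
    """Each output neuron sees every input: out[k] = sum_i inputs[i] * weights[k][i] + biases[k]."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def sigmoid(z):
    """Squash a score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical flattened features from the earlier layers, and made-up weights.
features = [0.5, 1.0, 0.0, 2.0]
weights = [[0.2, -0.1, 0.4, 0.3]]  # one output neuron connected to all four inputs
biases = [-0.5]

logit = dense(features, weights, biases)[0]
probability = sigmoid(logit)
print(f"logit={logit:.2f} probability={probability:.3f}")
```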

Learning algorithms require feedback. This is done by having the CNN make predictions on labeled examples and comparing them with the true labels, or ground truth. The errors in these predictions are then fed backwards through the CNN to refine the learned weights, in a so-called backward pass. Formally, this algorithm is called backpropagation of errors, and it requires the functions in the CNN to be differentiable (almost everywhere).
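The feedback loop can be sketched on the smallest possible network: a single sigmoid neuron trained with gradient descent on a binary cross-entropy loss. The data, learning rate and epoch count are hypothetical; a real CNN applies the same chain rule through many layers, and frameworks such as TensorFlow compute the gradients automatically.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(samples, w, b):
    """Mean binary cross-entropy between predictions and true labels."""
    total = 0.0
    for x, y in samples:
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)


def train(samples, epochs=200, lr=0.5):
    """Gradient descent for a single sigmoid neuron."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            p = sigmoid(w * x + b)   # forward pass: make a prediction
            error = p - y            # d(loss)/dz for sigmoid + cross-entropy
            grad_w += error * x      # backward pass: chain rule through z = w*x + b
            grad_b += error
        w -= lr * grad_w / len(samples)  # move the weights against the gradient
        b -= lr * grad_b / len(samples)
    return w, b


# Hypothetical 1-D data: negative inputs are class 0, positive inputs are class 1.
samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train(samples)
print(f"loss before: {loss(samples, 0.0, 0.0):.3f}, after: {loss(samples, w, b):.3f}")
```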