How to handle limited ground truth?
Recommendation by Szymon Adamski

1. Introduction to handling limited ground truth

In this post, I will talk about the problem of limited ground truth (GT), with particular emphasis on machine-learning algorithms in healthcare applications and on how we approached this issue in one of our projects. In our context, GT is a manual (or semi-automated) segmentation delivered by a human rater. It is used to train supervised learners, which should learn the patterns (e.g., how to delineate a tumor) from such example data. The problem of limited GT appears when there is too little labeled data, which makes it very difficult to train a model that generalizes effectively to unseen data.

A large amount of data is often necessary to train a deep neural network. For example, the JFT-300M set used by Google contains more than 300 million images with tags from 18,291 categories [1]. In such extensive datasets, occasional labelling errors are inevitable. However, in fields such as medicine, it is critical to reduce the risk of mistakes. For black-box models, it is also crucial to demonstrate their precision on reliable test data. Expert knowledge is required to produce high-quality GT (such as segmentations of organs, tumors, etc.), and its preparation is usually expensive, difficult and time-consuming. Fortunately, data scientists have developed various methods to remedy this.

2. Approaching the problem of limited ground truth

2.1 Transfer Learning

Perhaps the most common approach is to use Transfer Learning (TL). In the simplest terms, it concerns a situation where we want to take the knowledge gained while solving one problem and apply it to another, related problem. Fig. 1 shows a simplified view of TL.

Fig. 1. A diagram showing the transfer learning technique [14]

One of the TL subcategories is domain adaptation. It applies when the task remains the same, but there is a shift between the source data distribution and the (different but related) target data distribution. For a more in-depth analysis, I recommend A Survey on Deep Transfer Learning [2] [3].
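To make the idea concrete, here is a minimal sketch of transfer learning on a toy problem. It is not a medical model: the one-dimensional linear model, the data, the learning rate and the choice to freeze the slope during fine-tuning are all assumptions made purely for illustration. The point is only the two-step pattern: learn on a large, related source set, then adapt on a handful of target examples.

```python
# Minimal transfer-learning sketch with a 1-D linear model y = w * x + b.
# All data and hyper-parameters below are hypothetical and only illustrative.

def train(w, b, data, lr=0.05, epochs=500, freeze_w=False):
    """Plain gradient descent on mean squared error; optionally keep w fixed."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        if not freeze_w:
            w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Source task: plenty of labeled examples following y = 2x + 1.
source = [(k / 10, 2 * (k / 10) + 1) for k in range(-20, 21)]
# Target task: related (same slope), shifted offset, only three labeled examples.
target = [(0.0, 1.5), (0.5, 2.5), (1.0, 3.5)]

# Step 1: pre-train on the source data, starting from scratch.
w_src, b_src = train(0.0, 0.0, source)

# Step 2: fine-tune on the target data, reusing the learned slope (frozen)
# and adapting only the offset.
w_tgt, b_tgt = train(w_src, b_src, target, freeze_w=True)

print(f"source model: w={w_src:.3f}, b={b_src:.3f}")
print(f"target model: w={w_tgt:.3f}, b={b_tgt:.3f}")
```

With only three target examples, the slope is inherited from the source task and only the offset has to be learned. In a deep network the analogous choice is which layers to keep fixed and which to update.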

2.2 Data generation

The next approach is very intuitive: since we lack data, let's create it. This process is called synthetic data generation. As the name suggests, such data is artificially created. However, to be useful as an extension of the initial set, the artificial examples must be coherent with the original ones and share their characteristics, while not being simple copies of them. To simplify the explanation, I have arbitrarily divided synthetic data generation into basic data augmentation and various approaches toward generating artificial data (GAD).

  • Data augmentation is a popular technique that helps improve generalization capabilities. It generates synthetic training examples via affine image transformations such as flipping, rotation, translation, scaling and cropping, among others. For brain-tumor segmentation, our colleague Jakub Nalepa, a machine learning expert, wrote a highly recognised review [9]. He was also a reviewer of a 2022 paper on the same subject [10].
  • The popular concept of generative adversarial networks (GANs) is well suited to the GAD challenge. GANs are a class of deep neural network architectures able to generate new data with the same characteristics as the training data. A GAN consists of two neural networks, the generator and the discriminator, which compete with each other. A simplified version of the scheme is shown in Fig. 2. The generator is trained to produce fake data. The discriminator, in contrast, is trained to distinguish the generator’s fake data from real examples. If the discriminator recognizes data as fake, the generator is penalized. This is how the learning process proceeds, resulting in more and more plausible examples.

Fig. 2. Standard GAN architecture [15]

An example of GAN-based synthetic medical image augmentation is described in [4]. In addition, one of our research projects is testing Enhanced CycleGAN to improve the quality of scans. This network architecture performs style transfer, i.e., it changes the style of an image from one group to the style of images in another group. In our case, the characteristic features are transferred from higher-quality medical images to lower-quality ones.

To learn about the current state of synthetic data in medicine, it is worth reading the commentary article [5].
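The basic data augmentation mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a 2-D image stored as a list of rows; the helper names (flip_horizontal, rotate_90, augment) and the tiny 3x3 example are hypothetical. The one detail worth showing for segmentation is that the image and its GT mask must receive exactly the same transformation, otherwise the labels no longer match the pixels.

```python
import random

def flip_horizontal(grid):
    """Mirror a 2-D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]

def rotate_90(grid):
    """Rotate a 2-D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def augment(image, mask, rng):
    """Apply the same random flip/rotation to an image and its GT mask."""
    if rng.random() < 0.5:
        image, mask = flip_horizontal(image), flip_horizontal(mask)
    for _ in range(rng.randrange(4)):  # 0, 1, 2 or 3 quarter turns
        image, mask = rotate_90(image), rotate_90(mask)
    return image, mask

# Hypothetical 3x3 "scan" and its binary segmentation mask.
image = [[0, 10, 0],
         [0, 90, 80],
         [0, 0, 0]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 0, 0]]

rng = random.Random(0)  # fixed seed so the example is reproducible
new_image, new_mask = augment(image, mask, rng)
for img_row, mask_row in zip(new_image, new_mask):
    print(img_row, mask_row)
```

Real pipelines typically add interpolation-based transforms (arbitrary rotations, scaling, elastic deformations), but the paired image/mask handling stays the same.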

2.3 Different paradigms

Practically from the very beginning of machine learning as a discipline, the difficulty of preparing GT manually has been recognized. Researchers wondered whether it was necessary, every time, to prepare examples and show the algorithm exactly what we expected it to learn. Could the learning be designed more cleverly, so that the algorithm would not need our supervision? This question is one of the reasons why concepts other than supervised learning (which is the most popular approach) are developing rapidly.

  • Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Thus, the model uses a much larger set than the limited one containing GT [7]. This method can only be applied if the unlabeled data carries some information about the underlying data distribution that is relevant to the task. An example for classification is shown in Fig. 3.

Fig. 3. An example of semi-supervised learning. The top panel shows a decision boundary we might adopt after seeing only one positive (white circle) and one negative (black circle) example. The bottom panel shows a decision boundary we might adopt if we were given a collection of unlabeled data (gray circles).

  • Reinforcement learning (RL) is a very interesting concept, but a detailed treatment is beyond the scope of this post. In RL, GT plays a slightly different role in the learning process. An autonomous agent (an entity that perceives its environment and can learn) takes actions in that environment in order to maximize a reward. To illustrate, I will use one of the best-known examples: AlphaZero [16], a program that mastered chess, shogi and Go. AlphaZero was trained solely via "self-play", without any domain knowledge except the rules. In this case, it is difficult to think of GT in the classic sense of the term. For more information, read the survey of RL in healthcare [6].
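The semi-supervised idea from the first bullet (and Fig. 3) can be illustrated with self-training, also known as pseudo-labeling, which is only one of many semi-supervised techniques. The sketch below is a toy under stated assumptions: one-dimensional data, a nearest-centroid classifier, and made-up points. It starts from a boundary fitted on two labeled examples and then lets the unlabeled points shift it.

```python
def centroid(points):
    """Arithmetic mean of a list of 1-D points."""
    return sum(points) / len(points)

def boundary(pos, neg):
    """Decision threshold of a 1-D nearest-centroid classifier."""
    return (centroid(pos) + centroid(neg)) / 2

# Hypothetical data: one labeled example per class, eight unlabeled points.
labeled_pos = [4.0]
labeled_neg = [0.0]
unlabeled = [-1.0, -0.5, 0.5, 1.0, 5.0, 5.5, 6.0, 6.5]

# Supervised baseline: boundary from the two labeled examples only.
supervised = boundary(labeled_pos, labeled_neg)

# Self-training: pseudo-label the unlabeled points with the current boundary,
# refit on labeled + pseudo-labeled data, and repeat until nothing changes.
threshold = supervised
for _ in range(10):
    pseudo_pos = [x for x in unlabeled if x > threshold]
    pseudo_neg = [x for x in unlabeled if x <= threshold]
    new_threshold = boundary(labeled_pos + pseudo_pos, labeled_neg + pseudo_neg)
    if new_threshold == threshold:
        break
    threshold = new_threshold

print(f"boundary with labeled data only: {supervised:.2f}")
print(f"boundary after self-training:    {threshold:.2f}")
```

The boundary moves toward the gap between the two clusters of unlabeled points, which is the effect depicted in Fig. 3. Note that self-training can also reinforce early mistakes, which is one reason it needs careful validation.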

2.4 Medical solutions remark

Narrowing the focus to medical image segmentation, I highly recommend a detailed review of the current limitations of GT annotations and how to deal with them [8].

It is worth noting that some of the above solutions (e.g., RL and generating artificial data) are rarely used in medicine. One of the reasons is the certification process, which includes verification of the final commercial product. Take GAD as an example, where we train the model on synthetic data. We are obliged to validate and present detailed characteristics of both the training and the test data, and the former is challenging when the data is synthesized. It may happen that a model, despite improved capabilities, cannot pass validation because it was trained on anatomically incorrect generated data. In such cases, for legal reasons, it is difficult to go through the procedure, and certification is necessary to sell the product as a medical device.

3. Case study: Segmenting optic pathway gliomas (OPGs)

In the project that we carried out together with the Department of Diagnostic Imaging at the Children's Memorial Health Institute, we had only 22 multi-modal MRI brain scans with manual OPG segmentations. These heterogeneous low-grade neoplasms affect mostly children. We obtained fairly accurate OPG segmentations while maintaining high generalization capabilities for low-/high-grade gliomas (LGG/HGG). To quantify the capabilities of the deep models, we followed a multi-fold cross-validation procedure: the OPG dataset was split into four non-overlapping folds at the patient level, with stratification reflecting the distribution of the whole-tumor volume in the OPG dataset, and each fold was treated as the unseen test set exactly once.

Next, we decided to use the 369 MRIs from the BraTS data, which address a related task: LGG/HGG segmentation. We expected that extending the training set with these data would improve the network's generalization. Finally, we introduced two training strategies for nnU-Nets [13]: pre-training (PT) and TL. The nnU-Net is a deep learning segmentation method that automatically adapts the U-Net architecture and configures pre-processing, post-processing and training hyper-parameters based on the characteristics of the training data and the target segmentation problem. Our approaches were presented in the paper "Segmenting pediatric optic pathway gliomas from MRI using deep learning" [11].
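A patient-level split into four folds, stratified by whole-tumor volume, can be sketched as follows. This is one simple way to do it and not necessarily the procedure used in the paper [11]: the serpentine assignment, the function name stratified_folds and the volumes are all assumptions made for illustration.

```python
def stratified_folds(volumes, n_folds=4):
    """Split patient IDs into folds with similar tumor-volume distributions.

    Patients are sorted by volume and dealt to folds in a serpentine order,
    so every fold receives small, medium and large tumors.
    """
    ordered = sorted(volumes, key=volumes.get)
    folds = [[] for _ in range(n_folds)]
    for i, patient in enumerate(ordered):
        block, pos = divmod(i, n_folds)
        # Reverse direction on every other block to balance the folds.
        fold = pos if block % 2 == 0 else n_folds - 1 - pos
        folds[fold].append(patient)
    return folds

# Hypothetical whole-tumor volumes (cm^3) for 22 patients.
volumes = {f"patient_{i:02d}": 2.0 + 1.5 * i for i in range(22)}

folds = stratified_folds(volumes)
for k, test_ids in enumerate(folds):
    # Each fold serves as the unseen test set exactly once.
    train_ids = [p for j, f in enumerate(folds) if j != k for p in f]
    print(f"fold {k}: {len(train_ids)} train / {len(test_ids)} test patients")
```

Splitting by patient rather than by scan or slice keeps all data from one patient on the same side of the train/test boundary, which avoids leaking information into the test set.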

In the PT strategy, we exploit the larger source data (the BraTS dataset), which has similar characteristics, to avoid a significant domain shift. We first train on this data, and then the entire pre-trained nnU-Net (meaning all its trainable weights) is adapted in a training process performed over the OPG training MRIs.

In the TL strategy, similarly to PT, the nnU-Net is pre-trained over the source data (the BraTS dataset). Afterwards, the architecture is fine-tuned over the target OPG training data. A graphic summary of the method is presented in Fig. 4.

Fig. 4. The two training strategies (from our paper [11])

4. Summary

I hope I have made you curious and at least slightly more familiar with this broad topic. Moreover, the data-centric approach is becoming more and more mainstream (as we have already mentioned [12]). This approach, advocated by one of the AI pioneers, Andrew Ng, involves exploiting often small datasets to obtain models with high efficiency and accuracy and low bias. As the name suggests, this paradigm focuses on the appropriate preparation of data during the development of an AI system. Tools such as the synthetic data mentioned above are very useful for this goal, which makes it even more valuable to explore the limited ground truth issue.


[1] Google AI Blog. ‘Revisiting the Unreasonable Effectiveness of Data’. Accessed 22 April 2022. https://ai.googleblog.com/2017/07/revisiting-unreasonable-effectiveness.html.

[2] Tan, Chuanqi, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. ‘A Survey on Deep Transfer Learning’. ArXiv:1808.01974 [Cs, Stat], 6 August 2018. https://arxiv.org/abs/1808.01974.

[3] Jin, Zhixiong, Jiwon Kim, Hwasoo Yeo, and Seongjin Choi. ‘Transformer-Based Map Matching Model with Limited Ground-Truth Data Using Transfer-Learning Approach’. ArXiv:2108.00439 [Cs], 7 October 2021. https://arxiv.org/abs/2108.00439.

[4] Frid-Adar, Maayan, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan. ‘GAN-Based Synthetic Medical Image Augmentation for Increased CNN Performance in Liver Lesion Classification’. Neurocomputing 321 (December 2018): 321–31. https://doi.org/10.1016/j.neucom.2018.09.013.

[5] Chen, Richard J., Ming Y. Lu, Tiffany Y. Chen, Drew F. K. Williamson, and Faisal Mahmood. ‘Synthetic Data in Machine Learning for Medicine and Healthcare’. Nature Biomedical Engineering 5, no. 6 (June 2021): 493–97. https://doi.org/10.1038/s41551-021-00751-8.

[6] Yu, Chao, Jiming Liu, and Shamim Nemati. ‘Reinforcement Learning in Healthcare: A Survey’. ArXiv:1908.08796 [Cs], 24 April 2020. https://arxiv.org/abs/1908.08796.

[7] Yang, Xiangli, Zixing Song, Irwin King, and Zenglin Xu. ‘A Survey on Deep Semi-Supervised Learning’. ArXiv:2103.00550 [Cs], 22 August 2021. https://arxiv.org/abs/2103.00550.

[8] Tajbakhsh, Nima, Laura Jeyaseelan, Qian Li, Jeffrey Chiang, Zhihao Wu, and Xiaowei Ding. ‘Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation’. ArXiv:1908.10454 [Cs, Eess], 11 February 2020. https://arxiv.org/abs/1908.10454.

[9] Nalepa, Jakub, Michal Marcinkiewicz, and Michal Kawulok. ‘Data Augmentation for Brain-Tumor Segmentation: A Review’. Frontiers in Computational Neuroscience 13 (2019). https://www.frontiersin.org/article/10.3389/fncom.2019.00083.

[10] Zhang, Chunling, Nan Bao, Hang Sun, Hong Li, Jing Li, Wei Qian, and Shi Zhou. ‘A Deep Learning Image Data Augmentation Method for Single Tumor Segmentation’. Frontiers in Oncology 12 (2022). https://www.frontiersin.org/article/10.3389/fonc.2022.782988.

[11] Nalepa, Jakub, Szymon Adamski, Krzysztof Kotowski, Sylwia Chelstowska, Magdalena Machnikowska-Sokolowska, Oskar Bozek, Agata Wisz, and Elzbieta Jurkiewicz. ‘Segmenting Pediatric Optic Pathway Gliomas from MRI Using Deep Learning’. Computers in Biology and Medicine 142 (1 March 2022): 105237. https://doi.org/10.1016/j.compbiomed.2022.105237.

[12] https://www.dhirubhai.net/pulse/gli-recommendations-graylight-imaging/?trk=organization-update-content_share-article

[13] Isensee, Fabian, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. ‘NnU-Net: A Self-Configuring Method for Deep Learning-Based Biomedical Image Segmentation’. Nature Methods 18, no. 2 (February 2021): 203–11. https://doi.org/10.1038/s41592-020-01008-z.

[14] ‘An Introduction to Transfer Learning | by Azin Asgarian | Georgian Impact Blog | Medium’. Accessed 22 April 2022. https://medium.com/georgian-impact-blog/transfer-learning-part-1-ed0c174ad6e7.

[15] Öngün, Cihan, and Alptekin Temizel. ‘Paired 3D Model Generation with Conditional Generative Adversarial Networks’. ArXiv:1808.03082 [Cs], 15 March 2019. https://arxiv.org/abs/1808.03082.

[16] Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, et al. ‘Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm’. ArXiv:1712.01815 [Cs], 5 December 2017. https://arxiv.org/abs/1712.01815.