Self-supervised learning

Self-supervised learning is a branch of unsupervised learning in which a representation is learned through a supervised task built on pseudo-labels. Pseudo-labels are labels generated automatically, without human annotation, and the representation corresponds to the features learned by the self-supervised model (viewed as a feature extractor). There are two types of tasks in a self-supervised setting:

  • Pretext task: the classification task trained on the pseudo-labels. It is this task that ultimately yields the representation function.
  • Downstream task: the task that reuses the representation produced by the pretext task to solve an actual, real-world ML problem. A minimal sketch of this two-stage workflow follows the list.
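
To make the two-stage setup concrete, here is a minimal PyTorch sketch of the workflow. The encoder architecture, layer sizes, dummy pseudo-labels, and names (SmallEncoder, pretext_head, downstream_head) are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

# Encoder: the representation function we actually care about.
class SmallEncoder(nn.Module):
    def __init__(self, in_dim=128, rep_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, rep_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

encoder = SmallEncoder()

# Pretext task: a supervised head trained on automatically generated
# pseudo-labels (here, a dummy binary label stands in for them).
pretext_head = nn.Linear(32, 2)
pretext_model = nn.Sequential(encoder, pretext_head)
optimizer = torch.optim.Adam(pretext_model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(16, 128)               # unlabeled inputs
pseudo_y = torch.randint(0, 2, (16,))  # pseudo-labels generated automatically

loss = criterion(pretext_model(x), pseudo_y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Downstream task: freeze the encoder and train a new head
# on the (few) real labels of the actual problem.
for p in encoder.parameters():
    p.requires_grad = False
downstream_head = nn.Linear(32, 2)
features = encoder(x)                  # reuse the learned representation
downstream_logits = downstream_head(features)
```

Only the small downstream head ever sees real labels; the encoder itself is trained entirely from the automatically generated pseudo-labels.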

Self-supervised learning has been very successful in recent years, especially for learning visual features: image and video representations. There are various approaches to generating pseudo-labels for image-related tasks, namely [Jing and Tian, 2020]:

  • Learning a context similarity: this can be ensured, for example, by including a clustering step that generates pseudo-labels for the pretext task.
  • Learning a spatial structure: one well-known technique is the jigsaw puzzle task. It consists of dividing the image into small patches that are fed to the network in a shuffled order, and the network is asked to recover the correct relative order of the patches (a small patch-shuffling sketch follows this list).
  • Exploiting free spatial labels: this includes making use of contour information, depth, etc.
  • Learning a temporal structure: for video data, a classifier can be trained to decide whether video frames are in the correct order, or even to recover the correct order.
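
To illustrate the spatial-structure idea, here is a minimal NumPy sketch of how jigsaw-style pseudo-labels can be generated: the image is cut into patches, a random permutation is applied, and the index of that permutation serves as the pseudo-label. The grid size and the randomly drawn set of candidate permutations are simplifying assumptions (published jigsaw methods use a fixed, carefully chosen permutation set).

```python
import numpy as np

rng = np.random.default_rng(0)

def jigsaw_pseudo_label(image, grid=3, permutations=None):
    """Cut `image` (H, W) into a grid x grid puzzle, shuffle the patches,
    and return (shuffled_patches, pseudo_label), where the pseudo-label is
    the index of the permutation that was applied."""
    h, w = image.shape
    ph, pw = h // grid, w // grid
    patches = [image[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(grid) for j in range(grid)]
    if permutations is None:
        # Illustrative: draw a few random candidate permutations.
        permutations = [rng.permutation(grid * grid) for _ in range(10)]
    label = rng.integers(len(permutations))
    order = permutations[label]
    shuffled = [patches[k] for k in order]
    return np.stack(shuffled), label

image = rng.random((96, 96))
shuffled_patches, pseudo_label = jigsaw_pseudo_label(image)
# The pretext network sees `shuffled_patches` and must predict `pseudo_label`.
```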

Unfortunately, applications of self-supervised learning to time series data are scarce. One can cite, for example, the discriminative model built by [Franceschi et al., 2019], which helps distinguish positive samples from negative samples using a fully unsupervised triplet loss. The lack of a rich literature on self-supervision for time series is mainly due to the fact that time series are still an emerging field in deep learning applications. Besides, generating pseudo-labels for data such as videos or images seems more straightforward than for general time series (electricity consumption, financial indicators, ...).
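
For intuition, the sketch below shows the kind of time-based, fully unsupervised triplet loss this relies on: the positive sample is a sub-window of the anchor window, and the negative is a window taken from another series in the batch. The toy encoder, the single negative, and the sampling scheme are simplifications and do not reproduce the exact setup of [Franceschi et al., 2019].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder mapping a univariate window (batch, 1, length) to an embedding.
encoder = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(16, 8),
)

def unsupervised_triplet_loss(series_batch, window=64, sub=32):
    """series_batch: (batch, 1, length). The anchor is a random window of each
    series, the positive a sub-window of that anchor, and the negative a
    window from another series in the batch."""
    b, _, length = series_batch.shape
    start = torch.randint(0, length - window + 1, (1,)).item()
    anchor = series_batch[:, :, start:start + window]
    sub_start = torch.randint(0, window - sub + 1, (1,)).item()
    positive = anchor[:, :, sub_start:sub_start + sub]
    negative = anchor.roll(shifts=1, dims=0)  # simplification: windows from the other series

    za, zp, zn = encoder(anchor), encoder(positive), encoder(negative)
    # Log-sigmoid formulation: pull the positive close, push the negative away.
    loss = -F.logsigmoid((za * zp).sum(-1)) - F.logsigmoid(-(za * zn).sum(-1))
    return loss.mean()

loss = unsupervised_triplet_loss(torch.randn(8, 1, 200))
loss.backward()
```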

During my master's project at EDF Lab, I tried to build a simple pretext task for electricity consumption time series. The pretext task consists of distinguishing an original, intact time series from a damaged one, where the damage consists of adding relatively strong Gaussian noise to a randomly selected region of the series. This yields a binary classification model. The main assumption was that, by learning to tell the original time series from the damaged one, the model would learn a latent representation that captures well the general explanatory factors of the input. The usefulness of the representation was tested on two real classification tasks (the downstream tasks): detecting the presence or absence of electric domestic hot water and of electric heating. The results proved very competitive with classification models trained end-to-end on the original time series, and the computational cost was remarkably lower. Sadly, I can't give more concrete details about the design and results of the experiments for confidentiality reasons.
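
Since the concrete design is confidential, the sketch below only illustrates the idea as described above: a randomly selected region of a consumption curve is corrupted with relatively strong Gaussian noise, and pseudo-labels mark curves as intact or damaged. The region length, noise scale, and curve resolution are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def damage(series, region_len=48, noise_scale=3.0):
    """Add relatively strong Gaussian noise to one random region of the series."""
    damaged = series.copy()
    start = rng.integers(0, len(series) - region_len)
    damaged[start:start + region_len] += rng.normal(
        0.0, noise_scale * series.std(), size=region_len)
    return damaged

def make_pretext_batch(curves):
    """Build (inputs, pseudo_labels): label 1 for intact curves, 0 for damaged copies."""
    xs, ys = [], []
    for curve in curves:
        xs.append(curve);         ys.append(1)   # original, intact
        xs.append(damage(curve)); ys.append(0)   # corrupted copy
    return np.stack(xs), np.array(ys)

# Toy week-long load curves at 30-minute resolution; real curves would come
# from the consumption data set.
curves = rng.random((16, 48 * 7))
X, y = make_pretext_batch(curves)   # a binary classifier is then trained on (X, y)
```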

I hope this article was insightful, and I would be very happy to exchange with curious readers!

[Franceschi et al., 2019] Franceschi, J.-Y., Dieuleveut, A., and Jaggi, M. (2019). Unsupervised scalable representation learning for multivariate time series. In Advances in Neural Information Processing Systems, pages 4652–4663.

[Jing and Tian, 2020] Jing, L. and Tian, Y. (2020). Self-supervised visual feature learning with deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.
