
Mark-00z Puchen Zheng

TS-TCC

Time-Series Representation Learning via Temporal and Contextual Contrasting. IJCAI, 2021

Abstract

Developing effective representations from unlabeled time-series data, considering their temporal complexities, poses a significant challenge. In this paper, we propose an unsupervised Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC), to learn time-series representations from unlabeled data. First, the raw time-series data are transformed into two different yet correlated views by using weak and strong augmentations. Second, we propose a novel temporal contrasting module to learn robust temporal representations by designing a tough cross-view prediction task. Last, to further learn discriminative representations, we propose a contextual contrasting module built upon the contexts from the temporal contrasting module.

Introduction

Time-series data is increasingly collected through IoT devices and wearables for a variety of applications, such as in healthcare and manufacturing. However, due to the complexity and the lack of recognizable patterns in time-series data, it is often difficult and resource-intensive to label this data. As deep learning models typically require extensive labeled data for training, this presents a significant challenge when applying these models to real-world time-series data with limited labeling.

Self-supervised learning has emerged as a promising approach to extract useful representations from unlabeled data. These representations can achieve performance comparable to supervised models when used for downstream tasks, even with a smaller amount of labeled data. Various self-supervised methods have been developed, relying on different pretext tasks to train models and learn representations. However, these tasks can be limiting in terms of the generality of the learned representations.

Contrastive learning, in particular, has demonstrated its effectiveness in the computer vision domain by learning invariant representations from augmented data. This method maximizes the similarity of different views of the same sample while minimizing the similarity with views from different samples. Despite the success of contrastive learning in images, directly applying these methods to time-series data is not straightforward due to the unique properties of time-series, such as temporal dependencies, and the lack of suitability of certain image augmentation techniques for time-series data.

To address these challenges, the authors propose a novel framework called Time-Series Representation Learning via Temporal and Contextual Contrasting (TS-TCC). The TS-TCC framework introduces a method for unsupervised learning of time-series representations from unlabeled data. It employs data augmentation techniques tailored to time-series data and incorporates a temporal contrasting module to learn robust temporal features. Additionally, a contextual contrasting module is introduced to further enhance the discriminative power of the learned representations. The framework's effectiveness is demonstrated through extensive experiments on real-world time-series datasets, showing that the learned representations perform well in both supervised and semi-supervised learning scenarios, as well as in transfer learning settings.

Method

Time-Series Augmentation

Two separate augmentations are proposed to create the two views (a rough code sketch follows the list):

  • Weak: jitter-and-scale
  • Strong: permutation-and-jitter
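
A minimal NumPy sketch of the two augmentation families, assuming inputs of shape (channels, timesteps); the parameter values (jitter strength, scale spread, number of permutation segments) are illustrative defaults, not the authors' exact settings.

```python
import numpy as np

def weak_augment(x, jitter_sigma=0.05, scale_sigma=0.1):
    """Jitter-and-scale: add small Gaussian noise and rescale each channel."""
    # x: array of shape (channels, timesteps)
    noise = np.random.normal(0.0, jitter_sigma, size=x.shape)
    scale = np.random.normal(1.0, scale_sigma, size=(x.shape[0], 1))  # per-channel scale
    return x * scale + noise

def strong_augment(x, n_segments=5, jitter_sigma=0.05):
    """Permutation-and-jitter: shuffle random time segments, then add noise."""
    timesteps = x.shape[1]
    # pick random split points, cut the series into segments, and permute them
    split_points = np.sort(np.random.choice(timesteps - 1, n_segments - 1, replace=False) + 1)
    segments = np.split(x, split_points, axis=1)
    order = np.random.permutation(len(segments))
    permuted = np.concatenate([segments[i] for i in order], axis=1)
    return permuted + np.random.normal(0.0, jitter_sigma, size=x.shape)
```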

Temporal Contrasting

The strong augmentation generates the context $c_t^{s}$ and the weak augmentation generates $c_t^{w}$. We propose a tough cross-view prediction task: the context of the strong augmentation $c_t^{s}$ is used to predict the future timesteps of the weak augmentation $z_{t+k}^{w}$, and vice versa. The contrastive loss maximizes the dot product between the predicted representation and the true one of the same sample, while minimizing the dot product with the other samples $\mathcal{N}_{t,k}$ within the minibatch. Accordingly, we calculate the two losses $\mathcal{L}_{TC}^{s}$ and $\mathcal{L}_{TC}^{w}$ as follows:

$$\mathcal{L}_{TC}^{s} = -\frac{1}{K}\sum_{k=1}^{K} \log \frac{\exp\big((W_k(c_t^{s}))^{\top} z_{t+k}^{w}\big)}{\sum_{n \in \mathcal{N}_{t,k}} \exp\big((W_k(c_t^{s}))^{\top} z_n^{w}\big)}$$

$$\mathcal{L}_{TC}^{w} = -\frac{1}{K}\sum_{k=1}^{K} \log \frac{\exp\big((W_k(c_t^{w}))^{\top} z_{t+k}^{s}\big)}{\sum_{n \in \mathcal{N}_{t,k}} \exp\big((W_k(c_t^{w}))^{\top} z_n^{s}\big)}$$
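
A hedged PyTorch sketch of one direction of this cross-view loss (the strong context predicting the weak view's future steps); the per-step linear heads `Wk`, the tensor shapes, and the use of same-timestep in-batch negatives are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(c_strong, z_weak_future, Wk):
    """
    Cross-view prediction loss (strong context -> weak future steps).
    c_strong:      (batch, hidden)     context of the strong view at time t
    z_weak_future: (K, batch, feat)    true weak-view features z_{t+1..t+K}
    Wk:            list of K nn.Linear(hidden, feat) heads, one per step
    Negatives are the other samples in the minibatch at the same step.
    """
    K = z_weak_future.shape[0]
    loss = 0.0
    for k in range(K):
        pred = Wk[k](c_strong)                         # (batch, feat) predicted z_{t+k}^w
        logits = pred @ z_weak_future[k].T             # (batch, batch) dot products
        labels = torch.arange(logits.shape[0], device=logits.device)
        loss = loss + F.cross_entropy(logits, labels)  # log-softmax over the batch
    return loss / K
```

The weak-to-strong direction is symmetric: swap the roles of the two views and use a second set of heads.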

We then use a Transformer as the autoregressive model; its architecture is shown below:

[Figure: Transformer architecture of the temporal contrasting module]

The MLP block is composed of two fully-connected layers with a ReLU non-linearity and dropout in between. Pre-norm residual connections, which can produce more stable gradients, are adopted in the Transformer. We stack $L$ identical layers to generate the final features. Inspired by the BERT model, we add a token $c \in \mathbb{R}^{h}$ to the input, whose state in the output acts as a representative context vector.

Next, we attach the context vector to the feature vector $\psi$ such that the input features become $\tilde{\psi}_0 = [c;\, \psi]$, where the subscript 0 denotes the input to the first layer. We then pass $\tilde{\psi}_0$ through the $L$ Transformer layers as in the following equations:

$$\tilde{\psi}_{\ell} = \mathrm{MultiHeadAttention}\big(\mathrm{Norm}(\tilde{\psi}_{\ell-1})\big) + \tilde{\psi}_{\ell-1}, \quad 1 \le \ell \le L$$

$$\tilde{\psi}_{\ell} = \mathrm{MLP}\big(\mathrm{Norm}(\tilde{\psi}_{\ell})\big) + \tilde{\psi}_{\ell}, \quad 1 \le \ell \le L$$

Finally, we re-attach the context vector from the final output such that $c_t = \tilde{\psi}_{L}[0]$. This context vector will be the input of the following contextual contrasting module.
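
A compact PyTorch sketch of this pre-norm Transformer with a prepended context token, roughly matching the description above; the depth, number of heads, MLP ratio, and dropout are placeholder values, not the authors' configuration.

```python
import torch
import torch.nn as nn

class PreNormTransformerBlock(nn.Module):
    def __init__(self, dim, n_heads=4, mlp_ratio=2, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # MLP block: two fully-connected layers with ReLU and dropout in between
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        # pre-norm residual connections
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

class TemporalTransformer(nn.Module):
    def __init__(self, dim, depth=4):
        super().__init__()
        self.context_token = nn.Parameter(torch.randn(1, 1, dim))  # BERT-style token c
        self.layers = nn.ModuleList([PreNormTransformerBlock(dim) for _ in range(depth)])

    def forward(self, features):
        # features: (batch, timesteps, dim)
        c = self.context_token.expand(features.shape[0], -1, -1)
        x = torch.cat([c, features], dim=1)   # prepend the context token
        for layer in self.layers:
            x = layer(x)
        return x[:, 0]                        # re-attached context vector c_t
```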

Contextual Contrasting

This part aims to learn more discriminative representations.

It starts with applying a non-linear transformation to the contexts using a non-linear projection head.

Given a batch of $N$ input samples, we will have two contexts for each sample from its two augmented views, and thus have $2N$ contexts.

For a context $c_t^{i}$, we denote $c_t^{i^+}$ as the positive sample of $c_t^{i}$ that comes from the other augmented view of the same input; hence, $(c_t^{i}, c_t^{i^+})$ are considered to be a positive pair.

Meanwhile, the remaining $(2N-2)$ contexts from other inputs within the same batch are considered as the negative samples of $c_t^{i}$, which can thus form $(2N-2)$ negative pairs with its negative samples.

$$\mathcal{L}_{CC} = -\sum_{i=1}^{2N} \log \frac{\exp\big(\mathrm{sim}(c_t^{i}, c_t^{i^+}) / \tau\big)}{\sum_{m=1}^{2N} \mathbb{1}_{[m \neq i]} \exp\big(\mathrm{sim}(c_t^{i}, c_t^{m}) / \tau\big)}$$

where $\mathrm{sim}(u, v) = u^{\top} v / \lVert u \rVert \lVert v \rVert$ denotes the dot product between $\ell_2$-normalized $u$ and $v$ (i.e., cosine similarity), $\mathbb{1}_{[m \neq i]}$ is an indicator function evaluating to 1 iff $m \neq i$, and $\tau$ is a temperature parameter. This equation defines the contextual contrasting loss $\mathcal{L}_{CC}$: given a context $c_t^{i}$, we divide its similarity with its positive sample by its similarity with all the other samples, including the positive pair and the negative pairs, to normalize the loss.
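
A PyTorch sketch of this contextual contrasting loss in the NT-Xent style described above; it assumes the two context batches have already been passed through the non-linear projection head, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def contextual_contrastive_loss(ctx_a, ctx_b, temperature=0.2):
    """
    ctx_a, ctx_b: (N, dim) projected contexts from the two augmented views.
    For each of the 2N contexts, the positive is the other view of the same
    sample; the remaining 2N - 2 contexts in the batch are negatives.
    """
    N = ctx_a.shape[0]
    z = F.normalize(torch.cat([ctx_a, ctx_b], dim=0), dim=1)   # (2N, dim), L2-normalized
    sim = (z @ z.T) / temperature                              # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                          # the indicator: exclude self-pairs
    # the positive of context i is the same sample's other view (i + N, and vice versa)
    targets = torch.cat([torch.arange(N, 2 * N), torch.arange(N)]).to(z.device)
    return F.cross_entropy(sim, targets)
```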

The overall self-supervised loss is the combination of the two temporal contrasting losses and the contextual contrasting loss as follows:

$$\mathcal{L} = \lambda_1 \cdot \big(\mathcal{L}_{TC}^{s} + \mathcal{L}_{TC}^{w}\big) + \lambda_2 \cdot \mathcal{L}_{CC}$$

where $\lambda_1$ and $\lambda_2$ are fixed scalar hyperparameters denoting the relative weight of each loss.
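
A tiny runnable illustration of how the three terms are combined; the loss tensors here are placeholders standing in for the outputs of the sketches above, and the λ values are illustrative, not the paper's tuned settings.

```python
import torch

# Placeholder values standing in for L_TC^s, L_TC^w and L_CC computed by the sketches above.
loss_tc_s, loss_tc_w, loss_cc = torch.tensor(1.23), torch.tensor(1.17), torch.tensor(0.84)

lambda1, lambda2 = 1.0, 0.7   # fixed relative weights (illustrative values)
total_loss = lambda1 * (loss_tc_s + loss_tc_w) + lambda2 * loss_cc
```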

[Figure: Overall pipeline of TS-TCC]

Experiment

Datasets

The downstream task in all experiments is classification.

Dataset      # Train   # Test   Length   Channels   Classes
HAR             7352     2947      128          9         6
Sleep-EDF      25612     8910     3000          1         5
Epilepsy        9200     2300      178          1         2
FD              8184     2728     5120          1         3

The first three datasets are used to evaluate classification performance, while the FD dataset is used to investigate the transferability of the features learned by TS-TCC.

Models

  • Random init: training a linear classifier on top of a randomly initialized encoder

  • Supervised: the same architecture (encoder & classifier) trained with labeled data

  • SSL-ECG

  • CPC

  • SimCLR

Results

[Figure: results comparing TS-TCC with the baseline models]

With only 1% of the labeled data, TS-TCC fine-tuning achieves significantly better performance than supervised training; for example, it still reaches around 70% accuracy on HAR and around 90% on Epilepsy.

The proposed cross-view prediction task generates robust features and thus improves performance by more than 5% on the HAR dataset, and about 1% on the Sleep-EDF and Epilepsy datasets.

Additionally, the contextual contrasting module further improves the performance, as it helps the features to be more discriminative.

Studying the effect of the augmentations, we find that generating the two views from the same augmentation type is not helpful on the HAR and Sleep-EDF datasets, while the Epilepsy dataset can achieve comparable performance with only one augmentation.

Conclusion

The proposed TS-TCC framework first creates two views for each sample by applying strong and weak augmentations. Then the temporal contrasting module learns robust temporal features by applying a tough cross-view prediction task. We further propose a contextual contrasting module to learn discriminative features upon the learned robust representations.

The experiments show that a linear classifier trained on top of the features learned by TS-TCC performs comparably with supervised training. In addition, TS-TCC is highly effective in few-label and transfer learning scenarios; e.g., using only 10% of the labeled data, TS-TCC can achieve performance close to supervised training with the fully labeled data.

— Jan 21, 2024
