Developing effective representations from unlabeled time-series data, given their temporal complexities, poses a significant challenge. In this paper, we propose an unsupervised Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC) to learn time-series representations from unlabeled data. First, the raw time-series data are transformed into two different yet correlated views by using weak and strong augmentations. Second, we propose a novel temporal contrasting module to learn robust temporal representations by designing a tough cross-view prediction task. Last, to further learn discriminative representations, we propose a contextual contrasting module built upon the contexts from the temporal contrasting module.
Time-series data is increasingly collected through IoT devices and wearables for a variety of applications, such as in healthcare and manufacturing. However, due to the complexity and the lack of recognizable patterns in time-series data, it is often difficult and resource-intensive to label this data. As deep learning models typically require extensive labeled data for training, this presents a significant challenge when applying these models to real-world time-series data with limited labeling.
Self-supervised learning has emerged as a promising approach to extract useful representations from unlabeled data. These representations can achieve performance comparable to supervised models when used for downstream tasks, even with a smaller amount of labeled data. Various self-supervised methods have been developed, relying on different pretext tasks to train models and learn representations. However, these tasks can be limiting in terms of the generality of the learned representations.
Contrastive learning, in particular, has demonstrated its effectiveness in the computer vision domain by learning invariant representations from augmented data. This method maximizes the similarity of different views of the same sample while minimizing the similarity with views from different samples. Despite the success of contrastive learning in images, directly applying these methods to time-series data is not straightforward due to the unique properties of time-series, such as temporal dependencies, and the lack of suitability of certain image augmentation techniques for time-series data.
To address these challenges, we propose a novel framework called Time-Series Representation Learning via Temporal and Contextual Contrasting (TS-TCC) for unsupervised learning of time-series representations from unlabeled data. It employs data augmentation techniques tailored to time-series data and incorporates a temporal contrasting module to learn robust temporal features. Additionally, a contextual contrasting module is introduced to further enhance the discriminative power of the learned representations. The framework's effectiveness is demonstrated through extensive experiments on real-world time-series datasets, showing that the learned representations perform well in supervised, semi-supervised, and transfer learning settings.
We propose two separate augmentations: a weak augmentation (jitter-and-scale), which adds random noise to the signal and scales its magnitude, and a strong augmentation (permutation-and-jitter), which splits the signal into a random number of segments, shuffles their order, and then adds random noise. The strong augmentation generates a hard but still plausible view of the same signal, which makes the subsequent cross-view prediction task more challenging.
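A minimal sketch of the two augmentation families on a single-channel signal (the noise scales and maximum segment count below are illustrative choices, not the paper's tuned per-dataset values):

```python
import random

def jitter(x, sigma=0.1):
    """Add Gaussian noise to every timestep."""
    return [v + random.gauss(0.0, sigma) for v in x]

def scale(x, sigma=0.1):
    """Multiply the whole signal by a single random factor."""
    factor = random.gauss(1.0, sigma)
    return [v * factor for v in x]

def permute(x, max_segments=5):
    """Split the signal into a random number of segments and shuffle their order."""
    n = random.randint(2, max_segments)
    cuts = sorted(random.sample(range(1, len(x)), n - 1))
    segments = [x[i:j] for i, j in zip([0] + cuts, cuts + [len(x)])]
    random.shuffle(segments)
    return [v for seg in segments for v in seg]

def weak_augment(x):
    """Weak view: jitter-and-scale."""
    return scale(jitter(x))

def strong_augment(x):
    """Strong view: permutation-and-jitter."""
    return jitter(permute(x))
```

Both views preserve the original length, so the same shared encoder can process either one unchanged.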
Then we use a Transformer as the autoregressive model, summarizing the features up to timestep $t$ into a context vector $c_t$. Its architecture consists of successive blocks of multi-headed attention followed by an MLP block.
The MLP block is composed of two fully-connected layers with a ReLU non-linearity and dropout in between. Pre-norm residual connections, which can produce more stable gradients, are adopted in our Transformer. We stack $L$ identical layers to generate the final features.
Next, we attach a context vector $c$ to the features vector $\psi$, so that the Transformer input becomes $\tilde{\psi} = [c; \psi]$; this context token passes through the Transformer layers along with the features. Finally, we re-attach the context vector from the first position of the final output, such that $c_t = \operatorname{Transformer}(\tilde{\psi})_0$, and use this $c_t$ in the cross-view prediction task.
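Putting these pieces together, here is a dependency-free sketch of the pre-norm Transformer pass with the attached context token. The dimensions, random weights, and single-head attention with identity projections are illustrative stand-ins for the paper's learned multi-headed attention:

```python
import math
import random

random.seed(0)
D = 8   # model dimension (illustrative)
H = 16  # MLP hidden dimension (illustrative)
L = 2   # number of stacked layers (illustrative)

def rand_matrix(rows, cols, s=0.1):
    return [[random.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def layer_norm(v, eps=1e-5):
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(seq):
    """Single-head self-attention with identity Q/K/V projections (stand-in for MHA)."""
    out = []
    for q in seq:
        scores = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in seq])
        out.append([sum(w * v[d] for w, v in zip(scores, seq)) for d in range(D)])
    return out

class Block:
    """One pre-norm Transformer layer: attention + two-layer MLP with ReLU (dropout omitted)."""
    def __init__(self):
        self.W1 = rand_matrix(H, D)
        self.W2 = rand_matrix(D, H)

    def mlp(self, v):
        hidden = [max(0.0, h) for h in matvec(self.W1, v)]  # FC1 + ReLU
        return matvec(self.W2, hidden)                      # FC2

    def __call__(self, seq):
        # Pre-norm residuals: x + Attn(LN(x)), then x + MLP(LN(x))
        attn = self_attention([layer_norm(x) for x in seq])
        seq = [vadd(x, a) for x, a in zip(seq, attn)]
        return [vadd(x, self.mlp(layer_norm(x))) for x in seq]

def context_vector(features, blocks, token):
    """Attach the context token, run the stacked layers, read the context from position 0."""
    seq = [token] + features      # psi_tilde = [c; psi]
    for blk in blocks:
        seq = blk(seq)
    return seq[0]                 # c_t
```

Because the context token attends to every timestep in each layer, its final state summarizes the whole sequence, which is exactly the role $c_t$ plays in the prediction task.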
This part aims to learn more discriminative representations. It starts by applying a non-linear transformation to the contexts using a non-linear projection head. Given a batch of $N$ input samples, we obtain two contexts for each sample from its two augmented views, and thus have $2N$ contexts in total. For a context $c_t^i$, we denote its positive sample $c_t^{i+}$ as the context generated from the other augmented view of the same input, so that $(c_t^i, c_t^{i+})$ form a positive pair. Meanwhile, the remaining $(2N - 2)$ contexts within the same batch are treated as negative samples. The contextual contrasting loss is

$$\mathcal{L}_{CC} = -\sum_{i=1}^{N} \log \frac{\exp\bigl(\operatorname{sim}(c_t^i, c_t^{i+}) / \tau\bigr)}{\sum_{m=1}^{2N} \mathbb{1}_{[m \neq i]} \exp\bigl(\operatorname{sim}(c_t^i, c_t^m) / \tau\bigr)}$$

where $\operatorname{sim}(u, v) = u^{\top} v / \lVert u \rVert \lVert v \rVert$ denotes cosine similarity, $\mathbb{1}_{[m \neq i]}$ is an indicator function evaluating to 1 iff $m \neq i$, and $\tau$ is a temperature parameter.
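The contextual contrasting objective is an NT-Xent-style loss; a dependency-free sketch (the temperature value is illustrative, and here the loss is symmetrized over both views and averaged over all anchors):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contextual_contrast_loss(ctx_a, ctx_b, tau=0.2):
    """NT-Xent-style contextual contrasting over a batch.

    ctx_a, ctx_b: lists of N context vectors from the two augmented views
    (assumed already passed through the projection head).  tau is the
    temperature; 0.2 is an illustrative value."""
    contexts = ctx_a + ctx_b          # 2N contexts in total
    n2 = len(contexts)
    n = len(ctx_a)
    total = 0.0
    for i in range(n2):
        pos = (i + n) % n2            # the same sample's other view
        denom = sum(math.exp(cosine(contexts[i], contexts[m]) / tau)
                    for m in range(n2) if m != i)
        num = math.exp(cosine(contexts[i], contexts[pos]) / tau)
        total += -math.log(num / denom)
    return total / n2
```

When the two views of each sample agree and differ from the rest of the batch, the positive term dominates the denominator and the loss is small.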
The overall self-supervised loss is the combination of the two temporal contrasting losses and the contextual contrasting loss as follows:

$$\mathcal{L} = \lambda_1 \cdot \bigl(\mathcal{L}_{TC}^{s} + \mathcal{L}_{TC}^{w}\bigr) + \lambda_2 \cdot \mathcal{L}_{CC}$$

where $\mathcal{L}_{TC}^{s}$ and $\mathcal{L}_{TC}^{w}$ are the temporal contrasting losses (each view's context predicting the other view's future timesteps), and $\lambda_1$, $\lambda_2$ are fixed scalar hyperparameters denoting the relative weight of each loss.
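As a sanity check, combining the three losses is just a weighted sum (the default λ values below are illustrative placeholders, not the paper's tuned settings):

```python
def total_loss(l_tc_strong, l_tc_weak, l_cc, lam1=1.0, lam2=0.7):
    """Overall self-supervised objective: weighted sum of the two
    temporal contrasting losses and the contextual contrasting loss."""
    return lam1 * (l_tc_strong + l_tc_weak) + lam2 * l_cc
```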
The downstream task in all experiments is classification. We evaluate on four datasets:
Dataset | Train | Test | Length | Channels | Classes |
---|---|---|---|---|---|
HAR | 7352 | 2947 | 128 | 9 | 6 |
Sleep-EDF | 25612 | 8910 | 3000 | 1 | 5 |
Epilepsy | 9200 | 2300 | 178 | 1 | 2 |
FD | 8184 | 2728 | 5120 | 1 | 3 |
The first three datasets are used to evaluate classification performance, while the FD dataset is used to investigate the transferability of the features learned by TS-TCC.
We compare TS-TCC against the following baselines:

- Random init: a linear classifier trained on top of a randomly initialized, frozen encoder
- Supervised: the same architecture (encoder & classifier) trained end-to-end with labeled data
- SSL-ECG
- CPC (Contrastive Predictive Coding)
- SimCLR
With only 1% of the labeled data, TS-TCC fine-tuning achieves significantly better performance than supervised training; for example, it still reaches around 70% accuracy on the HAR dataset and around 90% on the Epilepsy dataset.
The proposed cross-view prediction task generates robust features and thus improves performance by more than 5% on the HAR dataset, and by about 1% on the Sleep-EDF and Epilepsy datasets.
Additionally, the contextual contrasting module further improves the performance, as it helps the features to be more discriminative.
Studying the effect of the augmentations, we find that generating the two views from the same augmentation type does not help on the HAR and Sleep-EDF datasets, while the Epilepsy dataset achieves comparable performance even with only one augmentation.
The proposed TS-TCC framework first creates two views for each sample by applying strong and weak augmentations. Then the temporal contrasting module learns robust temporal features by applying a tough cross-view prediction task. We further propose a contextual contrasting module to learn discriminative features upon the learned robust representations.
The experiments show that a linear classifier trained on top of the features learned by our TS-TCC performs comparably with supervised training. In addition, our proposed TS-TCC is highly effective in few-label and transfer learning scenarios; e.g., using only 10% of the labeled data, TS-TCC achieves performance close to supervised training on the fully labeled data.
Notes — Jan 21, 2024