ASBERT: ASR-Specific Self-Supervised Learning with Self-Training
Authors : Hyung Yong Kim, Byeong-Yeol Kim, Seung Woo Yoo, Youshin Lim, Yunkyu Lim, Hanbin Lee
Conference : SLT
Year Published : 2022
Topics : Speech Recognition

Abstract


Pre-training with self-supervised learning (SSL) generally yields good performance on various speech processing tasks. However, this pre-training scheme may lead to a sub-optimal solution when fine-tuning on a specific task, such as automatic speech recognition (ASR). To provide a better pre-trained model for ASR, we introduce an ASR-specific hidden-unit BERT with self-training, named ASBERT. Motivated by self-training, we extract linguistically related pseudo labels from the fine-tuned model and use these labels in the next pre-training procedure. Experimental results on the LibriSpeech test-clean and test-other datasets show that ASBERT without a language model (LM) outperforms the conventional SSL and self-training models, achieving 6.3/2.0% and 15.4/13.2% relative word error rate reductions (RERR). Moreover, without using pseudo-transcriptions, ASBERT yields performance comparable to the conventional self-training method.
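
The abstract describes an iterative loop: pre-train with masked prediction of discrete frame labels (HuBERT-style), fine-tune the encoder for ASR, then extract frame-level pseudo labels from the fine-tuned model to serve as targets for the next pre-training round. Below is a minimal, hedged sketch of that control flow in PyTorch. The encoder, helper names (pretrain_masked_prediction, finetune_ctc, extract_pseudo_labels), and toy data are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Toy stand-in for the transformer encoder: maps feature frames to hidden states."""
    def __init__(self, feat_dim=80, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim))

    def forward(self, x):           # x: (batch, frames, feat_dim)
        return self.net(x)          # (batch, frames, hidden_dim)


def pretrain_masked_prediction(encoder, features, frame_labels, num_classes, steps=100):
    """HuBERT-style pre-training: predict discrete frame labels at masked positions."""
    head = nn.Linear(256, num_classes)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        mask = torch.rand(features.shape[:2]) < 0.3         # mask ~30% of frames
        masked = features.clone()
        masked[mask] = 0.0
        logits = head(encoder(masked))                       # (B, T, num_classes)
        loss = nn.functional.cross_entropy(logits[mask], frame_labels[mask])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder


def finetune_ctc(encoder, features, transcripts, vocab_size, steps=100):
    """ASR fine-tuning with a CTC head on top of the pre-trained encoder."""
    head = nn.Linear(256, vocab_size)                        # index 0 = CTC blank
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    for _ in range(steps):
        log_probs = head(encoder(features)).log_softmax(-1).transpose(0, 1)  # (T, B, V)
        in_lens = torch.full((features.size(0),), features.size(1), dtype=torch.long)
        tgt_lens = torch.tensor([len(t) for t in transcripts])
        loss = ctc(log_probs, torch.cat(transcripts), in_lens, tgt_lens)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder, head


def extract_pseudo_labels(encoder, head, features):
    """Frame-level pseudo labels from the fine-tuned model; these become the
    masked-prediction targets of the next pre-training round."""
    with torch.no_grad():
        return head(encoder(features)).argmax(-1)            # (B, T)


# Toy data: 8 utterances, 50 frames of 80-dim features; initial targets stand in
# for the clustered acoustic units used in the first pre-training round.
feats = torch.randn(8, 50, 80)
initial_labels = torch.randint(0, 100, (8, 50))
transcripts = [torch.randint(1, 30, (10,)) for _ in range(8)]

encoder = TinyEncoder()
pseudo_labels = initial_labels
for round_idx in range(2):            # pre-train -> fine-tune -> relabel -> repeat
    encoder = pretrain_masked_prediction(encoder, feats, pseudo_labels, num_classes=100)
    encoder, ctc_head = finetune_ctc(encoder, feats, transcripts, vocab_size=30)
    pseudo_labels = extract_pseudo_labels(encoder, ctc_head, feats)
```

The key design point conveyed by the abstract is that the second pre-training round no longer relies on acoustically clustered units or pseudo-transcriptions: its targets come from the fine-tuned ASR model itself, so the pre-training objective is pulled closer to the downstream recognition task.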