Publication
Joint Unsupervised and Supervised Learning for Context-aware Language Identification
2023.04.19

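We introduce the paper “Joint unsupervised and supervised learning for context-aware language identification” by Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, and Yunkyu Lim, which will be presented at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023). ICASSP is a top-tier international conference on acoustics, speech, and signal processing, where researchers in speech signal processing share their latest technologies and research results.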


Conference


• The 48th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) will be held in Rhodes Island, Greece, from June 4 to June 10, 2023, at the Rodos Palace Luxury Convention Center (https://2023.ieeeicassp.org/).

• The paper “Joint unsupervised and supervised learning for context-aware language identification” by Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, and Yunkyu Lim has been accepted to ICASSP 2023.

• Click the link below for details.

https://arxiv.org/abs/2303.16511



Publication


• Title: Joint unsupervised and supervised learning for context-aware language identification

• Authors: Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim

• Abstract: Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However, we need additional text labels to train the model to recognize speech, and acquiring these text labels is costly. In order to overcome this problem, we propose context-aware language identification using a combination of unsupervised and supervised learning without any text labels. The proposed method learns the context of speech through masked language modeling (MLM) loss and simultaneously trains to determine the language of the utterance with supervised learning loss. The proposed joint learning was found to reduce the error rate by 15.6% compared to the same structure model trained by supervised-only learning on a subset of the VoxLingua107 dataset consisting of sub-three-second utterances in 11 languages.
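
The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the MLM targets are discrete unit ids per frame, that the MLM loss is averaged over masked frames only, and that the two losses are combined as a weighted sum. The names `joint_loss`, `mlm_weight`, `frame_targets`, and `mask` are hypothetical and chosen only for illustration.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


def joint_loss(lid_logits, lid_label, frame_logits, frame_targets, mask,
               mlm_weight=1.0):
    """Supervised LID loss plus an MLM loss over masked frames.

    Assumption: the total is supervised + mlm_weight * mlm; the paper's
    exact weighting and target construction are not given in this post.
    """
    # Utterance-level supervised loss: which language was spoken.
    supervised = cross_entropy(lid_logits, lid_label)

    # Frame-level unsupervised loss: predict the target unit of each
    # masked frame from its context; unmasked frames are ignored.
    masked = [
        cross_entropy(logits, target)
        for logits, target, is_masked in zip(frame_logits, frame_targets, mask)
        if is_masked
    ]
    mlm = sum(masked) / len(masked) if masked else 0.0

    return supervised + mlm_weight * mlm, supervised, mlm


# Toy example: 3 candidate languages, 4 frames, 2 discrete target units.
total, sup, mlm = joint_loss(
    lid_logits=[2.0, 0.5, -1.0],
    lid_label=0,
    frame_logits=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -2.0]],
    frame_targets=[0, 1, 0, 1],
    mask=[True, False, True, False],
)
print(f"total={total:.4f} supervised={sup:.4f} mlm={mlm:.4f}")
```

Because both terms are computed from the same utterance, a single backward pass through a shared encoder would update it with both signals at once, and no text transcripts are needed: only a language label per utterance.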


Jinseok Park | Speech



I’m in charge of developing multilingual speech recognition models for conversational intelligence.
