Boosting Unknown-number Speaker Separation With Transformer Decoder-based Attractor
Speech Separation, ICASSP, 2024
VoxtLM: Unified Decoder-only Models for Consolidating Speech Recognition/Synthesis and Speech/Text Continuation Tasks
Speech Recognition, Synthesis, ICASSP, 2024
Learning Contextualized Representation On Discrete Space Via Hierarchical Product Quantization
Speech Recognition, ICASSP, 2024
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
Speech Separation, IEEE/ACM TASLP, 2023
That's What I Said: Fully-Controllable Talking Face Generation
Computer Vision, Pattern Recognition, ACM/MM, 2023
Luminance-aware Color Transform for Multiple Exposure Correction
Computer Vision, Enhancement, ICCV, 2023
SlaBins: Fisheye Depth Estimation using Slanted Bins on Road Environments
Computer Vision, 3D, ICCV, 2023
SpeedFormer: Learning Speed Profiles with Upper and Lower Boundary Constraints Based on Transformer
Motion Planning, IROS, 2023
FACTSpeech: Speaking a Foreign Language Pronunciation Using Only Your Native Characters
Speech Synthesis, Interspeech, 2023
MiLO: Multi-task Learning with Localization Ambiguity Suppression for Occupancy Prediction
Computer Vision, 3D, CVPRW, End-to-end autonomous driving, 2023
RUFI: Reducing Uncertainty in behavior prediction with Future Information
Machine Learning, Motion Prediction, CVPRW, Vision-Centric Autonomous Driving, 2023
BAAM: Monocular 3D pose and shape reconstruction with bi-contextual attention module and attention-guided modeling
Computer Vision, 3D, CVPR, 2023
Masked Token Similarity Transfer for Compressing Transformer-Based ASR Models
Speech Recognition, ICASSP, 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Speech Synthesis, Cross-Lingual, ICASSP, 2023
Metric Learning for User-defined Keyword Spotting
Keyword Spotting, ICASSP, 2023
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling
Speech Enhancement, ICASSP, 2023
Joint unsupervised and supervised learning for context-aware language identification
Language Identification, ICASSP, 2023
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
Speaker Separation, ICASSP, 2023
ASBERT: ASR-Specific Self-Supervised Learning with Self-Training
Speech Recognition, SLT, 2022
An Empirical Study of Training Mixture Generation Strategies on Speech Separation: Dynamic Mixing and Augmentation
Speech Separation, APSIPA, 2022
Self-supervised surround-view depth estimation with volumetric feature fusion
Computer Vision, Depth Estimation, NeurIPS, 2022
Character decomposition to resolve class imbalance problem in Hangul OCR
Computer Vision, OCR, ECCVW, TiE: Text in Everything, 2022
Eigenlanes: Data-driven lane descriptors for structurally diverse lanes
Computer Vision, Lane Detection, CVPR, 2022
Harmonious semantic line detection via maximal weight clique selection
Computer Vision, CVPR, 2021
Instance-level future motion estimation in a single image based on ordinal regression
Computer Vision, ICCV, 2019
Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation
Computer Vision, Domain Adaptation, ICCV, 2019
Anchor Loss: Modulating Loss Scale based on Prediction Difficulty
Computer Vision, Machine Learning, ICCV, 2019