I am Mohan Shi, a second-year Ph.D. student in Electrical and Computer Engineering at the University of California, Los Angeles (UCLA), advised by Prof. Abeer Alwan. I received my master's degree from the University of Science and Technology of China (USTC), where I worked under the guidance of Prof. Li-Rong Dai for three years. My research interests span several areas of speech processing:
- Automatic Speech Recognition
- Speech-centric Large Language Models
- Child/Low-resource Speech Processing
- Speech Tokenization
- Cocktail Party Problems
The paper from my summer internship at Microsoft is now on arXiv! We use Phi-4-Multimodal for joint ASR and speaker diarization, achieving state-of-the-art results on both short and long audio.
Experience
Microsoft Research, Redmond, USA
- Research Intern, CoreAI Speech Team, Jun 2025 – Sep 2025
- Manager: Jinyu Li
- Mentor(s): Xiong Xiao, Ruchao Fan, Shaoshi Ling
Tencent AI Lab, Bellevue, USA (remote)
- Research Intern, Seattle Speech Lab, Sep 2023 – Aug 2024
- Manager: Dong Yu
- Mentor(s): Yong Xu, Shi-Xiong (Austin) Zhang
Alibaba Group, Hangzhou, China
- Research Intern, Tongyi Speech Team, Jul 2022 – May 2023
- Manager: Zhijie Yan
- Mentor(s): Shiliang Zhang, Zhihao Du
Selected Publications
Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio, ICASSP 2026 [pdf]
Mohan Shi, Xiong Xiao, Ruchao Fan, Shaoshi Ling, Jinyu Li
STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs, ICASSP 2026 [pdf] [code]
Kaiyuan Zhang*, Mohan Shi*, Eray Eren, Natarajan Balaji Shankar, Zilai Wang, Abeer Alwan
Advancing Multi-talker ASR Performance with Large Language Models, SLT 2024 [pdf]
Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization, Interspeech 2024 (Oral) [pdf]
Zengrui Jin*, Yifan Yang*, Mohan Shi*, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey
CASA-ASR: Context-Aware Speaker-Attributed ASR, Interspeech 2023 [pdf]
Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction, Interspeech 2023 (Oral) [pdf]
Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai
Education
University of California, Los Angeles
- Ph.D. student, Electrical and Computer Engineering, Sep 2024 – Present
- Advisor: Abeer Alwan (Distinguished Prof., Fellow of IEEE/ISCA/ASA)
University of Science and Technology of China
- Master of Engineering, Electronic Engineering and Information Science, Sep 2021 – Jun 2024
- Advisor: Li-Rong Dai
Dalian University of Technology
- Bachelor of Engineering, Electronic Information Engineering, Sep 2017 – Jun 2021
- GPA Rank: 1 / 185
