I am Mohan Shi, a second-year Ph.D. student in Electrical and Computer Engineering at the University of California, Los Angeles (UCLA), advised by Prof. Abeer Alwan. I received my master's degree from the University of Science and Technology of China (USTC), where I worked under the guidance of Prof. Li-Rong Dai for three years. My research interests span several areas of speech processing:

  • Automatic Speech Recognition
  • Speech-centric Large Language Models
  • Child/Low-resource Speech Processing
  • Speech Tokenization
  • Cocktail Party Problems

The paper from my summer internship at Microsoft is now on arXiv! We use Phi-4-Multimodal for joint ASR and speaker diarization, achieving state-of-the-art results on both short and long audio.

Experience


Microsoft Research, Redmond, USA

Tencent AI Lab, Bellevue, USA (remote)

Alibaba Group, Hangzhou, China

Selected Publications


  1. Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio, ICASSP 2026 [pdf]
    Mohan Shi, Xiong Xiao, Ruchao Fan, Shaoshi Ling, Jinyu Li

  2. STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs, ICASSP 2026 [pdf] [code]
    Kaiyuan Zhang*, Mohan Shi*, Eray Eren, Natarajan Balaji Shankar, Zilai Wang, Abeer Alwan

  3. Advancing Multi-talker ASR Performance with Large Language Models, SLT 2024 [pdf]
    Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

  4. LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization, Interspeech 2024 (Oral) [pdf]
    Zengrui Jin*, Yifan Yang*, Mohan Shi*, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey

  5. CASA-ASR: Context-Aware Speaker-Attributed ASR, Interspeech 2023 [pdf]
    Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

  6. Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction, Interspeech 2023 (Oral) [pdf]
    Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai

Education


University of California, Los Angeles

  • Ph.D. student, Electrical and Computer Engineering, Sep 2024 – Present
  • Advisor: Abeer Alwan (Distinguished Prof., Fellow of IEEE/ISCA/ASA)

University of Science and Technology of China

  • Master of Engineering, Electronic Engineering and Information Science, Sep 2021 – Jun 2024
  • Advisor: Li-Rong Dai

Dalian University of Technology

  • Bachelor of Engineering, Electronic Information Engineering, Sep 2017 – Jun 2021
  • GPA Rank: 1 / 185