I am Mohan Shi, a first-year Ph.D. student in Electrical and Computer Engineering at the University of California, Los Angeles (UCLA), where I am fortunate to be advised by Prof. Abeer Alwan (Fellow of IEEE/ISCA/ASA). I received my master's degree from the University of Science and Technology of China (USTC), where I had the pleasure of working under the guidance of Prof. Li-Rong Dai for three years. My research interests span several areas of speech processing:

  • Automatic Speech Recognition
  • Discrete Speech Units
  • Large Language Models
  • Cocktail Party Problem

I am eagerly seeking research internship opportunities for the summer of 2025. Please reach out if you have any leads.

Publications


  1. Advancing Multi-talker ASR Performance with Large Language Models, SLT 2024 [pdf]
    Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

  2. LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization, Interspeech 2024 (Oral) [pdf]
    Zengrui Jin*, Yifan Yang*, Mohan Shi*, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey

  3. CASA-ASR: Context-Aware Speaker-Attributed ASR, Interspeech 2023 [pdf]
    Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

  4. Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction, Interspeech 2023 (Oral) [pdf]
    Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai

  5. A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings, APSIPA ASC 2023 [pdf]
    Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai

  6. The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR, ASRU 2023 [pdf]
    Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

  7. Non-autoregressive End-to-End Speaker-Attributed ASR, ASRU 2023 [pdf]
    Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie

  8. The USTC-NELSLIP offline speech translation systems for IWSLT 2022, IWSLT 2022 [pdf]
    Weitai Zhang, Zhongyi Ye, Haitao Tang, Xiaoxi Li, Xinyuan Zhou, Jing Yang, Jianwei Cui, Pan Deng, Mohan Shi, Yifan Song, Dan Liu, Junhua Liu, Lirong Dai

Education


University of California, Los Angeles

  • Ph.D. student, Electrical and Computer Engineering, Sep 2024 – Present
  • Advisor: Dr. Abeer Alwan (Distinguished Prof. & Vice Chair, Fellow of IEEE/ISCA/ASA)

University of Science and Technology of China

  • Master of Engineering, Electronic Engineering and Information Science, Sep 2021 – Jun 2024
  • Advisor: Dr. Li-Rong Dai (Deputy Director of NERC-SLIP)

Dalian University of Technology

  • Bachelor of Engineering, Electronic Information Engineering, Sep 2017 – Jun 2021
  • Rank: 1 / 185

Experience


Tencent AI Lab, Bellevue, USA (remote)

Alibaba Damo Academy, Hangzhou, China