J-GLOBAL ID:201801014603986279   Update date: Sep. 17, 2024

LI Sheng

リ シェン | LI Sheng
Affiliation and department:
Job title: Researcher
Homepage URL  (2): https://ast-astrec.nict.go.jp/member/sheng-li/index-modern-jp.html ,  https://ast-astrec.nict.go.jp/member/sheng-li/index.html
Research field  (1): Perceptual information processing
Research keywords  (5): large language model (speech, text) ,  security-aware speech processing ,  multimodal speech process ,  computer assisted language learning ,  Speech Recognition/translation
Research theme for competitive and other funds  (10):
  • 2023 - 2028 Creating foundational technologies for spoken dialogue translation that accurately conveys intent
  • 2023 - 2026 M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition
  • 2023 - 2024 Spoof Detection for Automatic Speaker Verification
  • 2024 - enhancing large language model
  • 2022 - 2024 Bridging Eurasia from Sea -- Multilingual Speech Recognition for Maritime Silkroad
Papers (107):
  • Sheng Li, Chen Chen, Chin Yuen Kwok, Chenhui Chu, Eng Siong Chng, Hisashi Kawai. Investigating ASR Error Correction with Large Language Model and Multilingual 1-best Hypotheses. Interspeech 2024. 2024. 1315-1319
  • Sheng Li, Jiyi Li, Yang Cao. Automatic Post-Editing of Speech Recognition System Output Using Large Language Models. The DASFAA 2024 Workshop. 2024
  • Sheng Li, Bei Liu, Jianlong Fu. Revisiting Generative Adversarial Network for Downstream Task of Speech Recognition. 2024
  • Sheng Li, Jiyi Li, Chenhui Chu. Voices of the Himalayas: Benchmarking Speech Recognition Systems for the Tibetan Language. International Journal of Asian Language Processing. 2024
  • Yankun Wu, Yuta Nakashima, Noa Garcia, Sheng Li, Zhaoyang Zeng. Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis. Proceedings of the 2024 International Conference on Multimedia Retrieval. 2024
MISC (14):
  • Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara. MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction. 2024
  • Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li. End-to-End Speech-to-Speech Translation toolkit. ACM Multimedia Asia 2023 workshop released toolkit. 2023
  • Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li. FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection. 2023
  • Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He. GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System. 2023
  • Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He. Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization. 2023
Patents (7):
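  • Inference device and training method for the inference device
  • Inference device, inference program, and training method
  • Method and apparatus for training a language identification model, and computer program therefor
  • Classifier, trained model, and training method
  • Speech recognition system, speech recognition method, and trained model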
Books (4):
  • Voices of the Himalayas: Investigation of Speech Recognition Technology for the Tibetan Language
    2023 ISBN:9784904020289
  • Bridging Eurasia: Multilingual Speech Recognition for Silkroad
    2023 ISBN:9784904020296
  • Phantom in the Opera: The Vulnerabilities of Speech-based Artificial Intelligence Systems
    2022 ISBN:9784904020265
  • Automatic speech recognition: Speech-to-Speech Translation
    Springer Singapore 2020
Lectures and oral presentations  (58):
  • Improving speech recognition systems by integrating large language models
    (NICT Open House 2024 2024)
  • Combining Large Language Model with Speech Recognition System in Low-resource Settings
    (NLP2024 2024)
  • Investigating effective methods for combining large language model with speech recognition system
    (Acoustical Society of Japan 151st (Spring 2024) meeting 2024)
  • Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
    (ICT-innovation 2023 (Kyoto Univ.) 2024)
  • Self-Supervised Learning MOS Prediction with Listener Enhancement
    (VoiceMOS mini workshop 2023)
Works (7):
  • HSoftmax: Hierarchical Softmax (https://github.com/Derek-Gong/hsoftmax/)
    Zhuo Gong, Qianying Liu, Sheng Li, Zhengdong Yang, Yuhang Yang 2020 -
  • very deep residual time-delay neural network (TDNN) with LFMMI objective implemented with MS-CNTK
  • online speech recognition module for Erica the human robot
  • Julius decoder with EESEN CTC acoustic model
  • VTLN for Julius/HTK acoustic model
Education (3):
  • 2012 - 2016 Kyoto University Graduate School Ph.D Informatic Science
  • 2007 - 2009 Nanjing University Joint Program of Chinese Academy of Sciences, Chinese University of Hong Kong and Nanjing University M.E
  • 2002 - 2006 Nanjing University B.S Computer Science
Professional career (1):
  • Ph.D Informatic Science (Kyoto University)
Work history (8):
  • 2020 - present National Institute of Information and Communications Technology (NICT) Advanced Speech Technology Laboratory (ASTL) Tenure-track Researcher
  • 2024/02 - 2024/03 Nanyang Technological University visiting researcher
  • 2021/12 - 2023/03 Kyoto University master course advisor
  • 2019/04 - 2019/05 Oxford University Computer science department visiting researcher
  • 2017 - 2019 National Institute of Information and Communications Technology (NICT) Advanced Speech Technology Laboratory (ASTL) researcher (hired by Tokyo Olympic2020 project)
Committee career (13):
  • - 2026 APSIPA Speech, Language, and Audio (SLA) Technical Committee (till 2026)
  • 2024/06 - 2024/12 Publicity Chair of ACM Multimedia Asia 2024
  • 2024/12 - Co-organizing ACM Multimedia Asia 2024 workshop: Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental) Co-organizer
  • 2024/07 - Session Chair of DASFAA2024
  • 2024/04 - 2024/04 Session Chair of IEEE-ICASSP2024
Awards (22):
  • 2023/12 - ICASSP2024 ICMC-ASR (In-Car Multi-Channel Automatic Speech Recognition) Challenge top2 in one track
  • 2023/12 - 1st place in one track in ASRU2023 special session: VoiceMOS challenge
  • 2023/05 - IEEE signal processing society IEEE-SPS grant for IEEE-ICASSP2023 oral presentation (Co-supervised PhD student Qianying Liu)
  • 2022 - 1st place in 6 indexes (total 16) of Main/OOD tracks in INTERSPEECH2022 special session: VoiceMOS challenge
  • 2021/12 - Oriental language recognition challenge 2021 3rd/4th place in constrained/unconstrained resource multilingual ASR tracks of OLR2021 challenge
Association Membership(s) (7):
APNNS (Asia Pacific Neural Network Society) ,  APSIPA (Asia Pacific Signal and Information Processing Association) ,  SIG-CSLP (Chinese Spoken Language Processing) ,  ASJ (Acoustical Society of Japan) ,  ISCA (International Speech Communication Association) ,  IEEE/IEEE-SPS ,  ACM (Association for Computing Machinery)
※ Researcher information displayed in J-GLOBAL is based on the information registered in researchmap.
