LI Sheng

リシェン | LI Sheng

Affiliation and department：
Job title： Researcher
Homepage URL (2)： https://ast-astrec.nict.go.jp/member/sheng-li/index-modern-jp.html , https://ast-astrec.nict.go.jp/member/sheng-li/index.html

Research field (1)： Perceptual information processing

Research keywords (5)： large language model (speech, text) , security-aware speech processing , multimodal speech processing , computer assisted language learning , speech recognition/translation

Research theme for competitive and other funds (10)：

2023 - 2028 Creating fundamental technologies for spoken dialogue translation that accurately conveys intent
2023 - 2026 M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition
2023 - 2024 Spoof Detection for Automatic Speaker Verification
2024 - Enhancing large language model
2022 - 2024 Bridging Eurasia from Sea -- Multilingual Speech Recognition for Maritime Silkroad

Papers (107)：

Sheng Li, Chen Chen, Chin Yuen Kwok, Chenhui Chu, Eng Siong Chng, Hisashi Kawai. Investigating ASR Error Correction with Large Language Model and Multilingual 1-best Hypotheses. Interspeech 2024. 2024. 1315-1319
Sheng Li, Jiyi Li, Yang Cao. Automatic Post-Editing of Speech Recognition System Output Using Large Language Models. The DASFAA 2024 Workshop. 2024
Sheng Li, Bei Liu, Jianlong Fu. Revisiting Generative Adversarial Network for Downstream Task of Speech Recognition. 2024
Sheng Li, Jiyi Li, Chenhui Chu. Voices of the Himalayas: Benchmarking Speech Recognition Systems for the Tibetan Language. International Journal of Asian Language Processing. 2024
Yankun Wu, Yuta Nakashima, Noa Garcia, Sheng Li, Zhaoyang Zeng. Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis. Proceedings of the 2024 International Conference on Multimedia Retrieval. 2024

MISC (14)：

Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara. MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction. 2024
Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li. End-to-End Speech-to-Speech Translation toolkit. ACM Multimedia Asia 2023 workshop released toolkit. 2023
Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li. FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection. 2023
Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He. GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System. 2023
Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He. Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization. 2023

Patents (7)：

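Inference device and training method for the inference device
Inference device, inference program, and training method
Method and apparatus for training a language identification model, and computer program therefor
Classifier, trained model, and training method
Speech recognition system, speech recognition method, and trained model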

Books (4)：

Voices of the Himalayas: Investigation of Speech Recognition Technology for the Tibetan Language
2023 ISBN:9784904020289
Bridging Eurasia: Multilingual Speech Recognition for Silkroad
2023 ISBN:9784904020296
Phantom in the Opera: The Vulnerabilities of Speech-based Artificial Intelligence Systems
2022 ISBN:9784904020265
Automatic speech recognition: Speech-to-Speech Translation
Springer Singapore 2020

Lectures and oral presentations (58)：

Improving speech recognition systems by integrating large language models
(NICT Open House 2024 2024)
Combining Large Language Model with Speech Recognition System in Low-resource Settings
(NLP2024 2024)
Investigating effective methods for combining large language model with speech recognition system
(Acoustical Society of Japan 151st (Spring 2024) meeting 2024)
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
(ICT-innovation 2023 (Kyoto Univ.) 2024)
Self-Supervised Learning MOS Prediction with Listener Enhancement
(VoiceMOS mini workshop 2023)

Works (7)：

HSoftmax: Hierarchical Softmax (https://github.com/Derek-Gong/hsoftmax/)
Zhuo Gong, Qianying Liu, Sheng Li, Zhengdong Yang, Yuhang Yang 2020 -
very deep residual time-delay neural network (TDNN) with LFMMI objective implemented with MS-CNTK
online speech recognition module for Erica the human robot
Julius decoder with EESEN CTC acoustic model
VTLN for Julius/HTK acoustic model

Education (3)：

2012 - 2016 Kyoto University Graduate School Ph.D Informatic Science
2007 - 2009 Nanjing University Joint Program of Chinese Academy of Sciences, Chinese University of Hong Kong and Nanjing University M.E
2002 - 2006 Nanjing University B.S Computer Science

Professional career (1)：

Ph.D Informatic Science (Kyoto University)

Work history (8)：

2020 - present National Institute of Information and Communications Technology (NICT) Advanced Speech Technology Laboratory (ASTL) Tenure-track Researcher
2024/02 - 2024/03 Nanyang Technological University visiting researcher
2021/12 - 2023/03 Kyoto University master course advisor
2019/04 - 2019/05 Oxford University Computer science department visiting researcher
2017 - 2019 National Institute of Information and Communications Technology (NICT) Advanced Speech Technology Laboratory (ASTL) researcher (hired by the Tokyo Olympics 2020 project)

Committee career (13)：

- 2026 APSIPA Speech, Language, and Audio (SLA) Technical Committee (till 2026)
2024/06 - 2024/12 Publicity Chair of ACM Multimedia Asia 2024
2024/12 - ACM Multimedia Asia 2024 workshop: Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental) Co-organizer
2024/07 - Session Chair of DASFAA2024
2024/04 - 2024/04 Session Chair of IEEE-ICASSP2024

Awards (22)：

2023/12 - ICASSP2024 ICMC-ASR (In-Car Multi-Channel Automatic Speech Recognition) Challenge top 2 in one track
2023/12 - 1st place in one track in ASRU2023 special session: VoiceMOS challenge
2023/05 - IEEE Signal Processing Society (IEEE-SPS) grant for IEEE-ICASSP2023 oral presentation (Co-supervised PhD student Qianying Liu)
2022 - 1st place in 6 indexes (total 16) of Main/OOD tracks in INTERSPEECH2022 special session: VoiceMOS challenge
2021/12 - Oriental language recognition challenge 2021 3rd/4th place in constrained/unconstrained resource multilingual ASR tracks of OLR2021 challenge

Association Membership(s) (7)：

APNNS (Asia Pacific Neural Network Society) , APSIPA (Asia Pacific Signal and Information Processing Association) , SIG-CSLP (Chinese Spoken Language Processing) , ASJ (Acoustical Society of Japan) , ISCA (International Speech Communication Association) , IEEE/IEEE-SPS , ACM (Association for Computing Machinery)

Researcher information displayed in J-GLOBAL is based on the information registered in researchmap.