2022 - 2025 Analysis of audio commentary voice and development of speech synthesis algorithm for understanding e-sports
2023 - 2025 Human-in-the-loop adaptive text-to-speech synthesis based on interactive reinforcement learning
2023 - 2024 Self-supervised Generative Spoken Dialogue Modeling from In-the-wild Data
2022 - 2023 Human-in-the-loop training of multi-speaker voice conversion based on federated learning
2021 - 2023 Sustainably trainable speech synthesis based on continual learning
2021 - 2022 Adaptable end-to-end text-to-speech based on error correction feedback from user
2018 - 2021 Research on active speech synthesis based on listener models
Grants for Researchers Attending International Conferences
Papers (44):
Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, Hiroshi Saruwatari. Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence. APSIPA Transactions on Signal and Information Processing. 2024. 13. 1
Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari. JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions. IEEE Access. 2024. 12. 19752-19764
Yuki Saito, Kohei Yatabe, Shogun. Does controller sound contain valuable information for video game scene analysis? Case study by character identification of Super Smash Bros. Ultimate. Acoustical Science and Technology. 2024. 45. 2. 113-116
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari. COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control. 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023. 2023
Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari. ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2023. 2023-August. 3048-3052
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari. Generation of Mid-Attribute Non-existent Speakers by Gaussian Mixture Model Interpolation based on Optimal-Transport. Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM). 2023. 2023
Education:
2018 - 2021 The University of Tokyo, Graduate School of Information Science and Technology, Department of Information Physics and Computing
2016 - 2018 The University of Tokyo, Graduate School of Information Science and Technology, Department of Creative Informatics
2014 - 2016 National Institute of Technology, Kushiro College, Advanced Engineering Course, Electronic Information System Engineering Course
2009 - 2014 National Institute of Technology, Kushiro College, Department of Information Engineering
Degrees (1):
Ph.D. (Information Science and Technology) (The University of Tokyo)
Work history (5):
2024/04 - present The University of Tokyo Lecturer
2023/04 - 2024/03 The University of Tokyo
2021/04 - 2023/03 The University of Tokyo Project Research Associate
2019/04 - 2021/03 The University of Tokyo Research Assistant
2018/04 - 2021/03 Japan Society for the Promotion of Science Research Fellow (DC1)
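Awards:
2023/06 - Information Processing Society of Japan, Ongaku Symposium 2023 (音学シンポジウム2023) Excellent Presentation Award ChatGPT-EDSS: Empathetic dialogue speech synthesis model trained from ChatGPT-derived context word embeddings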
2023/03 - Funai Foundation Funai Information Technology Award for Young Researchers Study on speech synthesis that minimizes the differences between humans and computers
2022/06 - IEICE FY2021 IEICE Best Paper Award Real-time full-band voice conversion with sub-band modeling and data-driven phase estimation of spectral differentials
2021/06 - IEEE Signal Processing Society Young Author Best Paper Award Statistical parametric speech synthesis incorporating generative adversarial networks
2019/01 - NEC C&C Foundation C&C Young Researcher Best Paper Award Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors
2018/11 - IEEE Signal Processing Society Japan Student Journal Paper Award Statistical parametric speech synthesis incorporating generative adversarial networks
2018/08 - IEICE Technical Committee on Speech Research Encouragement Award Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors
2017/03 - IEEE Signal Processing Society Spoken Language Processing Student Grant of ICASSP Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis