J. Shen et al., “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions,” in Proc. ICASSP 2018, 2018, pp. 4779-4783.
W. Nakata et al., “Audiobook speech synthesis conditioned by cross-sentence context-aware word embeddings,” in Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 2021, pp. 211-215.
P. Wu et al., “End-to-end emotional speech synthesis using style tokens and semi-supervised training,” in Proc. APSIPA ASC 2019, 2019, pp. 623-627.
Y.-J. Zhang et al., “Learning latent representations for style control and transfer in end-to-end speech synthesis,” in Proc. ICASSP 2019, 2019, pp. 6945-6949.
J. Pan et al., “A chapter-wise understanding system for text-to-speech in Chinese novels,” in Proc. ICASSP 2021, 2021, pp. 6069-6073.