T. Afouras et al., “My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions,” in Proc. Interspeech 2019, 2019, pp. 4295-4299.
T. Ochiai et al., “Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues,” in Proc. Interspeech 2019, 2019, pp. 2718-2722.