Technical Program

SS-P5: Multimodal Representation Learning for Language Generation and Understanding

Session Type: Poster
Time: Thursday, May 16, 15:30 - 17:30
Location: Poster Area E, Meeting Room 1A
Session Chairs: Florian Metze (Carnegie Mellon University), Christian Fuegen (Facebook), and Ramon Sanabria (Carnegie Mellon University)
 
SS-P5.1: MODELS OF VISUALLY GROUNDED SPEECH SIGNAL PAY ATTENTION TO NOUNS: A BILINGUAL EXPERIMENT ON ENGLISH AND JAPANESE
         William Havard; LIG, Univ. Grenoble Alpes
         Jean-Pierre Chevrot; LIDILEM, Univ. Grenoble Alpes
         Laurent Besacier; LIG, Univ. Grenoble Alpes
 
SS-P5.2: MULTIMODAL ONE-SHOT LEARNING OF SPEECH AND IMAGES
         Ryan Eloff; Stellenbosch University
         Herman Engelbrecht; Stellenbosch University
         Herman Kamper; Stellenbosch University
 
SS-P5.3: LEARNING FROM MULTIVIEW CORRELATIONS IN OPEN-DOMAIN VIDEOS
         Nils Holzenberger; Johns Hopkins University
         Shruti Palaskar; Carnegie Mellon University
         Pranava Madhyastha; Imperial College London
         Florian Metze; Carnegie Mellon University
         Raman Arora; Johns Hopkins University
 
SS-P5.4: WAV2PIX: SPEECH-CONDITIONED FACE GENERATION USING GENERATIVE ADVERSARIAL NETWORKS
         Amanda Duarte; Barcelona Supercomputing Center
         Francisco Roldan; Universitat Politecnica de Catalunya
         Miquel Tubau; Universitat Politecnica de Catalunya
         Janna Escur; Universitat Politecnica de Catalunya
         Santiago Pascual; Universitat Politecnica de Catalunya
         Amaia Salvador; Universitat Politecnica de Catalunya
         Eva Mohedano; Insight Centre for Data Analytics
         Kevin McGuinness; Insight Centre for Data Analytics
         Jordi Torres; Barcelona Supercomputing Center
         Xavier Giro-i-Nieto; Universitat Politecnica de Catalunya
 
SS-P5.5: NEURAL CODES TO FACTOR LANGUAGE IN MULTILINGUAL SPEECH RECOGNITION
         Markus Müller; Karlsruhe Institute of Technology
         Sebastian Stüker; Karlsruhe Institute of Technology
         Alex Waibel; Karlsruhe Institute of Technology
 
SS-P5.6: MULTIMODAL SPEAKER ADAPTATION OF ACOUSTIC MODEL AND LANGUAGE MODEL FOR ASR USING SPEAKER FACE EMBEDDING
         Yasufumi Moriya; Dublin City University
         Gareth Jones; Dublin City University
 
SS-P5.7: MULTIMODAL GROUNDING FOR SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION
         Ozan Caglayan; Le Mans University
         Ramon Sanabria; Carnegie Mellon University
         Shruti Palaskar; Carnegie Mellon University
         Loïc Barrault; Le Mans University
         Florian Metze; Carnegie Mellon University