AASP-P11.3
FROM CONTRAST TO COMMONALITY: AUDIO COMMONALITY CAPTIONING FOR ENHANCED AUDIO-TEXT CROSS-MODAL UNDERSTANDING IN MULTIMODAL LLMS
Yuhang Jia, Xu Zhang, Yujie Guo, Yang Chen, Shiwan Zhao, nankai.edu.cn, China
Session: AASP-P11: Audio Captioning, Retrieval, and Understanding Poster
Track: Audio and Acoustic Signal Processing [AA]
Location: Poster Area 26
Presentation Time: Wed, 6 May, 14:00 - 16:00
Session AASP-P11
AASP-P11.1: SEGMENTWISE PRUNING IN AUDIO-LANGUAGE MODELS
Marcel Gibier, Inria, France; Raphael Duroselle, AMIAD, France; Pierre Serrano, Olivier Boeffard, Inria, France; Jean-François Bonastre, AMIAD, France
AASP-P11.2: HIERARCHICAL ACTIVITY RECOGNITION AND CAPTIONING FROM LONG-FORM AUDIO
Peng Zhang, Qingyu Luo, Philip Jackson, Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland
AASP-P11.3: FROM CONTRAST TO COMMONALITY: AUDIO COMMONALITY CAPTIONING FOR ENHANCED AUDIO-TEXT CROSS-MODAL UNDERSTANDING IN MULTIMODAL LLMS
Yuhang Jia, Xu Zhang, Yujie Guo, Yang Chen, Shiwan Zhao, nankai.edu.cn, China
AASP-P11.4: IMPROVING AUDIO QUESTION ANSWERING WITH VARIATIONAL INFERENCE
Haolin Chen, Idiap Research Institute, Switzerland
AASP-P11.5: ONE MODEL–THREE TASKS: DISCOVERING A SHARED WINNING TICKET FOR LOW-COMPLEXITY AUDIO INTELLIGENCE
Maxim Surkov, ITMO University, Russian Federation
AASP-P11.6: ACAVCAPS: ENABLING LARGE-SCALE TRAINING FOR FINE-GRAINED AND DIVERSE AUDIO UNDERSTANDING
Yadong Niu, MiLM Plus, Xiaomi Inc, Beijing, China; Tianzi Wang, The Chinese University of Hong Kong, Hong Kong, China; Heinrich Dinkel, Xingwei Sun, MiLM Plus, Xiaomi Inc, Beijing, China; Jiahao Zhou, Beijing University of Posts and Telecommunications, Beijing, China; Gang Li, Jizhong Liu, Junbo Zhang, Jian Luan, MiLM Plus, Xiaomi Inc, Beijing, China
AASP-P11.7: CASTELLA: LONG AUDIO DATASET WITH CAPTIONS AND TEMPORAL BOUNDARIES
Hokuto Munakata, LY Corporation, Japan; Takehiro Imamura, Nagoya University, Japan; Taichi Nishimura, Tatsuya Komatsu, LY Corporation, Japan
AASP-P11.8: Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
Runyan Yang, Yuke Si, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang, China Mobile, China
AASP-P11.9: AUDIOSETCAPS: AN ENRICHED AUDIO-CAPTION DATASET USING AUTOMATED GENERATION PIPELINE WITH LARGE AUDIO AND LANGUAGE MODELS
Jisheng Bai, Xi'an University of Posts & Telecommunications, China; Haohe Liu, Meta, United States of America; Mou Wang, Institute of Acoustics, Chinese Academy of Sciences, China; Dongyuan Shi, Northwestern Polytechnical University, China; Wenwu Wang, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Mark Plumbley, King's College London, United Kingdom of Great Britain and Northern Ireland; Woon-seng Gan, Nanyang Technological University, Singapore; Jianfeng Chen, Northwestern Polytechnical University, China
AASP-P11.10: Acoustic Prompt Tuning: Empowering Large Language Models With Audition Capabilities
Jinhua Liang, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland; Xubo Liu, Wenwu Wang, Mark Plumbley, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Huy Phan, Meta, France; Emmanouil Benetos, Queen Mary University of London, United Kingdom of Great Britain and Northern Ireland