The rapid advancement of foundational large language models (LLMs) has transformed multiple domains, significantly boosting performance across a wide range of downstream tasks. In recent years, LLMs have also been increasingly applied to core speech and audio processing tasks such as automatic speech recognition (ASR) and audio captioning, and have enabled new tasks such as open-ended question answering. Despite growing interest in this area, however, the adoption of LLMs for speech and audio tasks has been comparatively slow due to several challenges: the limited availability of high-quality data, especially in non-English languages; the absence of comprehensive evaluation metrics; and the need for improved architectures and training methodologies that can effectively address the unique complexities of speech and audio processing.
The first Workshop on Speech and Audio Language Models (SALMA), co-located with ICASSP 2025, focuses on how LLMs can be leveraged to advance speech and audio processing. The workshop aims to bring together researchers specializing in speech, audio, and language models to foster in-depth discussion and identify synergies, with the goal of developing effective methodologies for applying LLMs to improve performance across tasks in the speech, audio, and music domains, including classification, generation, and retrieval. The workshop will also address fundamental open questions in this area.
For any questions, please email us at salmaicassp2025@gmail.com.