SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions

In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs to a Large Language Model (LLM). SELMA is designed to handle three primary and two auxiliary tasks related to interactions with virtual assistants simultaneously within a single end-to-end model. We employ low-rank adaptation modules for parameter-efficient training of both the audio encoder and the LLM. Additionally, we implement a feature pooling strategy that enables the system to recognize global patterns and improves accuracy on tasks that rely less on individual sequence elements. Experimental results on Voice Trigger (VT) detection, Device-Directed Speech Detection (DDSD), and Automatic Speech Recognition (ASR) demonstrate that our approach significantly simplifies the typical input processing pipeline of virtual assistants and also improves performance compared to dedicated models for each individual task. SELMA yields relative Equal-Error Rate (EER) improvements of 64% on VT detection and 22% on DDSD, while achieving word error rates close to the baseline.
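To make two ideas from the abstract concrete, here is a minimal Python sketch. It assumes the feature pooling is a simple mean over time frames (the abstract does not say which pooling operator is used), and it shows how a relative error-rate improvement is typically computed. The function names and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
from typing import List


def mean_pool(frames: List[List[float]]) -> List[float]:
    """Average per-frame feature vectors into one sequence-level vector.

    Assumption: mean pooling stands in for the pooling strategy the
    abstract mentions; the actual operator may differ.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    n = len(frames)
    return [sum(frame[i] for frame in frames) / n for i in range(dim)]


def relative_improvement(baseline: float, new: float) -> float:
    """Fraction by which an error rate drops relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - new) / baseline


# Hypothetical usage: two 2-dimensional audio frames pooled into one vector.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])

# Hypothetical error rates chosen only to illustrate the formula.
gain = relative_improvement(baseline=10.0, new=3.6)
```

Pooling collapses a variable-length sequence into a fixed-size summary, which suits decisions made once per utterance, such as VT detection or DDSD. ASR, by contrast, depends on the individual sequence elements.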

