
MARRS: Multimodal Reference Resolution System


* All authors listed contributed equally to this work

Successfully handling context is essential for any dialog understanding task. This context may be conversational (relying on previous user queries or system responses), visual (relying on what the user sees, for example, on their screen), or background (based on signals such as a ringing alarm or playing music). In this work, we present an overview of MARRS, or Multimodal Reference Resolution System, an on-device framework within a Natural Language Understanding system that is responsible for handling conversational, visual, and background context. In particular, we present different machine learning models to enable handling contextual queries: specifically, one to enable reference resolution, and one to handle context via query rewriting. We also describe how these models complement each other to form a unified, coherent, lightweight system that can understand context while preserving user privacy.
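To make the division of labor in that description concrete, below is a minimal sketch of how a reference-resolution step and a query-rewriting step might cooperate over conversational, visual, and background context. All names and the rule-based heuristics here are illustrative assumptions, not the actual MARRS API: the real components are learned models, whereas this sketch uses hand-written rules purely to show the data flow.

```python
# Hypothetical sketch of a MARRS-style pipeline; names and rules are
# illustrative assumptions, not the actual on-device models.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    """The three context sources named in the abstract."""
    conversation: list[str] = field(default_factory=list)        # prior turns
    screen_entities: list[str] = field(default_factory=list)     # visual context
    background_signals: list[str] = field(default_factory=list)  # e.g. "alarm"


REFERRING_WORDS = {"it", "that", "this", "them"}


def resolve_reference(query: str, ctx: Context) -> Optional[str]:
    """Toy reference resolver: bind a referring expression to the most
    salient candidate, preferring visual over conversational context."""
    tokens = set(query.lower().split())
    if tokens & REFERRING_WORDS:
        if ctx.screen_entities:
            return ctx.screen_entities[0]
        if ctx.conversation:
            return ctx.conversation[-1]
    return None


def rewrite_query(query: str, ctx: Context) -> str:
    """Toy query rewriter: fold context into the query so that downstream
    NLU can treat it as self-contained."""
    referent = resolve_reference(query, ctx)
    if referent is not None:
        tokens = [referent if t.lower() in REFERRING_WORDS else t
                  for t in query.split()]
        return " ".join(tokens)
    if "stop" in query.lower() and ctx.background_signals:
        # A bare "stop" while an alarm rings most plausibly targets the alarm.
        return f"stop the {ctx.background_signals[0]}"
    return query


if __name__ == "__main__":
    ctx = Context(screen_entities=["the coffee shop on Main St"],
                  background_signals=["alarm"])
    print(rewrite_query("call it", ctx))  # -> "call the coffee shop on Main St"
    print(rewrite_query("stop", ctx))     # -> "stop the alarm"
```

Structuring the resolver and rewriter as separate stages mirrors the complementary roles described above: the resolver grounds referring expressions in context, while the rewriter produces a standalone query, so the rest of the NLU stack never needs direct access to the raw context.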

