Generative AI has emerged as a pivotal field with the rise of large language models (LLMs). These models are capable of producing complex outputs based on a variety of prompts. One notable area within this domain is Retrieval Augmented Generation (RAG), which integrates external information into LLMs to enhance factual accuracy. RAG specifically addresses the need to produce reliable, contextually relevant information. With rapid advancements in this area, RAG frameworks have become central to solving knowledge-based tasks, where models are required to generate answers grounded in external sources. This reliance on external documents has prompted researchers to refine and develop models that can better comprehend the context and deliver results with minimal errors.
However, despite these advancements, large language models still struggle to process conflicting or insufficient information. Many LLMs are prone to hallucination, generating responses that are factually incorrect or irrelevant to the context provided. In some cases, when insufficient contextual information is available, these models revert to their pre-trained knowledge, which may not align with the specific requirements of the task at hand. They also often struggle with multi-hop reasoning, which requires them to infer answers by synthesizing multiple pieces of context. As the demand for accurate, context-grounded answers grows, the need for models that can efficiently handle these complexities becomes critical. The challenge remains to improve these models’ ability to process external contexts without generating unreliable information or omitting essential citations.
Existing approaches in Retrieval Augmented Generation involve a retriever that locates relevant documents and a generator, often an LLM, that processes the retrieved context to generate responses. These setups, though useful, are limited in several ways. For instance, models like GPT-4o and Command-R+ rely heavily on large parameter counts: 104 billion parameters for Command-R+ and a reported 79.24 billion for GPT-4o, a figure OpenAI has not officially confirmed. Despite their size, these models frequently struggle when conflicting information is presented. This often leads to inaccuracies and a failure to handle unanswerable queries, a significant drawback in knowledge-dependent scenarios. Existing models are not specifically tuned to prioritize reliability in their outputs, so they often fall back on pre-trained knowledge instead of using new, relevant information from the retrieved context.
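To make the retriever-plus-generator setup concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the word-overlap retriever, the prompt wording, and the `llm` callable are all hypothetical stand-ins, not the components used by SFR-RAG or any of the models named above.

```python
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever (assumption): rank documents by word overlap with the query."""
    query_counts = Counter(tokenize(query))

    def score(doc: str) -> int:
        doc_counts = Counter(tokenize(doc))
        return sum(min(query_counts[t], doc_counts[t]) for t in query_counts)

    ranked = sorted(documents, key=score, reverse=True)
    # Drop documents that share no words with the query at all.
    return [doc for doc in ranked[:k] if score(doc) > 0]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Hypothetical prompt that asks the generator to stay within the context."""
    context_block = "\n".join(f"- {c}" for c in contexts) or "(no context found)"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say you cannot answer.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


def answer(query: str, documents: list[str], llm: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, and hand it to the generator (any callable)."""
    return llm(build_prompt(query, retrieve(query, documents)))
```

In practice the retriever would be a dense or hybrid search index and `llm` would wrap a real model call; the point of the sketch is only the division of labor between the two stages, including the empty-context case that corresponds to an unanswerable query.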
Researchers at Salesforce AI Research introduced a new model called SFR-RAG, a 9-billion-parameter model fine-tuned for context-grounded generation. Despite being considerably smaller than these models, SFR-RAG was designed to outperform its larger counterparts in specific tasks requiring retrieval-augmented answers. The model is tailored to minimize hallucination and handle scenarios where the contextual information is insufficient or conflicting. By keeping the parameter count low while maintaining high performance, the team aimed to introduce a model that would be more efficient without sacrificing accuracy. The SFR-RAG model also incorporates function-calling capabilities, allowing it to dynamically interact with external tools to retrieve high-quality contextual information.
SFR-RAG’s approach includes a novel chat template, which adds two key roles, “Thought” and “Observation.” The Thought role enables the model to reason through multiple steps internally, while the Observation role captures any external information retrieved by the model during its process. This structure allows SFR-RAG to keep internal reasoning and retrieved evidence separate from the final reply, and to generate accurate, user-friendly responses. The model is also fine-tuned to be resilient against low-quality or irrelevant contexts, distinguishing it from traditional LLMs that often falter under such conditions. SFR-RAG’s architecture enables it to perform complex multi-hop reasoning, synthesizing multiple pieces of retrieved information to generate coherent and factual responses.
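One way to picture a chat template with Thought and Observation roles is sketched below in Python. The role names follow the description above, but everything else is an assumption made for illustration: the `Turn` class, the `<|role|>` delimiters, and the example conversation are hypothetical and are not the template Salesforce actually ships.

```python
from dataclasses import dataclass

# Role names taken from the article; the rest of this format is hypothetical.
ROLES = {"system", "user", "thought", "observation", "assistant"}
USER_FACING = {"user", "assistant"}


@dataclass
class Turn:
    role: str
    content: str

    def __post_init__(self) -> None:
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")


def render(conversation: list[Turn]) -> str:
    """Serialize all turns with assumed <|role|> delimiters for the model."""
    return "\n".join(f"<|{t.role}|>\n{t.content}" for t in conversation)


def visible_to_user(conversation: list[Turn]) -> list[Turn]:
    """Hide internal reasoning and retrieved evidence from the end user."""
    return [t for t in conversation if t.role in USER_FACING]


conversation = [
    Turn("user", "Who directed the film that won Best Picture in 1998?"),
    Turn("thought", "I need the 1998 winner first, then its director."),
    Turn("observation", "Retrieved: Titanic won Best Picture at the 1998 ceremony."),
    Turn("observation", "Retrieved: Titanic was directed by James Cameron."),
    Turn("assistant", "James Cameron directed Titanic, the 1998 Best Picture winner."),
]
```

Keeping the roles distinct means the model sees the full trace through `render`, while an application can show only the turns returned by `visible_to_user`. The two observation turns also illustrate how a multi-hop question can accumulate evidence across steps.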
Experimental results demonstrated the success of SFR-RAG, particularly in the ContextualBench evaluation suite. This suite comprises seven contextual tasks, including HotpotQA, TriviaQA, and TruthfulQA, designed to test models’ ability to generate accurate, contextually relevant answers. Despite having significantly fewer parameters, SFR-RAG achieved state-of-the-art results in three of these seven tasks, outperforming larger models like GPT-4o in key areas. For example, in 2WikiHopQA, SFR-RAG exhibited a 25% increase in performance compared to GPT-4o. It also performed competitively across other benchmarks, including Natural Questions and Musique. Notably, SFR-RAG’s performance remained robust even when facts in the context were altered or when the context contained conflicting information. This resilience is crucial for applications where accurate information retrieval is necessary, and the results underscore the effectiveness of SFR-RAG’s architecture.
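A robustness check of this kind can be sketched as follows. This is a hedged illustration, not the ContextualBench harness: the exact-match metric, the `Example` tuple layout, and the assumption that a faithful model should follow the supplied context even when its facts have been altered are all choices made for this example.

```python
import string
from typing import Callable

# (question, context, gold answer) -- a hypothetical example layout.
Example = tuple[str, str, str]
Model = Callable[[str, str], str]  # maps (question, context) to an answer


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)


def accuracy(model: Model, examples: list[Example]) -> float:
    """Fraction of examples where the model's answer matches the gold answer."""
    if not examples:
        return 0.0
    hits = sum(exact_match(model(q, ctx), gold) for q, ctx, gold in examples)
    return hits / len(examples)


# The second example alters the fact in the context; under the assumption
# above, the gold answer follows the altered context.
examples = [
    ("What is the capital of France?", "The capital of France is Paris.", "Paris"),
    ("What is the capital of France?", "The capital of France is Lyon.", "Lyon"),
]
```

Comparing `accuracy` on the original and altered sets gives a rough measure of how much a model leans on its pre-trained knowledge rather than on the context it was given.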
In conclusion, SFR-RAG presents a major advancement in Retrieval Augmented Generation by addressing the common problems larger models face. Its relatively small parameter count of 9 billion allows it to operate efficiently while maintaining high accuracy and reliability. By introducing innovative features like the Thought and Observation roles, SFR-RAG can handle complex, multi-step reasoning while avoiding the pitfalls of hallucination and irrelevant context generation. Its impressive performance across various benchmarks, including state-of-the-art results in multiple tasks, highlights the potential of smaller, fine-tuned models in generating accurate, context-grounded outputs. In the evolving field of generative AI, SFR-RAG represents a shift towards more efficient, reliable models that can better handle the challenges of external context processing.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.