AI

Patronus AI Releases Lynx v1.1: An 8B State-of-the-Art RAG Hallucination Detection Model

2 Mins read

Patronus AI released the LYNX v1.1 series, representing a significant step forward in artificial intelligence, particularly in detecting hallucinations in AI-generated content. Hallucinations, in the context of AI, refer to the generation of information that is unsupported or contradictory to the provided data, which poses a considerable challenge for applications relying on accurate and reliable responses. The LYNX models address this problem using retrieval-augmented generation (RAG), a method that helps ensure the answers generated by the AI are faithful to the given documents.

The 70B version of LYNX v1.1 has already demonstrated exceptional performance in this area. On the HaluBench evaluation, which tests for hallucination detection in real-world scenarios, the 70B model achieved an impressive 87.4% accuracy. This performance surpasses other leading models, including GPT-4o and GPT-3.5-Turbo, and it has shown superior accuracy in specific tasks such as medical question answering in PubMedQA.

The 8B version of LYNX v1.1, known as Patronus-Lynx-8B-Instruct-v1.1, is a finely tuned model that balances efficiency and capability. Trained on a diverse set of datasets, including CovidQA, PubmedQA, DROP, and RAGTruth, this version supports a maximum sequence length of 128,000 tokens and is primarily focused on the English language. Advanced training techniques like mixed precision training and flash attention are employed to enhance efficiency without compromising accuracy. Evaluations were conducted on 8 Nvidia H100 GPUs to ensure precise performance metrics.

Since the release of Lynx v1.0, thousands of developers have integrated it into various real-world applications, demonstrating its practical utility. Despite efforts to reduce hallucinations using RAG, large language models (LLMs) can still produce errors. However, Lynx v1.1 significantly improves real-time hallucination detection, making it the best-performing RAG hallucination detection model of its size. The 8B model has shown substantial improvements over baseline models like Llama 3, with an 87.3% score on HaluBench. It outperforms models such as Claude-3.5-Sonnet by 3% and GPT-4o on medical questions by 6.8%. Additionally, compared to Lynx v1.0, it has a 1.4% higher accuracy on HaluBench and surpasses all open-source models on LLM-as-judge tasks.

In conclusion, the LYNX 8B model of the LYNX v1.1 series is a robust and efficient tool for detecting hallucinations in AI-generated content. While the 70B model leads in overall accuracy, the 8B version offers a compelling balance of efficiency and performance. Its advanced training techniques, coupled with substantial performance improvements, make it an excellent choice for various machine learning applications, especially where real-time hallucination detection is critical. Lynx v1.1 is open-source, with open weights and data, ensuring accessibility and transparency for all users.


Check out the Paper, Try it out on HuggingFace Spaces, and Download Lynx v1.1 on HuggingFace. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

Don’t Forget to join our 47k+ ML SubReddit

Find Upcoming AI Webinars here


Shreya Maji is a consulting intern at MarktechPost. She is pursued her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. An AI enthusiast, she enjoys staying updated on the latest advancements. Shreya is particularly interested in the real-life applications of cutting-edge technology, especially in the field of data science.



Source link

Related posts
AI

Critic-RM: A Self-Critiquing AI Framework for Enhanced Reward Modeling and Human Preference Alignment in LLMs

3 Mins read
Reward modeling is critical in aligning LLMs with human preferences, particularly within the reinforcement learning from human feedback (RLHF) framework. Traditional reward…
AI

Meet GRAPE: A Plug-and-Play Algorithm to Generalize Robot Policies via Preference Alignment

3 Mins read
The field of robotic manipulation has witnessed a remarkable transformation with the emergence of vision-language-action (VLA) models. These advanced computational frameworks have…
AI

Adaptive Attacks on LLMs: Lessons from the Frontlines of AI Robustness Testing

2 Mins read
The field of Artificial Intelligence (AI) is advancing at a rapid rate; specifically, the Large Language Models have become indispensable in modern…

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *