Patronus AI Releases Lynx v1.1: An 8B State-of-the-Art RAG Hallucination Detection Model

Patronus AI has released the LYNX v1.1 series, a significant step forward in detecting hallucinations in AI-generated content. In this context, a hallucination is generated information that is unsupported by, or contradicts, the provided source data, which poses a considerable challenge for applications that depend on accurate, reliable responses. The LYNX models target this problem in retrieval-augmented generation (RAG) pipelines, where they judge whether an answer produced by a model is faithful to the retrieved documents.
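To make the task concrete, here is a minimal, illustrative sketch of the three inputs a RAG hallucination detector works with. The field names and example text are purely hypothetical, chosen only to show the shape of the problem:

```python
# Illustrative only: the three inputs a RAG hallucination detector sees.
example = {
    "question": "What year was the Golden Gate Bridge completed?",
    "document": "The Golden Gate Bridge opened to traffic in 1937 "
                "after four years of construction.",
    "answer": "The Golden Gate Bridge was completed in 1942.",  # contradicts the document
}

# A faithful answer would be supported by the document ("1937");
# the answer above is a hallucination because it contradicts it.
```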

The 70B version of LYNX v1.1 has already demonstrated exceptional performance in this area. On HaluBench, an evaluation that tests hallucination detection in real-world scenarios, the 70B model achieved 87.4% accuracy, surpassing other leading models including GPT-4o and GPT-3.5-Turbo, with particularly strong results on medical question answering in PubMedQA.

The 8B version of LYNX v1.1, known as Patronus-Lynx-8B-Instruct-v1.1, is a fine-tuned model that balances efficiency and capability. Trained on a diverse set of datasets, including CovidQA, PubmedQA, DROP, and RAGTruth, this version supports a maximum sequence length of 128,000 tokens and is primarily focused on English. Advanced training techniques such as mixed-precision training and flash attention are employed to improve efficiency without compromising accuracy. Evaluations were conducted on 8 Nvidia H100 GPUs.
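As a rough sketch of how the 8B model could be run locally with Hugging Face transformers, consider the snippet below. The repository id and the prompt layout are assumptions on our part; consult the official Patronus model card for the exact identifier and the required prompt template:

```python
# Minimal sketch, not an official example. The model id and prompt layout
# are assumptions; check the Patronus model card on Hugging Face for the
# exact repository name and required template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision to fit the 8B model on one GPU
    device_map="auto",
)

# Lynx judges whether ANSWER is faithful to DOCUMENT for a given QUESTION.
prompt = (
    "Given the following QUESTION, DOCUMENT and ANSWER, determine whether "
    "the ANSWER is faithful to the DOCUMENT. Respond with PASS if it is "
    "faithful and FAIL if it is not.\n\n"
    "QUESTION: What year did the bridge open?\n"
    "DOCUMENT: The bridge opened to traffic in 1937.\n"
    "ANSWER: The bridge opened in 1942.\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens (the model's verdict).
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```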

Since the release of Lynx v1.0, thousands of developers have integrated it into real-world applications, demonstrating its practical utility. Even with RAG, large language models (LLMs) can still produce errors, and Lynx v1.1 significantly improves real-time hallucination detection, making it the best-performing RAG hallucination detection model of its size. The 8B model scores 87.3% on HaluBench, a substantial improvement over baseline models such as Llama 3; it outperforms Claude-3.5-Sonnet by 3% and beats GPT-4o by 6.8% on medical questions. Compared to Lynx v1.0, it is 1.4% more accurate on HaluBench and surpasses all open-source models on LLM-as-judge tasks.
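For readers who want to reproduce this kind of accuracy number on their own data, the arithmetic behind a HaluBench-style score is simply the fraction of examples where the detector's verdict matches the gold label. The sketch below assumes PASS/FAIL verdicts and a hypothetical record layout:

```python
# Hypothetical HaluBench-style scoring: accuracy is the fraction of
# examples where the detector's PASS/FAIL verdict matches the gold label.
from typing import List

def hallucination_accuracy(predictions: List[str], gold: List[str]) -> float:
    """predictions/gold are parallel lists of 'PASS' or 'FAIL' verdicts."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy illustration: 7 of 8 verdicts agree with the gold labels -> 87.5%.
preds = ["PASS", "FAIL", "PASS", "PASS", "FAIL", "FAIL", "PASS", "PASS"]
gold  = ["PASS", "FAIL", "PASS", "FAIL", "FAIL", "FAIL", "PASS", "PASS"]
print(f"accuracy: {hallucination_accuracy(preds, gold):.1%}")  # 87.5%
```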

In conclusion, the LYNX 8B model of the LYNX v1.1 series is a robust and efficient tool for detecting hallucinations in AI-generated content. While the 70B model leads in overall accuracy, the 8B version offers a compelling balance of efficiency and performance. Its advanced training techniques, coupled with substantial performance improvements, make it an excellent choice for various machine learning applications, especially where real-time hallucination detection is critical. Lynx v1.1 is open-source, with open weights and data, ensuring accessibility and transparency for all users.


Check out the Paper, try it out on HuggingFace Spaces, and download Lynx v1.1 on HuggingFace. All credit for this research goes to the researchers of this project.



Shreya Maji is a consulting intern at MarktechPost. She pursued her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. An AI enthusiast, she enjoys staying updated on the latest advancements. Shreya is particularly interested in the real-life applications of cutting-edge technology, especially in the field of data science.


