This AI Research from China Explains How Common 7B Language Models Already Possess Strong Mathematical Capabilities

Large Language Models (LLMs) have demonstrated impressive capabilities across many domains. From generating human-like content and answering questions to summarizing long passages, completing code, and translating languages, LLMs are among the most significant advances in Artificial Intelligence (AI).

However, it is widely believed that for language models to have strong mathematical capabilities, they must either be very large in scale or go through rigorous pre-training on mathematical data. A recent study challenges this idea by demonstrating that the LLaMA-2 7B model already displays strong mathematical abilities, even with standard pre-training.

When the best response is chosen from 256 random generations, the model reaches accuracy rates of 97.7% and 72.0% on the GSM8K and MATH benchmarks, respectively. The main problem with the existing base model is that, although it can produce correct answers, it cannot reliably elicit its innate mathematical capabilities. When only the first response is considered, accuracy drops considerably to 49.5% and 7.9% on GSM8K and MATH, respectively, which highlights this discrepancy.
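The gap between the two figures can be illustrated with a minimal sketch. It assumes that "best of 256" means a problem counts as solved when at least one sampled answer is correct; the article does not spell out the exact selection rule. The function names and the toy data below are hypothetical and only illustrate the two metrics.

```python
from typing import Sequence

def first_response_accuracy(samples_correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems whose first sampled answer is correct."""
    return sum(flags[0] for flags in samples_correct) / len(samples_correct)

def best_of_n_accuracy(samples_correct: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems with at least one correct answer among all samples.

    Assumption: this is how "best of N" is scored; the article does not say.
    """
    return sum(any(flags) for flags in samples_correct) / len(samples_correct)

# Toy data (made up): 4 problems, 4 sampled answers each, True = correct.
toy = [
    [False, True, False, False],
    [True, True, False, True],
    [False, False, False, False],
    [False, False, True, False],
]

print(first_response_accuracy(toy))  # 0.25
print(best_of_n_accuracy(toy))       # 0.75
```

In this toy case the model "knows" the answer to three of four problems but gets only one right on the first try, which mirrors the kind of discrepancy the article describes.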

To address this issue, the team proposes scaling up supervised fine-tuning (SFT) data. Increasing the amount of fine-tuning data can greatly improve the accuracy of the generated responses. However, the scarcity of publicly available math problems limits how far this scaling can go. To get around this restriction, the team used synthetic data, which works almost as well as real data.

The team generated synthetic math problems with the GPT-4 Turbo model and found that a basic generation approach followed by verification with GPT-4 Turbo is highly effective. Using synthetically generated math problems allows the supervised fine-tuning data to be scaled up substantially, with accuracy nearly matching that obtained with real data.
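A generate-then-verify loop of this kind might look roughly like the sketch below. The article gives no prompts or implementation details, so everything here is an assumption: the `generate` and `verify` callables are hypothetical stand-ins for calls to a generator and a verifier model, and dropping examples that fail verification is one possible reading of "verification".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class MathExample:
    question: str
    answer: str

def build_synthetic_set(
    seeds: Iterable[str],
    generate: Callable[[str], MathExample],  # hypothetical wrapper around a generator model
    verify: Callable[[MathExample], bool],   # hypothetical wrapper around a verifier model
) -> List[MathExample]:
    """Generate one candidate per seed and keep only those that pass verification."""
    kept = []
    for seed in seeds:
        candidate = generate(seed)
        if verify(candidate):
            kept.append(candidate)
    return kept

# Stand-in stubs so the sketch runs without any model access.
def fake_generate(seed: str) -> MathExample:
    a, b = (int(x) for x in seed.split("+"))
    # Deliberately wrong when a == b, to show the verifier filtering it out.
    answer = a + b if a != b else a + b + 1
    return MathExample(f"What is {a} + {b}?", str(answer))

def fake_verify(example: MathExample) -> bool:
    numbers = [int(tok) for tok in example.question.rstrip("?").split() if tok.isdigit()]
    return str(sum(numbers)) == example.answer

data = build_synthetic_set(["1+2", "3+3", "4+5"], fake_generate, fake_verify)
print([(ex.question, ex.answer) for ex in data])
```

With these stubs, the faulty "3 + 3" example is rejected and the other two are kept. In a real pipeline the kept examples would be added to the SFT training set.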

By using this simple method, the team improved accuracy noticeably. They attained 82.6% accuracy on GSM8K and 40.6% accuracy on MATH using LLaMA-2 7B models, exceeding earlier models by 14.2 and 20.8 percentage points, respectively.

The research also offers insights into scaling behaviors across different error types and levels of reasoning difficulty. This analysis clarifies how to reduce errors during scaling and helps explain how the model’s performance changes as data volume increases.

In conclusion, this study demonstrates that language models can attain excellent mathematical capabilities without requiring very large models or math-intensive pre-training. Considerable progress in mathematical problem-solving with language models can be made by using synthetic data and increasing the amount of supervised fine-tuning data.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
