Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.
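To make the idea of focusing sampling effort on promising partial solutions concrete, here is a minimal sketch of a twisted SMC loop on a toy problem. It is not the paper's implementation. Every name in it (`twisted_smc`, `resample`, `toy_propose`, `toy_value`) is hypothetical, the "generator" is a fair coin standing in for an LLM, and the value function is computed exactly for the toy, whereas the abstract describes estimating it. The weighting rule used here, the ratio of successive value estimates, is one common choice for twisted SMC and is an assumption rather than something stated in the source.

```python
import random

NUM_STEPS = 6  # toy "solution" length (assumption for illustration)


def toy_propose(prefix, rng):
    """Stand-in for the generator: each step is 0 or 1 with equal probability."""
    return rng.randint(0, 1)


def toy_value(prefix):
    """Expected future reward of a partial solution in the toy problem.

    The reward is 1 only if every step equals 1, so a prefix that already
    contains a 0 has value 0; otherwise each remaining step succeeds with
    probability 0.5.
    """
    if 0 in prefix:
        return 0.0
    return 0.5 ** (NUM_STEPS - len(prefix))


def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their weights."""
    if sum(weights) == 0:
        return list(particles)  # degenerate case: nothing to prefer
    return rng.choices(particles, weights=weights, k=len(particles))


def twisted_smc(propose_step, value_fn, num_steps, num_particles, seed=0):
    """Extend all particles one step at a time, reweight, and resample."""
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    for _ in range(num_steps):
        extended, weights = [], []
        for prefix in particles:
            prev_value = value_fn(prefix)
            new_prefix = prefix + [propose_step(prefix, rng)]
            extended.append(new_prefix)
            # Incremental weight: how much the value estimate changed.
            weights.append(value_fn(new_prefix) / prev_value if prev_value > 0 else 0.0)
        particles = resample(extended, weights, rng)
    return particles


if __name__ == "__main__":
    final = twisted_smc(toy_propose, toy_value, NUM_STEPS, num_particles=32)
    solved = sum(1 for p in final if all(step == 1 for step in p))
    print(f"{solved} of {len(final)} particles reach reward 1")
```

In this toy, plain independent sampling reaches the reward with probability 0.5 ** 6 per sample, while the resampling step discards partial solutions as soon as their value drops to zero and reallocates those particles to prefixes that can still succeed. With a learned, imperfect value estimate the pruning would be softer, but the mechanism is the same.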

