
Can Language Models Reason Beyond Words? Exploring Implicit Reasoning in Multi-Layer Hidden States for Complex Tasks


Large Language Models (LLMs) have shown remarkable capabilities in tasks such as language understanding and reasoning, marking a paradigm shift in how we interact with AI systems. To improve LLM performance on reasoning tasks, researchers commonly use chain-of-thought prompting, in which the model works through intermediate reasoning steps before giving its response. Although this technique resembles how humans solve a problem, it does not fully utilize the computational capacity of LLMs, and the authors of this paper explore an alternative approach to reasoning.

Chain-of-thought (CoT) methods have shown great results, but their downside is that they delay the generation of the final answer, because the model must first write out every intermediate step. The researchers have introduced a new approach called implicit chain-of-thought that, as the name suggests, makes the steps involved in CoT reasoning implicit so that the model produces the final answer directly.

Unlike explicit CoT reasoning, where the LLM is trained to produce the intermediate steps before the final output, in implicit CoT reasoning the model sees the intermediate steps only during training, not at test time. It carries out the reasoning in its internal hidden states rather than in generated text, bypassing explicit reasoning steps.
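To make the distinction concrete, here is a minimal Python sketch of what the two output formats could look like for multi-digit multiplication. The step format (one shifted partial product per digit of the second operand) and the function names `partial_products`, `explicit_cot_target`, and `implicit_target` are illustrative assumptions, not the paper's exact data format.

```python
def partial_products(a: int, b: int) -> list[int]:
    """Return one shifted partial product per digit of b, least significant first."""
    products = []
    for place, digit_char in enumerate(reversed(str(b))):
        products.append(a * int(digit_char) * 10 ** place)
    return products


def explicit_cot_target(a: int, b: int) -> str:
    """Explicit CoT (assumed format): emit the intermediate steps, then the answer."""
    steps = " + ".join(str(p) for p in partial_products(a, b))
    return f"{steps} = {a * b}"


def implicit_target(a: int, b: int) -> str:
    """Implicit CoT: at test time only the final answer is emitted."""
    return str(a * b)


print(explicit_cot_target(1234, 5678))  # 9872 + 86380 + 740400 + 6170000 = 7006652
print(implicit_target(1234, 5678))      # 7006652
```

In the explicit case the answer appears only after every step has been generated; in the implicit case those steps would have to be handled inside the model's hidden states.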

The researchers used a ‘teacher training’ method instead of the traditional ‘teacher forcing’ method to achieve implicit CoT reasoning. Their strategy first involves training a student model to read the hidden states of a teacher model, one trained to reason with explicit CoT, and to use some of those states to produce the final answer. They then employ knowledge distillation, the process of transferring knowledge from one model to another: an emulator is trained to predict the teacher’s hidden states from the input alone. Importantly, this emulation happens vertically, across the model’s layers, rather than token by token, which eliminates the need for explicit reasoning steps.
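The idea of using "some of" the teacher's hidden states, one per layer, can be sketched in plain Python. The article does not spell out the selection rule, so the evenly spaced diagonal used here, along with the name `select_diagonal` and the nested-list layout, is an assumption made for illustration only.

```python
def select_diagonal(hidden_states: list[list[list[float]]]) -> list[list[float]]:
    """Pick one hidden vector per layer, stepping rightward through the CoT positions.

    hidden_states[layer][position] is the teacher's hidden vector at that layer
    while it reads the intermediate-step token at that position.
    """
    num_layers = len(hidden_states)
    num_positions = len(hidden_states[0])
    selected = []
    for layer in range(num_layers):
        if num_layers == 1:
            position = 0
        else:
            # Assumed rule: spread the layers evenly across the CoT positions.
            position = round(layer * (num_positions - 1) / (num_layers - 1))
        selected.append(hidden_states[layer][position])
    return selected


# Toy grid: 4 layers x 7 positions; each "vector" just records its coordinates.
grid = [[[float(layer), float(pos)] for pos in range(7)] for layer in range(4)]
print(select_diagonal(grid))
```

The result has exactly one vector per layer, which is the kind of fixed-size target an emulator could be trained to predict "vertically" from the input.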

The final step involves combining the emulator with the student, which produces the final output based on the emulated teacher’s thought process. The integrated system is then optimized end-to-end, enabling the student model to develop its own reasoning methods, which may differ from the teacher’s.
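The coupled system at inference time can be pictured as a simple composition: the emulator maps the input to predicted teacher states, and the student maps the input plus those states to the answer. The sketch below uses hypothetical names (`ImplicitCoTPipeline`, `toy_emulator`, `toy_student`) and trivial stand-in functions rather than trained networks; it shows only the data flow, not the training procedure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

HiddenStates = Sequence[Sequence[float]]  # one vector per layer


@dataclass
class ImplicitCoTPipeline:
    emulator: Callable[[str], HiddenStates]      # input -> predicted teacher states
    student: Callable[[str, HiddenStates], str]  # input + states -> final answer

    def answer(self, question: str) -> str:
        # The teacher itself is not needed here: its states are emulated.
        states = self.emulator(question)
        return self.student(question, states)


def toy_emulator(question: str) -> HiddenStates:
    """Stand-in emulator: encodes the two operands as one-element 'states'."""
    a, b = (int(part) for part in question.split("*"))
    return [[float(a)], [float(b)]]


def toy_student(question: str, states: HiddenStates) -> str:
    """Stand-in student: reads the answer off the emulated states."""
    return str(int(states[0][0] * states[1][0]))


pipeline = ImplicitCoTPipeline(emulator=toy_emulator, student=toy_student)
print(pipeline.answer("12*34"))  # 408
```

In the real system both components are neural networks, and optimizing them jointly is what lets the student drift away from the teacher's original reasoning.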

The researchers conducted experiments on two tasks: multi-digit multiplication and grade school math problems. The results showed that their method enabled the models to solve tasks that they previously could not solve without explicit CoT. They observed that the GPT-2 Small model, which achieved 97% accuracy on 4-digit multiplication under implicit CoT, performed poorly when tested on 5-digit multiplication, which suggests that the effectiveness of the technique depends on having enough intermediate layers for the required calculations. They also observed that the implicit CoT technique offers faster inference, especially for tasks that require many intermediate steps.
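A rough way to see why skipping the written-out steps speeds up inference is to compare how much text must be generated in each case. The sketch below counts characters as a crude proxy for decoding steps; the partial-product step format and the name `generated_lengths` are assumptions for illustration, and real token counts depend on the tokenizer.

```python
def generated_lengths(a: int, b: int) -> tuple[int, int]:
    """Return (explicit_length, answer_only_length) in characters for a * b."""
    answer = str(a * b)
    # Assumed explicit format: one shifted partial product per digit of b.
    steps = [str(a * int(d) * 10 ** i) for i, d in enumerate(reversed(str(b)))]
    explicit = " + ".join(steps) + " = " + answer
    return len(explicit), len(answer)


for digits in range(2, 6):
    operand = 10 ** digits - 1  # largest number with this many digits, e.g. 9999
    explicit_len, answer_len = generated_lengths(operand, operand)
    print(f"{digits}-digit: explicit={explicit_len} chars, answer-only={answer_len} chars")
```

The explicit output is always longer than the bare answer, since it contains the answer plus every step that precedes it.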

A few major issues associated with this technique are a lack of transparency, heavy dependence on the teacher’s thought processes, and performance that lags behind explicit CoT. However, this work marks just an initial step toward building implicit CoT, and the researchers believe that many adjustments could be built on top of this work to optimize this process further and augment LLMs’ ability to reason.


Check out the Paper. All credit for this research goes to the researchers of this project.



I am a Civil Engineering Graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.


