Meet LongLLaMA: A Large Language Model Capable of Handling Long Contexts of 256k Tokens

Language models have driven significant advances across many fields. However, effectively incorporating extensive new knowledge into these models remains a challenge. Fine-tuning, the common practice, is resource-intensive and complex to manage, and it does not always provide a straightforward way to incorporate new knowledge. To address this, researchers propose a promising alternative called the Focused Transformer (FOT).

The FOT technique aims to overcome the limited context length of language models. As the number of documents in the context grows, the ratio of relevant to irrelevant tokens shrinks, and keys tied to irrelevant values start to overlap with keys tied to relevant ones. The researchers call this the distraction issue. FOT lets a subset of attention layers access an external memory of (key, value) pairs through k-nearest neighbors (kNN) lookup. This mechanism effectively extends the context length and helps address the distraction issue.
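To make the memory lookup concrete, here is a minimal sketch of one attention step that combines local context with the top-k entries retrieved from an external memory. It is an illustration under assumptions, not the authors' implementation: the function name `memory_attention`, the inner-product ranking, and the brute-force search are all choices made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_attention(query, local_kv, memory_kv, k=2):
    """Attend over the local context plus the top-k memory entries.

    local_kv and memory_kv are lists of (key, value) pairs; keys and
    values are plain lists of floats. Hypothetical helper for illustration.
    """
    # Rank memory entries by inner product with the query, keep the k best.
    # (Assumption: brute-force search; a real system would use a kNN index.)
    ranked = sorted(range(len(memory_kv)),
                    key=lambda i: dot(memory_kv[i][0], query),
                    reverse=True)
    retrieved = ranked[:k]
    # Retrieved memory entries are attended to alongside the local context.
    combined = [memory_kv[i] for i in retrieved] + list(local_kv)
    scale = math.sqrt(len(query))
    weights = softmax([dot(key, query) / scale for key, _ in combined])
    dim = len(combined[0][1])
    output = [sum(w * value[j] for w, (_, value) in zip(weights, combined))
              for j in range(dim)]
    return output, retrieved

# Toy usage: three memory entries, one local entry, retrieve the best match.
query = [1.0, 0.0]
memory = [([0.0, 1.0], [0.0]), ([5.0, 0.0], [1.0]), ([-1.0, 0.0], [2.0])]
local = [([0.0, 0.0], [3.0])]
output, retrieved = memory_attention(query, local, memory, k=1)
```

In this toy case the second memory entry has the largest inner product with the query, so it is the one retrieved, and the output is a weighted mix of its value and the local value.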

The training procedure of the Focused Transformer draws on contrastive learning. During training, the memory attention layers are exposed to both relevant and irrelevant keys, with the irrelevant keys acting like negative samples drawn from unrelated documents. This encourages the model to differentiate between keys connected to semantically diverse values, enhancing the structure of the key space.
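The exposure to relevant and irrelevant keys can be sketched as follows. This is a simplified illustration, not the paper's training code: the helper names `crossbatch_keys` and `positive_attention_mass`, and the idea of measuring the attention share that lands on relevant keys, are assumptions introduced here to show why unrelated documents act as distractors.

```python
import math

def crossbatch_keys(prev_context_keys, doc_index, num_docs):
    """Build the key set one document sees in a memory layer during training.

    prev_context_keys holds one list of key vectors per document in the batch.
    The document's own earlier keys are positives; keys from up to
    num_docs - 1 other documents are negatives. Hypothetical helper.
    """
    others = [i for i in range(len(prev_context_keys)) if i != doc_index]
    chosen = [doc_index] + others[: num_docs - 1]
    keys, is_positive = [], []
    for i in chosen:
        keys.extend(prev_context_keys[i])
        is_positive.extend([i == doc_index] * len(prev_context_keys[i]))
    return keys, is_positive

def positive_attention_mass(query, keys, is_positive):
    """Fraction of softmax attention that falls on the positive keys."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return sum(e for e, pos in zip(exps, is_positive) if pos) / total

# Toy batch: 4 documents, 3 identical (undifferentiated) keys each.
batch = [[[0.0, 0.0]] * 3 for _ in range(4)]
masses = {d: positive_attention_mass([1.0, 1.0], *crossbatch_keys(batch, 0, d))
          for d in (1, 2, 4)}
```

With undifferentiated keys the attention is uniform, so the share reaching the relevant document is 1/d and shrinks as more documents are added. Training against such mixed key sets is what pushes the model to separate relevant keys from irrelevant ones.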

The researchers introduce LongLLaMA models, which are OpenLLaMA models fine-tuned with FOT. This shows that the method does not require long context during training and can be applied to existing models. LongLLaMA models show significant improvements on tasks that require long-context modeling, such as passkey retrieval.
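For readers unfamiliar with passkey retrieval, the task hides a short key inside a long stretch of filler text and then asks the model to recall it. The sketch below builds such a prompt; the function name `make_passkey_prompt`, the filler sentence, and the exact wording are assumptions for illustration and may differ from the prompts used in the evaluation.

```python
def make_passkey_prompt(passkey, n_filler, insert_at):
    """Build a long prompt with a pass key hidden among filler lines.

    Hypothetical helper: wording and layout are illustrative only.
    """
    filler = "The grass is green. The sky is blue. The sun is yellow."
    lines = [filler] * n_filler
    # Hide the key at the requested position inside the filler text.
    lines.insert(insert_at, f"The pass key is {passkey}. Remember it.")
    intro = "There is a pass key hidden in the text below."
    question = "What is the pass key?"
    return "\n".join([intro] + lines + [question])

prompt = make_passkey_prompt(passkey=48151, n_filler=200, insert_at=120)
```

Increasing `n_filler` lengthens the prompt, which is how the task probes whether a model can still retrieve the key as the context grows.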

The research makes three contributions: it identifies the distraction issue as a significant obstacle to scaling up context length in Transformer models, develops the Focused Transformer (FOT) to address it, and provides a simple implementation method that lets existing models be augmented with memory without modifying their architecture. The resulting LongLLaMA models improve on tasks that benefit from more few-shot demonstrations in the extended context. FOT's capabilities are further analyzed across various datasets and model sizes, showing perplexity improvements over baselines on long-context language modeling tasks.

In summary, the Focused Transformer (FOT) technique addresses the distraction issue and allows context length extension in language models. Training the model to differentiate between relevant and irrelevant keys enhances the structure of the key space and significantly improves performance on tasks requiring long-context modeling. The FOT method can be applied to existing models without architectural modifications, making it a cost-effective way to augment models with memory.


Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is highly enthusiastic about machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.


