Google Researchers Propose MEMORY-VQ: A New AI Approach to Reduce Storage Requirements of Memory-Augmented Models without Sacrificing Performance

Recent research on language models has emphasized the importance of retrieval augmentation for improving factual knowledge. Retrieval augmentation supplies a model with relevant text passages to improve its performance, but it comes at a higher computational cost. A newer approach, exemplified by LUMEN and LUMEN-VQ, speeds up retrieval augmentation by pre-encoding passages from the corpus, which reduces the computational burden while maintaining quality. However, the pre-encoded representations require substantial storage, which has been a challenge.

LUMEN-VQ, which combines product quantization with VQ-VAE methods, addresses this storage problem by achieving a 16x compression rate. This means that memory representations for very large corpora can be stored efficiently. The advance is a significant step toward practical large-scale retrieval augmentation, benefiting language understanding and information retrieval tasks.

Google researchers introduce MEMORY-VQ as a method for reducing these storage requirements. It compresses memories using vector quantization, replacing the original memory vectors with integer codes that can be decompressed on the fly. The storage required for each quantized vector is the number of subspaces multiplied by the number of bits needed to represent each code, which is the base-2 logarithm of the number of codes. Applying this approach to the LUMEN model yields LUMEN-VQ, which uses product quantization and VQ-VAE for compression and decompression, with careful codebook initialization and memory division.
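The compress-to-codes and decompress-on-the-fly idea can be illustrated with a minimal product quantization sketch in Python. This is not the researchers' implementation: the function names, sizes, and the use of a few plain k-means steps to fit the codebooks are assumptions made for illustration, whereas the article says the actual method relies on VQ-VAE with careful codebook initialization.

```python
import numpy as np

def bits_per_vector(num_subspaces: int, num_codes: int) -> int:
    """One code per subspace, each costing ceil(log2(num_codes)) bits."""
    return num_subspaces * (num_codes - 1).bit_length()

def compress(vectors: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Replace each sub-vector with the index of its nearest codeword."""
    num_subspaces, num_codes, sub_dim = codebooks.shape
    assert num_codes <= 256, "codes are stored as uint8 in this sketch"
    sub = vectors.reshape(len(vectors), num_subspaces, sub_dim)
    # Squared distances with shape (n, num_subspaces, num_codes).
    dists = ((sub[:, :, None, :] - codebooks[None]) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1).astype(np.uint8)

def decompress(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Look up each code's codeword and concatenate the sub-vectors."""
    num_subspaces = codebooks.shape[0]
    parts = codebooks[np.arange(num_subspaces), codes]  # (n, S, sub_dim)
    return parts.reshape(len(codes), -1)

def fit_codebooks(vectors, num_subspaces, num_codes, iters=5, seed=0):
    """Assumed stand-in for codebook training: a few k-means steps."""
    rng = np.random.default_rng(seed)
    n, dim = vectors.shape
    sub = vectors.reshape(n, num_subspaces, dim // num_subspaces)
    # Initialize every codebook from randomly sampled sub-vectors.
    idx = rng.choice(n, size=num_codes, replace=False)
    codebooks = sub[idx].transpose(1, 0, 2).copy()  # (S, K, sub_dim)
    for _ in range(iters):
        codes = compress(vectors, codebooks)
        for s in range(num_subspaces):
            for k in range(num_codes):
                members = sub[codes[:, s] == k, s]
                if len(members):  # leave empty clusters unchanged
                    codebooks[s, k] = members.mean(axis=0)
    return codebooks

# Illustrative sizes only; they are not taken from the paper.
memories = np.random.default_rng(1).normal(size=(200, 16)).astype(np.float32)
codebooks = fit_codebooks(memories, num_subspaces=4, num_codes=16)
codes = compress(memories, codebooks)       # small integers, one per subspace
restored = decompress(codes, codebooks)     # approximate memories
```

With these toy sizes, each vector is stored as 4 codes of 4 bits each, so `bits_per_vector(4, 16)` is 16 bits. Only the integer codes and the shared codebooks need to be kept, and the approximate vectors are rebuilt by table lookup when needed.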

In conclusion, MEMORY-VQ is a pioneering method that effectively reduces storage demands in memory-augmented language models while maintaining high performance. It makes memory augmentation a practical solution for achieving substantial inference speed boosts, particularly when dealing with extensive retrieval corpora.


Check out the Paper. All credit for this research goes to the researchers on this project.


Astha Kumari is a consulting intern at MarktechPost. She is currently pursuing a dual degree in the Department of Chemical Engineering at the Indian Institute of Technology (IIT), Kharagpur. She is a machine learning and artificial intelligence enthusiast and is keen on exploring their real-life applications in various fields.


