AI

Researchers at KAUST Use Anderson Extrapolation to Maximize GPU Efficiency with Greater Model Accuracy and Generalizability

3 Mins read

Scaling up AI implies escalating infrastructure expenditure. Massive, multidisciplinary research exerts economic pressure on institutions because high-performance computing (HPC) is expensive, and it also carries a heavy energy and environmental footprint: by 2030, AI is projected to account for 2% of global electricity consumption. New approaches are therefore needed to maximize computational efficiency while reducing the number of iterations to convergence. Anderson Extrapolation is a low-memory acceleration technique that could serve this goal. This article delves into recent research applying it on GPUs to maximize the return on computational investment.

Researchers at King Abdullah University of Science and Technology (KAUST) applied matrix-free Anderson Extrapolation on GPUs and studied its effect both on training models and on forward passes (i.e., running inference on models). The method accelerated AI performance by reusing previous iterations to avoid unnecessary gradient calculations, delivering benefits typically expected of second-order methods. To set the groundwork for the rest of this article, let's define Anderson Extrapolation: it is a vector-to-vector mapping technique based on a window of historical iterates. It is used to accelerate nonlinear fixed-point iterations and is widespread in subdisciplines of physics such as kinetic theory and density functional theory. Because Anderson Extrapolation parallelizes well over memory, it is well suited to GPUs, and various open-source libraries, such as PETSc and SUNDIALS, provide this functionality. It improves GPU performance by reusing cached state-vector data, taking fewer but more expensive steps.
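To make the mechanism concrete, here is a minimal NumPy sketch of windowed Anderson acceleration for a generic fixed point x = g(x). It illustrates the technique described above, not the authors' matrix-free GPU implementation; the function name, the window size, and the toy cosine map in the usage example are all assumptions chosen for readability.

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, max_iter=50, tol=1e-8):
    """Windowed Anderson acceleration for the fixed point x = g(x).

    A minimal illustrative sketch (not the paper's GPU code): keep the
    last m iterates and g-evaluations, solve a small least-squares
    problem over residual differences, and mix the cached evaluations
    to extrapolate the next iterate.
    """
    x = np.asarray(x0, dtype=float)
    xs, gs = [x], [g(x)]                  # cached iterates / evaluations
    for _ in range(max_iter):
        f = gs[-1] - xs[-1]               # current residual g(x) - x
        if np.linalg.norm(f) < tol:
            break
        if len(xs) > 1:
            # residuals f_i = g(x_i) - x_i over the current window
            F = np.stack([gs[i] - xs[i] for i in range(len(xs))], axis=1)
            dF = np.diff(F, axis=1)                    # residual differences
            dG = np.diff(np.stack(gs, axis=1), axis=1) # evaluation differences
            # mixing coefficients: minimize ||f - dF @ gamma||
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x_next = gs[-1] - dG @ gamma  # extrapolated iterate
        else:
            x_next = gs[-1]               # plain step to seed the history
        xs.append(x_next)
        gs.append(g(x_next))
        if len(xs) > m:                   # keep only the last m entries
            xs.pop(0)
            gs.pop(0)
    return xs[-1]

# usage: accelerate a contractive map toward its fixed point (~0.7391)
g = lambda x: np.cos(x)
print(anderson_fixed_point(g, np.array([0.0])))
```

The per-step least-squares solve is the "more expensive step" mentioned above; the payoff is that each step exploits the whole window of history rather than a single previous iterate.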

To test the efficacy of this idea, the authors used deep equilibrium (DEQ) networks. DEQs behave like neural networks whose number of layers tends to infinity: the architecture approximates many explicit layers with a single implicit layer that has exponentially fewer parameters, differentiating through the equilibrium in the backward pass. Finding that equilibrium is precisely the kind of nonlinear fixed-point problem that vector-to-vector mapping techniques target. Such techniques outperform standard forward iteration by combining information from previous iterations to span a searchable subspace from which the next iterate is extrapolated, enhancing convergence rates at the expense of additional memory per iteration.
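The following toy sketch shows how a DEQ-style forward pass reduces to such a fixed-point problem. The single weight-tied cell, the layer sizes, and the spectral scaling that keeps the map contractive are hypothetical choices for illustration; the plain loop below is exactly where an accelerator like the Anderson sketch above can be substituted to reach the equilibrium in fewer steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DEQ-style implicit layer: z* solves z = f(z, x) for one
# weight-tied cell. Sizes and scaling are illustrative assumptions.
d_in, d_hidden = 4, 8
W = rng.normal(size=(d_hidden, d_hidden))
W *= 0.9 / np.linalg.norm(W, 2)   # spectral norm < 1 keeps the map contractive
U = rng.normal(size=(d_hidden, d_in))
b = np.zeros(d_hidden)

def cell(z, x):
    """One weight-tied layer; its fixed point stands in for an
    'infinitely deep' stack of identical explicit layers."""
    return np.tanh(W @ z + U @ x + b)

def deq_forward(x, max_iter=500, tol=1e-8):
    """Forward pass: iterate z <- f(z, x) until the equilibrium z*.
    Anderson extrapolation can replace this loop to converge in
    fewer (but individually costlier) steps."""
    z = np.zeros(d_hidden)
    for _ in range(max_iter):
        z_next = cell(z, x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

x = rng.normal(size=d_in)
z_star = deq_forward(x)
print(np.linalg.norm(cell(z_star, x) - z_star))  # ~0: z* is a fixed point
```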

Experimental results showed Anderson acceleration reaching higher training and test accuracies in less time than forward iteration. It also exhibited smaller fluctuations in accuracy, especially on test data, in contrast to forward iteration's rapid fluctuations, which repeatedly signaled overfitting; Anderson thus made training more generalizable. Anderson on GPU performed much better than both standard forward iteration and Anderson on CPU, because the parallel processing capabilities of GPUs offset Anderson's additional computational expense. A trade-off between accuracy and computing time remains, however: forward iteration maintained a more consistent computation time as the number of epochs increased, whereas Anderson's computation time grew with successive iterations due to the residual minimization performed during each acceleration step. Even with this trade-off, Anderson improved DEQ performance in a fraction of the time forward iteration required to stabilize at comparable accuracy.

Conclusion

Anderson acceleration substantially improved the accuracy of Deep Equilibrium Models, along with their computational efficiency and generalizing ability. This research points to a bright future for applying vector-to-vector mapping techniques on both CPU and GPU architectures. At the very least, further acceleration could be examined by stochastically varying Anderson Extrapolation.


Check out the Paper. All credit for this research goes to the researchers of this project.



Adeeba Alam Ansari is currently pursuing her Dual Degree at the Indian Institute of Technology (IIT) Kharagpur, earning a B.Tech in Industrial Engineering and an M.Tech in Financial Engineering. With a keen interest in machine learning and artificial intelligence, she is an avid reader and an inquisitive individual. Adeeba firmly believes in the power of technology to empower society and promote welfare through innovative solutions driven by empathy and a deep understanding of real-world challenges.


