AI

This AI Paper from NYU and Meta Reveals ‘Machine Learning Beyond Boundaries – How Fine-Tuning with High Dropout Rates Outshines Ensemble and Weight Averaging Methods’

2 Mins read

In recent years, machine learning has significantly shifted away from the assumption that training and testing data come from the same distribution. Researchers have identified that models perform better when handling data from multiple distributions. This adaptability is often achieved through what’s known as “rich representations,” which exceed the capabilities of models trained under traditional sparsity-inducing regularization or common stochastic gradient methods.

The challenge is optimizing machine learning models to perform well across various distributions, not just the one they were trained on. Models have been fine-tuned on large, pre-trained datasets specific to a task and then tested on a set of tasks designed to benchmark different aspects of the system. However, this method has limitations, especially when dealing with data distributions that diverge from the training set.

Researchers have explored various methods to obtain versatile representations, including engineering diverse datasets, architectures, and hyperparameters. Interesting results have been achieved by adversarially reweighting the training dataset and concatenating representations from multiple networks. Fine-tuning deep residual networks is a near-linear process, with the final training phase confined to a nearly convex attraction basin.

Researchers from New York University and Facebook AI Research have introduced a novel approach to achieving out-of-distribution (OOD) performance. They investigate the use of very high dropout rates as an alternative to ensemble techniques for obtaining rich representations. Conventionally, training a deep network from scratch with high dropout rates is nearly impossible due to the complexity and depth of the network. However, fine-tuning a pre-trained model under such conditions is feasible and surpasses the performance achieved by ensembles and weight-averaging methods like model soups.

The method employs a sophisticated fine-tuning process on a deep learning network with residual connections, primarily trained on extensive datasets. This process is characterized by applying very high dropout rates to the penultimate layer during fine-tuning, effectively blocking contributions from all residual blocks without creating new representations rather than leveraging existing ones. The technique stands out by utilizing a linear training approximation, where applying dropout acts as a form of milder regularization compared to its use in non-linear systems. Remarkably, this approach achieves comparable or superior performance to traditional methods like ensembles and weight averaging, showcasing its effectiveness across various DOMAINBED tasks.

Performance results underscore the effectiveness of this method. Fine-tuning with large dropout rates improved OOD performance across several benchmarks. For instance, on the VLCS dataset, a domain adaptation benchmark that poses significant challenges for generalization, models fine-tuned with this method showed substantial gains. The results indicate a considerable leap in OOD performance, affirming the method’s potential to enhance model robustness and reliability across diverse datasets.

In conclusion, the research provides a compelling case for reevaluating fine-tuning practices in machine learning. By introducing and validating very large dropout rates, the research has opened up avenues for developing more versatile and robust models capable of navigating the complexities of diverse data distributions. This method advances our understanding of rich representations and sets a new benchmark for OOD performance, marking a significant step forward in pursuing more generalized machine-learning solutions.


Check out the PaperAll credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 38k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter..

Don’t Forget to join our Telegram Channel

You may also like our FREE AI Courses….


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.




Source link

Related posts
AI

PRISE: A Unique Machine Learning Method for Learning Multitask Temporal Action Abstractions Using Natural Language Processing (NLP)

2 Mins read
In the domain of sequential decision-making, especially in robotics, agents often deal with continuous action spaces and high-dimensional observations. These difficulties result…
AI

FLUTE: A CUDA Kernel Designed for Fused Quantized Matrix Multiplications to Accelerate LLM Inference

3 Mins read
Large Language Models (LLMs) face deployment challenges due to latency issues caused by memory bandwidth constraints. Researchers use weight-only quantization to address…
AI

Self-Route: A Simple Yet Effective AI Method that Routes Queries to RAG or Long Context LC based on Model Self-Reflection

3 Mins read
Large Language Models (LLMs) have revolutionized the field of natural language processing, allowing machines to understand and generate human language. These models,…

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *