
This AI Paper from NYU and Meta Reveals ‘Machine Learning Beyond Boundaries – How Fine-Tuning with High Dropout Rates Outshines Ensemble and Weight Averaging Methods’


In recent years, machine learning has moved away from the assumption that training and testing data come from the same distribution. Researchers have found that models cope better with data drawn from multiple distributions when they learn what are known as “rich representations,” which exceed the capabilities of models trained under traditional sparsity-inducing regularization or common stochastic gradient methods.

The challenge is optimizing machine learning models to perform well across various distributions, not just the one they were trained on. In common practice, a model is pre-trained on a large dataset, fine-tuned on task-specific data, and then evaluated on a suite of tasks designed to benchmark different aspects of the system. This approach has limitations, however, especially when the test data distribution diverges from the training set.

Researchers have explored various methods for obtaining versatile representations, including engineering diverse datasets, architectures, and hyperparameters. Interesting results have also been achieved by adversarially reweighting the training dataset and by concatenating representations from multiple networks. A further observation underpins the present work: fine-tuning a deep residual network is a near-linear process, with the final training phase confined to a nearly convex attraction basin.

Researchers from New York University and Facebook AI Research have introduced a novel approach to improving out-of-distribution (OOD) performance. They investigate very high dropout rates as an alternative to ensemble techniques for obtaining rich representations. Conventionally, training a deep network from scratch with such high dropout rates is nearly impossible because of the network's depth and complexity. However, fine-tuning a pre-trained model under these conditions is feasible and surpasses the performance achieved by ensembles and by weight-averaging methods such as model soups.
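For context, the weight-averaging baseline mentioned above, model soups, simply averages the parameters of several fine-tuned copies of the same pre-trained model. The snippet below is a minimal, hypothetical PyTorch sketch of that baseline; the helper name `uniform_soup` and the toy models are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

def uniform_soup(state_dicts):
    """Average corresponding parameters of fine-tuned models that share one
    architecture (a uniform 'model soup'); hypothetical helper, not the paper's code."""
    return {
        k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0).to(state_dicts[0][k].dtype)
        for k in state_dicts[0]
    }

# Toy demonstration: average two fine-tuned copies of the same small model.
m1, m2 = nn.Linear(4, 2), nn.Linear(4, 2)
soup = nn.Linear(4, 2)
soup.load_state_dict(uniform_soup([m1.state_dict(), m2.state_dict()]))
```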

The method employs a fine-tuning process on a deep residual network that has been pre-trained on a large dataset. It is characterized by applying a very high dropout rate to the penultimate layer during fine-tuning, which blocks a large fraction of the contributions from the residual blocks and pushes the model to leverage representations it already contains rather than to create new ones. The technique relies on fine-tuning being approximately linear, so dropout acts as a milder form of regularization than it does in non-linear, from-scratch training. Remarkably, this approach achieves comparable or superior performance to traditional methods such as ensembles and weight averaging, showcasing its effectiveness across various DOMAINBED tasks.
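As an illustration of the idea (not the authors' released code), the following PyTorch sketch fine-tunes an ImageNet-pre-trained ResNet-50 with a very high dropout rate inserted just before a new linear head on the penultimate features; the dropout probability of 0.9, the number of classes, and the synthetic batch are placeholder assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5  # placeholder; set to the fine-tuning task's label count

# Start from a pre-trained residual network and replace its head with
# (very high dropout) -> (new linear classifier) on the penultimate features.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
in_features = model.fc.in_features          # 2048 for ResNet-50
model.fc = nn.Sequential(
    nn.Dropout(p=0.9),                      # very high dropout on penultimate activations
    nn.Linear(in_features, num_classes),    # fresh task-specific head
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Synthetic stand-in batch; in practice, iterate over the fine-tuning data loader.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```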

Performance results underscore the effectiveness of this method: fine-tuning with large dropout rates improved OOD performance across several benchmarks. For instance, on VLCS, a domain generalization benchmark that poses significant challenges, models fine-tuned with this method showed substantial gains. The results indicate a considerable leap in OOD performance, affirming the method's potential to enhance model robustness and reliability across diverse datasets.

In conclusion, the research provides a compelling case for reevaluating fine-tuning practices in machine learning. By introducing and validating very large dropout rates, the research has opened up avenues for developing more versatile and robust models capable of navigating the complexities of diverse data distributions. This method advances our understanding of rich representations and sets a new benchmark for OOD performance, marking a significant step forward in pursuing more generalized machine-learning solutions.


Check out the Paper. All credit for this research goes to the researchers of this project.


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.



