Adaptive Training Distributions with Scalable Online Bilevel Optimization

Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this paradigm, the distribution of the large, heterogeneous pretraining data rarely matches that of the application domain. This work considers modifying the pretraining distribution when one has a small sample of data reflecting the targeted test conditions. We propose an algorithm motivated by a recent formulation of this setting as an online, bilevel optimization problem. With scalability in mind, our algorithm prioritizes computing gradients at training points which are likely to most improve the loss on the targeted distribution. Empirically, we show that in some cases this approach improves over existing strategies from the domain adaptation literature, but may not succeed in others. We propose a simple test to evaluate when our approach can be expected to work well and point towards further research to address current limitations.
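
The abstract does not include code, but the scalability idea it describes, spending expensive gradient computation only on the pretraining examples expected to most improve the loss on the targeted distribution, can be illustrated with a small sketch. The PyTorch snippet below is a toy illustration under stated assumptions, not the authors' algorithm: it uses last-layer gradients as a cheap per-example proxy, scores a candidate pool by the dot product with the mean gradient on a small target sample, and updates the model only on the top-k candidates. Helper names such as `select_batch` and `last_layer_grad_features`, the last-layer proxy, and all data shapes are invented for this example.

```python
# Minimal sketch: prioritize gradient computation on pretraining examples whose
# (approximate) gradients align with the gradient of the loss on a small target
# sample. This is an illustrative assumption, not the paper's exact algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, k, num_classes = 32, 8, 4
model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, num_classes))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def last_layer_grad_features(x, y):
    """Cheap per-example gradient proxy: the cross-entropy gradient w.r.t. the
    last linear layer's weight is (softmax(logits) - onehot) outer features."""
    with torch.no_grad():
        feats = model[:-1](x)                  # penultimate activations, (n, 64)
        logits = model[-1](feats)              # (n, num_classes)
        err = F.softmax(logits, dim=-1)
        err[torch.arange(len(y)), y] -= 1.0    # softmax - onehot
    # per-example outer product, flattened to (n, num_classes * hidden)
    return torch.einsum('nc,nh->nch', err, feats).flatten(1)

def select_batch(x_pool, y_pool, x_tgt, y_tgt, k):
    """Score candidates by alignment with the mean target gradient; keep top-k."""
    g_tgt = last_layer_grad_features(x_tgt, y_tgt).mean(0)
    scores = last_layer_grad_features(x_pool, y_pool) @ g_tgt
    return scores.topk(k).indices

# Toy data: a large "pretraining" pool and a small target sample.
x_pool, y_pool = torch.randn(512, d), torch.randint(0, num_classes, (512,))
x_tgt, y_tgt = torch.randn(16, d), torch.randint(0, num_classes, (16,))

for step in range(100):
    idx = select_batch(x_pool, y_pool, x_tgt, y_tgt, k)
    loss = F.cross_entropy(model(x_pool[idx]), y_pool[idx])
    opt.zero_grad()
    loss.backward()     # full gradients are computed only for selected points
    opt.step()
    if step % 25 == 0:
        with torch.no_grad():
            tgt_loss = F.cross_entropy(model(x_tgt), y_tgt).item()
        print(f"step {step:3d}  selected-batch loss {loss.item():.3f}  target loss {tgt_loss:.3f}")
```

In the paper's online bilevel formulation, the weighting of pretraining examples is itself optimized against the loss on the target sample during training; the dot-product scoring above is only a cheap stand-in for that signal, used here to keep the sketch self-contained.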

