Unconstrained Channel Pruning – Apple Machine Learning Research

Modern neural networks are growing not only in size and complexity but also in inference time. One of the most effective compression techniques, channel pruning, combats this trend by removing channels from convolutional weights to reduce resource consumption. However, removing channels is non-trivial for multi-branch segments of a model, where it can introduce extra memory copies at inference time. These copies increase latency, so much so that the pruned model is even slower than the original, unpruned model. As a workaround, existing pruning works constrain certain channels to be pruned together. This fully eliminates inference-time memory copies but, as we show, these constraints significantly impair accuracy. To solve both challenges, our insight is to enable unconstrained pruning by reordering channels to minimize memory copies. Using this insight, we design a generic algorithm, UCPE, to prune models with any pruning pattern. Critically, by removing constraints from existing pruning heuristics, we improve ImageNet top-1 accuracy for post-training pruning by 2.1 points on average, benefiting pruned DenseNet (+16.9), EfficientNetV2 (+7.9), and ResNet (+6.2). Furthermore, UCPE reduces latency by up to 52.8% compared with naive unconstrained pruning, almost entirely eliminating memory copies at inference time.
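To make the reordering insight concrete, here is a minimal Python sketch. It is not the UCPE algorithm. It assumes a toy cost model in which each maximal contiguous run of retained channels in a tensor's channel layout costs one memory copy, and it considers only a single two-branch segment (for example, a residual add) in which the skip path keeps channel set `a` and the branch keeps channel set `b`. The helpers `count_runs` and `reorder_two_branches`, and the channel sets in the example, are hypothetical and exist only for illustration.

```python
from typing import List, Sequence, Set


def count_runs(order: Sequence[int], kept: Set[int]) -> int:
    """Count maximal contiguous runs of kept channels in a channel order.

    Toy cost model (an assumption): each run stands for one contiguous
    memory copy needed to place the pruned operand in the shared layout.
    """
    runs = 0
    prev_kept = False
    for ch in order:
        is_kept = ch in kept
        if is_kept and not prev_kept:
            runs += 1  # a new contiguous run of kept channels starts here
        prev_kept = is_kept
    return runs


def reorder_two_branches(num_channels: int, a: Set[int], b: Set[int]) -> List[int]:
    """Hypothetical reordering rule for a two-branch segment.

    Layout: [a only] [a and b] [b only] [neither]. With this order, the
    channels of a and the channels of b each occupy one contiguous range.
    """
    chans = range(num_channels)
    only_a = [c for c in chans if c in a and c not in b]
    both = [c for c in chans if c in a and c in b]
    only_b = [c for c in chans if c in b and c not in a]
    neither = [c for c in chans if c not in a and c not in b]
    return only_a + both + only_b + neither


if __name__ == "__main__":
    a = {0, 2, 4, 6}  # channels kept on the skip path (illustrative)
    b = {1, 2, 5, 6}  # channels kept on the pruned branch (illustrative)

    identity = list(range(8))
    print(count_runs(identity, a) + count_runs(identity, b))  # 6 copies

    reordered = reorder_two_branches(8, a, b)
    print(reordered)  # [0, 4, 2, 6, 1, 5, 3, 7]
    print(count_runs(reordered, a) + count_runs(reordered, b))  # 2 copies
```

Placing channels kept only by `a` first, shared channels next, and channels kept only by `b` after them turns each operand into a single contiguous slice. The sketch ignores that a real permutation must also be applied consistently to the weights of every layer that produces or consumes these channels. With three or more branches, a single linear ordering generally cannot make every kept set contiguous (for example, the sets {0, 1}, {1, 2}, and {0, 2}), so this fixed rule does not extend to the general case.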
