
Interpreting CLIP: Insights on the Robustness to ImageNet Distribution Shifts


What distinguishes robust models from non-robust ones? For ImageNet distribution shifts, it has been shown that differences in robustness can be traced back predominantly to differences in training data; however, it is not yet known what this translates to in terms of what the model has learned. In this work, we bridge this gap by probing the representation spaces of 16 robust zero-shot CLIP vision encoders with various backbones (ResNets and ViTs) and pretraining sets (OpenAI, LAION-400M, LAION-2B, YFCC15M, CC12M, and DataComp), and comparing them to the representation spaces of less robust models with identical backbones but different (pre)training sets or objectives (CLIP pretraining on ImageNet-Captions, and supervised training or finetuning on ImageNet).

This analysis yields three novel insights. First, we detect the presence of outlier features in robust zero-shot CLIP vision encoders, which to the best of our knowledge is the first time these have been observed in non-language and non-transformer models. Second, we find the presence of outlier features to be an indicator of ImageNet shift robustness, since in our analysis we only find them in robust models. Lastly, we investigate the number of unique encoded concepts in the representation space and find that zero-shot CLIP models encode a higher number of unique concepts. However, we do not find this to be an indicator of ImageNet shift robustness and hypothesize that it is instead related to language supervision.

Since the presence of outlier features can be detected without access to any data from shifted datasets, we believe they could be a useful tool for practitioners to gauge the distribution-shift robustness of a pretrained model during deployment.
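Because outlier features can be spotted from a model's activations alone, a quick heuristic check is possible without any shifted data. Below is a minimal, hypothetical Python sketch using the open_clip library: it flags embedding dimensions whose average magnitude is far above the median for a ViT-B/32 encoder. The magnitude criterion, the factor of 6, and the use of final image embeddings (rather than intermediate activations) are illustrative assumptions, not the paper's exact definition.

```python
import torch
import open_clip
from PIL import Image

# Load a zero-shot CLIP vision encoder (LAION-2B pretrained ViT-B/32).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()

# Any small batch of in-distribution images works; no shifted data is needed.
# "img1.jpg" and "img2.jpg" are placeholder paths.
images = torch.stack(
    [preprocess(Image.open(p).convert("RGB")) for p in ["img1.jpg", "img2.jpg"]]
)

with torch.no_grad():
    feats = model.encode_image(images)  # shape: (batch, embedding_dim)

# Per-dimension mean absolute activation across the batch.
dim_magnitude = feats.abs().mean(dim=0)

# Heuristic: call a dimension an "outlier feature" if its magnitude is
# several times the median magnitude. The factor of 6 is an arbitrary
# illustrative choice, not the paper's criterion.
median = dim_magnitude.median()
outlier_dims = (dim_magnitude > 6 * median).nonzero(as_tuple=True)[0]

print(f"{len(outlier_dims)} candidate outlier dimensions: {outlier_dims.tolist()}")
```

A model in which no dimensions stand out under such a check would, by this heuristic, show no sign of outlier features; the paper's finding suggests that robust zero-shot CLIP encoders do exhibit them.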


