Fine-grained image categorization distinguishes closely related subclasses within a broader category. For example, instead of merely identifying an image as a “bird,” a fine-grained model differentiates specific bird species. Because these tasks are hard, models often unintentionally rely on subtle cues in image backgrounds. Background information can offer useful context, but it can also introduce bias. For instance, a model may learn to associate urban backgrounds with sparrows if most sparrows in its training data appear in urban environments. Reducing this background-induced bias is crucial because the bias limits the model’s real-world applicability.
Modern algorithms for fine-grained image classification typically use convolutional neural networks (CNNs) or vision transformers (ViTs) as their backbone. A fundamental issue remains, though: the context in which an object appears can significantly influence how both humans and machines recognize it. Deep learning models often attend to backgrounds more than intended, occasionally to the point where they can classify an image from the background alone. When applied to images with unusual or unfamiliar backgrounds, these models suffer significant performance degradation.
To counteract background bias, a research team from the University of Montpellier in France recently published a study investigating two primary strategies:
- Early Masking: background details are removed at the very outset, at the image level.
- Late Masking: features associated with the background are masked at a later, more abstract stage in the model.
The key contribution of the research is its thorough investigation of background-induced bias in fine-grained image classification. It analyzes how CNNs and ViTs perform when faced with this bias and proposes masking techniques to address it.
Concretely, early masking removes the background at the input stage. Before the image is classified by a CNN or vision transformer, a binary segmentation network identifies its background regions, which are then masked out so that the model concentrates only on the main object. In contrast, late masking lets the model process the whole image first and masks the background at a later stage: after the model backbone has processed the image, the high-level spatial features that correspond to the background are excluded. Both methods aim to keep the model focused on the object of interest, reducing bias from background details. This is particularly important for fine-grained classification, where distinctions between categories can be subtle.
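The difference between the two strategies can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `early_mask` and `late_mask_pool`, the zero fill value, and the use of masked average pooling for the late stage are all assumptions chosen for illustration. The binary mask is taken as given here, whereas in the study it comes from a segmentation network.

```python
from typing import List

Image = List[List[List[float]]]       # H x W x C pixel values
FeatureMap = List[List[List[float]]]  # h x w x D backbone features
Mask = List[List[int]]                # binary: 1 = object, 0 = background


def early_mask(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Early masking: overwrite background pixels before classification.

    The fill value is an assumption; the source does not specify one.
    """
    return [
        [
            list(pixel) if keep else [fill] * len(pixel)
            for pixel, keep in zip(image_row, mask_row)
        ]
        for image_row, mask_row in zip(image, mask)
    ]


def late_mask_pool(features: FeatureMap, mask: Mask) -> List[float]:
    """Late masking: pool backbone features over object locations only.

    `mask` must already be resized to the feature map's spatial size.
    Masked average pooling is one possible way to exclude background
    features; it is an assumption, not the method confirmed by the source.
    """
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vector, keep in zip(feature_row, mask_row):
            if keep:
                count += 1
                for i, value in enumerate(vector):
                    total[i] += value
    if count == 0:
        raise ValueError("mask contains no object locations")
    return [t / count for t in total]


# Toy 2 x 2 inputs (made-up values, for illustration only).
toy_image = [[[0.9, 0.1, 0.1], [0.2, 0.8, 0.2]],
             [[0.3, 0.3, 0.7], [0.5, 0.5, 0.5]]]
toy_features = [[[1.0, 2.0], [10.0, 20.0]],
                [[30.0, 40.0], [3.0, 4.0]]]
toy_mask = [[1, 0],
            [0, 1]]

masked_image = early_mask(toy_image, toy_mask)        # background pixels zeroed
pooled = late_mask_pool(toy_features, toy_mask)       # object-only feature average
```

In the early variant the classifier never sees background pixels, while in the late variant the backbone still processes the full image and only the pooled representation ignores background locations.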
To evaluate the two strategies, the researchers trained models on the CUB dataset, which contains images of 200 bird species. They evaluated these models on the CUB test set and on the Waterbirds dataset, an out-of-distribution (OOD) set in which the backgrounds of the CUB images are replaced with backgrounds from the Places dataset. The researchers tested several architectures, such as ConvNeXt and ViT, in Small, Base, and Large sizes. Models trained with early masking often outperformed those trained without it, particularly on the OOD Waterbirds test set. This indicates that early masking reduces bias caused by image backgrounds and improves model generalization.
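The comparison described above comes down to measuring accuracy on an in-distribution test set and on an OOD test set, then looking at the gap between the two. The sketch below shows that bookkeeping under stated assumptions: the helper names `accuracy` and `ood_gap` are hypothetical, and the labels and predictions are made-up toy values, not results from the study.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def ood_gap(in_dist_acc: float, ood_acc: float) -> float:
    """Accuracy lost when moving from in-distribution to OOD images."""
    return in_dist_acc - ood_acc


# Toy values only: the same four labels, scored against predictions on
# original backgrounds and on swapped (OOD) backgrounds.
labels = [0, 1, 2, 3]
preds_in_dist = [0, 1, 2, 3]
preds_ood = [0, 1, 0, 0]

in_dist_acc = accuracy(preds_in_dist, labels)
ood_acc = accuracy(preds_ood, labels)
gap = ood_gap(in_dist_acc, ood_acc)
```

A smaller gap for a model trained with masking than for one trained without it would be consistent with reduced reliance on the background.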
In conclusion, the authors examined how background-induced bias affects the generalization of CNN and ViT models to out-of-distribution (OOD) images. They tested several background masking techniques and found early masking to be the most effective for both model types. The study highlights the importance of accounting for backgrounds in image tasks and presents strategies to reduce bias and improve generalization.
Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor’s degree in physical science and a master’s degree in telecommunications and networking systems. His current areas of research concern computer vision, stock market prediction and deep learning. He produced several scientific articles about person re-identification and the study of the robustness and stability of deep