
MOFI: Learning Image Representation from Noisy Entity Annotated Images


In this paper, we introduce a novel approach to automatically assign entity labels to images from existing noisy image-text pairs. The approach employs a named entity recognition model to extract entities from the text, and uses a CLIP model to select the right entities as the labels of the paired image. The approach is simple and can be readily scaled up to billions of image-text pairs mined from the web, through which we have successfully created a dataset with 2 million distinct entities. We study new training approaches on the newly collected dataset with large-scale entity labels, including supervised pre-training, contrastive pre-training, and multi-task learning. Experiments show that supervised pre-training with large-scale entity labels is very effective for image retrieval tasks, and that multi-task training can further improve performance. The final model, named MOFI, achieves 83.59% mAP on the challenging GPR1200 dataset, compared to the previous state-of-the-art 67.33% from OpenAI’s CLIP model. Further experiments on zero-shot and linear probe image classification tasks also show that our MOFI model outperforms a CLIP model trained on the original image-text data, demonstrating the effectiveness of the new dataset for learning general-purpose image representations.
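The entity selection step described in the abstract can be illustrated with a minimal sketch. It assumes that entity names have already been extracted from the text by a named entity recognition model, and that the image and each candidate entity have already been embedded into a shared space by a CLIP-style model. The function name `select_entities`, the `min_sim` threshold, and the toy three-dimensional vectors are all hypothetical and are not taken from the paper; the actual selection rule used to build the dataset may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_entities(
    image_emb: Sequence[float],
    candidate_embs: Dict[str, Sequence[float]],
    min_sim: float = 0.5,  # hypothetical threshold, chosen only for illustration
) -> List[str]:
    """Keep candidate entities whose embedding is close enough to the image embedding.

    Returns entity names sorted from most to least similar.
    """
    scored = [(name, cosine(image_emb, emb)) for name, emb in candidate_embs.items()]
    kept = [(name, sim) for name, sim in scored if sim >= min_sim]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


# Toy stand-ins for embeddings; real ones would come from a CLIP-style encoder.
image = [0.9, 0.1, 0.0]
candidates = {
    "Golden Gate Bridge": [1.0, 0.0, 0.0],
    "San Francisco": [0.6, 0.8, 0.0],
    "Monday": [0.0, 0.0, 1.0],  # an entity in the text that is unrelated to the image
}
labels = select_entities(image, candidates)
print(labels)
```

In this toy case the unrelated entity is filtered out because its embedding is orthogonal to the image embedding, while the two visually relevant entities are kept as labels.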

