AI

Guiding Instruction-based Image Editing via Multimodal Large Language Models

1 Mins read

Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. We investigate how MLLMs facilitate edit instructions and present MLLM-Guided Image Editing (MGIE). MGIE learns to derive expressive instructions and provides explicit guidance. The editing model jointly captures this visual imagination and performs manipulation through end-to-end training. We evaluate various aspects of Photoshop-style modification, global photo optimization, and local editing. Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.


Source link

Related posts
AI

HPC-AI Tech Releases Open-Sora 2.0: An Open-Source SOTA-Level Video Generation Model Trained for Just $200K

3 Mins read
AI-generated videos from text descriptions or images hold immense potential for content creation, media production, and entertainment. Recent advancements in deep learning,…
AI

Patronus AI Introduces the Industry's First Multimodal LLM-as-a-Judge (MLLM-as-a-Judge): Designed to Evaluate and Optimize AI Systems that Convert Image Inputs into Text Outputs

2 Mins read
​In recent years, the integration of image generation technologies into various platforms has opened new avenues for enhancing user experiences. However, as…
AI

Allen Institute for AI (AI2) Releases OLMo 32B: A Fully Open Model to Beat GPT 3.5 and GPT-4o mini on a Suite of Multi-Skill Benchmarks

2 Mins read
The rapid evolution of artificial intelligence (AI) has ushered in a new era of large language models (LLMs) capable of understanding and…

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *