This AI Research from Korea Introduces MagiCapture: A Personalization Method for Integrating Subject and Style Concepts to Generate High-Resolution Portrait Images

2 Mins read

Producing high-quality portrait photographs suited for resumes or wedding celebrations usually means visiting a photo studio, followed by an expensive and time-consuming photo-editing process. Imagine instead being able to get high-quality portrait shots in particular styles, such as passport or profile photos, from just a few selfies and reference photos. This paper automates that procedure. High-fidelity, lifelike portrait photos are now achievable thanks to recent developments in large-scale text-to-image models like Stable Diffusion and Imagen, and current work on customizing these models aims to capture specific subjects or aesthetics from a handful of training photos.

The authors frame their objective as a multi-concept customization challenge: once the source subject and the reference style have each been learned, the model produces a composite output that combines them. Using reference pictures instead of text-driven editing lets users provide fine-grained guidance, making it better suited to this task. However, despite the encouraging results of earlier personalization techniques, they frequently produce images that lack realism and are not commercially viable. This issue generally arises when the parameters of large models are updated with only a few photos. The drop in quality is even more pronounced in multi-concept generation, where the lack of ground-truth images for the combined concepts commonly causes artificial blending of the concepts or divergence from the originals.

This problem is most obvious in portrait generation, where viewers' inherent sensitivity to faces makes any artificial artifacts or changes in identity readily apparent. As a solution, researchers from KAIST AI and Sogang University present MagiCapture, a multi-concept customization approach for merging subject and style concepts to create high-resolution portrait photographs from just a few subject and style references. Their approach uses composed prompt learning, which includes the composed prompt as part of the training process and strengthens the integration of source subject and reference style; this is accomplished with an auxiliary loss and pseudo-labels. They also propose an Attention Refocusing loss combined with a masked reconstruction objective, an essential tactic for achieving information disentanglement and avoiding information leakage during inference. MagiCapture outperforms other baselines in quantitative and qualitative evaluations, and with only a few tweaks it can be applied to non-human objects as well.
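The paper's exact formulation is not reproduced here, but the general shape of a masked reconstruction objective combined with an attention-refocusing penalty can be sketched as follows. All function and variable names are hypothetical; this is a minimal illustration of the idea, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def masked_reconstruction_loss(noise_pred, noise_target, mask):
    """Diffusion denoising loss restricted to the masked region.

    mask: binary map (1 = region belonging to the concept), so that
    irrelevant background pixels do not pull the learned concept
    toward details that should not be memorized.
    """
    return F.mse_loss(noise_pred * mask, noise_target * mask)

def attention_refocusing_loss(attn_map, mask):
    """Penalize cross-attention probability that leaks outside the
    region a concept token should attend to, and reward attention
    that stays inside it.

    attn_map: a token's spatial attention map, same size as mask.
    """
    leak = (attn_map * (1 - mask)).mean()   # attention escaping the mask
    missing = ((1 - attn_map) * mask).mean()  # attention absent inside the mask
    return leak + missing
```

With this kind of term, a concept token whose attention lands entirely inside its mask incurs zero penalty, while attention spilling onto the other concept's region (the "information leakage" the paper targets) is penalized directly.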

The paper’s key contributions are as follows:

• They present a multi-concept personalization technique that produces high-resolution portrait photos faithfully reflecting the characteristics of both the source subject and the reference style images. 

• They introduce a novel Attention Refocusing loss paired with a masked reconstruction objective that successfully disentangles the needed information from the input pictures and prevents information leakage during inference. 

• They propose a composed prompt learning strategy that uses an auxiliary loss and pseudo-labels to fuse source subject and reference style effectively. Their method outperforms existing baselines in quantitative and qualitative evaluations and, with slight modifications, can be applied to generate images of non-human objects.
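The composed prompt learning idea above can be illustrated with a small sketch: during fine-tuning, the prompt that combines both learned tokens is sampled alongside the single-concept prompts, so the model is explicitly trained on the combination it must generate at inference time. The placeholder tokens and prompt wording below are hypothetical, not the paper's exact templates:

```python
import random

SUBJECT_TOKEN = "<s1>"  # placeholder token learned from the subject photos
STYLE_TOKEN = "<s2>"    # placeholder token learned from the style references

def sample_training_prompt():
    """Randomly mix single-concept and composed prompts during training.

    The composed prompt has no ground-truth image, which is why the
    paper supervises it with pseudo-labels and auxiliary losses rather
    than a standard reconstruction target.
    """
    prompts = [
        (f"a photo of {SUBJECT_TOKEN} person", "subject"),
        (f"a photo of a person in {STYLE_TOKEN} style", "style"),
        (f"a photo of {SUBJECT_TOKEN} person in {STYLE_TOKEN} style", "composed"),
    ]
    return random.choice(prompts)
```

Training on the composed prompt directly, instead of only composing the concepts at inference time, is what counteracts the artificial blending and concept divergence described earlier.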

Check out the Paper. All credit for this research goes to the researchers on this project.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.




