This AI Research from Korea Introduces MagiCapture: A Personalization Method for Integrating Subject and Style Concepts to Generate High-Resolution Portrait Images


Producing high-quality portrait photographs suitable for resumes or wedding celebrations typically requires a visit to a photo studio followed by an expensive, time-consuming retouching process. Imagine instead obtaining high-quality portraits in specific styles, such as passport or profile photos, from just a few selfies and reference images. This paper automates that process. Recent advances in large-scale text-to-image models like Stable Diffusion and Imagen have made high-fidelity, lifelike portrait generation achievable, and current work on personalizing these models aims to incorporate specific subjects or styles using only a handful of training photos. 

The authors frame their objective as a multi-concept personalization task: the source subject and the reference style are each learned, then combined to produce the composite output. Using reference images rather than text-driven editing lets users provide fine-grained guidance, making it better suited to this purpose. However, despite encouraging results, earlier personalization techniques often produce images that lack realism and fall short of commercial quality. This degradation typically arises when updating the parameters of large models with only a few photos. It is even more pronounced in multi-concept generation, where the lack of ground-truth images for the combined concepts often leads to unnatural blending of the concepts or divergence from the originals. 

This problem is most obvious in portrait generation, where, owing to our intrinsic bias toward human faces, any artificial artifacts or shifts in identity are readily apparent. As a solution, researchers from KAIST AI and Sogang University present MagiCapture, a multi-concept personalization method for merging subject and style concepts to generate high-resolution portrait images from only a few subject and style references. Their approach uses composed prompt learning, which includes the composed prompt in the training process and strengthens the integration of source subject and reference style; this is accomplished with an auxiliary loss and pseudo-labels. They also propose an Attention Refocusing loss combined with a masked reconstruction objective, an essential tactic for disentangling information and preventing information leakage during inference. MagiCapture outperforms other baselines in quantitative and qualitative evaluations and, with only minor modifications, can also be applied to nonhuman objects. 
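To make the two objectives concrete, here is a minimal NumPy sketch of what a masked reconstruction loss and an attention-refocusing penalty could look like. This is an illustrative simplification, not the authors' implementation: the function names, array shapes, and the exact form of the penalty are assumptions.

```python
import numpy as np

def masked_reconstruction_loss(pred, target, mask):
    """MSE computed only inside the masked (e.g., subject) region, so that
    pixels belonging to the other concept do not influence this loss."""
    diff = (pred - target) ** 2
    return float((diff * mask).sum() / np.maximum(mask.sum(), 1e-8))

def attention_refocusing_loss(attn, mask):
    """Penalize cross-attention probability mass that falls outside a
    concept's target region (mask == 1), encouraging each concept token
    to attend only to its own region and discouraging information leakage."""
    outside = attn * (1.0 - mask)
    return float(outside.sum())
```

In this toy view, attention that lands entirely inside the concept's mask incurs zero refocusing penalty, while any mass spilling outside is penalized; the real method applies such constraints to the diffusion model's cross-attention maps during fine-tuning.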

The paper’s key contributions are as follows: 

• They propose a multi-concept personalization method that produces high-resolution portrait images faithfully reflecting the characteristics of both the source subject and the reference style images. 

• They introduce a novel Attention Refocusing loss paired with a masked reconstruction objective, which effectively disentangles the desired information from the input images and prevents information leakage during generation. 

• They present a composed prompt learning approach that uses an auxiliary loss and pseudo-labels to fuse the source subject and reference style effectively. Their method outperforms existing baselines in quantitative and qualitative evaluations and, with slight modifications, can also generate images of nonhuman objects.
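The composed prompt learning idea in the last contribution can be sketched as follows. Since the paper does not publish its exact prompt templates or token identifiers, the placeholder tokens and template strings below are hypothetical; the sketch only illustrates the general idea of mixing single-concept prompts with a composed prompt during fine-tuning.

```python
import random

# Hypothetical learned placeholder tokens for the two concepts
# (the actual identifiers used by MagiCapture are not shown here).
SUBJECT_TOKEN = "<subject>"
STYLE_TOKEN = "<style>"

def sample_training_prompt(rng=random):
    """Sample a prompt for one fine-tuning step: subject-only, style-only,
    or the composed prompt, so the model also sees the combined concept
    during training rather than only at inference time."""
    prompts = [
        f"a photo of {SUBJECT_TOKEN} person",                          # subject concept
        f"a photo of a person in {STYLE_TOKEN} style",                 # style concept
        f"a photo of {SUBJECT_TOKEN} person in {STYLE_TOKEN} style",   # composed concept
    ]
    return rng.choice(prompts)
```

Because no ground-truth image exists for the composed prompt, training on it relies on auxiliary signals (the paper's auxiliary loss and pseudo-labels) rather than a direct reconstruction target.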

Check out the Paper. All credit for this research goes to the researchers on this project. Also, don’t forget to join our 30k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
