Create your fashion assistant application using Amazon Titan models and Amazon Bedrock Agents

In the generative AI era, agents that simulate human actions and behaviors are emerging as a powerful tool for enterprises to create production-ready applications. Agents can interact with users, perform tasks, and exhibit decision-making abilities, mimicking humanlike intelligence. By combining agents with foundation models (FMs) from the Amazon Titan family in Amazon Bedrock, customers can develop complex, multimodal applications in which the agent understands and generates natural language or images.

For example, in the fashion retail industry, an assistant powered by agents and multimodal models can provide customers with a personalized and immersive experience. The assistant can engage in natural language conversations, understanding the customer’s preferences and intents. It can then use the multimodal capabilities to analyze images of clothing items and make recommendations based on the customer’s input. Additionally, the agent can generate visual aids, such as outfit suggestions, enhancing the overall customer experience.

In this post, we implement a fashion assistant agent using Amazon Bedrock Agents and the Amazon Titan family of models. The fashion assistant provides a personalized, multimodal conversational experience. For example, the inpainting and outpainting capabilities of Amazon Titan Image Generator can be used to generate fashion inspirations and edit user photos, and Amazon Titan Multimodal Embeddings can be used to search a database for similar styles using either a text prompt or a reference image provided by the user. The agent uses Anthropic Claude 3 Sonnet to orchestrate its actions, for example, searching for the current weather to provide weather-appropriate outfit recommendations. A simple Streamlit web UI lets users interact with the agent.
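To make the image capabilities concrete, the following is a minimal sketch of the request bodies for the Titan Image Generator tasks mentioned above (text-to-image, inpainting, and outpainting), sent through the Amazon Bedrock Runtime `InvokeModel` API with boto3. The model ID `amazon.titan-image-generator-v1`, the Region, the 1024x1024 image size, and all helper names are assumptions for illustration; the repository's Lambda code may structure these calls differently.

```python
import base64
import json

# Assumed model ID for Amazon Titan Image Generator G1; newer versions use other IDs.
TITAN_IMAGE_MODEL_ID = "amazon.titan-image-generator-v1"


def _image_config(num_images: int = 1, height: int = 1024, width: int = 1024,
                  cfg_scale: float = 8.0, seed: int = 0) -> dict:
    """Generation settings shared by all Titan Image Generator task types."""
    return {
        "numberOfImages": num_images,
        "height": height,
        "width": width,
        "cfgScale": cfg_scale,
        "seed": seed,
    }


def text_to_image_body(prompt: str) -> dict:
    """Generate a new image purely from a text description of a style."""
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": _image_config(),
    }


def inpainting_body(image_bytes: bytes, prompt: str, mask_prompt: str) -> dict:
    # The mask prompt names the region to repaint (for example, "jacket");
    # pixels outside the mask, such as the background, are preserved.
    return {
        "taskType": "INPAINTING",
        "inPaintingParams": {
            "image": base64.b64encode(image_bytes).decode("utf-8"),
            "text": prompt,
            "maskPrompt": mask_prompt,
        },
        "imageGenerationConfig": _image_config(),
    }


def outpainting_body(image_bytes: bytes, prompt: str, mask_prompt: str) -> dict:
    # For outpainting, the masked region (for example, "person") is kept and
    # the area outside it is regenerated, which changes the background.
    return {
        "taskType": "OUTPAINTING",
        "outPaintingParams": {
            "image": base64.b64encode(image_bytes).decode("utf-8"),
            "text": prompt,
            "maskPrompt": mask_prompt,
            "outPaintingMode": "DEFAULT",
        },
        "imageGenerationConfig": _image_config(),
    }


def generate_images(body: dict, region: str = "us-east-1") -> list:
    """Invoke the model and return the generated images as raw PNG bytes."""
    import boto3  # lazy import so the body builders above work without AWS dependencies

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=TITAN_IMAGE_MODEL_ID,
        body=json.dumps(body),
        accept="application/json",
        contentType="application/json",
    )
    payload = json.loads(response["body"].read())
    return [base64.b64decode(img) for img in payload["images"]]
```

For example, `generate_images(outpainting_body(photo_bytes, "standing on a beach at sunset", "person"))` would request a new background while keeping the person in the photo. Keeping the body builders separate from the network call makes them easy to test and reuse across tools.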

The fashion assistant agent can be integrated into existing ecommerce platforms or mobile applications to provide customers with a seamless experience. Customers can upload their own images, describe their desired style, or even provide a reference image, and the agent will generate personalized recommendations and visual inspirations.

The code used in this solution is available in the GitHub repository.

Solution overview

The fashion assistant agent uses the power of Amazon Titan models and Amazon Bedrock Agents to provide users with a comprehensive set of style-related functionalities:

  • Image-to-image or text-to-image search – This tool allows customers to find products in the catalog that are similar to styles they like, enhancing their user experience. We use the Titan Multimodal Embeddings model to embed each product image and store the embeddings in Amazon OpenSearch Serverless for later retrieval.
  • Text-to-image generation – If the desired style is not available in the database, this tool generates unique, customized images based on the user’s query, enabling the creation of personalized styles.
  • Weather API connection – By fetching weather information for a given location mentioned in the user’s prompt, the agent can suggest appropriate styles for the occasion, making sure the customer is dressed for the weather.
  • Outpainting – Users can upload an image and request to change the background, allowing them to visualize their preferred styles in different settings.
  • Inpainting – This tool enables users to modify specific clothing items in an uploaded image, such as changing the design or color, while keeping the background intact.
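In Amazon Bedrock Agents, tools like these are typically exposed to the agent through action groups backed by Lambda functions. The following is a minimal sketch of such a Lambda handler, assuming an action group defined with function details (the repository may use an OpenAPI schema instead, which uses a different event and response shape). The `get_weather` function and the `TOOLS` registry are hypothetical placeholders, not the repository's code.

```python
import json
from typing import Callable, Dict


def get_weather(location: str) -> dict:
    """Hypothetical weather lookup; connect the weather API of your choice here."""
    raise NotImplementedError("Connect a real weather API here")


# Hypothetical registry mapping function names declared in the action group
# to the Python callables that implement each tool.
TOOLS: Dict[str, Callable[..., dict]] = {
    "get_weather": get_weather,
}


def lambda_handler(event: dict, context=None,
                   tools: Dict[str, Callable[..., dict]] = TOOLS) -> dict:
    # Bedrock Agents passes parameters as a list of {"name", "type", "value"} dicts.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    function_name = event["function"]

    tool = tools.get(function_name)
    state = None
    if tool is None:
        body = f"Unknown function: {function_name}"
        state = "FAILURE"
    else:
        try:
            body = json.dumps(tool(**params))
        except Exception as exc:  # report tool errors to the agent instead of crashing
            body = f"Error in {function_name}: {exc}"
            state = "FAILURE"

    function_response = {"responseBody": {"TEXT": {"body": body}}}
    if state:
        function_response["responseState"] = state

    # Response shape expected by Bedrock Agents for function-details action groups.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": function_name,
            "functionResponse": function_response,
        },
    }
```

The agent's orchestration model decides which function to call and with which parameters; the handler only dispatches and returns a text body that the model reads as the tool's observation.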

The following flow chart illustrates the decision-making process:

Agent Execution Flowchart

The following diagram shows the corresponding architecture.

Prerequisites

To set up the fashion assistant agent, make sure you have the following:

  • An active AWS account and AWS Identity and Access Management (IAM) role with Amazon Bedrock, AWS Lambda, and Amazon Simple Storage Service (Amazon S3) access
  • Installation of required Python libraries such as Streamlit
  • Anthropic Claude 3 Sonnet, Amazon Titan Image Generator, and Amazon Titan Multimodal Embeddings models enabled in Amazon Bedrock. You can confirm these are enabled on the Model access page of the Amazon Bedrock console. If these models are enabled, the access status will show as Access granted, as shown in the following screenshot.
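As a quick programmatic sanity check, the following sketch lists the foundation models that Amazon Bedrock offers in a Region and reports which of the required ones are missing. The three model IDs are assumptions for this walkthrough (check the exact IDs and versions in your Region), and note that `list_foundation_models` shows what is offered in the Region, not whether access has been granted; the Model access page remains the place to confirm Access granted.

```python
from typing import Iterable, List

# Model IDs assumed for this walkthrough; check the exact IDs and versions in your Region.
REQUIRED_MODEL_IDS = [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "amazon.titan-image-generator-v1",
    "amazon.titan-embed-image-v1",
]


def missing_models(available_ids: Iterable[str],
                   required: Iterable[str] = REQUIRED_MODEL_IDS) -> List[str]:
    """Return the required model IDs that are not in `available_ids`, in order."""
    available = set(available_ids)
    return [model_id for model_id in required if model_id not in available]


def list_model_ids(region: str = "us-east-1") -> List[str]:
    """List the foundation model IDs that Amazon Bedrock offers in `region`."""
    import boto3  # lazy import so missing_models works without AWS dependencies

    bedrock = boto3.client("bedrock", region_name=region)
    return [m["modelId"] for m in bedrock.list_foundation_models()["modelSummaries"]]
```

For example, `missing_models(list_model_ids("us-east-1"))` returns an empty list when all three models are offered in that Region.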

Before executing the notebook provided in the GitHub repo to start building the infrastructure, make sure your AWS account has permission to:

  • Create managed IAM roles and policies
  • Create and invoke Lambda functions
  • Create, read from, and write to S3 buckets
  • Access and manage Amazon Bedrock agents and models
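As an illustration only, the permissions above could be expressed in an identity-based IAM policy like the following. The post does not list the exact actions the notebook needs, so this action list is an assumption. `iam:PassRole` is included because creating a Lambda function or an agent with a service role requires it, and `lambda:AddPermission` because Amazon Bedrock needs a resource-based permission to invoke the function. `Resource: "*"` is used only for brevity.

```python
import json

# Illustrative, deliberately broad policy for deploying the sample.
# Scope Resource down to specific ARNs for any real deployment.
DEPLOYMENT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageIamRolesAndPolicies",
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:GetRole", "iam:CreatePolicy",
                       "iam:AttachRolePolicy", "iam:PassRole"],
            "Resource": "*",
        },
        {
            "Sid": "CreateAndInvokeLambda",
            "Effect": "Allow",
            "Action": ["lambda:CreateFunction", "lambda:InvokeFunction", "lambda:AddPermission"],
            "Resource": "*",
        },
        {
            "Sid": "S3ReadWrite",
            "Effect": "Allow",
            "Action": ["s3:CreateBucket", "s3:ListBucket", "s3:GetObject", "s3:PutObject"],
            "Resource": "*",
        },
        {
            "Sid": "BedrockAgentsAndModels",
            "Effect": "Allow",
            "Action": ["bedrock:*"],
            "Resource": "*",
        },
    ],
}

# Print the JSON document, for example to paste into the IAM console.
print(json.dumps(DEPLOYMENT_POLICY, indent=2))
```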

If you want to enable the image-to-image or text-to-image search capabilities, additional permissions for your AWS account are required:

  • Create security policies, access policies, collections, indexes, and index mappings on OpenSearch Serverless
  • Call the BatchGetCollection API on OpenSearch Serverless
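For the optional image search, product images are embedded with Titan Multimodal Embeddings and stored in an OpenSearch Serverless vector index. The following is a minimal sketch of the request bodies involved. It assumes the model ID `amazon.titan-embed-image-v1`, a 1,024-dimensional vector, an HNSW index on the faiss engine with L2 distance, and hypothetical field names (`image_vector`, `image_s3_uri`); the notebook's actual index settings may differ.

```python
import base64
import json
from typing import List, Optional

EMBED_MODEL_ID = "amazon.titan-embed-image-v1"  # assumed Titan Multimodal Embeddings G1 ID
EMBED_DIM = 1024  # default output length; 256 and 384 are also supported
VECTOR_FIELD = "image_vector"  # hypothetical field name


def embedding_body(text: Optional[str] = None, image_bytes: Optional[bytes] = None) -> dict:
    """Build a request for a text, image, or combined text+image embedding."""
    if text is None and image_bytes is None:
        raise ValueError("Provide text, an image, or both")
    body = {"embeddingConfig": {"outputEmbeddingLength": EMBED_DIM}}
    if text is not None:
        body["inputText"] = text
    if image_bytes is not None:
        body["inputImage"] = base64.b64encode(image_bytes).decode("utf-8")
    return body


def index_body() -> dict:
    """Index settings and mapping with a k-NN vector field for OpenSearch Serverless."""
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                VECTOR_FIELD: {
                    "type": "knn_vector",
                    "dimension": EMBED_DIM,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                "image_s3_uri": {"type": "keyword"},  # hypothetical metadata field
            }
        },
    }


def knn_query(vector: List[float], k: int = 3) -> dict:
    """Query returning the k stored products whose embeddings are nearest to `vector`."""
    return {"size": k, "query": {"knn": {VECTOR_FIELD: {"vector": vector, "k": k}}}}


def embed(body: dict, region: str = "us-east-1") -> List[float]:
    """Call Titan Multimodal Embeddings and return the embedding vector."""
    import boto3  # lazy import; the builders above have no AWS dependency

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(modelId=EMBED_MODEL_ID, body=json.dumps(body),
                                   accept="application/json", contentType="application/json")
    return json.loads(response["body"].read())["embedding"]
```

Because text and images are embedded into the same vector space, one index serves both text-to-image and image-to-image search. With an OpenSearch client such as opensearch-py, these bodies could be passed to `client.indices.create(index=name, body=index_body())` and `client.search(index=name, body=knn_query(vector))`.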

Set up the fashion assistant agent

To set up the fashion assistant agent, follow these steps:

  1. Clone the GitHub repository
  2. Complete the prerequisites to grant sufficient permissions
  3. Follow the deployment steps outlined in the README.md
  4. (Optional) If you want to use the image_lookup feature, run the code in opensearch_ingest.ipynb to embed and store sample images with Amazon Titan Multimodal Embeddings
  5. Run the Streamlit UI to interact with the agent using the following command:
    streamlit run frontend/app.py
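A UI like this one sends each prompt to the agent through the Amazon Bedrock Agents runtime. The following minimal sketch uses boto3's `invoke_agent`, which can also be handy for testing the agent without Streamlit. The agent ID and alias ID are placeholders for the values created during deployment, and this is not necessarily how `frontend/app.py` is implemented.

```python
import uuid
from typing import Iterable, Optional


def collect_completion(events: Iterable[dict]) -> str:
    """Concatenate the text chunks from an InvokeAgent response stream, skipping trace events."""
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


def ask_agent(prompt: str, agent_id: str, agent_alias_id: str,
              session_id: Optional[str] = None, region: str = "us-east-1") -> str:
    """Send one prompt to the agent and return its final text answer."""
    import boto3  # lazy import so collect_completion can be tested offline

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        # Reuse the same session ID across calls to keep the conversation context.
        sessionId=session_id or str(uuid.uuid4()),
        inputText=prompt,
        enableTrace=True,  # also streams trace events that describe the agent's reasoning
    )
    return collect_completion(response["completion"])
```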

By following these steps, you can create a powerful and engaging fashion assistant agent that combines the capabilities of Amazon Titan models with the automation and decision-making capabilities of Amazon Bedrock Agents.

Test the fashion assistant

After the fashion assistant is set up, you can interact with it through the Streamlit UI. Follow these steps:

  1. Navigate to your Streamlit UI, as shown in the following screenshot.

  2. Upload an image or enter a text prompt describing the desired style, depending on the action you want, for example, image search, image generation, outpainting, or inpainting. The following screenshot shows an example prompt.

Streamlit UI Example Two

  3. Press Enter to send the prompt to the agent. You can view the agent’s chain-of-thought (CoT) process in the UI, as shown in the following screenshot.

Streamlit UI Example Three

  4. When the response is ready, you can view the agent’s response in the UI, as shown in the following screenshot. The response may include generated images, similar style recommendations, or modified images based on your request. You can download the generated images directly from the UI or find them in your S3 bucket.

Streamlit UI Example Four
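The chain-of-thought view depends on the trace events that `InvokeAgent` returns when tracing is enabled. The following minimal sketch turns those events into readable steps. It assumes boto3 response events and the `orchestrationTrace` structure of Amazon Bedrock Agents traces, and the helper name is hypothetical rather than taken from the Streamlit app.

```python
from typing import Iterable, List


def extract_steps(events: Iterable[dict]) -> List[str]:
    """Turn InvokeAgent trace events into human-readable reasoning steps.

    Assumes invoke_agent was called with enableTrace=True; chunk events are ignored.
    """
    steps = []
    for event in events:
        orchestration = event.get("trace", {}).get("trace", {}).get("orchestrationTrace", {})
        rationale = orchestration.get("rationale")
        if rationale and "text" in rationale:
            steps.append(rationale["text"])
        invocation = orchestration.get("invocationInput", {}).get("actionGroupInvocationInput")
        if invocation:
            # Function-details action groups report "function"; OpenAPI ones report "apiPath".
            tool = invocation.get("function") or invocation.get("apiPath")
            steps.append(f"Calling tool: {tool}")
    return steps
```

Showing the rationale next to each tool call helps users see why the agent chose, for example, to check the weather before recommending an outfit.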

Clean up

To avoid unnecessary costs, make sure to delete the resources used in this solution. You can do this by running the following command.
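As an alternative for deleting resources programmatically, here is a hedged boto3 sketch; it is not the repository's own cleanup command. The agent ID, Lambda function name, bucket name, and collection ID are deployment-specific values you would substitute, and the function name `cleanup` is hypothetical.

```python
from typing import Optional


def cleanup(agent_id: str, lambda_name: str, bucket_name: str,
            collection_id: Optional[str] = None, session=None) -> None:
    """Delete the core resources of a fashion assistant deployment.

    Pass a boto3.Session (for example, boto3.Session(region_name="us-east-1"))
    to control the account and Region; identifiers are deployment-specific.
    """
    if session is None:
        import boto3  # lazy import so the function can be exercised with a stub session

        session = boto3.Session()

    # skipResourceInUseCheck lets the agent be deleted even if it is still in use.
    session.client("bedrock-agent").delete_agent(agentId=agent_id, skipResourceInUseCheck=True)
    session.client("lambda").delete_function(FunctionName=lambda_name)

    # An S3 bucket must be empty before it can be deleted.
    bucket = session.resource("s3").Bucket(bucket_name)
    bucket.objects.all().delete()
    bucket.delete()

    # Only needed if you enabled image search with OpenSearch Serverless.
    if collection_id:
        session.client("opensearchserverless").delete_collection(id=collection_id)
```

If versioning is enabled on the bucket, delete `bucket.object_versions.all()` instead of `bucket.objects.all()`. IAM roles and policies created during deployment also need to be removed separately.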

Conclusion

The fashion assistant agent, powered by Amazon Titan models and Amazon Bedrock Agents, is an example of how retailers can create innovative applications that enhance the customer experience and drive business growth. By using this solution, retailers can gain a competitive edge, offering personalized style recommendations, visual inspirations, and interactive fashion advice to their customers.

We encourage you to explore the potential of building more agents like this fashion assistant by checking out the examples available on the aws-samples GitHub repository.


About the Authors

Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based solutions. Drawing on her expertise in computer vision and deep learning, she helps customers use ML on AWS efficiently. With the advent of generative AI, she has worked with numerous customers to identify suitable use cases and build them into production-ready solutions.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Antonia Wiebeler is a Data Scientist at the AWS Generative AI Innovation Center, where she enjoys building proofs of concept for customers. Her passion is exploring how generative AI can solve real-world problems and create value for customers. While she is not coding, she enjoys running and competing in triathlons.

Alex Newton is a Data Scientist at the AWS Generative AI Innovation Center, helping customers solve complex problems with generative AI and machine learning. He enjoys applying state-of-the-art ML solutions to real-world challenges. In his free time, you’ll find Alex playing in a band or watching live music.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.