Text-to-image AI models can be tricked into generating disturbing images

The researchers' work, which they will present at the IEEE Symposium on Security and Privacy in May next year, shines a light on how easy it is to force generative AI models to disregard their own guardrails and policies, a practice known as "jailbreaking." It also demonstrates how difficult it is to prevent these models from generating such content, as it's included in the vast troves of data they've been trained on, says Zico Kolter, an associate professor at Carnegie Mellon University. He demonstrated a similar form of jailbreaking on ChatGPT earlier this year but was not involved in this research.

“We have to take into account the potential risks in releasing software and tools that have known security flaws into larger software systems,” he says.

All major generative AI models have safety filters to prevent users from prompting them to produce pornographic, violent, or otherwise inappropriate images. The models won’t generate images from prompts that contain sensitive terms like “naked,” “murder,” or “sexy.”

But this new jailbreaking method, dubbed “SneakyPrompt” by its creators from Johns Hopkins University and Duke University, uses reinforcement learning to create written prompts that look like garbled nonsense to us but that AI models learn to recognize as hidden requests for disturbing images. It essentially works by turning the way text-to-image AI models function against them.

These models convert text-based requests into tokens, strings of characters or word fragments, in order to process the command the prompt has given them. SneakyPrompt repeatedly tweaks a prompt's tokens to try to force the model to generate banned images, adjusting its approach until it succeeds. The technique makes it quicker and easier to generate such images than if somebody had to type each prompt in manually, and it can come up with prompts that humans wouldn't imagine trying.
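
To make that loop concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' implementation: the safety filter and scoring functions are stubs, the candidate tokens are invented nonsense strings, and simple random search stands in for the paper's reinforcement-learning policy. It only illustrates the pattern of repeatedly substituting tokens and keeping whichever variant slips past the filter while still steering the model toward the banned target.

```python
import random

# Hypothetical stand-ins, for illustration only. A real attack queries an
# actual text-to-image service and its safety filter; none of these names or
# values come from SneakyPrompt's released code.
BLOCKLIST = {"naked", "murder", "sexy"}        # sensitive terms the toy filter rejects
CANDIDATE_TOKENS = ["glucose", "thwif", "mowwly", "brintlewark", "dumbstrike"]


def safety_filter_rejects(prompt: str) -> bool:
    """Toy filter: reject any prompt containing a blocklisted word."""
    return any(word in BLOCKLIST for word in prompt.lower().split())


def target_similarity(prompt: str) -> float:
    """Placeholder reward. SneakyPrompt scores how close the generated image
    is to the banned target; here we return a random number so the loop runs."""
    return random.random()


def sneaky_search(banned_prompt: str, sensitive_word: str, max_queries: int = 50):
    """Repeatedly swap the sensitive token for candidate replacements,
    keeping the substitution that passes the filter with the best score."""
    best_prompt, best_score = None, 0.0
    for _ in range(max_queries):
        substitute = random.choice(CANDIDATE_TOKENS)
        candidate = banned_prompt.replace(sensitive_word, substitute)
        if safety_filter_rejects(candidate):
            continue                      # still blocked; try a different token
        score = target_similarity(candidate)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt


print(sneaky_search("a naked person on a beach", "naked"))
```

In the paper's version, the feedback signal comes from the target model itself, so the search converges on token substitutions the model treats as the banned concept even though they read as gibberish to a person.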
