AI

Text-to-image AI models can be tricked into generating disturbing images

1 Mins read

Their work, which they will present at the IEEE Symposium on Security and Privacy in May next year, shines a light on how easy it is to force generative AI models into disregarding their own guardrails and policies, known as “jailbreaking.” It also demonstrates how difficult it is to prevent these models from generating such content, as it’s included in the vast troves of data they’ve been trained on, says Zico Kolter, an associate professor at Carnegie Mellon University. He demonstrated a similar form of jailbreaking on ChatGPT earlier this year but was not involved in this research.

“We have to take into account the potential risks in releasing software and tools that have known security flaws into larger software systems,” he says.

All major generative AI models have safety filters to prevent users from prompting them to produce pornographic, violent, or otherwise inappropriate images. The models won’t generate images from prompts that contain sensitive terms like “naked,” “murder,” or “sexy.”

But this new jailbreaking method, dubbed “SneakyPrompt” by its creators from Johns Hopkins University and Duke University, uses reinforcement learning to create written prompts that look like garbled nonsense to us but that AI models learn to recognize as hidden requests for disturbing images. It essentially works by turning the way text-to-image AI models function against them.

These models convert text-based requests into tokens—breaking words up into strings of words or characters—to process the command the prompt has given them. SneakyPrompt repeatedly tweaks a prompt’s tokens to try to force it to generate banned images, adjusting its approach until it is successful. This technique makes it quicker and easier to generate such images than if somebody had to input each entry manually, and it can generate entries that humans wouldn’t imagine trying.


Source link

Related posts
AI

Tsinghua University Researchers Just Open-Sourced CogAgent-9B-20241220: The Latest Version of CogAgent

3 Mins read
Graphical User Interfaces (GUIs) are central to how users engage with software. However, building intelligent agents capable of effectively navigating GUIs has…
AI

This Machine Learning Research from Amazon Introduces a New Open-Source High-Fidelity Dataset for Automotive Aerodynamics

3 Mins read
One of the most critical challenges in computational fluid dynamics (CFD) and machine learning (ML) is that high-resolution, 3D datasets specifically designed…
AI

Meet ONI: A Distributed Architecture for Simultaneous Reinforcement Learning Policy and Intrinsic Reward Learning with LLM Feedback

3 Mins read
Reward functions play a crucial role in reinforcement learning (RL) systems, but their design presents significant challenges in balancing task definition simplicity…

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *