Senior Audrey Lorvo is researching AI safety, which seeks to ensure increasingly intelligent AI models are reliable and can benefit humanity. The growing…
When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden…