Optimizing AI Safety and Deployment: A Game-Theoretic Approach to Protocol Evaluation in Untrusted AI Systems
AI Control assesses the safety of deployment protocols for untrusted AIs through red-teaming exercises involving a protocol designer and an adversary. AI…