Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

LLMs come with an entirely new class of security issues. Backdoors are already hard to find in traditional software. In LLMs, they can be practically undetectable and can persist through safety training.

Posted on 13 Jan 2024