Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
LLMs come with an entirely new class of security issues. Backdoors are already hard to find in traditional software; in LLMs, they can be practically undetectable.