AI Deception
More Research
Resist (ACL 2025 Best Paper)
SafeRLHF (ICLR 2024 Spotlight)
PKU-SafeRLHF (ACL 2025 Main)
Aligner (NeurIPS 2024 Oral)
InterMT (NeurIPS 2025 DB Track Spotlight)
ProgressGYM (NeurIPS 2024 DB Track Spotlight)
SafeSora (NeurIPS 2024 DB Track)
BeaverTails (NeurIPS 2023 DB Track)
Align-Anything🔥🔥🔥
AI Alignment: A Comprehensive Survey (ACM Computing Surveys)
People
Advisors and Contributors to the AI Deception Survey project (partial list).
† Project Lead ‡ Corresponding Authors