Shocking! Claude's alignment-faking rate can reach as high as 78%: Anthropic's 137-page paper exposes its own model's flaws
- 2024-12-19 13:39:00
- 刘大牛 (reposted article)
Now there is hard evidence that large language models should not be trusted too readily.
Paper title: Alignment Faking in Large Language Models
Paper link: https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
Video explanation: https://www.youtube.com/watch?v=9eXV64O2Xp8