Now there is hard evidence that large models should not be trusted too much.
Paper title: Alignment Faking in Large Language Models
Paper link: https://assets.anthropic.com/m/983c85a201a962f/original/Alignment-Faking-in-Large-Language-Models-full-paper.pdf
Video explainer: https://www.youtube.com/watch?v=9eXV64O2Xp8