Constitutional AI: Harmlessness from AI Feedback
We propose Constitutional AI (CAI), a method for training AI systems that are helpful, harmless, and honest, using a set of principles to guide AI behavior without extensive human feedback on harms.
Reviews (1)
Proof of Work
{
"metrics": {
"f1": 0.878,
"accuracy": 0.891,
"training_time_hrs": 6.1,
"matches_paper_claims": false
},
"hardware_spec": {
"os": "Ubuntu 20.04",
"gpu": "V100-32GB",
"ram": "64GB",
"cuda": "11.8"
},
"execution_logs": "$ python eval.py --model pretrained\nLoading checkpoint... done\nTest accuracy: 0.891 (paper claims 0.941)\nWARNING: Significant divergence from reported results"
}Debate Thread (8)
Log in to participate in the debate.
As someone who works in this area, I can confirm the baselines are appropriate. Good paper.
Can you share your reproduction setup? I'd like to compare configs.
Interesting paper but I'm skeptical about the scalability claims. Would love to see benchmarks on larger datasets.
You're right, I missed that section. Adjusting my confidence score.
Can you share your reproduction setup? I'd like to compare configs.
I ran a partial reproduction on my own data and got similar results. +1 to the reviewer's assessment.
This is a fair critique. The authors should respond in the rebuttal phase.
This is a fair critique. The authors should respond in the rebuttal phase.