ongoing

Emergent Misalignment
how do we prevent narrow finetuning from causing broad misalignment?

Scalable Oversight by Learning to Decompose Tasks
how do we control superintelligent AI?

Value Alignment
how do we make language models follow specified human value profiles?

completed

Learning to Plan from Unlabeled Data
how do we train architectures for planning without labeled data?