projects

Our research projects in AI alignment and safety.

ongoing

Emergent Misalignment

how do we prevent narrow finetuning from causing broad misalignment?

Scalable Oversight by Learning to Decompose Tasks

how do we control superintelligent AI?

Value Alignment

how do we make language models follow specified human value profiles?

completed

Learning to Plan from Unlabeled Data

how do we train architectures for planning without labeled data?