mAI alignment lab
Junior Research Group at University of Bonn focusing on AI alignment and safety issues
About
Welcome to the mAI alignment lab, a Junior Research Group at the University of Bonn led by Dr. Florian Mai.
Our research focuses on AI alignment and safety issues, exploring how to ensure that current and future advanced AI systems act reliably in accordance with human values.
Research Interests
- Scalable Oversight
- Value Alignment
- Emergent Misalignment
- Reasoning Models
- LLM Training
Current Projects
- Scalable Oversight by Learning to Decompose Tasks: Exploring how AI systems can learn to break complex tasks into manageable subtasks for reliable human oversight, advancing the frontier of superalignment research.
- Emergent Misalignment: Investigating how narrow finetuning can produce broadly misaligned language models and developing methods to prevent such misalignment.
- Value Alignment: Researching methods to ensure AI systems align with human values and preferences.
Join Us
We currently have no open positions. However, if you are interested in collaborating with our research group, please email Dr. Florian Mai at fmai@uni-bonn.de.
news
| Date | News |
|---|---|
| Jan 8, 2026 | Our paper “AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?” will be presented at IASEAI’26. Read the paper on arXiv. |
| Dec 14, 2025 | Our workshop paper “Pluralistic AI Alignment: A Cross-Cultural Pilot Survey” will be presented at the Second Workshop on Language Models for Underserved Communities (LM4UC). |
| Oct 27, 2025 | The AI alignment lab at Uni Bonn has started! Learn more on the course page. |
| Oct 13, 2025 | New preprint! Leonard Dung and Florian Mai analyze AI alignment strategies from a risk perspective and compare overlaps in failure modes across alignment techniques. Read the preprint on arXiv. |
| Aug 21, 2025 | Our JQL paper has been accepted at EMNLP 2025! Read the preprint on arXiv. |
| Aug 15, 2025 | New preprint! Survey-to-Behavior aligns language models with human values using survey questions. Check it out on arXiv. |
| Aug 8, 2025 | New preprint! We explore in-training defenses against emergent misalignment in language models. Check it out on arXiv. |
| May 31, 2025 | Our conference on AI risk has successfully concluded! The event featured insightful discussions, brilliant keynote speakers including Yoshua Bengio and Iason Gabriel, and engaging talks on critical AI safety topics. The conference received coverage from Belgian media, including De Standaard and De Tijd. Hope to see you next year! |
| May 28, 2025 | New preprint! We introduce JQL, a systematic approach for multilingual data curation that outperforms existing filtering methods. Check it out on arXiv. |
| Mar 30, 2025 | We’re giving a seminar course about the ethics of Artificial General Intelligence in the summer semester! The course covers AGI basics, alignment and value specification, control and autonomy, systemic risks, and global governance. Learn more and register here. |
| Mar 23, 2025 | Registrations are now open for the International Conference on Large-Scale AI Risks, 26-28 May 2025 in Leuven, Belgium. Dr. Florian Mai helped organize this event and we look forward to seeing you there! |
| Mar 20, 2025 | Dr. Florian Mai is participating in a panel discussion on trustworthy AI at the Deutsches Museum Bonn. |
| Mar 6, 2025 | Great news! Dr. Florian Mai and collaborators’ paper “Superalignment with Dynamic Human Values” was accepted at the BiAlign Workshop at ICLR 2025! |
| Feb 19, 2025 | The mAI alignment lab started an AI safety reading group at the University of Bonn, discussing recent papers on alignment and more! Interested in joining? Subscribe to our mailing list! |
| Jan 1, 2025 | The mAI alignment lab is founded! Dr. Florian Mai started as a Junior Research Group Leader at the University of Bonn as part of the CAISA lab headed by Prof. Lucie Flek. The lab’s research will focus on AI safety topics like value alignment, and on reasoning and planning approaches for LLMs. |
safety reading group
| Date | Topic |
|---|---|
| May 21, 2025 | Evaluating the Paperclip Maximizer and Instrumental Goals |
| May 7, 2025 | Gradual Disempowerment and Systemic AI Risks |
| Apr 9, 2025 | Dynamic Normativity and Value Alignment |