mAI alignment lab

Junior Research Group at the University of Bonn focusing on AI alignment and safety

About

Welcome to the mAI alignment lab, a Junior Research Group at the University of Bonn led by Dr. Florian Mai.

Our research focuses on AI alignment and safety, exploring how to ensure that current and future advanced AI systems act reliably in accordance with human values.

Research Interests

  • Scalable oversight
  • Value alignment
  • Emergent misalignment
  • Reasoning models
  • LLM training

Current Projects

  • Scalable Oversight by Learning to Decompose Tasks: Exploring how AI systems can learn to break complex tasks into manageable subtasks for reliable human oversight, advancing the frontier of superalignment research.

  • Emergent Misalignment: Investigating how narrow finetuning can produce broadly misaligned language models and developing methods to prevent such misalignment.

  • Value Alignment: Researching methods to ensure AI systems align with human values and preferences.

Join Us

We currently have no open positions. However, if you are interested in collaborating with our research group, please feel free to email Dr. Florian Mai at fmai@uni-bonn.de.

news

Jan 8, 2026 Our paper “AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?” will be presented at IASEAI’26. Read the paper on arXiv.
Dec 14, 2025 Our workshop paper “Pluralistic AI Alignment: A Cross-Cultural Pilot Survey” will be presented at the Second Workshop on Language Models for Underserved Communities (LM4UC).
Oct 27, 2025 The AI alignment lab at Uni Bonn has started! Learn more on the course page.
Oct 13, 2025 New preprint! Leonard Dung and Florian Mai analyze AI alignment strategies from a risk perspective and compare overlaps in failure modes across alignment techniques. Read the preprint on arXiv.
Aug 21, 2025 Our JQL paper has been accepted at EMNLP 2025! Read the preprint on arXiv.
Aug 15, 2025 New preprint! Survey-to-Behavior aligns language models with human values using survey questions. Check it out on arXiv.
Aug 8, 2025 New preprint! We explore in-training defenses against emergent misalignment in language models. Check it out on arXiv.
May 31, 2025 Our conference on AI risk has successfully concluded! The event featured insightful discussions, brilliant keynote speakers including Yoshua Bengio and Iason Gabriel, and engaging talks on critical AI safety topics. The conference received coverage from Belgian media, including De Standaard and De Tijd. Hope to see you next year!
May 28, 2025 New preprint! We introduce JQL, a systematic approach for multilingual data curation that outperforms existing filtering methods. Check it out on arXiv.
Mar 30, 2025 We’re giving a seminar course about the ethics of Artificial General Intelligence in the summer semester! The course covers AGI basics, alignment and value specification, control and autonomy, systemic risks, and global governance. Learn more and register here.
Mar 23, 2025 Registrations are now open for the International Conference on Large-Scale AI Risks, taking place 26–28 May 2025 in Leuven, Belgium. Dr. Florian Mai helped organize this event, and we look forward to seeing you there!
Mar 20, 2025 Dr. Florian Mai is participating in a panel discussion on trustworthy AI at the Deutsches Museum Bonn.
Mar 6, 2025 Great news! Dr. Florian Mai and collaborators’ paper “Superalignment with Dynamic Human Values” was accepted at the BiAlign Workshop at ICLR 2025!
Feb 19, 2025 The mAI alignment lab started an AI safety reading group at University of Bonn, discussing recent papers on alignment and more! Interested in joining? Subscribe to our mailing list!
Jan 1, 2025 The mAI alignment lab is founded! Dr. Florian Mai started as a Junior Research Group Leader at University of Bonn as part of the CAISA lab headed by Prof. Lucie Flek. The lab’s research will focus on AI safety topics like value alignment, and on reasoning and planning approaches for LLMs.


selected publications

  1. IASEAI
    AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?
    Leonard Dung and Florian Mai
    In IASEAI’26: International Association for Safe and Ethical AI Conference, Feb 2026
  2. COLM
    Learning to Plan for Language Modeling from Unlabeled Data
    Nathan Cornille, Marie-Francine Moens, and Florian Mai
    In First Conference on Language Modeling, Feb 2024