publications

Publications by category in reverse chronological order, generated by jekyll-scholar.

2026

  1. IASEAI
    AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?
    Leonard Dung and Florian Mai
    In IASEAI’26: International Association for Safe and Ethical AI Conference, Feb 2026

2025

  1. LM4UC
    Pluralistic AI Alignment: A Cross-Cultural Pilot Survey
    Khashayar Alavi, Lucie Flek, and Florian Mai
    In Second Workshop on Language Models for Underserved Communities (LM4UC), Feb 2025
  2. EMNLP
    Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
    Mehdi Ali, Manuel Brack, Max Lübbering, and 16 more authors
    In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), Feb 2025
  3. BiAlign
    Superalignment with Dynamic Human Values
    Florian Mai, David Kaczér, Nicholas Kluge Corrêa, and 1 more author
    In ICLR 2025 Workshop on Bidirectional Human-AI Alignment, Feb 2025
  4. arXiv
    In-Training Defenses against Emergent Misalignment in Language Models
    David Kaczér, Magnus Jørgenvåg, Clemens Vetter, and 2 more authors
    Feb 2025
  5. arXiv
    Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions
    Shangrui Nie, Florian Mai, David Kaczér, and 3 more authors
    Feb 2025

2024

  1. COLM
    Learning to Plan for Language Modeling from Unlabeled Data
    Nathan Cornille, Marie-Francine Moens, and Florian Mai
    In First Conference on Language Modeling, Feb 2024
  2. WiNLP
    Improving Language Modeling by Increasing Test-time Planning Compute
    Florian Mai, Nathan Cornille, and Marie-Francine Moens
    In Eighth Widening NLP Workshop (WiNLP 2024) Phase II, Feb 2024