cs.LG, cs.AI

The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety

Max Springer, Chung Peng Lee, Blossom Metevier, Jane Castleman, Bohdan Turbal +3 more

2/17/2026

arxiv

AI Evaluation

AI analysis scores

Novelty: 90/100

Methodology: 85/100

Reproducibility: 80/100

Impact: 95/100

Similar Papers

From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems

arxiv

On the Identifiability of Steering Vectors in Large Language Models

arxiv

UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models

arxiv

ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues

arxiv

Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding

arxiv