ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues
arxiv
From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems
arxiv
The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety
arxiv
On the Identifiability of Steering Vectors in Large Language Models
arxiv
CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search
arxiv