AI Research Hub
cs.LG, cs.AI

On the Identifiability of Steering Vectors in Large Language Models

Sohan Venkatesh, Ashish Mahendran Kurapath | 2/6/2026 | arxiv

AI Evaluation
AI analysis scores
Overall Score: 86
Novelty: 85/100
Methodology: 90/100
Reproducibility: 80/100
Impact: 88/100

Similar Papers

From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems (arxiv)

The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety (arxiv)

ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues (arxiv)

UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models (arxiv)

Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding (arxiv)