cs.LG, cs.AI

On the Identifiability of Steering Vectors in Large Language Models

Sohan Venkatesh, Ashish Mahendran Kurapath | 2/6/2026 | arXiv


AI Evaluation

AI analysis scores

Novelty: 85/100

Methodology: 90/100

Reproducibility: 80/100

Impact: 88/100
