From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems
arXiv
On the Identifiability of Steering Vectors in Large Language Models
arXiv
UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models
arXiv
ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues
arXiv
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
arXiv