DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
arXiv
SALMUBench: A Benchmark for Sensitive Association-Level Multimodal Unlearning
arXiv
HINT: Composed Image Retrieval with Dual-path Compositional Contextualized Network
arXiv
StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues
arXiv
Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection
arXiv