DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
arxiv
The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages
arxiv
The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation
arxiv
StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues
arxiv
Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection
arxiv