cs.CV, cs.AI

StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues

Zanxi Ruan, Qiuyu Kong, Songqun Gao, Yiming Wang, Marco Cristani

2/23/2026

arxiv

AI Evaluation

AI analysis scores

Novelty: 85/100

Methodology: 90/100

Reproducibility: 95/100

Impact: 80/100

Similar Papers

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

arxiv

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

arxiv

Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models

arxiv

SentiAvatar: Towards Expressive and Interactive Digital Humans

arxiv

VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration

arxiv