cs.CV, cs.AI

StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues

Zanxi Ruan, Qiuyu Kong, Songqun Gao, Yiming Wang, Marco Cristani
2/23/2026, arXiv
