Discover Papers

Search and explore AI research papers from arXiv, Semantic Scholar, and more

Showing 20 papers

Bidirectional Cross-Modal Prompting for Event-Frame Asymmetric Stereo

86

Ninghui Xu, Fabio Tosi, Lihui Wang +5 more

The paper introduces Bi-CMPStereo, a framework that combines event-based and frame-based camera data to improve 3D perception in dynamic scenes.

cs.CV
arXiv, 4/16/2026

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

83

Zhanhao Liang, Tao Yang, Jie Wu +2 more

LeapAlign improves flow matching model fine-tuning by reducing computational costs and enabling efficient gradient propagation, leading to better image quality and alignment.

cs.CV
arXiv, 4/16/2026

TokenLight: Precise Lighting Control in Images using Attribute Tokens

86

Sumit Chaturvedi, Yannick Hold-Geoffroy, Mengwei Ren +5 more

TokenLight introduces a method for precise and continuous control of multiple illumination attributes in images using attribute tokens, achieving state-of-the-art performance in relighting tasks.

cs.CV, cs.GR
arXiv, 4/16/2026

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

83

Yan Li, Zezi Zeng, Yifan Yang +12 more

MM-WebAgent is a hierarchical framework for generating coherent and visually consistent webpages by coordinating multimodal content generation through hierarchical planning and iterative self-reflection.

cs.CV, cs.AI, cs.CL
arXiv, 4/16/2026

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

83

Hao Gao, Shaoyu Chen, Yifan Zhu +4 more

RAD-2 is a generator-discriminator framework for autonomous driving that improves trajectory planning and driving quality using a diffusion-based generator and RL-optimized discriminator.

cs.CV
arXiv, 4/16/2026

Generalization in LLM Problem Solving: The Case of the Shortest Path

83

Yao Tong, Jiayuan Ye, Anastasia Borovykh +1 more

Language models generalize well to new maps but struggle with longer problem-solving tasks due to recursive instability.

cs.AI, cs.LG
arXiv, 4/16/2026

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

88

Manan Gupta, Dhruv Kumar

The paper presents a diagnostic toolkit to evaluate the reliability of LLM-as-judge frameworks, revealing inconsistencies and providing a measure of per-instance reliability through prediction set widths.

cs.AI, cs.CL, cs.LG
arXiv, 4/16/2026

Think in Latent Thoughts: A New Paradigm for Gloss-Free Sign Language Translation

86

Yiyang Jiang, Li Zhang, Xiao-Yong Wei +1 more

The paper introduces a gloss-free sign language translation (SLT) framework that uses latent thoughts to improve translation accuracy.

cs.CV
arXiv, 4/16/2026

AnimationBench: Are Video Models Good at Character-Centric Animation?

83

Leyi Wu, Pengjun Fang, Kai Sun +8 more

AnimationBench is a new benchmark designed to evaluate animation-style image-to-video generation using principles of animation and broader quality dimensions.

cs.CV
arXiv, 4/16/2026

Benchmarking Optimizers for MLPs in Tabular Deep Learning

78

Yury Gorishniy, Ivan Rubachev, Dmitrii Feoktistov +1 more

Summary not available

cs.LG
arXiv, 4/16/2026

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

82

Zhen Yang, Ping Jian, Zhongbin Guo +5 more

This paper investigates how language models understand viewpoint rotation without visual input. It finds that they struggle with this form of spatial reasoning, but that selective fine-tuning can improve their performance.

cs.AI
arXiv, 4/16/2026

AD4AD: Benchmarking Visual Anomaly Detection Models for Safer Autonomous Driving

88

Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto +4 more

The paper benchmarks visual anomaly detection models on a synthetic dataset to enhance safety in autonomous driving by identifying unfamiliar objects and guiding driver attention.

cs.CV, cs.AI
arXiv, 4/16/2026

Structural interpretability in SVMs with truncated orthogonal polynomial kernels

83

Víctor Soto-Larrosa, Nuria Torrado, Edmundo J. Huertas

The paper introduces Orthogonal Representation Contribution Analysis (ORCA) for interpreting SVMs with truncated orthogonal polynomial kernels after training.

stat.ML, cs.LG, math.ST
arXiv, 4/16/2026

GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens

86

Roni Itkin, Noam Issachar, Yehonatan Keypur +3 more

GlobalSplat introduces an efficient 3D Gaussian Splatting method using global scene tokens for compact and fast novel-view synthesis.

cs.CV
arXiv, 4/16/2026

R3D: Revisiting 3D Policy Learning

86

Zhengdong Hong, Shenrui Wu, Haozhe Cui +8 more

R3D introduces a new architecture for 3D policy learning that addresses training instabilities and overfitting, outperforming state-of-the-art baselines in manipulation tasks.

cs.CV, cs.RO
arXiv, 4/16/2026

Why Do Vision Language Models Struggle To Recognize Human Emotions?

83

Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara +1 more

Vision-language models struggle with emotion recognition due to imbalanced emotion datasets and the inability to process temporal information effectively.

cs.CV, cs.AI
arXiv, 4/16/2026

How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations

83

Nouhaila Innan, Antonello Rosato, Alberto Marchisio +1 more

This paper benchmarks classical and quantum-oriented node embeddings for graph neural networks, showing dataset-dependent performance variations.

cs.LG, quant-ph
arXiv, 4/16/2026

Prism: Symbolic Superoptimization of Tensor Programs

86

Mengdi Wu, Xiaoyu Jiang, Oded Padon +1 more

Prism is a symbolic superoptimizer for tensor programs that achieves significant speedups over existing superoptimizers and compiler-based approaches by using a hierarchical symbolic representation and a two-level search strategy.

cs.PL, cs.AI, cs.LG
arXiv, 4/16/2026

SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation

88

Tianhao Fu, Austin Wang, Charles Chen +2 more

SegWithU is a framework that enhances medical image segmentation models with reliable uncertainty estimation using a single forward pass.

cs.CV, cs.AI, cs.LG
arXiv, 4/16/2026

Cloning is as Hard as Learning for Stabilizer States

83

Nikhil Bansal, Matthias C. Caro, Gaurav Mahajan

Cloning stabilizer states is as hard as learning them, with both requiring Θ(n) samples.

quant-ph, cs.LG, math.ST
arXiv, 4/16/2026