Search and explore AI research papers from arXiv, Semantic Scholar, and more
Showing 20 papers
Ninghui Xu, Fabio Tosi, Lihui Wang +5 more
The paper introduces Bi-CMPStereo, a framework that combines event-based and frame-based camera data to improve 3D perception in dynamic scenes.
Zhanhao Liang, Tao Yang, Jie Wu +2 more
LeapAlign improves flow matching model fine-tuning by reducing computational costs and enabling efficient gradient propagation, leading to better image quality and alignment.
Sumit Chaturvedi, Yannick Hold-Geoffroy, Mengwei Ren +5 more
TokenLight introduces a method for precise and continuous control of multiple illumination attributes in images using attribute tokens, achieving state-of-the-art performance in relighting tasks.
Yan Li, Zezi Zeng, Yifan Yang +12 more
MM-WebAgent is a hierarchical framework that generates coherent, visually consistent webpages by coordinating multimodal content generation through hierarchical planning and iterative self-reflection.
Hao Gao, Shaoyu Chen, Yifan Zhu +4 more
RAD-2 is a generator-discriminator framework for autonomous driving that improves trajectory planning and driving quality using a diffusion-based generator and RL-optimized discriminator.
Yao Tong, Jiayuan Ye, Anastasia Borovykh +1 more
Language models generalize well to new maps but struggle with longer problem-solving tasks due to recursive instability.
Manan Gupta, Dhruv Kumar
The paper presents a diagnostic toolkit to evaluate the reliability of LLM-as-judge frameworks, revealing inconsistencies and providing a measure of per-instance reliability through prediction set widths.
Yiyang Jiang, Li Zhang, Xiao-Yong Wei +1 more
The paper introduces a new sign language translation (SLT) framework that uses latent thoughts to translate more accurately without relying on glosses.
Leyi Wu, Pengjun Fang, Kai Sun +8 more
AnimationBench is a new benchmark designed to evaluate animation-style image-to-video generation using principles of animation and broader quality dimensions.
Yury Gorishniy, Ivan Rubachev, Dmitrii Feoktistov +1 more
Summary not available
Zhen Yang, Ping Jian, Zhongbin Guo +5 more
This paper investigates how language models understand viewpoint rotation without visual input. It finds that they struggle to achieve spatial intelligence, but that selective fine-tuning can improve their performance.
Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto +4 more
The paper benchmarks visual anomaly detection models on a synthetic dataset to enhance safety in autonomous driving by identifying unfamiliar objects and guiding driver attention.
Víctor Soto-Larrosa, Nuria Torrado, Edmundo J. Huertas
The paper introduces Orthogonal Representation Contribution Analysis (ORCA) for interpreting SVMs with truncated orthogonal polynomial kernels after training.
Roni Itkin, Noam Issachar, Yehonatan Keypur +3 more
GlobalSplat introduces an efficient 3D Gaussian Splatting method using global scene tokens for compact and fast novel-view synthesis.
Zhengdong Hong, Shenrui Wu, Haozhe Cui +8 more
R3D introduces a new architecture for 3D policy learning that addresses training instabilities and overfitting, outperforming state-of-the-art baselines in manipulation tasks.
Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara +1 more
Vision-language models struggle with emotion recognition because emotion datasets are imbalanced and the models cannot process temporal information effectively.
Nouhaila Innan, Antonello Rosato, Alberto Marchisio +1 more
This paper benchmarks classical and quantum-oriented node embeddings for graph neural networks, showing dataset-dependent performance variations.
Mengdi Wu, Xiaoyu Jiang, Oded Padon +1 more
Prism is a symbolic superoptimizer for tensor programs that achieves significant speedups over existing superoptimizers and compiler-based approaches by using a hierarchical symbolic representation and a two-level search strategy.
Tianhao Fu, Austin Wang, Charles Chen +2 more
SegWithU is a framework that enhances medical image segmentation models with reliable uncertainty estimation using a single forward pass.
Nikhil Bansal, Matthias C. Caro, Gaurav Mahajan
Cloning stabilizer states is as hard as learning them, with both requiring Θ(n) samples.