Showing 20 papers
Shaden Alshammari, Kevin Wen, Abrar Zainal +5 more
Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We in...
Heming Zhu, Guoxing Sun, Marc Habermann
Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed alo...
Liubomyr Horbatko
Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention rema...
Yunke Ao, Le Chen, Bruce D. Lee +5 more
Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a signifi...
Kevin Murphy
We present BLF (Bayesian Linguistic Forecaster), an agentic system for binary forecasting that achieves state-of-the-art performance on the ForecastBench benchmark. The system is built on three ideas....
Aditya Arora, Akshita Gupta, Pau Rodriguez +1 more
Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narratives...
Salman Rahman, Jingyan Shen, Anna Mordvina +3 more
Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward s...
Savya Khosla, Sethuraman T, Aryan Chadha +2 more
Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts tasks like open-vocabulary semantic seg...
A. Sophia Koepke, Daniil Zverev, Shiry Ginosar +1 more
The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If...
Andrew Zhang, Tong Ding, Sophia J. Wagner +8 more
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation....
Maria-Eleni Sfyraki, Jun-Kun Wang
In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the cov...
Manan Gupta, Dhruv Kumar
Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce $\textbf{...
Terry Leitch
We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics...
Haoyu Wu, Jiwen Yu, Yingtian Zou +1 more
Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that ...
Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki +1 more
A recent study (Kuribayashi et al., 2025) has shown that human sentence processing behavior, typically measured on syntactically unchallenging constructions, can be effectively modeled using surprisal...
Rui Qian, Chuanhang Deng, Qiang Huang +6 more
Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token $\texttt{<SEG>}$, whose...
Minji Lee, Colin Kalicki, Minkyu Jeon +3 more
Models from the AlphaFold (AF) family reliably predict one dominant conformation for most well-ordered proteins but struggle to capture biologically relevant alternate states. Several efforts have foc...
Wei Yao, Haohan Ma, Hongwen Zhang +6 more
Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited ge...
Alireza Dadgarnia, Soroush Tabesh, Mahdi Nikdan +3 more
Weight quantization has become a standard tool for efficient LLM deployment, especially for local inference, where models are now routinely served at 2-3 bits per parameter. The state of the art is cu...
Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson +3 more
This note clarifies the relationship between the recent TurboQuant work and the earlier DRIVE (NeurIPS 2021) and EDEN (ICML 2022) schemes. DRIVE is a 1-bit quantizer that EDEN extended to any $b>0$...