Showing 20 papers
Shaden Alshammari, Kevin Wen, Abrar Zainal +5 more
Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We in...
Heming Zhu, Guoxing Sun, Marc Habermann
Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed alo...
Liubomyr Horbatko
Modern sequence models are dominated by Transformers, where self-attention mixes information from the visible context in an input-dependent way. However, when retrieval is not sharp and attention rema...
Yunke Ao, Le Chen, Bruce D. Lee +5 more
Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a signifi...
Kevin Murphy
We present BLF (Bayesian Linguistic Forecaster), an agentic system for binary forecasting that achieves state-of-the-art performance on the ForecastBench benchmark. The system is built on three ideas....
Aditya Arora, Akshita Gupta, Pau Rodriguez +1 more
Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narratives...
Salman Rahman, Jingyan Shen, Anna Mordvina +3 more
Large language models have achieved significant reasoning improvements through reinforcement learning with verifiable rewards (RLVR). Yet as model capabilities grow, constructing high-quality reward s...
Savya Khosla, Sethuraman T, Aryan Chadha +2 more
Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts tasks like open-vocabulary semantic seg...
A. Sophia Koepke, Daniil Zverev, Shiry Ginosar +1 more
The Platonic Representation Hypothesis suggests that neural networks trained on different modalities (e.g., text and images) align and eventually converge toward the same representation of reality. If...
Andrew Zhang, Tong Ding, Sophia J. Wagner +8 more
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation....
Maria-Eleni Sfyraki, Jun-Kun Wang
In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the cov...
Manan Gupta, Dhruv Kumar
Large language models frequently commit unrecoverable reasoning errors mid-generation: once a wrong step is taken, subsequent tokens compound the mistake rather than correct it. We introduce $\textbf{...
Terry Leitch
We present a systematic evaluation of large language model families -- spanning both proprietary cloud APIs and locally-hosted open-source models -- on two purpose-built benchmarks for System Dynamics...
Haoyu Wu, Jiwen Yu, Yingtian Zou +1 more
Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that ...
Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki +1 more
A recent study (Kuribayashi et al., 2025) has shown that human sentence processing behavior, typically measured on syntactically unchallenging constructions, can be effectively modeled using surprisal...
Rui Qian, Chuanhang Deng, Qiang Huang +6 more
Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token $\texttt{<SEG>}$, whose...
Minji Lee, Colin Kalicki, Minkyu Jeon +3 more
Models from the AlphaFold (AF) family reliably predict one dominant conformation for most well-ordered proteins but struggle to capture biologically relevant alternate states. Several efforts have foc...
Wei Yao, Haohan Ma, Hongwen Zhang +6 more
Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited ge...
Alireza Dadgarnia, Soroush Tabesh, Mahdi Nikdan +3 more
Weight quantization has become a standard tool for efficient LLM deployment, especially for local inference, where models are now routinely served at 2-3 bits per parameter. The state of the art is cu...
Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson +3 more
This note clarifies the relationship between the recent TurboQuant work and the earlier DRIVE (NeurIPS 2021) and EDEN (ICML 2022) schemes. DRIVE is a 1-bit quantizer that EDEN extended to any $b>0$...