Paper Digest: ICLR 2026 Papers & Highlights
Note: ICLR 2026 accepted more than 5,300 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read all 5,300 ICLR 2026 papers on a separate page, which takes quite some time to load.
To search for papers presented at ICLR 2026 on a specific topic, please use the search by venue (ICLR-2026) service. To summarize the latest research published at ICLR 2026 on a specific topic, you can use the review by venue (ICLR-2026) service. If you are interested in browsing papers by author, we have a comprehensive list of ~22,000 authors (ICLR-2026). Additionally, you may want to explore our “Best Paper” Digest (ICLR), which lists the most influential ICLR papers since 2018.
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure you never miss a breakthrough, our daily service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day, delivering only what matters most to your specific interests. Beyond discovery, Paper Digest offers built-in research tools to help users read articles, write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
TABLE 1: Paper Digest: ICLR 2026 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | SigLIP-HD By Fine-to-Coarse Supervision. Highlight: Before simply adopting a higher resolution, have we truly unlocked the model’s full perception capability at a standard resolution? Therefore, we study an interesting problem: how to achieve fine visual perception at lower cost, without larger images. | Lihe Yang; Zhen Zhao; Hengshuang Zhao; |
| 2 | Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning. Highlight: In this work, we introduce Cosmos Policy, a simple approach for adapting a large pretrained video model (Cosmos-Predict2) into an effective robot policy through a single stage of post-training on the robot demonstration data collected on the target platform, with no architectural modifications. | Moo Jin Kim; Yihuai Gao; Tsung-Yi Lin; Yen-Chen Lin; Yunhao Ge; Grace Lam; Percy Liang; Shuran Song; Ming-Yu Liu; Chelsea Finn; Jinwei Gu; |
| 3 | SAM 3: Segment Anything with Concepts. Highlight: We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., “yellow school bus”), image exemplars, or a combination of both. | Nicolas Carion; Laura Gustafson; Yuan-Ting Hu; Shoubhik Debnath; Ronghang Hu; Didac Suris Coll-Vinent; Chaitanya Ryali; Kalyan Vasudev Alwala; Haitham Khedr; Andrew Huang; Jie Lei; Tengyu Ma; Baishan Guo; Arpit Kalla; Markus Marks; Joseph Greer; Meng Wang; Peize Sun; Roman Rädle; Triantafyllos Afouras; Effrosyni Mavroudi; Katherine Xu; Tsung-Han Wu; Yu Zhou; Liliane Momeni; RISHI HAZRA; Shuangrui Ding; Sagar Vaze; Francois Porcher; Feng Li; Siyuan Li; Aishwarya Kamath; Ho Kei Cheng; Piotr Dollar; Nikhila Ravi; Kate Saenko; Pengchuan Zhang; Christoph Feichtenhofer; |
| 4 | Humanline: Online Alignment As Perceptual Loss. Highlight: Drawing on prospect theory from behavioral economics, we propose a human-centric explanation. | Sijia Liu; Niklas Muennighoff; Kawin Ethayarajh; |
| 5 | SNaX: Sparse Narrow Accelerated Mixture of Experts. Highlight: Existing MoE methods optimize system efficiency or model architecture independently. We show that as MoE models get more granular and sparser, they become more memory-bound, and jointly optimizing the algorithms and the kernel design leads to a major improvement in MoE training throughput. | Wentao Guo; Mayank Mishra; Xinle Cheng; Ion Stoica; Tri Dao; |
| 6 | In-Place Test-Time Training. Highlight: In this work, we introduce **In-Place Test-Time Training (In-Place TTT)**, a framework that seamlessly endows LLMs with Test-Time Training ability. | Guhao Feng; Shengjie Luo; Kai Hua; Ge Zhang; Wenhao Huang; Di He; Tianle Cai; |
| 7 | Revisiting Multimodal Positional Encoding in Vision–Language Models. Highlight: Through extensive experiments, we identify three key guidelines: positional coherence, full frequency utilization, and preservation of textual priors—ensuring unambiguous layout, rich representation, and faithful transfer from the pre-trained LLM. Based on these insights, we propose Multi-Head RoPE (MHRoPE) and MRoPE-Interleave (MRoPE-I), two simple and plug-and-play variants that require no architectural changes. | Jie Huang; Xuejing Liu; Sibo Song; RuiBing Hou; Hong Chang; Junyang Lin; Shuai Bai; |
| 8 | ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. Highlight: While reasoning models trained with reinforcement learning (RL) excel in reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex equation solving—areas where computational tools like code interpreters (CI) demonstrate distinct advantages. To bridge this gap, we propose ReTool, which enhances long-form reasoning with tool-integrated learning, including two key features: (1) dynamic interleaving of real-time code execution within natural language reasoning processes, and (2) an automated RL paradigm that allows policy rollouts with multi-turn real-time code execution and teaches the model when and how to invoke tools based on outcome feedback. | Jiazhan Feng; Shijue Huang; Xingwei Qu; Ge Zhang; Yujia Qin; Baoquan Zhong; Chengquan Jiang; Jinxin Chi; Wanjun Zhong; |
| 9 | RewardEval: Advancing Reward Model Evaluation. Highlight: In this paper, we describe our benchmark construction process and report how existing models perform on it, while quantifying and providing new insights on how performance on the benchmark correlates with downstream use of the models in both inference-time scaling algorithms, like best-of-N sampling, and RLHF training algorithms like proximal policy optimization. (A minimal sketch of best-of-N sampling appears after this table.) | Saumya Malik; Valentina Pyatkin; Sander Land; Jacob Morrison; Noah A. Smith; Hannaneh Hajishirzi; Nathan Lambert; |
| 10 | Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation. Highlight: This paper introduces the **Language Confusion Gate** (LCG), a lightweight, plug-in solution that filters tokens during decoding without altering the base LLM. | Collin Zhang; Fei Huang; Chenhan Yuan; Junyang Lin; |
| 11 | A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning. Highlight: In this paper, we present A$^2$Search, an annotation-free, end-to-end training framework to recognize and handle ambiguity. | Fengji Zhang; Xinyao Niu; Chengyang Ying; Guancheng Lin; Zhongkai Hao; Zhou Fan; Chengen Huang; Jacky Keung; Bei Chen; Junyang Lin; |
| 12 | Dynamic Chunking for End-to-End Hierarchical Sequence Modeling. Highlight: We introduce a collection of new techniques that enable a dynamic chunking mechanism which automatically learns content- and context-dependent segmentation strategies jointly with the rest of the model. | Sukjun Hwang; Brandon Wang; Albert Gu; |
| 13 | RLP: Reinforcement As A Pretraining Objective. Highlight: In this paper, we present RLP, an information-driven reinforcement pretraining objective that brings the core spirit of reinforcement learning—exploration—to the last phase of pretraining. | Ali Hatamizadeh; Syeda Nahida Akter; Shrimai Prabhumoye; Jan Kautz; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; Yejin Choi; |
| 14 | VeriTrail: Closed-Domain Hallucination Detection with Traceability. Highlight: To address this need, we present VeriTrail, the first closed-domain hallucination detection method designed to provide traceability for both MGS and SGS processes. We also introduce the first datasets to include all intermediate outputs as well as human annotations of final outputs’ faithfulness for their respective MGS processes. | Dasha Metropolitansky; Jonathan Larson; |
| 15 | RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems. Highlight: To enable more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. | Yuxiao Qu; Anikait Singh; Yoonho Lee; Amrith Setlur; Ruslan Salakhutdinov; Chelsea Finn; Aviral Kumar; |
| 16 | RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots. Highlight: Yet despite this momentum, it remains difficult to gauge how close we are to this goal, as the field lacks a reproducible, large-scale benchmark for systematic evaluation. To address this gap, we present RoboCasa365, a comprehensive robot simulation benchmark for everyday tasks. | Soroush Nasiriany; Sepehr Nasiriany; Abhiram Maddukuri; Yuke Zhu; |
| 17 | VisualPRM400K: An Effective Dataset for Training Multimodal Process Reward Models. Highlight: We construct VisualPRM400K, a dataset comprising about 400K multimodal process supervision samples. | Weiyun Wang; Zhangwei Gao; Lianjie Chen; Zhe Chen; Jinguo Zhu; Xiangyu Zhao; Yangzhou Liu; Yue Cao; Shenglong Ye; Xizhou Zhu; Lewei Lu; Haodong Duan; Yu Qiao; Jifeng Dai; Wenhai Wang; |
| 18 | Principled RL for Diffusion LLMs Emerges from A Sequence-Level Perspective. Highlight: The core difficulty lies in likelihood approximation: while autoregressive models naturally provide token-level conditional probabilities essential for token-level RL objectives (e.g., GRPO), dLLMs generate sequences through iterative non-autoregressive denoising steps that lack this factorization. To address this fundamental mismatch, we propose ELBO-based Sequence-level Policy Optimization (ESPO), a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy. | Jingyang Ou; Jiaqi Han; Minkai Xu; Shaoxuan Xu; Jianwen Xie; Stefano Ermon; Yi Wu; Chongxuan Li; |
| 19 | Early Signs of Steganographic Capabilities in Frontier LLMs. Highlight: However, LLMs could evade monitoring through steganography: encoding hidden information within seemingly benign generations. In this paper, we evaluate the steganography capabilities in frontier LLMs to better understand the risk they pose. | Artur Zolkowski; Kei Nishimura-Gasparian; Robert McCarthy; Roland S. Zimmermann; David Lindner; |
| 20 | Towards Spatial Supersensing in Video. Highlight: Using a four-level taxonomy: semantic perception, streaming event cognition, implicit 3D spatial cognition, and predictive world modeling, we audit existing benchmarks and show they focus heavily on the first tier, with only partial coverage of streaming and spatial cognition, and almost never test true world modeling. To ground these gaps, we introduce VSI-Super, a two-part benchmark for continual spatial sensing: VSO (long-horizon spatial observation and recall) and VSC (continual counting under changing viewpoints and scenes). | Shusheng Yang; Jihan Yang; Pinzhi Huang; Ellis L Brown II; Zihao Yang; Yue Yu; Shengbang Tong; Zihan Zheng; Yifan Xu; Muhan Wang; Rob Fergus; Yann LeCun; Li Fei-Fei; Saining Xie; |
| 21 | Diffusion Transformers with Representation Autoencoders. Highlight: In this work, we investigate replacing the VAE encoder–decoder with pretrained representation encoders (e.g., DINO, SigLIP, MAE) combined with trained decoders, forming what we call *Representation Autoencoders* (RAEs). | Boyang Zheng; Nanye Ma; Shengbang Tong; Saining Xie; |
| 22 | What Matters for Representation Alignment: Global Information or Spatial Structure? Highlight: The results are surprising: spatial structure, rather than global performance, drives the generation performance of a target representation. To further study this, we introduce two straightforward modifications, which specifically accentuate the transfer of spatial information. | Jaskirat Singh; Xingjian Leng; Zongze Wu; Liang Zheng; Richard Zhang; Eli Shechtman; Saining Xie; |
| 23 | DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation. Highlight: For RL training, to reduce the variance of token log-likelihood estimates and maintain training efficiency, we propose coupled-GRPO, a novel sampling scheme that constructs complementary mask noise for completions used in training. | Shansan Gong; Ruixiang ZHANG; Huangjie Zheng; Jiatao Gu; Navdeep Jaitly; Lingpeng Kong; Yizhe Zhang; |
| 24 | SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design. Highlight: The comparison reveals significant inconsistencies between these metrics, making them impractical and inaccurate criteria for guiding SBDD methods toward synthesizable drug design. Therefore, we propose a simple yet effective SE(3)-invariant SYNthesizability Classifier (SYNC) to enable better synthesizability estimation in SBDD, which demonstrates superior generalizability and speed compared to existing metrics on five curated datasets. | Yunfan Liu; Lirong Wu; Zhifeng Gao; Yufei Huang; Cheng Tan; Haitao Lin; Zicheng Liu; Changxi Chi; Chang Yu; Stan Z. Li; |
| 25 | LLMs Get Lost In Multi-Turn Conversation. Highlight: In this work, we perform large-scale simulation experiments to compare LLM performance in single- and multi-turn settings. | Philippe Laban; Hiroaki Hayashi; Yingbo Zhou; Jennifer Neville; |
| 26 | Mamba-3: Improved Sequence Modeling Using State Space Principles. Highlight: Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state-space model viewpoint of linear models. | Aakash Lahoti; Kevin Li; Berlin Chen; Caitlin Wang; Aviv Bick; J Zico Kolter; Tri Dao; Albert Gu; |
| 27 | $\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization. Highlight: We propose a novel framework, named Li$_2$, that captures three key stages for the grokking behavior of 2-layer nonlinear networks: (I) **L**azy learning, (II) **i**ndependent feature learning, and (III) **i**nteractive feature learning. | Yuandong Tian; |
| 28 | Pre-training Under Infinite Compute. Highlight: We first show that existing data-constrained approaches of increasing epoch count and parameter count overfit, and we improve upon such recipes by tuning regularization, finding that the optimal weight decay is $30\times$ larger than standard practice. Since our regularized recipe monotonically decreases loss following a power law in parameter count, we estimate its best possible performance via the **asymptote** of its scaling law rather than the performance at a fixed compute budget. We then identify that ensembling independently trained models achieves a significantly lower loss asymptote than the regularized recipe. (A minimal sketch of such an asymptote fit appears after this table.) | Konwoo Kim; Suhas Kotha; Percy Liang; Tatsunori Hashimoto; |
| 29 | Lyra: Generative 3D Scene Reconstruction Via Video Diffusion Model Self-Distillation. Highlight: In this paper, we propose a self-distillation framework that aims to distill the implicit 3D knowledge in the video diffusion models into an explicit 3D Gaussian Splatting (3DGS) representation, eliminating the need for multi-view training data. | Sherwin Bahmani; Tianchang Shen; Jiawei Ren; Jiahui Huang; Yifeng Jiang; Haithem Turki; Andrea Tagliasacchi; David B. Lindell; Zan Gojcic; Sanja Fidler; Huan Ling; Jun Gao; Xuanchi Ren; |
| 30 | Kimi-Dev: Agentless Training As Skill Prior for SWE-agents. Highlight: In this work, we first curate the Agentless training recipe and present Kimi-Dev, an open-source SWE LLM achieving 60.4% on SWE-bench Verified, the best among workflow approaches. | Zonghan Yang; Shengjie Wang; Kelin Fu; Wenyang He; Weimin Xiong; Yibo Liu; Yibo Miao; Bofei Gao; Yejie Wang; YINGWEI MA; Yanhao Li; Yue Liu; Zhenxing Hu; kaitai zhang; Shuyi Wang; Huarong Chen; Flood Sung; Yang Liu; Yang Gao; Zhilin Yang; Tianyu Liu; |
| 31 | Ctrl-World: A Controllable Generative World Model for Robot Manipulation. Highlight: In this paper, we make a step forward by introducing a controllable multi-view world model that can be used to evaluate and improve the instruction-following ability of generalist robot policies. | Yanjiang Guo; Lucy Xiaoyang Shi; Jianyu Chen; Chelsea Finn; |
| 32 | FSPO: Few-Shot Optimization of Synthetic Preferences Effectively Personalizes to Real Users. Highlight: Inspired by the strong in-context capabilities of LLMs, we propose few-shot preference optimization (FSPO), an algorithm for LLM personalization that reframes reward modeling as a meta-learning problem. | Anikait Singh; Sheryl Hsu; Kyle Hsu; Eric Mitchell; Stefano Ermon; Tatsunori Hashimoto; Archit Sharma; Chelsea Finn; |
| 33 | 3D Aware Region Prompted Vision Language Model. Highlight: We present a Spatial Region 3D (SR-3D) aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. | An-Chieh Cheng; Yang Fu; Yukang Chen; Zhijian Liu; Xiaolong Li; Subhashree Radhakrishnan; Song Han; Yao Lu; Jan Kautz; Pavlo Molchanov; Hongxu Yin; Xiaolong Wang; Sifei Liu; |
| 34 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models. Highlight: This paper empirically investigates the limitations of VLMs in 3D spatial knowledge, revealing that their primary shortcoming lies in the lack of global-local correspondence between the scene and individual frames. To address this, we introduce GPT4Scene, a novel visual prompting paradigm in VLM training and inference that helps build the global-local relationship, significantly improving the 3D spatial understanding of indoor scenes. | Zhangyang Qi; Zhixiong Zhang; Ye Fang; Jiaqi Wang; Hengshuang Zhao; |
| 35 | The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? Highlight: We operationalize this question using a bias-variance decomposition of the errors made by AI models: An AI’s *incoherence* on a task is measured over test-time randomness as the fraction of its error that stems from variance rather than bias in task outcome. Across all tasks and frontier models we measure, we find that the longer models spend reasoning and taking actions, *the more incoherent* they become. | Alexander Hägele; Aryo Pradipta Gema; Henry Sleight; Ethan Perez; Jascha Sohl-Dickstein; |
| 36 | Constrained Decoding of Diffusion LLMs with Context-Free Grammars. Highlight: However, existing works are not applicable to the emerging paradigm of diffusion LLMs, as this requires supporting token generation in arbitrary order instead of the traditional left-to-right order. In this paper, we address this challenge and present the first constrained decoding method for diffusion models, one that can handle formal languages captured by context-free grammars. | Niels Mündler; Jasper Dekoninck; Martin Vechev; |
| 37 | EDINET-Bench: Evaluating LLMs on Complex Financial Tasks Using Japanese Financial Statements. Highlight: We introduce EDINET-Bench, an open-source Japanese financial benchmark designed to evaluate LLMs on challenging tasks such as accounting fraud detection, earnings forecasting, and industry classification. | Issa Sugiura; Takashi Ishida; Taro Makino; Chieko Tazuke; Takanori Nakagawa; Kosuke Nakago; David Ha; |
| 38 | LeRobot: An Open-Source Library for End-to-End Robot Learning. Highlight: In this paper, we present lerobot, an open-source library that integrates across the entire robotics stack, from low-level middleware communication for motor controls to large-scale dataset collection, storage and streaming. | Remi Cadene; Simon Alibert; Francesco Capuano; Michel Aractingi; Adil Zouitine; Pepijn Kooijmans; Jade Choghari; Martino Russi; Caroline Pascal; Steven Palma; Mustafa Shukor; Jess Moss; Alexander Soare; Dana Aubakirova; Quentin Lhoest; Quentin Gallouédec; Thomas Wolf; |
| 39 | BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents Via Contrastive Trigger Learning. Highlight: However, such vision-driven embodied agents open a new attack surface: visual backdoor attacks, where the agent behaves normally until a visual trigger appears in the scene, then persistently executes an attacker-specified multi-step policy. We introduce BEAT, the first framework to inject such visual backdoors into VLM-based embodied agents using objects in the environments as triggers. | Qiusi Zhan; Hyeonjeong Ha; Rui Yang; Sirui Xu; Hanyang Chen; Liangyan Gui; Yu-Xiong Wang; Huan Zhang; Heng Ji; Daniel Kang; |
| 40 | SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer. Highlight: We introduce SANA-Video, a small diffusion model that can efficiently generate videos up to 720×1280 resolution and minute-length duration. | Junsong Chen; Yuyang Zhao; Jincheng YU; Ruihang Chu; Junyu Chen; Shuai Yang; Xianbang Wang; Yicheng Pan; Daquan Zhou; Huan Ling; Haozhe Liu; Hongwei Yi; Hao Zhang; Muyang Li; Yukang Chen; Han Cai; Sanja Fidler; Ping Luo; Song Han; Enze Xie; |
| 41 | EXPO: Stable Reinforcement Learning with Expressive Policies. Highlight: We propose Expressive Policy Optimization (EXPO), a sample-efficient online RL algorithm that utilizes an on-the-fly policy to maximize value with two parameterized policies — a larger expressive base policy trained with a stable imitation learning objective and a light-weight Gaussian edit policy that edits the actions sampled from the base policy toward a higher value distribution. | Perry Dong; Qiyang Li; Dorsa Sadigh; Chelsea Finn; |
| 42 | What Matters for Batch Online Reinforcement Learning in Robotics? Highlight: Based on this analysis, we propose a general recipe for effective batch online RL. | Perry Dong; Suvir Mirchandani; Dorsa Sadigh; Chelsea Finn; |
| 43 | The Ideation-Execution Gap: Execution Outcomes of LLM-Generated Versus Human Research Ideas. Highlight: When comparing the aggregated review scores from the execution study, we even observe that for many metrics there is a flip in rankings where human ideas score higher than LLM ideas. This ideation-execution gap highlights the limitations of current LLMs in generating truly effective research ideas and the challenge of evaluating research ideas in the absence of execution outcomes. | Chenglei Si; Tatsunori Hashimoto; Diyi Yang; |
| 44 | Fast-dLLM V2: Efficient Block-Diffusion LLM. Highlight: In this work, we propose Fast-dLLM v2, a carefully designed block diffusion language model (dLLM) that efficiently adapts pretrained AR models into dLLMs for parallel text generation—requiring only ∼1B tokens of fine-tuning. | Chengyue Wu; Hao Zhang; Shuchen Xue; Shizhe Diao; Yonggan Fu; Zhijian Liu; Pavlo Molchanov; Ping Luo; Song Han; Enze Xie; |
| 45 | Fast-dLLM: Training-free Acceleration of Diffusion LLM By Enabling KV Cache and Parallel Decoding. Highlight: However, the practical inference speed of open-sourced Diffusion LLMs often lags behind autoregressive models due to the lack of Key-Value (KV) Cache and quality degradation when decoding multiple tokens simultaneously. To bridge this gap, we introduce Fast-dLLM, a method that incorporates a novel block-wise approximate KV Cache mechanism tailored for bidirectional diffusion models, enabling cache reuse with negligible performance drop. | Chengyue Wu; Hao Zhang; Shuchen Xue; Zhijian Liu; Shizhe Diao; Ligeng Zhu; Ping Luo; Song Han; Enze Xie; |
| 46 | Reinforcement Learning from Dynamic Critic Feedback for Free-Form Generations. Highlight: This problem is exacerbated by the fact that often the best way to combine these rubrics into one single reward is also highly prompt-specific. We propose Reinforcement Learning from Dynamic Critic Feedback (RLDCF), a post-training approach that addresses these challenges via dynamic rubric verification. | Mian Wu; Gavin Zhang; Sewon Min; Sergey Levine; Aviral Kumar; |
| 47 | Scaling Up Memory for Robotic Control Via Experience Retrieval. Highlight: We propose a hierarchical policy framework, where the high-level policy is trained to select and track previous task-relevant keyframes from its experience. | Ajay Sridhar; Jennifer Pan; Satvik Sharma; Chelsea Finn; |
| 48 | Patching Gaps In LLM Reasoning With Interventional Training. Highlight: In this paper, we introduce **Interventional Training** (InT), a framework that leverages single-step oracle interventions to improve LLM reasoning. | Matthew Y. R. Yang; Hao Bai; Ian Wu; Gene Yang; Amrith Setlur; Aviral Kumar; |
| 49 | The Art of Scaling Reinforcement Learning Compute for LLMs. Highlight: We observe: (1) not all recipes yield similar asymptotic performance; (2) details such as loss aggregation, normalization, curriculum, and off-policy algorithm primarily modulate compute efficiency without materially shifting the asymptote; and (3) stable, scalable recipes follow predictable scaling trajectories, enabling extrapolation from smaller-scale runs. Combining these insights, we propose a _best-practice_ recipe, ScaleRL, and demonstrate its effectiveness by successfully scaling and predicting validation performance on a single RL run scaled up to 100,000 GPU-hours. | Fnu Devvrit; Lovish Madaan; Rishabh Tiwari; Rachit Bansal; Sai Surya Duvvuri; Manzil Zaheer; Inderjit S Dhillon; David Brandfonbrener; Rishabh Agarwal; |
| 50 | Floq: Training Critics Via Flow-Matching for Scaling Compute in Value-Based RL. Highlight: We introduce floq (flow-matching Q-functions), an approach that parameterizes the Q-function using a velocity field and trains it with techniques from flow-matching, typically used in generative modeling. | Bhavya Kumar Agrawalla; Michal Nauman; Khush Agrawal; Aviral Kumar; |
| 51 | TRIM: Hybrid Inference Via Targeted Stepwise Routing in Multi-Step Reasoning Tasks. Highlight: We propose TRIM (Targeted Routing in Multi-step reasoning tasks), which routes only critical steps to larger models while letting smaller models handle routine continuations. | Vansh Kapoor; Aman Gupta; Hao Chen; Anurag Beniwal; Jing Huang; Aviral Kumar; |
| 52 | E3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs. Highlight: Surprisingly, we find that most existing reasoning models do not extrapolate well. We show that one way to enable extrapolation is by training the LLM to perform in-context exploration: training the LLM to effectively spend its test time budget by chaining operations (such as generation, verification, refinement, etc.), or testing multiple hypotheses before it commits to an answer. | Amrith Setlur; Matthew Y. R. Yang; Charlie Victor Snell; Jeremiah Greer; Ian Wu; Virginia Smith; Max Simchowitz; Aviral Kumar; |
| 53 | ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation. Highlight: We present ZIP-RC, an adaptive inference method that equips models with zero-overhead inference-time predictions of reward and cost. | Rohin Manvi; Joey Hong; Tim Seyde; Maxime Labonne; Mathias Lechner; Sergey Levine; |
| 54 | Beyond Frequency: Scoring-Driven Debiasing for Object Detection Via Blueprint-Prompted Image Synthesis. Highlight: This paper presents a generation-based debiasing framework for object detection. | Xinhao Cai; Liulei Li; Gensheng Pei; Tao Chen; Jinshan Pan; Yazhou Yao; Wenguan Wang; |
| 55 | NeuralOS: Towards Simulating Operating Systems Via Neural Generative Models. Highlight: We introduce NeuralOS, a neural framework that simulates graphical user interfaces (GUIs) of operating systems by directly predicting screen frames in response to user inputs such as mouse movements, clicks, and keyboard events. | Luke Rivard; Sun Sun; Hongyu Guo; Wenhu Chen; Yuntian Deng; |
| 56 | Sapiens2. Highlight: We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. | Rawal Khirodkar; He Wen; Julieta Martinez; Yuan Dong; Zhaoen Su; Shunsuke Saito; |
| 57 | Dual Goal Representations. Highlight: In this work, we introduce dual goal representations for goal-conditioned reinforcement learning (GCRL). | Seohong Park; Deepinder Mann; Sergey Levine; |
| 58 | Transitive RL: Value Learning Via Divide and Conquer. Highlight: In this work, we present Transitive Reinforcement Learning (TRL), a new value learning algorithm based on a divide-and-conquer paradigm. | Seohong Park; Aditya Oberai; Pranav Atreya; Sergey Levine; |
| 59 | XLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity. Highlight: We conduct a comparative investigation on the scaling behavior of Transformers and xLSTM along the following lines, providing insights to guide future model design and deployment. | Maximilian Beck; Kajetan Schweighofer; Sebastian Böck; Sebastian Lehner; Sepp Hochreiter; |
| 60 | EA3D: Event-Augmented 3D Diffusion for Generalizable Novel View Synthesis. Highlight: We introduce **EA3D**, an Event-Augmented 3D Diffusion framework for generalizable novel view synthesis from event streams and sparse RGB inputs. To further enhance scalability and generalization, we develop the Event-DL3DV dataset, a large-scale 3D benchmark pairing diverse synthetic event streams with photorealistic multi-view RGB images and depth maps. | Wangbo Yu; Chaoran Feng; Jianing Li; Aofan Zhang; Zhenyu Tang; Mingyi Guo; Wei Zhang; Zhengyu Ma; Li Yuan; Yonghong Tian; |
| 61 | Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images. Highlight: Detecting these semantic-level anomalies is essential for assessing the trustworthiness of AIGC media, especially in AIGC image analysis, explainable deepfake detection and semantic authenticity assessment. In this paper, we formalize **semantic anomaly detection and reasoning** for AIGC images and introduce **AnomReason**, a large-scale benchmark with structured annotations as quadruples *(Name, Phenomenon, Reasoning, Severity)*. We will release code, metrics, data, and task-aligned models to support reproducible research on semantic authenticity and interpretable AIGC forensics. | Chuangchuang Tan; Xiang Ming; Jinglu Wang; Renshuai Tao; Bin Li; Yunchao Wei; Yao Zhao; Yan Lu; |
| 62 | BindWeave: Subject-Consistent Video Generation Via Cross-Modal Integration. Highlight: However, existing video generation models still fall short in subject-consistent video generation due to an inherent difficulty in parsing prompts that specify complex spatial relationships, temporal logic, and interactions among multiple subjects. To address this issue, we propose BindWeave, a unified framework that handles a broad range of subject-to-video scenarios from single-subject cases to complex multi-subject scenes with heterogeneous entities. | Zhaoyang Li; Dongjun Qian; Kai Su; qishuai diao; Xiangyang Xia; Chang Liu; Wenfei Yang; Tianzhu Zhang; Zehuan Yuan; |
| 63 | MergOPT: A Merge-Aware Optimizer for Robust Model Merging. Highlight: However, existing approaches mainly address parameter conflicts at the merging stage and overlook the role of the fine-tuning process, which often leads to significant post-merge performance degradation. To address this limitation, we propose a novel merging-aware optimizer (abbreviated as MergOPT) that injects principled merge-induced parameter shifts into the weight update steps so that the fine-tuned model exhibits a more stable loss landscape under subsequent merging operations. | Enneng Yang; Qun Yang; Peng Wang; Anke Tang; Guibing Guo; Li Shen; Xiaochun Cao; |
| 64 | Parameter-Efficient Reinforcement Learning Using Prefix Optimization. Highlight: We study two methods for prefix optimization, using a naive algorithm that clusters prefixes and selects the best prefix (Prefix Clustering), and a method that optimizes the prefix by finetuning a lightweight adapter model with RL (Prefix-RL). | Samy Jelassi; Itamar Rocha Filho; Rosie Zhao; Sham M. Kakade; Eran Malach; |
| 65 | ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation. Highlight: In this paper, we present ChronoEdit, a framework that reframes image editing as a video generation problem. | Jay Zhangjie Wu; Xuanchi Ren; Tianchang Shen; Tianshi Cao; Kai He; Yifan Lu; Ruiyuan Gao; Enze Xie; Shiyi Lan; Jose M. Alvarez; Jun Gao; Sanja Fidler; Zian Wang; Huan Ling; |
| 66 | SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge. Highlight: Consequently, they fail to adequately assess model safety when handling knowledge-intensive, hazardous scenarios. To address this critical gap, we introduce SOSBench, a regulation-grounded, hazard-focused benchmark encompassing six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. | Fengqing Jiang; Fengbo Ma; Zhangchen Xu; Yuetai Li; Zixin Rao; Bhaskar Ramasubramanian; Luyao Niu; Bo Li; Xianyan Chen; Zhen Xiang; Radha Poovendran; |
| 67 | MATHMO: Automated Mathematical Modeling Through Adaptive Search. Highlight: In response, we introduce MATHMO, a novel adaptive search method designed to automatically navigate the complex decisions in selecting mathematical frameworks, specifying model formulations, and defining algorithmic procedures. | Tennison Liu; Mihaela van der Schaar; |
| 68 | Detecting Data Contamination in LLMs Via In-Context Learning. Highlight: We present Contamination Detection via Context (CoDeC), a practical and accurate method to detect and quantify training data contamination in large language models. | Michał Zawalski; Meriem Boubdir; Klaudia Bałazy; Besmira Nushi; Pablo Ribalta; |
| 69 | Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning. Highlight: We propose a tractable computational framework that tracks and leverages curvature information during policy updates. | Luckeciano Carvalho Melo; Alessandro Abate; Yarin Gal; |
| 70 | VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models. Highlight: Yet current reasoning evaluations of multimodal large language models (MLLMs) often rely on text descriptions and allow language-based reasoning shortcuts, failing to measure genuine vision-centric reasoning. To address this, we introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories (e.g., quantitative shifts, spatial relations, attribute comparisons). | Weiye Xu; Jiahao Wang; Weiyun Wang; Zhe Chen; Wengang Zhou; Aijun Yang; Lewei Lu; Houqiang Li; Xiaohua Wang; Xizhou Zhu; Wenhai Wang; Jifeng Dai; Jinguo Zhu; |
| 71 | YuE: Scaling Open Foundation Models for Long-Form Music Generation. Highlight: We tackle the task of long-form music generation, particularly the challenging **lyrics-to-song** problem, by introducing **YuE (乐)**, a family of open-source music generation foundation models. | Ruibin Yuan; Hanfeng Lin; Shuyue Guo; Ge Zhang; Jiahao Pan; Yongyi Zang; Haohe Liu; Yiming Liang; Wenye Ma; Xingjian Du; Xeron Du; Zhen Ye; Tianyu Zheng; Zhengxuan Jiang; Yinghao Ma; Minghao Liu; Zeyue Tian; Ziya Zhou; Liumeng Xue; Xingwei Qu; Yizhi LI; Shangda Wu; Tianhao Shen; Ziyang Ma; Jun Zhan; Chunhui Wang; Yatian Wang; Xiaowei Chi; Xinyue Zhang; Zhenzhu Yang; XiangzhouWang; Shansong Liu; Lingrui Mei; Peng Li; Junjie Wang; Jianwei Yu; Guojian Pang; Xu Li; Zihao Wang; Xiaohuan Zhou; Lijun Yu; Emmanouil Benetos; Yong Chen; Chenghua Lin; Xie Chen; Gus Xia; Zhaoxiang Zhang; Chao Zhang; Wenhu Chen; Xinyu Zhou; Xipeng Qiu; Roger Dannenberg; Jiaheng Liu; Jian Yang; Wenhao Huang; Wei Xue; Xu Tan; Yike Guo; |
| 72 | Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders. Highlight: Building on a systematic pipeline of information-based feature selection and additive feature modeling, we introduce RAGLens, a lightweight hallucination detector that accurately flags unfaithful RAG outputs using LLM internal representations. | Guangzhi Xiong; Zhenghao He; Bohan Liu; Sanchit Sinha; Aidong Zhang; |
| 73 | TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis. Highlight: Moreover, massive model sizes demand heavy compute, restricting practical deployments and real-time applications. To address this, we propose TSPulse, an ultra-light pre-trained model (1M parameters) that performs disentangled masked reconstruction across spaces and abstraction levels, explicitly learning three disentangled views: temporal embeddings for fine-grained time analysis, spectral embeddings for frequency-aware fidelity, and semantic embeddings for high-level task understanding. | Vijay Ekambaram; Subodh Kumar; Arindam Jati; Sumanta Mukherjee; Tomoya Sakai; Pankaj Dayama; Wesley M. Gifford; Jayant Kalagnanam; |
| 74 | WoW!: World Models in A Closed-Loop World. Highlight: To address this gap, we introduce WoW! We curate four closed-loop environments that rigorously evaluate diverse WMs, prioritize task success as the primary metric, and move beyond the common focus on visual quality; we also present the first data scaling law for world models in embodied settings. | Jiahan Zhang; Muqing Jiang; Nanru Dai; TaiMing Lu; Arda Uzunoglu; Shunchi Zhang; Yana Wei; Jiahao Wang; Vishal M. Patel; Paul Pu Liang; Daniel Khashabi; Cheng Peng; Rama Chellappa; Tianmin Shu; Alan Yuille; Yilun Du; Jieneng Chen; |
| 75 | Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification. Highlight: This property, referred to as *asymmetric verification*, highlights the strong potential of test-time scaling. In this work, we study both sequential and parallel test-time scaling of deep search agents, motivated by the intuition that verification in this setting is often much easier than generation. | Weihao Zeng; Keqing He; Chuqiao Kuang; Xiaoguang Li; Junxian He; |
| 76 | VAttention: Verified Sparse Attention Via Sampling. Highlight: We observe that top-$k$ and random sampling are complementary: top-$k$ performs well when attention scores are dominated by a few tokens, whereas random sampling provides better estimates when attention scores are relatively uniform. Building on this insight and leveraging the statistical guarantees of sampling, we introduce vAttention, the first practical sparse attention mechanism with user-specified $(\epsilon, \delta)$ guarantees on approximation accuracy. (A minimal sketch of the combined top-$k$ plus sampling estimator appears after this table.) | Aditya Desai; Kumar Krishna Agrawal; Shuo Yang; Alejandro Cuadron; Luis Gaspar Schroeder; Matei Zaharia; Joseph E. Gonzalez; Ion Stoica; |
| 77 | OptimalThinkingBench: Evaluating Over and Underthinking in LLMs. Highlight: In this work, we introduce OptimalThinkingBench, a unified benchmark that jointly evaluates overthinking and underthinking in LLMs and also encourages the development of optimally-thinking models that balance performance and efficiency. | Pranjal Aggarwal; Seungone Kim; Jack Lanchantin; Sean Welleck; Jason E Weston; Ilia Kulikov; Swarnadeep Saha; |
| 78 | Music Flamingo: Scaling Music Understanding in Audio Language Models. Highlight: We introduce Music Flamingo, a novel large audio–language model, designed to advance music (including song) understanding in foundational audio models. We believe this work provides both a benchmark and a foundation for the community to build the next generation of models that engage with music as richly and meaningfully as humans do. | Sreyan Ghosh; Arushi Goel; Lasha Koroshinadze; Sang-gil Lee; Zhifeng Kong; Joao Felipe Santos; Ramani Duraiswami; Dinesh Manocha; Wei Ping; Mohammad Shoeybi; Bryan Catanzaro; |
| 79 | The Limits of Inference Scaling Through Resampling. Highlight: Resampling cannot decrease this probability, so it imposes an upper bound to the accuracy of resampling-based inference scaling, regardless of compute budget. Our analysis shows that there is a strong correlation between the model’s single-sample accuracy and its false positive rate on HumanEval and MBPP, whose unit tests have limited coverage. | Benedikt Stroebl; Sayash Kapoor; Arvind Narayanan; |
| 80 | Emergence of Superposition: Unveiling The Training Dynamics of Chain of Continuous Thought. Highlight: However, it remains unclear how the superposition mechanism is naturally learned from gradient-based training methods. To fill this gap, we theoretically analyze the training dynamics of a simplified two-layer transformer on the directed graph reachability problem to unveil how the superposition mechanism emerges during training in two training stages — (i) a *thought-generation* stage that autoregressively expands the continuous thought, and (ii) a *prediction* stage that converts the thought into the final answer. | Hanlin Zhu; Shibo Hao; Zhiting Hu; Jiantao Jiao; Stuart Russell; Yuandong Tian; |
| 81 | Mixture of Contexts for Long Video Generation. Highlight: We recast long-context video generation as an internal information retrieval task and propose a simple, learnable sparse attention routing module, Mixture of Contexts (MoC), as an effective long-term memory retrieval engine. | Shengqu Cai; Ceyuan Yang; Lvmin Zhang; Yuwei Guo; Junfei Xiao; Ziyan Yang; Yinghao Xu; Zhenheng Yang; Alan Yuille; Leonidas Guibas; Maneesh Agrawala; Lu Jiang; Gordon Wetzstein; |
| 82 | Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search. Highlight: In this work, we address this limitation by scaling up tool-based interactions and introduce Mini-o3, a system that executes deep, multi-turn reasoning—spanning tens of steps—and achieves state-of-the-art performance on challenging visual search tasks. First, we construct the Visual Probe Dataset, a collection of thousands of challenging visual search problems designed for exploratory reasoning. | Xin Lai; Junyi Li; Wei Li; Tao Liu; Tianjian Li; Hengshuang Zhao; |
| 83 | RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments. Highlight: Current evaluations of this threat either lack support for adversarial testing in realistic but controlled environments or ignore hybrid web-OS attack scenarios involving both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. | Zeyi Liao; Jaylen Jones; Linxi Jiang; Yuting Ning; Eric Fosler-Lussier; Yu Su; Zhiqiang Lin; Huan Sun; |
| 84 | Mixture of Mini Experts: Overcoming The Linear Layer Bottleneck in Multiple Instance Learning. Highlight: To this end, we introduce MAMMOTH, a parameter-efficient, multi-head mixture of experts module designed to improve the performance of any MIL model with minimal alterations to the total number of parameters. | Daniel Shao; Joel Runevic; Richard J. Chen; Drew FK Williamson; Ahrong Kim; Andrew H. Song; Faisal Mahmood; |
| 85 | Factuality Matters: When Image Generation and Editing Meet Structured Visuals. Highlight: By releasing dataset, model, and benchmark, we aim to advance unified multimodal foundations for structured visuals. First, we construct a large-scale dataset of 1.3 million high-quality structured image pairs derived from executable drawing programs and augmented with chain-of-thought reasoning annotations. | Le Zhuo; Songhao Han; Yuandong Pu; Boxiang Qiu; Sayak Paul; Yue Liao; Yihao Liu; Jie Shao; Xi Chen; Si Liu; Hongsheng Li; |
| 86 | UALM: Unified Audio Language Model for Understanding, Generation and Reasoning. Highlight: This paper introduces Unified Audio Language Model (UALM), which aims to unify audio understanding, text-to-audio generation, and multimodal reasoning in a single model. | Jinchuan Tian; Sang-gil Lee; Zhifeng Kong; Sreyan Ghosh; Arushi Goel; Chao-Han Huck Yang; Wenliang Dai; Zihan Liu; Hanrong Ye; Shinji Watanabe; Mohammad Shoeybi; Bryan Catanzaro; Rafael Valle; Wei Ping; |
| 87 | Estimating Worst-Case Frontier Risks of Open-Weight LLMs. Highlight: In this paper, we study the worst-case frontier risks of the OpenAI gpt-oss model. | Eric Wallace; Olivia Watkins; Miles Wang; Kai Chen; Chris Koch; |
| 88 | VOGUE: Unified Understanding, Generation, and Editing for Videos. Highlight: In this work, we present VOGUE, a versatile framework that extends unified modeling to the video domain. | Cong Wei; Quande Liu; Zixuan Ye; Qiulin Wang; Xintao Wang; Pengfei Wan; Kun Gai; Wenhu Chen; |
| 89 | GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. Highlight: We argue that the interpretable nature of language often provides a much richer learning medium for LLMs, compared to policy gradients derived from sparse, scalar rewards. To test this, we introduce GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. | Lakshya A Agrawal; Shangyin Tan; Dilara Soylu; Noah Ziems; Rishi Khare; Krista Opsahl-Ong; Arnav Singhvi; Herumb Shandilya; Michael J Ryan; Meng Jiang; Christopher Potts; Koushik Sen; Alex Dimakis; Ion Stoica; Dan Klein; Matei Zaharia; Omar Khattab; |
| 90 | Bridging The Gap Between Promise and Performance for FP4 Quantization. Highlight: Our analysis shows that state-of-the-art methods struggle with FP4, due to two key issues: (1) NVFP4’s small group size *provably* neutralizes traditional outlier mitigation techniques; (2) MXFP4’s power-of-two scale quantization severely degrades accuracy due to high induced error. To bridge this gap, we introduce Micro-Rotated-GPTQ (MR-GPTQ), a variant of the classic GPTQ quantization algorithm that tailors the quantization process to FP4’s unique properties, by using block-wise Hadamard transforms and format-specific optimizations. | Vage Egiazarian; Roberto L. Castro; Denis Kuznedelev; Andrei Panferov; Saleh Ashkboos; Eldar Kurtic; Shubhra Pandit; Alexandre Noll Marques; Mark Kurtz; Torsten Hoefler; Dan Alistarh; |
| 91 | CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation. Highlight: However, constructing such a benchmark is challenging owing to sparse interoperating code in real-world multi-programming-language projects, diverse Inter-process Communication (IPC) mechanisms, vast Foreign Function Interface (FFI) language pairs, and the difficulty of evaluation. To address this gap, we introduce CrossPL, the first benchmark for systematically assessing LLM performance on CPL code generation across two primary interoperation modes and 2,534 tasks, specifically 1,982 IPC tasks spanning six languages and 522 Python–C FFI tasks. | zhanhang xiong; Dongxia Wang; Yuekang Li; Xinyuan An; Wenhai Wang; |
| 92 | Rubrics As Rewards: Reinforcement Learning Beyond Verifiable Domains. Highlight: We introduce **Rubrics as Rewards (RaR)**, an on-policy reinforcement learning method that extends RLVR beyond verifiable domains by using rubric-based feedback. | Anisha Gunjal; Anthony Wang; Elaine Lau; Vaskar Nath; Yunzhong He; Bing Liu; Sean M. Hendryx; |
| 93 | Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning. Highlight: This insight exposes a core inefficiency in prevailing RL algorithms like GRPO, which apply optimization pressure agnostically and dilute the learning signal across all tokens. To address this, we propose Hierarchy-Aware Credit Assignment (HICRA), an algorithm that concentrates optimization efforts on high-impact planning tokens. | Haozhe Wang; Qixin Xu; Che Liu; Junhong Wu; Fangzhen Lin; Wenhu Chen; |
| 94 | Does FLUX Already Know How to Perform Physically Plausible Image Composition? Highlight: We propose SHINE, a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors. | Shilin Lu; Zhuming Lian; Zihan Zhou; Shaocong Zhang; Chen Zhao; Adams Wai-Kin Kong; |
| 95 | ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks. Highlight: We introduce **ImagenWorld**, a benchmark of 3.6K condition sets spanning six core tasks (generation and editing, with single or multiple references) and six topical domains (artworks, photorealistic images, information graphics, textual graphics, computer graphics, and screenshots). | Samin Mahdizadeh Sani; Max Ku; Nima Jamali; Matina Mahdizadeh Sani; Paria Khoshtab; Wei-Chieh Sun; Parnian Fazel; Zhi Rui Tam; Thomas Chong; Edisy Kin Wai Chan; Donald Wai Tong Tsang; Chiao-Wei Hsu; Lam Ting Wai; Ho Yin Sam Ng; Chiafeng Chu; Chak-Wing Mak; Keming Wu; Hiu Tung Wong; Yik Chun Ho; Chi Ruan; Zhuofeng Li; I-Sheng Fang; Shih-Ying Yeh; Ho Kei Cheng; Ping Nie; Wenhu Chen; |
| 96 | EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing. Highlight: Several closed-source models like GPT-Image-1, Seedream, and Google-Nano-Banana have shown highly promising progress. However, the open-source models are still lagging. The main bottleneck is the lack of a reliable reward model to scale up high-quality synthetic training data. To address this critical bottleneck, we built EditReward, trained with our new large-scale human preference dataset, meticulously annotated by trained experts following a rigorous protocol containing over 200K preference pairs. | Keming Wu; Sicong Jiang; Max Ku; Ping Nie; Minghao Liu; Wenhu Chen; |
| 97 | Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce LGTM (Less Gaussians, Texture More), a feed-forward and pose-free framework that predicts both compact geometric primitives and associated per-primitive texture maps in a single forward pass without per-scene optimization. |
Yixing Lao; Xuyang BAI; Xiaoyang Wu; Nuoyuan Yan; Zixin Luo; Tian Fang; Jean-Daniel Nahmias; Yanghai Tsin; Shiwei Li; Hengshuang Zhao; |
| 98 | Let’s (not) Just Put Things in Context: Test-time Training for Long-context LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose query-only test-time training (qTTT): a cache-preserving adaptation that performs a single prefill to fix keys/values and then applies a handful of gradient updates to the query projections. |
Rachit Bansal; Aston Zhang; Rishabh Tiwari; Lovish Madaan; Sai Surya Duvvuri; Fnu Devvrit; David Brandfonbrener; David Alvarez-Melis; Prajjwal Bhargava; Mihir Kale; Samy Jelassi; |
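The qTTT recipe above is concrete enough to sketch: a single prefill fixes the key/value cache, and only the query projection receives a handful of test-time gradient updates. The toy attention module, shapes, and the self-supervised loss below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of query-only test-time training (qTTT) as described in the
# highlight. Everything here (dimensions, loss) is a placeholder assumption.
import torch
import torch.nn as nn

d_model = 64
attn = nn.ModuleDict({
    "q_proj": nn.Linear(d_model, d_model),
    "k_proj": nn.Linear(d_model, d_model),
    "v_proj": nn.Linear(d_model, d_model),
})

context = torch.randn(1, 128, d_model)      # long-context input
with torch.no_grad():                       # single prefill: cache is frozen
    k_cache = attn["k_proj"](context)
    v_cache = attn["v_proj"](context)

opt = torch.optim.SGD(attn["q_proj"].parameters(), lr=1e-3)
query_tokens = torch.randn(1, 8, d_model)   # tokens of the user query

for _ in range(5):                          # "a handful of gradient updates"
    q = attn["q_proj"](query_tokens)
    scores = torch.softmax(q @ k_cache.transpose(-1, -2) / d_model ** 0.5, -1)
    out = scores @ v_cache
    loss = out.pow(2).mean()                # placeholder adaptation loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```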
| 99 | The CoT Encyclopedia: Analyzing, Predicting, and Controlling How A Reasoning Model Will Think Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce the CoT Encyclopedia, a bottom-up framework for analyzing and steering model reasoning. |
Seongyun Lee; Seungone Kim; Minju Seo; Yongrae Jo; Dongyoung Go; Hyeonbin Hwang; Jinho Park; Xiang Yue; Sean Welleck; Graham Neubig; Moontae Lee; Minjoon Seo; |
| 100 | VisCoder2: Building Multi-Language Visualization Coding Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. |
Yuansheng Ni; Songcheng Cai; Xiangchao Chen; Jiarong Liang; Zhiheng Lyu; Jiaqi Deng; Kai Zou; Ping Nie; Fei Yuan; Xiang Yue; Wenhu Chen; |
| 101 | Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper systematically investigates the impact of RLVR on LLM reasoning. |
Xumeng Wen; Zihan Liu; Shun Zheng; Shengyu Ye; Zhirong Wu; Yang Wang; Zhijian Xu; Xiao Liang; Junjie Li; Ziming Miao; Jiang Bian; Mao Yang; |
| 102 | Unified Vision–Language Modeling Via Concept Space Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce vSONAR, a vision–language embedding space extended from the text-only embedding space SONAR, which supports 200 text languages and 37 speech languages. |
Yifu QIU; Paul-Ambroise Duquenne; Holger Schwenk; |
| 103 | Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that while instruction-tuning helps elicit a model’s underlying capabilities, it hurts the model’s ability to be flexibly steered in-context. To mitigate these issues, we propose Spectrum Tuning, a post-training method using Spectrum Suite to improve steerability and distributional coverage. |
Taylor Sorensen; Benjamin Newman; Jared Moore; Chan Young Park; Jillian Fisher; Niloofar Mireshghallah; Liwei Jiang; Yejin Choi; |
| 104 | Topology-Preserved Auto-regressive Mesh Generation in The Manner of Weaving Silk Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This limitation stems from previous mesh tokenization methods treating meshes as simple collections of equivalent triangles, lacking awareness of the overall topological structure during generation. To address this issue, we propose a novel mesh tokenization algorithm that provides a canonical topological framework through vertex layering and ordering, ensuring critical geometric properties including manifoldness, watertightness, face normal consistency, and part awareness in the generated meshes. |
Gaochao Song; Zibo Zhao; Haohan Weng; Jingbo Zeng; Rongfei Jia; Shenghua Gao; |
| 105 | AceReason-Nemotron 1.1: Advancing Math and Code Reasoning Through SFT and RL Synergy Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models. |
Zihan Liu; Zhuolin Yang; Yang Chen; Chankyu Lee; Mohammad Shoeybi; Bryan Catanzaro; Wei Ping; |
| 106 | Critique-Coder: Enhancing Coder Models By Critique Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by these observations, we propose Critique Reinforcement Learning (CRL), where the model is tasked with generating a critique for a given (question, solution) pair. |
Chi Ruan; Dongfu Jiang; Yubo Wang; Wenhu Chen; |
| 107 | PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Through an intuitive theoretical analysis, we introduce PLoP (Precise LoRA Placement), a lightweight method that allows automatic identification of module types where LoRA adapters should be placed, given a pretrained model and a finetuning task. |
Soufiane Hayou; Nikhil Ghosh; Bin Yu; |
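As a rough illustration of type-based LoRA placement, the sketch below attaches adapters only to linear modules whose name matches a selected set of module types (hard-coded here to query/value projections). PLoP's contribution is deriving that set automatically from the pretrained model and the finetuning task, which this sketch does not reproduce.

```python
# Sketch of placing LoRA adapters by module type. The target set below is a
# hard-coded assumption; PLoP selects module types automatically.
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.A = nn.Linear(base.in_features, rank, bias=False)
        self.B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)          # adapter starts as a no-op
        for p in self.base.parameters():
            p.requires_grad = False            # only A and B are trained

    def forward(self, x):
        return self.base(x) + self.B(self.A(x))

def place_lora(model: nn.Module, target_types=("q_proj", "v_proj")):
    """Recursively wrap every matching linear submodule with a LoRA adapter."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear) and name.endswith(target_types):
            setattr(model, name, LoRALinear(child))
        else:
            place_lora(child, target_types)
```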
| 108 | Reconciling Visual Perception and Generation in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present \textsc{GenRep}, a unified image understanding and synthesis model that jointly conducts discriminative learning and generative modeling in one training session. |
Liulei Li; Yi Yang; Wenguan Wang; |
| 109 | Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs. |
Sayash Kapoor; Benedikt Stroebl; Peter Kirgis; Nitya Nadgir; Zachary S Siegel; Boyi Wei; Tianci Xue; Ziru Chen; Felix Chen; Saiteja Utpala; Franck Ndzomga; Dheeraj Oruganty; Sophie Luskin; Kangheng Liu; Botao Yu; Amit Arora; Dongyoon Hahm; Harsh Trivedi; Huan Sun; Juyong Lee; Tengjun Jin; Yifan Mai; Yifei Zhou; Yuxuan Zhu; Rishi Bommasani; Daniel Kang; Dawn Song; Peter Henderson; Yu Su; Percy Liang; Arvind Narayanan; |
| 110 | GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable knowledge-work tasks. |
Tejal Patwardhan; Rachel Dias; Elizabeth Proehl; Grace Kim; Michele Wang; Olivia Watkins; Simon Posada Fishman; Marwan Aljubeh; Phoebe Thacker; Laurance Fauconnet; Natalie S. Kim; Samuel Miserendino; Gildas Chabot; David Li; Patrick Chao; Michael Sharman; Alexandra Barr; Amelia Glaese; Jerry Tworek; |
| 111 | ARMs: Adaptive Red-Teaming Agent Against Multimodal Models with Plug-and-Play Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing red-teaming efforts are either restricted to a narrow set of adversarial patterns or depend heavily on manual engineering, lacking scalable exploration of emerging real-world adversarial strategies. To bridge this gap, we propose ARMs, an adaptive red-teaming agent that systematically conducts comprehensive risk assessments for VLMs. |
Zhaorun Chen; Xun Liu; Mintong Kang; Jiawei Zhang; Minzhou Pan; Shuang Yang; Bo Li; |
| 112 | Depth Anything 3: Recovering The Visual Space from Any Views Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. |
Haotong Lin; Sili Chen; Jun Hao Liew; Donny Y. Chen; Zhenyu Li; Yang Zhao; Sida Peng; Hengkai Guo; Xiaowei Zhou; Guang Shi; Jiashi Feng; Bingyi Kang; |
| 113 | K-Sort Eval: Efficient Preference Evaluation for Visual Generation Via Corrected VLM-as-a-Judge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose K-Sort Eval, a reliable and efficient VLM-based evaluation framework that integrates posterior correction and dynamic matching. |
Zhikai Li; jiatong li; Xuewen Liu; Wangbo Zhao; Pan Du; Kaicheng Zhou; Qingyi Gu; Yang You; Zhen Dong; Kurt Keutzer; |
| 114 | SimpleToM: Exposing The Gap Between Explicit ToM Inference and Implicit ToM Application in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SimpleToM, a benchmark that advances ToM evaluation along two novel axes. |
Yuling Gu; Oyvind Tafjord; Hyunwoo Kim; Jared Moore; Ronan Le Bras; Peter Clark; Yejin Choi; |
| 115 | FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The advancement of open-source text-to-image (T2I) models has been hindered by the absence of large-scale, reasoning-focused datasets and comprehensive evaluation benchmarks, resulting in a performance gap compared to leading closed-source systems. To address this challenge, we introduce FLUX-Reason-6M and PRISM-Bench (Precise and Robust Image Synthesis Measurement Benchmark). |
Rongyao Fang; Aldrich Yu; Chengqi Duan; Linjiang Huang; Shuai Bai; Yuxuan Cai; Kun Wang; Si Liu; Xihui Liu; Hongsheng Li; |
| 116 | SimBench: Benchmarking The Ability of Large Language Models to Simulate Human Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: By making progress measurable, we aim to accelerate the development of more faithful LLM simulators. |
Tiancheng Hu; Joachim Baumann; Lorenzo Lupo; Nigel Collier; Dirk Hovy; Paul Röttger; |
| 117 | Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper describes RLCR (Reinforcement Learning with Calibration Rewards), an approach to training reasoning models that jointly improves accuracy and calibrated confidence estimation. |
Mehul Damani; Isha Puri; Stewart Slocum; Idan Shenfeld; Leshem Choshen; Yoon Kim; Jacob Andreas; |
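One plausible reading of a calibration reward is a binary correctness signal combined with a Brier-style penalty on the model's verbalized confidence; the exact combination RLCR uses may differ from this sketch.

```python
# Sketch of a calibration-aware reward in the spirit of RLCR: correctness
# minus a Brier penalty on the stated confidence. Illustrative only.
def calibration_reward(correct: bool, confidence: float) -> float:
    y = 1.0 if correct else 0.0
    brier = (confidence - y) ** 2        # 0 when confidence matches outcome
    return y - brier

print(calibration_reward(True, 0.9))     # 0.99: right and confident
print(calibration_reward(False, 0.9))    # -0.81: wrong and overconfident
print(calibration_reward(False, 0.1))    # -0.01: wrong but knew it
```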
| 118 | From F(x) and G(x) to F(g(x)): LLMs Learn New Skills in RL By Composing Old Ones Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To mitigate data contamination and other confounding factors and to allow precise control over task complexity, we develop a synthetic framework for our investigation. |
Lifan Yuan; Weize Chen; Yuchen Zhang; Ganqu Cui; Hanbin Wang; Ziming You; Ning Ding; Zhiyuan Liu; Maosong Sun; Hao Peng; |
| 119 | HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce HackWorld, the first evaluation framework for systematically assessing CUAs’ capabilities in exploiting web application vulnerabilities through visual interaction. |
Xiaoxue Ren; Penghao Jiang; Kaixin Li; Zhiyong Huang; Xiaoning Du; Jiaojiao Jiang; Zhenchang Xing; Jiamou Sun; Terry Yue Zhuo; |
| 120 | Weak-to-Strong Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, inherent limitations of current generative models lead to an inevitable gap between generated data and real data. To address this, we propose Weak-to-Strong Diffusion (W2SD), a novel framework that utilizes the estimated gap between existing weak and strong models (i.e., weak-to-strong gap) to bridge the gap between an ideal model and a strong model. |
Lichen Bai; Masashi Sugiyama; Zeke Xie; |
| 121 | Scaling Agent Learning Via Experience Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While reinforcement learning (RL) can empower large language model (LLM) agents by enabling self-improvement through interaction, its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity, all of which obstruct the collection of scalable experience data. To address these challenges, we introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind to enable effective online RL training for autonomous agents. |
Zhaorun Chen; Zhuokai Zhao; Kai Zhang; Bo Liu; Qi Qi; Yifan Wu; Tarun Kalluri; Xuefei Cao; Yuanhao Xiong; Haibo Tong; Huaxiu Yao; Hengduo Li; Jiacheng Zhu; Xian Li; Dawn Song; Bo Li; Jason E Weston; Dat Huynh; |
| 122 | Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We observe that visual token redundancy is higher in the coarse manipulation phase than in fine-grained operations, and is strongly correlated with the action dynamics. Motivated by this observation, we propose Action-aware Dynamic Pruning (ADP), a multi-modal pruning framework that integrates text-driven token selection with action-aware trajectory gating. |
Xiaohuan Pei; Yuxing Chen; Siyu Xu; Yunke Wang; Yuheng Shi; Chang Xu; |
| 123 | Rethinking Causal Mask Attention for Vision-Language Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Strictly masking future positions for vision queries introduces overly rigid constraints, which hinder the model’s ability to leverage future context that often contains essential semantic cues for accurate inference. In this work, we empirically investigate how different causal masking strategies affect vision-language inference and then propose a family of future-aware attentions tailored for this setting. |
Xiaohuan Pei; Tao Huang; Yanxiang Ma; Chang Xu; |
| 124 | UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce UFO-4D, a unified feedforward framework to reconstruct a dense, explicit 4D representation from just a pair of unposed images. |
Junhwa Hur; Charles Herrmann; Songyou Peng; Philipp Henzler; Zeyu Ma; Todd Zickler; Deqing Sun; |
| 125 | Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. |
Zihan Wang; Jiashun Wang; Jeff Tan; Yiwen Zhao; Jessica K. Hodgins; Shubham Tulsiani; Deva Ramanan; |
| 126 | D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a real-to-sim-to-real engine that leverages Gaussian Splat representations to build a differentiable engine, enabling object mass identification from real-world visual observations and robot control signals while simultaneously supporting grasping policy learning. |
Haozhe Lou; Mingtong Zhang; Haoran Geng; Hanyang Zhou; Sicheng He; Zhiyuan Gao; Siheng Zhao; Jiageng Mao; Pieter Abbeel; Jitendra Malik; Daniel Seita; Yue Wang; |
| 127 | Characterizing Pattern Matching and Its Limits on Compositional Task Structures Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We prove a tight sample complexity bound of learning a two-hop structure by identifying the exponent of the data scaling law for perfect in-domain generalization. |
Hoyeon Chang; Jinho Park; Hanseul Cho; Sohee Yang; Miyoung Ko; Hyeonbin Hwang; Seungpil Won; Dohaeng Lee; Youbin Ahn; Minjoon Seo; |
| 128 | Expanding The Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unlocking advanced reasoning in large language model agents is hindered by a scarcity of training data situated at the very frontier of their capabilities. We address this with a novel data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD), which conceptualizes this frontier as tasks an LLM cannot solve independently but can master with guidance. |
Xuanzhong Chen; Zile Qiao; Guoxin Chen; Liangcai Su; Zhen Zhang; Xinyu Wang; Yong Jiang; Pengjun Xie; Fei Huang; Ting Chen; Jingren Zhou; |
| 129 | Fantastic Pretraining Optimizers and Where to Find Them Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We posit that two methodological shortcomings have obscured fair comparisons and hindered practical adoption: (i) unequal hyperparameter tuning and (ii) limited or misleading evaluation setups. To address these two issues, we conduct a systematic study of ten deep learning optimizers across four model scales (0.1B–1.2B parameters) and data-to-model ratios (1–8$\times$ the Chinchilla optimum). |
Kaiyue Wen; David Leo Wright Hall; Tengyu Ma; Percy Liang; |
| 130 | Understanding The Emergence of Seemingly Useless Features in Next-Token Predictors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Trained Transformers have been shown to compute abstract features that appear redundant for predicting the immediate next token. We identify which components of the gradient signal from the next-token prediction objective give rise to this phenomenon, and we propose a method to estimate the influence of those components on the emergence of specific features. |
Mark Rofin; Jalal Naghiyev; Michael Hahn; |
| 131 | On The Reasoning Abilities of Masked Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we characterize what types of reasoning problems MDMs can provably solve and how efficiently. |
Anej Svete; Ashish Sabharwal; |
| 132 | Flow Matching Policy Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Flow Policy Optimization (FPO), a simple on-policy reinforcement learning algorithm that brings flow matching into the policy gradient framework. |
David McAllister; Songwei Ge; Brent Yi; Chung Min Kim; Ethan Weber; Hongsuk Choi; Haiwen Feng; Angjoo Kanazawa; |
| 133 | Narrow Finetuning Leaves Clearly Readable Traces in The Activation Differences Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we show that narrow finetuning creates easily readable biases in LLM activations that can be detected using simple model diffing tools, suggesting that the finetuning data is overrepresented in the model’s activations. |
Julian Minder; Clément Dumas; Stewart Slocum; Helena Casademunt; Cameron Holmes; Robert West; Neel Nanda; |
| 134 | IC-Custom: Diverse Image Customization Via In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current approaches conventionally separate image customization into position-aware and position-free customization paradigms and lack a universal framework for diverse customization, limiting their applications across various scenarios. To overcome these limitations, we propose IC-Custom, a unified framework that seamlessly integrates position-aware and position-free image customization through in-context learning. |
Yaowei Li; Xiaoyu Li; Zhaoyang Zhang; Yuxuan Bian; Gan Liu; Xinyuan Li; Jiale Xu; Wenbo Hu; yating liu; Lingen Li; Jing Cai; Yuexian Zou; Yancheng He; Ying Shan; |
| 135 | Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards Into Open-Weight LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate whether filtering text about dual-use topics from training data can prevent unwanted capabilities and serve as a more tamper-resistant safeguard. |
Kyle O’Brien; Stephen Casper; Quentin Gregory Anthony; Tomek Korbak; Robert Kirk; Xander Davies; Ishan Mishra; Geoffrey Irving; Yarin Gal; Stella Biderman; |
| 136 | CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this limitation, we propose a subject-Consistent and pose-Diverse T2I framework, dubbed CoDi, that enables consistent subject generation with diverse pose and layout. |
Zhanxin Gao; Beier Zhu; Liangyao; Jian Yang; Ying Tai; |
| 137 | Stabilizing Off-Policy Reinforcement Learning for LLMs Via Balanced Policy Optimization with Adaptive Clipping Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Through theoretical and empirical analysis, we identify two key insights: (i) an imbalance in optimization, where negative-advantage samples dominate the policy gradient, suppressing useful behaviors and risking gradient explosions; and (ii) the derived Entropy-Clip Rule, which reveals that the fixed clipping mechanism in PPO-like objectives systematically blocks entropy-increasing updates, thereby driving the policy toward over-exploitation at the expense of exploration. Building on these insights, we propose BAlanced Policy Optimization with Adaptive Clipping (BAPO), a simple yet effective method that dynamically adjusts clipping bounds to adaptively re-balance positive and negative contributions, preserve entropy, and stabilize RL optimization. |
Zhiheng Xi; Xin Guo; Yang Nan; Enyu Zhou; Junrui Shen; Wenxiang Chen; Jiaqi Liu; Jixuan Huang; Xun Deng; Zhihao Zhang; Honglin Guo; Zhikai Lei; Miao Zheng; Guoteng Wang; Peng Sun; Rui Zheng; Hang Yan; Tao Gui; Qi Zhang; Xuanjing Huang; |
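To make the adaptive-clipping idea concrete, here is a sketch of a PPO-style objective with decoupled lower/upper clip bounds, where the upper bound is widened when negative-advantage samples dominate a batch. The adaptation rule below is an illustrative stand-in, not BAPO's actual schedule.

```python
# Sketch of asymmetric, adaptively re-balanced clipping in the spirit of BAPO.
# The rebalancing heuristic here is a hypothetical simplification.
import torch

def bapo_style_loss(ratio, advantage, clip_lo=0.2, clip_hi=0.2, step=0.05):
    pos_share = (advantage > 0).float().mean()
    if pos_share < 0.5:                  # negatives dominate this batch:
        clip_hi = clip_hi + step         # relax the upper bound so positive
                                         # contributions are blocked less often
    clipped = torch.clamp(ratio, 1.0 - clip_lo, 1.0 + clip_hi)
    objective = torch.minimum(ratio * advantage, clipped * advantage)
    return -objective.mean()
```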
| 138 | Critique-RL: Training Critiquing Language Models Through Two-Stage RL for Improved Discrimination and Constructive Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing approaches typically rely on stronger supervisors for annotating critique data. To address this, we propose Critique-RL, an online RL approach for developing critiquing language models without stronger supervision. |
Zhiheng Xi; Jixuan Huang; Xin Guo; Boyang Hong; Dingwen Yang; Xiaoran Fan; Shuo Li; Zehui Chen; Junjie Ye; Siyu Yuan; Zhengyin Du; Xuesong Yao; Yufei Xu; Jiecao Chen; Rui Zheng; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 139 | Cost-of-Pass: An Economic Framework for Evaluating Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Building on production theory, we develop an economically grounded framework for evaluating language models by combining accuracy and inference cost. |
Mehmet Hamza Erol; Batu El; Mirac Suzgun; Mert Yuksekgonul; James Zou; |
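Under one natural reading of such a framework, the economically relevant quantity is the expected inference cost of obtaining a correct solution: with independent attempts of per-attempt cost c and success probability r, that expectation is c / r. The sketch below implements this reading; the paper's exact definitions may differ.

```python
# Sketch of a cost-per-correct-answer metric in the spirit of Cost-of-Pass.
def cost_of_pass(cost_per_attempt: float, pass_rate: float) -> float:
    if pass_rate <= 0.0:
        return float("inf")  # a model that never succeeds has unbounded cost
    return cost_per_attempt / pass_rate

# A cheap weak model can beat an expensive strong one on this metric:
print(cost_of_pass(cost_per_attempt=0.002, pass_rate=0.25))  # 0.008
print(cost_of_pass(cost_per_attempt=0.010, pass_rate=0.90))  # ~0.011
```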
| 140 | DAComp: Benchmarking Data Agents Across The Full Data Intelligence Lifecycle Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Real-world enterprise data intelligence workflows encompass data engineering, which turns raw sources into analysis-ready tables, and data analysis, which converts those tables into decision-oriented insights. We introduce DAComp, a benchmark of 236 tasks that mirrors these complex workflows. |
Fangyu Lei; Jinxiang Meng; Junjie zhao; Yiming Huang; Yitong Zhang; Jianwen Luo; Xin Zou; Ruiyi Yang; Wenbo Shi; Yan Gao; Shizhu He; Jun Zhao; Zuo Wang; Qian Liu; Yang Wang; WANG KE; Kang Liu; |
| 141 | Emergent Misalignment Is Easy, Narrow Misalignment Is Hard Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Concerningly, a pre-registered survey of experts failed to predict this result, highlighting our poor understanding of the inductive biases governing learning and generalisation in LLMs. We use emergent misalignment (EM) as a case study to investigate these inductive biases, and find that although models can learn the narrow dataset task, the general solution is measurably more stable and more efficient. |
Anna Soligo; Edward Turner; Senthooran Rajamanoharan; Neel Nanda; |
| 142 | Don’t Throw Away Your Pretrained Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We aim to make the best of both worlds through model collaboration, where different models in the training pipeline collaborate and complement each other. |
Shangbin Feng; Wenhao Yu; Yike Wang; Hongming Zhang; Yulia Tsvetkov; Dong Yu; |
| 143 | Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To facilitate the study of human-agent collaboration, we introduce Collaborative Gym (Co-Gym), an open framework for developing and evaluating collaborative agents that engage in bidirectional communication with humans while interacting with task environments. |
Yijia Shao; Vinay Samuel; Yucheng Jiang; John Yang; Diyi Yang; |
| 144 | Don’t Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs Via Beam Search Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a new family of methods that employ beam search to generate candidates for consistency-based UQ, yielding improved performance and reduced variance compared to multinomial sampling. |
Ekaterina Fadeeva; Maiya Goloburda; Aleksandr Rubashevskii; Roman Vashurin; Artem Shelmanov; Preslav Nakov; Mrinmaya Sachan; Maxim Panov; |
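A minimal sketch of the consistency-based side of this pipeline: given a set of candidates (here imagined as top beam hypotheses rather than multinomial samples), uncertainty is one minus the agreement of the normalized answers. This aggregation rule is a common baseline, not necessarily the paper's exact scoring.

```python
# Sketch of consistency-based uncertainty over beam-search candidates.
from collections import Counter

def consistency_uncertainty(candidates: list[str]) -> float:
    normalized = [c.strip().lower() for c in candidates]
    top_count = Counter(normalized).most_common(1)[0][1]
    return 1.0 - top_count / len(normalized)

# E.g., top-4 beam hypotheses for "What is 7 * 8?":
beams = ["56", "56", "56", "54"]
print(consistency_uncertainty(beams))  # 0.25 -> fairly confident
```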
| 145 | Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present MedVLSynther, a rubric-guided generator-verifier framework that synthesizes high-quality multiple-choice VQA items directly from open biomedical literature by conditioning on figures, captions, and in-text references. |
Xiaoke Huang; Ningsen Wang; Hui Liu; Xianfeng Tang; Yuyin Zhou; |
| 146 | ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV) caching, and incoherent generation arising from learning dependencies over an intractable space of token combinations. To address these limitations, we introduce ReFusion, a novel masked diffusion model that achieves superior performance and efficiency by elevating parallel decoding from the token level to a higher slot level, where each slot is a fixed-length, contiguous sub-sequence. |
Jia-Nan Li; Jian Guan; Wei Wu; Chongxuan Li; |
| 147 | Steering Evaluation-Aware Language Models To Act Like They Are Deployed Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we show that adding a steering vector to an LLM’s activations can suppress evaluation-awareness and make the model act like it is deployed during evaluation. |
Tim Tian Hua; Andrew Qin; Samuel Marks; Neel Nanda; |
| 148 | Doxing Via The Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we identify a novel category of privacy leakage in MLRMs: Adversaries can infer sensitive geolocation information, such as users’ home addresses or neighborhoods, from user-generated images, including selfies captured in private settings. |
Weidi Luo; Tianyu Lu; Qiming Zhang; Xiaogeng Liu; Bin Hu; Yue Zhao; Jieyu Zhao; Song Gao; Patrick McDaniel; Zhen Xiang; Chaowei Xiao; |
| 149 | From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Each step marches forward in a strict left-to-right sequence, causing small errors to accumulate and compromise the final image. In this work, we reimagine this process with TensorAR, a decoder-only AR model that shifts from predicting discrete tokens to predicting overlapping tensor windows. |
Cheng Cheng; Lin Song; Di An; Yicheng Xiao; Xuchong Zhang; Hongbin Sun; Ying Shan; |
| 150 | La-Proteina: Atomistic Protein Generation Via Partially Latent Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce La-Proteina for atomistic protein design based on a novel partially latent protein representation: coarse backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables of fixed dimensionality, thereby effectively side-stepping challenges of explicit side-chain representations. |
Tomas Geffner; Kieran Didi; Zhonglin Cao; Danny Reidenbach; Zuobai Zhang; Christian Dallago; Emine Kucukbenli; Karsten Kreis; Arash Vahdat; |
| 151 | Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model Via Decoupled Co-Refinement Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Feedforward models for novel view synthesis (NVS) have recently been advanced by transformer-based methods such as LVSM, which use attention among all input and target views. In this work, we argue that this full self-attention design is suboptimal, suffering from quadratic complexity with respect to the number of input views and rigid parameter sharing among heterogeneous tokens. |
Xiaosong Jia; Yihang Sun; Junqi You; Songbur Wong; Zichen Zou; Junchi Yan; Zuxuan Wu; Yu-Gang Jiang; |
| 152 | R-Zero: Self-Evolving Reasoning LLM from Zero Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. |
Chengsong Huang; Wenhao Yu; Xiaoyang Wang; Hongming Zhang; Zongxia Li; Ruosen Li; Jiaxin Huang; Haitao Mi; Dong Yu; |
| 153 | StreamingVLM: Real-Time Understanding for Infinite Video Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce **StreamingVLM**, a model designed for real-time, stable understanding of infinite visual input. |
Ruyi Xu; Guangxuan Xiao; Yukang Chen; Liuning He; Kelly Peng; Yao Lu; Song Han; |
| 154 | Scaling Generalist Data-Analytic Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces DataMind, a scalable data synthesis and agent training recipe designed to build generalist data-analytic agents. |
Shuofei Qiao; Yanqiu Zhao; Zhisong Qiu; Xiaobin Wang; Jintian Zhang; Zhao Bin; Ningyu Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen; |
| 155 | Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces **Story-Iter**, a new training-free iterative paradigm to enhance long-story generation. |
Jiawei Mao; Xiaoke Huang; Yunfei Xie; Yuanqi Chang; Mude Hui; Bingjie Xu; Zeyu Zheng; Zirui Wang; Cihang Xie; Yuyin Zhou; |
| 156 | From “Sure” to “Sorry”: Detecting Jailbreak in Large Vision Language Model Via JailNeurons Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing detection methods are either limited to detecting specific attack types or are too time-consuming, making them impractical for real-world deployment. To address these challenges, we propose \textbf{JDJN} (\textbf{J}ailbreak \textbf{D}etection via \textbf{J}ail\textbf{N}eurons), a novel jailbreak detection method for LVLMs. |
Yuyou Gan; Qingming Li; Junhao Li; Zhi Chen; Jinbao Li; Xiaoming Li; Shouling Ji; |
| 157 | AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making Via Multi-Turn RL Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the open-source community currently lacks a unified RL framework capable of training agents from scratch across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a modular and decoupled framework specifically designed for RL-based agent training in multi-turn decision-making tasks. |
Zhiheng Xi; Jixuan Huang; Chenyang Liao; Baodai Huang; Jiaqi Liu; Honglin Guo; yajie yang; Rui Zheng; Junjie Ye; Jiazheng Zhang; Wenxiang Chen; Wei He; Yiwen Ding; Guanyu Li; Zehui Chen; Zhengyin Du; Xuesong Yao; Yufei Xu; Jiecao Chen; Tao Gui; Zuxuan Wu; Qi Zhang; Xuanjing Huang; Yu-Gang Jiang; |
| 158 | ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For instance, inbetweening approaches struggle with large motions, while colorization methods require dense per-frame sketches. To address this, we introduce ToonComposer, a generative model that unifies inbetweening and colorization into a single post-keyframing stage. |
Lingen Li; Guangzhi Wang; Zhaoyang Zhang; Yaowei Li; Xiaoyu Li; Qi Dou; Jinwei Gu; Tianfan Xue; Ying Shan; |
| 159 | Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we study the gradient geometry of diffusion models, which can already produce high-quality replay data. |
Zekun Wang; Anant Gupta; Zihan Dong; Christopher J. MacLellan; |
| 160 | CapRL: Stimulating Dense Image Caption Capabilities Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To overcome the limitation of SFT, we propose applying the Reinforcement Learning with Verifiable Rewards (RLVR) paradigm to the open-ended task of image captioning. |
Long Xing; Xiaoyi Dong; Yuhang Zang; Yuhang Cao; Jianze Liang; Qidong Huang; Jiaqi Wang; Feng Wu; Dahua Lin; |
| 161 | ScaleCap: Scalable Image Captioning Via Dual-Modality Debiasing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents ScaleCap, a scalable image captioning strategy that generates comprehensive and detailed image captions. |
Long Xing; Qidong Huang; Xiaoyi Dong; Pan Zhang; Yuhang Zang; Yuhang Cao; Jinsong Li; Shuangrui Ding; Weiming Zhang; Nenghai Yu; Jiaqi Wang; Feng Wu; Dahua Lin; |
| 162 | Personalized Reasoning: Just-in-time Personalization and Why LLMs Fail at It Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks using psychologically-grounded personas with sparse preferences. |
Shuyue Stella Li; Avinandan Bose; Faeze Brahman; Simon Shaolei Du; Pang Wei Koh; Maryam Fazel; Yulia Tsvetkov; |
| 163 | How Reinforcement Learning After Next-token Prediction Facilitates Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent advances in reasoning domains with neural networks have primarily been enabled by a training recipe that optimizes Large Language Models, previously trained to predict the next token in a sequence, with reinforcement learning algorithms. We introduce a framework to study the success of this paradigm, and we theoretically expose the optimization mechanisms by which reinforcement learning improves over next-token prediction in this setting. |
Nikolaos Tsilivis; Eran Malach; Karen Ullrich; Julia Kempe; |
| 164 | Cyber-Zero: Training Cybersecurity Agents Without Runtime Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. |
Terry Yue Zhuo; Dingmin Wang; Hantian Ding; Varun Kumar; Zijian Wang; |
| 165 | ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose ReCogDrive, a novel **Re**inforced **Cog**nitive framework for end-to-end autonomous **Driv**ing, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. |
Yongkang Li; Kaixin Xiong; Xiangyu Guo; Fang Li; Sixu Yan; Gangwei Xu; Lijun Zhou; Long Chen; Haiyang Sun; BING WANG; Kun Ma; Guang Chen; Hangjun Ye; Wenyu Liu; Xinggang Wang; |
| 166 | CoAct-1: Computer-using Multi-agent System with Coding Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a more robust and flexible paradigm: enabling agents to use coding as an enhanced action. |
Linxin Song; Yutong Dai; Viraj Prabhu; Jieyu Zhang; Taiwei Shi; Li Li; Junnan Li; silvio savarese; Zeyuan Chen; Jieyu Zhao; Ran Xu; Caiming Xiong; |
| 167 | ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A key limitation, however, is their failure to learn from this accumulated experience, forcing them to discard valuable insights and repeat past errors. Unlike prior works that primarily store raw experience or successful routines, we propose ReasoningBank, a novel memory framework that allows an agent to self-curate generalizable reasoning strategies from both its successful and failed experiences for future leverage. |
Siru Ouyang; Jun Yan; I-Hung Hsu; Yanfei Chen; Ke Jiang; Zifeng Wang; Rujun Han; Long Le; Samira Daruki; Xiangru Tang; Vishy Tirumalashetty; George Lee; Mahsan Rofouei; Hangfei Lin; Jiawei Han; Chen-Yu Lee; Tomas Pfister; |
| 168 | Mode-conditioning Unlocks Superior Test-time Compute Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose the mode-conditioning (ModC) framework, which explicitly allocates test-time compute across reasoning modes using either specialist models or mode-specific prefixes. |
Chen Henry Wu; Sachin Goyal; Aditi Raghunathan; |
| 169 | Watermark-based Attribution of AI-Generated Content Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, attribution–the ability to trace back to the user of a generative AI (GenAI) service who created the given AI-generated content–remains largely unexplored despite its growing importance. In this work, we aim to bridge this gap by conducting the first systematic study on watermark-based, user-level attribution of AI-generated content. |
Zhengyuan Jiang; Moyang Guo; Yuepeng Hu; Yupu Wang; Neil Zhenqiang Gong; |
| 170 | VADv2: End-to-End Autonomous Driving Via Probabilistic Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a probabilistic planning model for end-to-end autonomous driving, termed VADv2. We also provide comprehensive evaluations on the NAVSIM dataset and a large-scale 3DGS-based benchmark, demonstrating its effectiveness in real-world applications. |
Bo Jiang; Shaoyu Chen; Hao Gao; Bencheng Liao; Qian Zhang; Wenyu Liu; Xinggang Wang; |
| 171 | Deep Think with Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. |
Yichao Fu; Xuewei Wang; Hao Zhang; Yuandong Tian; Jiawei Zhao; |
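As a toy illustration of confidence-aware test-time aggregation, the sketch below filters low-confidence reasoning traces before majority voting over final answers. The threshold and the per-trace confidence score are assumptions; DeepConf's actual confidence signal is derived from the model itself.

```python
# Sketch of confidence-filtered majority voting in the spirit of DeepConf.
# Each trace is a (final_answer, confidence) pair; the source of the
# confidence score and the threshold are hypothetical here.
from collections import Counter

def filtered_vote(traces, threshold=0.7):
    kept = [ans for ans, conf in traces if conf >= threshold]
    return Counter(kept).most_common(1)[0][0] if kept else None

print(filtered_vote([("42", 0.9), ("41", 0.4), ("42", 0.8)]))  # "42"
```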
| 172 | Uniform Discrete Diffusion with Metric Path for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit discrete generative modeling and present Uniform discRete diffuSion with metric pAth (URSA), a simple yet powerful framework that bridges the gap with continuous approaches for scalable video generation. |
Haoge Deng; Ting Pan; Fan Zhang; Yang Liu; Zhuoyan Luo; Yufeng Cui; Wenxuan Wang; Chunhua Shen; Shiguang Shan; Zhaoxiang Zhang; Xinlong Wang; |
| 173 | AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce AlphaBench, the first systematic benchmark for evaluating LLMs in FAFM. |
Haochen Luo; Ho Tin Ko; Jiandong Chen; David Sun; Yuan Zhang; Chen Liu; |
| 174 | Theory of Space: Can Foundation Models Construct Spatial Beliefs Through Active Exploration? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In particular, it remains unclear whether and how these models can decide what to observe next in order to build and maintain a coherent spatial belief over time. We therefore propose Theory of Space (ToS), defined as an agent’s ability to actively acquire information through self-directed exploration and to construct, revise, and exploit a spatial belief from sequential, partial observations. |
Pingyue Zhang; Zihan Huang; Yue Wang; Jieyu Zhang; Letian Xue; Zihan Wang; Qineng Wang; Keshigeyan Chandrasegaran; Ruohan Zhang; Yejin Choi; Ranjay Krishna; Jiajun Wu; Li Fei-Fei; Manling Li; |
| 175 | Deep Hierarchical Learning with Nested Subspace Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose *Nested Subspace Networks (NSNs)*, a novel architectural paradigm that enables a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets at inference time. |
Paulius Rauba; Mihaela van der Schaar; |
| 176 | RLBFF: Binary Flexible Feedback to Bridge Between Human Feedback & Verifiable Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Reinforcement Learning with Binary Flexible Feedback (RLBFF), which combines the versatility of human-driven preferences with the precision of rule-based verification, enabling reward models to capture nuanced aspects of response quality beyond mere correctness. |
Zhilin Wang; Jiaqi Zeng; Olivier Delalleau; Ellie Evans; Daniel Egert; Hoo-Chang Shin; Felipe Soares; Yi Dong; Oleksii Kuchaiev; |
| 177 | How to Train Data-efficient LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. |
Noveen Sachdeva; Benjamin Coleman; Wang-Cheng Kang; Jianmo Ni; Lichan Hong; Ed H. Chi; James Caverlee; Julian McAuley; Derek Zhiyuan Cheng; |
| 178 | Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In parallel, text describing the visual world proves crucial, though its performance impact saturates rapidly. Leveraging these insights, we propose a data-centric recipe for pre-training vision-aware LLMs and verify it in 1T token scale pre-training. |
Junlin Han; Shengbang Tong; David Fan; Yufan Ren; Koustuv Sinha; Philip Torr; Filippos Kokkinos; |
| 179 | Generative View Stitching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In camera-guided video generation with a predefined camera trajectory, this limitation leads to collisions with the generated scene, after which autoregression quickly collapses. To address this, we propose Generative View Stitching (GVS), which samples the entire sequence in parallel such that the generated scene is faithful to every part of the predefined camera trajectory. |
Chonghyuk Song; Michal Stary; Boyuan Chen; George Kopanas; Vincent Sitzmann; |
| 180 | WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose WebGen-Agent, a novel website-generation agent that leverages comprehensive and multi-level visual feedback to iteratively generate and refine the website codebase. |
Zimu Lu; Houxing Ren; Yunqiao Yang; Ke Wang; Zhuofan Zong; Junting Pan; Mingjie Zhan; Hongsheng Li; |
| 181 | Demystifying Deep Search: A Holistic Evaluation with Hint-free Multi-Hop Questions and Factorised Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Second, evaluation is typically reduced to a single pass rate, which collapses diverse behaviors into one score and obscures whether failures stem from inadequate search, poor knowledge use, or inappropriate refusal. To address these issues, we present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox that ensures full traceability of model actions, and a holistic evaluation framework that separates search sufficiency, knowledge utilization, and refusal behavior. |
Maojia Song; Liu Renhang; Xinyu Wang; Yong Jiang; Pengjun Xie; Fei Huang; Soujanya Poria; Jingren Zhou; |
| 182 | Front-Loading Reasoning: The Synergy Between Pretraining and Post-Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Could earlier inclusion risk overfitting and harm generalization, or instead establish durable foundations that later fine-tuning cannot recover? To address these questions, we conduct the first systematic study of how reasoning data—varying in scale, diversity, and quality—affects LLM performance when introduced at different stages of training. |
Syeda Nahida Akter; Shrimai Prabhumoye; Eric Nyberg; Mostofa Patwary; Mohammad Shoeybi; Yejin Choi; Bryan Catanzaro; |
| 183 | OpenThoughts: Data Recipes for Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Yet, there are still many open questions about the best training recipes for reasoning since state-of-the-art models often rely on proprietary datasets with little to no public information available. To address this, the goal of the OpenThoughts project is to create open-source datasets for training reasoning models. |
Etash Kumar Guha; Ryan Marten; Sedrick Keh; Negin Raoof; Georgios Smyrnis; Hritik Bansal; Marianna Nezhurina; Jean Mercat; Trung Vu; Zayne Rea Sprague; Ashima Suvarna; Benjamin Feuer; Leon Liangyu Chen; Zaid Khan; Eric Frankel; Sachin Grover; Caroline Choi; Niklas Muennighoff; Shiye Su; Wanjia Zhao; John Yang; Shreyas Pimpalgaonkar; Kartik sharma; Charlie Cheng-Jie Ji; Yichuan Deng; Sarah M Pratt; Vivek Ramanujan; Jon Saad-Falcon; Stutee Acharya; Jeffrey Li; Achal Dave; Alon Albalak; Kushal Arora; Blake Wulfe; Chinmay Hegde; Greg Durrett; Sewoong Oh; Mohit Bansal; Saadia Gabriel; Aditya Grover; Kai-Wei Chang; Vaishaal Shankar; Aaron Gokaslan; Mike A Merrill; Tatsunori Hashimoto; Yejin Choi; Jenia Jitsev; Reinhard Heckel; Maheswaran Sathiamoorthy; Alex Dimakis; Ludwig Schmidt; |
| 184 | Locality-Attending Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we seek to enhance the segmentation performance of vision transformers after being trained using the usual image-level classification objective. |
Sina Hajimiri; Farzad Beizaee; Fereshteh Shakeri; Christian Desrosiers; Ismail Ben Ayed; Jose Dolz; |
| 185 | Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we present Terminal-Bench 1.5: a carefully curated hard benchmark composed of 74 tasks in computer terminal environments inspired by problems from real workflows. |
Mike A Merrill; Alexander Glenn Shaw; Nicholas Carlini; Boxuan Li; Harsh Raj; Ivan Bercovich; Lin Shi; Jeong Yeon Shin; Thomas Walshe; E. Kelly Buchanan; Junhong Shen; Guanghao Ye; Haowei Lin; Jason Poulos; Maoyu Wang; Jenia Jitsev; Marianna Nezhurina; Di Lu; Orfeas Menis Mastromichalakis; Zhiwei Xu; Zizhao Chen; Yue Liu; Robert Zhang; Leon Liangyu Chen; Anurag Kashyap; Jan-Lucas Uslu; Jeffrey Li; Jianbo Wu; Minghao Yan; Song Bian; Vedang Sharma; Ke Sun; Steven Dillmann; Akshay Anand; Andrew Lanpouthakoun; Bardia Koopah; Changran Hu; Etash Kumar Guha; Gabriel H. S. Dreiman; Jiacheng Zhu; Karl Krauth; Li Zhong; Niklas Muennighoff; Robert Kwesi Amanfu; Shangyin Tan; Shreyas Pimpalgaonkar; Tushar Aggarwal; Xiangning Lin; Xin Lan; Xuandong Zhao; Yiqing Liang; Yuanli Wang; Zilong Wang; Changzhi Zhou; David Heineman; Hange Liu; Harsh Trivedi; John Yang; Junhong Lin; Manish Shetty; Michael Yang; Nabil Omi; Negin Raoof; Shanda Li; Terry Yue Zhuo; Wuwei Lin; Yiwei Dai; Yuxin Wang; Wenhao Chai; Shang Zhou; Dariush Wahdany; Ziyu She; Jiaming Hu; Zhikang Dong; Yuxuan Zhu; Sasha Cui; Ahson Saiyed; Arinbjörn Kolbeinsson; Christopher Michael Rytting; Ryan Marten; Yixin Wang; Alex Dimakis; Andy Konwinski; Ludwig Schmidt; |
| 186 | What’s The Plan? Metrics for Implicit Planning in LLMs and Their Application to Rhyme Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose much simpler techniques for assessing implicit planning in language models. |
Jim Maar; Denis Paperno; Callum Stuart McDougall; Neel Nanda; |
| 187 | Thought Branches: Interpreting LLM Reasoning Requires Resampling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a resilience metric and a counterfactual importance measure that repeatedly resample to remove sentences such that similar content does not reappear downstream. |
Uzay Macar; Paul C. Bogdan; Senthooran Rajamanoharan; Neel Nanda; |
| 188 | Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Leading uncertainty estimation methods generate and analyze multiple output sequences, which is computationally expensive and impractical at scale. In this work, we inspect the theoretical foundations of these methods and explore new directions to enhance computational efficiency. |
Lukas Aichberger; Kajetan Schweighofer; Sepp Hochreiter; |
| 189 | OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a curation and synthesis pipeline that generates 24M single-modal and omni-modal conversations. |
Hanrong Ye; Chao-Han Huck Yang; Arushi Goel; Wei Huang; Zhen Wan; Jinchuan Tian; An-Chieh Cheng; Ligeng Zhu; Yuanhang Su; Yuming Lou; Yong-Xiang Lin; Dong Yang; Sreyan Ghosh; Zhijian Liu; Yukang Chen; Ehsan Jahangiri; Ambrish Dantrey; Daguang Xu; Ehsan Hosseini-Asl; Seyed Danial Mohseni Taheri; Vidya Nariyambut Murali; Sifei Liu; Yao Lu; Oluwatobi Olabiyi; Yu-Chiang Frank Wang; Rafael Valle; Bryan Catanzaro; Andrew Tao; Song Han; Jan Kautz; Hongxu Yin; Pavlo Molchanov; |
| 190 | A Problem-Oriented Perspective and Anchor Verification for Code Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, we observe that code optimization presents greater challenges compared to code generation, often accompanied by an optimization tax. Recognizing the inherent trade-offs between correctness and efficiency, we introduce a novel anchor verification framework to mitigate this optimization tax. |
Tong Ye; Tengfei Ma; Xuhong Zhang; Hang Yu; Jianwei Yin; Wenhai Wang; |
| 191 | Decoupled Q-Chunking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our key insight is to decouple the chunk length of the critic from that of the policy, allowing the policy to operate over shorter action chunks. We propose a novel algorithm that achieves this by optimizing the policy against a distilled critic for partial action chunks, constructed by optimistically backing up from the original chunked critic to approximate the maximum value achievable when a partial action chunk is extended to a complete one. |
Qiyang Li; Seohong Park; Sergey Levine; |
| 192 | Q-Learning with Adjoint Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion/flow-matching based policy with respect to a parameterized value function (i.e., the critic $Q_\phi(s, a)$). |
Qiyang Li; Sergey Levine; |
| 193 | Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Nemotron-CC-Math, a large-scale, high-quality mathematical corpus constructed from Common Crawl using a novel, domain-agnostic pipeline specifically designed for robust scientific text extraction. We collected a large, high-quality math corpus, namely Nemotron-CC-Math-3+ (133B tokens) and Nemotron-CC-Math-4+ (52B tokens). |
Rabeeh Karimi mahabadi; Sanjeev Satheesh; Shrimai Prabhumoye; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; |
| 194 | ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. We will release data, models, and code to advance future research. |
Zhaoyang Liu; JingJing Xie; Zichen Ding; Zehao Li; Bowen Yang; Zhenyu Wu; Xuehui Wang; Qiushi Sun; Shi Liu; Weiyun Wang; Shenglong Ye; Qingyun Li; Zeyue Tian; Gen Luo; Xiangyu Yue; Biqing Qi; Kai Chen; Bowen Zhou; Yu Qiao; Qifeng Chen; Wenhai Wang; |
| 195 | On The Theoretical Limitations of Embedding-Based Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we demonstrate that we may encounter these theoretical limitations in realistic settings with extremely simple queries. We then create a realistic dataset called LIMIT that stress tests models based on these theoretical results, and observe that even state-of-the-art models fail on this dataset despite the simple nature of the task. |
Orion Weller; Michael Boratko; Iftekhar Naim; Jinhyuk Lee; |
| 196 | Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate whether distributional properties of LLM predictions can be recovered _without_ explicit autoregressive generation. |
Julianna Piskorz; Kasia Kobalczyk; Mihaela van der Schaar; |
| 197 | ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction Highlight: It raises an intriguing question: do modern vision-language models (VLMs), trained largely in a disembodied manner, exhibit signs of embodied cognition? To investigate this, we introduce **ENACT**, a benchmark that probes this question through world modeling from egocentric interaction. |
Qineng Wang; Wenlong Huang; Yu Zhou; Hang Yin; Tianwei Bao; Jianwen Lyu; Weiyu Liu; Ruohan Zhang; Jiajun Wu; Li Fei-Fei; Manling Li; |
| 198 | ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination Highlight: We present MIDR-3D, the first unified end-to-end pipeline that simultaneously reconstructs complete 3D geometry, spatially-varying physically-based materials, and environment illumination from sparse multi-view images in under one second. |
Jan-Niklas Dihlmann; Mark Boss; Simon Donné; Andreas Engelhardt; Hendrik Lensch; Varun Jampani; |
| 199 | WebSeer: Training Deeper Search Agents Through Reinforcement Learning with Self-Reflection Highlight: In this paper, we present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Specifically, we construct a large dataset annotated with reflection patterns and design a two-stage training framework that unifies cold start and reinforcement learning within the self-reflection paradigm for real-world web-based environments, which enables the model to generate longer and more reflective tool-use trajectories. |
Guanzhong He; Zhen Yang; Jinxin Liu; Bin Xu; Lei Hou; Juanzi Li; |
| 200 | True Self-Supervised Novel View Synthesis Is Transferable Highlight: In this paper, we identify that the key criterion for determining whether a model is truly capable of novel view synthesis (NVS) is transferability: whether any pose representation extracted from one video sequence can be used to re-render the same camera trajectory in another. |
Thomas Mitchel; Hyunwoo Ryu; Vincent Sitzmann; |
| 201 | Scalable Offline Model-Based RL with Action Chunks Highlight: In this paper, we study whether model-based reinforcement learning (RL), in particular model-based value expansion, can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL. |
Kwanyoung Park; Seohong Park; Youngwoon Lee; Sergey Levine; |
| 202 | Forge: Compiling A Unified Abstraction Into Scalable Kernels for Linear Attention Highlight: We introduce Forge, a domain-specific compiler that automates the generation of high-performance, scalable kernels for a wide range of linear attention models directly from high-level PyTorch code. |
Haojie Duanmu; Size Zheng; Ningxin Zheng; Jianqiao Lu; Xuegui Zheng; Xingcheng Zhang; Li-Wen Chang; Xin Liu; Dahua Lin; |
| 203 | Revisiting [CLS] and Patch Token Interaction in Vision Transformers Highlight: In this work, we investigate the friction between global and local feature learning under different pre-training strategies by analyzing the interactions between class and patch tokens. |
Alexis Marouani; Oriane Siméoni; Herve Jegou; Piotr Bojanowski; Huy V. Vo; |
| 204 | ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents Highlight: We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. |
Hanyu Lai; Xiao Liu; Yanxiao Zhao; Han Xu; Hanchen Zhang; Bohao Jing; Yanyu Ren; Shuntian Yao; Yuxiao Dong; Jie Tang; |
| 205 | Geometry-aware 4D Video Generation for Robot Manipulation Highlight: While recent video generation models have shown strong potential in modeling dynamic scenes, generating videos that are both temporally coherent and geometrically consistent across camera views remains a significant challenge. To address this, we propose a 4D video generation model that enforces multi-view 3D consistency of generated videos by supervising the model with cross-view pointmap alignment during training. |
Zeyi Liu; Shuang Li; Eric Cousineau; Siyuan Feng; Benjamin Burchfiel; Shuran Song; |
| 206 | Self-Improving Vision-Language-Action Models with Data Generation Via Residual RL Highlight: We propose Probe, Learn, Distill (PLD), a plug-and-play framework that improves VLAs through residual reinforcement learning and distribution-aware data collection. |
Wenli Xiao; Haotian Lin; Andy Peng; Haoru Xue; Tairan He; Zhengyi Luo; Yuqi Xie; Fengyuan Hu; Linxi Fan; Guanya Shi; Yuke Zhu; |
| 207 | SIM-CoT: Supervised Implicit Chain-of-Thought Highlight: Our analysis shows that this instability arises from latent representations becoming homogeneous and losing semantic diversity, caused by insufficient step-level supervision in current implicit CoT methods. To address this, we propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space. |
Xilin Wei; Xiaoran Liu; Yuhang Zang; Xiaoyi Dong; Yuhang Cao; Jiaqi Wang; Xipeng Qiu; Dahua Lin; |
| 208 | Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory? Highlight: We present *modal aphasia*, a systematic dissociation in which current unified multimodal models accurately memorize concepts visually but fail to articulate them in writing, despite being trained on images and text simultaneously. |
Michael Aerni; Joshua Swanson; Kristina Nikolić; Florian Tramèr; |
| 209 | Distractor-free Generalizable 3D Gaussian Splatting Highlight: We present DGGS, a novel framework that addresses the previously unexplored challenge: **Distractor-free Generalizable 3D Gaussian Splatting** (3DGS). |
Yanqi Bao; Jing Liao; Jing Huo; Yang Gao; |
| 210 | RM-R1: Reward Modeling As Reasoning Highlight: We propose a reasoning-oriented training pipeline and train a family of ReasRMs, RM-R1. |
Xiusi Chen; Gaotang Li; Ziqi Wang; Bowen Jin; Cheng Qian; Yu Wang; Hongru WANG; Yu Zhang; Denghui Zhang; Tong Zhang; Hanghang Tong; Heng Ji; |
| 211 | WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs Highlight: We introduce WorldSense, the first benchmark to assess multi-modal video understanding that simultaneously encompasses visual, audio, and text inputs. |
Jack Hong; Shilin Yan; Jiayin Cai; Xiaolong Jiang; Yao Hu; Weidi Xie; |
| 212 | Seq Vs Seq: An Open Suite of Paired Encoders and Decoders Highlight: We introduce the SOTA open-data Ettin suite of models: paired encoder-only and decoder-only models ranging from 17 million parameters to 1 billion, trained on up to 2 trillion tokens. |
Orion Weller; Kathryn Ricci; Marc Marone; Antoine Chaffin; Dawn Lawrie; Benjamin Van Durme; |
| 213 | Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport Highlight: We propose an approach grounded in conditional Lagrangian optimal transport theory, jointly learning the Lagrangian function governing hyperparameter-induced dynamics along with the associated optimal transport maps and geodesics, which form the surrogate model. |
Harry Amad; Mihaela van der Schaar; |
| 214 | Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems Highlight: MDPs introduce unique challenges for autoformulation, including a significantly larger formulation search space, and for computing and interpreting the optimal policy. In this work, we address these challenges in the context of queueing problems—central to domains such as healthcare and logistics—which often require substantial technical expertise to formulate correctly. |
Victor Baillet; Yuanzhang Xiao; Nicolás Astorga; Mihaela van der Schaar; |
| 215 | A Study of Posterior Stability in Time-Series Latent Diffusion Highlight: Through this method, we confirm that posterior collapse seriously affects latent time-series diffusion on real time series. |
Yangming Li; Yixin Cheng; Mihaela van der Schaar; |
| 216 | YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting Highlight: We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images. |
Botao Ye; Boqi Chen; Haofei Xu; Daniel Barath; Marc Pollefeys; |
| 217 | IterResearch: Rethinking Long-Horizon Agents Via Markovian State Reconstruction Highlight: We introduce IterResearch, a novel iterative deep-research paradigm that reformulates long-horizon research as a Markov Decision Process with strategic workspace reconstruction. |
Guoxin Chen; Zile Qiao; Xuanzhong Chen; Donglei Yu; Haotian Xu; Xin Zhao; Ruihua Song; Wenbiao Yin; Huifeng Yin; Liwen Zhang; Kuan Li; Minpeng Liao; Yong Jiang; Pengjun Xie; Fei Huang; Jingren Zhou; |
| 218 | Robust Fine-tuning of Vision-Language-Action Robot Policies Via Parameter Merging Highlight: When finetuned on limited demonstrations of a new task, these policies often overfit to the specific demonstrations—not only losing their prior abilities to solve a wide variety of generalist tasks but also failing to generalize within the new task itself. In this work, we aim to develop a method that preserves the generalization capabilities of the generalist policy during finetuning, allowing a single policy to robustly incorporate a new skill into its repertoire. |
Yajat Yadav; Zhiyuan Zhou; Andrew Wagenmaker; Karl Pertsch; Sergey Levine; |
| 219 | WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent Highlight: This makes multimodal deep research highly challenging, as such agents require much stronger perceptual, logical, and knowledge-based reasoning abilities, as well as proficiency in more sophisticated tools. To address this limitation, we introduce WebWatcher, a multimodal agent for deep research with joint reasoning ability across both visual and textual modalities. |
Xinyu Geng; Peng Xia; Zhen Zhang; Xinyu Wang; Qiuchen Wang; Ruixue Ding; Chenxi Wang; Jialong Wu; Kuan Li; Yida Zhao; Huifeng Yin; Yong Jiang; Pengjun Xie; Fei Huang; Huaxiu Yao; Yi R. Fung; Jingren Zhou; |
| 220 | WebShaper: Agentically Data Synthesizing Via Information-Seeking Formalization Highlight: To mitigate this, we propose WebShaper, a formalization-driven IS data synthesis framework that systematically formalizes IS tasks using set-theoretic constructs. |
Zhengwei Tao; Jialong Wu; Wenbiao Yin; Pu Wu; Junkai Zhang; Baixuan Li; Haiyang SHEN; Kuan Li; Liwen Zhang; Xinyu Wang; Wentao Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Jingren Zhou; |
| 221 | Empowering Efficiency and Efficacy in WebAgent Via Enabling Info-Rich Seeking Highlight: A key factor underlying this inefficiency is the sparsity of target entities in training tasks, which limits opportunities for agents to learn and generalize efficient search behaviors. To address these challenges, we propose WebLeaper, a framework for constructing high-coverage IS tasks and generating efficient solution trajectories. |
Zhengwei Tao; Haiyang SHEN; Baixuan Li; Wenbiao Yin; Jialong Wu; Kuan Li; Zhongwang Zhang; Huifeng Yin; Rui Ye; Yun Ma; Zhiqiang Gao; Wentao Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Jingren Zhou; |
| 222 | AgentFold: Long-Horizon Web Agents with Proactive Context Folding Highlight: Addressing these, we introduce AgentFold, a novel agent paradigm inspired by the human cognitive process of retrospective consolidation. |
Rui Ye; Zhongwang Zhang; Kuan Li; Huifeng Yin; Zhengwei Tao; Yida Zhao; Liangcai Su; Liwen Zhang; Zile Qiao; Xinyu Wang; Yong Jiang; Pengjun Xie; Fei Huang; Siheng Chen; Jingren Zhou; |
| 223 | Scalable Chain of Thoughts Via Elastic Reasoning Highlight: We propose Elastic Reasoning, a novel framework for scalable chain of thoughts that explicitly separates reasoning into two phases—thinking and solution—with independently allocated budgets. |
Yuhui Xu; Hanze Dong; Lei Wang; Doyen Sahoo; Junnan Li; Caiming Xiong; |
| 224 | VisCodex: Unified Multimodal Code Generation Via Merging Vision and Coding Models Highlight: In this work, we introduce VisCodex, a unified framework that seamlessly merges vision and coding language models to empower MLLMs with strong multimodal code generation abilities. To support training and evaluation, we introduce the Multimodal Coding Dataset (MCD), a large-scale and diverse collection of 598k samples, including high-quality HTML code, chart image-code pairs, image-augmented StackOverflow QA, and algorithmic problems. |
Lingjie Jiang; Shaohan Huang; Xun Wu; Yixia Li; Guanhua Chen; Dongdong Zhang; Furu Wei; |
| 225 | PE-SGD: Differentially Private Deep Learning Via Evolution of Gradient Subspace for Text Highlight: However, they have overlooked two crucial aspects: the limitation of using a fixed projection subspace throughout training and the importance of choosing where to inject noise. Therefore, we propose Private Evolution aided Stochastic Gradient Descent (***PE-SGD***), a differentially private training framework effective for scenarios with limited private data. |
Tianyuan Zou; Zinan Lin; Sivakanth Gopi; Yang Liu; Ya-Qin Zhang; Robert Sim; Xin Deng; Sergey Yekhanin; |
| 226 | Language-Instructed Vision Embeddings for Controllable and Generalizable Perception Highlight: We propose a different paradigm: instead of solely feeding visual features into language, we use language itself to dynamically guide the vision encoder. |
Chengzhi Mao; Xudong Lin; Wen-Sheng Chu; |
| 227 | Scaling Goal-conditioned Reinforcement Learning with Multistep Quasimetric Distances Highlight: There is a fundamental tension between local dynamic programming (TD backups, temporal distances), which enables optimal shortest-path reasoning in theory, and statistically efficient global MC updates (multistep returns), which are suboptimal in theory. We show how these approaches can be integrated into a practical GCRL method that fits a quasimetric distance using a multistep Monte-Carlo return. |
Bill Zheng; Vivek Myers; Benjamin Eysenbach; Sergey Levine; |
| 228 | MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents Highlight: We introduce MMSearch-Plus, a 311-task benchmark that enforces multimodal understanding by requiring extraction and propagation of fine-grained visual cues through iterative image–text retrieval and cross-validation under retrieval noise. |
Xijia Tao; Teng Yihua; Xinxing Su; Xinyu Fu; Jihao Wu; Chaofan Tao; Ziru Liu; Haoli Bai; Rui Liu; Lingpeng Kong; |
| 229 | A Noise Is Worth Diffusion Guidance Highlight: We introduce a noise refinement framework where a refining network is trained to minimize the difference between images generated by unguided sampling from the refined noise and those produced by guided sampling from the input Gaussian noise. |
Donghoon Ahn; Jiwon Kang; Sanghyun Lee; Jaewon Min; Minjae Kim; Wooseok Jang; Hyoungwon Cho; Sayak Paul; SeonHwa Kim; Eunju Cha; Kyong Hwan Jin; Seungryong Kim; |
| 230 | Agent Data Protocol Highlight: In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. |
Yueqi Song; Ketan Ramaneti; Zaid Sheikh; Ziru Chen; Boyu Gou; Tianbao Xie; Yiheng Xu; Danyang Zhang; Apurva Gandhi; Fan Yang; Joseph Liu; Tianyue Ou; Zhihao Yuan; Frank F. Xu; Shuyan Zhou; Xingyao Wang; Xiang Yue; Tao Yu; Huan Sun; Yu Su; Graham Neubig; |
| 231 | Learning to Reason Without External Rewards Highlight: We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model’s own confidence—termed self-certainty—as its sole reward signal. |
Xuandong Zhao; Zhewei Kang; Aosong Feng; Sergey Levine; Dawn Song; |
| 232 | Geometric-Mean Policy Optimization Highlight: In this study, we propose Geometric-Mean Policy Optimization (GMPO), which aims to improve the stability of GRPO by suppressing token reward outliers. |
Yuzhong Zhao; Yue Liu; Junpeng Liu; Jingye Chen; Xun Wu; Yaru Hao; Tengchao Lv; Shaohan Huang; Lei Cui; Qixiang Ye; Fang Wan; Furu Wei; |
| 233 | VibeVoice: Expressive Podcast Generation with Next-Token Diffusion Highlight: We present VibeVoice, a novel model designed to synthesize expressive, long-form speech with multiple speakers in a zero-shot manner. |
Zhiliang Peng; Jianwei Yu; Wenhui Wang; Yaoyao Chang; Yutao Sun; Li Dong; Yi Zhu; Weijiang Xu; Hangbo Bao; Zehua Wang; Shaohan Huang; Yan Xia; Furu Wei; |
| 234 | A Scene Is Worth A Thousand Features: Feed-Forward Camera Localization from A Collection of Image Features Highlight: We introduce FastForward, a method that creates a map representation and relocalizes a query image on-the-fly in a single feed-forward pass. |
Axel Barroso-Laguna; Tommaso Cavallari; Victor Adrian Prisacariu; Eric Brachmann; |
| 235 | Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents Highlight: Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. |
Johan Obando-Ceron; Walter Mayor; Samuel Lavoie; Scott Fujimoto; Aaron Courville; Pablo Samuel Castro; |
| 236 | Scaling Agents Via Continual Pre-training Highlight: Based on this approach, we develop a deep research agent model named AgentFounder. |
Liangcai Su; Zhen Zhang; Guangyu Li; Zhuo Chen; Chenxi Wang; Maojia Song; Xinyu Wang; Kuan Li; Jialong Wu; Xuanzhong Chen; Zile Qiao; Zhongwang Zhang; Huifeng Yin; Shihao Cai; Runnan Fang; Zhengwei Tao; Wenbiao Yin; Rui Ye; Yong Jiang; Ningyu Zhang; Pengjun Xie; Fei Huang; Kai Ye; Kewei Tu; Chenxiong Qian; Jingren Zhou; |
| 237 | We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Highlight: We introduce WE-MATH 2.0, a unified system that integrates a structured mathematical knowledge hierarchy, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to enhance the mathematical reasoning abilities of MLLMs. |
Runqi Qiao; Qiuna Tan; Peiqing Yang; Yanzi Wang; Xiaowan Wang; Enhui Wan; Guanting Dong; Shiqiang Lang; Sitong Zhou; Yida Xu; Yuchen Zeng; Jie Wang; Chong Sun; Chen Li; Honggang Zhang; |
| 238 | FlashDLM: Accelerating Diffusion Language Model Inference Via Efficient KV Caching and Guided Diffusion Highlight: Furthermore, parallel token generation introduces token incoherence problems, and current sampling heuristics suffer from significant quality drops with decreasing denoising steps. We address these limitations with two training-free techniques. |
Zhanqiu Hu; Jian Meng; Yash Akhauri; Mohamed S. Abdelfattah; Jae-sun Seo; Zhiru Zhang; Udit Gupta; |
| 239 | MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning Highlight: Scene graphs are a natural choice, yet prior work often separates spatial and functional relations, treats scenes as static snapshots without object states or temporal updates, and overlooks information most relevant for accomplishing the current task. To overcome these shortcomings, we introduce MomaGraph, a unified scene representation for embodied agents that integrates spatial-functional relationships and part-level interactive elements. |
Yuanchen Ju; Yongyuan Liang; Yen-Jen Wang; Gireesh Nandiraju; Yuanliang Ju; Seungjae Lee; Qiao Gu; Elvis Hsieh; Furong Huang; Koushil Sreenath; |
| 240 | Kevin: Multi-Turn RL for Generating CUDA Kernels Highlight: We present Kevin the Kernel Writer, the first model trained with multi-turn RL for CUDA kernel generation and optimization. |
Carlo Baronio; Pietro Marsella; Ben Pan; Simon Guo; Silas Alberti; |
| 241 | Perception-Aware Policy Optimization for Multimodal Reasoning Highlight: In particular, we observe that a major source of error (67%) in current multimodal reasoning lies in the perception of visual inputs. To address this bottleneck, we propose PAPO, a novel policy gradient algorithm that encourages the model to generate visually grounded reasoning without external supervision. |
Zhenhailong Wang; Xuehang Guo; Sofia Stoica; Haiyang Xu; Hongru WANG; Hyeonjeong Ha; Xiusi Chen; Yangyi Chen; Ming Yan; Fei Huang; Heng Ji; |
| 242 | Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions Highlight: We argue that interleaved image-text inputs offer richer and less biased context and enable robots to better handle unseen tasks with more versatile human-robot interaction. Building on this insight, we introduce Interleave-VLA, a robot learning paradigm that extends interleaved image-text instructions from the digital world to directly generating continuous action sequences in the physical world. |
Cunxin Fan; Xiaosong Jia; Yihang Sun; Yixiao Wang; Jianglan Wei; Ziyang Gong; Xiangyu Zhao; Masayoshi Tomizuka; Xue Yang; Junchi Yan; Mingyu Ding; |
| 243 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Highlight: In this paper, we identify the key issue as the redundant content in videos. |
Ruyang Liu; Shangkun Sun; Haoran Tang; Yixiao Ge; Haibo Lu; Jiankun Yang; Chen Li; |
| 244 | Universal Model Routing for Efficient LLM Inference Highlight: In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose UniRoute, a new approach to this problem that relies on representing each LLM as a feature vector, derived from its predictions on a set of representative prompts. |
Wittawat Jitkrittum; Harikrishna Narasimhan; Ankit Singh Rawat; Jeevesh Juneja; Congchao Wang; Zifeng Wang; Alec Go; Chen-Yu Lee; Pradeep Shenoy; Rina Panigrahy; Aditya Krishna Menon; Sanjiv Kumar; |
| 245 | Is The Reversal Curse A Binding Problem? Uncovering Limitations of Transformers from A Basic Generalization Failure Highlight: In this paper, we conjecture that the Reversal Curse in LLMs is a manifestation of the long-standing *binding problem* in cognitive science, neuroscience and AI. |
Boshi Wang; Huan Sun; |
| 246 | Planner Aware Path Learning in Diffusion Language Models Training Highlight: In this paper, we systematically investigate the mismatch between discrete diffusion training and inference under planning, and theoretically prove that the standard discrete diffusion training evidence lower bound (ELBO) does not accurately describe a denoiser that uses a non-uniform planner. |
Fred Zhangzhi Peng; Zachary Bezemek; Jarrid Rector-Brooks; Shuibai Zhang; Michael M. Bronstein; Anru Zhang; Joey Bose; Alexander Tong; |
| 247 | Learning to Grasp Anything By Playing with Random Toys Highlight: Our results indicate robots can learn generalizable grasping using randomly assembled objects that are composed from just four shape primitives: spheres, cuboids, cylinders, and rings. We show that training on these toys enables robust generalization to real-world objects, yielding strong zero-shot performance. |
Dantong Niu; Yuvan Sharma; Baifeng Shi; Rachel Ding; Matteo Gioia; Haoru Xue; Henry Tsai; Konstantinos Kallidromitis; Anirudh Pai; S. Shankar Sastry; Trevor Darrell; Jitendra Malik; Roei Herzig; |
| 248 | VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation Highlight: To address this, we introduce VideoPhy-2, an action-centric dataset for evaluating physical commonsense in generated videos. We will release the dataset, videos, auto-rater model, and code in the camera-ready version. |
Hritik Bansal; Clark Peng; Yonatan Bitton; Roman Goldenberg; Aditya Grover; Kai-Wei Chang; |
| 249 | Programming with Pixels: Can Computer-Use Agents Do Software Engineering? Highlight: It therefore remains unclear whether such generalist agents can automate more sophisticated and specialized work such as software engineering (SWE). To investigate this, we introduce Programming with Pixels (PwP), the first comprehensive computer-use environment for software engineering, where agents visually control an IDE to perform diverse software engineering tasks. |
Pranjal Aggarwal; Sean Welleck; |
| 250 | Equilibrium Language Models Highlight: We propose Equilibrium Language Models (ELMs), a novel compression framework that replaces groups of Transformer layers with a lightweight fixed-point network, reinterpreting deep computation as solving for an equilibrium state. |
Yikun Jiang; Huanyu Wang; Tianhong Ding; Wenhu Zhang; Yiming Wu; Hanbin Zhao; John C.S. Lui; |
| 251 | Virtual Community: An Open World for Humans, Robots, and Society Highlight: To explore this future, we present Virtual Community—an open-world platform for humans, robots, and society—built on a universal physics engine and grounded in real-world 3D scenes. Leveraging Virtual Community, we propose two novel challenges. |
Qinhong Zhou; Hongxin Zhang; Xiangye Lin; Zheyuan Zhang; Yutian Chen; Wenjun Liu; Zunzhe Zhang; Sunli Chen; Lixing Fang; Qiushi Lyu; Xinyu Sun; Jincheng Yang; Zeyuan Wang; Bao Chi Dang; Zhehuan Chen; Daksha Ladia; Quang Vinh Dang; Jiageng Liu; Chuang Gan; |
| 252 | Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice Highlight: In this work, we uncover a critical issue in the standard practice of training small proxy models on each data recipe with a single set of hyperparameters. |
Jiachen T. Wang; Tong Wu; Kaifeng Lyu; James Zou; Dawn Song; Ruoxi Jia; Prateek Mittal; |
| 253 | J1: Incentivizing Thinking in LLM-as-a-Judge Via Reinforcement Learning Highlight: In this work, we introduce J1, a reinforcement learning framework for teaching LLM judges to think before making decisions. |
Chenxi Whitehouse; Tianlu Wang; Ping Yu; Xian Li; Jason E Weston; Ilia Kulikov; Swarnadeep Saha; |
| 254 | From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics Highlight: Large language models now solve many benchmark math problems at near-expert levels, yet this progress has not fully translated into reliable performance in real-world applications. We study this gap through contextual mathematical reasoning, where the mathematical core must be formulated from descriptive scenarios. We introduce CORE-MATH, a benchmark that repurposes AIME and MATH-500 problems into two contextual settings: Scenario Grounding (SG), which embeds abstract problems into realistic narratives without increasing reasoning complexity, and Complexity Scaling (CS), which transforms explicit conditions into sub-problems to capture how constraints often appear in practice. |
Bowen Cao; Dongdong Zhang; Yixia Li; Junpeng Liu; Shijue Huang; Chufan Shi; Hongyuan Lu; Yaokang Wu; Guanhua Chen; Wai Lam; Furu Wei; |
| 255 | PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation Highlight: While prior works accelerate the generation process through feature caching, they often suffer from notable quality degradation. In this work, we reveal that this issue arises from their inability to distinguish truly redundant features, which leads to the unintended skipping of computations on important features. |
Jiangshan Wang; Kang Zhao; Jiayi Guo; Jiayu Wang; Hang Guo; Chenyang Zhu; Xiangyu Yue; Xiu Li; |
| 256 | Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Highlight: Large Language Models (LLMs) often struggle with challenging, multi-step reasoning problems due to a fundamental learning gap — Reinforcement Learning with Verifiable Rewards (RLVR) suffers from sparse rewards when correct solutions are rarely sampled, while Supervised Fine-Tuning (SFT) tends to overfit to long demonstrations through rigid token mimicry. To bridge this gap, we introduce Supervised Reinforcement Learning (SRL), a framework that reformulates problem-solving as a sequence of logical actions. |
Yihe Deng; I-Hung Hsu; Jun Yan; Zifeng Wang; Rujun Han; Gufeng Zhang; Yanfei Chen; Wei Wang; Tomas Pfister; Chen-Yu Lee; |
| 257 | CONCUR: A Framework for Continual Constrained and Unconstrained Routing Highlight: Prior models also typically use a *single* input representation, limiting their ability to capture the full complexity of the routing problem and leading to sub-optimal routing decisions. To address these gaps, we propose CONCUR, a **con**tinual routing framework that supports both **c**onstrained and **u**nconstrained **r**outing (i.e., routing with or without a budget). |
Peter Baile Chen; Weiyue Li; Dan Roth; Mike Cafarella; Samuel Madden; Jacob Andreas; |
| 258 | BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning Highlight: We introduce **BOTS**, a unified framework for **B**ayesian **O**nline **T**ask **S**election in LLM reinforcement finetuning. |
Qianli Shen; Daoyuan Chen; Yilun Huang; Zhenqing Ling; Yaliang Li; Bolin Ding; Jingren Zhou; |
| 259 | DEAS: DEtached Value Learning with Action Sequence for Scalable Offline RL Highlight: In this work, we introduce DEtached value learning with Action Sequence (DEAS), a simple yet effective offline RL framework that leverages action sequences for value learning. |
Changyeon Kim; Haeone Lee; Younggyo Seo; Kimin Lee; Yuke Zhu; |
| 260 | On The Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation Highlight: In this work, we argue that the direction of updates is a more critical lens for understanding RLVR’s effects, which can be captured by the signed, token-level log probability difference $\Delta\log p$ between the base and final RLVR models. |
Kexin Huang; Haoming Meng; Junkang Wu; Jinda Lu; Chiyu Ma; Ziqian Chen; Xue Wang; Bolin Ding; Jiancan Wu; Xiang Wang; Xiangnan He; Guoyin Wang; Jingren Zhou; |
| 261 | AesCoder: Code Aesthetics with Agentic Reward Feedback Highlight: In this paper, we introduce a new pipeline to enhance the aesthetic quality of LLM-generated code. We will release both the code and datasets to facilitate further research in code aesthetics. |
Bang Xiao; Lingjie Jiang; Shaohan Huang; Tengchao Lv; Yupan Huang; Xun Wu; Lei Cui; Furu Wei; |
| 262 | Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control Highlight: We analyze penalty-based simulators to pinpoint why gradients degrade under hard contacts. Building on these insights, we propose DiffMJX, which couples adaptive time integration with penalty-based simulation to substantially improve gradient accuracy. |
Anselm Paulus; Andreas René Geist; Pierre Schumacher; Vít Musil; Simon Rappenecker; Georg Martius; |
| 263 | RL’s Razor: Why Online Reinforcement Learning Forgets Less Highlight: Our analysis reveals that on-policy RL is implicitly biased towards KL-minimal solutions among the many that solve the new task, whereas SFT can converge to distributions arbitrarily far from the base model. |
Idan Shenfeld; Jyothish Pari; Pulkit Agrawal; |
| 264 | Manipulation As in Simulation: Enabling Accurate Geometry Perception in Robots Highlight: In this work, we propose Camera Depth Models (CDMs) as a simple plugin on daily-use depth cameras, which take RGB images and raw depth signals as input and output denoised, accurate metric depth. |
Minghuan Liu; Zhengbang Zhu; Xiaoshen Han; PengHu; Haotong Lin; Xinyao Li; Jingxiao Chen; Jiafeng Xu; Yichu Yang; Yunfeng Lin; Xinghang Li; Yong Yu; Weinan Zhang; Tao Kong; Bingyi Kang; |
| 265 | BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning Highlight: We propose BFM-Zero, a framework that learns an effective shared latent representation that embeds motions, goals, and rewards into a common space, enabling a single policy to be prompted for multiple downstream tasks without retraining. |
Yitang Li; Zhengyi Luo; Tonghe Zhang; Cunxi Dai; Anssi Kanervisto; Andrea Tirinzoni; Haoyang Weng; Kris Kitani; Mateusz Guzek; Ahmed Touati; Alessandro Lazaric; Matteo Pirotta; Guanya Shi; |
| 266 | Discovering Hierarchical Software Engineering Agents Via Bandit Optimization Highlight: Inspired by how human engineers decompose problems into sub-tasks, we argue that SWE agents should be structured as orchestrators coordinating specialized sub-agents, each responsible for a specific sub-task such as bug reproduction, fault localization, code modification, or validation. |
Iris Xu; Guangtao Zeng; Zexue He; Charles Jin; Aldo Pareja; Dan Gutfreund; Chuang Gan; Zhang-Wei Hong; |
| 267 | Streaming Visual Geometry Transformer Highlight: To facilitate interactive and low-latency applications, we propose a streaming visual geometry transformer that shares a similar philosophy with autoregressive large language models. |
Dong Zhuo; Wenzhao Zheng; Jiahe Guo; Yuqi Wu; Jie Zhou; Jiwen Lu; |
| 268 | Language Models Use Lookbacks to Track Beliefs Highlight: Our investigation uncovered a pervasive algorithmic pattern that we call a lookback mechanism, which enables the LM to recall important information when it becomes necessary. |
Nikhil Prakash; Natalie Shapira; Arnab Sen Sharma; Christoph Riedl; Yonatan Belinkov; Tamar Rott Shaham; David Bau; Atticus Geiger; |
| 269 | AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents Highlight: We introduce AgentSynth, a scalable and cost-efficient pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents. |
Jingxu Xie; Dylan Xu; Xuandong Zhao; Dawn Song; |
| 270 | WebSailor-V2: Bridging The Chasm to Proprietary Agents Via Synthetic Data and Scalable Reinforcement Learning Highlight: To significantly advance the capabilities of open-source web agents, we present WebSailor-V2, a complete post-training pipeline encompassing data construction, Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL). |
Kuan Li; Zhongwang Zhang; Huifeng Yin; Rui Ye; Yida Zhao; Liwen Zhang; Litu Ou; Ding-Chu Zhang; Xixi Wu; Xinmiao Yu; Jialong Wu; Xinyu Wang; Zile Qiao; Zhen Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Zhi-Qin John Xu; Shuai Wang; Minhao Cheng; Jingren Zhou; |
| 271 | GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models Highlight: To remove this bottleneck, we introduce GLASS Flows, a new sampling paradigm that simulates a “flow matching model within a flow matching model” to sample Markov transitions. As we show in this work, this “inner” flow matching model can be retrieved from any pre-trained model without any re-training, effectively combining the efficiency of ODEs with the stochastic evolution of SDEs. |
Peter Holderrieth; Uriel Singer; Tommi Jaakkola; Ricky T. Q. Chen; Yaron Lipman; Brian Karrer; |
| 272 | Agentic Reinforced Policy Optimization Highlight: However, current RL algorithms typically employ trajectory-level rollout sampling, consistently neglecting the fine-grained exploration of multi-turn tool-call steps. To bridge this gap, we propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents. |
Guanting Dong; Hangyu Mao; Kai Ma; Licheng Bao; Yifei Chen; Zhongyuan Wang; Zhongxia Chen; Jiazhen Du; Huiyang Wang; Fuzheng Zhang; Guorui Zhou; Yutao Zhu; Ji-Rong Wen; Zhicheng Dou; |
| 273 | WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models Highlight: Although multiple prior works have proposed efficient SVD variants to enable low-rank operations, we find that in practice it remains difficult to achieve substantial latency reduction during model execution. To address this limitation, we introduce a new computational pattern and apply SVD at a finer granularity, enabling real and measurable improvements in execution latency. |
Haiyu Wang; Yutong Wang; Jack Jiang; Sai Qian Zhang; |
| 274 | Interactive Agents to Overcome Underspecificity in Software Engineering Highlight: In this work, we study the ability of LLM agents to handle underspecified instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance across three key steps: (a) detecting underspecificity, (b) asking targeted clarification questions, and (c) leveraging the interaction to improve performance in underspecified scenarios. |
Sanidhya Vijayvargiya; Xuhui Zhou; Akhila Yerukola; Maarten Sap; Graham Neubig; |
| 275 | Repurposing Synthetic Data for Fine-grained Search Agent Supervision Highlight: Our empirical analysis reveals a strong positive correlation between the number of ground-truth entities identified during an agent’s reasoning process and final answer accuracy. Building on this insight, we introduce Entity-aware Group Relative Policy Optimization (E-GRPO), a novel framework that formulates a dense entity-aware reward function. |
Yida Zhao; Kuan Li; Xixi Wu; Liwen Zhang; Ding-Chu Zhang; Baixuan Li; Maojia Song; Zhuo Chen; Chenxi Wang; Xinyu Wang; Yong Jiang; Kewei Tu; Pengjun Xie; Fei Huang; Jingren Zhou; |
| 276 | WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research Highlight: Current approaches are plagued by two limitations: static research pipelines that decouple planning from evidence acquisition, and monolithic generation paradigms that include redundant, irrelevant evidence, suffering from hallucination issues and low citation accuracy. To address these challenges, we introduce **WebWeaver**, a novel dual-agent framework that emulates the human research process. |
Zijian Li; Xin Guan; Bo Zhang; Shen Huang; Houquan Zhou; Shaopeng Lai; Ming Yan; Yong Jiang; Pengjun Xie; Fei Huang; Jun Zhang; Jingren Zhou; |
| 277 | Hidden Patterns in Chain-of-Thought Reasoning Highlight: In this work, we perform an in-depth analysis of CoT traces originating from competition-level mathematics questions, with the aim of better understanding how, and which parts of, CoT actually contribute to the final answer. |
Gregor Bachmann; Yichen Jiang; Seyed-Mohsen Moosavi-Dezfooli; Moin Nabi; |
| 278 | Transformers Learn Latent Mixture Models In-Context Via Mirror Descent Highlight: In this work, we formalize the task of estimating token importance as an in-context learning problem by introducing a novel framework based on Mixture of Transition Distributions, whereby a latent variable, whose distribution is parameterized by a set of unobserved mixture weights, determines the influence of past tokens on the next. |
Francesco D’Angelo; Nicolas Flammarion; |
| 279 | Rolling Forcing: Autoregressive Long Video Diffusion in Real Time Highlight: We design Rolling Forcing, a novel video generation technique that enables streaming long videos with minimal error accumulation. |
Kunhao Liu; Wenbo Hu; Jiale Xu; Ying Shan; Shijian Lu; |
| 280 | DynaGuard: A Dynamic Guardian Model With User-Defined Policies Highlight: Our models provide both rapid detection of policy violations and a chain-of-thought reasoning option that articulates and justifies model outputs. |
Monte Hoover; Vatsal Baherwani; Neel Jain; Khalid Saifullah; Joseph James Vincent; Chirag Jain; Melissa Kazemi Rad; C. Bayan Bruss; Ashwinee Panda; Tom Goldstein; |
| 281 | RL Makes MLLMs See Better Than SFT Highlight: To address this, we first investigate the impact of training strategies on MLLMs, where RL shows a clear advantage over SFT on strongly vision-related VQA benchmarks. Motivated by this, we conduct a critical yet under-explored analysis of the vision encoder of MLLMs through diverse and in-depth experiments, ranging from ImageNet classification and segmentation to gradient visualization. |
Junha Song; Sangdoo Yun; Dongyoon Han; Jaegul Choo; Byeongho Heo; |
| 282 | Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Highlight: To address this issue, we propose the reasoning MLLM, Vision-R1, to improve multimodal reasoning capability. Specifically, we first construct a high-quality multimodal CoT dataset without human annotations by leveraging an existing MLLM and DeepSeek-R1 through modality bridging and data filtering to obtain a 200K-sample multimodal CoT dataset, the Vision-R1-cold dataset. |
Wenxuan Huang; Bohan Jia; Shaosheng Cao; Zheyu Ye; Fei zhao; Zhe Xu; Yao Hu; Shaohui Lin; |
| 283 | Interleaving Reasoning for Better Text-to-Image Generation Highlight: We introduce Interleaving Reasoning Generation (IRG), a framework that alternates between text-based thinking and image synthesis: the model first produces text-based thinking to guide an initial image, then reflects on the result to refine fine-grained details, visual quality, and aesthetics while preserving semantics. |
Wenxuan Huang; Shuang Chen; Zheyong Xie; Shaosheng Cao; SHIXIANG TANG; Yufan Shen; Qingyu Yin; Wenbo Hu; Xiaoman Wang; Yuntian Tang; Junbo Qiao; Hangyu Guo; Yao Hu; Zhenfei Yin; Philip Torr; Yu Cheng; Wanli Ouyang; Shaohui Lin; |
| 284 | CyclicReflex: Improving Reasoning Models Via Cyclical Reflection Token Scheduling Highlight: In this work, we treat reflection tokens as a “resource” and introduce the problem of resource allocation, aimed at improving the test-time compute performance of LRMs by adaptively regulating the frequency and placement of reflection tokens. |
Chongyu Fan; Yihua Zhang; Jinghan Jia; Alfred O. Hero; Sijia Liu; |
| 285 | ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning Highlight: We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. |
Yichao Liang; Thanh Dat Nguyen; Cambridge Yang; Tianyang Li; Joshua B. Tenenbaum; Carl Edward Rasmussen; Adrian Weller; Zenna Tavares; Tom Silver; Kevin Ellis; |
| 286 | LLMs Process Lists With General Filter Heads Highlight: We investigate the mechanisms underlying a range of list-processing tasks in LLMs, and we find that they have learned to encode a compact, causal representation of a general filtering operation that mirrors the generic “filter” function of functional programming. |
Arnab Sen Sharma; Giordano Rogers; Natalie Shapira; David Bau; |
| 287 | Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning Highlight: While longer reasoning can help on hard problems, many extra tokens are filler: verbose text making little progress. We introduce GFPO (Group Filtered Policy Optimization), which curbs this length explosion by sampling larger groups per problem and only training on responses filtered by (1) length and (2) token efficiency (reward per token). |
Vaishnavi Shrivastava; Ahmed Hassan Awadallah; Vidhisha Balachandran; Shivam Garg; Harkirat Behl; Dimitris Papailiopoulos; |
| 288 | Test-Time Adaptation for LLM Agents Via Environment Interaction Highlight: This challenge stems from two distinct failure modes: a syntactic misunderstanding of environment-specific components like observation formats, and a semantic misunderstanding of state-transition dynamics, which are only revealed at test time. To address these issues, we propose two distinct strategies for adapting LLM agents by leveraging environment-specific information from interaction that is available during deployment. |
Arthur Chen; Zuxin Liu; Jianguo Zhang; Akshara Prabhakar; Zhiwei Liu; Shelby Heinecke; Silvio Savarese; Victor Zhong; Caiming Xiong; |
| 289 | Evaluating Memory in LLM Agents Via Incremental Multi-Turn Interactions Highlight: In this paper, based on classic theories from memory science and cognitive science, we identify four core competencies essential for memory agents: accurate retrieval, test-time learning, long-range understanding, and selective forgetting. |
Yuanzhe Hu; Yu Wang; Julian McAuley; |
| 290 | ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Highlight: We introduce ThinkMorph, a unified thinking model capable of effective interleaved reasoning. |
Jiawei Gu; Yunzhuo Hao; Huichen Will Wang; Linjie Li; Michael Qizhe Shieh; Yejin Choi; Ranjay Krishna; Yu Cheng; |
| 291 | On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning Via Dynamic Weighting Highlight: We propose CHORD, a framework for Controllable Harmonization of On- and Off-Policy Reinforcement Learning via Dynamic Weighting, which reframes SFT not as a separate stage but as a dynamically weighted auxiliary objective within the on-policy RL process. |
Wenhao Zhang; Yuexiang Xie; Yuchang Sun; Yanxi Chen; Guoyin Wang; Yaliang Li; Bolin Ding; Jingren Zhou; |
| 292 | LightMem: Lightweight and Efficient Memory-Augmented Generation Highlight: To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. |
Jizhan Fang; Xinle Deng; Haoming Xu; Ziyan Jiang; Yuqi Tang; Ziwen Xu; Shumin Deng; Yunzhi Yao; Mengru Wang; Shuofei Qiao; Huajun Chen; Ningyu Zhang; |
| 293 | OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Highlight: In this work, we introduce OmniSpatial, a comprehensive and challenging benchmark for spatial reasoning, grounded in cognitive psychology. |
Mengdi Jia; Zekun Qi; Shaochen Zhang; Wenyao Zhang; XinQiang Yu; Jiawei He; He Wang; Li Yi; |
| 294 | CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts Highlight: While recent rotation-based smoothing techniques alleviate the problem by redistributing outlier magnitudes, residual errors remain and continue to impede reliable low-precision deployment. In this work, we tackle this challenge by introducing a unified quantization-and-clustering scheme for MoE that smooths activation outliers via learnable rotation and absorbs weight outliers into fine-tuned cluster centroids. |
Xiangyang Yin; Xingyu Liu; Tianhua Xia; BO BAO; Vithursan Thangarasa; Valavan Manohararajah; Eric Sather; Sai Qian Zhang; |
| 295 | Data-Centric Lessons To Improve Speech-Language Pretraining Highlight: However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs. |
Vishaal Udandarao; Zhiyun Lu; Xuankai Chang; Yongqiang Wang; Albin Madappally Jose; Fartash Faghri; Joshua P Gardner; Chung-Cheng Chiu; |
| 296 | Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Highlight: Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs are mainly focused on text and image modalities, offering limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address the above challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities including text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks. |
Zhuoran Jin; Hongbang Yuan; Kejian Zhu; Jiachun Li; Pengfei Cao; Yubo Chen; Kang Liu; Jun Zhao; |
| 297 | SWE-RM: Execution-free Feedback for Software Engineering Agents Highlight: In particular, we analyze the impact of various factors such as training data scale, policy mixtures, and data source composition. Guided by these investigations, we introduce SWE-RM, an accurate and robust reward model adopting a mixture-of-experts architecture with 30B total parameters and 3B activated during inference. |
KaShun SHUM; Binyuan Hui; Jiawei Chen; Lei Zhang; X. W.; Jiaxi Yang; Yuzhen Huang; Junyang Lin; Junxian He; |
| 298 | Feed-forward Human Performance Capture Via Progressive Canonical Space Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a feed-forward human performance capture method that renders novel views of a performer from a monocular RGB stream. |
YoungJoong Kwon; Yao He; Hee Jung Choi; Chen Geng; Zhengmao Liu; Jiajun Wu; Ehsan Adeli; |
| 299 | SinkTrack: Attention Sink Based Context Anchoring for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this, we make use of a related, intrinsic characteristic of LLMs: attention sink – the tendency to consistently allocate high attention to the very first token (i.e., ⟨BOS⟩) of a sequence. Concretely, we propose an advanced context anchoring method, SINKTRACK, which treats ⟨BOS⟩ as an information anchor and injects key contextual features (such as those derived from the input image or instruction) into its representation. |
Xu Liu; Guikun Chen; Wenguan Wang; |
| 300 | Go-Browse: Training Web Agents with Structured Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For instance, a web browsing agent may get lost in unfamiliar websites, uncertain what pages must be visited to achieve its goals. To address this, we propose Go-Browse, a method for automatically collecting diverse and realistic web agent data at scale through structured exploration of web environments. |
Apurva Gandhi; Graham Neubig; |
| 301 | Uncertainty-Aware 3D Reconstruction for Dynamic Underwater Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose an Uncertainty-aware Dynamic Field (UDF) that jointly represents underwater structure and view-dependent medium over time. |
Rui Liu; Zhibo Duan; Jianzhe Gao; Yi Yang; Wenguan Wang; |
| 302 | HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While diffusion models have made remarkable progress in image generation, their outputs can still appear unrealistic and lack fine details, especially when using fewer neural function evaluations (NFEs) or lower guidance scales. To address this issue, we propose a novel momentum-based sampling technique, termed history-guided sampling (HiGS), which enhances the quality and efficiency of diffusion sampling by integrating recent model predictions into each inference step. |
Seyedmorteza Sadat; Farnood Salehi; Romann M. Weber; |
| 303 | DuPO: Enabling Reliable Self-Verification Via Dual Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present DuPO, a dual learning-based preference optimization framework that generates annotation-free feedback via the generalized duality. |
Shuaijie She; Yu Bao; Yu Lu; Lu Xu; Tao Li; Wenhao Zhu; Jianbing Zhang; Shujian Huang; Shanbo Cheng; Lu Lu; Yuxuan Wang; |
| 304 | String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce _String Seed of Thought (SSoT)_, a novel prompting method for LLMs that improves _Probabilistic Instruction Following (PIF)_. |
Kou Misaki; Takuya Akiba; |
| 305 | From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce AutoExperiment, a benchmark that evaluates AI agents’ ability to implement and run machine learning experiments based on natural language descriptions in research papers. |
Gyeongwon James Kim; Alex Wilf; Louis-Philippe Morency; Daniel Fried; |
| 306 | Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unfortunately, proving that most changes are net beneficial is impossible in practice. We introduce the Darwin Gödel Machine (DGM), a novel self-improving system that iteratively modifies its own code (thereby also improving its ability to modify its own codebase) and empirically validates each change using coding benchmarks. |
Jenny Zhang; Shengran Hu; Cong Lu; Robert Tjarko Lange; Jeff Clune; |
| 307 | CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we argue that a model’s confidence in its responses can be used to improve the efficiency of test-time scaling. |
Chengsong Huang; Langlin Huang; Jixuan Leng; Jiacheng Liu; Jiaxin Huang; |
| 308 | Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify a key factor that differentiates RL observations: whether the pretrained model already exhibits strong *Model-Task Alignment*, as measured by pass@k accuracy on the evaluated task. |
Haoze Wu; Cheng Wang; Wenshuo Zhao; Junxian He; |
| 309 | Training Large Reasoning Models Efficiently Via Progressive Thought Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches. |
Zeliang Zhang; Xiaodong Liu; Hao Cheng; Hao Sun; Chenliang Xu; Jianfeng Gao; |
| 310 | Adaptive Social Learning Via Mode Policy Optimization for Language Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods either lack explicit reasoning or employ lengthy Chain-of-Thought reasoning uniformly across all scenarios, resulting in excessive token usage and inflexible social behaviors in tasks such as negotiation or collaboration. To address this, we propose an $\textbf{A}$daptive $\textbf{S}$ocial $\textbf{L}$earning ($\textbf{ASL}$) framework in this paper, aiming to improve the adaptive reasoning ability of language agents in dynamic social interactions. |
Minzheng Wang; Yongbin Li; Haobo Wang; Xinghua Zhang; Nan Xu; Bingli Wu; Fei Huang; Haiyang Yu; Wenji Mao; |
| 311 | NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms. |
Run Luo; Xiaobo Xia; Lu Wang; Longze Chen; Renke Shan; Jing Luo; Min Yang; Tat-Seng Chua; |
| 312 | VMoBA: Mixture-of-Block Attention for Video Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces Video Mixture of Block Attention (VMoBA), a novel sparse attention mechanism specifically adapted for VDMs. |
Jianzong Wu; Liang Hou; Haotian Yang; Ye Tian; Pengfei Wan; Di ZHANG; Yunhai Tong; |
| 313 | Embodied Navigation Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a cross-embodiment and cross-task Navigation Foundation Model (NavFoM), trained on eight million navigation samples that encompass quadrupeds, drones, wheeled robots, and vehicles, and spanning diverse tasks such as vision-and-language navigation, object searching, target tracking, and autonomous driving. |
Jiazhao Zhang; Anqi Li; Yunpeng Qi; Minghan Li; Jiahang Liu; Shaoan Wang; Haoran Liu; Gengze Zhou; Yuze Wu; Xingxing LI; Yuxin Fan; Wenjun Li; Zhibo Chen; Fei Gao; Qi Wu; Zhizheng Zhang; He Wang; |
| 314 | Fresh in Memory: Training-order Recency Is Linearly Encoded in Language Model Activations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that language models’ activations linearly encode when information was learned during training. |
Dmitrii Krasheninnikov; Richard E. Turner; David Krueger; |
| 315 | AudioX: A Unified Framework for Anything-to-Audio Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As such, in this work we propose AudioX, a unified framework for anything-to-audio generation that integrates varied multimodal conditions (i.e., text, video, image, and audio signals). To train this unified model, we construct a large-scale, high-quality dataset, IF-caps, comprising over 7 million samples curated through a structured data annotation pipeline. We will release the code, model, and dataset. |
Zeyue Tian; Yizhu Jin; Zhaoyang Liu; Ruibin Yuan; Liumeng Xue; Xu Tan; Qifeng Chen; Wei Xue; Yike Guo; |
| 316 | Financial Fraud Collusion Among Generative AI Agents in Social Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate the risks of collective financial fraud in large-scale multi-agent systems, driven by large language model (LLM) agents. |
Qibing Ren; Zhijie Zheng; Jiaxuan Guo; Junchi Yan; Lizhuang Ma; Jing Shao; |
| 317 | Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, an overlooked yet potentially powerful question is: can one leverage auxiliary $\textit{unpaired}$ multimodal data to directly enhance representation learning in a $\textit{target}$ modality? We introduce $\textbf{UML}$: $\textbf{U}$npaired $\textbf{M}$ultimodal $\textbf{L}$earner, a modality-agnostic training paradigm in which a single model alternately processes inputs from different modalities while sharing parameters across them. |
Sharut Gupta; Shobhita Sundaram; Chenyu Wang; Stefanie Jegelka; Phillip Isola; |
| 318 | Dyna-Mind: Learning to Simulate from Experience for Better AI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by literature on human cognition, we argue that current AI agents need “vicarious trial and error” – the capacity to mentally simulate alternative futures before acting – in order to enhance their understanding and performance in complex interactive environments. We introduce Dyna-Mind, a two-stage training framework that explicitly teaches (V)LM agents to integrate such simulation into their reasoning. |
Xiao Yu; Baolin Peng; Michel Galley; Hao Cheng; Qianhui Wu; Janardhan Kulkarni; Suman Nath; Zhou Yu; Jianfeng Gao; |
| 319 | Self-Rewarding Vision-Language Model Via Reasoning Decomposition and Multi-Reward Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Vision-SR1, a self-rewarding method that improves visual reasoning without relying on external visual supervisions via reinforcement learning and Multi-Reward Policy Optimization. |
Zongxia Li; Wenhao Yu; Chengsong Huang; Rui Liu; Zhenwen Liang; Fuxiao Liu; Jingxi Chen; Dian Yu; Jordan Lee Boyd-Graber; Haitao Mi; Dong Yu; |
| 320 | Temporal Sparse Autoencoders: Leveraging The Sequential Nature of Language for Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We argue that this limitation stems from neglecting the temporal structure of language, where semantic content typically evolves smoothly over sequences. Building on this insight, we introduce Temporal Sparse Autoencoders (T-SAEs), which incorporate a novel contrastive loss encouraging consistent activations of high-level features over adjacent tokens. |
Usha Bhalla; Alex Oesterling; Claudio Mayrink Verdun; Himabindu Lakkaraju; Flavio Calmon; |
| 321 | Only Brains Align with Brains: Cross-Region Patterns Expose Limits of Normative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, recent works have revealed a number of problems with these rankings, such as their sensitivity towards the choice of AM, raising the deeper conceptual question of what it means for a model to be “brain-aligned.” Here, we introduce the notion of *alignment patterns* – characteristic patterns of alignment between brain regions – and posit that models should reproduce these patterns in order to be considered brain-aligned. |
Larissa Höfling; Matthias Tangemann; Lotta Piefke; Susanne Keller; Matthias Bethge; Katrin Franke; |
| 322 | SEMA: Simple Yet Effective Learning for Multi-Turn Jailbreak Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SEMA, a simple yet effective framework that trains a multi-turn attacker without relying on any existing strategies or external data. |
Mingqian Feng; Xiaodong Liu; Weiwei Yang; Jialin Song; Xuekai Zhu; Chenliang Xu; Jianfeng Gao; |
| 323 | SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present SynthWorlds, a framework that disentangles task reasoning complexity from factual knowledge. In SynthWorlds, we construct parallel corpora representing two worlds with identical interconnected structure: a real-mapped world, where models may exploit parametric knowledge, and a synthetic-mapped world, where such knowledge is meaningless. |
Ken Gu; Advait Bhat; Mike A Merrill; Robert West; Xin Liu; Daniel McDuff; Tim Althoff; |
| 324 | SkillFactory: Self-Distillation for Learning Cognitive Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: How can we get models to leverage skills that aren’t exhibited by base models? Our work, SkillFactory, is a method for fine-tuning models to roughly learn these skills during a supervised fine-tuning (SFT) stage prior to RL. |
Zayne Rea Sprague; Jack Lu; Manya Wadhwa; Sedrick Keh; Mengye Ren; Greg Durrett; |
| 325 | TINKER: Diffusion’s Gift to 3D–Multi-View Consistent Editing From Sparse Inputs Without Per-Scene Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce TINKER, a novel framework for high-fidelity 3D editing without any per-scene finetuning, where only a single edited image (one-shot) or a few edited images (few-shot) are required as input. |
Canyu Zhao; Xiaoman Li; Tianjian Feng; Zhiyue Zhao; Hao Chen; Chunhua Shen; |
| 326 | Sparse But Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a systematic empirical study of RLVR’s distributional effects across three complementary axes: (1) token-level distributional shifts, (2) functional validation via cross-sampling interventions, and (3) exploratory investigations of advantage signal modulation based on token divergence. |
Haoming Meng; Kexin Huang; Shaohang Wei; Chiyu Ma; Shuo Yang; Xue Wang; Guoyin Wang; Bolin Ding; Jingren Zhou; |
| 327 | CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we approach the problem from the perspective of reinforcement learning gradient optimization, offering a systematic and theoretical investigation into how to improve the training efficiency of LLMs. |
Yongcheng Zeng; Zexu Sun; Bokai Ji; Erxue Min; Hengyi Cai; Shuaiqiang Wang; Dawei Yin; Haifeng Zhang; Xu Chen; Jun Wang; |
| 328 | Prosperity Before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We revisit this challenge and uncover a \emph{prosperity-before-collapse} phenomenon: stale data can be as informative as on-policy data if exploited properly. Building on this insight, we introduce M2PO (Second-Moment Trust Proxy Optimization), which constrains the second moment of importance weights to suppress only extreme outliers while preserving informative updates. |
Haizhong Zheng; Jiawei Zhao; Beidi Chen; |
| 329 | Enhancing Diffusion-Based Sampling with Molecular Collective Variables Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a repulsive potential centered on the CVs from recent samples, which pushes future samples towards novel CV regions and effectively increases the temperature in the projected space. |
Juno Nam; Bálint Máté; Artur P. Toshev; Manasa Kaniselvan; Rafael Gomez-Bombarelli; Ricky T. Q. Chen; Brandon M. Wood; Guan-Horng Liu; Benjamin Kurt Miller; |
| 330 | RobotArena $\infty$: Unlimited Robot Benchmarking Via Real-to-Sim Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As policies expand in scope and complexity, these barriers only intensify, since defining “success” in robotics often hinges on nuanced human judgments of execution quality. In this paper, we introduce a new benchmarking framework that overcomes these challenges by shifting VLA evaluation into large-scale simulated environments augmented with online human feedback. |
Yash Jangir; Yidi Zhang; Kashu Yamazaki; Chenyu Zhang; Kuan-Hsun Tu; Tsung-Wei Ke; Lei Ke; Yonatan Bisk; Katerina Fragkiadaki; |
| 331 | From Pixels to Words — Towards Native Vision-Language Primitives at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: How can we make research in native VLMs more accessible and democratized, thereby accelerating progress in the field? In this paper, we clarify these challenges and outline guiding principles for constructing native VLMs. |
Haiwen Diao; Mingxuan Li; Silei Wu; Linjun Dai; Xiaohua Wang; Hanming Deng; Lewei Lu; Dahua Lin; Ziwei Liu; |
| 332 | Jackpot: Align Actor-Policy Distribution for Scalable and Stable RL for LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing importance sampling–based corrections for distribution mismatch suffer from an inherent trade-off between stability and training performance. To tackle this problem, we propose Jackpot, which leverages Optimal Budget Rejection Sampling to directly reduce the gap between actor and policy distributions. |
Zhuoming Chen; Hongyi Liu; Yang Zhou; Haizhong Zheng; Beidi Chen; |
| 333 | Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To achieve high parallelization while maintaining generation quality, we introduce two key techniques: (1) Flexible Parallelized Autoregressive Modeling, a novel architecture that enables arbitrary generation ordering and degrees of parallelization. |
Zhuoyang Zhang; Luke J. Huang; Chengyue Wu; Shang Yang; Kelly Peng; Yao Lu; Song Han; |
| 334 | SealQA: Raising The Bar for Reasoning in Search-Augmented Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SealQA, a challenge benchmark for evaluating SEarch-Augmented Language models on fact-seeking questions where web search yields conflicting, noisy, or unhelpful results. |
Thinh Pham; Nguyen Phan Nguyen; Pratibha Zunjare; Weiyuan Chen; Yu-Min Tseng; Tu Vu; |
| 335 | It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building upon these insights, we present Miras, a general framework to design deep learning architectures based on the choice of attentional bias objective, retention gate, associative memory architecture, and memory learning algorithm. Going beyond the $L_2$ loss function, we present a set of alternative attentional bias configurations along with their effective approximations. |
Ali Behrouz; Meisam Razaviyayn; Peilin Zhong; Vahab Mirrokni; |
| 336 | OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by classic influence functions, we repurpose an optimizer-aware estimator that uses gradient information to quantify each synthetic sample’s contribution to the objective of a given target model on specific tasks. |
Zhiting Fan; Ruizhe Chen; Tianxiang Hu; Ru Peng; Zenan Huang; Haokai Xu; Yixin Chen; Jian Wu; Junbo Zhao; Zuozhu Liu; |
| 337 | MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More Than Outcomes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unlike math and code problems which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To do so, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. |
Yu Ying Chiu; Michael S. Lee; Rachel Calcott; Brandon Handoko; Paul de Font-Reaulx; Paula Rodriguez; Chen Bo Calvin Zhang; Ziwen Han; Udari Madhushani Sehwag; Yash Maurya; Christina Q Knight; Harry R. Lloyd; Florence Bacus; Mantas Mazeika; Bing Liu; Yejin Choi; Mitchell L Gordon; Sydney Levine; |
| 338 | SceneCOT: Eliciting Chain-of-Thought Reasoning in 3D Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper bridges the gap by presenting a novel framework. |
Xiongkun Linghu; Jiangyong Huang; Ziyu Zhu; Baoxiong Jia; Siyuan Huang; |
| 339 | Polychromic Objectives for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This convergence hinders exploration, which is essential for expanding the capabilities of the pretrained policy and for amplifying the benefits of test-time compute scaling. To address this, we introduce an objective for policy gradient methods that explicitly enforces the exploration and refinement of diverse generations, which we call a polychromic objective. |
Jubayer Ibn Hamid; Ifdita Hasan Orney; Ellen Xu; Chelsea Finn; Dorsa Sadigh; |
| 340 | Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a theoretical analysis of how a common workaround, training for multiple epochs on the same dataset, reshapes the data scaling laws. |
Tingkai Yan; Haodong Wen; Binghui Li; Kairong Luo; Wenguang Chen; Kaifeng Lyu; |
| 341 | Prompt-MII: Meta-Learning Instruction Induction for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A popular method to adapt large language models (LLMs) to new tasks is in-context learning (ICL), which is effective but incurs high inference costs as context length grows. In this paper we propose a method to perform instruction induction, where we take training examples and reduce them to a compact but descriptive prompt that can achieve performance comparable to ICL over the full training set. |
Emily Xiao; Yixiao Zeng; Ada Chen; Chin-Jou Li; Amanda Bertsch; Graham Neubig; |
| 342 | OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces OneTwoVLA, a single unified vision-language-action model that can perform both acting (System One) and reasoning (System Two). |
Fanqi Lin; Ruiqian Nai; Yingdong Hu; Jiacheng You; Junming Zhao; Yang Gao; |
| 343 | Sparse Attention Adaptation for Long Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SeerAttention-R, a sparse attention framework specifically tailored for the long decoding of reasoning models. |
Yizhao Gao; Shuming Guo; Shijie Cao; Yuqing Xia; Yu Cheng; Lei Wang; Lingxiao Ma; Yutao Sun; Tianzhu Ye; Li Dong; Hayden Kwok-Hay So; Yu Hua; Ting Cao; Fan Yang; Mao Yang; |
| 344 | SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing LLM-based decompilers are still somewhat limited in effectively presenting a program’s source-level structure with its original identifiers. To mitigate this, we introduce SK2Decompile, a novel two-phase approach to decompile from the skeleton (semantic structure) to the skin (identifier) of programs. |
Hanzhuo Tan; Weihao Li; Xiaolong Tian; Siyi Wang; Jiaming Liu; Jing Li; Yuqun Zhang; |
| 345 | TTT3R: 3D Reconstruction As Test-Time Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit the 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. |
Xingyu Chen; Yue Chen; Yuliang Xiu; Andreas Geiger; Anpei Chen; |
| 346 | FALCON: Few-step Accurate Likelihoods for Continuous Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Few-step Accurate Likelihoods for Continuous Flows (FALCON), a method which allows for few-step sampling with a likelihood accurate enough for importance sampling applications by introducing a hybrid training objective that encourages invertibility. |
Danyal Rehman; Tara Akhound-Sadegh; Artem Gazizov; Yoshua Bengio; Alexander Tong; |
| 347 | LatentQA: Teaching LLMs to Decode Activations Into Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A key difficulty in developing such a probe is collecting a dataset mapping activations to natural-language descriptions. In response, we propose an approach for generating a pseudo-labeled dataset of activations and associated question-answer pairs and develop a fine-tuning method for training a decoder LLM on this dataset. |
Alexander Pan; Lijie Chen; Jacob Steinhardt; |
| 348 | GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present GoT-R1, a framework that applies reinforcement learning to enhance semantic-spatial reasoning in autoregressive visual generation models. |
Chengqi Duan; Rongyao Fang; Yuqing Wang; Kun Wang; Linjiang Huang; Xingyu Zeng; Hongsheng Li; Xihui Liu; |
| 349 | MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present an online agentic reinforcement learning framework MOBILERL to enhance GUI agents in mobile environments. |
Yifan Xu; Xiao Liu; Xinghan Liu; Jiaqi Fu; Jiayu Huang; Hanchen Zhang; Bohao Jing; Shudan Zhang; Yuting Wang; Zhao wenyi; Yuxiao Dong; |
| 350 | DaVinci: Reinforcing Visual-Structural Syntax in MLLMs for Generalized Scientific Diagram Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing multimodal LLMs (MLLMs) struggle with the diverse visual primitives, complex structural layouts, and strict syntax involved. To address this, we introduce DaVinci, a novel MLLM that learns diagram parsing based on a two-stage framework—supervised learning of visual primitives followed by reinforcement learning of their structural relationships. |
Xingchen ZENG; Zhewei Su; Hengming Zhang; Juyong Jiang; Jiazhi Xia; Wei Zeng; |
| 351 | S3OD: Towards Generalizable Salient Object Detection with Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a method that dramatically improves generalization through large-scale synthetic data generation and ambiguity-aware architecture. |
Orest Kupyn; Hirokatsu Kataoka; Christian Rupprecht; |
| 352 | Reinforcement Learning for Machine Learning Engineering Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we show that in environments such as MLE where a good verifier is available, adapting the LM parameters through gradient updates can be more effective in utilizing compute and the agent’s experience. |
Sherry Yang; Joy He-Yueya; Percy Liang; |
| 353 | Uncertainty-Aware Gaussian Map for Vision-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: During navigation, existing agents commonly encounter perceptual uncertainty, such as insufficient evidence for reliable grounding or ambiguity in interpreting spatial cues, yet they typically ignore such information when predicting actions. In this work, we explicitly model three forms of perceptual uncertainty (i.e., geometric, semantic, and appearance uncertainty) and integrate them into the agent’s observation space to enable informed decision-making. |
Jianzhe Gao; Rui Liu; Yuxuan Xu; Tongtong Cao; Yingxue Zhang; Zhanguang Zhang; Sida Peng; Yi Yang; Wenguan Wang; |
| 354 | DiffusionBlocks: Block-wise Neural Network Training Via Diffusion Interpretation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose $\textit{DiffusionBlocks}$, a principled framework for transforming transformer-based networks into genuinely independent trainable blocks that maintain competitive performance with end-to-end training. |
Makoto Shing; Masanori Koyama; Takuya Akiba; |
| 355 | VideoNSA: Native Sparse Attention Scales Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We employ a hardware-aware hybrid approach to attention, preserving dense attention for text, while employing NSA for video. |
Enxin Song; Wenhao Chai; Shusheng Yang; Ethan J. Armand; Xiaojun Shan; Haiyang Xu; Jianwen Xie; Zhuowen Tu; |
| 356 | Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we systematically analyze the policy’s generation diversity from the perspective of training problems and find that augmenting and updating training problems helps mitigate entropy collapse during training. |
Xiao Liang; Zhong-Zhi Li; Yeyun Gong; yelong shen; Ying Nian Wu; Zhijiang Guo; Weizhu Chen; |
| 357 | Addressing Divergent Representations from Causal Interventions on Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: First, we demonstrate theoretically and empirically that common causal intervention techniques often do shift internal representations away from the natural distribution of the target model. Then, we provide a theoretical analysis of two cases of such divergences: harmless divergences that occur in the behavioral null-space of the layer(s) of interest, and pernicious divergences that activate hidden network pathways and cause dormant behavioral changes. |
Satchel Grant; Simon Jerome Han; Alexa R. Tartaglini; Christopher Potts; |
| 358 | TSLM: Tree-Structured Language Modeling for Divergent Thinking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Tree-Structured Language Modeling (TSLM), which uses special tokens to encode branching structure, enabling models to generate and selectively expand multiple search paths within a single generation process. |
Doyoung Kim; JaeHyeok Doo; Minjoon Seo; |
| 359 | Spatially Guided Training for Vision-Language-Action Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SP-VLA, a dual-system **V**ision–**L**anguage–**A**ction framework that leverages **S**patial **P**riors as a bridge between linguistic instructions and embodiment-specific control. We will release code, data, and model checkpoints to support future research. |
Jinhui Ye; Fangjing Wang; Ning Gao; Junqiu Yu; Zhu Yangkun; Bin Wang; Jinyu Zhang; Weiyang Jin; Yanwei Fu; Feng Zheng; Yilun Chen; Jiangmiao Pang; |
| 360 | CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence Under Partial Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify three core cognitive challenges hindering this transition: spatial reasoning, long-horizon state tracking via mental simulation, and active exploration under partial observation. To isolate and evaluate these faculties, we introduce \textbf{CubeBench}, a novel generative benchmark centered on the Rubik’s Cube. |
Huan-ang Gao; Zikang Zhang; Tianwei Luo; Kaisen Yang; Xinzhe Juan; Jiahao Qiu; Tianxing Chen; Bingxiang He; Hao Zhao; Hao Zhou; Shilong Liu; Mengdi Wang; |
| 361 | MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite improvements, these static pipelines lack flexibility and adaptability in reasoning. To address this, we propose MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. |
Peng Xia; Jinglu Wang; Yibo Peng; Kaide Zeng; Zihan Dong; Xian Wu; Xiangru Tang; Hongtu Zhu; Yun Li; Linjun Zhang; Shujie LIU; Yan Lu; Huaxiu Yao; |
| 362 | Large Scale Diffusion Distillation Via Score-Regularized Continuous-Time Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our investigation reveals fundamental quality limitations of sCM in fine-detail generation, which we attribute to error accumulation and the “mode-covering” nature of its forward-divergence objective. To remedy this, we propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer. |
Kaiwen Zheng; Yuji Wang; Qianli Ma; Huayu Chen; Jintao Zhang; Yogesh Balaji; Jianfei Chen; Ming-Yu Liu; Jun Zhu; Qinsheng Zhang; |
| 363 | MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present MobiEdit, the first mobile knowledge editing framework that enables efficient LLM personalization on commercial off-the-shelf (COTS) mobile devices. |
Zhenyan Lu; Daliang Xu; Dongqi Cai; Zexi Li; Wei Liu; Jian Luan; Fangming Liu; Shangguang Wang; Mengwei Xu; |
| 364 | Global Resolution: Optimal Multi-Draft Speculative Sampling Via Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Two recent theoretical works have reframed the OTLP in terms of importance sampling or subset selection. In this work, we prove that these formulations are equivalent to an exponentially large relaxed OTLP, so it remains infeasible to solve. |
Rahul Krishna Thomas; Arka Pal; |
| 365 | VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose an end-to-end threat model where Visual Prompt Injection (VPI) manipulates CUAs in black-box settings to perform unauthorized actions or leak sensitive information, capturing the entire attack chain from injection to harmful outcomes. |
Tri Cao; Bennett Lim; Yue Liu; Yuan Sui; Yuexin Li; Shumin Deng; Lin Lu; Nay Oo; Shuicheng YAN; Bryan Hooi; |
| 366 | Pitfalls in Evaluating Language Model Forecasters Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large language models (LLMs) have recently been applied to forecasting tasks, with some works claiming these systems match or exceed human performance. In this paper, we argue that, as a community, we should be careful about such conclusions as evaluating LLM forecasters presents unique challenges. |
Daniel Paleka; Shashwat Goel; Jonas Geiping; Florian Tramèr; |
| 367 | ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This limitation arises from LLM approaches treating autoformalization as a simplistic translation task, which lacks the mechanisms for self-reflection and iterative refinement that human experts naturally employ. To address these issues, we propose ReForm, a Reflective Autoformalization method that tightly integrates semantic consistency evaluation into the autoformalization process. |
Guoxin Chen; Wu Jing; Xinjie Chen; Xin Zhao; Ruihua Song; Chengxi Li; Kai Fan; Dayiheng Liu; Minpeng Liao; |
| 368 | EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we use Apple Vision Pro to collect EgoDex: the largest and most diverse dataset of dexterous human manipulation to date. |
Ryan Hoque; Peide Huang; David J. Yoon; Mouli sivapurapu; Jian Zhang; |
| 369 | Vid2World: Crafting Video Diffusion Models to Interactive World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present _Vid2World_, a general approach for leveraging and transferring pre-trained video diffusion models into interactive world models. |
Siqiao Huang; Jialong Wu; Qixing Zhou; Shangchen Miao; Mingsheng Long; |
| 370 | Should We Still Pretrain Encoders with Masked Language Modeling? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, it remains unclear whether these gains reflect an inherent advantage of the CLM approach or arise from confounding factors such as model and data scale. In this paper, we address this question through a series of large-scale, carefully controlled pretraining ablations, training a total of 38 models ranging from 210 million to 1 billion parameters, and conducting over 15,000 fine-tuning and evaluation runs. |
Hippolyte Gisserot-Boukhlef; Nicolas Boizard; Manuel Faysse; Duarte Miguel Alves; Emmanuel Malherbe; Andre Martins; CELINE HUDELOT; Pierre Colombo; |
| 371 | STEM: Scaling Transformers with Embedding Modules Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce \textbf{STEM} (\emph{Scaling Transformers with Embedding Modules}), a static, token-indexed approach that replaces the FFN up-projection with a layer-local embedding lookup while keeping the gate and down-projection dense. |
Ranajoy Sadhukhan; Sheng Cao; Harry Dong; Changsheng Zhao; Attiano Purpura-Pontoniere; Yuandong Tian; Zechun Liu; Beidi Chen; |
| 372 | Prompt and Parameter Co-Optimization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, prior work has typically studied them in isolation, leaving their synergistic potential largely underexplored. To bridge this gap, in this paper, we introduce MetaTuner, a novel framework that jointly integrates prompt optimization and fine-tuning for LLM training. |
Xiaohe Bo; Rui Li; Zexu Sun; Quanyu Dai; Zeyu Zhang; Zihang Tian; Xu Chen; Zhenhua Dong; |
| 373 | Reasoning on Time-Series for Financial Technical Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Verbal Technical Analysis (VTA), a novel framework that combines verbal and latent reasoning to produce stock time-series forecasts that are both accurate and interpretable. |
Kelvin J.L. Koa; Jan Chen; Yunshan Ma; Zheng Huanhuan; Tat-Seng Chua; |
| 374 | LiveClin: A Live Clinical Benchmark Without Leakage Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The reliability of medical LLM evaluation is critically undermined by data contamination and knowledge obsolescence, leading to inflated scores on static benchmarks. To address these challenges, we introduce LiveClin, a live benchmark designed for the faithful replication of clinical practice. |
Xidong Wang; Guo shuqi; Yue Shen; Junying Chen; Jian Wang; Jinjie Gu; Ping Zhang; Lei Liu; Benyou Wang; |
| 375 | Exploring The Limits of Sub-Billion Language Model Reasoners with Open Training Recipes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit the necessity of scaling to extremely large corpora (>10T tokens) for reasoning emergence. |
Changsheng Zhao; Ernie Chang; Zechun Liu; Chia-Jung Chang; Wei Wen; Chen Lai; Sheng Cao; Yuandong Tian; Raghuraman Krishnamoorthi; Yangyang Shi; Vikas Chandra; |
| 376 | VL-JEPA: Joint Embedding Predictive Architecture for Vision-language Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce VL-JEPA, a vision-language model built on a Joint Embedding Predictive Architecture (JEPA). |
Delong Chen; Mustafa Shukor; Théo Moutakanni; Willy Chung; Lei Yu; Tejaswi Kasarla; Allen Bolourchi; Yann LeCun; Pascale Fung; |
| 377 | ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce \textbf{\textsc{ST-WebAgentBench}}, a configurable and extensible framework designed as a first step toward enterprise-grade evaluation. |
Ido Levy; Ben wiesel; Sami Marreed; Alon Oved; Avi Yaeli; Segev Shlomov; |
| 378 | Priors in Time: Missing Inductive Biases for Language Model Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a novel SAE architecture—Temporal SAE—with a temporal inductive bias that decomposes representations at a given time into two parts: a predictable component, which can be inferred from the context, and a residual component, which captures novel information that cannot be captured by the context. |
Ekdeep Singh Lubana; Sai Sumedh R. Hindupur; Can Rager; Valérie Costa; Oam Patel; Sonia Krishna Murthy; Thomas Fel; Greta Tuckute; Daniel Wurgaft; Eric Bigelow; Demba E. Ba; Melanie Weber; Aaron Mueller; |
| 379 | Entropy-Monitored Kernelized Token Distillation for Audio-Visual Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a method for audio-visual knowledge distillation. |
Hyoungseob Park; Lipeng Ke; Pritish Mohapatra; Huajun Ying; sankar venkataraman; Alex Wong; |
| 380 | ORCaS: Unsupervised Depth Completion Via Occluded Region Completion As Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a method for inferring an egocentric dense depth map from an RGB image and a sparse point cloud. |
Hyoungseob Park; Runjian Chen; Patrick Rim; Dong Lao; Alex Wong; |
| 381 | A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present the first comprehensive security evaluation of A2A-MAS. |
Tianhao Li; Chuangxin Chu; Yujia Zheng; Bohan Zhang; Neil Zhenqiang Gong; Chaowei Xiao; |
| 382 | Breaking The Total Variance Barrier: Sharp Sample Complexity for Linear Heteroscedastic Bandits with Fixed Action Set Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel variance-adaptive algorithm VAEE (Variance-Aware Exploration with Elimination) for large action sets, which actively explores actions that maximize the information gain among a candidate set of actions that are not eliminated. |
Heyang Zhao; Tianyuan Jin; Weixin Wang; Vincent Y. F. Tan; Pan Xu; Quanquan Gu; |
| 383 | Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Upgrading these is often prohibitively expensive, as it requires complete retraining of the vision-language alignment. To address this issue, we introduce Perception-Reasoning Decoupling, which modularizes the MLLM’s reasoning component and makes it easily replaceable. |
Yunhao Gou; Kai Chen; Zhili Liu; Lanqing HONG; Xin Jin; Zhenguo Li; James Kwok; Yu Zhang; |
| 384 | To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In fact, we show that given the right choice of tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length/complexity (i.e., achieve length generalization). |
Eran Malach; Omid Saremi; Sinead Williamson; Arwen Bradley; Aryo Lotfi; Emmanuel Abbe; Joshua M. Susskind; Etai Littwin; |
| 385 | Enhancing Language Model Reasoning with Structured Multi-Level Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This problem is particularly severe for smaller language models (LMs) with long CoTs due to their limited capacity. To address this, we propose Multi-Level Reasoning (MLR), which reformulates long-CoT generation as a two-level stochastic process. |
Siheng Xiong; Ali Payani; Faramarz Fekri; |
| 386 | PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Compared to classical 3D layout generation, producing complex physical scenes introduces additional challenges: (a) higher object density and complexity (e.g., a small shelf may hold dozens of books), (b) richer supporting relationships and compact spatial layouts, and (c) the need to accurately model both spatial placement and physical properties. To address these challenges, we propose PhyScensis, an LLM agent-based framework powered by a physics engine, to produce physically plausible scene configurations with high complexity. |
Yian Wang; Han Yang; Minghao Guo; Xiaowen Qiu; Tsun-Hsuan Wang; Wojciech Matusik; Joshua B. Tenenbaum; Chuang Gan; |
| 387 | Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we investigate RL-based approaches to promote reasoning efficiency. |
Wei Liu; Ruochen Zhou; Yiyun Deng; Yuzhen Huang; Junteng Liu; Yuntian Deng; Yizhe Zhang; Junxian He; |
| 388 | Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we present AnthroBench: a novel empirical method and tool for evaluating anthropomorphic LLM behaviours in realistic settings. |
Lujain Ibrahim; Canfer Akbulut; Rasmi Elasmar; Charvi Rastogi; Minsuk Kahng; Meredith Ringel Morris; Kevin R. McKee; Verena Rieser; Murray Shanahan; Laura Weidinger; |
| 389 | Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Surprisingly, we reveal a new and concerning risk that comes with this practice: the provider of the open-source LLMs can later extract the private downstream fine-tuning data through simple backdoor training, requiring only black-box access to the fine-tuned downstream model. |
Zhexin Zhang; Yuhao Sun; Junxiao Yang; Shiyao Cui; yuanchao zhang; Hongning Wang; Minlie Huang; |
| 390 | Much Ado About Noising: Dispelling The Myths of Generative Robotic Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. |
Chaoyi Pan; Giri Anantharaman; Nai-Chieh Huang; Claire Jin; Daniel Pfrommer; Chenyang Yuan; Frank Permenter; Guannan Qu; Nicholas Matthew Boffi; Guanya Shi; Max Simchowitz; |
| 391 | STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present STream3R, a novel approach to 3D reconstruction that reformulates pointmap prediction as a decoder-only Transformer problem. |
Yushi LAN; Yihang Luo; Fangzhou Hong; Shangchen Zhou; Honghua Chen; Zhaoyang Lyu; Bo Dai; Shuai Yang; Chen Change Loy; Xingang Pan; |
| 392 | The Markovian Thinker Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Delethink, a thinking algorithm that realizes the Markovian Thinking Paradigm. |
Milad Aghajohari; Kamran Chitsaz; Amirhossein Kazemnejad; Sarath Chandar; Alessandro Sordoni; Aaron Courville; Siva Reddy; |
| 393 | PI-Light: Physics-Inspired Diffusion for Full-Image Relighting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing attempts to bridge the synthetic-to-real gap for full-scene relighting remain suboptimal. To tackle these challenges, we introduce **P**hysics-**I**nspired diffusion for full-image re**Light** ($\pi$-Light, or PI-Light), a two-stage framework that leverages physics-inspired diffusion models. |
Zhexin Liang; Zhaoxi Chen; Yongwei Chen; Tianyi Wei; Tengfei Wang; Xingang Pan; |
| 394 | Decoupled DMD: CFG Augmentation As The Spear, Distribution Matching As The Shield Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Among these, Distribution Matching Distillation (DMD) and its variants stand out for their impressive performance, which is widely attributed to their core mechanism of matching the student’s output distribution to that of a pre-trained teacher model. In this work, we challenge this conventional understanding. |
Dongyang Liu; Peng Gao; David Liu; Ruoyi Du; Zhen Li; Qilong Wu; Xin Jin; Sihan Cao; Shifeng Zhang; Steven HOI; Hongsheng Li; |
| 395 | The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing language agent benchmarks often focus on narrow domains or simplified tasks that lack the diversity, realism, and long-horizon complexity required to evaluate agents’ real-world performance. To address this gap, we introduce the Tool Decathlon (dubbed as Toolathlon), a benchmark for language agents offering diverse applications and tools, realistic environment setup, and reliable execution-based evaluation. |
Junlong Li; Wenshuo Zhao; Jian Zhao; Weihao Zeng; Haoze Wu; Xiaochen Wang; Rui Ge; Yuxuan Cao; Yuzhen Huang; Wei Liu; Junteng Liu; Zhaochen Su; Yiyang Guo; Fan Zhou; Lueyang Zhang; Juan Michelini; Xingyao Wang; Xiang Yue; Shuyan Zhou; Graham Neubig; Junxian He; |
| 396 | InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we discard the single-entity assumption and introduce a novel framework that enforces strong, region‑specific binding of conditions from modalities to each identity’s spatiotemporal footprint. |
Zhenzhi Wang; Jiaqi Yang; Jianwen Jiang; Chao Liang; Gaojie Lin; Zerong Zheng; Ceyuan Yang; Yuan Zhang; Mingyuan Gao; Dahua Lin; |
| 397 | ARMOR: Aligning Secure and Safe Large Language Models Via Meticulous Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose ARMOR, which introduces a structured three-step reasoning pipeline: (1) analyze jailbreak strategies from an external, updatable strategy library, (2) extract the core intent, and (3) apply policy-based safety verification. |
Zhengyue Zhao; Yingzi Ma; Somesh Jha; Marco Pavone; Patrick McDaniel; Chaowei Xiao; |
| 398 | VisionReasoner: Unified Reasoning-Integrated Visual Perception Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce VisionReasoner, a unified framework capable of reasoning and solving multiple visual perception tasks within a shared model. |
Yuqi Liu; Tianyuan Qu; Zhisheng Zhong; Bohao PENG; Shu Liu; Bei Yu; Jiaya Jia; |
| 399 | Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In policy optimization algorithms like GRPO, this collapses the advantage calculation to zero, rendering these difficult problems invisible to the learning gradient and stalling progress. To overcome this, we introduce Scaf-GRPO (Scaffolded Group Relative Policy Optimization), a progressive training framework that strategically provides minimal guidance only when a model’s independent learning has plateaued. |
Xichen Zhang; Sitong Wu; Yinghao Zhu; Haoru Tan; Shaozuo Yu; Ziyi He; Jiaya Jia; |
| 400 | EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the challenges posed by our benchmark, we present EvoTest, an evolutionary test-time learning framework that improves an agent without any fine-tuning or gradients—by evolving the entire agentic system after every episode. |
Yufei He; Juncheng Liu; Yue Liu; Yibo Li; Tri Cao; Zhiyuan Hu; Xinxing Xu; Bryan Hooi; |
| 401 | Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this landscape, structure-based de novo binder design is most often cast as either conditional generative modeling or sequence optimization via structure predictors (hallucination). We argue that this is a false dichotomy and propose Complexa, a novel fully atomistic binder generation method unifying both paradigms. |
Kieran Didi; Zuobai Zhang; Guoqing Zhou; Danny Reidenbach; Zhonglin Cao; Sooyoung Cha; Tomas Geffner; Christian Dallago; Jian Tang; Michael M. Bronstein; Martin Steinegger; Emine Kucukbenli; Arash Vahdat; Karsten Kreis; |
| 402 | EditLens: Quantifying The Extent of AI Editing in Text Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: First, we propose using lightweight similarity metrics to quantify the magnitude of AI editing present in a text given the original human-written text and validate these metrics with human annotators. Using these similarity metrics as intermediate supervision, we then train EditLens, a regression model that predicts the amount of AI editing present within a text. |
Katherine Thai; Bradley Emi; Elyas Masrour; Mohit Iyyer; |
| 403 | Cautious Optimizers: Improving Training with One Line of Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose a \textbf{single-line modification in Pytorch} to any momentum-based optimizer, which we rename cautious optimizer, e.g. C-AdamW and C-Lion. |
Kaizhao Liang; Lizhang Chen; Bo Liu; qiang liu; |
| 404 | LogicReward: Incentivizing LLM Reasoning Via Step-Wise Logical Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prior work introduces supervision on intermediate steps but still lacks guarantees of logical soundness, which is crucial in high-stakes scenarios where logical consistency is paramount. To address this, we propose LogicReward, a novel reward system that guides model training by enforcing step-level logical correctness with a theorem prover. |
Jundong Xu; Hao Fei; Huichi Zhou; Xin Quan; Qijun Huang; Shengqiong Wu; William Yang Wang; Mong-Li Lee; Wynne Hsu; |
| 405 | DispViT: Direct Stereo Disparity Regression with A Single-Stream Vision Transformer Highlight: This paper introduces **DispViT**, a new architecture that establishes a **regression-centric paradigm**. |
Tongfan Guan; Jiaxin Guo; Tianyu Huang; Jinhu Dong; Chen Wang; Yun-Hui Liu; |
| 406 | Why Is Your Language Model A Poor Implicit Reward Model? Highlight: Implicit and explicit reward models can be trained using the same data, loss function, and language model, and differ only in how the reward is computed. Toward a fundamental understanding of the implicit biases underlying different reward model types, we investigate the root cause of this gap. |
Noam Razin; Yong Lin; Jiarui Yao; Sanjeev Arora; |
| 407 | Breaking Barriers: Do Reinforcement Fine-tuning Gains Transfer To Unseen Domains? Highlight: To understand the generalizability of RPT, we conduct two studies. (1) Observational: we compare a wide range of open-weight RPT models against their corresponding base models across multiple domains, including both seen and unseen domains in their fine-tuning data. (2) Interventional: we fine-tune LLMs with RPT on single domains and evaluate their performance across multiple domains. |
Chuxuan Hu; Yuxuan Zhu; Antony Kellermann; Caleb Biddulph; Suppakit Waiwitlikhit; Jason Benn; Daniel Kang; |
| 408 | AlphaFlow: Understanding and Improving MeanFlow Models Highlight: In this work, we show that the MeanFlow objective naturally decomposes into two parts: trajectory flow matching and trajectory consistency. |
Huijie Zhang; Aliaksandr Siarohin; Willi Menapace; Michael Vasilkovsky; Sergey Tulyakov; Qing Qu; Ivan Skorokhodov; |
| 409 | Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Highlight: In this paper, we propose a simple yet effective approach to mitigate quality degradation in long-horizon video generation without requiring supervision from long-video teachers or retraining on long video datasets. |
Justin Cui; Jie Wu; Ming Li; Tao Yang; Xiaojie Li; Rui Wang; Andrew Bai; Yuanhao Ban; Cho-Jui Hsieh; |
| 410 | Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time Highlight: We study in-context learning (ICL) of linear regression in a deep linear self-attention model, characterizing how performance depends on various computational and statistical resources (width, depth, number of training steps, batch size and data per context). |
Blake Bordelon; Mary Letey; Cengiz Pehlevan; |
| 411 | ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning Highlight: Real-world travel planning exemplifies these challenges, evaluating agents’ abilities to handle constraints that are explicit, implicit, and even evolving based on interactions with dynamic environments and user needs. In this paper, we present ATLAS, a general multi-agent framework designed to effectively handle the complex nature of constraint awareness in real-world travel planning tasks. |
Jihye Choi; Jinsung Yoon; Jiefeng Chen; Somesh Jha; Tomas Pfister; |
| 412 | PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection Highlight: Cross-modal alignment models fail to learn stable correspondences with scarce training data, and memory-based approaches misclassify any unseen normal variation as anomalous. To address the few-shot challenge, we introduce PIRN (Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection). |
YITING LI; Xulei Yang; Jing Zhang; Sichao Tian; Jingyi Liao; Fayao Liu; |
| 413 | Universal and Efficient Loading Balancing for RL Training of Large Multimodal Models Highlight: Existing systems either focus on text-only RL or employ general load-balancing techniques that are incompatible with the small-batch, iterative nature of RL training. To address these challenges, we present FlexRL, a holistic system designed to optimize the end-to-end VLM RL pipeline. |
Zerui Wang; Qinghao Hu; Jiecheng Zhou; Chang Chen; Haojie Duanmu; Xingcheng Zhang; Peng Sun; Dahua Lin; |
| 414 | RankLLM: Weighted Ranking of LLMs By Quantifying Question Difficulty Highlight: However, existing benchmarks fail to differentiate question difficulty, limiting their ability to effectively distinguish models’ capabilities. To address this limitation, we propose RankLLM, a novel framework designed to quantify both question difficulty and model competency. |
Xingjian Hu; Ziqian Zhang; Yue Huang; Kai Zhang; Ruoxi Chen; Yixin Liu; Qingsong Wen; Kaidi Xu; Xiangliang Zhang; Neil Zhenqiang Gong; Lichao Sun; |
| 415 | Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training Highlight: We introduce **Generate Any Scene**, a data engine that systematically enumerates scene graphs representing the combinatorial array of possible visual scenes. |
Ziqi Gao; Weikai Huang; Jieyu Zhang; Aniruddha Kembhavi; Ranjay Krishna; |
| 416 | Self-Speculative Masked Diffusions Highlight: We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. |
Andrew Campbell; Valentin De Bortoli; Jiaxin Shi; Arnaud Doucet; |
| 417 | Unified In-Context Video Editing Highlight: In this paper, we introduce UNified In-Context Video Editing (UNIC), a simple yet effective framework that unifies diverse video editing tasks within a single model in an in-context manner. To validate our method, we construct a unified video editing benchmark containing six representative video editing tasks. |
Zixuan Ye; Xuanhua He; Quande Liu; Qiulin Wang; Xintao Wang; Pengfei Wan; Di ZHANG; Kun Gai; Qifeng Chen; Wenhan Luo; |
| 418 | Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking Highlight: To achieve real synergy, we introduce the interleaved Analyzing–Drafting problem-solving loop (AD-Loop), a new thinking paradigm that dynamically alternates between analytic and drafting operations. |
Shengqiong Wu; Bobo Li; Xinkai Wang; Xiangtai Li; Lei Cui; Furu Wei; Shuicheng YAN; Hao Fei; Tat-Seng Chua; |
| 419 | One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning Highlight: While conventional multi-task world models like UniZero excel in single-task settings, we find that when handling a broad and diverse suite of tasks, gradient conflicts and the loss of model plasticity often constrain their sample efficiency. In this work, we address these challenges from two complementary perspectives: the single learning iteration and the overall learning process. |
Yuan Pu; Yazhe Niu; Jia Tang; Junyu Xiong; Shuai Hu; Hongsheng Li; |
| 420 | Improving Code Localization with Repository Memory Highlight: In contrast, human developers naturally build long-term repository memory, such as the functionality of key modules and associations between various bug types and their likely fix locations. In this work, we augment language agents with such memory by leveraging a repository’s *commit history* – a rich yet underutilized resource that chronicles the codebase’s evolution. |
Boshi Wang; Weijian Xu; Yunsheng Li; Xuemei Gao; Yujia Xie; Huan Sun; Dongdong Chen; |
| 421 | Multiple-Prediction-Powered Inference Highlight: Such evaluation often involves tradeoffs between expensive, high-quality measurements and a variety of lower-quality proxies. We introduce Multiple-Prediction-Powered Inference (MultiPPI), a general framework for constructing statistically efficient estimates by optimally allocating resources across these diverse data sources. |
Charlie Cowen-Breen; Alekh Agarwal; Stephen Bates; William W. Cohen; Jacob Eisenstein; Amir Globerson; Adam Fisch; |
| 422 | DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas Highlight: This constraint severely degrades code infilling performance when the predefined mask size mismatches the ideal completion length. To address this, we propose DreamOn, a novel diffusion framework that enables dynamic, variable-length generation. |
Zirui Wu; Lin Zheng; Zhihui Xie; Jiacheng Ye; Jiahui Gao; Shansan Gong; Yansong Feng; Zhenguo Li; Wei Bi; Guorui Zhou; Lingpeng Kong; |
| 423 | Addressing Pitfalls in The Evaluation of Uncertainty Estimation Methods for Natural Language Generation Highlight: We propose using several alternative risk indicators for risk correlation experiments that improve the robustness of empirical assessment of UE algorithms for NLG. |
Mykyta Ielanskyi; Kajetan Schweighofer; Lukas Aichberger; Sepp Hochreiter; |
| 424 | Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization Highlight: Human cognitive behavior arises from the interaction of specialized brain networks dedicated to distinct functions, such as language, logic, and social reasoning. Inspired by this organization, we propose Mixture of Cognitive Reasoners (MiCRo): a modular, transformer-based architecture post-trained with a curriculum that induces functional specialization across experts. |
Badr AlKhamissi; C. Nicolò De Sabbata; Greta Tuckute; Zeming Chen; Martin Schrimpf; Antoine Bosselut; |
| 425 | Aurora: Towards Universal Generative Multimodal Time Series Forecasting Highlight: In this work, we introduce Aurora, a Multimodal Time Series Foundation Model, which supports multimodal inputs and zero-shot inference. |
Xingjian Wu; Jianxin Jin; Wanghui Qiu; Peng Chen; Yang Shu; Bin Yang; Chenjuan Guo; |
| 426 | Enhancing Molecular Property Predictions By Learning from Bond Modelling and Interactions Highlight: This oversight limits their predictive accuracy for nuanced chemical behaviors. To address this limitation, we introduce **DeMol**, a dual-graph framework whose architecture is motivated by a rigorous information-theoretic analysis demonstrating the information gain from a bond-centric perspective. |
Yunqing LIU; Yi Zhou; Wenqi Fan; |
| 427 | Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification Highlight: We analyze this regime and show that cross-model semantic disagreement is higher on incorrect answers precisely when AU is low. Motivated by this, we introduce an epistemic uncertainty (EU) term that operates in the black-box access setting: EU uses only generated text from a small, scale-matched ensemble and is computed as the gap between inter-model and intra-model sequence-semantic similarity. |
Kimia Hamidieh; Veronika Thost; Walter Gerych; Mikhail Yurochkin; Marzyeh Ghassemi; |
| 428 | Towards Efficient Constraint Handling in Neural Solvers for Routing Problems Highlight: In this paper, we present Construct-and-Refine (CaR), the first general and efficient constraint-handling framework for neural routing solvers based on explicit learning-based feasibility refinement. |
Jieyi Bi; Zhiguang Cao; Jianan Zhou; Wen Song; Yaoxin Wu; Jie Zhang; Yining Ma; Cathy Wu; |
| 429 | Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception Highlight: In this work, we present a systematic and comprehensive investigation of omni detailed perception from the perspectives of the data pipeline, models, and benchmark. |
Ziyang Ma; Ruiyang Xu; Zhenghao Xing; Yunfei Chu; Yuxuan Wang; Jinzheng He; Jin Xu; Pheng-Ann Heng; Kai Yu; Junyang Lin; Eng Siong Chng; Xie Chen; |
| 430 | Leveraging Discrete Function Decomposability for Scientific Design Highlight: Herein, we propose and demonstrate the use of a new distributional optimization algorithm, Decomposition-Aware Distributional Optimization (DADO), that can leverage any decomposability defined by a junction tree on the design variables. |
James C Bowden; Sergey Levine; Jennifer Listgarten; |
| 431 | MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Highlight: In this study, we introduce MotionSight, a novel zero-shot method pioneering object-centric visual spotlight and motion blur as visual prompts to effectively improve fine-grained motion understanding without training. In summary, we present a novel zero-shot method and a large-scale, high-quality dataset specifically for fine-grained motion understanding. |
Yipeng Du; Tiehan Fan; Kepan Nan; Rui Xie; Penghao Zhou; Xiang Li; Jian Yang; Zhenheng Yang; Ying Tai; |
| 432 | From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning Highlight: Analyzing embeddings from 40+ LLMs against classic human categorization benchmarks, we uncover three key findings. First, LLMs broadly align with human categories but miss fine-grained semantic distinctions crucial for human understanding. Second, LLMs demonstrate aggressive statistical compression, achieving “optimal” information-theoretic efficiency, while humans prioritize contextual richness and adaptive flexibility. Third, encoder models surprisingly outperform decoder models in human alignment, suggesting that generation and understanding rely on distinct mechanisms in current architectures. |
Chen Shani; Liron Soffer; Dan Jurafsky; Yann LeCun; Ravid Shwartz-Ziv; |
| 433 | AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint Highlight: To better address the trade-off between safety and utility, we present a theoretically grounded and empirically effective activation steering method called AlphaSteer. |
Leheng Sheng; Changshuo Shen; Weixiang Zhao; Junfeng Fang; Xiaohao Liu; Zhenkai Liang; Xiang Wang; An Zhang; Tat-Seng Chua; |
| 434 | BAR: Refactor The Basis of Autoregressive Visual Generation Highlight: This work proposes Basis Autoregressive (BAR), a novel paradigm that conceptualizes tokens as basis vectors within the image space and employs an end-to-end learnable approach to transform the basis. |
Zhicong Tang; Dong Chen; Jianmin Bao; Baining Guo; |
| 435 | Parallel Multimodal Diffusion Language Models for Thinking-Aware Editing and Generation Highlight: Our analysis using ParaBench reveals that this performance degradation is strongly correlated with poor alignment between the generated reasoning and the final image. To resolve this, we propose a parallel multimodal diffusion framework that enables continuous, bidirectional interaction between text and images throughout the entire denoising trajectory. |
Ye Tian; Ling Yang; JiongFan Yang; Anran Wang; Yu Tian; Jiani zheng; Haochen Wang; Zhiyang Teng; Zhuochen Wang; Yinjie Wang; Yunhai Tong; Mengdi Wang; Xiangtai Li; |
| 436 | Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations Highlight: We introduce **STARE (Spatial Transformations and Reasoning Evaluation)**, a benchmark designed to rigorously evaluate multimodal large language models on tasks better solved through multi-step visual simulation. |
Linjie Li; Mahtab Bigverdi; Jiawei Gu; Zixian Ma; Yinuo Yang; Ziang Li; Yejin Choi; Ranjay Krishna; |
| 437 | STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting Highlight: Capturing such cross-scale interactions within a unified framework remains an open problem. To address this gap, we propose **STORM**, a spatio-temporal model that disentangles atmospheric variations into multiple scales to uncover scale-specific dependencies. |
Qihe Huang; Zhengyang Zhou; Yangze Li; Jiaming Ma; Kuo Yang; Binwu Wang; Xu Wang; Yang Wang; |
| 438 | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Highlight: A key challenge in aligning TTA models lies in creating preference pairs, as TTA lacks structured mechanisms like verifiable rewards or gold-standard answers available for Large Language Models (LLMs). To address this, we propose CLAP-Ranked Preference Optimization (CRPO), a novel framework that iteratively generates and optimizes preference data to enhance TTA alignment. |
Chia-Yu Hung; Navonil Majumder; Zhifeng Kong; Ambuj Mehrish; Amir Zadeh; Chuan Li; Rafael Valle; Bryan Catanzaro; Soujanya Poria; |
| 439 | Attention Sinks and Compression Valleys in LLMs Are Two Sides of The Same Coin Highlight: In this work, we present a surprising connection between attention sinks and compression valleys, tracing both to the formation of massive activations in the residual stream. |
Enrique Queipo-de-Llano; Alvaro Arroyo; Federico Barbero; Xiaowen Dong; Michael M. Bronstein; Yann LeCun; Ravid Shwartz-Ziv; |
| 440 | DParallel: Learnable Parallel Decoding for DLLMs Highlight: Yet, their parallel decoding potential remains largely underexplored, as existing open-source models still require nearly token-length decoding steps to ensure performance. To address this, we introduce dParallel, a simple and effective method that unlocks the inherent parallelism of dLLMs for fast sampling. |
Zigeng Chen; Gongfan Fang; Xinyin Ma; Ruonan Yu; Xinchao Wang; |
| 441 | All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning Highlight: Of the hypotheses considered, we find the most support for the explanation that on problems with a *generation-verification gap*, *(1)* it is relatively easy to learn the relatively simple RM (*verifier*) from the preference data. |
Gokul Swamy; Sanjiban Choudhury; Wen Sun; Steven Wu; Drew Bagnell; |
| 442 | Hystar: Hypernetwork-driven Style-adaptive Retrieval Via Dynamic SVD Modulation Highlight: In this paper, we propose Hypernetwork-driven Style-adaptive Retrieval (Hystar), a lightweight framework that dynamically adapts model weights to each query’s style. |
Yujia Cai; Boxuan Li; Chenghao Xu; Jiexi Yan; |
| 443 | Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Highlight: This limitation arises from entangled features in overlapping regions, leading to degraded visual fidelity. To address this, we present RoboMaster, a novel framework that models inter-object dynamics via a collaborative trajectory formulation. |
Xiao Fu; Xintao Wang; Xian Liu; Jianhong Bai; Runsen Xu; Pengfei Wan; Di ZHANG; Dahua Lin; |
| 444 | Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction Highlight: To this end, we propose Pixel3DMM, a set of highly-generalized vision transformers which predict per-pixel geometric cues in order to constrain the optimization of a 3D morphable face model (3DMM). To evaluate our method, we introduce a new benchmark for single-image face reconstruction, which features high diversity in facial expressions, viewing angles, and ethnicities. |
Simon Giebenhain; Tobias Kirschstein; Martin Rünz; Lourdes Agapito; Matthias Nießner; |
| 445 | Pretraining Scaling Laws for Generative Evaluations of Language Models Highlight: We propose and evaluate three different pretraining scaling laws for fitting pass-at-$k$ on generative evaluations and for predicting pass-at-$k$ of the most expensive model using cheaper models (the standard pass-at-$k$ estimator is sketched after the table). |
Rylan Schaeffer; Noam Itzhak Levi; Brando Miranda; Sanmi Koyejo; |
| 446 | Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling Highlight: Combining the advantages of majority voting and BoN, we propose a new inference strategy called Best-of-Majority (BoM), with a pivotal step that restricts the candidates to the responses with high frequency in the $N$ samples before selecting the top-$k$ rewards (see the sketch after the table). |
Qiwei Di; Kaixuan Ji; Xuheng Li; Heyang Zhao; Quanquan Gu; |
| 447 | Flow Caching for Autoregressive Video Generation Highlight: In this paper, we present FlowCache, the first caching framework specifically designed for autoregressive video generation. |
Yuexiao Ma; Xuzhe Zheng; Jing Xu; Xiwei Xu; Feng Ling; Xiawu Zheng; Huafeng Kuang; Huixia Li; XING WANG; Xuefeng Xiao; Fei Chao; Rongrong Ji; |
| 448 | DrVoice: Parallel Speech-Text Voice Conversation Model Via Dual-Resolution Speech Representations Highlight: This paper presents DrVoice, a parallel speech-text voice conversation model based on joint autoregressive modeling, featuring dual-resolution speech representations. |
Chao-Hong Tan; Qian Chen; Wen Wang; Chong Deng; Qinglin Zhang; Luyao Cheng; Hai Yu; Xin Zhang; Xiang Lyu; Tianyu Zhao; Chong Zhang; Yukun Ma; Yafeng Chen; Hui Wang; Jiaqing Liu; Xiangang Li; Jieping Ye; |
| 449 | Towards A Sharp Analysis of Learning Offline $f$-Divergence-Regularized Contextual Bandits Highlight: In this paper, we study the exact concentrability requirements to achieve the $\tilde{\Theta}(\epsilon^{-1})$ sample complexity for offline $f$-divergence-regularized contextual bandits. |
Qingyue Zhao; Kaixuan Ji; Heyang Zhao; Tong Zhang; Quanquan Gu; |
| 450 | Demystifying Supervision Data Generalization in Multimodal LMs Highlight: In this paper, we take the first step to study the problem in MLLMs: can we predict training data’s influence on a target benchmark even before any training takes place? |
Xuan Qi; Luxi He; Dan Roth; Xingyu Fu; |
| 451 | Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models Highlight: In this work, we introduce “Task Tokens” – a method to effectively tailor BFMs to specific tasks while preserving their flexibility. |
Ron Vainshtein; Zohar Rimon; Shie Mannor; Chen Tessler; |
| 452 | Enhancing Visual Token Representations for Video Large Language Models Via Training-free Spatial-Temporal Pooling and Gridding Highlight: Existing methods, such as the LLaVA family, utilize simplistic pooling or interpolation techniques that overlook the intricate dynamics of visual tokens. To bridge this gap, we propose ST-GridPool, a novel training-free visual token enhancement method designed specifically for Video LLMs. |
Bingjun Luo; Tony Wang; Hanqi Chen; Xinpeng Ding; |
| 453 | ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs Highlight: However, these approaches largely overlook a critical dimension of video content, i.e., changes and turning points, and they lack a collaborative model for spatio-temporal relationships. To address this, we propose a new perspective: similarity is for identifying redundancy, while difference is for capturing key events. |
Bingjun Luo; Tony Wang; Chaoqi Chen; Xinpeng Ding; |
| 454 | Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas Highlight: We create LitmusValues, an evaluation pipeline to reveal AI models’ priorities on a range of AI value classes. By measuring an AI model’s value prioritization using its aggregate choices, we obtain a self-consistent set of predicted value priorities that uncover potential risks. |
Yu Ying Chiu; Zhilin Wang; Sharan Maiya; Yejin Choi; Kyle Fish; Sydney Levine; Evan J Hubinger; |
| 455 | Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs Highlight: In this work, we propose DualEdit, a dual-objective model editing framework that jointly promotes affirmative outputs and suppresses refusal responses. |
Houcheng Jiang; Zetong Zhao; Junfeng Fang; Haokai Ma; Ruipeng Wang; Yang Deng; Xiang Wang; Xiangnan He; |
| 456 | DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Highlight: To this end, we introduce DeepMath-103K, a large-scale mathematical dataset designed with high difficulty (primarily levels 5-9), rigorous decontamination against numerous benchmarks, and verifiable answers for rule-based RL reward. |
Zhiwei He; Tian Liang; Jiahao Xu; Qiuzhi Liu; Xingyu Chen; Yue Wang; Linfeng Song; Dian Yu; Zhenwen Liang; Wenxuan Wang; Zhuosheng Zhang; Rui Wang; Zhaopeng Tu; Haitao Mi; Dong Yu; |
| 457 | Don’t Settle Too Early: Self-Reflective Remasking for Diffusion Language Models Highlight: In this paper, we propose Remasking-enabled Diffusion Language Model (RemeDi), a mask-based DLM that introduces remasking as another fundamental mechanism, enabling more flexible text refinement in diffusion-based text generation. |
Zemin Huang; Yuhang Wang; Zhiyang Chen; Guo-Jun Qi; |
| 458 | LLM Pretraining with Continuous Concepts Highlight: We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. |
Jihoon Tack; Jack Lanchantin; Jane Yu; Andrew Cohen; Ilia Kulikov; Janice Lan; Shibo Hao; Yuandong Tian; Jason E Weston; Xian Li; |
| 459 | Variance-Dependent Regret Lower Bounds for Contextual Bandits Highlight: In this paper, to overcome the limitations of Jia et al. (2024), we consider the general variance sequence under two settings. |
Jiafan He; Quanquan Gu; |
| 460 | Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models Highlight: We argue that a key bottleneck in learning sampling-consistent predictive diffusion models lies in suboptimal predictive ability, which we attribute to the entanglement of condition understanding and target denoising within shared architectures and co-training schemes. To address this, we propose **Foresight Diffusion (ForeDiff)**, a framework for predictive diffusion models that improves sampling consistency by decoupling condition understanding from target denoising. |
Yu Zhang; Xingzhuo Guo; Haoran Xu; Jialong Wu; Mingsheng Long; |
| 461 | MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites Highlight: To bridge the gap, this paper proposes CapFlow, a novel multi-agent collaboration workflow. |
Zhenxin Lei; Zhangwei Gao; Changyao Tian; Erfei Cui; Guanzhou Chen; Danni Yang; Yuchen Duan; Zhaokai Wang; Wenhao Li; Weiyun Wang; Xiangyu Zhao; Jiayi Ji; Yu Qiao; Wenhai Wang; Gen Luo; |
| 462 | Selective Data Removal for Distributional Machine Unlearning Highlight: We propose a distance-based selection algorithm and show it is quadratically more sample-efficient than random removal in the challenging low-divergence regime. |
Youssef Allouah; Rachid Guerraoui; Sanmi Koyejo; |
| 463 | Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Highlight: We present Puffin, a unified camera-centric multimodal model that extends spatial awareness along the camera dimension. We will release the code, models, dataset pipeline, and benchmark to advance multimodal spatial intelligence research. |
Kang Liao; Size Wu; Zhonghua Wu; Linyi Jin; Chao Wang; Yikai Wang; Fei Wang; Wei Li; Chen Change Loy; |
| 464 | StoryAlign: Evaluating and Training Reward Models for Story Generation Highlight: In this work, we systematically evaluate the modeling of human story preferences and introduce StoryRMB, the first benchmark for assessing reward models on story preferences. We will release our dataset, model, and code to facilitate future research. |
Haotian Xia; Hao Peng; Yunjia Qi; Bin Xu; Lei Hou; Juanzi Li; |
| 465 | MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks Via MCP Servers Highlight: We propose a multi-faceted evaluation framework covering tool-level schema understanding and usage, trajectory-level planning, and task completion. |
Zhenting Wang; Qi Chang; Hemani Patel; Shashank Biju; Cheng-En Wu; Quan Liu; Aolin Ding; Alireza Rezazadeh; Ankit Shah; Yujia Bao; Eugene Siow; |
| 466 | Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards Highlight: However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. |
Shirley Wu; Parth Sarthi; Shiyu Zhao; Aaron Lee; Herumb Shandilya; Adrian Mladenic Grobelnik; Nurendra Choudhary; Edward W Huang; Karthik Subbian; Linjun Zhang; Diyi Yang; James Zou; Jure Leskovec; |
| 467 | RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer Highlight: Alternatively, dynamic neural networks offer per-image adaptive acceleration, but their high fine-tuning costs limit broader applicability. To address these limitations, we introduce RAPID$^3$ (Tri-Level Reinforced Acceleration Policies for Diffusion Transformer), a framework that delivers image-wise acceleration with zero updates to the base generator. |
Wangbo Zhao; Yizeng Han; Zhiwei Tang; Jiasheng Tang; Pengfei Zhou; Kai Wang; Bohan Zhuang; Zhangyang Wang; Fan Wang; Yang You; |
| 468 | CLIP Behaves Like A Bag-of-Words Model Cross-modally But Not Uni-modally Highlight: In particular, CLIP struggles to correctly bind attributes to their corresponding objects when multiple objects are present in an image or text. In this work, we investigate why CLIP exhibits this BoW-like behavior. |
Darina Koishigarina; Arnas Uselis; Seong Joon Oh; |
| 469 | Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction Highlight: In this paper, we propose a novel multimodal time series anomaly detection model (MindTS) that focuses on addressing two key challenges: (1) how to achieve semantically consistent alignment across heterogeneous multimodal data, and (2) how to filter out redundant modality information to enhance cross-modal interaction effectively. |
Shiyan Hu; Jianxin Jin; Yang Shu; Peng Chen; Bin Yang; Chenjuan Guo; |
| 470 | Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems Highlight: We introduce Agora, a framework that reframes coordination as a decentralized market for uncertainty. |
Jusheng Zhang; Yijia Fan; Kaitong Cai; Jing Yang; Jiawei Yao; Jian Wang; Guanlong Qu; Ziliang Chen; Keze Wang; |
| 471 | Quantile Advantage Estimation for Entropy-Safe Reasoning Highlight: We propose Quantile Advantage Estimation (QAE), replacing the mean with a group-wise $K$-quantile baseline (see the sketch after the table). |
Junkang Wu; Kexin Huang; Jiancan Wu; An Zhang; Xiang Wang; Xiangnan He; |
| 472 | LS-Merge: Merging Language Models in Latent Space Highlight: To align heterogeneous models, we introduce a dimensionality-matching projection that allows interpolation between models of different sizes. |
Bedionita Soro; Aoxuan Silvia Zhang; Bruno Andreis; Jaehyeong Jo; Song Chong; Sung Ju Hwang; |
| 473 | PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning Highlight: We introduce PRISM-Physics, a process-level evaluation framework and benchmark for complex physics reasoning problems. |
Wanjia Zhao; Qinwei Ma; Jingzhe Shi; Shirley Wu; Jiaqi Han; Yijia Xiao; Si-Yuan Chen; Xiao Luo; Ludwig Schmidt; James Zou; |
| 474 | ShapeGen4D: Towards High Quality 4D Shape Generation from Videos Highlight: In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. |
Jiraphon Yenphraphai; Ashkan Mirzaei; Jianqi Chen; Jiaxu Zou; Sergey Tulyakov; Raymond A. Yeh; Peter Wonka; Chaoyang Wang; |
| 475 | Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Highlight: Whether VLMs can effectively leverage such multi-view inputs for robotic reasoning therefore remains an open question. To bridge this gap, we introduce MV-RoboBench, a benchmark specifically designed to evaluate the multi-view spatial reasoning capabilities of VLMs in robotic manipulation. |
ZhiYuan Feng; Zhaolu Kang; Qijie Wang; Zhiying Du; Jiongrui Yan; Shi Shubin; Chengbo Yuan; Huizhi Liang; Yu Deng; Qixiu Li; Rushuai Yang; Ruichuan An; Leqi Zheng; Weijie Wang; Shawn Chen; Sicheng Xu; Yaobo Liang; Jiaolong Yang; Baining Guo; |
| 476 | Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance Highlight: Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and the actual noise level encoded in intermediate states during sampling. |
Jincheng Zhong; Boyuan Jiang; Xin Tao; Pengfei Wan; Kun Gai; Mingsheng Long; |
| 477 | Secure Outlier-Aware Large Language Model Inference Highlight: Hence, we propose the Secure Outlier-Aware Large Language Model Inference framework (SOAL), which accelerates the RMSNorm operation by nearly 2$\times$, SiLU by 2$\times$, and Softmax by more than 5$\times$. |
Lifan Zhao; Zhixuan Fang; |
| 478 | Partition Generative Modeling: Masked Modeling Without Masks Highlight: We introduce “Partition Generative Models” (PGMs), which replace masking with partitioning. |
Justin Deschenaux; Lan Tran; Caglar Gulcehre; |
| 479 | Exploring Synthesizable Chemical Space with Iterative Pathway Refinements Highlight: Existing solutions for this problem often struggle to effectively navigate the exponentially large combinatorial space of synthesizable molecules and suffer from poor coverage. To address this problem, we introduce ReaSyn, an iterative generative pathway refinement framework that obtains synthesizable analogs to input molecules by projecting them onto synthesizable space. |
Seul Lee; Karsten Kreis; Srimukh Prasad Veccham; Meng Liu; Danny Reidenbach; Saee Gopal Paliwal; Weili Nie; Arash Vahdat; |
| 480 | Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows Highlight: This paper proposes **Agentic Predictor**, a lightweight predictor for efficient agentic workflow evaluation. |
Patara Trirat; Wonyong Jeong; Sung Ju Hwang; |
| 481 | Can Vision–Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective Highlight: In this work, we introduce AesEval-Bench, a comprehensive benchmark spanning four dimensions, twelve indicators, and three fully quantifiable tasks: aesthetic judgment, region selection, and precise localization. Moreover, we construct a training dataset to fine-tune VLMs for this domain, leveraging human-guided VLM labeling to produce task labels at scale and indicator-grounded reasoning to tie abstract indicators to concrete design regions. Together, our work establishes the first systematic framework for aesthetic quality assessment in graphic design. |
Ruichuan An; Shizhao Sun; Danqing Huang; Mingxi Cheng; Yan Gao; Ji Li; YU QIAO; Jiang Bian; |
| 482 | MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval Highlight: We introduce *MathNet*, a large-scale, high-quality, multilingual, and multimodal dataset of Olympiad-level problems. We publicly release both the dataset and benchmark at http://mathnet.netlify.app/. |
Shaden Alshammari; Kevin Wen; Abrar Zainal; Mark Hamilton; Navid Safaei; Sultan Albarakati; William T. Freeman; Antonio Torralba; |
| 483 | TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment Highlight: As challenges remain in extracting 3D modal features and bridging the gap between different modalities, we propose TIGaussian, a framework that harnesses 3D Gaussian Splatting (3DGS) characteristics to strengthen cross-modality alignment through a multi-branch 3DGS tokenizer and modality-specific 3D feature alignment strategies. |
Jiarun Liu; Qifeng Chen; Yiru Zhao; Minghua Liu; Baorui Ma; Sheng Yang; |
| 484 | Trace Anything: Representing Any Video in 4D Via Trajectory Fields Highlight: In a video, its atomic unit, the pixel, follows a continuous 3D trajectory that unfolds over time, acting as the atomic primitive of dynamics. Recognizing this, we propose to represent any video as a Trajectory Field: a dense mapping that assigns each pixel in each frame to a parametric 3D trajectory. |
Xinhang Liu; Yuxi Xiao; Donny Y. Chen; Jiashi Feng; Yu-Wing Tai; Chi-Keung Tang; Bingyi Kang; |
| 485 | MemGen: Weaving Generative Latent Memory for Self-Evolving Agents Highlight: Existing paradigms remain constrained: parametric memory forcibly adjusts model parameters, and retrieval-based memory externalizes experience into structured databases, yet neither captures the fluid interweaving of reasoning and memory that underlies human cognition. To address this gap, we propose MemGen, a dynamic generative memory framework that equips agents with a human-esque cognitive faculty. |
Guibin Zhang; Muxin Fu; Shuicheng YAN; |
| 486 | AgenTracer: Who Is Inducing Failure in The LLM Agentic Systems? Highlight: Current state-of-the-art reasoning LLMs, however, remain strikingly inadequate for this challenge, with accuracy generally below 10%. To address this gap, we propose AgenTracer, the first automated framework for annotating failed multi-agent trajectories via counterfactual replay and programmed fault injection, producing the curated dataset TracerTraj. |
Guibin Zhang; Junhao Wang; Junjie Chen; Wangchunshu Zhou; Kun Wang; Shuicheng YAN; |
| 487 | Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing Highlight: Inspired by the human strategy of using a finger as a “visual anchor” to ensure accuracy when reading complex charts, we propose a new paradigm named Visual Self-Refine (VSR). |
Jinsong Li; Xiaoyi Dong; Yuhang Zang; Yuhang Cao; Jiaqi Wang; Dahua Lin; |
| 488 | Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models Highlight: While the inference framework is rigid, we observe that the model itself possesses internal signals that correlate with the optimal response length for a given task. To bridge this gap, we leverage these latent signals and introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion for Diffusion Large Language Models. |
Jinsong Li; Xiaoyi Dong; Yuhang Zang; Yuhang Cao; Jiaqi Wang; Dahua Lin; |
| 489 | Learn The Ropes, Then Trust The Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Highlight: In this paper, we target the progressive exploration-exploitation balance under the guidance of the agent’s own experiences without succumbing to either entropy collapse or runaway divergence. |
Yulei Qin; Xiaoyu Tan; Zhengbao He; Gang Li; Haojia Lin; Zongyi Li; Zihan Xu; Yuchen Shi; Siqi Cai; Renting Rui; Shaofei Cai; Yuzheng Cai; Xuan Zhang; Sheng Ye; Ke Li; Xing Sun; |
| 490 | Synthetic Bootstrapped Pretraining Highlight: We find Synthetic Bootstrapped Pretraining (SBP) consistently improves upon a strong repetition baseline and delivers a significant fraction of the performance improvement attainable by an oracle upper bound with access to 20x more unique data. |
Zitong Yang; Aonan Zhang; Hong Liu; Tatsunori Hashimoto; Emmanuel Candes; Chong Wang; Ruoming Pang; |
| 491 | InnoGym: Benchmarking The Innovation Potential of AI Agents Highlight: We present **InnoGym**, the first benchmark and framework designed to systematically evaluate the innovation potential of AI agents. |
Jintian Zhang; Kewei Xu; Jingsheng Zheng; Zhuoyun Yu; Yuqi Zhu; Yujie Luo; Lanning Wei; Shuofei Qiao; Lun Du; Da Zheng; Shumin Deng; Huajun Chen; Ningyu Zhang; |
| 492 | Evaluating Language Models’ Evaluations of Games Highlight: In this paper, we advocate for a new paradigm that assesses AI systems’ evaluation of games. |
Katherine M. Collins; Cedegao E. Zhang; Graham Todd; Lance Ying; Mauricio Barba da Costa; Ryan Liu; Prafull Sharma; Adrian Weller; Ionatan Kuperwajs; Lionel Wong; Joshua B. Tenenbaum; Thomas L. Griffiths; |
| 493 | DataMIL: Selecting Data for Robot Imitation Learning with Datamodels Highlight: Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. |
Shivin Dass; Alaa Khaddaj; Logan Engstrom; Aleksander Madry; Andrew Ilyas; Roberto Martín-Martín; |
| 494 | Exploiting Low-Dimensional Manifold of Features for Few-shot Whole Slide Image Classification Highlight: This insight reveals a key potential issue in downstream multiple instance learning models: linear layers are geometry-agnostic and, as we show empirically, can distort the manifold geometry of the features. To address this, we propose the Manifold Residual (MR) block, a plug-and-play module that is explicitly geometry-aware. |
Conghao Xiong; Zhengrui Guo; Zhe Xu; Yifei Zhang; Raymond Kai-yu Tong; Si Yong Yeo; Hao Chen; Joseph JY Sung; Irwin King; |
| 495 | ProofOptimizer: Training Language Models to Simplify Proofs Without Human Demonstrations Highlight: We introduce ProofOptimizer, the first language model trained to simplify Lean proofs without requiring additional human supervision. |
Alex Gu; Bartosz Piotrowski; Fabian Gloeckle; Kaiyu Yang; Aram H. Markosyan; |
| 496 | Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context Highlight: Prior research on short lists of bound entities found strong evidence that LMs implement such retrieval via a **positional mechanism**, where an entity such as *Ann* is retrieved based on its position in context. In this work, we find that this mechanism generalizes poorly to more complex settings; as the number of bound entities in context increases, the positional mechanism becomes noisy and unreliable in middle positions. |
Yoav Gur-Arieh; Mor Geva; Atticus Geiger; |
| 497 | How Text Quality Interventions Reshape Neural Scaling Laws for LLMs: Empirical Study Highlight: We present an empirical study of how interventions—deduplication, heuristic filtering, and LLM-guided rewriting—reshape scaling behavior in large language model training. |
Newsha Ardalani; Feiyang Kang; Michael Kuchnik; Mostafa Elhoushi; Shubhabrata Sengupta; Shang-Wen Li; Carole-Jean Wu; |
| 498 | Annotation-Efficient Honesty Alignment Via Confidence Elicitation and Calibration Highlight: While effective, the latter demands costly, large-scale labeling. We introduce Elicitation-Then-Calibration (EliCal), a two-stage framework that first elicits internal confidence using inexpensive self-consistency supervision, then calibrates this confidence with a small set of correctness annotations (a rough sketch appears after the table). |
Shiyu Ni; Keping Bi; Jiafeng Guo; Minghao Tang; Jingtong wu; Zengxin Han; Xueqi Cheng; |
| 499 | MicroVerse: A Preliminary Exploration Toward A Micro-World Simulation Highlight: In this work, we introduce **MicroWorldBench**, a multi-level rubric-based benchmark for microscale simulation tasks. |
Rongsheng Wang; Minghao Wu; Hongru Zhou; Zhihan Yu; Zhenyang Cai; Junying Chen; Benyou Wang; |
| 500 | DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics Highlight: We present DiffWind, a physics-informed differentiable framework that unifies wind–object interaction modeling, video-based reconstruction, and forward simulation. |
Yuanhang Lei; Boming Zhao; Zesong Yang; Xingxuan Li; Tao Cheng; Haocheng Peng; Ru Zhang; yang yang; Siyuan Huang; Yujun Shen; Ruizhen Hu; Hujun Bao; Zhaopeng Cui; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~5,300 papers), please visit Paper Digest: ICLR-2026 (Full List).
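The sketches below expand on the entries flagged above with a "sketch after the table" note. For entry 399 (Scaf-GRPO), the highlight's claim that uniform outcomes collapse the advantage to zero follows directly from group-relative normalization: if every rollout in a group earns the same reward, subtracting the group mean leaves nothing for the gradient. The snippet is a minimal illustration of that arithmetic, not the authors' implementation.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: each rollout's reward, normalized by the
    group mean and standard deviation."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Mixed outcomes yield a learning signal:
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [ 1. -1. -1.  1.]

# A problem the model always fails yields identical rewards, so every
# advantage is exactly zero and the problem vanishes from the gradient:
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))  # [0. 0. 0. 0.]
```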
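For entry 403 (Cautious Optimizers), a minimal sketch of the sign-consistency masking idea described in the highlight: components of a momentum-based update whose sign disagrees with the current gradient are zeroed out, with a rescaling to keep the surviving update at a comparable magnitude. Function and variable names are illustrative; this is a sketch, not the authors' released code.

```python
import torch

def cautious(update: torch.Tensor, grad: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mask out update components whose sign disagrees with the gradient,
    then rescale the survivors to preserve overall update magnitude."""
    mask = (update * grad > 0).to(grad.dtype)
    return update * mask * (mask.numel() / (mask.sum() + eps))

u = torch.tensor([0.5, -0.2, 0.1])   # momentum-based update
g = torch.tensor([1.0, 0.3, -0.4])   # current gradient
print(cautious(u, g))                # only the aligned first component survives
```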
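Entry 445 fits scaling laws to pass-at-$k$ on generative evaluations. For context only, here is the standard unbiased pass-at-$k$ estimator widely used in such evaluations, given $n$ sampled attempts of which $c$ succeed; the paper's scaling-law functional forms are not reproduced here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts with c successes:
    1 - C(n - c, k) / C(n, k), the probability that k random draws
    include at least one success."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=100, c=10, k=5))  # ~0.416
```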
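Entry 446's pivotal step can be sketched straight from the highlight: among the $N$ sampled responses, first keep only the high-frequency answers, then select by reward (shown here for top-1; the frequency threshold and toy reward values are illustrative assumptions, not the paper's settings).

```python
from collections import Counter

def best_of_majority(responses, rewards, min_count=2):
    """Restrict candidates to responses seen at least `min_count` times
    among the N samples, then return the highest-reward survivor
    (falling back to plain best-of-N if nothing clears the threshold)."""
    counts = Counter(responses)
    survivors = [i for i, r in enumerate(responses) if counts[r] >= min_count]
    pool = survivors or range(len(responses))
    return responses[max(pool, key=lambda i: rewards[i])]

samples = ["42", "41", "42", "43", "42", "41"]
scores = [0.2, 0.9, 0.4, 0.95, 0.3, 0.5]  # "43" gets a spuriously high reward
print(best_of_majority(samples, scores))   # "41": frequent and best-rewarded
```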
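Entry 471's core change, replacing the group-mean baseline with a group-wise $K$-quantile baseline, is sketched below; the quantile level used here is an illustrative assumption.

```python
import numpy as np

def quantile_advantages(rewards, q=0.5):
    """Advantage of each rollout measured against a group-wise quantile
    baseline rather than the group mean."""
    r = np.asarray(rewards, dtype=float)
    return r - np.quantile(r, q)

# On a hard problem with one lone success, the median baseline still
# assigns that success a positive advantage:
print(quantile_advantages([1.0, 0.0, 0.0, 0.0]))  # [1. 0. 0. 0.]
```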
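Entry 498 describes a two-stage recipe: elicit confidence from cheap self-consistency signals, then calibrate it with a small set of correctness labels. The sketch below follows that reading; the agreement-rate confidence and the logistic-regression calibrator are assumptions for illustration, not the paper's exact method.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import LogisticRegression

def self_consistency_confidence(sampled_answers):
    """Stage 1: confidence = fraction of samples that agree with the
    majority answer (no correctness labels needed)."""
    counts = Counter(sampled_answers)
    return counts.most_common(1)[0][1] / len(sampled_answers)

# Stage 2: calibrate raw confidences against a small labeled set
# (toy data stands in for the "small set of correctness annotations").
raw_conf = np.array([[0.9], [0.8], [0.4], [0.3], [0.7], [0.2]])
is_correct = np.array([1, 1, 0, 0, 1, 0])
calibrator = LogisticRegression().fit(raw_conf, is_correct)

c = self_consistency_confidence(["A", "A", "B", "A", "A"])  # 0.8
print(calibrator.predict_proba([[c]])[0, 1])  # calibrated correctness probability
```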