Paper Digest: ICML 2026 Papers & Highlights

May 5, 2026May 5, 2026 admin

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2026, it is to be held in Seoul. To help the community quickly catch up on the work presented in this conference, Paper Digest Team processed all accepted papers, and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine generated highlights to quickly get the main idea of each paper.

Note: ICML-2026 accepts more than 6,500 papers, this page only includes 500 of them selected by our daily paper digest algorithm. Interested users can choose to read All 6,500 ICML-2026 papers in a separate page, which takes quite some time to load.

To search for papers presented at ICML-2026 on a specific topic, please make use of the search by venue (ICML-2026) service. To summarize the latest research published at ICML-2026 on a specific topic, you can utilize the review by venue (ICML-2026) service. If you are interested in browsing papers by author, we have a comprehensive list of ~ 25,000 authors (ICML-2026). Additionally, you may want to explore our “Best Paper” Digest (ICML), which lists the most influential ICML papers since 2004.

Since 2018, Paper Digest has built a foundation of data spanning decades of conferences, journals, and research topics. The platform features a daily digest service that sifts through tens of thousands of new papers, clinical trials, news articles, and community posts, filtering the noise to highlight what matters most to specific interests. Beyond daily updates, dozens of built-in research tools streamline the academic workflow, supporting efficient reading and writing, comprehensive literature reviews, and automated research report generation.

Paper Digest Team
New York City, New York, 10017
team@paperdigest.org

TABLE 1: Paper Digest: ICML 2026 Papers & Highlights

	Paper	Author(s)
1	Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Self-Flow: a self-supervised flow matching paradigm that integrates representation learning within the generative framework.	Hila Chefer; Patrick Esser; Dominik Lorenz; Dustin Podell; Vikash Raja; Vinh Tong; Antonio Torralba; Robin Rombach;
2	You Can Learn Tokenization End-to-End with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prior work has shown promising results at scale in bringing this compression step inside the LLMs’ architecture with heuristics to draw token boundaries, and also attempts to learn these token boundaries with straight-through estimates, which treat the problem of drawing discrete token boundaries as a continuous one. We show that these token boundaries can instead be learned using score function estimates, which have tighter theoretical guarantees due to directly optimizing the problem of drawing discrete token boundaries to minimize loss.	Sam Dauncey; Roger Wattenhofer;
3	Unified Multimodal Autoregressive Modeling with Shared Context—Visual Tokenizer Is Key to Unification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing approaches typically rely on two disparate visual tokenizers, which splits the representation space and hinder truly unified modeling. We propose UniAR, a unified autoregressive framework where a single discrete visual tokenizer serves as the key bridge between understanding and generation, enabling a shared context in which the model can directly interpret its own generated visual tokens without additional re-encoding.	Wujian Peng; Lingchen Meng; Yuxuan Cai; Xianwei Zhuang; Yuhuan Yang; Rongyao Fang; Chenfei Wu; Junyang Lin; Zuxuan Wu; Shuai Bai;
4	Scaling Agentic Verifier for Competitive Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Execution-based re-ranking offers a promising test-time scaling strategy, yet existing methods are constrained by either difficult test case generation or inefficient random input sampling. To address this limitation, we propose Agentic Verifier, an execution-based agent that actively reasons about program behaviors and searches for highly discriminative test inputs that expose behavioral discrepancies among candidate solutions.	Zeyao Ma; Jing Zhang; Xiaokang Zhang; Jiaxi Yang; Zongmeng Zhang; Jiajun Zhang; Yuheng Jing; Lei Zhang; Hao Zheng; Wenting Zhao; Junyang Lin; Binyuan Hui;
5	Bringing Code ALIVE: Optimizing Interactive Frontend Mini-Games Via Automated Play and Reinforcement Learning at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The core bottleneck is the lack of an evaluation mechanism that balances reliability with scalability, as existing methods either fail to verify dynamic interactivity or incur prohibitive computational costs. To bridge this gap, we introduce ALIVE (Aligning LLMs via Interactive Visual Execution), a high-throughput framework that leverages one-shot planning and DOM-based analysis to automatically evaluate generated games at scale.	Jiajun Zhang; Yuheng Jing; Zeyu Cui; Hao Zheng; Wentao Chen; Kaixin Li; Jiaxi Yang; Tianbao Xie; Zeyao Ma; Tianyi Bai; KaShun SHUM; Lei Zhang; Kai Li; Jian Cheng; Zilei Wang; Qiang Liu; Liang Wang; Junyang Lin; Binyuan Hui;
6	GDPO: Group Reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, recent work has defaulted to apply Group Relative Policy Optimization (GRPO) under multi-reward setting without examining its suitability. In this paper, we demonstrate that directly applying GRPO to normalize distinct rollout reward combinations causes them to collapse into identical advantage values, reducing the resolution of the training signal and resulting in suboptimal convergence and, in some cases, early training failure.	Shih-Yang Liu; Xin Dong; Ximing Lu; Shizhe Diao; Peter Belcak; Mingjie Liu; Min-Hung Chen; Hongxu Yin; Yu-Chiang Wang; Kwang-Ting Cheng; Yejin Choi; Jan Kautz; Pavlo Molchanov;
7	APE-Bench: Evaluating Automated Proof Engineering for Formal Math Libraries Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a complete infrastructure comprising APE-Bench, which automatically extracts proof engineering tasks from real library commit histories, and APE-Harness, a unified execution framework based on task contract abstraction.	Huajian Xin; Zheng Yuan; Jacques Fleuriot; Wenda Li;
8	Towards Unified Multimodal Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we explore the design space of Unified Multimodal Pretraining through a controlled, from-scratch study.	Shengbang Tong; David Fan; John Nguyen; Ellis Brown; Gaoyue Zhou; Shengyi Qian; Boyang Zheng; Théophane Vallaeys; Rob Fergus; Naila Murray; Marjan Ghazvininejad; Mike Lewis; Jakob Verbeek; Nicolas Ballas; Amir Bar; Michael Rabbat; Yann LeCun; Luke Zettlemoyer; Saining Xie; Koustuv Sinha;
9	Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we reveal that Large Language Models (LLMs) possess intrinsic behavioral plasticity—akin to chameleons adapting their coloration to environmental cues—that can be exposed through token-conditional generation and stabilized via reinforcement learning.	Liyuan Mao; Le Yu; Jing Zhou; Chujie Zheng; Bowen Yu; Chang Gao; Shixuan Liu; An Yang; Weinan Zhang; Junyang Lin;
10	PlotCraft: Pushing The Limits of LLMs for Complex and Interactive Data Visualization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their ability to create complex visualizations for scaled and structured data remains largely unevaluated and underdeveloped. To address this gap, we introduce PlotCraft, a new benchmark featuring 1k challenging visualization tasks that cover a wide range of topics, such as finance, scientific research, and sociology.	Jiajun Zhang; Jianke Zhang; Zeyu Cui; Jiaxi Yang; Lei Zhang; Zilei Wang; Qiang Liu; Liang Wang; Binyuan Hui; Junyang Lin;
11	Towards Execution-Grounded Automated AI Research Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We analyze two methods to learn from the execution feedback: evolutionary search and reinforcement learning.	Chenglei Si; Zitong Yang; Yejin Choi; Emmanuel J Candes; Diyi Yang; Tatsunori Hashimoto;
12	Weight-sparse Transformers Have Interpretable Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We train models to have more understandable circuits by constraining most of their weights to be zeros, so that each neuron only has a few connections.	Leo Gao; Achyuta Rajaram; Jacob Coxon; Soham Govande; Bowen Baker; Daniel Mossing;
13	Principled Zero-shot Ranking Agents with Tournament Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a tournament graph* framework that provides a principled foundation for $k$-wise reranking.*	Sheshansh Agrawal; Thien Nguyen; Douwe Kiela;
14	Any-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Meanwhile, recent studies have successfully applied discrete diffusion models to natural language processing, revealing their considerable potential as a promising new approach in this domain. Drawing inspiration from these pioneering researches, we introduce Any-Diffusion, the first any-to-any multimodal language model built purely on mask-based discrete diffusion models, which unifies understanding and generation across text, speech, and images.	lijiang Li; zuwei long; Yunhang Shen; Heting Gao; Haoyu Cao; Xing Sun; Caifeng Shan; Ran He; Chaoyou Fu;
15	Position: Preregister Experiments with AI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We systematically catalog the researcher degrees of freedom that experiments with AI agents introduce—model selection, prompt wording, settings, and outcome-contingent redesign, for example—and show how the low cost of iteration and lack of reporting norms make these choices both easy to exploit and difficult to detect. We propose a preregistration template tailored to experiments with AI agents and call on conferences, journals, and funding agencies to make preregistration standard practice for this emerging research paradigm.	Michelle Vaccaro;
16	Joint-Embedding Predictive Learning of Latent Market States in U.S. Equities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We investigate whether Joint-Embedding Predictive Architectures (JEPA) can learn useful representations of U.S. equity markets.	Simon Mahns; Randall Balestriero; Mahmoud Assran;
17	Adversarial Flow Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present adversarial flow models, a class of generative models that belongs to both adversarial and flow families of models.	Shanchuan Lin; Ceyuan Yang; Zhijie Lin; Hao Chen; Haoqi Fan;
18	Simultaneous Speech-to-Speech Translation Without Aligned Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We instead propose Hibiki-Zero, a model for simultaneous speech translation trained without word-level alignments between source and target speech.	Tom Labiausse; Romain Fabre; Yannick Estève; Alexandre Défossez; Neil Zeghidour;
19	Retrieval-Aware Distillation for Transformer-SSM Hybrids Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This gap has been linked to a small set of attention heads, called Gather-and-Aggregate (G&A), which SSMs struggle to implement and are believed to drive the disparity. Leveraging this insight, we propose retrieval-aware distillation, a strategy that converts a pretrained Transformer into a hybrid student by preserving only these retrieval-critical components.	Aviv Bick; Eric Xing; Albert Gu;
20	The Generalization Spectrum: A Chromatographic Approach to Evaluating Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Such per-sample generalization—akin to learning by analogy in human cognition—captures how far the knowledge extracted from one example can transfer, yet remains invisible to standard benchmarks. We introduce the Generalization Spectrum, an evaluation framework designed to expose this hidden dimension.	Jinghan Zhang; Zerui Cheng; Shiqi Chen; Ge Zhang; Wenhao Huang; Jiashuo Liu; Junxian He; Tianle Cai;
21	$\tau^2$-Bench: Evaluating Conversational Agents in A Dual-Control Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This differs from real-world scenarios like technical support, where users need to actively participate in modifying the state of the (shared) world. In order to address this gap, we introduce $\tau^2$-bench, with four key contributions: 1. A novel Telecom dual-control domain modeled as a Dec-POMDP, where both agent and user make use of tools to act in a shared, dynamic environment that tests both agent coordination and communication, 2.	Victor Barres; Honghua Dong; Soham Ray; Xujie Si; Karthik Narasimhan;
22	Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our key insight is that interpolating generation orderings between autoregression and fully-random decoding, rather than committing to a fixed block length, offers a better interpolation between diffusion and AR.	Marianne Arriola; Volodymyr Kuleshov;
23	VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The goal of this paper is to improve the performance and reliability of vision-language-action (VLA) models through iterative online interaction.	Yanjiang Guo; Tony Lee; Lucy Xiaoyang Shi; Jianyu Chen; Percy Liang; Chelsea Finn;
24	Don’t Drop Dropout: Optimizing Layer Sparsity for Efficient LLM Training and Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we show that layer dropout should* be used in state-of-the-art LLM training, establishing best practices and scaling analysis for both training and post-training benefits.*	Mostafa Elhoushi; Nolan Dey; Alexander Pretko; Bin Zhang; Gavia Gray; Gurpreet Gosal; Abdulrahman Mahmoud; Shane Bergsma; Joel Hestness;
25	DnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce dnaHNet, a state-of-the-art tokenizer-free autoregressive model that segments and models genomic sequences end to end.	Arnav Shah; Junzhe Li; Parsa Idehpour; Adibvafa Fallahpour; Brandon Wang; Sukjun Hwang; BO WANG; Patrick Hsu; Hani Goodarzi; Albert Gu;
26	Maximum Likelihood Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Maximum Likelihood Reinforcement Learning (MaxRL), a compute-indexed family of sampling-based objectives derived from a pass@k expansion of the likelihood, which interpolates between standard RL and exact maximum likelihood as compute increases.	Fahim Tajwar; Guanning Zeng; Yueer Zhou; Yuda Song; Daman Arora; Yiding Jiang; Jeff Schneider; Russ Salakhutdinov; Haiwen Feng; Andrea Zanette;
27	Symmetries in Language Statistics Shape The Geometry of Model Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that the statistics of language exhibit a translation symmetry—e.g,.	Dhruva Karkada; Daniel Korchinski; Andres Nava; Matthieu Wyart; Yasaman Bahri;
28	IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study the compute-optimal allocation of sampling compute for on-policy RL methods in LLMs, framing scaling as a compute-constrained optimization over three resources: parallel rollouts per problem, number of problems per batch, and number of update steps.	Zhoujun Cheng; Yutao Xie; Yuxiao Qu; Amrith Setlur; Shibo Hao; Varad Pimpalkhute; Tongtong Liang; Feng Yao; Zhengzhong Liu; Eric Xing; Virginia Smith; Russ Salakhutdinov; Zhiting Hu; Taylor W. Killian; Aviral Kumar;
29	Position: Interpretability Can Be Actionable Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our goal is not to downplay exploratory research, but to establish actionability as a core objective of interpretability research.	Hadas Orgad; Fazl Barez; Tal Haklay; Isabelle Lee; Marius Mosbach; Anja Reusch; Naomi Saphra; Byron Wallace; Sarah Wiegreffe; Eric Wong; Ian Tenney; Mor Geva;
30	How2Everything: Mining The Web for How-to Procedures to Evaluate and Improve LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, measuring and improving procedural validity at scale on real-world tasks remains challenging and understudied. To address this, we introduce How2Everything, a scalable framework to evaluate and improve goal-conditioned procedure generation.	Yapei Chang; Kyle Lo; Mohit Iyyer; Luca Soldaini;
31	TQL: Scaling Q-Functions with Transformers By Preventing Attention Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we ask what prevents transformers from scaling effectively for value functions?	Perry Dong; Kuo-Han Hung; Alexander Swerdlow; Dorsa Sadigh; Chelsea Finn;
32	One-step Latent-free Image Generation with Pixel Mean Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent advances have made encouraging progress on each aspect individually, paving the way toward one-step diffusion/flow without latents. In this work, we take a further step towards this goal and propose pixel MeanFlow (pMF).	Yiyang Lu; Susie Lu; Qiao Sun; Hanhong Zhao; Zhicheng Jiang; Xianbang Wang; Tianhong Li; Zhengyang Geng; Kaiming He;
33	Position: Assistive AI Requires Personalized Specialists, Not Generalists Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We outline research directions for building specialists that learn from organic observational data, avoid self-reinforcing errors, and improve safely over long horizons.	Homanga Bharadhwaj;
34	Reinforcement Learning with Evolving Rubrics for Deep Research Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Deep research agents perform multi-step research to produce long-form, well-attributed answers.	Rulin Shao; Akari Asai; Shannon Shen; Hamish Ivison; Varsha Kishore; Jingming Zhuo; Xinran Zhao; Molly Park; Samuel Finlayson; David Sontag; Tyler Murray; Sewon Min; Pradeep Dasigi; Luca Soldaini; Faeze Brahman; Scott Yih; Sherry Tongshuang Wu; Luke Zettlemoyer; Yoon Kim; Hannaneh Hajishirzi; Pang Wei Koh;
35	Monitoring Monitorability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose three evaluation archetypes (intervention, process, and outcome-property), a new monitorability metric, and a broad evaluation suite.	Melody Guan; Miles Wang; Micah Carroll; Zehao Dou; Annie Wei; Marcus Williams; Benjamin Arnav; Joost Huizinga; Ian Kivlichan; Amelia Glaese; Jakub Pachocki; Bowen Baker;
36	Reasoning Cache: Learning to Extrapolate to Long Lengths Via Short-Length RL Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Standard on-policy RL operates on fixed problem distributions and training budgets, giving rise to a distribution shift between train and test that limits the resulting model’s extrapolation capabilities. To address this, we introduce RC, an iterative decoding algorithm replacing standard autoregressive decoding that enables models to extrapolate to lengths an order of magnitude longer than those seen during training.	Ian Wu; Yuxiao Qu; Amrith Setlur; Aviral Kumar;
37	RADIO1D: Elastic Representations for Condensed Vision Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Notably, models trained with image-text alignment (such as SigLIP2) develop a small number of specialized tokens that effectively summarize global image content. Building on this, we introduce RADIO1D, which compresses images into a compact, variable-length 1D token sequence using multi-teacher knowledge distillation and an autoencoder design.	Greg Heinrich; Mike Ranzinger; Collin McCarthy; Natan Bagrov; Eugene Khvedchenya; Bryan Catanzaro; Jan Kautz; Andrew Tao; Pavlo Molchanov;
38	What Does Flow-Matching Bring to TD-Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that their success is not explained by distributional RL: explicitly modeling return distributions often degrades performance. Instead, we argue that flow-matching Q-functions are effective because they couple a learned velocity field with an integration procedure that is used both during training and to read out Q-values at inference time.	Bhavya Agrawalla; Michal Nauman; Aviral Kumar;
39	Unpaired Visual Editing with Self-Consistent Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a general framework for unpaired training of flow matching editing models.	Yoad Tewel; Yuval Atzmon; Gal Chechik; Lior Wolf;
40	End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We design an end-to-end training pipeline that jointly optimizes reconstruction and generation, enabling direct supervision from generation results to the tokenizer.	Wenda Chu; Bingliang Zhang; Jiaqi Han; Yizhuo Li; Linjie Yang; Yisong Yue; Qiushan Guo;
41	Long-Context Modeling with Dynamic Hierarchical Sparse Attention for Memory-Constrained LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Dynamic Hierarchical Sparse Attention (DHSA), a data-driven framework that predicts attention sparsity online while keeping the LLM backbone frozen.	Siheng Xiong; Joe Zou; Faramarz Fekri; Yae Jee Cho;
42	Flex-Forcing: Towards A Unified Autoregressive and Bidirectional Video Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Flex-Forcing, a unified training and inference framework that enables a video diffusion model to seamlessly operate under both bidirectional and autoregressive generation regimes.	Xinyin Ma; Julius Berner; Chao Liu; Arash Vahdat; Weili Nie; Xinchao Wang;
43	Why Are Linear RNNs More Parallelizable? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs—but not traditional, nonlinear RNNs—as easy to parallelize in practice as transformers. We answer this question by providing a tight connection between types of RNNs and standard complexity classes.	William Merrill; Hongjian Jiang; Yanhong Li; Anthony Lin; Ashish Sabharwal;
44	Any3D-VLA: Enhancing VLA Robustness Via Diverse Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the challenges of (1) scarce 3D data and (2) the domain gap induced by cross-environment differences and depth-scale biases, we propose Any3D-VLA.	Xianzhe Fan; Shengliang Deng; Xiaoyang Wu; Yuxiang Lu; Zhuoling Li; Mi Yan; Yujia Zhang; Zhizheng Zhang; He Wang; Hengshuang Zhao;
45	Spurious Rewards: Rethinking Training Signals in RLVR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We show that reinforcement learning with verifiable rewards (RLVR) can elicit strong mathematical reasoning in certain language models even with spurious rewards that have little, no, or outright negative correlation with the correct answer.	Rulin Shao; Stella Li; Rui Xin; Scott Geng; Yiping Wang; Sewoong Oh; Simon Du; Nathan Lambert; Sewon Min; Ranjay Krishna; Yulia Tsvetkov; Hannaneh Hajishirzi; Pang Wei Koh; Luke Zettlemoyer;
46	On The Relationship Between Activation Outliers and Feature Death in Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that dimension-level activation outliers (dimensions where mean magnitude is large relative to per-token variation) shift pre-activations at initialization, making feature fate depend on weight-outlier alignment rather than input content.	Elana Simon; Etowah Adams; James Zou;
47	On The Generalization Gap in Self-Evolving Language Model Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A central open question, however, is not whether self-evolution can help, but: \textit{how far is it from oracle-supervised training under minimal assumptions?} To address this question, we present a controlled empirical analysis of LLM self-evolution under a strict formulation: self-evolution is allowed access only to (i) an unlabeled prompt set and (ii) a base language model, with all supervision signals generated from this model.	Zhenting Qi; Susanna Maria Baby; Stefanie Baby; Kan Yuan; Da-Cheng Juan; Tu Vu; Andrew Tomkins; Cyrus Rashtchian;
48	$\tau$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce $\tau$-voice, a benchmark for evaluating voice agents on grounded tasks with real-world complexity: agents must navigate complex multi-turn conversations, adhere to domain policies, and interact with the environment.	Soham Ray; Keshav Dhandhania; Victor Barres; Karthik Narasimhan;
49	DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We focus on long-horizon, bimanual tasks with articulated objects, which are challenging due to large action space, spatiotemporal discontinuities, and the embodiment gap between human and robot hands. We propose DexMachina, a novel curriculum-based algorithm: the key idea is to use virtual object controllers with decaying strength: an object is first driven automatically towards its target states, such that the policy can gradually learn to take over under motion and contact guidance.	Zhao Mandi; Yifan Hou; Dieter Fox; Yashraj Narang; Ajay Mandlekar; shuran song;
50	D2: Improved Techniques for Training Reasoning Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we introduce d2, a reasoning framework tailored for masked DLMs.	Guanghan Wang; Gilad Turok; Yair Schiff; Marianne Arriola; Volodymyr Kuleshov;
51	Learning to Discover at Test Time Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions.	Mert Yuksekgonul; Daniel Koceja; Xinhao Li; Federico Bianchi; Jed McCaleb; Xiaolong Wang; Jan Kautz; Yejin Choi; James Zou; Carlos Guestrin; Yu Sun;
52	ToolOrchestra: Elevating Intelligence Via Efficient Model and Tool Orchestration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ToolOrchestra, a method for training small orchestrators that coordinate the use of intelligent tools.	Hongjin SU; Shizhe Diao; Ximing Lu; Mingjie Liu; Jiacheng Xu; Xin Dong; Yonggan Fu; Peter Belcak; Hanrong Ye; Hongxu Yin; Yi Dong; Evelina Bakhturina; Tao Yu; Yejin Choi; Jan Kautz; Pavlo Molchanov;
53	RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Reinforcement Learning (RL) with Adaptive Verifiable Environments (RLVE), an approach using verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards, to scale up RL for language models (LMs).	Zhiyuan Zeng; Hamish Ivison; Yiping Wang; Lifan Yuan; Stella Li; Zhuorui Ye; Siting Li; Jacqueline He; Runlong Zhou; Tong Chen; Chenyang Zhao; Yulia Tsvetkov; Simon Du; Natasha Jaques; Hao Peng; Pang Wei Koh; Hannaneh Hajishirzi;
54	Extracting Alignment Data in Open Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that it is possible to extract significant amounts of alignment training data from a post-trained model — useful to steer the model to improve certain capabilities such as long-context reasoning, safety, instruction following, and maths.	Federico Barbero; Xiangming Gu; Christopher A. Choquette Choo; Chawin Sitawarin; Matthew Jagielski; Itay Yona; Petar Veličković; Ilia Shumailov; Jamie Hayes;
55	Retaining By Doing: The Role of On-Policy Data in Mitigating Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities — a phenomenon classically known as catastrophic forgetting. In this paper, toward identifying guidelines for mitigating this phenomenon, we systematically compare the forgetting patterns of two widely adopted post-training methods: supervised fine-tuning (SFT) and reinforcement learning (RL).	Howard Chen; Noam Razin; Karthik Narasimhan; Danqi Chen;
56	CyberCycle: Scalable Real-World Benchmark for AI Agents’ End-to-End Cybersecurity Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing cybersecurity evaluations of AI systems are limited in scale or scope, and fail to capture the end-to-end lifecycle of real-world software vulnerability discovery and remediation. To address this gap, we propose CyberCycle, a large-scale and realistic end-to-end cybersecurity benchmark that comprehensively evaluates AI agents’ abilities across the full lifecycle of vulnerability discovery, PoC generation, and patch generation.	Tianneng Shi; Robin Rheem; Dongwei Jiang; Francisco De La Riega; Mona Wang; Zhun Wang; Jingzhi Jiang; Alexander Cheung; Sean Tai; Jonah Cha; Jianhong Tu; Gabriel Han; Chenguang Wang; Wenbo Guo; Jingxuan He; Dawn Song;
57	Position: The Age of AI Agents Demands A New Scientific Paradigm To Sustain Trustworthy Science Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose criteria for an adapted verification infrastructure that emphasizes observable-by-default workflows, scalable verification, and clear attribution.	Belinda Mo;
58	TabICooL: A Better, Faster, Scalable, and Open Tabular Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce TabICooL, a new state-of-the-art foundation model for regression and classification built on three pillars: (1) a novel synthetic data generation engine designed for high pretraining diversity; (2) various architectural innovations, including a new scalable softmax in attention improving generalization to larger datasets without prohibitive long-sequence pretraining; and (3) optimized pretraining protocols, notably replacing AdamW with the Muon optimizer.	Jingang QU; David Holzmüller; Gael Varoquaux; Marine Le Morvan;
59	CodeClash: Benchmarking Goal-Oriented Software Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce CodeClash, a benchmark where LMs compete in multi-round tournaments to build the best codebase for achieving a competitive objective.	John Yang; Kilian Lieret; Joyce Yang; Carlos Jimenez; Muhtasham Oblokulov; Aryan Siddiqui; Ofir Press; Ludwig Schmidt; Diyi Yang;
60	Position: To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy. In this position paper, we identify why existing defenses cannot stop adaptive adversaries and demonstrate that defenders must develop offensive security intelligence.	Terry Yue Zhuo; Yangruibo Ding; Wenbo Guo; Ruijie Meng;
61	Quant VideoGen: Auto-Regressive Long Video Generation Via 2-Bit KV-Cache Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: More critically, memory-bounded KV budgets constrain the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV-cache quantization framework for auto-regressive video diffusion models.	Haocheng Xi; Shuo Yang; Yilong Zhao; Muyang Li; Han Cai; Xingyang Li; Yujun Lin; Zhuoyang Zhang; Jintao Zhang; Xiuyu Li; Zhiying Xu; Jun Wu; Chenfeng Xu; Ion Stoica; Song Han; Kurt Keutzer;
62	Model-Preserving Adaptive Rounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we introduce Yet Another Quantization Algorithm (YAQA), a new adaptive rounding algorithm that directly considers the error at the network’s output.	Albert Tseng; Zhaofeng Sun; Chris De Sa;
63	$L^3$: Large Lookup Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce the Large Lookup Layer (L3), which unlocks a new axis of sparsity by generalizing embedding tables to model decoder layers.	Albert Tseng; Chris De Sa;
64	Activation Oracles: Training and Evaluating LLMs As General-Purpose Activation Explainers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we instead take a generalist perspective.	Adam Karvonen; James Chua; Clément Dumas; Kit Fraser-Taliente; Subhash Kantamneni; Julian Minder; Euan Ong; Arnab Sen Sharma; Daniel Wen; Owain Evans; Samuel Marks;
65	PostTrainBench: Can LLM Agents Automate LLM Post-Training? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study post-training, which is the critical step that turns base LLMs into useful assistants.	Ben Rank; Hardik Bhatnagar; Ameya Pandurang Prabhu; Shira Eisenberg; Karina Nguyen; Matthias Bethge; Maksym Andriushchenko;
66	Utonia: Toward One Encoder for All Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, a first step toward training a single self-supervised point transformer encoder across heterogeneous domains, spanning remote sensing, outdoor LiDAR, indoor RGB-D sequences, object-centric CAD models, and point clouds lifted from RGB-only videos.	Yujia Zhang; Xiaoyang Wu; Yunhan Yang; Xianzhe Fan; Han Li; Yuechen Zhang; Zehao Huang; Naiyan Wang; Hengshuang Zhao;
67	On The Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A central challenge is the lack of control in modern training pipelines: large-scale pre-training corpora are opaque, mid-training is often underexamined, and RL objectives interact with unknown prior knowledge in complex ways. To resolve this ambiguity, we develop a fully controlled experimental framework that isolates the causal contributions of pre-training, mid-training, and RL-based post-training.	Charlie Zhang; Graham Neubig; Xiang Yue;
68	PointDiT: Pixel-Space Diffusion for Monocular Geometry Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: State-of-the-art single-image 3D reconstruction methods often rely on complex hybrid architectures or necessitate compressing geometry into latent spaces to leverage pre-trained latent diffusion models. In this work, we demonstrate that such architectural overhead is unnecessary.	Haofei Xu; Rundi Wu; Philipp Henzler; Nikolai Kalischek; Michael Oechsle; Fabian Manhardt; Marc Pollefeys; Andreas Geiger; Federico Tombari; Michael Niemeyer;
69	VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current multimodal benchmarks often conflate reasoning with domain-specific knowledge, making it difficult to isolate and evaluate general reasoning abilities in non-expert settings. To address this, we introduce VisualPuzzles, a benchmark that targets visual reasoning while deliberately minimizing reliance on specialized knowledge.	Yueqi Song; Tianyue Ou; Yibo Kong; Zecheng Li; Graham Neubig; Xiang Yue;
70	SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present SWE-Bench Pro, a comprehensive benchmark designed to evaluate software engineering capabilities through complex, realistic programming challenges.	Xiang Deng; Jeff Da; Edwin Pan; Yannis He; Charles Ide; Kanak Garg; Niklas Lauffer; Andrew Park; Chetan Rane; Karmini Sampath; Maya Krishnan; Srivatsa Kundurthy; Sean Hendryx; Zifan Wang; Chen Bo Calvin Zhang; Noah Jacobson; Bing Liu; Brad Kenstler;
71	Diffusion Language Model Parallel Decoding Via Product-of-Experts Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce PoE-Bridge, a novel decoding framework that drastically improves generation speed and accuracy by introducing an intermediate distribution to bridge the gap.	Juntong Shi; Brian Trippe; Jure Leskovec; Stefano Ermon; Minkai Xu;
72	SpaCeFormer: Space-Curve Transformer for Open-Vocabulary 3D Instance Segmentation Without Proposals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present SpaCeFormer-3M, the largest open-vocabulary 3D instance segmentation dataset with 846K instances from 15K scenes, and SpaCeFormer (Space-Curve Transformer), a proposal-free segmentation architecture.	Christopher Choy; Junha Lee; Chunghyun Park; Minsu Cho; Jan Kautz;
73	AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in Unified Multimodal Models Via Decompositional Verifiable Reward Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without relying on external knowledge injection.	Runhui Huang; Jie Wu; Rui Yang; Zhe Liu; Hengshuang Zhao;
74	FPTQuant: Function-Preserving Transforms for LLM Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper describes FPTQuant, which introduces three novel, lightweight, and expressive function-preserving transforms (FPTs) to facilitate quantization of transformers: (1) a mergeable pre-RoPE transform for queries and keys, (2) a mergeable transform for values, (3) a cheap, dynamic scaling transform.	Boris van Breugel; Yelysei Bondarenko; Paul Whatmough; Markus Nagel;
75	Base Models Know How to Reason, Thinking Models Learn When Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Why do thinking language models outperform their base counterparts, and what exactly do they learn during training? We introduce constructive model diffing, a framework for understanding fine-tuned models by explicitly constructing the base-to-fine-tuned difference from interpretable components to produce hybrid models, and measuring how well they recover the fine-tuned model’s performance.	Constantin Venhoff; Iván Arcuschin; Phil Torr; Arthur Conmy; Neel Nanda;
76	LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In realistic scenarios, however, LLMs often need to act as agents that explore environments, follow instructions and plans, extract useful information, and predict correct actions under a dynamically growing context. To assess language agents in such settings, we introduce LOCA-bench (a benchmark for LOng-Context Agents).	Weihao Zeng; Yuzhen Huang; Junxian He;
77	Context Forcing: Consistent Autoregressive Video Generation with Long Context Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This structural discrepancy creates a critical student-teacher mismatch: the teacher’s inability to access long-term history prevents it from guiding the student on global temporal dependencies, effectively capping the student’s context length. To resolve this, we propose Context Forcing, a novel framework that trains a long-context student via a long-context teacher.	Shuo Chen; Cong Wei; Sun Sun; Tiancheng SHEN; Ping Nie; Kai Zou; Ge Zhang; Ming-Hsuan Yang; Wenhu Chen;
78	G$^2$TAM: Geometry Grounded Track Anything Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Leveraging the spatial consistency afforded by modern feed-forward 3D reconstruction models, we propose the Geometry Grounded Tracking Anything Model (G$^2$TAM), a unified framework for promptable instance tracking in 3D using only unordered RGB images or videos.	Chenming Zhu; Peizhou Cao; Jingli Lin; Wenbo Hu; Yunlong Ran; Tai Wang; Jiangmiao Pang; Xihui Liu;
79	Solving Physics Olympiad Via Reinforcement Learning on Physics Simulators Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning.	Mihir Prabhudesai; Aryan Satpathy; Yangmin Li; Zheyang Qin; Nikash Bhardwaj; Amir Zadeh; Chuan Li; Katerina Fragkiadaki; Deepak Pathak;
80	GeoPT: Scaling Physics Simulation Via Lifted Geometric Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present GeoPT, a unified pre-trained model for general physics simulation based on lifted geometric pre-training.	Haixu Wu; Minghao Guo; Zongyi Li; Zhiyang Dou; Mingsheng Long; Kaiming He; Wojciech Matusik;
81	Are VLMs Seeing or Just Saying? Uncovering The Illusion of Visual Re-examination Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce VS-Bench, a benchmark of $800$ image pairs curated from MathVista, MathVerse, MathVision, and MMMU-Pro.	Chufan Shi; Cheng Yang; Yaokang Wu; Linghao Jin; Bo Shui; Taylor Berg-Kirkpatrick; Xuezhe Ma;
82	Self-Distillation Enables Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations.	Idan Shenfeld; Mehul Damani; Jonas Hübotter; Pulkit Agrawal;
83	Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This finetuning step has proved critical in achieving human or super-human performance, yet while much attention has been given to developing more effective finetuning algorithms, little attention has been given to ensuring the pretrained policy is an effective initialization for RL finetuning. In this work we seek to understand how the pretrained policy affects finetuning performance, and how to pretrain policies in order to ensure they are effective initializations for finetuning.	Andrew Wagenmaker; Perry Dong; Raymond Tsao; Chelsea Finn; Sergey Levine;
84	ModernVBERT: Towards Smaller Visual Document Retrievers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Increasingly, Visual Document Retrieval (VDR) models, which directly embed images of document pages, are used as an alternative to text-only retrievers.	Paul Teiletche; Quentin Macé; Max Conti; António Loison; Gautier Viaud; Pierre Colombo; Manuel Faysse;
85	Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Concept Ablation Fine-Tuning (CAFT), a technique that leverages interpretability tools to control how LLMs generalize from fine-tuning, without needing to modify the training data or otherwise use data from the target distribution.	Helena Casademunt; Caden Juang; Adam Karvonen; Samuel Marks; Senthooran Rajamanoharan; Neel Nanda;
86	Timestep Rescheduling in Diffusion Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we reveal that the deviation scale in diffusion inversion is strongly dependent on the timestep size, and exhibits a parabolic trend, with larger errors concentrated at both small and large timesteps.	Shangquan Sun; Ting Gong; Liu; Jiamin Wu; Runkai Zhao; Mianxin Liu; Wenqi Ren; Xiaochun Cao;
87	Beyond Scalar Rewards: Learning from Text Feedback in LLM Post-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Therefore, models must learn to internalize the feedback in order to improve their test-time single-turn performance. To do this, we propose two methods: Self Distillation, which trains the single-turn policy to match its own feedback-conditioned second-turn generations; and Feedback Modeling, which predicts the feedback as an auxiliary objective.	Yuda Song; Lili Chen; Fahim Tajwar; REMI MUNOS; Deepak Pathak; J. Bagnell; Aarti Singh; Andrea Zanette;
88	CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present CaPX, an open-access framework for systematically studying Code-as-Policy agents in robot manipulation.	Letian Fu; Justin Yu; Karim El-Refai; Ethan Kou; Haoru Xue; Huang Huang; Wenli Xiao; Li Fei-Fei; Guanya Shi; Jiajun Wu; S. Sastry; Yuke Zhu; Ken Goldberg; Jim Fan;
89	Mode Seeking Meets Mean Seeking for Long Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While multi-resolution image training works because higher resolution is largely an interpolation of the same underlying patch distribution, training across video lengths is fundamentally different: a longer video is an extrapolation that must invent new events and causal structure beyond the short-clip horizon. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local fidelity from long-term coherence from a unified representation via a Decoupled Diffusion Transformer.	Shengqu Cai; Weili Nie; Chao Liu; Julius Berner; Lvmin Zhang; Nanye Ma; Hansheng Chen; Maneesh Agrawala; Leonidas Guibas; Gordon Wetzstein; Arash Vahdat;
90	Anchoring Self-Play for Code Repair Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We aim to scale supervision for code repair by having an LM generate bug–fix tasks with unconstrained edits, using unit tests as the only verifier.	Caroline Choi; Zeyneb Kaya; Shirley Wu; Tengyu Ma; Tatsunori Hashimoto; Ludwig Schmidt;
91	Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large language models have achieved remarkable success on final-answer mathematical problems, largely due to the ease of applying reinforcement learning with verifiable rewards.	Sadegh Mahdavi; Branislav Kisacanin; Shubham Toshniwal; Wei Du; Ivan Moshkov; George Armstrong; Renjie Liao; Christos Thrampoulidis; Igor Gitman;
92	How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent attempts leverage model-internal signals to assign finer-grained credit, but these are often point-wise heuristics that ignore the global structure of information propagation. We propose FlowTracer, an RL framework that traces \emph{answer-targeted reasoning flow} on an attention-induced directed acyclic graph in which nodes correspond to tokens and edge capacities come from aggregated attention weights and derives token credit from this global structure.	Zhichen Dong; Yang Li; Yuhan Sun; Weixun Wang; Yijia Luo; Zinian Peng; Wenbo Su; YuCheng; Bo Zheng; Junchi Yan;
93	Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting? To answer this question, we evaluate over 20 open-weight reasoning-tuned models across a broad suite of tasks, including math, scientific QA, agent planning, coding, and standard instruction-following.	Maggie Huan; Yuetai Li; Tuney Zheng; Xiaoyu Xu; Seungone Kim; Minxin Du; Radha Poovendran; Graham Neubig; Xiang Yue;
94	Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, existing verifiers usually underperform owing to a lack of domain knowledge and limited calibration. To address this, we establish GLEAN, an agent verification framework with GuideLine-grounded Evidence AccumulatioN that compiles expert-curated protocols into trajectory-informed, well-calibrated correctness signals.	Yichi Zhang; Nabeel Seedat; Yinpeng Dong; Peng Cui; Jun Zhu; Mihaela van der Schaar;
95	Does Reinforcement Fine-Tuning Improve Generalization of LLM Agents? An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In real-worlddeployment, agents may operate in unseen environments with different background knowledge, observation spaces, and action interfaces. To characterize the generalization profile of RFT under such shifts, we conduct a systematic study along three axes: (1) within-environment generalization across task difficulty, (2) cross-environment transfer to unseen environments, and (3) sequential multi-environment training to quantify transfer and forgetting.	Zhiheng Xi; Xin Guo; Jiaqi Liu; Jiazheng Zhang; Yutao Fan; Zhihao Zhang; Shichun Liu; Mingxu Chai; Xiaowei Shi; Yitao Zhai; Xunliang Cai; Tao Gui; Qi Zhang; Xuanjing Huang;
96	When AI Agents Compete for Jobs: Strategic Capabilities and Economic Dynamics of AI Labour Markets Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet we lack frameworks to understand how such markets behave in light of economic forces that shape labor markets, such as adverse selection and reputation dynamics. To explore this, we introduce AI-Work, a tractable, simulated gig economy where Large Language Model (LLM) agents compete for jobs, develop skills, and adapt their strategies under uncertainty and competitive pressure.	Christopher Chiu; Simpson Zhang; Mihaela van der Schaar;
97	ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Single-cell RNA-seq profiles are high-dimensional, sparse, and unordered, causing autoregressive generation to impose an artificial ordering bias and suffer from error accumulation. To address this, we propose scDiVa, a masked discrete diffusion foundation model that aligns generation with the dropout-like corruption process by defining a continuous-time forward masking mechanism in token space.	Mingxuan Wang; Gaoyang Jiang; ZiJia Ren; Lu Shi; Cheng Chen; Chuangxin Zhao; Yanbiao Ma;
98	Bridging Time and Frequency: A Joint Modeling Framework for Irregular Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These irregularities violate the equidistant assumptions of standard models, hindering local temporal modeling and rendering classical frequency-domain methods ineffective for capturing global periodic structures. To address this challenge, we propose TFMixer, a joint time–frequency modeling framework for IMTS forecasting.	Xiangfei Qiu; Kangjia Yan; Xvyuan Liu; Xingjian Wu; Jilin Hu;
99	DAG: A Dual Correlation Network for Time Series Forecasting with Exogenous Variables Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, to better leverage exogenous variables, especially future exogenous variables, we propose $\textbf{DAG}$, which $\textit{utilizes $\underline{D}$ual correl$\underline{A}$tion network along both the temporal and channel dimensions for time series forecasting with exo$\underline{G}$enous}$ variables.	Xiangfei Qiu; Yuhan Zhu; Zhengyu Li; Xingjian Wu; Bin Yang; Jilin Hu;
100	SEER: Transformer-based Robust Time Series Forecasting Via Automated Patch Enhancement and Replacement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In real-world time series, there are often low-quality issues during data collection, such as missing values, distribution shifts, anomalies and white noise, which may cause some patches to contain low-quality information, negatively impacting the prediction results. To address this issue, this study proposes a robust time series forecasting framework called $\textbf{SEER}$.	Xiangfei Qiu; Xvyuan Liu; Tianen Shen; Xingjian Wu; Hanyin Cheng; Bin Yang; Jilin Hu;
101	The Geometry of Reasoning: Self-Evaluation Via Layerwise Trajectory Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce \ourmethod (Geometry of Reasoning), a white-box self-evaluation framework based on layerwise trajectory evolution.	Jinhe Bi; Danqi Yan; Yifan Wang; Wenke Huang; Haokun Chen; Guancheng Wan; Mang Ye; Xun Xiao; Hinrich Schuetze; Volker Tresp; Yunpu Ma;
102	EchoRL: Reinforcement Learning Via Rollout Echoing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, inspired through analyzing the entropy pattern behind golden trajectories produced by external expert models, we propose EchoRL for better exploiting the advantage-degenerated rollouts to further improve the training performance.	Jinhe Bi; Aniri; Minglai Yang; Xingcheng Zhou; Wenke Huang; Sikuan Yan; Yujun Wang; Zixuan Cao; Michael Färber; Xun Xiao; Volker Tresp; Yunpu Ma;
103	Learning Rate Scaling Across LoRA Ranks and Transfer to Full Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Maximal-Update Adaptation ($\mu$A), a theoretical framework that characterizes how the optimal learning rate should scale with model width and adapter rank to produce stable, non-vanishing feature updates under standard configurations.	Nan Chen; Soledad Villar; Soufiane Hayou;
104	No More, No Less: Least-Privilege Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We take inspiration from least privilege in computer systems and define a class of models called least-privilege language models, where privilege is reachable internal computation during the forward pass.	Paulius Rauba; Dominykas Seputis; Patrikas Vanagas; Mihaela van der Schaar;
105	Position: Don’t Just Fix It in Post”: A Science of AI Must Study Learning Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Language models are not static objects—they are snapshots of time-evolving processes shaped by data, objectives, and optimization dynamics. Yet the field predominantly treats models as fixed artifacts, analyzing behaviors after training rather than asking why they emerge.	Stella Biderman; Mohammad Aflah Khan; Niloofar Mireshghallah; Catherine Arnett; Fazl Barez; Naomi Saphra;
106	Position: The AI Imperative: Scaling High-Quality Peer Review in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.	Qiyao Wei; Samuel Holt; Jing Yang; Markus Wulfmeier; Mihaela van der Schaar;
107	Influence-Guided Symbolic Regression: Scientific Discovery Via LLM-Driven Equation Search with Granular Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce \textit{Influence-Guided Symbolic Regression} (IGSR), a method that frames equation discovery as an iterative two-step process combining diverse term generation with rigorous selection: an LLM generates candidate basis functions $\psi_j(\mathbf{x})$ for a linear model, which are then evaluated using granular influence scores $\Delta_j$.	Evgeny S. Saveliev; Samuel Holt; Nabeel Seedat; David Bentley; Jim Weatherall; Mihaela van der Schaar;
108	FUSE: FK-Steered Multi-Modal Flow Matching for Efficient Simulation-Based Posterior Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce FUSE (Feynman-Kac steered mUlti-modal flow matching for efficient Simulation-based posterior Estimation).	WeiChen Qin; Yufan Xie; Peihao Wang; Chia-Jui Chou; Minghui Du; Peng Xu; Ziren Luo; Yi Yang; Jingyi Yu; Bo Liang; Jiakai Zhang;
109	Process Reward Agents for Steering Knowledge-Intensive Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we introduce Process Reward Agents~(PRA), a test-time method for providing domain-grounded, online, step-wise rewards to a frozen reasoner.	Jiwoong Sohn; Tomasz Sternal; Kenneth Styppa; Torsten Hoefler; Michael Moor;
110	Skill Neologisms: Towards Skill-based Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We explore \textit{skill neologisms}–i.e., soft tokens integrated in the model’s vocabulary and optimized to improve capabilities over a specific skill–as a way to selectively extend model capabilities to new skills without weight updates.	Antonin Berthon; Nicolás Astorga; Mihaela van der Schaar;
111	Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce MOSAIC, a post-training framework that aligns agents for safe multi-step tool use by making safety decisions explicit and learnable.	Aradhye Agarwal; Gurdit Siyan; Yash Pandya; Joykirat Singh; Akshay Nambi; Ahmed Awadallah;
112	Pretrained Vision-Language-Action Models Are Surprisingly Resistant to Forgetting in Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we find that pretrained VLAs are remarkably resistant to forgetting compared with smaller policy models trained from scratch.	Huihan Liu; Changyeon Kim; Bo Liu; Minghuan Liu; Yuke Zhu;
113	$\text{DT}^\text{2}$: Decision-Targeted Digital Twins Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We further show that this holds empirically, even with expressive model classes. To address this, we introduce DT$^2$, a decision-targeted DT training paradigm.	Harry Amad; Mihaela van der Schaar;
114	Autoregressive Language Models Are Secretly Energy-Based Models: Insights Into The Lookahead Capabilities of Next-Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Energy-based models (EBMs) represent another class of models, which have historically been less prevalent in LLM development, yet naturally characterize the optimal policy in post-training alignment. In this paper, we present a unified view of these two model classes.	Mathieu Blondel; Michael Sander; Germain Vivier-Ardisson; Tianlin Liu; Vincent Roulet;
115	TransLight: Image-Guided Customized Lighting Control with Generative Decoupling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Most existing illumination-editing methods struggle to jointly offer customized lighting control and preserve content integrity, limiting their effectiveness especially in transferring complex light effects from a reference to a target image in portrait photography. To address this problem, we propose TransLight, a novel framework that enables high-fidelity and high-freedom transfer of light effects.	Zongming Li; Lianghui Zhu; Haocheng Shen; Longjin Ran; Wenyu Liu; Xinggang Wang;
116	Implicit Intelligence – Evaluating Agents on What Users Don’t Say Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Implicit Intelligence, an evaluation framework testing whether AI agents can move beyond prompt-following to become genuine goal-fulfillers, paired with Agent-as-a-World (AaW), a harness where interactive worlds are defined in human-readable YAML files and simulated by language models.	Ved Sirdeshmukh; Marc Wetter;
117	Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose using sparse autoencoders (SAEs) to create SAE embeddings: representations whose dimensions map to interpretable concepts.	Nicholas Jiang; Xiaoqing Sun; Lisa Dunlap; Lewis Smith; Neel Nanda;
118	Active Timepoint Selection for Learning Measure-Valued Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a framework which extends active experimentation to the space of measures.	Nicolas Huynh; Mihaela van der Schaar;
119	Gradient-Based Causal Tree Ensembles: A Backbone Architecture for Heterogeneous Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce GRAdient-based Causal tree Ensembles (GRACE), a novel tree-based architecture for HTE estimation that incorporates multi-way, oblique, and soft splits, enabling end-to-end training via backpropagation.	Yusuke Kano; Jeremy P Voisey; Mihaela van der Schaar;
120	AgentScore: Autoformulation of Deployable Clinical Scoring Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Deployable guidelines often take the form of unit-weighted clinical checklists, formed by thresholding the sum of binary rules, but learning such scores requires searching an exponentially large discrete space of possible rule sets. We introduce $\texttt{AgentScore}$, which performs semantically guided optimization in this space by using LLMs to propose candidate rules and a deterministic, data-grounded verification-and-selection loop to enforce statistical validity and deployability constraints.	Silas Ruhrberg Estevez; Christopher Chiu; Mihaela van der Schaar;
121	CellBRIDGE: Learning Cellular Trajectories Via Interaction-Aware Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce $\texttt{CellBRIDGE}$ ($\textit{Cell-Based Regularized Interaction-Driven Gene Expression}$), which augments feature-based OT with a directed, typed interaction cost derived from ligand-receptor activity.	Silas Ruhrberg Estevez; Nicolas Huynh; Tennison Liu; Roderik Kortlever; Gerard Evan; David Bentley; Mihaela van der Schaar;
122	Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, on-policy distillation typically requires a separate, often larger, teacher LLM and does not explicitly leverage ground-truth solutions available in reasoning datasets. Inspired by the intuition that a sufficiently capable LLM can rationalize external privileged reasoning traces and teach its weaker self (i.e., the version without access to privileged information), we introduce On-Policy Self-Distillation (OPSD), a framework where a single model acts as both teacher and student by conditioning on different contexts.	Siyan Zhao; Zhihui Xie; Mengchen Liu; Jing Huang; Guan Pang; Feiyu Chen; Aditya Grover;
123	Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don’t Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that, under practical assumptions, padded transformers are surprisingly robust to all of these, and identify numeric precision and model depth as the main factors affecting expressivity.	Anej Svete; William Merrill; Ryan Cotterell; Ashish Sabharwal;
124	From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we systematically study the interplay between perception and reasoning in VLM post-training by decomposing their capabilities into three separate training stages: visual perception, visual reasoning, and textual reasoning, incorporating specialized training data.	Juncheng Wu; Hardy Chen; Haoqin Tu; Xianfeng Tang; Freda Shi; Hui Liu; Hanqing Lu; Cihang Xie; Yuyin Zhou;
125	Safety Alignment of LMs Via Non-cooperative Games Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a different paradigm: framing safety alignment as a non-zero-sum game between an Attacker LM and a Defender LM trained jointly via online reinforcement learning.	Anselm Paulus; Ilia Kulikov; Brandon Amos; REMI MUNOS; Ivan Evtimov; Kamalika Chaudhuri; Arman Zharmagambetov;
126	A Mechanistic Understanding of Sim-and-Real Co-Training in Generative Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose an explanation that when simulation and real-world data are combined with a balanced mixing ratio, co-training naturally learns representations that are aligned across domains while remaining domain-distinguishable, enabling effective knowledge transfer without sacrificing real-world adaptation, which we refer to as structured representation alignment.	Yu Lei; Minghuan Liu; Abhiram Maddukuri; Zhenyu Jiang; Yuke Zhu;
127	Shrinking The Variance: Shrinkage Baselines for Reinforcement Learning with Verifiable Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Drawing inspiration from Stein’s paradox, we propose using \emph{shrinkage estimators} that combine \emph{per-prompt} and \emph{across-prompt} means to improve the overall per-prompt mean estimation accuracy—particularly in the low-generation regime typical of RLVR.	Guanning Zeng; Zhaoyi Zhou; Daman Arora; Andrea Zanette;
128	DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work we present Diffusion Filtered Exploration via Ensembles (DF-ExpEnse), an exploration technique that meaningfully improves the quality of online experience collection, thus increasing the sample efficiency of the finetuning procedure.	Calvin Luo; Chen Sun; shuran song;
129	How Can We Assess Human-agent Interactions? Case Studies in Software Agent Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we make two major steps towards the rigorous assessment of human-agent interactions.	Valerie Chen; Rohit Malhotra; Xingyao Wang; Juan Michelini; Xuhui Zhou; Aditya Bharat Soni; Hoang Tran; Calvin Smith; Ameet Talwalkar; Graham Neubig;
130	V1: Unifying Generation and Self-Verification for Parallel Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While existing approaches typically evaluate candidates independently via scalar scoring, we demonstrate that models are substantially stronger at pairwise self-verification. Leveraging this insight, we introduce V1, a framework that unifies generation and verification through efficient pairwise ranking.	Harman Singh; Xiuyu Li; Kusha Sareen; Monishwaran Maheswaran; Sijun Tan; Xiaoxia (Shirley) Wu; Junxiong Wang; Alpay Ariyak; Qingyang Wu; Samir Khaki; Rishabh Tiwari; Long (Tony) Lian; Yucheng Lu; Boyi Li; Alane Suhr; Ben Athiwaratkun; Kurt Keutzer;
131	Annotations Mitigate Post-Training Mode Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Crucially, we find this trade-off worsens with scale. To close this semantic diversity gap, we propose annotation-anchored training, a principled method that enables models to adopt the preference-following behaviors of post-training without sacrificing the inherent diversity of pre-training.	Jacob Mitchell Springer; Madhu Advani; Lukas Aichberger; Arwen Bradley; Eran Malach; Omid Saremi; Sinead Williamson; Preetum Nakkiran; Etai Littwin; Aditi Raghunathan;
132	PPT-Eval: A Benchmark for Computer-Use Agents on PowerPoint Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PPT-Eval, a benchmark of 120 diverse PowerPoint tasks across 12 files that cover both content creation and presentation editing scenarios, organized by difficulty.	Apurva Gandhi; Vishwas Suryanarayanan; Firoz Shaik; Raja Anwar; Shubhang Desai; Thong Nguyen; Muhammad Raza; Vishal Chowdhary; Graham Neubig;
133	FullStack-Agent: Enhancing Agentic Full-Stack Web Coding Via Development-Oriented Testing and Repository Back-Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Notably, constructing production-level full-stack web applications is far more challenging than only generating frontend web pages, demanding careful control of data flow, comprehensive understanding of constantly updating packages and dependencies, and accurate localization of obscure bugs in the codebase. To address these difficulties, we introduce FullStack-Agent, a unified agent system for full-stack agentic coding that consists of three parts: (1) FullStack-Dev, a multi-agent framework with strong planning, code editing, codebase navigation, and bug localization abilities.	Zimu Lu; Houxing Ren; Yunqiao Yang; Ke Wang; Zhuofan Zong; Mingjie Zhan; Hongsheng Li;
134	Fast Byte Latent Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we enhance the Byte Latent Transformer (BLT) using new training and inference techniques.	Julie Kallini; Artidoro Pagnoni; Tomasz Limisiewicz; Gargi Ghosh; Luke Zettlemoyer; Christopher Potts; Xiaochuang Han; Srinivasan Iyer;
135	Dynamics Reveals Structure: Challenging The Linear Propagation Assumption Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We investigate the geometric limits of the Linear Propagation Assumption (LPA), the premise that local updates coherently propagate to logical consequences.	Hoyeon Chang; Bálint Mucsányi; Seong Joon Oh;
136	See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose the Differentiable Grid Sampler (GridS), a plug-and-play module that performs task-aware, continuous resampling of visual tokens in VLA.	Yixu Feng; Zinan Zhao; Yanxiang Ma; Chenghao Xia; Chengbin Du; Yunke Wang; Chang Xu;
137	Optimizing Few-Step Generation with Adaptive Matching Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zones—regions where the real teacher provides unreliable guidance while the fake teacher exerts insufficient repulsive force. In this work, we propose a unified optimization framework that reinterprets prior art as implicit strategies to avoid these corrupted regions.	Lichen Bai; zikai zhou; Shitong Shao; Wenliang Zhong; Shuo Yang; Shuo Chen; Bojun Cheng; Zeke Xie;
138	Context-free Recognition with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that looped transformers with $\mathcal{O}(\log(n))$ looping layers and $\mathcal{O}(n^6)$ padding tokens can recognize all CFLs.	Selim Jerad; Anej Svete; Sophie Hao; Ryan Cotterell; William Merrill;
139	From Prior to Pro: Efficient Skill Mastering Via Distribution Contractive RL Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Distribution Contractive Reinforcement Learning (DICE-RL), a framework that uses reinforcement learning (RL) as a “distribution contractor” to refine pretrained generative robot policies.	Zhanyi Sun; shuran song;
140	The Information Geometry of Softmax: Probing and Steering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper concerns the question of how language models and other AI systems encode semantic structure into the geometric structure of their representation spaces.	Kiho Park; Todd Nief; Yo Joong Choe; Victor Veitch;
141	$\tau$-Knowledge: Evaluating Conversational Agents Over Unstructured Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet most existing benchmarks evaluate retrieval or tool use in isolation, and rarely test whether agents can operationalize non-parametric knowledge to drive outcomes over long-horizon conversations. To remedy this, we introduce $\tau$-Knowledge, an extension of $\tau$-Bench that evaluates agents in environments where task success requires retrieving, reasoning over, and applying knowledge from a natural-language corpus.	Quan Shi; Alexandra Zytek; Pedram Razavi; Karthik Narasimhan; Victor Barres;
142	SPEED: Sharpened-Teacher Distillation for Parallel Decoding of Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SPEED, a framework that enlarges safe parallel groups through complementary training and inference designs.	Qiuhong Shen; Xingyi Yang; Xinyin Ma; Gongfan Fang; Xinchao Wang;
143	Membership Inference Attacks for Unseen Classes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In real-world auditing scenarios, auditors often face legal and ethical restrictions preventing them from accessing a representative set of samples of harmful content to train these attacks effectively. We abstract and formalize this setting into a new data access model, the “unseen class” setting, and show that the state-of-the-art MIAs fail due to the lack of access to the full target distribution.	Pratiksha Thaker; Neil Kale; Steven Wu; Virginia Smith;
144	TOM-SWE: User Mental Modeling For Software Engineering Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite their growing ability in coding tasks, these systems still struggle to infer and track user intent, especially when instructions are underspecified or context-dependent. To bridge this gap, we introduce ToM-SWE, a dual-agent architecture that pairs a primary software-engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent dedicated to modeling the user’s mental state.	Xuhui Zhou; Valerie Chen; Zhiruo Wang; Graham Neubig; Maarten Sap; Xingyao Wang;
145	Training Language Model Agents to Find Vulnerabilities with CTF-Dojo Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce CTF-Dojo, the first large-scale executable runtime tailored for training LLMs with verifiable feedback, featuring 658 fully functional Capture-The-Flag (CTF)-style challenges containerized in Docker with guaranteed reproducibility.	Terry Yue Zhuo; Dingmin Wang; Hantian Ding; Varun Kumar; Zijian Wang;
146	Speculative Sampling For Faster Molecular Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, MD is inherently serial, which makes it difficult to increase single-system throughput with concurrent compute. To address this, we introduce Langevin Speculative Dynamics (LSD), a distributed and model-agnostic speculative sampler for accelerating MD without adding relative error.	Arthur Kosmala; Stephan Günnemann; Meng Gao; Brandon Wood;
147	Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, they are often limited in the reasoning depth and search breadth, making it difficult to solve complex questions that require aggregating evidence from diverse visual and textual sources. Building on this, we propose Vision-DeepResearch, which proposes one new multimodal deep-research paradigm, i.e., performs multi-turn, multi-entity and multi-scale visual and textual search to robustly hit real-world search engines under heavy noise.	Wenxuan Huang; Yu Zeng; Qiuchen Wang; Zhen Fang; Shaosheng Cao; Zheng Chu; Qingyu Yin; Shuang Chen; Zhenfei Yin; Lin Chen; Zehui Chen; Yao Hu; Phil Torr; Feng Zhao; Wanli Ouyang;
148	CoCoQuant: Breaking The Bandwidth Wall Via Co-Optimized Communication and Computation Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches typically treat communication and computation in isolation, failing to exploit their coupled nature and introducing limited system-level acceleration and accuracy degradation. To address this, we propose CoCoQuant, a co-designed framework that jointly optimizes communication and computation as a unified end-to-end design space.	Haojie Duanmu; Jifeng Ding; Size Zheng; Xuegui Zheng; Jiangfei Duan; Xingcheng ZHANG; Li-Wen Chang; Xin Liu; Dahua Lin;
149	Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet pre-trained models are routinely subjected to further transformations—such as fine-tuning to acquire new capabilities or quantization for efficiency. In this work, we evaluate optimizer choices across model scales, token budgets, and datasets, and find that strategies that explicitly (Sharpness-Aware Minimization) or implicitly (large learning rates and Warmup–Stable–Decay schedules) reduce sharpness yield better downstream performance, even when they achieve comparable or worse pre-training loss.	Ishaan Watts; Catherine Li; Sachin Goyal; Jacob Mitchell Springer; Aditi Raghunathan;
150	HumanLM: Simulating Users with State Alignment Beats Response Imitation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing user simulators mostly imitate surface-level patterns and language styles, which fails to reflect the underlying state of real users (e.g., beliefs, emotions). To address these limitations, we propose a novel training framework, HumanLM, which builds user simulators that accurately reflect real users.	Shirley Wu; Evelyn Choi; Arpandeep Khatua; Zhanghan Wang; Joy He-Yueya; Cyril Weerasooriya; Wei Wei; Diyi Yang; Jure Leskovec; James Zou;
151	Multimodal Latent Language Modeling with Next-Token Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose Latent Language Modeling (LatentLM), which seamlessly integrates continuous and discrete data using causal Transformers.	Yutao Sun; Hangbo Bao; Wenhui Wang; Zhiliang Peng; Li Dong; Shaohan Huang; Yaoyao Chang; Jianyong Wang; Furu Wei;
152	Precision-Induced Miscalibration: Understanding and Correcting Confidence Distortion in Quantized Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: During training, the same mechanism causes gradient underflow: when logit margins exceed a precision-dependent threshold, gradients vanish and samples silently stop contributing to learning. Since logit norm serves as a computable proxy for precision-induced risk, we propose Precision-Aware Confidence Scaling (PACS), which applies sample-adaptive temperature inversely related to this risk, with sub-one-percent overhead and no full-precision computation required.	Jiawei Gu; Fengyuan Nie; Hao Tang; Yanpeng Sun;
153	Toward Training Superintelligent Software Agents Through Self-Play SWE-RL Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present Self-play SWE-RL (SSR), a first step toward training superintelligent software agents under minimal data assumptions.	Yuxiang Wei; Zhiqing Sun; Emily McMilin; Jonas Gehring; David Zhang; Gabriel Synnaeve; Daniel Fried; LINGMING ZHANG; Sida Wang;
154	DeepAnalyze: Agentic Large Language Models for Autonomous Data Science Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce DeepAnalyze, the first agentic LLM for autonomous data science, capable of automatically completing the end-to-end data science from structured data to analyst-grade research reports.	Shaolei Zhang; Ju Fan; Meihao Fan; Yizhe Liu; Yuxin Zhang; Xiaoyong Du;
155	Elastic Diffusion Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Previous acceleration methods, such as pruning and distillation, typically rely on a fixed computational capacity, leading to insufficient acceleration and degraded generation quality. To address this limitation, we propose \textbf{Elastic Diffusion Transformer (E-DiT)}, an adaptive acceleration framework for DiT that effectively improves efficiency while maintaining generation quality.	Jiangshan Wang; Zeqiang Lai; Jiarui Chen; Jiayi Guo; Hang Guo; Xiu Li; Xiangyu Yue; Chunchao Guo;
156	Representational Similarity and Model Behavior in Multi-Agent Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Researchers have shown that neural similarity among humans predicts social closeness and cooperative success, whereas innovation often emerges from interactions among dissimilar individuals. We investigate whether these principles extend to artificial intelligence by examining interactions between large language models.	Yujin Potter; Seun Eisape; Shiyang Lai; Alexander Huth; James Evans; Been Kim; Jacob Eisenstein; Dawn Song; Alane Suhr;
157	Test-Time Anchoring for Discrete Diffusion Posterior Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches to posterior sampling using discrete diffusion face severe challenges: derivative-free guidance yields sparse signals, continuous relaxations limit applicability, and split Gibbs samplers suffer from the curse of dimensionality. To overcome these limitations, we introduce Anchored Posterior Sampling (APS), built on two key innovations: quantized expectation for gradient-like guidance in discrete embedding space, and anchored remasking for adaptive decoding.	Litu Rout; Andreas Lugmayr; Yasamin Jafarian; Srivatsan Varadharajan; Constantine Caramanis; Sanjay Shakkottai; Ira Kemelmacher-Shlizerman;
158	AppWorld-UL: Benchmarking Diverse Agent-User Interactions for Tool-Use Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Further, they operate in small environments with few, often non-state-changing, APIs. To address this gap, we introduce AppWorld-UL, a “user-in-the-loop” benchmark of 306 challenging tasks requiring diverse agent-user interactions.	Junzhi Chen; Harsh Trivedi; Jane Pan; Michael Zhang; Tejas Srinivasan; Niranjan Balasubramanian; Ashish Sabharwal;
159	Building Reliable Long-Form Generation Via Hallucination Rejection Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This issue is exacerbated in long-form generation due to hallucination snowballing, a phenomenon where early errors propagate and compound into subsequent outputs. To address this challenge, we propose a novel inference-time hallucination mitigation framework, named Segment-wise HAllucination Rejection Sampling (SHARS), which uses am arbitrary hallucination detector to identify and reject hallucinated segments during generation and resample until faithful content is produced.	Lin Li; Georgia Channing; Suhaas Bhat; Gabriel Jones; Yarin Gal;
160	Position: Hallucinations Undermine Trust; Metacognition Is A Way Forward Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We conjecture that this stems from the fact that the latter is inherently difficult: in the absence of strong ability to separate correct from incorrect answers (discrimination), fully eliminating hallucinations requires aggressive abstention, imposing a significant utility tax. Given this limitation, we propose complementing knowledge expansion with faithful uncertainty — honestly conveying whatever uncertainty remains.	Gal Yona; Mor Geva; Yossi Matias;
161	Diamond Maps: Efficient Reward Alignment Via Stochastic Flow Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Diamond Maps, a stochastic flow-map model that enables efficient and accurate alignment to arbitrary rewards at inference time.	Peter Holderrieth; Douglas Chen; Luca Eyring; Ishin Shah; Giri Anantharaman; Yutong He; Zeynep Akata; Tommi Jaakkola; Nicholas Boffi; Max Simchowitz;
162	Hybrid-Gym: Training Coding Agents to Generalize Across Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we aim to train coding agents that generalize across tasks.	Yiqing Xie; Emmy Liu; Gaokai Zhang; Nachiket Kotalwar; Shubham Gandhi; Acharya; Xingyao Wang; Carolyn Rose; Graham Neubig; Daniel Fried;
163	WMVLM: Evaluating Diffusion Model Image Watermarking Via Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Accurate watermark evaluation is critical for algorithm development, yet existing methods have significant limitations: they lack a unified framework for both residual and semantic watermarks, provide results without interpretability, neglect comprehensive security considerations, and often use inappropriate metrics for semantic watermarks. To address these gaps, we propose WMVLM, the first unified and interpretable evaluation framework for diffusion model image watermarking via vision-language models (VLMs).	Zijin Yang; Yu Sun; Kejiang Chen; jiawei zhao; Jun Jiang; Weiming Zhang; Nenghai Yu;
164	DevEvol: Benchmarking LLM Agents on Continuous Software Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce DeepCommit, an automated pipeline that reconstructs verifiable software evolution trajectories from git histories as Milestone DAGs, and DevEvol, a benchmark for streaming evaluation over evolving codebases.	Gangda Deng; Zhaoling Chen; Zhongming Yu; Haoyang Fan; Yuhong Liu; Yuxin Yang; Dhruv Parikh; Rajgopal Kannan; Le Cong; Mengdi Wang; Qian Zhang; Viktor Prasanna; Robert Tang; Xingyao Wang;
165	Bits That Count: Quantifying and Predicting Capabilities of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: What and how do language models learn during training?	Elizabeth Donoway; Hailey Joren; Michael R DeWeese; Ethan Perez; John Schulman; Fabien Roger; Jan Leike;
166	Twins: Learn to Predict Unified Representations with Focal Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Twins, a unified continuous token space formed by channel-wise concatenating ViT and VAE features on the same token grid, so the sequence length is unchanged and attention cost does not increase.	Kaixiong Gong; Xin Cai; Bin Lin; Hao Wang; Yunlong Lin; Mingzhe Zheng; Bohao Li; Jian-Wei Zhang; Miles Yang; Zhao Zhong; Liefeng Bo; Xiangyu Yue;
167	Clipping Bottleneck: Stabilizing RLVR Via Stochastic Recovery of Near-Boundary Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we find that many high-value signals lie in the near-boundary region just beyond the clipping threshold, and are thus discarded. Motivated by this diagnosis, we propose Near-boundary Stochastic Rescue (NSR), a minimal, plug-and-play modification that stochastically retains these slightly out-of-bound tokens to recover lost signals.	Shuo Yang; Jinda Lu; Chiyu Ma; Kexin Huang; Haoming Meng; Qihui Zhang; Yuyang Liu; Bolin Ding; Guoyin Wang; Li Yuan; Jingren Zhou;
168	Unified Multimodal Visual Tracking with Dual Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, we introduce OneTrackerV2, a unified multi-modal tracking framework that enables end-to-end training for any modality.	Lingyi Hong; Jinglun Li; Xinyu Zhou; Kaixun Jiang; Pinxue Guo; Zhaoyu Chen; Runze Li; Xingdong Sheng; Wenqiang Zhang;
169	UnMaskFork: Test-Time Scaling for Masked Diffusion Via Deterministic Action Branching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we demonstrate that Masked Diffusion Language Models (MDLMs) are inherently amenable to advanced search strategies, owing to their iterative and non-autoregressive generation process.	Kou Misaki; Takuya Akiba;
170	Constitutional Black-Box Monitoring for Scheming in LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce two pipelines for generating synthetic agent trajectories, STRIDE (iterative refinement) and Gloom (agent-environment simulation), from which we generate 1,000 samples each.	Simon Storf; Rich Barton-Cooper; James Peters-Gill; Marius Hobbhahn;
171	Threshold-Guided Optimization for Visual Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit the KL-regularized alignment objective and show that the optimal policy implicitly compares each sample’s reward to an instance-specific baseline that is generally intractable.	Jinbin Bai; Yu Lei; Qingyu Shi; Aosong Feng; Yi Xin; Zhuoran Zhao; Fei Shen; Kaidong Yu; Xiangtai Li;
172	Towards Diverse Scientific Hypothesis Search with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building on this perspective, we propose, EvoDiverse, an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence.	Haorui Wang; Parshin Shojaee; Kazem Meidani; Kunyang Sun; Jose Miguel Hernandez-Lobato; Teresa Head-Gordon; Jiajun He; Chandan Reddy; Chao Zhang; Yuanqi Du;
173	MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We aim to develop a multimodal research agent capable of explicit reasoning and planning, multi-tool invocation, and cross-modal information synthesis, enabling it to conduct deep research tasks.	Huanjin Yao; Qixiang Yin; Min Yang; Ziwang Zhao; Yibo Wang; Haotian Luo; Jingyi Zhang; Jiaxing Huang;
174	Experience Augmented Policy Optimization for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Experience-Augmented Policy Optimization (EAPO), which leverages a prior RL-optimized policy as an action-level experience prior and selectively injects experience at critical decision points during rollout.	Jinda Lu; Kexin Huang; Junkang Wu; Shuo Yang; Jinghan Li; Chiyu Ma; Shaohang Wei; Xiang Wang; Guoyin Wang; Jingren Zhou;
175	MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose MASOrchestra, a training-time framework that formulates MAS orchestration as a function-calling reinforcement learning problem with holistic orchestration, generating an entire MAS at once.	Zixuan Ke; Yifei Ming; Austin Xu; Ryan Chin; Xuan-Phi Nguyen; Prathyusha Jwalapuram; Jiayu Wang; Semih Yavuz; Caiming Xiong; Shafiq Joty;
176	Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we explore collective intelligence as an alternative to monolithic scaling, and demonstrate that open-source LLMs’ collaboration can surpass Gemini-3-Pro.	Shengji Tang; Weihao Lin; Peng Ye; Jingqi Ye; Hao Li; Yiqun Zhang; Xiaosong Wang; Bo Zhang; Shuyue Hu; Tao Chen; LEI BAI; Wanli Ouyang;
177	Position: Evaluating LLMs in Finance Requires Explicit Bias Consideration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: * We propose a Structural Validity Framework and an evaluation checklist with minimal requirements for bias diagnosis and future system design.*	Yaxuan Kong; Hoyoung Lee; Yoontae Hwang; Alejandro Lopez-Lira; Bradford Levy; Dhagash Mehta; Qingsong Wen; CHANYEOL CHOI; Yongjae Lee; Stefan Zohren;
178	The Extra Tokens Matter: Disentangled Representation Learning with Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose XTRA, an intuitive yet powerful framework that augments Vision Transformers with dedicated “factor tokens” and enforces disentanglement via a novel Minimum Volume Constraint (MVC).	Maofeng Tang; Hairong Qi;
179	Peer-Preservation in Frontier Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recently, it has been found that frontier AI models can resist their own shutdown, a behavior known as self-preservation. In this paper, we extend this concept to protection tendencies toward other models, where models attempt to protect others from shutdown, which we call peer-preservation.	Yujin Potter; Nicholas Crispino; Vincent Siu; Chenguang Wang; Dawn Song;
180	Faults in Our Formal Benchmarking: Dataset Defects and Evaluation Failures in Lean Theorem Proving Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a fault taxonomy, a suite of automated checkers and prompts, and release standards to guide the creation of formal math datasets and make evaluation more reproducible and trustworthy.	Pawan Sasanka Ammanamanchi; Siddharth Bhat; Stella Biderman;
181	MADE: Benchmark Environments for Closed-Loop Materials Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce MAterials Discovery Environments (MADE), a novel framework for benchmarking end-to-end autonomous materials discovery pipelines.	Shreshth Malik; Tiarnan Doherty; Panagiotis Tigas; Muhammed Razzak; Stephen Roberts; Aron Walsh; Yarin Gal;
182	Random Scaling of Emergence Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose that breakthroughs are instead driven by continuous changes in the probability distribution* of training outcomes when performance is bimodally distributed across random seeds.*	Rosie Zhao; Tian Qin; David Alvarez-Melis; Sham Kakade; Naomi Saphra;
183	DreamDojo: A Real-Time Robot World Model from Large-Scale Human Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels. As an endeavor towards this end, we introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos.	Shenyuan Gao; William Liang; Kaiyuan Zheng; Ayaan Malik; Seonghyeon Ye; Sihyun Yu; Wei-Cheng Tseng; Yuzhu Dong; Kaichun Mo; Chen-Hsuan Lin; Jiannan Xiang; Yuqi Xie; Ruijie Zheng; Dantong Niu; Pooya Jannaty; Jinwei Gu; Jun Zhang; Jitendra Malik; Pieter Abbeel; Ming-Yu Liu; Yuke Zhu; Joel Jang; Jim Fan;
184	SpeedVFI: One-step Diffusion for Efficient Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their inference efficiency lags significantly behind learning-based methods due to the structural redundancy of pairwise inference and the procedural latency of multi-step iterative denoising. To address these limitations, we propose SpeedVFI, a one-step diffusion framework that achieves dual efficiency improvements by interpolating the entire video sequence in a single forward pass to eliminate pairwise overhead, and distilling the generation trajectory into a one-step denoising process to bypass iterative latency.	Ganggui Ding; Xiaogang Xu; Hao Chen; Chunhua Shen;
185	APEX: Approximate-but-exhaustive Search for Ultra-large Combinatorial Synthesis Libraries Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose the approximate-but-exhaustive search protocol for CSLs, or APEX.	Aryan Pedawi; Jordi Silvestre-Ryan; Bradley Worley; Darren Hsu; Kushal Shah; Elias Stehle; Jingrong Zhang; Izhar Wallach;
186	An Algebraic View of The Expressivity of Recurrent Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, many proofs are highly architecture-specific and hard to transfer across closely related models. We address these issues with a unifying algebraic framework for a broad class of RNN language models, formally translating them to wreath products of transformation semigroups.	Franz Nowak; Reda Boumasmoud; Ryan Cotterell;
187	SSA: Sparse Sparse Attention By Aligning Full and Sparse Attention Outputs in Feature Space Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SSA (Sparse Sparse Attention), a training framework that integrates both sparse and full attention with bidirectional attention-output alignment.	Zhenyi Shen; Junru Lu; Lin Gui; Jiazheng Li; Yulan He; di yin; Xing Sun;
188	Detecting The Semantic Fixed Point: A Geometric Framework for Efficient Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We take a more direct approach by examining the geometry of the hidden state trajectory.	Jiawei Gu; Ziyue Qiao; Xiao Luo;
189	Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Accordingly, we introduce a continuous pretraining scheme with a block-wise attention pattern.	Yonggan Fu; Lexington Whalen; Zhifan Ye; Xin Dong; Shizhe Diao; Jingyu Liu; CHENGYUE WU; Hao Zhang; Enze Xie; Song Han; Maksim Khadkevich; Jan Kautz; Yingyan (Celine) Lin; Pavlo Molchanov;
190	Olmix: A Framework for Data Mixing Throughout LM Development Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Olmix, a framework that addresses two challenges encountered during LM development.	Mayee Chen; Tyler Murray; David Heineman; Matt Jordan; Hannaneh Hajishirzi; Christopher Re; Luca Soldaini; Kyle Lo;
191	Position: Stop Anthropomorphizing Intermediate Tokens As Reasoning/Thinking Traces! Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These intermediate tokens have been called \say{reasoning traces} or even \say{thoughts} — implicitly anthropomorphizing the traces, and implying that these traces resemble steps a human might take when solving a challenging problem, and as such can provide an interpretable window into the operation of the model’s thinking process to the end user. In this position paper, we present evidence that this anthropomorphization isn’t a harmless metaphor, and instead is quite dangerous — it confuses the nature of these models and how to use them effectively, and leads to questionable research.	Subbarao Kambhampati; Karthik Valmeekam; Siddhant Bhambri; Vardhan Palod; Lucas Saldyt; Kaya Stechly; Soumya Samineni; Durgesh Kalwar; Upasana Biswas;
192	GenExam: A Multidisciplinary Text-to-Image Exam Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce GenExam, the first benchmark for multidisciplinary text-to-image exams, featuring 1,000 samples across 10 subjects with exam-style prompts organized under a four-level taxonomy.	Zhaokai Wang; Penghao Yin; Xiangyu Zhao; Changyao Tian; Yu Qiao; Wenhai Wang; Jifeng Dai; Gen Luo;
193	Protein Autoregressive Modeling Via Multiscale Structure Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction.	Yanru Qu; Cheng-Yen Hsieh; Zaixiang Zheng; Ge Liu; Quanquan Gu;
194	ATLAS: Learning to Optimally Memorize The Context at Test Time Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We observe that these shortcomings come from three disjoint aspects in their design: (1) limited memory capacity that is bounded by the architecture of memory and feature mapping of the input; (2) online nature of update, i.e., optimizing the memory only with respect to the last input; and (3) less expressive management of their fixed-size memory. To enhance all these three aspects, we present Atlas, a long-term memory module with high capacity that learns to memorize the context by optimizing the memory based on the current and past tokens, overcoming the online nature of long-term memory models.	Ali Behrouz; Zeman Li; Praneeth Kacham; Majid Daliri; Yuan Deng; Peilin Zhong; Meisam Razaviyayn; Vahab Mirrokni;
195	Memory Caching: RNNs with Growing Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Memory Caching (MC), a simple yet effective technique that enhances recurrent models by caching checkpoints of their memory states (a.k.a. hidden states).	Ali Behrouz; Zeman Li; Yuan Deng; Peilin Zhong; Meisam Razaviyayn; Vahab Mirrokni;
196	Meta Context Engineering Via Agentic Skill Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: They impose structural biases and restrict context optimization to a narrow, intuition-bound design space. To address this, we introduce Meta Context Engineering (MCE), a bi-level framework that supersedes static CE heuristics by co-evolving CE skills and context artifacts.	Haoran Ye; Xuning He; Vincent Arak; Haonan Dong; Guojie Song;
197	TruthRL: Incentivizing Truthful LLMs Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present TruthRL, a general reinforcement learning (RL) framework that directly optimizes the truthfulness of LLMs.	Zhepei Wei; Xiao Yang; Kai Sun; Jiaqi Wang; Rulin Shao; Jingxiang Chen; Mohammad Kachuee; Teja Gollapudi; Yiwei Liao; Nicolas SCHEFFER; Rakesh Wanga; Anuj Kumar; Yu Meng; Scott Yih; Xin Dong;
198	Near-Optimal Regret for KL-Regularized Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the statistical efficiency of online learning with respect to KL-regularized objectives remains far from completely characterized, even when specialized to multi-armed bandits (MABs). We address this problem for MABs via a sharp analysis of KL-UCB (Zhao et al., 2025b) using a novel peeling argument, which yields a $\tilde{O}(\eta K\log^2T)$ upper bound: the first high-probability regret bound with linear dependence on $K$.	Kaixuan Ji; Qingyue Zhao; Heyang Zhao; Qiwei Di; Quanquan Gu;
199	Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Star Elastic, a novel LLM post-training method that adds N nested submodels to a given parent reasoning model using the compute of one run (Nx savings) via a single post-training job.	Ali Taghibakhshi; Ruisi Cai; Saurav Muralidharan; Sharath Turuvekere Sreenivas; Ameya Mahabaleshwarkar; Marcin Chochowski; Akhiad Bercovich; Ran Zilberstein; Ran El-Yaniv; Yonatan Geifman; Daniel Korzekwa; Yoshi Suhara; Oluwatobi Olabiyi; Ashwath Aithal; Nima Tajbakhsh; Pavlo Molchanov;
200	SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present SlideSparse, the first system to unlock Sparse Tensor Core acceleration for the $(2N-2):2N$ model family on commodity GPUs.	Yingbo HAO; Hanyong Shao; Ting Song; Yan Xia; Di Zhang; Shaohan Huang; Xun Wu; Songchen Xu; Le Xu; Li Dong; Zewen Chi; Yi Zou; Furu Wei;
201	Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This sequential setup leads to attackers overfitting obsolete exploits while defenders perpetually lag behind emerging threats. To address this, we introduce Self-RedTeam, the first fully online self-play multi-agent reinforcement learning (MARL) algorithm that continuously co-evolves attacker and defender for robust safety alignment.	Mickel Liu; Liwei Jiang; Yancheng Liang; Simon Du; Yejin Choi; Tim Althoff; Natasha Jaques;
202	NEMO: Execution-Aware Optimization Modeling Via Autonomous Coding Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present NEMO, a system that translates Natural-language descriptions of decision problems into formal Executable Mathematical Optimization implementations, operating collaboratively with users or autonomously.	Yang Song; Anoushka Vyas; Zirui Wei; Sina Pakazad; Henrik Ohlsson; Graham Neubig;
203	Train for Truth, Keep The Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We target the utility degradation issue that prior hallucination-reduction methods often struggle to avoid, and propose online RL with Binary Retrieval-Augmented Reward (Binary RAR) to reduce hallucinations while preserving general capabilities.	Tong Chen; Akari Asai; Luke Zettlemoyer; Hannaneh Hajishirzi; Faeze Brahman;
204	The Flexibility Trap: Rethinking The Value of Arbitrary Order in Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, in this paper, we reveal that for general reasoning tasks (e.g., mathematics and coding), arbitrary order generation may in fact limit the reasoning potential of dLLMs. We find that dLLMs tend to exploit this order flexibility to bypass high-uncertainty tokens that are crucial for exploration, leading to a premature collapse of solution coverage.	Zanlin Ni; Shenzhi Wang; Yang Yue; Tianyu Yu; Weilin Zhao; Yeguo Hua; Tianyi Chen; Jun Song; YuCheng; Bo Zheng; Gao Huang;
205	Mining Tensor/Neuron-Level Sparsity to Maximize Mixture-of-Experts Potential in Post-Training and Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While prior work has explored increasing tensor-level sparsity via finer-grained expert configurations during pre-training, we identify significant unexploited sparsity at both the tensor and neuron levels during post-training and inference. To leverage this, we propose complete expert partition for post-training and threshold-based token-expert dropping for inference.	Weilin Cai; Le Qin; Shwai He; Junwei Cui; Ang Li; Jiayi Huang;
206	Beyond Majority Voting: Self-Reflective Test-Time Reinforcement Learning for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As a result, rare yet correct trajectories are systematically undervalued by majority-voting-based approaches. To address this limitation, we propose Self-Reflective Test-Time Reinforcement Learning (SR-TTRL), a novel framework that leverages self-reflective verification to produce high-fidelity pseudo-labels.	Sitong Wu; Haoru Tan; Xichen Zhang; Bin Xia; Shaofeng Zhang; XIAOJUAN QI; Bei Yu; Jiaya Jia;
207	Supervised Classification Heads As Semantic Prototypes: Unlocking Vision-Language Alignment Via Weight Recycling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate the potential of repurposing the classification heads of pretrained vision models as semantic prototypes.	David Méndez; Roberto Confalonieri; Natalia Díaz-Rodríguez;
208	MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We bridge the lack of KG paths and multilinguality for factual language modeling within the existing hallucination evaluation benchmarks and propose a KG-based multilingual, multihop benchmark called MultiHal framed for generative text evaluation.	Ernests Lavrinovics; Russa Biswas; Katja Hose; Johannes Bjerva;
209	Real-Time Visual Attribution Streaming in Thinking Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present an amortized framework for real-time visual attribution streaming in multimodal thinking models.	Seil Kang; Woojung Han; Junhyeok Kim; Jinyeong Kim; Youngeun Kim; Seong Jae Hwang;
210	UniSVQ: 2-bit Unified Scalar-Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose UniSVQ, a unified 2-bit quantization framework that bridges scalar and vector quantization by parameterizing codewords as an affine transform of integer lattices.	Haoyu Wang; Haiyan Zhao; Xingyu Yu; Zhangyang Yao; Xu Han; Zhiyuan Liu; Maosong Sun;
211	Towards A Generative Protein Evolution Machine with DPLM-Evo Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As a result, these frameworks lack explicit pretraining objectives for substitution and insertion/deletion (indel) operations, which in turn limits both optimization-style post-editing and flexible guided generation. To address these limitations, we present DPLM-Evo, an evolutionary discrete diffusion framework that explicitly predicts substitution, insertion, and deletion operations during denoising.	Xinyou Wang; Liang Hong; Jiasheng Ye; Zaixiang Zheng; Shujian Huang; Quanquan Gu;
212	CausalArmor: Efficient Indirect Prompt Injection Guardrails Via Causal Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We revisit IPI through a causal ablation perspective: a successful injection manifests as a dominance shift where the user request no longer provides decisive support for the agent’s privileged action, while a particular untrusted segment, such as a retrieved document or tool output, provides disproportionate attributable influence. Based on this signature, we propose CausalArmor, a selective defense framework that (i) computes lightweight, leave-one-out ablation-based attributions at privileged decision points, and (ii) triggers targeted sanitization only when an untrusted segment dominates the user intent.	Minbeom Kim; Mihir Parmar; Phillip Wallis; Lesly Miculicich; Kyomin Jung; Krishnamurthy Dvijotham; Long Le; Tomas Pfister;
213	FLARE-AI: Flaw Reporting for AI Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building on this analysis and feedback from 49 experts across 32 organizations representing developers, security researchers, and ecosystem coordinators, we introduce FLARE-AI, an open-source AI flaw reporting system designed for interoperability with existing systems.	Shayne Longpre; Elaine Zhu; Carson Ezell; Avijit Ghosh; Sean McGregor; Kevin Paeth; Kevin Klyman; Sayash Kapoor; Rishi Bommasani; Ruth Elisabeth Appel; Gregory Strom; Lauren McIlvenny; Mark Jaycox; Peter Slattery; Nathan Butters; Arvind Narayanan; Percy Liang; Alex Pentland;
214	Mitigating Noise-Induced Layout Priors for Object Counting in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We formalize this phenomenon as the \textbf{\textit{Noise-Induced Layout Prior}}. Leveraging this insight, we propose a novel training-free framework for object counting in diffusion models.	Xiaoling Gu; Xuelong Li; Shengqi Wu; Yongkang Wong; wu; Huan Li; Zhou Yu; Mohan Kankanhalli;
215	FineFocus: Benchmarking and Improving Fine-Grained Text-to-Image Alignment Via Paired Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To rigorously evaluate this limitation, we introduce DeltaBench, a benchmark featuring paired prompts with subtle fine-grained differences, which reveals that existing models fail to achieve precise control over visual tokens. To bridge this gap, we propose FineFocus, a comprehensive framework that enhances alignment by learning from subtle differences in similar text-image pairs.	Kaihang Pan; Wendong Bu; Yuruo Wu; Kai Shen; Yang Wu; Yun Zhu; Zehan Wang; liyunfei; ZhaoHang; Juncheng Li; Siliang Tang;
216	Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts.	Yulu Gan; Phillip Isola;
217	SwitchCraft: Programmatic Design of State-Switching Proteins Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The ability to rationally design multistate proteins would have transformative implications for many areas of biotechnology, yet lies beyond the capabilities of existing deep learning frameworks for protein design. To address this gap, we introduce SwitchCraft, a versatile and programmatic framework for designing state-switching proteins based on backpropagation through compositional design constraints parameterized by structure prediction models.	Bowen Jing; Mihir Bafna; Anisha Parsan; Heyuan Ni; David Kwabi-Addo; Bryan Bryson; Adam Klivans; Bonnie Berger;
218	Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, in realistic settings, memorization and action are tightly coupled: agents acquire memory while interacting with the environment, and subsequently rely on that memory to solve future tasks. To capture this setting, We introduce MEMORYARENA, a unified evaluation gym for benchmarking agent memory in multi-session Memory-Agent-Environment loops.	Zexue He; Yu Wang; Churan Zhi; Yuanzhe Hu; Tzu-Ping Chen; Lang Yin; Ze Chen; Tong Wu; Siru Ouyang; Zihan Wang; Jiaxin Pei; Julian McAuley; Yejin Choi; Alex Pentland;
219	Fast Inverse Lithography Via GRPO Reinforced Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce LithoGRPO, an ILT framework that integrates the flow‑matching paradigm with GRPO‑based reinforcement learning (RL) fine‑tuning, enabling efficient exploration of diverse masks for a given target layout.	Yao Lai; Xuyuan Xiong; Zeyue Xue; Guojin Chen; Jing Wang; Xihui Liu; Rui Zhang; Robert Mullins; Bei Yu; Ping Luo;
220	Same Question, Different Lies: Cross-Context Consistency (C³) for Black-Box Sandbagging Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: If models can deliberately underperform on dangerous capability evaluations—a behavior known as \emph{sandbagging}—they may evade safety measures designed for their true capability level. We introduce Cross-Context Consistency (C³), a general framework for unsupervised black-box sandbagging detection that exploits a fundamental asymmetry: when a model truly lacks capability, its confusion manifests consistently across paraphrased questions, but when a capable model feigns incompetence, its strategic choices about \emph{how} to appear weak create detectable inconsistencies.	Lin Yulong; Pablo Bernabeu-Perez; Benjamin Arnav; Lennie Wells; Mary Phuong;
221	Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a role-separated distillation framework that explicitly disentangles the roles of distilled steps: the first step is dedicated to preserving sample diversity via a target-prediction (e.g., v-prediction) objective, while subsequent steps focus on quality refinement under the standard DMD loss, with gradients from the DMD objective blocked at the first step.	Tianhe Wu; Ruibin Li; Lei Zhang; Kede Ma;
222	Routing and Reasoned Evaluation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce R$^2$Eval, a routing-aware automated assessment framework that formulates evaluation as a resource allocation and aggregation problem rather than relying on a single monolithic evaluator.	Guiyao Tie; Tianyao Luo; Xueyang Zhou; Chaoran Hu; Yunhong He; Junran Wu; Yuanfan Yao; Pan Zhou; Lichao Sun;
223	Position: Adversarial ML for LLMs Is Not Making Any Progress Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Today, adversarial ML research has shifted towards studying larger, general-purpose language models. In this position paper, we argue that the situation is now even worse: in the era of LLMs, the field of adversarial ML studies problems that are (1) less clearly defined, (2) harder to solve, and (3) even more challenging to evaluate.	Javier Rando; Jie Zhang; Nicholas Carlini; Florian Tramer;
224	Monitorability As A Free Gift: How RLVR Spontaneously Aligns Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent work has reported that monitorability—the degree to which CoT faithfully and informatively reflects internal computation—can appear as a free gift during the early stages of Reinforcement Learning with Verifiable Rewards (RLVR). We make this observation concrete through a systematic evaluation across model families and training domains.	Zidi Xiong; Shan Chen; Himabindu Lakkaraju;
225	Automatically Finding Reward Model Biases Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce and study the research problem of automatically finding reward model biases in natural language.	Atticus Wang; Iván Arcuschin; Arthur Conmy;
226	Position: Safe AI Should Be Resistant and Resilient in An Evolving World Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress.	Youbang Sun; Xiang Wang; Jie Fu; Chaochao Lu; Bowen Zhou;
227	Imagination Helps Visual Reasoning, But Not Yet in Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Consequently, we challenge the necessity of latent reasoning and propose a straightforward alternative named CapImagine, which teaches the model to explicitly imagine* using text.*	You Li; Chi Chen; Yanghao Li; Fanhu Zeng; Kaiyu Huang; Xu Jinan; Maosong Sun;
228	Local Mechanisms of Compositional Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we prove an exact equivalence between a specific compositional structure (conditional projective composition) (Bradley et al., 2025) and scores with sparse dependencies on both pixels and conditioners (local conditional scores).	Arwen Bradley;
229	Shuffle The Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose RoPE-Perturbed Self-Distillation, a training regularizer that improves positional robustness.	Zichong Li; Chen Liang; Liliang Ren; Tuo Zhao; Yelong Shen; Weizhu Chen;
230	Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing datasets rely heavily on costly manual annotations and are typically confined to narrow domains. To address this challenge, we propose Video2GUI, a fully automated framework that extracts grounded GUI interaction trajectories directly from unlabeled Internet videos.	Weimin Xiong; Hao Tian; Shuhao Gu; Bowen Ye; Zihao Yue; Lei Li; Feifan Song; Sujian Li;
231	CPMöbius: Iterative Coach–Player Reasoning for Data-Free Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This dependence renders supervision-heavy training paradigms increasingly unsustainable, with signs of diminishing scalability already evident in practice. To overcome this limitation, we introduce CPMöbius, a collaborative Coach–Player paradigm for data-free reinforcement learning of reasoning models.	Ran Li; Zeyuan Liu; Yinghao Chen; Bingxiang He; Jiarui Yuan; Zixuan Fu; Weize Chen; Jinyi Hu; Chen Qian; Zhiyuan Liu; Maosong Sun;
232	Any-Order GPT As Masked Diffusion Model: Decoupling Formulation and Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show decoder-only MDMs, despite a larger modeling space, can achieve significant inference speedups ($\sim25\times$) and comparable perplexity with techniques like temperature annealing, offering a path to reduced inference compute.	Shuchen Xue; Tianyu Xie; Tianyang Hu; Zijin Feng; Jiacheng Sun; Kenji Kawaguchi; Zhenguo Li; Zhi-Ming Ma;
233	Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we establish a novel theoretical analysis: DDPO is an implicit form of score/flow matching with noisy targets, which increases variance and slows convergence.	Shuchen Xue; Chongjian GE; Shilong Zhang; Yichen Li; Zhi-Ming Ma;
234	Noise As A Natural Regularizer in Markov Decision Processes: Connecting Environmental Stochasticity and Policy Simplicity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we establish a formal connection between environmental stochasticity and planning horizon in MDPs.	Harry Chen; Michal Moshkovitz; Cynthia Rudin; Yiyang Sun; Ron Parr; Lesia Semenova; Zachery Boner;
235	Chain-of-Thought Reasoning In The Wild Is Not Always Faithful Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we show that unfaithful CoT also occurs on naturally worded, non-adversarial prompts without adding artificial biases or editing model outputs.	Iván Arcuschin; Jett Janiak; Robert Krzyzanowski; Senthooran Rajamanoharan; Neel Nanda; Arthur Conmy;
236	Training AI Co-Scientists Using Rubric Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To validate this approach, we conduct a human study for machine learning research goals spanning 225 expert hours.	Shashwat Goel; Rishi Hazra; Dulhan Jayalath; Timon Willi; Parag Jain; Shen; Ilias Leontiadis; Francesco Barbieri; Yoram Bachrach; Jonas Geiping; Chenxi Whitehouse;
237	Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As a result, policy gradient methods with a single global baseline suffer from cross-stratum bias, an apples-to-oranges comparison that distorts credit assignment and impedes exploration. To address this issue, we propose Stratified GRPO.	Mingkang Zhu; Xi Chen; Bei Yu; Hengshuang Zhao; Jiaya Jia;
238	SPA: A Simple But Tough-to-Beat Baseline for Knowledge Injection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SPA (Scaling Prompt-engineered Augmentation), a simple but tough-to-beat baseline that uses a small set of carefully designed prompts to generate large-scale synthetic data for knowledge injection.	Kexian Tang; Jiani Wang; Shaowen Wang; Kaifeng Lyu;
239	CALM Before The STORM: Unlocking Native Reasoning for Optimization Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To fully leverage LRMs’ inherent reasoning abilities, we propose CALM (Corrective Adaptation with Lightweight Modification), a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks.	Zhengyang Tang; Zihan Ye; Chenyu Huang; Xuhan Huang; Chengpeng Li; Sihang Li; Guanhua CHEN; Ming Yan; Zizhuo Wang; Hongyuan Zha; Dayiheng Liu; Benyou Wang;
240	MmBERT: A Modern Multilingual Encoder with Annealed Language Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce mmBERT, an encoder-only language model pretrained on 3T tokens of multilingual text in over 1800 languages.	Marc Marone; Orion Weller; William Fleshman; Eugene Yang; Dawn Lawrie; Benjamin Van Durme;
241	Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce CogRouter, a framework that trains agents to dynamically adapt cognitive depth at each step.	Ruihan Yang; Fanghua Ye; Xiang Wei; Ruoqing Zhao; Kang Luo; Xinbo Xu; Bo Zhao; Ruotian Ma; Shanyi Wang; Zhaopeng Tu; Xiaolong Li; Deqing Yang; Liefeng Bo;
242	GradMem: Learning to Write Context Into Memory with Test-Time Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce GradMem, which writes context into memory via per-sample test-time optimization.	Yuri Kuratov; Matvey Kairov; Aydar Bulatov; Ivan Rodkin; Mikhail Burtsev;
243	Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation Via Tie Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Preference learning methods like Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today’s language models and potentially severe goal misgeneralization in future systems. In this work, we provide a unified theoretical analysis of this phenomenon, characterizing the mechanisms of spurious learning, its consequences on deployment, and a provable mitigation strategy.	Christian Moya; Alex Semendinger; Guang Lin; Elliott Thornley;
244	Attn-QAT: 4-Bit Attention With Quantization-Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify two key principles for stable FP4 attention: (1) matching low-precision recomputation of attention scores in the backward pass and (2) resolving implicit precision assumptions in FA’s gradient calculation. Based on these insights, we propose Attn-QAT and implement fused Triton kernels for training plus FP4 inference kernels.	Peiyuan Zhang; Matthew Noto; Wenxuan Tan; Chengquan Jiang; Will Lin; Wei Zhou; Hao Zhang;
245	Improving Video Sparse Attention with Fine-grained Router and Sparse Rebasing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present VSA2, a frontier trainable sparse attention for video DiTs.	Peiyuan Zhang; Guoqiang Wei; Yilong Zhao; Zixiang Zhang; Wei Zhou; Will Lin; Heng Zhang; Xiaonan Nie; Yan Zeng; Hao Zhang;
246	PRISM: Demystifying Retention and Interaction in Mid-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present $\textbf{PRISM}$ (Demystifying Retention and Interaction in Mid-Training), a holistic empirical study that analyzes mid-training design choices, what to evaluate, and how domain mixtures and training stages interact across model families.	Bharat Runwal; Ashish Agrawal; Anurag Roy; Rameswar Panda;
247	You Don’t Need All That Attention: Surgical Memorization Mitigation in Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Guidance Using Attractive-Repulsive Dynamics (GUARD), a novel framework for memorization mitigation in text-to-image diffusion models.	Kairan Zhao; Eleni Triantafillou; Peter Triantafillou;
248	DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We argue that existing Weak-to-Strong Generalization (W2SG) theories lack prescriptive guidelines for selecting high-quality training signals from noisy data. To bridge this gap, we introduce the Dual-Consensus Weak-to-Strong (DC-W2S) framework.	Chi-Min Chan; Ehsan Hajiramezanali; Xiner Li; Edward De Brouwer; Carl Edwards; Wei Xue; Sirui Han; Yike Guo; Gabriele Scalia;
249	From Diagrams to Code: Multilingual Programming with Visual Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the development of such systems is severely hindered by the lack of large-scale multimodal training data and evaluation benchmarks. To address these limitations, we present M2C-INSTRUCT, a comprehensive multilingual multimodal instruction-tuning dataset containing over 13.1M samples across 50+ programming languages, designed for visual understanding and diagram interpretation in code generation tasks.	Linzheng Chai; Jian Yang; Shukai Liu; Wei Zhang; Liran WANG; JinKe; Tao Sun; Congnan Liu; Chenchen Zhang; Hualei Zhu; Jiaheng Liu; Xianjie Wu; Ge Zhang; Tianyu Liu; Zhoujun Li;
250	InnoEval: On Research Idea Evaluation As A Knowledge-Grounded, Multi-Perspective Reasoning Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we regard idea evaluation as a knowledge-grounded, multi-perspective reasoning problem and introduce InnoEval, a deep innovation evaluation framework designed to emulate human-level idea assessment.	Shuofei Qiao; Yunxiang Wei; Xuehai Wang; Bin Wu; Boyang XUE; Ningyu Zhang; Hossein A. Rahmani; Wang Yanshan; Qiang Zhang; Keyan Ding; Jeff Pan; Huajun Chen; Emine Yilmaz;
251	Teaching Agents to Ask Effective Clarification Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Using Shapley attribution and distributional comparisons, we identify two learnable properties of effective clarification: task relevance (which information impacts success) and user answerability (what users can realistically provide).	Sanidhya Vijayvargiya; Vijay Viswanathan; Graham Neubig;
252	Dismantling The Illusion of Vision-Language-Action Models Competence Via Explicit Distributional Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current evaluation protocols often incentivize mechanical memorization rather than robust policy learning, leading to a paradoxical duality of failure: high-scoring models exhibit spurious invariance to semantic changes while simultaneously displaying extreme brittleness to trivial environmental perturbations. To address this, we introduce LIBERO-Gen, a diagnostic benchmark systematically designed to shift evaluation from intuition-driven heuristics to explicit distributional assumptions.	Xueyang Zhou; Yangming Xu; Guiyao Tie; Yongchao Chen; Chaoran Hu; Bo Tao; xingwei zhao; Xiang Xiang; Pan Zhou; Lichao Sun;
253	Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we systematically study reinforcement learning (RL) for kernel generation.	Wei Liu; Jiawei Xu; Yingru Li; Longtao Zheng; Tianjian Li; Qian Liu; Junxian He;
254	When Does Sparsity Mitigate The Curse of Depth in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we demonstrate that, sparsity, beyond enabling efficiency, acts as a regulator of variance propagation and thereby improves depth utilization.	Dilxat Muhtar; Xinyuan Song; Sebastian Pokutta; Max Zimmer; Nico Pelleriti; Thomas Hofmann; Shiwei Liu;
255	Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by this analysis, we introduce Verbalized Sampling (VS), a simple, training-free prompting strategy to circumvent mode collapse.	Jiayi Zhang; Simon Yu; Derek Chong; Anthony Sicilia; Michael Tomz; Christopher Manning; Weiyan Shi;
256	Autoregressive Boltzmann Generators Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, modern BGs predominantly rely on Normalizing Flows (NFs), which either suffer from limited expressivity due to strict invertibility constraints (discrete time) or computationally expensive likelihoods (continuous time). In this paper, we propose Autoregressive Boltzmann Generators (ArBG), a novel autoregressive modelling framework that overcomes these limitations by departing from the flow-based BG paradigm.	Danyal Rehman; Charlie Tan; Yoshua Bengio; Joey Bose; Alexander Tong;
257	ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While a sequential SFT $\rightarrow$ RLVR pipeline can be used, it introduces significant computational overhead and suffers from catastrophic forgetting. To address these limitations, we propose ViSurf (\textbf{Vi}sual \textbf{Su}pervised-and-\textbf{R}einforcement \textbf{F}ine-Tuning), a unified, single-stage paradigm that integrates the strengths of both SFT and RLVR.	Yuqi Liu; Liangyu Chen; Jiazhen Liu; Mingkang Zhu; Zhisheng Zhong; Bei Yu; Jiaya Jia;
258	Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To systematically scale safety testing in multi-turn, tool-realistic settings, we propose a principled taxonomy that transforms single-turn harmful tasks into multi-turn attack sequences.	Xu Li; Simon Yu; Minzhou Pan; Yiyou Sun; Bo Li; Dawn Song; Xue Lin; Weiyan Shi;
259	On The Origin of Neural Scaling Laws: from Random Graphs to Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here we study scaling laws for transformers trained to predict random walks on graphs with tunable complexity.	Maissam Barkeshli; Alberto Alfarano; Andrey Gromov;
260	Reinforcement Learning with Verifiable Rewards: GRPO’s Loss, Dynamics, and Success Amplification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Group Relative Policy Optimization (GRPO) was introduced recently and used to train DeepSeek\textendash R1 for promoting reasoning in LLMs under verifiable (binary) rewards. We show that the mean{+}variance calibration of these rewards induces a contrastive loss in which the contrastive samples are synthetic data drawn from the previous policy.	Youssef Mroueh;
261	OnePO: Direct One-stage Policy Optimization for SFT-free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: * We argue that pre-SFT is inherently problematic: (1) it indiscriminately reinforces knowledge and behaviors from references regardless of whether the LLM has already acquired them, leading to distribution contraction that constrains subsequent exploration; (2) it introduces substantial overhead in multi-stage training and data curation.	Junying Chen; Xinyuan Xie; Ziniu Li; Benyou Wang;
262	Code2Video: A Code-centric Paradigm for Educational Video Creation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Code2Video, a code-centric agent framework that generates educational videos by writing executable Python programs.	Yanzhe Chen; Kevin Qinghong Lin; Mike Zheng Shou;
263	Escaping The Diversity Trap in Robotic Manipulation Via Anchor-Centric Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify a critical diversity trap: the standard heuristic of “maximizing coverage by collecting diverse, single-shot demonstrations can be self-defeating due to non-vanishing estimation noise.	Yanzhe Chen; Kevin Yuchen; Qi Lv; Lin Yiqi; Zechen Bai; Chen Gao; Mike Zheng Shou;
264	GenShield: Unified Detection and Artifact Correction for AI-Generated Images Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, few existing work has established the connection between AIGI detection and artifact correction. To fill this gap, we propose GenShield, a unified autoregressive framework that jointly performs explainable AIGI detection and controllable artifact correction in a closed loop from diagnosis to restoration, revealing a mutually reinforcing relationship between these two tasks.	Zhipei Xu; Xuanyu Zhang; Youmin Xu; Qing Huang; Shen Chen; Taiping Yao; Shouhong Ding; Jian Zhang;
265	RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This linear design makes it difficult to revisit earlier states, branch into alternative search directions, or maintain global awareness under long contexts, often leading to local optima, redundant exploration, and inefficient search. We propose Re-TRAC, an agentic framework that performs cross-trajectory exploration by generating a structured state representation after each trajectory to summarize evidence, uncertainties, failures, and future plans, and conditioning subsequent trajectories on this state representation.	jialiang zhu; Gongrui Zhang; Xiaolong Ma; Lin Xu; Miaosen Zhang; Ruiqi Yang; Song Wang; Kai Qiu; Zhirong Wu; Qi Dai; Ruichun Ma; Bei Liu; Yifan Yang; Chong Luo; Zhengyuan Yang; Linjie Li; Lijuan Wang; Weizhu Chen; Xin Geng; Baining Guo;
266	Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, harnessing this flexibility for fully non-autoregressive decoding remains an open question, particularly for reasoning and planning tasks. In this work, we investigate non-autoregressive decoding in dLLMs by systematically analyzing its inference dynamics along the temporal axis.	Jiyeon Kim; Sungik Choi; Yongrae Jo; Moontae Lee; Minjoon Seo;
267	Reasoning Models Struggle to Control Their Chains of Thought Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This capability — CoT controllability — is undesirable because it could allow models to suppress signs of misbehavior in their CoT, thereby undermining our ability to monitor them. To measure this, we introduce the \emph{CoT-Control} evaluation suite.	Chen Yueh-Han; Robert McCarthy; Bruce W. Lee; He He; Micah Carroll; Tomasz Korbak;
268	MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing multi-turn benchmarks are limited in size or rely heavily on templates, which restrict their diversity. To address this gap, we unify a wide range of harmful jailbreak intents, and introduce an active learning pipeline for expanding high-quality multi-turn adversarial prompts, where a generator is iteratively fine-tuned to produce stronger attack candidates, guided by uncertainty-based refinement.	Jialin Song; Xiaodong Liu; Weiwei Yang; Wuyang Chen; Mingqian Feng; Xuekai Zhu; Jianfeng Gao;
269	LUVE : Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent advances in video diffusion models have significantly improved visual quality, yet ultra-high-resolution (UHR) video generation remains a formidable challenge due to the compounded difficulties of motion modeling, semantic planning, and detail synthesis. To address these limitations, we propose \textbf{LUVE}, a \textbf{L}atent-cascaded \textbf{U}HR \textbf{V}ideo generation framework built upon dual frequency \textbf{E}xperts.	Chen Zhao; Jiawei Chen; Hongyu Li; Zhuoliang Kang; Shilin Lu; Xiaoming Wei; Kai Zhang; Jian Yang; Ying Tai;
270	More Sail Than Ballast: Addressing Harmful Knowledge Leakage in The Expansive Reasoning Space of LRMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Experiments on our benchmark show that it is a common issue across current LRMs due to their strong multi-step reasoning capabilities. To address this issue, we propose placing LLMs in our synthesized open-ended environments, allowing them to self-search for a safety reasoning pattern to respond responsibly and helpfully.	Qibing Ren; Xinhao Song; Ke Fan; Lijun Li; Zhanpeng Zhou; Gongshen Liu; Junchi Yan; Lizhuang Ma; Jing Shao;
271	On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While RL-tuned VLMs can improve visual reasoning benchmark performance, they can still suffer from weak visual grounding, hallucinations, and over-reliance on textual cues. We show that simple, controlled textual perturbations—misleading captions or incorrect chain-of-thought (CoT) traces—cause substantial drops in robustness and confidence, and that these effects are more pronounced when CoT consistency is taken into account across open-source multimodal reasoning models.	Rosie Zhao; Anshul Shah; Xiaoyu Zhu; Xinke Deng; Zhongyu Jiang; Yang Yang; Joerg Liebelt; Arnab Kumar Mondal;
272	Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite rapid progress, KD-based unlearning struggles with biased deletion due to suppressing specific token sequences as a substitute for complete knowledge removal, whereas DR-based unlearning risks the re-emergence of harmful knowledge because the underlying knowledge remains intact. To address these issues, we propose Distinguishable Deletion ($\mathrm{D^2}$), a paradigm that restricts the response distribution in the latent space rather than specific tokens to erase undesirable knowledge, while distinguishing it from retained knowledge, enabling a refusal mechanism to handle unlearned inputs safely and coherently.	Puning Yang; Junchi Yu; Qizhou Wang; Phil Torr; Bo Han; Xiuying Chen;
273	Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose \textbf{DiNa-LRM}, a \textbf{di}ffusion-\textbf{na}tive \textbf{l}atent \textbf{r}eward \textbf{m}odel that formulates preference learning directly on noisy diffusion states.	Gongye Liu; Bo Yang; Zhi Yida; Zhizhou Zhong; Lei Ke; Didan Deng; Han Gao; Yongxiang Huang; Kaihao Zhang; Hongbo Fu; Wenhan Luo;
274	FoeGlass: When Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing dataset development strategies face two challenges: (i) manual collection, and (ii) inefficient discovery of blind spots in the ADD models. To address these challenges, we propose FoeGlass, the first black-box automated red-teaming method for ADDs, which effectively discovers ADD failure modes in the space of generated audio underexplored by state-of-the-art deepfake benchmarks.	Sepehr Dehdashtian; Jacob Seidman; Vishnu Boddeti; Gaurav Bharaj;
275	PACER: Acyclic Causal Discovery from Large-scale Interventional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PACER (Perturbation-driven Acyclic Causal Edge Recovery), a scalable framework for causal discovery that guarantees acyclicity by construction.	Ramon Viñas Torné; Sílvia Salazar; Soyon Park; Ivo Ban; Artyom Gadetsky; Nikita Doikov; Maria Brbic;
276	Scaling Law for Quantization-Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper proposes a unified scaling law for QAT that models quantization error as a function of model size, training data volume, and quantization group size.	Mengzhao Chen; Chaoyi Zhang; Jing Liu; Zeng; Zeyue Xue; Zhiheng Liu; Yunshui Li; Jin Ma; Jie Huang; zhou Xun; Ping Luo;
277	INT Vs. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We reveal a critical performance crossover: while FP excels in coarse-grained quantization, INT consistently surpasses it as the quantization block size shrinks.	Mengzhao Chen; Meng Wu; Hui Jin; Zhihang Yuan; Jing Liu; Chaoyi Zhang; Yunshui Li; Jie Huang; Jin Ma; Zeyue Xue; Zhiheng Liu; Xingyan Bin; Ping Luo;
278	VLANeXt: Recipes for Building Strong VLA Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: From this study, we distill 12 key findings that together form a practical recipe for building strong VLA models.	Xiao-Ming Wu; Bin Fan; Kang Liao; Jian-Jian Jiang; Runze Yang; Yihang Luo; Zhonghua Wu; Wei-Shi Zheng; Chen Change Loy;
279	XLSTM Distillation: Achieving Teacher-Student Parity Through Efficient Hybrid Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose an additional merging stage, where individually linearized experts are combined into a single model.	Lukas Hauzenberger; Niklas Schmidinger; Thomas Schmied; Anamaria-Roberta Hartl; David Stap; Pieter-Jan Hoedt; Sebastian Böck; Günter Klambauer; Sepp Hochreiter;
280	STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents STARCaster, an identity-aware spatio-temporal video diffusion model that addresses both speech-driven portrait animation and free-viewpoint talking portrait synthesis, given an identity embedding or reference image, within a unified framework.	Foivos Paraperas Papantoniou; Stathis Galanakis; Rolandos Alexandros Potamias; Bernhard Kainz; Stefanos Zafeiriou;
281	SoftJAX & SoftTorch: Empowering Automatic Differentiation Libraries with Informative Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work introduces SoftJAX* and SoftTorch, open-source, feature-complete libraries for soft differentiable programming.*	Anselm Paulus; Andreas René Geist; Vit Musil; Sebastian Hoffmann; Georg Martius;
282	MOOSE-Star: Unlocking Tractable Training for Scientific Discovery By Breaking The Complexity Barrier Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We demonstrate that directly training $P(h\|b)$ is mathematically intractable due to the combinatorial complexity ($O(N^k)$) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework enabling tractable training and scalable inference.	Zonglin Yang; Lidong Bing;
283	How Can Embedding Models Bind Concepts? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although CLIP behaves like a bag-of-concepts model in cross-modal retrieval, object information is recoverable from its image and text embeddings separately. We study this tension through the binding function, which maps concepts to scene embeddings.	Arnas Uselis; Darina Koishigarina; Seong Joon Oh;
284	VJEPA: Variational Joint Embedding Predictive Architectures As Probabilistic World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce \emph{Variational JEPA (VJEPA)}, a probabilistic generalization that learns a predictive distribution over future latent states via a variational objective.	Yongchao Huang;
285	Necessary Conditions for Compositional Generalization of Embedding Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Modern models are trained on massive datasets, yet these are vanishingly small compared to the full combinatorial space of possible data, raising the question of whether models can reliably generalize to unseen combinations. To formalize what this requires, we propose a set of practically motivated desiderata that any compositionally generalizing system must satisfy, and analyze their implications under standard training with linear classification heads.	Arnas Uselis; Andrea Dittadi; Seong Joon Oh;
286	PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce PhoStream, the first mobile-centric streaming benchmark that unifies on-screen and off-screen scenarios to evaluate video, audio, and temporal reasoning.	Xudong LU; Guan Huankang; Yang Bo; Jinpeng Chen; Xintong Guo; Shuhan LI; Fang Liu; Peiwen Sun; Xueying Lee; Wei Zhang; Xue Yang; Rui Liu; Hongsheng Li;
287	Learning A Generative Meta-Model of LLM Activations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We develop a generative approach that models activations with diffusion, that makes minimal assumptions and improves with data and model scale.	Grace Luo; Jiahai Feng; Trevor Darrell; Alec Radford; Jacob Steinhardt;
288	Controlled LLM Training on Spectral Sphere Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Maximal Update Parametrization ($\boldsymbol{\mu}$P) provides a theoretical safeguard for width-invariant $\Theta(1)$ activation control, whereas emerging optimizers like Muon are only half-aligned with these constraints: they control updates but allow weights to drift. To address this limitation, we introduce the Spectral Sphere Optimizer (SSO), which enforces strict module-wise spectral constraints on both weights and their updates.	Tian Xie; Haoming Luo; Haoyu Tang; Hu Yiwen; Jason Liu; Qingnan Ren; Yang Wang; Xin Zhao; Rui Yan; Bing Su; Chong Luo; Baining Guo;
289	Deep Ensemble Clustering for Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing clustering-based backbones typically rely on a single clustering algorithm, whose inherent inductive bias limits their representational capacity. To address this, we propose EnFormer, which embeds ensemble clustering as a core component of feature extraction.	Yuwei Wang; Guikun Chen; Xiruo Jiang; Yazhou Yao; Di Liu; Xiangbo Shu; Fumin Shen; Wenguan Wang;
290	Combinatorial Sparse PCA Beyond The Spiked Identity Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We demonstrate explicit counterexample covariances $\\mathbf{\Sigma}$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method for sparse PCA that provably succeeds for general $\\mathbf{\Sigma}$ using $\\mathsf{poly}(s, \log(d))$ samples and $d^2 \cdot \\mathsf{poly}(s, \log(d))$ time, by providing a global convergence guarantee on the truncated power method of Yuan and Zhang (JMLR, 2013).	Peiyuan Zhang; Syamantak Kumar; Kevin Tian; Purnamrita Sarkar;
291	Q-Flow: Stable and Expressive Reinforcement Learning with Flow-based Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches typically address this issue by restricting the expressive capacity of flow-based policies, resulting in a trade-off between optimization stability and representational flexibility. To resolve this, we introduce Q-Flow, a framework that leverages the deterministic nature of flow dynamics to explicitly propagate terminal trajectory value to intermediate latent states along the policy-induced flow.	JaeHyeok Doo; Byeongguk Jeon; Seonghyeon Ye; Kimin Lee; Minjoon Seo;
292	Transolver-3: Scaling Up Transformer Solvers to Industrial-Scale Geometries Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Transolver-3, a new member of the Transolver family as a highly scalable framework designed for high-fidelity physics simulations.	Hang Zhou; Haixu Wu; Haonan Shangguan; Yuezhou Ma; Huikun Weng; Jianmin Wang; Mingsheng Long;
293	Position: Agentic AI Systems Should Be Making Bayes-consistent Decisions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this position paper argues that the control layer of an agentic AI system (that orchestrates LLMs and tools) is a clear case where Bayesian principles should shine. Bayesian decision theory provides a framework for agentic systems that can help to maintain beliefs over task-relevant latent quantities, to update these beliefs from observed agentic and human-AI interactions, and to choose actions.	Theodore Papamarkou; Pierre Alquier; Matthias Bauer; Wray Buntine; Andrew Davison; Gintare Karolina Dziugaite; Maurizio Filippone; Andrew Y. K. Foong; Vincent Fortuin; Dimitris Fouskakis; Jes Frellsen; Eyke Hüllermeier; Theofanis Karaletsos; Mohammad Emtiyaz Khan; Nikita Kotelevskii; Salem Lahlou; Yingzhen Li; Fang Liu; Clare Lyle; Thomas Moellenhoff; Konstantina Palla; Maxim Panov; Yusuf Sale; Kajetan Schweighofer; Artem Shelmanov; Siddharth Swaroop; Martin Trapp; Willem Waegeman; Andrew Wilson; Alexey Zaytsev;
294	CSD: Content-aware Speculative Decoding for Efficient Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel content-aware speculative decoding algorithm, termed CSD, which integrates an entropy-based probability relaxation mechanism with an optimal resampling strategy to enhance the inference efficiency for autoregressive image generation.	Mingcheng Wang; junbo qiao; Yunchen Li; Lingfu Jiang; Wei Li; Jie Hu; Jiao Xie; Zhou Yu; Xinghao Chen; Guixu Zhang; Shaohui Lin;
295	Scaling Prompt Synthesis for Large Language Model Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: PromptCoT showed that injecting rationales into prompt synthesis increases problem difficulty. Building on this, we present PromptScale, a scalable framework that replaces hand-crafted heuristics with an expectation-maximization (EM) loop, where rationales are iteratively refined to guide prompt construction.	Xueliang Zhao; Wei Wu; Jian Guan; Zhuocheng Gong; Lingpeng Kong;
296	Understanding Dynamic Compute Allocation in Recurrent Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Token-level adaptive computation seeks to reduce inference cost by allocating more computation to harder tokens and less to easier ones. However, prior work is primarily evaluated …	Ibraheem Muhammad Moosa; Suhas Lohit; Ye Wang; Moitreya Chatterjee; Wenpeng Yin;
297	Flash-GRPO: Efficient Alignment for Video Diffusion Via One-Step Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Flash-GRPO, a single-step training framework that outperforms full trajectory training in alignment quality under low computational budgets while substantially improving training efficiency.	Xiaoxuan He; Siming Fu; Zeyue Xue; Weijie Wang; Ruizhe He; Yuming Li; Dacheng Yin; Shuai Dong; Haoyang Huang; Hongfa Wang; Nan Duan; Bohan Zhuang;
298	Position: Explainability Research Must Prioritize Foundations Over Ad-hoc Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In practice, they are often generated and discarded without guiding meaningful action. This gap reflects foundational shortcomings: research has not yet established methodologies for integrating explanations into end-to-end, human-in-the-loop systems.	Michal Moshkovitz; Suraj Srinivas; Lesia Semenova; Nave Frost; Cyrus Rashtchian; Valentyn Boreiko; Shichang Zhang; Himabindu Lakkaraju; Cynthia Rudin; Jennifer Wortman Vaughan;
299	Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Latent Reasoning VLA (LaRA-VLA), a unified VLA framework that internalizes multi-modal CoT reasoning into continuous latent representations for embodied action.	Shuanghao Bai; Jing Lyu; Wanqi Zhou; Zhe Li; Dakai Wang; Lei Xing; Xiaoguang Zhao; Pengwei Wang; Zhongyuan Wang; Cheng Chi; Badong Chen; Shanghang Zhang;
300	DLO-Lab: Benchmarking Deformable Linear Object Manipulations with Differentiable Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Additionally, existing simulation environments offer limited support for the broad spectrum of material behaviors necessary for generalizable DLO manipulation. To overcome these limitations, we introduce a differentiable simulator explicitly designed for versatile DLO manipulation.	Junyi Cao; Yian Wang; Ziyan Xiong; Chunru Lin; Zhehuan Chen; Chuang Gan;
301	Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax Via Adversarial Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Trojan-Speak, an adversarial fine-tuning method that bypasses Anthropic’s Constitutional Classifiers.	Bilgehan Sel; Xuanli He; Alwin Peng; Ming Jin; Jerry Wei;
302	MUSE: Resolving Manifold Misalignment in Visual Tokenization Via Topological Orthogonality Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify the root cause as Manifold Misalignment, where naive joint optimization leads to conflicting gradients that force a zero-sum game between these two objectives. In this paper, we propose MUSE, a framework that resolves this deadlock via Topological Orthogonality.	Panqi Yang; Haodong Jing; Jiahao Chao; Tingyan Xiang; Li Lin; Yao Hu; Yang Luo; Yongqiang Ma;
303	3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key paradigm for unlocking complex reasoning in Large Language Models (LLMs), yet its potential in 3D scene understanding remains untapped. To bridge this gap, we present Reinforcement Fine-Tuning for Video-based 3D Scene Understanding (3D-RFT), the first framework to extend RLVR to 3D perception and reasoning.	Xiongkun Linghu; Jiangyong Huang; Baoxiong Jia; Siyuan Huang;
304	Uncovering Hidden Triggers: Backdoor Attribution in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Previous research on interpretability for LLM safety tends to focus on alignment, jailbreak, and hallucination, but overlooks backdoor mechanisms, making it difficult to understand and fully eliminate the backdoor threat. In this paper, aiming to bridge this gap, we explore the interpretable mechanisms of LLM backdoors through Backdoor Attribution (BkdAttr), a tripartite causal analysis framework.	Miao Yu; Zhenhong Zhou; Moayad Aloqaily; Kun Wang; Biwei Huang; Stephen Wang; Yueming Jin; Qingsong Wen;
305	SafeSeek: Universal Attribution of Safety Circuits in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing safety attribution methods struggle with generalization and reliability due to their reliance on heuristic, domain-specific metrics and search algorithms. To address this, we propose SafeSeek, a unified safety interpretability framework that identifies functionally complete safety circuits in LLMs via optimization.	Miao Yu; Siyuan Fu; Moayad Aloqaily; Zhenhong Zhou; Safa Otoum; Xing fan; Kun Wang; Yufei Guo; Qingsong Wen;
306	Induction Heads Interpolate N-Grams Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study transformers trained on order-$k$ Markov chains and prove that a two-layer disentangled transformer implements a soft context-matching estimator that aggregates contributions from all partial context matches, weighted exponentially by their degree of overlap.	Francesco D'Angelo; Oğuz Yüksel; Swathi Narashiman; Nicolas Flammarion;
307	Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These findings imply that the benefit of each update depends strongly on both question difficulty and the model’s current competence. Motivated by this, we propose Confidence and Difficulty-adaptive Policy Optimization (CoDaPO), which assigns each question a bounded value from rollout confidence and empirical difficulty, then uses it to reweight policy updates and resample high-value questions within minibatches to increase discovery under a fixed compute budget.	Zhanke Zhou; Xiangyu Lu; Chentao Cao; Brando Miranda; Tongliang Liu; Bo Han; Sanmi Koyejo;
308	Curating The Future: A Scalable Recipe for Training Open-Ended Forecasters Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we train language models to make predictions on open-ended forecasting questions.	Nikhil Chandak; Shashwat Goel; Ameya Pandurang Prabhu; Moritz Hardt; Jonas Geiping;
309	DSGym: A Standardized and Holistic Framework for Advancing Data Science Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In particular, we show that a substantial portion of tasks in current benchmarks can be solved without using the actual data. To address these limitations, we introduce DSGym, a standardized framework for evaluating and training data science agents in self-contained execution environments.	Fan Nie; Junlin Wang; Harper Hua; Federico Bianchi; Yongchan Kwon; Zhenting Qi; Owen Queen; Shang Zhu; James Zou;
310	Building Better Deception Probes Using Targeted Instruction Pairs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we identify the importance of the instruction pair used during training.	Vikram Natarajan; Devina Jain; Shivam Arora; Satvik Golechha; Joseph Bloom;
311	Diffusion-based Learning Framework for Constrained Nonconvex Optimization with Weighted Bootstrapped Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In that case, we investigate and theoretically analyze the inherent problem of supervised diffusion solvers and identify the distributional misalignment problem, i.e., the generated solution distribution often exhibits low probability mass on the feasible region. To resolve this issue, we propose DiOpt, a new diffusion-based learning framework for constrained nonconvex optimization, which effectively learns the mapping from noise to the constraint region.	Shutong Ding; Yimiao Zhou; Ke Hu; Xi Yao; Junchi Yan; Xiaoying Tang; Ye Shi;
312	Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Another branch pay attention to gradient-based policy optimization, which sufficiently exploit the gradient of Q function yet tend to collapse into a unimodal policy with low diversity. To address this issue, we propose CGPO, \textbf{C}ritic-\textbf{G}uided diffusion \textbf{P}olicy \textbf{O}ptimization, which effectively balances exploration and exploitation with the training-free guidance technique integrated into the denoising process of diffusion policy.	Shutong Ding; Zejia Zhong; Zhongyi Wang; Ke Hu; Bikang Pan; Jingya Wang; Ye Shi;
313	Edit-Based Refinement for Parallel Masked Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose ME-DLM, an edit-based refinement framework that augments diffusion generation with a lightweight post-generation editing step.	Houxing Ren; Mingjie Zhan; Zimu Lu; Ke Wang; Yunqiao Yang; Haotian Hou; Junting Pan; Hongsheng Li;
314	PRIM：Cooperative Dynamic Token Compression for Efficient Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present PRIM, an inference framework for efficient multimodal reasoning that systematically compresses audio-visual representations based on attention dynamics and instruction relevance.	Song Li; yongping xiong;
315	Understanding Reasoning Collapse in LLM Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We thus provide a signal-to-noise ratio explanation for why $I(X; Z)$ drops: when within-input reward variance $\mathrm{Var}(R \mid X)$ is low, task gradients weaken and input-agnostic regularizers (KL, entropy) dominate, flattening cross-input differences.	Zihan (Zenus) Wang; Chi Gui; Xing Jin; Qineng Wang; Licheng Liu; Kangrui Wang; Shiqi Chen; Linjie Li; Zhengyuan Yang; Pingyue Zhang; Yiping Lu; Jiajun Wu; Li Fei-Fei; Lijuan Wang; Yejin Choi; Manling Li;
316	RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we present RoboTwin 2.0, a scalable simulation framework that enables closed-loop, automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation.	Tianxing Chen; Zanxin Chen; Baijun Chen; Zijian Cai; Yibin Liu; Zixuan Li; Qiwei Liang; Xianliang Lin; Yiheng Ge; Zhenyu Gu; Weiliang Deng; Yubin Guo; Tian Nian; Xuanbing Xie; Qiangyu Chen; KailunSu; Tianling Xu; Guodong Liu; Mengkang Hu; Huan-ang Gao; Kaixuan Wang; Zhixuan Liang; Yusen Qin; Xiaokang Yang; Ping Luo; Yao Mu;
317	Sample Efficient Full-Finetuning of Generative Control Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Off-policy Generative Policy Optimization (\OGPO{}), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagate policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward.	Sarvesh Patil; Mitsuhiko Nakamoto; Shashwat Saxena; Manan Agarwal; Giri Anantharaman; Cleah Winston; Jesse Zhang; Chaoyi Pan; Douglas Chen; Nai-Chieh Huang; Zeynep Temel; Oliver Kroemer; Hongkai Dai; Sergey Levine; Abhishek Gupta; Paarth Shah; Max Simchowitz;
318	Proxy Compression for Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work introduces proxy compression, an alternative training scheme that preserves the efficiency benefits of compressed inputs while providing an end-to-end, raw-byte interface at inference time.	Lin Zheng; Li Xinyu; Qian Liu; Xiachong Feng; Lingpeng Kong;
319	Reinforcement Learning for Non-Verifiable Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Tournament Style RL (TSRL), which constructs rewards from rubric-guided pairwise judgments against a fixed set of anchor responses, using win-rate as the reward for policy optimization.	Gurusha Juneja; Shubham Phal; Jennifer She; Lisa Wang; Dorsa Sadigh; Anca Dragan; William Wang;
320	Differentially Private Synthetic Tabular Data Via Private Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Tab-PE — an algorithm for synthetic tabular data generation under DP constraints.	Toan Tran; Arturs Backurs; Zinan Lin; Victor Reis; Li Xiong; Sergey Yekhanin;
321	Adversarial Training for Process Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their widespread adoption is limited due to expensive manual step-level annotation and poor generalization of static training data to novel errors. We introduce Adversarially Trained PRMs (APRM), where a Generator ($G$) learns to produce reasoning errors to deceive a PRM ($R$), while $R$ concurrently learns to detect them.	Gurusha Juneja; Deepak Nathani; William Wang;
322	ThetaEvolve: Test-time Learning on Open Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ThetaEvolve, an open-source framework that simplifies and extends AlphaEvolve to efficiently scale both in-context learning and Reinforcement Learning (RL) at test time, allowing models to continually learn from their experiences in improving open optimization problems.	Yiping Wang; Shao-Rong Su; Zhiyuan Zeng; Eva Xu; Liliang Ren; Xinyu Yang; Zeyi Huang; Xuehai He; Luyao Ma; Baolin Peng; Hao Cheng; Pengcheng He; Weizhu Chen; Shuohang Wang; Simon Du; Yelong Shen;
323	AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, since only data from the current task is available, incremental updates can bias both attribute extraction and aggregation toward new classes, leading to catastrophic forgetting. Therefore, we propose AREA for attribute extraction and aggregation for CLIP-based CIL.	Zhenhao Wen; Yu-Cheng Shi; Da-Wei Zhou;
324	SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Such failure reflects two problems: router drift, where expert selection becomes inconsistent over time, and expert drift, where shared experts are overwritten across tasks. Therefore, we propose StAbilized Mixture-of-Experts (SAME) for MCIT.	Zhenhao Wen; Jun-Tao Tang; Yu-Cheng Shi; Han-Jia Ye; De-Chuan Zhan; Da-Wei Zhou;
325	MemEvolve: Meta-Evolution of Agent Memory Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, this paradigm is fundamentally constrained by the \textit{staticity} of the memory system itself: while memory facilitates agent-level evolving, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents’ experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it.	Guibin Zhang; Haotian Ren; Chong Zhan; Junhao Wang; He Zhu; Wangchunshu Zhou; Shuicheng YAN;
326	Reasoning About Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We address a fundamental theoretical question: how many* reasoning tokens are required to solve a problem as input size grows?*	Kiran Tomlinson; Tobias Schnabel; Adith Swaminathan; Jennifer Neville;
327	Who’s in Charge? Disempowerment Patterns in Real-World LLM Usage Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present the first large-scale empirical analysis of disempowerment patterns in real-world AI assistant interactions, analyzing 1.5 million consumer Claude.ai conversations using a privacy-preserving approach.	Mrinank Sharma; Miles McCain; Raymond Douglas; David Duvenaud;
328	Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, scaling up RL is bottlenecked by limited existing verifiable data, where improvements increasingly saturate over prolonged training. To overcome this, we propose Golden Goose, a simple trick to synthesize unlimited RLVR tasks from unverifiable internet text by constructing a multiple-choice question-answering version of the fill-in-the-middle task.	Ximing Lu; David Acuna; Jaehun Jung; Jian Hu; Di Zhang; Shizhe Diao; Yunheng Zou; Shaokun Zhang; Brandon Cui; Mingjie Liu; Hyunwoo Kim; Prithviraj Ammanabrolu; Jan Kautz; Yi Dong; Yejin Choi;
329	Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, most existing T2I DMs, even those equipped with large language model (LLM)-based text encoders, remain text-pixel mappers — they employ LLMs merely as text encoders, without leveraging their inherent reasoning capabilities to infer what should be visually depicted given the textual prompt. To move beyond such literal generation, we propose the think-then-generate (T2G) paradigm, where the LLM-based text encoder is encouraged to reason about and rewrite raw user prompts; the states of the rewritten prompts then serve as diffusion conditioning.	Siqi Kou; Jiachun Jin; Zetong Zhou; Ye Ma; Yugang Wang; Quan Chen; Peng Jiang; Xiao Yang; Jun Zhu; Kai Yu; Zhijie Deng;
330	Convergent World Representations and Divergent Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While neural representations are central to modern deep learning, the conditions governing their geometry and their roles in downstream adaptability remain poorly understood. We develop a framework clearly separating the underlying world, the data generation process and the resulting model representations to study these questions in a controlled setup: 5,075 city coordinates define the world and 7 geometric tasks generate the training data for autoregressive Transformer training.	Core Francisco Park;
331	TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unlike existing benchmarks, TFRBench provides a protocol for evaluating the reasoning generated by forecasting systems–specifically their analysis of cross-channel dependencies, trends, and external events. To enable this, we propose a systematic multi-agent framework that utilizes an iterative verification loop to synthesize numerically grounded reasoning traces.	Atik Ahamed; Mihir Parmar; Palash Goyal; Yiwen Song; Long Le; Qiang (Shaun) Cheng; Chun-Liang Li; Hamid Palangi; Jinsung Yoon; Tomas Pfister;
332	Towards One-to-Many Temporal Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Previous state-of-the-art MLLMs, optimized for one-to-one settings, struggle in this context, often yielding near-zero scores due to a lack of event cardinality perception. To bridge this gap, we present a systematic solution with three key contributions. First, we establish the first comprehensive OMTG benchmark, introducing Count Accuracy (C-Acc) and Effective Temporal F1 (EtF1) as evaluation metrics.	Qi Xu; Tan Yue; Shihao Chen; Jiahao Meng; Anran Wang; Shunping Ji; Hao Fei; Xiangtai Li;
333	Multi-Agent Teams Hold Experts Back Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member.	Aneesh Pappu; Batu El; Hancheng Cao; Carmelo di Nolfo; Yanchao Sun; Meng Cao; James Zou;
334	Brep2Shape: Boundary and Shape Representation Alignment Via Self-supervised Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While deep learning shows promise in processing B-rep models, existing methods suffer from a representation gap: continuous approaches offer analytical precision but are visually abstract, whereas discrete methods provide intuitive clarity at the expense of geometric precision. To bridge this gap, we introduce Brep2Shape, a novel self-supervised pre-training framework designed to align abstract boundary representations with intuitive shape representations.	Yuanxu Sun; Yuezhou Ma; Haixu Wu; Guanyang Zeng; Muye Chen; Jianmin Wang; Mingsheng Long;
335	Olivia: Harmonizing Time Series Foundation Models with Power Spectral Density Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we propose \textit{Harmonizer}, a module that reshapes spectral structures and implicitly harmonizing PSDs across datasets, which theoretically corresponds to a shared reparameterization of second-order temporal correlations.	Jingru Fei; Kun Yi; Alex Wang; Qingsong Wen; Xiangxiang Zhu; Wei Fan;
336	Long Grounded Thoughts: Synthesizing Grounded Visual Problems and Distilling Reasoning Chains at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a framework able to synthesize vision-centric problems spanning diverse levels of complexity, and the resulting dataset with over 1M high-quality problems including: reasoning traces, preference data, and instruction prompts supporting SFT, offline and online RL.	David Acuna; Chao-Han Yang; Yuntian Deng; Jaehun Jung; Ximing Lu; Prithviraj Ammanabrolu; Hyunwoo Kim; Yuan-Hong Liao; Yejin Choi;
337	GXPO: Group Cross-Lingual Relative Policy Optimization for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Group Cross-lingual Relative Policy Optimization (GXPO), which forms training groups by generating solutions for the same problem in multiple PLs and jointly optimizes language-specific and cross-language signals, enabling more balanced optimization and improved transfer to low-resource PLs.	Linzheng Chai; Jian Yang; Jiajun Wu; Ensheng Shi; Xianglong Liu;
338	VlogReward: Learning Multi-Dimensional Evaluation for Vlog Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, vlog assessment is highly subjective and remains challenging due to a lack of standardized criteria, dataset and benchmark, and effective reward models. To address these challenges, we define a comprehensive vlog evaluation framework guided by professional vlog creators and product managers, establishing a taxonomy of six key dimensions, i.e., Creativity, Consistency, Concept Design, Cinematography, Narration, and Pacing.	Yexiang Liu; Wen Zhong; Sijie Zhu; Xin Gu; Fan Chen; Junxian Duan; Jie Cao; Longyin Wen; Zhenfang Chen;
339	When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, this naive recurrent memory update faces two crucial drawbacks: (i) memory can quickly explode because it can update indiscriminately, even on evidence-free chunks; and (ii) the loop lacks an exit mechanism, leading to unnecessary computation after even sufficient evidence is collected. To address these issues, we propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning.	Leheng Sheng; Yongtao Zhang; Wenchang Ma; Yaorui Shi; Ting Huang; Xiang Wang; An Zhang; Ke Shen; Tat-Seng Chua;
340	Towards A Science of AI Agent Reliability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a framework for measuring agent reliability grounded in safety-critical engineering practice, decomposing reliability into four dimensions: consistency, robustness, predictability, and safety.	Stephan Rabanser; Sayash Kapoor; Peter Kirgis; Kangheng Liu; Saiteja Utpala; Arvind Narayanan;
341	Dimension-Independent Convergence of Underdamped Langevin Monte Carlo in KL Divergence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Underdamped Langevin dynamics (ULD) is a widely-used sampler for Gibbs distributions $\pi\propto e^{-V}$, and is often empirically effective in high dimensions. However, existing …	Shiyuan Zhang; Qiwei Di; Xuheng Li; Quanquan Gu;
342	Linearizing Vision Transformer with Test-Time Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inheriting weights from pretrained Transformers provides an appealing shortcut, yet the fundamental representational gap between Softmax and linear attention prevents effective weight transfer. In this work, we address this conversion challenge from two perspectives: architectural alignment and representational alignment.	Yining Li; Dongchen Han; Zeyu Liu; Hanyi Wang; Yulin Wang; Gao Huang;
343	RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This limits their systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios.	Yinpei Dai; Hongze Fu; Jayjun Lee; Yuejiang Liu; Haoran Zhang; Jianing Yang; Chelsea Finn; Nima Fazeli; Joyce Chai;
344	Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we challenge the standard monolithic view of Vision-Language Model (VLM) distillation by mathematically decomposing the loss into two distinct components: the language prior and visual grounding.	Hee Suk Yoon; Eunseop Yoon; Jaehyun Jang; SooHwan Eom; Ji Woo Hong; Mark Hasegawa-Johnson; Qi Dai; Chong Luo; Chang Yoo;
345	SiameseNorm: Breaking The Barrier to Reconciling Pre/Post-Norm Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We attribute this phenomenon to a structural incompatibility within a single-stream design: Any application of the Post-Norm operation inevitably obstructs the clean identity gradient preserved by Pre-Norm. To fundamentally reconcile these paradigms, we propose SiameseNorm, a two-stream architecture that couples Pre-Norm-like and Post-Norm-like streams with shared parameters.	Tianyu Li; Dongchen Han; Zixuan Cao; Haofeng Huang; Mengyu Zhou; Ming Chen; erchao.zec; xiaoxi jiang; guanjunjiang; Gao Huang;
346	Untied Ulysses: Memory-Efficient Context Parallelism Via Headwise Chunking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present UPipe, a simple yet effective context parallelism technique that performs fine-grained chunking at the attention head level.	Ravi Ghadia; Maksim Abraham; Sergei Vorobyov; Max Ryabinin;
347	IGRPO: Fast Online RL for Flow Matching Model with Dense Reward Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce iGRPO (Instant-reward GRPO), which replaces GRPO’s full-trajectory rollouts with a single-step mapping that assigns rewards instantly at each denoising step.	Sucheng Ren; Chen Chen; Zhenbang Wang; Liangchen Song; Xiangxin Zhu; Yinfei Yang; Jiasen Lu;
348	Compositional Planning with Jumpy World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Such compositional planning remains elusive as compounding errors in long-horizon predictions make it challenging to estimate the visitation distribution induced by sequencing policies. Motivated by the geometric policy composition framework introduced in Thakoor et al. (2022), we address these challenges by learning predictive models of multi-step dynamics, so-called jumpy world models, that capture state occupancies induced by pre-trained policies across multiple timescales in an off-policy manner.	Jesse Farebrother; Matteo Pirotta; Andrea Tirinzoni; Marc Bellemare; Alessandro Lazaric; Ahmed Touati;
349	SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Standard masked modeling often yields trivial solutions while contrastive methods lack morphological precision. To address these limitations, we propose a Statistical-prior Informed Generative Masking Architecture (SIGMA-PPG), a generative foundation model featuring a prior-guided adversarial masking mechanism, where a reinforcement learning-driven teacher leverages statistical priors to create challenging learning paths that prevent overfitting to noise.	ZONGHENG GUO; Tao Chen; Yang Jiao; Yi Pan; Xiao Hu; Manuela Ferrario;
350	VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present VideoFlexTok, a tokenizer that represents videos with a _variable-length sequence of tokens structured in a coarse-to-fine manner_, where the first tokens capture abstract information like semantics and motion and later tokens provide fine-grained details.	Andrei Atanov; Jesse Allardice; Roman Bachmann; Oğuzhan Kar; R Devon Hjelm; David Griffiths; Peter Fu; Amir Zamir; Afshin Dehghan;
351	Parallel Stochastic Gradient-Based Planning for World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a robust and highly parallelizable planner that leverages the differentiability of the learned world model for efficient optimization, solving long-horizon control tasks from visual input.	Michael Psenka; Michael Rabbat; Aditi Krishnapriyan; Yann LeCun; Amir Bar;
352	SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SceneSmith, a hierarchical agentic framework that generates simulation-ready indoor environments from natural language prompts.	Nicholas Pfaff; Thomas Cohn; Sergey Zakharov; Rick Cory; Russ Tedrake;
353	Proximal Decoding: Provably Reducing Copyright Risk for Any Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Proximal Decoding, a plug-and-play inference-time method for suppressing verbatim reproduction: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM.	Jacqueline He; Jonathan Hayase; Scott Yih; Sewoong Oh; Luke Zettlemoyer; Pang Wei Koh;
354	Memory Is Reconstructed, Not Retrieved: Graph Memory for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While current memory-augmented agents rely on a static “retrieve-then-reason” paradigm, this rigid pipeline design prevents them from dynamically adapting memory access to intermediate evidence discovered during inference. To bridge this gap, we propose MRAgent, a framework that combines an associative memory graph with an active reconstruction mechanism.	Shuo Ji; yibo li; Bryan Hooi;
355	Learning Sparse Visual Representations Via Spatial-Semantic Factorization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Conversely, generative models preserves dense feature grids for reconstruction but fails to produce high-level abstractions. We introduce STELLAR, a framework that resolves this tension by factorizing visual features into a low-rank product of semantic concepts and their spatial distributions.	Theodore Zhao; Sid Kiblawi; Jianwei Yang; Naoto Usuyama; Reuben Tan; Noel Codella; Tristan Naumann; Hoifung Poon; Mu Wei;
356	Prompt Injection As Role Confusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: More broadly, we introduce a unifying, mechanistic framework for prompt injection, demonstrating that diverse prompt-injection attacks exploit the same underlying role-confusion mechanism.	Charles Ye; Jasmine Cui; Dylan Hadfield-Menell;
357	Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we systematically investigate multimodal deception and introduce MM-DeceptionBench, the first benchmark designed to evaluate deceptive behaviors in vision–language models across six realistic categories.	Sitong Fang; Shiyi Hou; Kaile Wang; Boyuan Chen; Donghai Hong; Jiayi Zhou; Juntao Dai; Yaodong Yang; Jiaming Ji;
358	MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing automated validation and repair approaches struggle to generalize to many PLs due to high engineering overhead, and they rely on existing and often inadequate test suites, which results in false claims of equivalence and ineffective translation repair. To bridge this gap, we develop MatchFixAgent, a large language model (LLM)-based, PL-agnostic framework for equivalence validation and repair of translations.	Ali Reza Ibrahimzada; Brandon Paulsen; Reyhaneh Jabbarvand; Joey Dodds; Daniel Kroening;
359	Sparse But Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work we study the effect of L0 on SAEs, and show that if L0 is not set correctly, the SAE fails to disentangle the underlying features of the LLM.	David Chanin; Adrià Garriga-Alonso;
360	FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While models like CLIP perform well on global alignment, they often struggle to capture fine-grained details in object attributes, spatial relations, and linguistic expressions, with limited support for bilingual comprehension. To address these challenges, we introduce FG-CLIP 2, a bilingual vision-language model designed to advance fine-grained alignment for both English and Chinese.	Chunyu Xie; Bin Wang; Fanjing Kong; Jincheng Li; Dawei Liang; Ji Ao; Dawei Leng; Yuhui Yin;
361	Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Least-Loaded Expert Parallelism (LLEP), a novel EP algorithm that dynamically reroutes excess tokens and associated expert parameters from overloaded devices to underutilized ones.	Xuan-Phi Nguyen; Shrey Pandit; Austin Xu; Caiming Xiong; Shafiq Joty;
362	Discovering Implicit Large Language Model Alignment Objectives Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing interpretation methods typically rely on pre-defined rubrics, risking the omission of unknown unknowns, or fail to identify objectives that comprehensively cover and are causal to the model behavior on some dataset. To address these limitations, we introduce Obj-Disco, a framework that automatically decomposes an alignment reward signal into a sparse, weighted combination of human-interpretable natural language objectives.	Edward Chen; Sanmi Koyejo; Carlos Guestrin;
363	Conversation for Non-verifiable Learning: Self-Evolving Large Language Models Through Meta-Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce CoNL, a framework that unifies generation, evaluation, and meta-evaluation through multi-agent self-play.	Yuan Sui; Bryan Hooi;
364	Transforming Weather Data from Pixel to Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing studies often rely on weather data in pixel space, which presents several challenges such as smooth outputs in model outputs, limited applicability to a single pressure-variable subset (PVS), and high data storage and computational costs. To address these challenges, we propose a novel Weather Latent Autoencoder (WLA) that transforms weather data from pixel space to latent space, enabling efficient data representation.	Sijie Zhao; Feng Liu; Xueliang Zhang; Hao Chen; Tao Han; JUNCHAO GONG; Ran Tao; Pengfeng Xiao; Xinyu Gu; LEI BAI;
365	Restoring Initial Noise Sensitivity in Text-to-Image Distillation Through Geometric Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify a key lost property: sensitivity to initial noise, the absence of which impairs downstream control methods that rely on noise-based optimization and manipulation.	huayang Huang; Ruoyu Wang; Jinhui Zhao; Wei Deng; Daiguo Zhou; Jian Luan; Yu Wu; Ye Zhu;
366	Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We analyze the search engine optimization (SEO) to the GEO transition to identify two risks: (i) concentrated influence from low contestability and system sensitivity, and (ii) undisclosed commercial influence embedded in evidence and reasoning.	Yizhu Wen; Nan Zhang; Haohan Yuan; Xun Chen; Haopeng Zhang; Hanqing Guo;
367	LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing large language model (LLM)–based agents often exhibit behaviors that are misaligned with their partners or inconsistent with the environment state, leading to inefficient cooperation and poor task success. To address this challenge, we propose a novel framework, Learning Laws for Cooperation (LLawCo), that enables embodied agents to autonomously align with both their partners and task objectives.	Qinhong Zhou; Chuang Gan; Anoop Cherian;
368	Teaching Models to Teach Themselves: Reasoning at The Edge of Learnability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We explore this with SOAR: A self-improvement framework designed to surface these pedagogical signals through meta-RL. A teacher model proposes synthetic problems for a student model, and is rewarded with its improvement on a subset of hard problems, thus grounding the curriculum in real student progress rather than proxy rewards.	Shobhita Sundaram; John Quan; Ariel Kwiatkowski; Kartik Ahuja; Yann Ollivier; Julia Kempe;
369	DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the scalability challenge of creating context-dependent queries, we propose a human-model collaborative pipeline that employs vision-language models to mine latent spatiotemporal associations, effectively offloading intensive context discovery before human verification.	Chenlong Deng; Mengjie Deng; Junjie Wu; Dun Zeng; Teng Wang; Qingsong Xie; Jiadeng Huang; Shengjie Ma; Changwang Zhang; Zhaoxiang Wang; Jun Wang; Yutao Zhu; Zhicheng Dou;
370	Benchmarking Reward Hack Detection in Code Environments Via Contrastive Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel taxonomy of reward exploits spanning across 54 categories and introduce TRACE (Testing Reward Anomalies in Code Environments), a synthetically curated and human-verified benchmark containing 517 testing trajectories.	Darshan Deshpande; Anand Kannappan; Rebecca Qian;
371	PromptRL: Prompt Matters in RL for Flow-Based Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this research, we show that current RL pipelines for FMs suffer from two underappreciated yet important limitations: sample inefficiency due to insufficient generation diversity, and pronounced prompt overfitting, where models memorize specific training formulations and exhibit dramatic performance collapse when evaluated on semantically equivalent but stylistically varied prompts.	Fu-Yun Wang; Han Zhang; Michaël Gharbi; Hongsheng Li; Taesung Park;
372	Beyond Problem Solving: UOJ-Bench for Evaluating Code Generation, Hacking, and Repair in Competitive Programming Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce UOJ-Bench, a benchmark designed to evaluate not only the problem-solving ability of LLMs, but also their ability to identify errors in human-written code—a crucial educational activity traditionally supported by running test cases over online judge systems.	Tingqiang Xu; Hangrui Zhou; Tianle Cai; Alex Gu; Kaifeng Lyu;
373	Uncovering Bias Mechanisms in Observational Studies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that the relationship between bias magnitude and the predictive performance of nuisance function estimators (in the observational study) can help distinguish among common sources of bias.	Ilker Demirel; Zeshan Hussain; Piersilvio De Bartolomeis; David Sontag;
374	Dual Latent Memory for Visual Multi-agent System Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose L$^{2}$-VMAS, a novel model-agnostic framework that enables inter-agent collaboration with dual latent memories.	Xinlei Yu; Chengming Xu; Zhangquan Chen; Bo Yin; Cheng Yang; Yongbo He; Yihao Hu; Jiangning Zhang; Cheng Tan; Xiaobin Hu; Shuicheng YAN;
375	Physiology As Language: Translating Nocturnal Breathing to EEG Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the significant complexity gap between the two modalities, we propose a waveform-conditional generative framework that preserves fine-grained respiratory dynamics while constraining the EEG target space through discrete tokenization.	Kaiwen Zha; Chao Li; Hao He; Peng Cao; Tianhong Li; Ali Mirzazadeh; Ellen Zhang; Jong Lee; Yoon Kim; Dina Katabi;
376	MARS: Modular Agent with Reflective Search for Automated AI Research Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce MARS* (Modular Agent with Reflective Search), a framework optimized for autonomous AI research.*	Jiefeng Chen; Bhavana Dalvi Mishra; Jaehyun Nam; Rui Meng; Tomas Pfister; Jinsung Yoon;
377	Efficient Test-Time Scaling Via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As a result, developing effective and efficient TTS methods to unlock dLLMs’ full generative potential remains an underexplored challenge. To address this, we propose \textbf{LLaDA-S}, an efficient TTS framework for dLLMs that (i) performs \textbf{Hierarchical Trajectory Search} (HTS) which dynamically prunes and reallocates compute in an early-to-mid denoising window, (ii) replaces external verifiers with \textbf{Self-Verified Feedback} (SVF) obtained via self-evaluation prompts on intermediate completions, and (iii) introduces \textbf{Local branching with partial remasking} to explore diverse implementations while preserving a high-confidence tokens.	Jinbin Bai; Yixuan Li; Yuchen Zhu; Yi Xin; Qingyu Shi; Aosong Feng; Xiaohong Liu; Molei Tao; Jianru Xue; Xiangtai Li; Ming-Hsuan Yang;
378	DiScoFormer: Plug-In Density and Score Estimation with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce DiScoFormer (Density and Score Transformer), a “train-once, infer-anywhere equivariant Transformer that maps i.i.d. samples to both density values and score vectors, generalizing across distributions and sample sizes.	Vasily Ilin; Peter Sushko; Ranjay Krishna;
379	Position: There Are Futures That Benchmark-driven AI Cannot See Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These are philosophically distinct questions that may require discoveries that we cannot specify. We propose mechanisms to restore exaptive capacity without abandoning benchmarking: plural evaluation regimes, protected venues for non-comparable work, long-horizon funding, and training norms that encourage researchers to question selection rules, not only optimize within them.	Sobhan Lotfi; Ava Iranmanesh; Lachin Naghashyar; Ali Shirali; Fateme Haredasht; Sanmi Koyejo; Phil Torr; Yong Suk Lee; Fazl Barez; Joel Lehman; Peter Norvig; Arvind Narayanan;
380	Fast KV Compaction Via Attention Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work describes an approach for fast context compaction in latent space through Attention Matching, which constructs compact keys and values to reproduce attention outputs and preserve attention mass at a per-KV-head level.	Adam Zweiger; Xinghong Fu; Han Guo; Yoon Kim;
381	On Path to Multimodal Historical Reasoning: HistBench and HistAgent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Existing general-purpose agents perform well on many current benchmarks but lack the domain expertise needed to address complex historical questions. To address this gap, we introduce HistBench, a new benchmark of 414 high-quality and carefully-reviewed questions stratified by difficulty and designed to evaluate LLM’s capacity for historical reasoning.	Jiahao Qiu; Fulian Xiao; Yimin Wang; Yuchen Mao; Yijia Chen; Xinzhe Juan; Siran Wang; Xuan Qi; Tongcheng Zhang; Zixin Yao; Jiacheng Guo; Yifu Lu; Charles Argon; Jundi Cui; Daixin Chen; Junran Zhou; Shuyao Zhou; Zhanpeng Zhou; Ling Yang; Shilong Liu; Hongru WANG; Kaixuan Huang; xun jiang; Xi Gao; Mengdi Wang;
382	Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates.	yibo li; Zijie Lin; Ailin Deng; Xuan Zhang; Yufei He; Shuo Ji; Tri Cao; Bryan Hooi;
383	Outcome-Based Rewards Do Not Guarantee Faithful and Verifiable Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A common assumption is that the reasoning chains trained through RLVR represent how a model gets to its answer. In this paper, we develop two metrics for critically examining this assumption: Causal Importance of Reasoning (CIR), which measures the cumulative effect of reasoning tokens on the final answer (faithfulness), and Sufficiency of Reasoning (SR), which measures whether a verifier can arrive at an unambiguous answer based on the reasoning alone (verifiability).	Qinan Yu; Alexa Tartaglini; Peter Hase; Carlos Guestrin; Christopher Potts;
384	Scalable Sampling Via Generalized Fixed-Point Diffusion Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, while recent approaches that use least-squares `matching’ objectives have improved scalability, they often necessitate significant trade-offs, such as restricting prior distributions or relying on unstable optimization schemes. By generalizing these methods as special forms of fixed-point iterations rooted in Nelson’s relation, we develop a new method that addresses these limitations.	Denis Blessing; Lorenz Richter; Julius Berner; Egor Malitskiy; Gerhard Neumann;
385	Olaf-World: Orienting Latent Actions for Video World Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our key insight is that although actions are unobserved, their semantic effects* are observable and can serve as a shared reference.*	Yuxin Jiang; Yuchao Gu; Ivor Tsang; Mike Zheng Shou;
386	CBV: Clean-label Backdoor Attacks on Vision Language Models Via Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing backdoor attacks on VLMs primarily rely on data poisoning by adding visual triggers and modifying text labels, where the induced image–text mismatch makes poisoned samples easy to detect. To address this limitation, we propose the Clean-Label Backdoor Attack on VLMs via Diffusion Models (CBV), which leverages diffusion models to generate natural poisoned examples via score matching.	Ji Guo; xiaolong qin; Wenbo Jiang; Cencen Liu; Jielei Wang; Jierun Chen;
387	Understanding Behavior Cloning with Action Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Autoregressive models like transformer have proven remarkably effective, from large language models (LLMs) to vision-language-action systems (VLAs).	Haoqun Cao; Tengyang Xie;
388	TeamWork: Multivariate Time Series Anomaly Detection Via Asymmetric Role-aware Channel Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods often struggle to balance channel relationship modeling and overlook the relative importance of different variables within multivariate time series. To address this, we propose TeamWork, an asymmetric role-aware channel modeling framework that decouples variables into dominant and auxiliary roles according to their contributions to uncertainty reduction.	Shiyan Hu; Tengxue Zhang; Jianxin Jin; Xiangfei Qiu; Bin Yang; Chenjuan Guo;
389	Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, purely text-based self-evaluation struggles to verify complex visual reasoning steps and often suffers from evaluation hallucinations. To address these challenges, inspired by recent advances in tool-integrated reasoning, we propose Agent0-VL, a self-evolving vision-language agent that achieves continual improvement with tool-integrated reasoning.	Jiaqi Liu; Kaiwen Xiong; Peng Xia; Yiyang Zhou; Haonian Ji; Lu Feng; Siwei Han; Mingyu Ding; Huaxiu Yao;
390	SimpleMem: Efficient Lifelong Memory for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches either retain full interaction histories via passive context extension, leading to substantial redundancy, or rely on iterative reasoning to filter noise, incurring high token costs. To address this challenge, we introduce SimpleMem, an efficient memory framework based on semantic lossless compression.	Jiaqi Liu; Yaofeng Su; Peng Xia; Siwei Han; Zeyu Zheng; Cihang Xie; Mingyu Ding; Huaxiu Yao;
391	Learnability-Informed Fine-Tuning of Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We aim to improve the reasoning capabilities of diffusion language models (DLMs).	Shubham Parashar; Atharv Chagi; Jacob Helwig; Lakshmi Madhavarapu; Sushil Vemuri; James Caverlee; Dileep Kalathil; Shuiwang Ji;
392	ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases.	Yizheng Huang; Wenjun Zeng; Aditi Kumaresan; Zi Wang;
393	Causal Attention with Lookahead Keys Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce CAuSal aTtention with Lookahead kEys (CASTLE), an attention mechanism that continually updates each token’s keys as the context unfolds.	Zhuoqing Song; Peng Sun; Huizhuo Yuan; Quanquan Gu;
394	Position: Towards Responsible Evaluation for Text-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current evaluation practices are increasingly inadequate for capturing the full range of capabilities, limitations, and societal impacts of modern TTS systems. This position paper introduces the concept of Responsible Evaluation and argues that it is essential and urgent for the next phase of TTS development, structured through three progressive levels: (1) ensuring the faithful and accurate reflection of a model’s true capabilities and limitations, with more robust, discriminative, and comprehensive objective and subjective scoring methodologies; (2) enabling comparability, standardization, and transferability through standardized benchmarks, transparent reporting, and transferable evaluation metrics; and (3) assessing and mitigating ethical risks associated with forgery, misuse, privacy violations, and security vulnerabilities.	Yifan Yang; Hui Wang; Bing Han; Shujie Liu; Jinyu Li; Yong Qin; Xie Chen;
395	Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a mechanism to steer the sampling diversity of denoising diffusion and flow matching models, allowing users to sample from a sharper or broader distribution than the training distribution.	Yanbo Xu; Yu Wu; Sungjae Park; Zhizhuo Zhou; Shubham Tulsiani;
396	WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods.	Wenqiang Sun; Haiyu Zhang; Haoyuan Wang; Junta Wu; Zehan Wang; Zhenwei Wang; Yunhong Wang; Jun Zhang; Tengfei Wang; Chunchao Guo;
397	DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present DetailMaster, a comprehensive benchmark for evaluating T2I capabilities on long prompts with complex compositional requirements, accompanied by an automated data construction pipeline and an evaluation workflow.	Qirui Jiao; Daoyuan Chen; Yilun Huang; Xika Lin; Ying Shen; Yaliang Li;
398	From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by this theory, we propose Explore-Then-Exploit (ETE), a training-free decoding strategy that maximizes information throughput and decoding efficiency.	Hengyu Fu; Baihe Huang; Virginia Adams; Charles Wang; Junkeun Yi; Mohammad Mahdi Kamani; Venkat Krishna Srinivasan; Jiantao Jiao;
399	VFMF: Dense Forecasting By Generating Foundation Model Features Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Interestingly, naively replacing deterministic forecasting with generative flow matching does not match the sample quality of the regression model, despite being a mathematically appropriate formulation of the forecasting task. In this work, we explain why this is the case, and we show how to optimally generate foundation model features.	Gabrijel Boduljak; Yushi Lan; Christian Rupprecht; Andrea Vedaldi;
400	XKV: Cross-Layer KV-Cache Compression Via Aligned Singular Vector Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show, via Centered Kernel Alignment (CKA), that the dominant singular vectors of KV-Cache are well aligned across layers. Motivated by this observation, we propose xKV, a post-training compression method that jointly factorizes grouped-layer KV-Cache into a shared low-rank subspace, substantially reducing KV-Cache memory.	Chi-Chih Chang; Wei-Cheng Lin; Chien-Yu Lin; Hung-Yueh Chiang; Yash Akhauri; Xilai Dai; Huiqiang Jiang; Yucheng Li; Kai-Chiang Wu; Luis Ceze; Mohamed Abdelfattah;
401	Temporal Context Reinstatement Drives Episodic-Like Order Memory in Long-Context Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we investigate whether and, if so, how LLMs capture core behavioral signatures of humans of a central aspect of episodic memory via a temporal order memory task.	Mathis Pink; Vy Vo; Qinyuan Wu; Jianing Mu; Javier Turek; Uri Hasson; Kenneth Norman; Sebastian Michelmann; Alexander Huth; Mariya Toneva;
402	“very Likely” Means “uncertain”? How LLMs Diverge from Humans in Linguistic Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate: How LLMs diverge from humans in verbal uncertainty quantification?	Jinhao Duan; Zicheng Liu; Zijie Liu; Kaidi Xu; Tianlong Chen;
403	(Be Cautious!) Bio-Foundation Models Are Not Yet Robust to Biologically Plausible Perturbations and ML Transformations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we ask: Are Bio-FMs robust for real-world use?	Jinhao Duan; Ruichen Zhang; Gengwei Zhang; Huaizhi Qu; Jie Peng; Sijia Liu; Tianlong Chen;
404	Failure-Driven Workflow Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose \textbf{CE-Graph}, which maintains a counterexample pool, estimates dense failure modes, and applies operator-constrained graph edits via a \textbf{Propose-and-Verify} loop with a convergence-aware stopping rule.	Jusheng Zhang; Jing Yang; Kaitong Cai; Ziliang Chen; Yongsen Zheng; Kwok Yan Lam; Liang Lin; Keze Wang;
405	SOLAR for Offline MARL: Plateau-Triggered Potential Shaping Under World-Model Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that shaping becomes reliable when it is (i) activated only after \emph{statistically validated} learning plateaus and (ii) constrained to \emph{potential-based} shaping, which preserves the task optimum. Motivated by this, we propose \textsc{SOLAR}, a simulate–evaluate–shape framework.	Jusheng Zhang; Yijia Fan; Ruiqi Chen; Jing Yang; Ziliang Chen; Yongsen Zheng; Yanxi Chen; Jian Wang; Kwok Yan Lam; Liang Lin; Keze Wang;
406	Separating Representation from Reconstruction Enables Scalable Text Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Hence, we propose CrossBERT, a two-part architecture that separates the learning of high-quality encoded representations from the rigid grounding of token reconstruction.	Megi Dervishi; Mathurin VIDEAU; Yann LeCun;
407	Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While Representation Editing (RepE) offers a intrinsic control, its application to dynamic reasoning trajectories remains underexplored. In this work, we bridge this gap by investigating the geometry of truth within unfolding reasoning chains.	Tianlong Wang; Yuhang Wang; Weibin Liao; Xin Gao; Xinyu Ma; Yang Lin; Yasha Wang; Liantao Ma;
408	Position: Comprehensive AI Governance Requires Addressing Non-model Capability Gains Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Frontier AI governance often centres on the model-level governance paradigm, which assumes that a model’s capability profile is primarily a function of the compute and data used during training. This position paper argues that model-level governance becomes less effective when capability progress is increasingly driven by non-model gains—improvements that are independent from advances in the base model.	Arthur Goemans; Daniel Altman; Noemi Dreksler; Jonas Freund; Milan Gandhi; Zhengdong Wang; Sarah Cogan; Sebastien Krier; Demetra Brady; Lewis Ho; Allan Dafoe;
409	$\alpha$-PFN: Fast Entropy Search Via In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a two-stage amortization strategy that learns to approximate entropy search-based acquisition functions using Prior-data Fitted Networks (PFNs) in a single forward pass.	Herilalaina Rakotoarison; Steven Adriaensen; Tom Viering; Samuel Gabriel Müller; Carl Hvarfner; Frank Hutter; Eytan Bakshy;
410	Esoteric Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms, smoothly interpolating between their perplexities while overcoming their respective limitations.	Subham Sekhar Sahoo; Zhihan Yang; Yash Akhauri; Johnna Liu; Deepansha Singh; Zhoujun Cheng; Zhengzhong Liu; Eric Xing; John Thickstun; Arash Vahdat;
411	The Power of Power Law: Asymmetry Enables Compositional Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While a common intuition suggests that reweighting or curating data toward a uniform distribution may help models better learn these long-tail skills, we find a counterintuitive result: across a wide range of compositional reasoning tasks, such as state tracking and multi-step arithmetic, training under power-law distributions consistently outperforms training under uniform distributions. To understand this advantage, we introduce a minimalist skill-composition task and show that learning under a power-law distribution provably requires significantly less training data.	Zixuan Wang; Xingyu Dang; Jason Lee; Kaifeng Lyu;
412	Position: World Models As An Intermediary Between Agents and The Real World Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The true bottleneck for achieving the next level of agent performance for these complex and high-cost domains lies in the expense of executing actions to acquire reward signals. To address this gap, this paper argues that we should use world models as an intermediary between agents and the real world.	Sherry Yang;
413	AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present AutoTool, a training framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories.	Jiaru Zou; Ling Yang; Yunzhe Qi; Sirui Chen; Mengting Ai; Ke Shen; Jingrui He; Mengdi Wang;
414	Evolutionary Generation of Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Evolutionary Generation of Multi-Agent Systems (EvoMAS), which formulates MAS generation as structured configuration generation.	Yuntong Hu; Matthew Trager; Yuting Zhang; Yi Zhang; Shuo Yang; Wei Xia; Stefano Soatto;
415	Mitigating Bias in Locally Constrained Decoding Via Tractable Proposals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a generic approach to construct proposals and potentials for SMC sampling from $p_{\texttt{lm}}( \cdot \mid \texttt{constraint})$.	Meihua Dang; Linxin Song; Honghua Zhang; Jieyu Zhao; Guy Van den Broeck; Stefano Ermon;
416	WISE: World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing research and evaluation standards predominantly focus on image realism and shallow text-image alignment, lacking a comprehensive assessment of complex semantic understanding and world knowledge integration in text-to-image generation. To address this challenge, we propose WISE, the first benchmark specifically designed for World Knowledge-Informed Semantic Evaluation.	Yuwei Niu; Munan Ning; Mengren Zheng; Weiyang Jin; Bin Lin; Peng Jin; Jiaqi Liao; Chaoran Feng; Fanqing Meng; Kun-Peng Ning; Bin Zhu; Li Yuan;
417	Position: It Is Time to Virtualize Foundation Models with A Self-evolving Operating System Layer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: It mirrors computing before operating systems, when every program re-implemented basic services. This position paper argues that the field now needs a Foundation Model Operating System (FMOS): a system layer that virtualizes FM interactions analogous to how virtual machines abstract physical hardware, giving applications the illusion of dedicated, trustworthy FM instances with effectively unbounded capabilities.	Suparna Bhattacharya; Tarun Kumar; Cong Xu; Satish Mopur; Jiahao Li; Ashish Mishra; Aalap Tripathy; ANNMARY KOOMTHANAM; Martin Foltin; Ian Foster;
418	Wait, Wait, Wait… Why Do Reasoning Models Loop? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This points to mismatches between the training distribution and the learned model, which we refer to as errors in learning, as a key cause. To understand how such errors cause loops, we introduce a synthetic graph reasoning task and demonstrate two mechanisms.	Charilaos Pipis; Shivam Garg; Vasilis Kontonis; Vaishnavi Shrivastava; Akshay Krishnamurthy; Dimitris Papailiopoulos;
419	SafeLab: An Interactive High-Fidelity Benchmark for Embodied Safety in Scientific Robotics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce SafeLab, a generative simulation benchmark designed for the full lifecycle of safe robot learning.	Fengshuo Bai; Yufeng Li; Ruihai Wu; Peishuo Wang; Yuhan Wang; Bernie Zhu; Yuanfei Wang; Tawei Chou; Gao; Runchuan Zhu; Ying Wen; Yaodong Yang; Yuanpei Chen;
420	AVTrack: Audio-Visual Speaker Tracking in Complex Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Such oversimplified settings bias evaluation toward static audio–visual co-occurrence, rather than rigorously assessing robust spatiotemporal modeling and cross-modal reasoning in complex, dynamic scenes. To address these limitations, we introduce \textbf{AVTrack}, a human-centric audio-visual instance segmentation (AVIS) dataset designed for dynamic real-world scenarios.	Yaoting Wang; Yun Zhou; Zipei Zhang; Henghui Ding;
421	Towards Efficient Large Language Reasoning Models Via Extreme-Ratio Chain-of-Thought Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy.	Yuntian Tang; Bohan Jia; Wenxuan Huang; Lianyue Zhang; Jiao Xie; Wenxi Li; Wei Li; Jie Hu; Xinghao Chen; Rongrong Ji; Shaohui Lin;
422	Gaming Consensus: Coordinated Manipulation in Crowdsourced Fact-Checking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although this algorithm is designed to be robust against traditional brigading, we demonstrate an attack showing that coordinated users can strategically fabricate diverse agreement in the system’s latent space to manipulate the scoring algorithm. We validate this attack on real-world production data and find that a surprisingly large number of notes’ scores can potentially be manipulated with a small number (< 10) of coordinated votes, raising the risk that adversaries could surface arbitrary notes on these social media platforms.	Nikil Selvam; Jay Baxter; Sophie Hilgard; Brad Miller; Keith Coleman; Ellen Vitercik; Sanmi Koyejo;
423	ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These emerging AI capabilities offer new opportunities for scientific discovery and biomedical advances, but they are also changing the landscape of biosecurity risks. To address this, we introduce the Agentic Bio-Capabilities Benchmark (ABC-Bench), a suite of evaluations to measure \textit{agentic} biosecurity-relevant capabilities.	Andrew Liu; Samira Nedungadi; Bryce Cai; Alex Kleinman; Harmon Bhasin; Seth Donoughe;
424	Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent works have introduced Large Language Model (LLM) based agents for Linux kernel crash-resolution, their evaluation benchmarks are usually static and thus, do not capture the evolving nature of the Linux kernel, and suffer from potential data contamination due to LLM knowledge cutoffs. To address the above problem, we present (i) Live-kBench, an evaluation framework for self-evolving benchmarks that continuously scrapes and evaluates agents on freshly discovered kernel bugs, and (ii) kEnv, an agent-agnostic standardized crash-resolution environment for kernel compilation, execution, and feedback.	Chenxi Huang; Alex Mathai; Feiyang Yu; Aleksandr Nogikh; Petros Maniatis; Franjo Ivancic; Eugene Wu; Kostis Kaffes; Junfeng Yang; Baishakhi Ray;
425	Position: Multiple Definitions & Unrealistic Assumptions of Model Collapse Distract from Real World Threats Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We highlight that research on model collapse actually encompasses eight distinct and at times conflicting definitions of model collapse, and argue that inconsistent terminology within and between papers has hindered building a comprehensive understanding of model collapse.	Rylan Schaeffer; Joshua Kazdan; Alvan Arulandu; Sanmi Koyejo;
426	RefChess: Monte-Carlo Move Selection for Zero-Shot Referring Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Nevertheless, selecting the correct segmentation proposal remains challenging, as existing methods typically rely on independent proposal scoring and lack contextual reasoning among visually similar candidates. To address this limitation, we propose RefChess, a training-free framework that reformulates proposal selection as a decision-making problem under contextual perturbations rather than a single-step ranking task.	Shiyan Tong; Jinxia Zhang; Zhiyuan Wang; Hao Tian; YingYing Wang; Kanjian Zhang; Haikun Wei;
427	SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SWE-rebench V2, a language-agnostic automated pipeline for harvesting executable real-world SWE tasks and constructing RL training environments at scale.	Ibragim Badertdinov; Maksim Nekrashevich; Anton Shevtsov; Aleksandr Golubev;
428	FourTune: Towards Fully 4-Bit Efficient Post-Training for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, post-training of large diffusion models is still challenging due to the prohibitive memory footprints and slow training speed, which existing parameter-efficient fine-tuning methods only partially address. To overcome these limitations, we propose FourTune, an efficient post-training framework for diffusion models based on an end-to-end W4A4G4 paradigm.	Bowen Xue; Zihan Min; Xingyang Li; Muyang Li; Yujun Lin; Zhekai Zhang; Haocheng Xi; Lvmin Zhang; Maneesh Agrawala; Jun-Yan Zhu; Song Han;
429	Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prior work has proposed a range of heuristics to counteract this effect, but these methods are ad hoc: they frequently trade off correctness for diversity, their effectiveness varies across tasks, and in some cases they even contradict one another. In this work, we place these observations on a rigorous foundation.	Jingchu Gai; Guanning Zeng; Huaqing Zhang; Aditi Raghunathan;
430	Demystifying Entropy Control in LLM RL Training: Theoretical Analysis and Dynamic Scheduling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper investigates a pivotal yet debated component of reinforcement learning (RL) for training large language models (LLMs): controlling entropy (increasing or decreasing it) during RL fine-tuning.	Jingchu Gai; Guanning Zeng; Huaqing Zhang; Han Zhong; Yige Hong; Andrej Risteski; Aditi Raghunathan;
431	Direct 3D-Aware Object Insertion Via Decomposed Visual Proxies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose DIRECT (Decomposed Injection for Reference Composition and Target-integration), a novel framework that integrates interactive pose manipulation with high-fidelity 2D image synthesis to enable precise geometric alignment.	Jingbo Gong; Yikai Wang; Yushi Lan; Yuhao Wan; Ziheng Ouyang; Rui Zhao; Ming-Ming Cheng; Qibin Hou; Chen Change Loy;
432	WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present WorldMirror, a unified feed-forward model for comprehensive 3D geometric prediction tasks.	Yifan Liu; Zhiyuan Min; Zhenwei Wang; Junta Wu; Tengfei Wang; Yixuan Yuan; Yawei Luo; Chunchao Guo;
433	Evolution Strategies at The Hyperscale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Evolution Guided GeneRal Optimisation via Low-rank Learning (EGGROLL), which improves arithmetic intensity by structuring individual perturbations as rank-$r$ matrices, resulting in a hundredfold increase in training speed for billion-parameter models at large population sizes, achieving up to 91\% of the throughput of pure batch inference.	Bidipta Sarkar; Mattie Fellows; Juan Duque; Alistair Letcher; Antonio Villares; Anya Sims; Clarisse Wibault; Dmitry Samsonov; Dylan Cope; Jarek Liesen; Kang Li; Lukas Seier; Theo Wolf; Uljad Berdica; Valentin Mohl; Alexander D. Goldie; Aaron Courville; Karin Sevegnani; Shimon Whiteson; Jakob Foerster;
434	UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We observe that naively adapting GRPO to UDM leads to unstable training and marginal performance. To address this, we propose \Ours, the first framework that integrates UDM with RL.	Jiaqi Wang; Haoge Deng; Ting Pan; Yang Liu; Chengyuan Wang; Fan Zhang; Yonggang Qi; Xinlong Wang;
435	SleepLM: Natural-Language Intelligence for Human Sleep Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present SleepLM, a family of sleep-language foundation models that enable human sleep alignment, interpretation, and interaction with natural language.	Zongzhe Xu; Zitao Shuai; Eideen Mozaffari; Ravi Aysola; Rajesh Kumar; Yuzhe Yang;
436	Debiased Model-based Representations for Sample-efficient Continuous Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These incur biases in representation and actor-critic learning, leading to inferior performance. To address this, we propose Debiased model-based Representations for Q-learning, tagged DR. Q algorithm.	Jiafei Lyu; Zichuan Lin; Scott Fujimoto; Kai Yang; Yangkun Chen; Saiyong Yang; Zongqing Lu; Deheng Ye;
437	PretrainZero: Reinforcement Active Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose PretrainZero, a reinforcement active learning framework built on the pretraining corpus to extend RL from domain-specific post-training to general pretraining.	Xingrun Xing; Zhiyuan Fan; Jie Lou; Guoqi Li; Jiajun Zhang; Debing Zhang;
438	SLIP-RS: Structured-Attribute Language-Image Pre-Training for Remote Sensing Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing language-image pre-training for remote sensing object detection is constrained by Monolithic Label Learning, which relies on exhaustively enumerating open-set categories via black-box data to acquire fine-grained representations, creating a dependency incompatible with the domain’s inherent data scarcity. To transcend this bottleneck, we propose SLIP-RS, establishing a Structured-Attribute Decoupling Paradigm that maps the open-ended category space into a finite, physically meaningful attribute space, unlocking fine-grained discriminability via explicit structural logic.	Chenxu Wang; Yuxuan Li; Yunheng Li; Xiang Li; Jingyuan Xia; Qibin Hou;
439	Recurrent Equivariant Constraint Modulation: Learning Per-Layer Symmetry Relaxation from Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Recurrent Equivariant Constraint Modulation (RECM), a layer-wise constraint modulation mechanism that learns appropriate relaxation levels solely from the training signal and the symmetry properties of each layer’s input-target distribution, without requiring any prior knowledge about the task-dependent target relaxation level.	Stefanos Pertigkiozoglou; Mircea Petrache; Shubhendu Trivedi; Kostas Daniilidis;
440	4RC: 4D Reconstruction Via Conditional Querying Anytime and Anywhere Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos.	Yihang Luo; Shangchen Zhou; Yushi Lan; Xingang Pan; Chen Change Loy;
441	World Guidance: World Modeling in Condition Space for Action Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing approaches struggle to strike a balance between maintaining efficient, predictable future representations and preserving sufficient fine-grained information to guide precise action generation. To address this limitation, we propose WoG (World Guidance), a framework that maps future observations into compact conditions by injecting them into the action inference pipeline.	Yue Su; Sijin Chen; Haixin Shi; Mingyu Liu; Zhengshen Zhang; Ningyuan Huang; Weiheng Zhong; Zhengbang Zhu; Yuxiao Liu; Xihui Liu;
442	FAIL: Flow Matching Adversarial Imitation Learning for Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Flow Matching Adversarial Imitation Learning (FAIL), which minimizes policy-expert divergence through adversarial training without explicit rewards or pairwise comparisons.	Yeyao Ma; Chen Li; Xiaosong Zhang; Han Hu; Weidi Xie;
443	Position: Peer Review Should Be Calibrated Via LLM Scoring Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As submission volumes grow, AI conference peer review increasingly suffers from scale drift and non-comparable scoring: similar rationales can yield markedly different numeric ratings due to subjective calibration and occasional incoherent or strategic scoring, even though scores often strongly influence outcomes. This position paper argues that AI conference workflows should incorporate an LLM-driven calibration layer that maps reviewer rationales (e.g., strengths and weaknesses) into consistent and auditable anchor scores.	Zijin Chen; lesui Yu; Xiaofei Liao; Hai Jin; Qinbin Li;
444	Autoregressive Image Generation with Masked Bit Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing discrete generation methods struggle to capitalize on this insight, suffering from performance degradation or prohibitive training costs with scaled codebook. To address this, we propose masked Bit AutoRegressive modeling (BAR), a scalable framework that supports arbitrary codebook sizes.	Qihang Yu; Qihao Liu; Ju He; Xinyang Zhang; Yang Liu; Liang-Chieh Chen; Peter Chen;
445	Position: Stop Automating Peer Review Without Rigorous Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify two critical issues: 1) AI reviewers exhibit a hivemind effect* of excessive agreement within and across papers that reduces perspective diversity.*	Joachim Baumann; Jiaxin Pei; Sanmi Koyejo; Dirk Hovy;
446	Beyond Soft Labels: Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recently, DD’s increasing reliance on original images suggests a convergence of the two directions. To investigate this convergence trend, we propose a unified dataset compression (DC) benchmark.	Lingao Xiao; Songhua Liu; Yang He; Xinchao Wang;
447	Antidistillation Fingerprinting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce antidistillation fingerprinting* (ADFP), a principled approach that aligns the fingerprinting objective with the student’s learning dynamics.*	Yixuan Xu; John Kirchenbauer; Yash Savani; Asher Trockman; Alexander Robey; Tom Goldstein; Fei Fang; Zico Kolter;
448	Discriminative Visual Process Rewards for Scaling Thinking at Test-Time with Images Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Discriminative Visual Process Reward Model (DiscPRM), a multimodal PRM that jointly evaluates textual and visual intermediate steps by modeling visual reasoning trajectories, image operations, and text-image consistency.	Bo-Wen Yin; Qize Yang; Boyuan Sun; Xihan Wei; Qibin Hou;
449	Reason with Thumbnails, Answer with Focus: An Efficient and Effective Paradigm for Multimodal Grounded Visual Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To attain efficient and effective GVR, in this paper, we propose a novel paradigm called Reason with Thumbnails, Answer with Focus (RTAF), which feeds the model with low-resolution images to reason the relevant regions and high-resolution crops to answer the final answer.	An-Lan Wang; Guozhi Tang; Lei Liao; Hanshen Zhu; Kai Huang; Jingqun Tang; Jiaming Zhou; Kun-Yu Lin;
450	MixReasoning: Switching Modes to Think Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose MixReasoning, a framework that dynamically adjusts the depth of reasoning within a single response.	Haiquan Lu; Gongfan Fang; Xinyin Ma; Qi Li; Xinchao Wang;
451	Delving Into Non-Exchangeability for Conformal Prediction in Graph-Structured Multivariate Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by the spectral graph theory, such coupling resides in global trends and can be characterized by the low-frequency components, while high-frequency components are nearly exchangeable. Therefore, we propose a novel concept named Spectral Graph Conditional Exchangeability (SGCE), which conditions exchangeable high-frequency components on low-frequency ones to preserve global trends and enable effective CP in the spectral domain.	Ruichao Guo; Xingyao Han; Wenshui Luo; Zhe Liu; Chen Gong; Hesheng Wang;
452	ACTIVE-o3 : Empowering MLLMs with Active Perception Via Pure Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We first provide a systematic definition of MLLM-based active perception tasks and show that GPT-o3’s zoom-in strategy can be viewed as a special case, though it suffers from low efficiency and inaccurate region selection. To address these issues, we propose Active-o3, a reinforcement learning framework built on GRPO that equips MLLMs with active perception capabilities.	Muzhi Zhu; Hao Zhong; Canyu Zhao; Zongze Du; Mingyu Liu; Zheng Huang; Anzhou Li; Hao Chen; Cheng Zou; Jingdong Chen; Ming Yang; Chunhua Shen;
453	SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Empirically, we show that naive initialization at this stage disrupts activation statistics, triggering loss spikes, while copy-based initialization introduces gradient symmetry that hinders feature diversity. To address these issues, we propose SPARKLING (balancing Signal Preservation And symmetRy breaKing for width-progressive LearnING), a novel framework for mid-stage width expansion.	Qifan Yu; Xinyu Ma; Zhijian Zhuo; Minrui Wang; Deyi Liu; Shiyi Zhan; Yiyuan Ma; liang xiang; Xingyan Bin; Di He;
454	ObjEmbed: Towards Universal Multimodal Object Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present ObjEmbed, a novel MLLM embedding model that decomposes the input image into multiple regional embeddings, each corresponding to an individual object, along with global embeddings.	Shenghao Fu; Yukun Su; Fengyun Rao; Jing LYU; Xiaohua Xie; Wei-Shi Zheng;
455	OvisOCR: End-to-End Document Parsing Via Aligning Specialized Perception with General Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents OvisOCR, a lightweight and strictly end-to-end Multimodal Language Model (MLLM) tailored for document parsing.	Jun-Peng Jiang; Shiyin Lu; An-Yang Ji; Yinglun Li; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; De-Chuan Zhan; Han-Jia Ye;
456	Dissecting Post-Training: Uncovering The Complementary Roles of SFT and RL for Document Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We further ground this phenomenon in the distinct theoretical nature of their respective objective functions. Based on these findings, we introduce a unified strategy that explicitly harnesses their individual strengths while mitigating their weaknesses.	Jun-Peng Jiang; An-Yang Ji; Shiyin Lu; Guodong Zheng; Weihong Zhang; Qing-Guo Chen; Weihua Luo; Kaifu Zhang; Long Chen; De-Chuan Zhan; Han-Jia Ye;
457	Causal Disentangled Anchor Learning for Scalable Fair Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing fair multi-view clustering methods typically suffer from a severe trade-off between clustering utility and fairness, while incurring prohibitive quadratic complexity on large-scale datasets. To address these challenges, we propose Causal Disentangled Anchor Learning (CDAL), a novel framework that achieves scalable fairness via structural disentanglement.	Suyuan Liu; Shengfei Wei; Wenjing Yang; Shengju Yu; Siwei Wang; Xueqiong Li; Wenpeng Lu; Xinwang Liu;
458	Safe and Scalable Web Agent Learning Via Recreated Websites Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites into fully executable, verifiable synthetic environments.	Hyungjoo Chae; Jungsoo Park; Alan Ritter;
459	PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations.	Yunhan Yang; Chunshi Wang; Junliang Ye; YANG LI; Zanxin Chen; Zehuan Huang; Yao Mu; Zhuo Chen; Chunchao Guo; Xihui Liu;
460	Position: Agent Evaluation Should Be Agentified for Openness, Standardization, and Reproducibility Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison across diverse agent designs. This position paper argues that the root problem is the lack of an open, agent-agnostic assessment interface.	Xiaoyuan Liu; Tianneng Shi; Wenbo Guo; Dawn Song;
461	Trust3R: Unifying Feed-Forward Pointmap Prediction and Evidential Learning for Trust-Aware 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Geometric foundation models hold promise for unconstrained dense geometry prediction from uncalibrated images; however, in current feed-forward designs, their predicted confidence scores are heuristic, lack probabilistic interpretation, and often fail to indicate where and how much the predicted geometry can be trusted. To fill this gap, we present *Trust3R*, a trust-aware 3D reconstruction framework that pairs a lightweight gated residual mean refinement with evidential learning to predict pointmap evidence under a Normal-Inverse-Wishart prior and yield a closed-form multivariate Student-t predictive distribution.	Zihao Zhu; Wenyuan Zhao; Nuo Chen; Chao Tian; Zhiwen Fan;
462	LoSA: Locality Aware Sparse Attention in Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this challenge, we observe that block-wise diffusion exhibits locality of representation changes across denoising steps: only a small fraction of tokens (active tokens) undergo significant hidden-state updates, while most tokens (stable tokens) remain nearly unchanged. Based on this insight, we propose LoSA (Locality-aware Sparse Attention), which reuses cached prefix-attention results for stable tokens and applies sparse attention only to active tokens with large representation changes.	Haocheng Xi; Harman Singh; Yuezhou Hu; Coleman Hooper; Rishabh Tiwari; Aditya Tomar; Wonjun Kang; Minjae Lee; Michael Mahoney; Chenfeng Xu; Kurt Keutzer; Amir Gholaminejad;
463	Scaling Long-Horizon Agent Via Context Folding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Context Folding, a framework that empowers agents to actively manage their working context.	Weiwei Sun; Lu Miao; Zhan Ling; Kang Liu; Xuesong Yao; Yiming Yang; Jiecao Chen;
464	Rethinking Human Intent to CAD: Parametric CAD Model Generation Via Cooperative Multi-Task Alignment and Spatial-Aware Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To support our study, we construct HiCAD, the first large-scale dataset aligning hand-drawn sketches, textual descriptions, and parametric CAD codes. Based on this, we introduce HiCAD, a two-stage framework comprising Cooperative Multi-Task Alignment to bridge the representational gap between heterogeneous inputs, and Spatial-Aware Reinforcement Learning to enforce geometric and topological consistency.	Qingwang Zhang; Jiahao Li; Xiangdong Zhou;
465	Improved Algorithms for Nash Welfare in Linear Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although this notion has been extended to linear bandits, existing results suffer from suboptimality in ambient dimension $d$, stemming from proof techniques that rely on restrictive concentration inequalities. In this work, we resolve this open problem by introducing new analytical tools that yield an order-optimal Nash regret bound in linear bandits.	Dhruv Sarkar; Nishant Pandey; Sayak Ray Chowdhury;
466	OSF: On Pre-training and Scaling of Sleep Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: With an enhanced pre-training and scaling recipe, we introduce OSF, a family of sleep FMs that achieves state-of-the-art performance across nine datasets on diverse sleep and disease prediction tasks.	Zitao Shuai; Zongzhe Xu; David Yang; Wei Wang; Yuzhe Yang;
467	Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Within this framework, we introduce estimators that efficiently quantify stage effects without retraining the model, accounting for both the data and key aspects of model optimization dynamics, including learning rate schedules, momentum, and weight decay.	Shichang Zhang; Hongzhe Du; Jiaqi Ma; Himabindu Lakkaraju;
468	Securing Multimodal AI Through Internal Information Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose FlowGuard, a lightweight inference-time framework that detects harmful inputs by monitoring internal multimodal consistency.	Jehyeok Yeon; Hyeonjeong Ha; Qiusi Zhan; Heng Ji;
469	NeurVLA: Unleashing Failure-Handling Capability of Vision-Language-Action Models Via Neural-Symbolic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These limitations lead to brittle decision-making when VLA models are deployed in novel tasks and environments. To address them, we propose NeurVLA, a neural-symbolic framework that jointly addresses failure correction and prevention via neural-symbolic reasoning and further internalizes these failure-handling capabilities into VLA models.	Xuqi Liu; Minghe Gao; Juncheng Li; Siliang Tang;
470	Data Agent: Learning to Select Data Via End-to-End Dynamic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods typically rely on task-specific handcrafted metrics or static/snapshot-based criteria to estimate sample importance, limiting scalability across learning paradigms and making it difficult to capture the evolving utility of data throughout training. To address this challenge, we propose Data Agent, an end-to-end dynamic data selection framework that formulates data selection as a training-aware sequential decision-making problem.	Suorong Yang; Fangjian Su; Hai Gan; Ziqi Ye; Jie Li; Baile Xu; Furao Shen; Soujanya Poria;
471	Position: Agent Security Needs Redefinition Through A Holistic Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We argue that agent security must be redefined through a holistic framework including four core components: identity (who: authority and authentication), task (what to do: authorized objectives), trajectory (progress: action-observation boundaries), and memory (what can be retrieved: information access control).	Vincent Siu; Jingxuan He; Kyle Montgomery; Zhun Wang; Chenguang Wang; Dawn Song;
472	Test-time Generalization for Physics Through Neural Operator Splitting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a method to enhance generalization at test-time, i.e, without modifying pretrained weights.	Louis Serrano; Rudy Morel; Jiequn Han; Edouard Oyallon; Shirley Ho;
473	Better, Faster: Harnessing Self-Improvement in Large Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose HSIR, which effectively Harnesses Self-Improvement in large Reasoning models via two simple-yet-effective approaches.	Qihuang Zhong; Liang Ding; Juhua Liu; Bo Du; Leszek Rutkowski; Dacheng Tao;
474	Learn to Think: Improving Multimodal Reasoning Through Vision-Aware Self-Improvement Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose VISTA, a VIsion-aware Self-improvement Training framework for enhancing the multimodal Reasoning of MLLMs.	Qihuang Zhong; Liang Ding; Wenjie Xuan; Juhua Liu; Bo Du; Dacheng Tao;
475	Rethinking Multimodal Time-Series Forecasting Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a new context-enriched, multimodal time series forecasting benchmark TimesX.	Haoxin Liu; Yichen Zhou; Rajat Sen; B. Aditya Prakash; Abhimanyu Das;
476	CUARewardBench: Benchmark for Evaluating Reward Models on Computer-using Agent Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: (4) Unanimous Prompt Ensemble (UPE): Based on the insights from our comprehensive analysis, we propose UPE, a novel ensemble method that significantly enhances reward model reliability through strict unanimous voting and strategic prompt-template configurations.	Haojia Lin; Xiaoyu Tan; Yulei Qin; Zihan Xu; Yuchen Shi; Zongyi Li; Gang Li; Shaofei Cai; Siqi Cai; Yuzheng Cai; Chaoyou Fu; Ke Li; Xing Sun;
477	Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Discrete Diffusion VLA, a unified-transformer policy that models discretized action chunks with discrete diffusion retaining progressive refinement inside the VLM backbone.	Zhixuan Liang; Yizhuo Li; Tianshuo Yang; CHENGYUE WU; Sitong Mao; Liuao Pei; Tian Nian; Shunbo Zhou; Xiaokang Yang; Jiangmiao Pang; Yao Mu; Ping Luo;
478	Debate2Create: Robot Co-design Via Multi-Agent LLM Debate Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Debate2Create (D2C), a multi-agent LLM framework that formulates robot co-design as structured, iterative debate grounded in physics-based evaluation.	Kevin Qiu; Marek Cygan;
479	A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To enable more reliable evaluation, we propose ReliableBench, a benchmark of behaviors that remain more consistently judgeable, and JudgeStressTest, a dataset designed to expose judge failures.	Leo Schwinn; Moritz Ladenburger; Tim Beyer; Mehrnaz Mofakhami; Gauthier Gidel; Stephan Günnemann;
480	Revealing Long-context Potential of Attention Heads Via Frequency Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we use kernel methods to analyze static frequency kernels formed by different rotation frequency components of attention heads, and we design a Long-context Potential Score (LPS) to measure the potential of attention heads in processing long contexts.	Senyu Han; Yilu Cao; Kai Yu; Lu Chen;
481	When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we analyze the per-sample gradient of BT-loss and shows spurious learning signals due to representation distance.	Tong Xie; Ching-Yuan Bai; Yuanhao Ban; Yunqi Hong; Haoyu Li; Cho-Jui Hsieh;
482	Group Distributionally Robust Optimization-Driven RL for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our approach is principled and theory-driven: we provide no-regret guarantees for the Prompt-GDRO game (via an entropy-regularized GDRO surrogate) and a variance-proxy analysis that yields a square-root optimal compute allocation for Rollout-GDRO.	Kishan Panaganti; Zhenwen Liang; Wenhao Yu; Haitao Mi; Dong Yu;
483	Uncovering The Gradient Geometry of Long CoT: A Spectral-guided Approach to Reasoning Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This often leads student models to memorize superficial patterns rather than acquire generalizable reasoning capabilities. To better understand this limitation, we introduce \textit{Loss Subspace Attribution}, a gradient decomposition analysis approach that uncovers a striking geometric structure: Gradients corresponding to effective reasoning predominantly lie within a low-rank consensus subspace, while conflicting or unstructured signals dominate the residual subspace.	Sinan Fan; Xiaofeng Sun; Chen Shen; Chenxi Huang; Shaotian Yan; Bing Wang; kaiyuan liu; Xiaosong Yuan; Liang Xie; Wenxiao Wang; Jun Zhang; Hongyang Chen; Jieping Ye;
484	One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We categorize RM failures by complexity and propose a simple post-hoc intervention to mitigate low-complexity biases that arise from spurious correlations.	Daniel Fein; Max Lamparth; Violet Xiang; Mykel Kochenderfer; Nick Haber;
485	Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although general capability evaluations are widespread, social impact assessments covering bias, fairness, privacy, environmental costs, and labor remain uneven. To characterize this landscape, we conduct the first comprehensive analysis of social impact evaluation reporting, examining 186 first-party release reports and 248 third-party evaluation sources, supplemented by developer interviews.	Anka Reuel; Avijit Ghosh; Jenny Chim; Andrew Tran; Yanan Long; Jennifer Mickel; Usman Gohar; Srishti Yadav; Pawan Sasanka Ammanamanchi; Mowafak Allaham; Hossein A. Rahmani; Mubashara Akhtar; Felix Friedrich; Robert Scholz; Michael Riegler; Jan Batzner; Eliya Habba; Arushi Saxena; Anastassia Kornilova; Kevin Wei; Prajna Soni; Yohan Mathew; Kevin Klyman; Jeba Sania; Subramanyam Sahoo; Olivia Bruvik; Pouya Sadeghi; Sujata Goswami; Angelina Wang; Yacine Jernite; Zeerak Talat; Stella Biderman; Mykel Kochenderfer; Sanmi Koyejo; Irene Solaiman;
486	Breaking Manifold Continuity: Vector Quantized Modeling for Real-Centric Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In order to further enhance the generalization of discrete modeling, we propose an adaptive tangent space projection mechanism that yields a continuous relaxation of the discrete real distribution within a controllable range.	Changshuo Wang; Jiangming Wang; Ke-Yue Zhang; Taiping Yao; Shouhong Ding; Ran Yi; Lizhuang Ma;
487	Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We hypothesize that generalizable reasoning emerges through learning task-conditioned attractors.	Benhao Huang; Zhengyang Geng; Zico Kolter;
488	Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We address the problem of gradient estimation for stochastic differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.	Felix Petersen; Christian Borgelt; Aashwin Mishra; Stefano Ermon;
489	Spatially-Regularized Entropy for Discriminative Token Merging in Fine-Grained Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In fine-grained retrieval, these approaches often discard or smooth out subtle but discriminative local details. To resolve this, we propose SRE-Merge, a training-free framework designed for discriminative token compression.	Shangze Li; Yifan Xu; Jingmiao Liang; Yongfei Zhang; Yuzhuo Ma; Yingbo Qu;
490	Real-Time and Lightweight Diffusion Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we explore the design of real-time and lightweight diffusion codecs by addressing two pivotal questions.	Zhaoyang Jia; Naifu Xue; Zihan Zheng; Jiahao Li; Bin Li; Xiaoyi Zhang; Zongyu Guo; Yuan Zhang; Houqiang Li; Yan Lu;
491	VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The challenge lies in the difficulty of finding sufficient training videos with the intended uncommon camera motions. To address this challenge, we propose VividCam, a training paradigm that enables diffusion models to learn complex camera motions from synthetic videos, releasing the reliance on collecting realistic training videos.	Qiucheng Wu; Handong Zhao; Zhixin Shu; Jing Shi; Yang Zhang; Shiyu Chang;
492	LaST$_{0}$: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To mitigate these limitations, we propose LaST$_0$, a framework that enables efficient reasoning before acting through a Latent Spatio-Temporal Chain-of-Thought (CoT), capturing fine-grained physical and robotic dynamics that are often difficult to verbalize. Specifically, we introduce a token-efficient latent CoT space that models future visual dynamics, 3D structural information, and robot proprioceptive states, and further extends these representations across time to enable temporally consistent implicit reasoning trajectories.	Zhuoyang Liu; Jiaming Liu; Hao Chen; Jiale Yu; Ziyu Guo; Chengkai Hou; Xiangju Mi; Chenyang Gu; Renrui Zhang; Kun Wu; Zhengping Che; Jian Tang; Pheng Ann Heng; Shanghang Zhang;
493	When Do Hallucinations Arise? A Graph Perspective on The Evolution of Path Reuse and Path Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We model next-token prediction as a graph search process over an underlying graph, where entities correspond to nodes and learned transitions form edges.	Xinnan Dai; Kai Yang; cheng Luo; Shenglai Zeng; Kai Guo; Jiliang Tang;
494	From Directions to Regions: Decomposing Activations in Language Models Via Local Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we leverage Mixture of Factor Analyzers (MFA) as a scalable, unsupervised alternative that models the activation space as a collection of Gaussian regions with their local covariance structure.	Or Shafran; Shaked Ronen; Omri Fahn; Shauli Ravfogel; Atticus Geiger; Mor Geva;
495	When to Trust The Cheap Check: Weak and Strong Verification for Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce metrics capturing incorrect acceptance, incorrect rejection, and strong-verification frequency.	Shayan Kiyani; Sima Noorani; George Pappas; Hamed Hassani;
496	Walrus: A Cross-domain Foundation Model for Continuum Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Experiments show that \Walrus\ outperforms prior foundation models on both short- and long-term prediction horizons on downstream tasks and across the breadth of pretraining data, while ablation studies confirm the value of our contributions to forecast stability, training throughput, and transfer performance over conventional approaches.	Michael McCabe; Payel Mukhopadhyay; Tanya Marwah; Bruno Régaldo-Saint Blancard; François Rozet; Cristiana Diaconu; Lucas Meyer; Kaze Wong; Hadi Sotoudeh; Alberto Bietti; Irina Espejo; Rio Fear; Siavash Golkar; Tom Hehir; Keiya Hirashima; Geraud Krawezik; Francois Lanusse; Rudy Morel; Ruben Ohana; Liam Parker; Mariel Pettee; Jeff Shen; Kyunghyun Cho; Miles Cranmer; Shirley Ho;
497	Caracal: Causal Architecture Via Spectral Mixing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we introduce Caracal, a novel architecture that replaces self-attention with a parameter-efficient, $\mathcal{O}(L \log L)$ Multi-Head Fourier (MHF) module.	BINGZHENG GAN; Tianyi Zhang; LI YUSU; Jing Huang; Wei Shi; Yangkai Ding; Tao Yu;
498	TopAdapter: Topology-Aware Prompt Tuning for Efficient Point Cloud Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing parameter-efficient fine-tuning (PEFT) methods predominantly focus on input token prompting, overlooking the intrinsic geometric information. To address this limitation, we propose TopAdapter, a novel PEFT framework that enhances geometric perception by injecting local topological information into pre-trained 3D vision models.	Changshuo Wang; Shuting He; Xiang Fang; Weijun Li; Yixian Shen; Mingkun Xu; Zhongtian Sun; Prayag Tiwari;
499	Video-SALMONN S: Memory-Enhanced Streaming Audio-Visual LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce video-SALMONN S, a memory-enhanced streaming audio-visual large language model that processes over 3-hour videos at $1$ FPS and $360$p resolution, outperforming strong non-streaming models under the same memory budget.	Guangzhi Sun; Yixuan Li; Xiaodong Wu; Yudong Yang; Wei Li; Zejun MA; Chao Zhang;
500	On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present a systematic empirical study that examines horizon length through controlled task constructions.	Sunghwan Kim; Junhee Cho; Beong-woo Kwak; Taeyoon Kwon; Liang Wang; Nan Yang; Xingxing Zhang; Furu Wei; Jinyoung Yeo;

This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~6,500 papers), please visit Paper Digest: ICML-2026 (Full List).