Paper Digest: ACL 2026 Papers & Highlights

June 27, 2026 | July 23, 2026 | admin

The Annual Meeting of the Association for Computational Linguistics (ACL) is one of the top natural language processing conferences in the world. To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.

Literature review on a topic

Generate a written review of ACL-2026 research on any topic, with each claim cited to specific papers.

Browse & explore

Browse ~ 11,000 authors (ACL-2026), or explore the “Best Paper” Digest listing the most influential ACL papers of recent years.

Note: ACL-2026 accepted more than 2,400 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested readers can view all 2,400 ACL-2026 papers on a separate page, which takes some time to load.

Since 2018, Paper Digest has built a foundation of data spanning decades of conferences, journals, and research topics. The platform features a daily digest service that sifts through tens of thousands of new papers, clinical trials, news articles, and community posts, filtering the noise to highlight what matters most to specific interests. Beyond daily updates, dozens of built-in research tools streamline the academic workflow, supporting efficient reading and writing, comprehensive literature reviews, and automated research report generation.

Paper Digest Team
New York City, New York, 10017
team@paperdigest.org

TABLE 1: Paper Digest: ACL 2026 Papers & Highlights

#	Paper	Author(s)
1	DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Highlight: Meanwhile, existing LLM planning benchmarks underrepresent the active information gathering and fine-grained local constraints typical of real-world settings. To address this, we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning.	Yinger Zhang; Shutong Jiang; Renhao Li; Jianhong Tu; Yang Su; Lianghao Deng; Xudong Guo; ChenXu Lv; Junyang Lin;
2	From Completion to Editing: Unlocking Context-Aware Code Infilling Via Search-and-Replace Instruction Tuning Highlight: While Chat LLMs offer safety and Agentic workflows provide flexibility, they suffer from performance degradation and prohibitive latency, respectively. To resolve this dilemma, we propose Search-and-Replace Infilling (SRI), a framework that internalizes the agentic verification-and-editing mechanism into a unified, single-pass inference process.	Jiajun Zhang; Zeyu Cui; Jiaxi Yang; Lei Zhang; Yuheng Jing; Zeyao Ma; Tianyi Bai; Zilei Wang; Qiang Liu; Liang Wang; Binyuan Hui; Junyang Lin;
3	Outcome Accuracy Is Not Enough: Aligning The Reasoning Process of Reward Models Highlight: We introduce Rationale Consistency, a fine-grained metric that quantifies the alignment between the model’s reasoning process and human judgment.	Binghai Wang; Yantao Liu; Yuxuan Liu; Tianyi Tang; Shenzhi Wang; Chang Gao; Chujie Zheng; Yichang Zhang; Le Yu; Shixuan Liu; Tao Gui; Qi Zhang; Xuanjing Huang; Bowen Yu; Fei Huang; Junyang Lin;
4	LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models Highlight: Masked diffusion language models present a promising paradigm for language modeling, yet the systematic theoretical analysis and comprehensive empirical validation of their alignment on general tasks remain relatively underexplored. In this paper, we identify the primary challenge for this problem: the high variance in Evidence Lower Bound (ELBO)-based likelihood estimates required for preference optimization.	Fengqi Zhu; Rongzhen Wang; Shen Nie; Xiaolu Zhang; Chunwei Wu; Jun Zhou; Yankai Lin; Ji-Rong Wen; Chongxuan Li;
5	PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Highlight: We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond sequential reasoning under a fixed context window.	Jingcheng Hu; Yinmin Zhang; Shijie Shang; Xiaobo Yang; Yue Peng; Zhewei Huang; Hebin Zhou; Xin Wu; Jie Cheng; Fanqi Wan; Xiangwen Kong; Chengyuan Yao; Kaiwen Yan; Ailin Huang; Hongyu Zhou; Qi Han; Zheng Ge; Xiangyu Zhang; Heung-Yeung Shum;
6	Video-MMMU: Evaluating Knowledge Acquisition from Multidisciplinary Professional Videos Highlight: However, existing video benchmarks fail to evaluate the knowledge acquisition capabilities of Large Multimodal Models (LMMs). To address this gap, we introduce Video-MMMU, a multi-modal, multi-discipline, multi-track benchmark that evaluates LMMs’ ability to acquire knowledge from college-level, educational videos.	Kairui Hu; Penghao Wu; Fanyi Pu; Wang Xiao; Xiang Yue; Bo Li; Yuanhan Zhang; Ziwei Liu;
7	SPARKLE: A Structured and Plug-and-play Agentic Retrieval Policy for Adaptive RAG Models Highlight: However, existing methods either rely on frozen large language models (LLMs) without explicit supervision or require costly LLM finetuning. Therefore, we propose SPARKLE, a structured and plug-and-play agentic retrieval policy where an additional proxy model is introduced to control the retrieval process.	Jinyuan Fang; Zaiqiao Meng; Craig Macdonald;
8	WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback Highlight: Traditional alignment methods, relying on human or LLM annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, misalignment with real-world user preferences, and the risk of feedback loops that amplify model biases. To overcome these limitations, we introduce WildFeedback, a novel framework that leverages in-situ user feedback during conversations with LLMs to create preference datasets automatically.	Taiwei Shi; Zhuoer Wang; Longqi Yang; Ying-Chun Lin; Zexue He; Mengting Wan; Pei Zhou; Sujay Kumar Jauhar; Sihao Chen; Shan Xia; Hongfei Zhang; Jieyu Zhao; Xiaofeng Xu; Xia Song; Jennifer Neville;
9	It’s Not What You Say, It’s How You Say It: Evaluating LLM Responses to Expressions of Belief Highlight: We introduce a typology to systematically evaluate how different EoBs affect whether models follow context versus prior knowledge.	Kevin Du; Clara Kümpel; Michelle Wastl; Alex Warstadt;
10	What Do Prosody and Text Convey? Characterizing How Meaningful Information Is Distributed Across Multiple Channels Highlight: In this paper, we propose an information-theoretic approach to quantify how much is conveyed by prosody that is not recoverable from text alone, and, crucially, what prosody conveys.	Aditya Yadavalli; Tiago Pimentel; Tamar I Regev; Ethan Gotlieb Wilcox; Alex Warstadt;
11	Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories Via Reinforcement Learning Highlight: We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries.	Sikuan Yan; Xiufeng Yang; Zuchao Huang; Ercong Nie; Zifeng Ding; Zonggen Li; Xiaowen Ma; Jinhe Bi; Kristian Kersting; Jeff Z. Pan; Hinrich Schuetze; Volker Tresp; Yunpu Ma;
12	AgentGym2: Benchmarking Large Language Model Agents in De-Idealized Real-World Environments Highlight: Consequently, they understate the difficulty of real deployments, where uncertainty and noise are ubiquitous and agents must proactively explore the environment to uncover new tools. To bridge this gap, we present AgentGym2, a new evaluation framework with task instances grounded in real-world end-to-end working demands.	Zhiheng Xi; Dingwen Yang; Jiaqi Liu; Jixuan Huang; Honglin Guo; Baodai Huang; Tinggang Chen; Qi Zhang; Zhonghang Lu; Chenyu Liu; Jiajun Sun; Jiazheng Zhang; Dingwei Zhu; Xin Guo; Junzhe Wang; Zhihao Zhang; Yuming Yang; Junjie Ye; Minghe Gao; Dongrui Liu; Jiaming Ji; Guohao Li; Tao Gui; Qi Zhang; Xuanjing Huang;
13	Merlin’s Whisper: Enabling Efficient Reasoning in Large Language Models Via Black-box Persuasive Prompting Highlight: This work presents a new approach to mitigating overthinking in LRMs via black-box persuasive prompting.	Heming Xia; Cunxiao Du; Rui Li; Chak Tou Leong; Yongqi Li; Wenjie Li;
14	MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Highlight: Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition.	Junbo Niu; Zheng Liu; Zhuangcheng Gu; Bin Wang; Linke Ouyang; Zhiyuan Zhao; Tao Chu; Tianyao He; Fan Wu; Qintong Zhang; Zhenjiang Jin; Guang Liang; Rui Zhang; Wenzheng Zhang; Yuan Qu; Zhifei Ren; Yuefeng Sun; Zirui Tang; Boyu Niu; Yuanhong Zheng; Dongsheng Ma; Ziyang Miao; Hejun Dong; Siyi Qian; Junyuan Zhang; Fangdong Wang; Jingzhou Chen; Xiaomeng Zhao; Liqun Wei; Wei Li; Shasha Wang; RuiLiang Xu; Yuanyuan Cao; Lu Chen; Qianqian Wu; Huaiyu Gu; Lindong Lu; Dechen Lin; Shenguanlin; Xuanhe Zhou; Linfeng Zhang; Yuhang Zang; Xiaoyi Dong; Jiaqi Wang; Bo Zhang; Lei Bai; Pei Chu; Weijia Li; Jiang Wu; Lijun Wu; Zhenxiang Li; Guangyu Wang; Zhongying Tu; Chao Xu; Kai Chen; Bowen Zhou; Dahua Lin; Wentao Zhang; Conghui He;
15	BadScientist: Can A Research Agent Write Convincing But Unsound Papers That Fool LLM Reviewers? Highlight: We develop a rigorous evaluation framework with formal error guarantees (concentration bounds and calibration analysis), calibrated on real data.	Fengqing Jiang; Yichen Feng; Yuetai Li; Luyao Niu; Basel Alomair; Radha Poovendran;
16	Temporal Sampling for Forgotten Reasoning in LLMs Highlight: Inspired by the phenomenon of Temporal Forgetting, we proposed Temporal Sampling, a simple decoding strategy that draws outputs from multiple checkpoints along the training trajectory.	Yuetai Li; Zhangchen Xu; Fengqing Jiang; Bhaskar Ramasubramanian; Luyao Niu; Bill Yuchen Lin; Xiang Yue; Radha Poovendran;
17	When Efficiency Becomes A Vulnerability: Computational Cost Attacks on WebAgents Highlight: Adversaries can inject malicious prompts into web pages, causing WebAgents to generate unnecessarily long reasoning processes and incur excessive computational cost, termed Computational Cost Attacks (CCA). In this paper, to systematically study this vulnerability under realistic black-box settings, we propose CostBomb, a generation-then-selection attack framework that leverages large language models to generate diverse adversarial prompts and a reinforcement learning–enhanced selector to identify the most effective perturbations.	Liang-Bo Ning; Yuchen Zhu; Heqing Huang; Xin Wang; Yi Chang; Li Qing; Wenqi Fan;
18	Selective Knowledge Distillation: Fusing LLM Semantic Strengths with DNN Efficiency for Binary Code Similarity Detection Highlight: Yet, LLM-based BCSD methods are constrained by their large model sizes and high inference latency. To alleviate these limitations, this paper proposes BinSKD.	Shize Zhou; Peiyu Liu; Lirong Fu; Tong Ye; Wenhai Wang;
19	LLM-VA: Resolving The Jailbreak-Overrefusal Trade-off Via Vector Alignment Highlight: We identify the root cause: LLMs encode the decision to respond (answer vector va) and the judgment of input safety (benign vector vb) as nearly orthogonal directions, treating them as independent processes. We propose LLM-VA, which aligns va with vb through closed-form weight updates, making the model’s willingness to respond causally dependent on its safety assessment, without fine-tuning or architectural changes.	Haonan Zhang; Dongxia Wang; Yi Liu; Kexin Chen; Wenhai Wang;
20	REST: Stress Testing Large Reasoning Models By Asking Multiple Problems at Once Highlight: This single-question setup suffers from two major limitations: (1) vulnerability to data contamination and diminishing difficulty, forcing costly creation of new questions with significant human effort, (2) failure to evaluate models under multi-context pressure, a key requirement for real-world deployment. To bridge this gap, we present REST (Reasoning Evaluation through Simultaneous Testing), a stress-testing framework that exposes LRMs to multiple problems simultaneously.	Zhuoshi Pan; Qizhi Pei; Yu Li; Zinan Tang; QiYao Sun; H. Vicky Zhao; Conghui He; Lijun Wu;
21	Accommodation and Epistemic Vigilance: A Pragmatic Account of Why LLMs Fail to Challenge Harmful Beliefs Highlight: Recent evaluations show that large language models (LLMs) frequently fail to challenge users’ harmful beliefs in domains ranging from medical advice to social reasoning. We present a unifying analysis through the lens of pragmatics: these safety failures can be understood and addressed as LLMs exhibiting excessive accommodation and insufficient epistemic vigilance.	Myra Cheng; Robert D. Hawkins; Dan Jurafsky;
22	Gated Differentiable Working Memory for Long-Context Language Modeling Highlight: In this work, we reframe test-time adaptation as a budget-constrained memory consolidation problem, asking: given limited computational budget, which parts of the context should be consolidated into working memory?	Lingrui Mei; Shenghua Liu; Yiwei Wang; Yuyao Ge; Baolong Bi; Jiayu Yao; Jun Wan; Ziling Yin; Jiafeng Guo; Xueqi Cheng;
23	HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router Highlight: Ideally, LLMs should offer informative responses while avoiding the disclosure of harmful and sensitive information. To address these challenges, we introduce HiddenGuard, a novel framework for fine-grained safe generation in LLMs.	Lingrui Mei; Shenghua Liu; Yiwei Wang; Baolong Bi; Ruibin Yuan; Xueqi Cheng;
24	How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior Highlight: In this paper, we conduct an empirical study on how memory management choices impact the LLM agents’ behavior, especially their long-term performance.	Zidi Xiong; Yuping Lin; Wenya Xie; Pengfei He; Zirui Liu; Jiliang Tang; Himabindu Lakkaraju; Zhen Xiang;
25	AgentOCR: Reimagining Agent History Via Optical Self-Compression Highlight: We introduce AgentOCR, a framework that exploits visual tokens’ superior information density by representing the accumulated observation-action history as a compact rendered image.	Lang Feng; Fuchao Yang; Feng Chen; Xin Cheng; Haiyang Xu; Zhenglin Wan; Ming Yan; Bo An;
26	Too Long, Do Re-weighting for Efficient LLM Reasoning Compression Highlight: In this work, we propose Thinking Length Data Re-weighting (TLDR), that does not rely on sophisticated data annotations or interpolation between multiple models.	Zhong-Zhi Li; Xiao Liang; Zihao Tang; Lei Ji; Peijie Wang; Haotian Xu; Xing W; Haizhen Huang; Weiwei Deng; Yeyun Gong; Zhijiang Guo; Xiao Liu; Fei Yin; Cheng-Lin Liu;
27	OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding Highlight: Modern coding scaffolds turn LLMs into capable software agents, but their ability to follow scaffold-specified instructions remains under-examined, especially when constraints are heterogeneous and persist across interactions. To fill this gap, we introduce OctoBench, which benchmarks scaffold-aware instruction following in repository-grounded agentic coding.	Deming Ding; Shichun Liu; Enhui Yang; Jiahang Lin; Ziying Chen; Shihan Dou; Honglin Guo; Weiyu Cheng; Pengyu Zhao; Chengjun Xiao; Qunhong Zeng; Qi Zhang; Xuanjing Huang; Qidi Xu; Tao Gui;
28	Graph-Based Alternatives to LLMs for Human Simulation Highlight: We introduce Graph-basEd Models for Human Simulation (GEMS) which formulates close-ended simulation as link prediction on a heterogeneous graph of individuals and choices.	Joseph Suh; Suhong Moon; Serina Chang;
29	Experience-driven Multi-turn Reinforcement Learning for GUI Agents Highlight: We propose Experience-driven Multi-turn Policy Optimization (EMPO), which leverages expert trajectories as environment experiences for on-policy multi-turn training.	Zhengxi Lu; Jiabo Ye; Fei Tang; Yongliang Shen; Haiyang Xu; Ziwei Zheng; Weiming Lu; Ming Yan; Fei Huang; Jun Xiao; Yueting Zhuang;
30	CE-GPPO: Coordinating Entropy Via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning Highlight: We propose Coordinating Entropy via Gradient-Preserving Policy Optimization (CE-GPPO), a novel algorithm that reintroduces gradients from clipped tokens in native PPO in a gentle and bounded manner.	Zhenpeng Su; Leiyu Pan; Minxuan Lv; Yuntao Li; Wenping Hu; Fuzheng Zhang; Kun Gai; Guorui Zhou;
31	Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with An Automated Examiner Highlight: We introduce Full-Duplex-Bench-v2 (FDB-v2), a streaming framework that integrates with an automated examiner that enforces staged goals under two pacing setups (Fast vs. Slow).	Guan-Ting Lin; Shih-Yun Shan Kuan; Jiatong Shi; Kai-Wei Chang; Siddhant Arora; Shinji Watanabe; Hung-yi Lee;
32	Beyond The Context Window: Scaling Agentic RL Via End-to-end Optimized Context Compression Highlight: Existing multi-turn RL pipelines suffer from degraded instruction following, excessive rollout costs, and most importantly, strict context limits. In this work, to address these challenges, we introduce summarization-based context management to training.	Miao Lu; Weiwei Sun; Weihua Du; Zhan Ling; Xuesong Yao; Kang Liu; Jiecao Chen;
33	OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents Via Hybrid Validation in Realistic Workflows Highlight: To establish a foundation for mobile agent safety research, we introduce MobileRisk-Live, a dynamic sandbox environment accompanied by a safety detection benchmark comprising realistic trajectories with fine-grained annotations. Built upon this, we propose OS-Sentinel, a novel hybrid safety detection framework that synergistically combines a Formal Verifier for detecting explicit system-level violations with a VLM-based Contextual Judge for assessing contextual risks and agent actions.	Qiushi Sun; Mukai Li; Zhoumianze Liu; Zhihui Xie; Fangzhi Xu; Zhangyue Yin; Kanzhi Cheng; Zehao Li; Zichen Ding; Qi Liu; Zhiyong Wu; Zhuosheng Zhang; Ben Kao; Lingpeng Kong;
34	Stratagem: Learning Transferable Reasoning Via Trajectory-Modulated Game Self-Play Highlight: We present STRATAGEM, which addresses two fundamental barriers to reasoning transfer: domain specificity, where learned patterns remain anchored in game semantics, and contextual stasis, where static game contexts fail to cultivate progressive reasoning.	Xiachong Feng; Deyi Yin; Xiaocheng Feng; Yi Jiang; Libo Qin; Yangfan Ye; Lei Huang; Weitao Ma; Qiming Li; Yuxuan Gu; Bing Qin; Lingpeng Kong;
35	VFA: Empowering Multilingual MLLMs Via Vision-Free Adaptation Highlight: We propose Vision-Free Adaptation (VFA), a framework that decouples multilingual language enhancement from visual alignment by composing complementary task vectors over a shared LLM backbone.	Yixia Li; Yaqing Shi; Zhiwen Ruan; Dongdong Zhang; Lingjie Jiang; Shaohan Huang; Yun Chen; Guanhua Chen; Furu Wei;
36	OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Models Highlight: We present OMIBench, a benchmark designed to evaluate Olympiad-level reasoning when the required evidence is distributed over multiple images.	Qiguang Chen; Chengyu Luan; Jiajun Wu; Qiming Yu; Yi Yang; Yizhuo Li; Jingqi Tong; Xiachong Feng; Libo Qin; Wanxiang Che;
37	Thinking Beyond The Anthropomorphic Paradigm Benefits LLM Research Highlight: Anthropomorphism, or the attribution of human traits to technology, is an automatic and unconscious response that occurs even in those with advanced technical expertise. In this position paper, we analyze hundreds of thousands of research articles to present empirical evidence of the prevalence and growth of anthropomorphic terminology in research on large language models (LLMs).	Lujain Ibrahim; Myra Cheng;
38	LLM-as-Scheduler: Agentic Workflow Dynamic Scheduling Highlight: In practice, many queries do not need such heavy processing and can be handled well by a single strong agent. To address this inefficiency, we propose LLM-as-Scheduler (LAS), a system that dynamically chooses the right workflow for each query.	Dawei Xiang; Kexin Chu; Wenyan Xu; Wenhui Zhang; Wei Zhang;
39	UI-Copilot: Advancing Long-Horizon GUI Automation Via Tool-Integrated Policy Optimization Highlight: However, long-horizon scenarios remain challenging, as these agents are burdened with tasks beyond their intrinsic capabilities, suffering from memory degradation, progress confusion, and math hallucination. To address these challenges, we present UI-Copilot, a collaborative framework where the GUI agent focuses on task execution while a lightweight copilot provides on-demand assistance for memory retrieval and numerical computation.	Zhengxi Lu; Fei Tang; Guangyi Liu; Jin Ma; Kaitao Song; Xu Tan; Wenqi Zhang; Weiming Lu; Jun Xiao; Yueting Zhuang; Yongliang Shen;
40	AI Use in American Newspapers Is Widespread, Uneven, and Rarely Disclosed Highlight: AI is rapidly transforming journalism, but the extent of its use in published newspaper articles remains unclear. We address this gap by auditing a large-scale dataset of 186K articles from online editions of 1.	Jenna Russell; Marzena Karpinska; Destiny Akinode; James Zhou; Katherine Thai; Bradley Emi; Max Spero; Mohit Iyyer;
41	PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning Highlight: We introduce Professional Reasoning Bench (PRBench), a realistic, open-ended, and difficult benchmark of real-world problems in Finance and Law.	Afra Feyza Akyürek; Advait Gosai; Chen Bo Calvin Zhang; Vipul Gupta; Jaehwan Jeong; Anisha Gunjal; Tahseen Rabbani; Maria Mazzone; David Randolph IV; Mohammad Mahmoudi Meymand; Gurshaan Chattha; Paula Rodriguez; Diego A. Mares Buendia; Pavit Singh; Michael Liu; Subodh Chawla; Peter Cline; Lucy Ogaz; Ernesto Gabriel Hernández Montoya; Zihao Wang; Pavi Bhatter; Marcos Ayestaran; Bing Liu; Yunzhong He;
42	Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math Highlight: We introduce Hard2Verify, a human-annotated, step-level verification benchmark produced with over 500 hours of human labor.	Shrey Pandit; Austin Xu; Xuan-Phi Nguyen; Yifei Ming; Caiming Xiong; Shafiq Joty;
43	Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts Highlight: This effect leads to increasingly volatile importance-ratio signals and bursty clipping behavior, which consistently precede training collapse. Motivated by this diagnosis, we propose Router-Shift Policy Optimization (RSPO).	Di Zhang; Xun Wu; Shaohan Huang; Lingjie Jiang; Yaru Hao; Li Dong; Zewen Chi; Zhifang Sui; Furu Wei;
44	AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage Highlight: However, the increasing complexity of proposed methods often renders reproduction a labor-intensive endeavor, necessitating profound domain expertise. To address this, we introduce the paper lineage, which systematically mines implicit knowledge from the cited literature.	Xuanle Zhao; Zilin Sang; Yuxuan Li; Qi Shi; Weilun Zhao; Shuo Wang; Duzhen Zhang; Xu Han; Zhiyuan Liu; Maosong Sun;
45	Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents Highlight: Mem-Gallery features high-quality multi-session conversations grounded in both visual and textual information, with long interaction horizons and rich multimodal dependencies. Building on this dataset, we propose a systematic evaluation framework that assesses key memory capabilities along three functional dimensions: memory extraction and test-time adaptation, memory reasoning, and memory knowledge management.	Yuanchen Bei; Tianxin Wei; Xuying Ning; Yanjun Zhao; Zhining Liu; Xiao Lin; Yada Zhu; Hendrik Hamann; Jingrui He; Hanghang Tong;
46	The Pitfalls of KV Cache Compression Highlight: In this paper, we identify several pitfalls that practitioners should be aware of when deploying KV cache compressed LLMs.	Alex Chen; Renato Geh; Aditya Grover; Guy Van Den Broeck; Daniel Mingyi Israel;
47	J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization Highlight: We make three key contributions: (1) We propose the Equivalent Initial State Group Relative Policy Optimization (EIS-GRPO) algorithm, which allows us to train our judge to be robust to positional biases that arise in more complex evaluation settings.	Austin Xu; Yilun Zhou; Xuan-Phi Nguyen; Caiming Xiong; Shafiq Joty;
48	Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Highlight: In this work, we present Apertus, a fully open suite of large language models (LLMs) designed to address responsibility shortcomings in today’s open model ecosystem, namely data responsibility and global representation.	Alejandro Hernández-Cano; Alexander Hägele; Allen Hao Huang; Angelika Romanou; Antoni-Joan Solergibert; Barna Pásztor; Bettina Messmer; Dhia Garbaya; Eduard Frank Ďurech; Ido Hakimi; Juan Garcia Giraldo; Mete Ismayilzada; Negar Foroutan; Skander Moalla; Tiancheng Chen; Vinko Sabolčec; Yixuan Xu; Michael Aerni; Badr AlKhamissi; Inés Altemir Marinas; Mohammad Hossein Amani; Matin Ansaripour; Ilia Badanin; Harold Benoit; Emanuela Boros; Nicholas John Browning; Fabian Bösch; Maximilian Böther; Niklas Canova; Camille Challier; Clément Charmillot; Jonathan Coles; Jan Milan Deriu; Arnout Devos; Lukas Drescher; Daniil Dzenhaliou; Maud Ehrmann; Dongyang Fan; Simin Fan; Silin Gao; Miguel Gila; María Grandury; Diba Hashemi; Alexander Miserlis Hoyle; Jiaming Jiang; Mark Klein; Andrei Kucharavy; Anastasiia Kucherenko; Frederike Lübeck; Roman Machacek; Theofilos Ioannis Manitaras; Andreas Marfurt; Kyle Matoba; Simon Matrenok; Henrique Mendonça; Fawzi Roberto Mohamed; Syrielle Montariol; Luca Mouchel; Sven Najem-Meyer; Jingwei Ni; Gennaro Oliva; Matteo Pagliardini; Elia Palme; Andrei Panferov; Léo Paoletti; Marco Passerini; Ivan Pavlov; Auguste Poiroux; Kaustubh Ponkshe; Nathan Ranchin; Javier Rando; Mathieu Sauser; Jakhongir Saydaliev; Mukhammadali Sayfiddinov; Marian Schneider; Stefano Schuppli; Marco Scialanga; Andrei Semenov; Kumar Shridhar; Raghav Singhal; Anna Sotnikova; Alexander Sternfeld; Ayush Kumar Tarun; Paul Teiletche; Jannis Vamvas; Xiaozhe Yao; Hao Zhao; Alexander Ilic; Ana Klimovic; Andreas Krause; Caglar Gulcehre; David Rosenthal; Elliott Ash; Florian Tramèr; Joost VandeVondele; Livio Veraldi; Martin Rajman; Thomas C. Schulthess; Torsten Hoefler; Antoine Bosselut; Martin Jaggi; Imanol Schlag;
49	Reinforcement Learning for Diffusion LLMs Via Energy-Based Gibbs Alignment Highlight: In this work, we propose Diffusion-Gibbs Alignment (DGA), a novel variational framework that reformulates RL for dLLMs as a distribution matching problem.	Yijia Fan; Jing Yang; Mingyu Liu; Kaitong Cai; Jian Wang; Keze Wang; Jusheng Zhang;
50	Simulated Students in Tutoring Dialogues: Substance or Illusion? Highlight: Surprisingly, little work has been done to ensure or even measure the quality of simulated students. In this work, we formally define the student simulation task, propose a set of evaluation metrics that span linguistic, behavioral, and cognitive aspects, and benchmark a wide range of student simulation methods on these metrics.	Alexander Scarlatos; Jaewook Lee; Simon Woodhead; Andrew Lan;
51	LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This often results in incomplete evidence retrieval and degraded answer quality for multi-page reasoning tasks. To address these limitations, we propose LAD-RAG, a novel Layout-Aware Dynamic RAG framework.	Zhivar Sourati; Zheng Wang; Marianne Menglin Liu; Yazhe Hu; Mengqing Guo; Sujeeth Bharadwaj; Kyu J. Han; Tao Sheng; Sujith Ravi; Morteza Dehghani; Dan Roth;
52	ReFL: Reflective Feedback Learning for Hallucination Detection of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing hallucination detection methods either depend on external knowledge sources, incurring high computational costs and limiting real-time applicability, or extract the model’s internal states, leading to poor generalization. To address these issues, this paper proposes ReFL, a hallucination detection framework.	Cunhang Fan; Jun Zhang; Xue Zhang; Shuai Zhang; Zhao Lv; Jianhua Tao; Zhengqi Wen;
53	Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As such, they tend to overlook a critical upstream factor: the role of the original safety-alignment data. This paper therefore investigates the degradation of safety guardrails through the lens of representation similarity between upstream alignment datasets and downstream fine-tuning tasks.	Lei Hsiung; Tianyu Pang; Yung-Chen Tang; Linyue Song; Tsung-Yi Ho; Pin-Yu Chen; Yaoqing Yang;
54	EvoRoute: Experience-Driven Self-Routing LLM Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We formalize this challenge as the Agent System Trilemma: the inherent tension among achieving state-of-the-art performance, minimizing monetary cost, and ensuring rapid task completion. To dismantle this trilemma, we introduce EvoRoute, a self-evolving model routing paradigm that transcends static, pre-defined model assignments.	Guibin Zhang; Haiyang Yu; Kaiming Yang; Bingli Wu; Fei Huang; Yongbin Li; Shuicheng Yan;
55	ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ImplicitMemBench, the first systematic benchmark evaluating implicit memory through three cognitively grounded constructs drawn from standard cognitive-science accounts of non-declarative memory: Procedural Memory (one-shot skill acquisition after interference), Priming (theme-driven bias via paired experimental/control instances), and Classical Conditioning (Conditioned Stimulus–Unconditioned Stimulus (CS–US) associations shaping first decisions).	Chonghan Qin; Xiachong Feng; Weitao Ma; Xiaocheng Feng; Lingpeng Kong;
56	Characterizing The Expressivity of Local Attention in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although this restriction is usually motivated by efficiency, it has also been found to improve model quality, a phenomenon that has so far lacked a satisfactory explanation. We provide a formal account of this phenomenon in terms of recognizer expressivity.	Jiaoda Li; Ryan Cotterell;
57	From Word to World: Can Large Language Models Be Implicit Text-based World Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a three-level framework to evaluate LLM-based world models: (i) fidelity and consistency, (ii) scalability and robustness, and (iii) agent utility.	Yixia Li; Hongru Wang; Jiahao Qiu; Zhenfei Yin; Dongdong Zhang; Cheng Qian; Zeping Li; Xiaoteng Ma; Guanhua Chen; Heng Ji;
58	Calibrating LLM Judges: Linear Probes for Fast and Reliable Uncertainty Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce linear probes trained with a Brier score-based loss to provide calibrated uncertainty estimates from reasoning judges’ hidden states, requiring no additional model training.	Bhaktipriya Radharapu; Eshika Saxena; Kenneth Li; Chenxi Whitehouse; Adina Williams; Nicola Cancedda;
59	A Survey of Large Language Model-Based Search Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This survey provides the first systematic analysis of search agents.	Yunjia Xi; Jianghao Lin; Yongzhao Xiao; Zheli Zhou; Rong Shan; Te Gao; Jiachen Zhu; Weiwen Liu; Yong Yu; Weinan Zhang;
60	OctoTools: A Multi-Agent Framework with Extensible Tools for Complex Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible multi-agent framework designed to tackle complex reasoning across diverse domains.	Pan Lu; Bowen Chen; Sheng Liu; Rahul Thapa; Joseph Boen; James Zou;
61	Current Agents Fail to Leverage World Model As Tool for Foresight Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Agents built on vision-language models increasingly face tasks that demand anticipating future states rather than relying on short-horizon reasoning. Generative world models offer a promising remedy: agents could use them as external simulators to foresee outcomes before acting.	Cheng Qian; Emre Can Acikgoz; Bingxuan Li; Xiusi Chen; Yuji Zhang; Bingxiang He; Qinyu Luo; Gokhan Tur; Dilek Hakkani-Tür; Yunzhu Li; Heng Ji;
62	One Tokenizer To Rule Them All: Emergent Language Plasticity Via Multilingual Tokenizers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we study what relatively cheap interventions early on in training improve language plasticity, or adaptation capabilities of the model post-training to new languages.	Diana Abagyan; Alejandro R. Salamanca; Andres Felipe Cruz-Salinas; Kris Cao; Hangyu Lin; Acyr Locatelli; Marzieh Fadaee; Ahmet Üstün; Sara Hooker;
63	NavA3: Understanding Any Instruction, Navigating Anywhere, Finding Anything Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose NavA3, a hierarchical framework divided into two stages: global and local policies.	Lingfeng Zhang; Xiaoshuai Hao; Yingbo Tang; Haoxiang Fu; Xinyu Zheng; Pengwei Wang; Zhongyuan Wang; Wenbo Ding; Shanghang Zhang;
64	Native Hybrid Attention for Efficient Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Native Hybrid Attention (NHA), a novel hybrid architecture of linear and full attention that integrates both intra- and inter-layer hybridization into a unified layer design.	Jusen Du; Jiaxi Hu; Zhang Tao; Weigao Sun; Yu Cheng;
65	Interpretable Traces, Unexpected Outcomes: Investigating The Disconnect in Trace-Based Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To isolate the effect of trace semantics, we design experiments in the Question Answering (QA) domain using a rule-based problem decomposition method. This enables us to create Supervised Fine-Tuning (SFT) datasets for LLMs where each QA problem is paired with either verifiably correct or incorrect CoT traces, while always providing the correct final solution.	Siddhant Bhambri; Upasana Biswas; Subbarao Kambhampati;
66	UniversalRAG: Retrieval-Augmented Generation Over Corpora of Diverse Modalities and Granularities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In contrast, real-world queries vary widely in the type of knowledge they require, which a single type of knowledge source cannot address. To address this, we introduce UniversalRAG, an any-to-any RAG framework designed to retrieve and integrate knowledge from heterogeneous sources with diverse modalities and granularities.	Woongyeong Yeo; Kangsan Kim; Soyeong Jeong; Jinheon Baek; Sung Ju Hwang;
67	MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Knowledge Poisoning Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present MM-PoisonRAG, a framework to systematically study the vulnerability of multimodal RAG under knowledge poisoning.	Hyeonjeong Ha; Qiusi Zhan; Jeonghwan Kim; Dimitrios Bralios; Saikrishna Sanniboina; Nanyun Peng; Kai-Wei Chang; Daniel Kang; Heng Ji;
68	PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This leads to assigning positive rewards to correct answers produced from incorrect derivations. To bridge this gap, we introduce PRIME, a benchmark for evaluating verifiers on PRocess-outcome alignment verification In Mathematics and Engineering.	Xiangfeng Wang; Hangyu Guo; Yanlin Lai; Mitt Huang; Liang Zhao; Chengyuan Yao; Yinmin Zhang; Qi Han; Xiaoxiaoren; Chun Yuan; Tong Xu; Zheng Ge; Xiangyu Zhang; Daxin Jiang;
69	Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current sycophancy research has largely overlooked its specific manifestations in the video-language domain, resulting in a notable absence of systematic benchmarks and targeted evaluations to understand how Video-LLMs respond under misleading user input. To fill this gap, we propose ViSE (Video-LLM Sycophancy Benchmarking and Evaluation), the first benchmark designed to evaluate sycophantic behavior in state-of-the-art Video-LLMs across diverse question formats, prompt biases, and visual reasoning tasks.	Wenrui Zhou; Mohamed Hendy; Shu Yang; Qingsong Yang; Zikun Guo; Yuyu Luo; Lijie Hu; Di Wang;
70	To Lie or Not to Lie? Investigating The Biased Spread of Global Lies By LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study how LLMs behave when prompted to spread misinformation across languages and target countries, and introduce GlobalLies, a multilingual parallel dataset of 440 misinformation generation prompt templates and 6,867 entities, spanning 8 languages and 195 countries.	Zohaib Khan; Mustafa Dogan; Ifeoma Okoh; Pouya Sadeghi; Siddhartha Shrestha; Sergius Justus Chesami Nyah; Mahmoud O. Mokhiamar; Michael J Ryan; Tarek Naous;
71	Are We Using The Right Benchmark: An Evaluation Framework for Visual Token Compression Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we uncover a counterintuitive yet consistent phenomenon: simple image downsampling outperforms many advanced visual token compression methods across multiple widely used benchmarks.	Chenfei Liao; Wensong Wang; Zichen Wen; Xu Zheng; Yiyu Wang; Haocong He; Yuanhuiyi Lyu; Lutao Jiang; Xin Zou; Yuqian Fu; Bin Ren; Linfeng Zhang; Xuming Hu;
72	Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a hierarchical planning framework that analyzes web agents across three layers (i.e., high-level planning, low-level execution, and re-planning), enabling process-based evaluation of reasoning, grounding, and recovery.	Mohamed Aghzal; Gregory J. Stein; Ziyu Yao;
73	The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we systematically investigate verbalized calibration in tool-use agents, revealing a fundamental confidence dichotomy driven by tool type.	Weihao Xuan; Qingcheng Zeng; Heli Qi; Yunze Xiao; Junjue Wang; Naoto Yokoya;
74	FinSight: Towards Real-World Financial Deep Research Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent deep research systems excel in open-domain search, they struggle with financial reporting, specifically in handling financial data, ensuring analytical depth, and integrating professional visualizations. To address this, we introduce FinSight, the first multi-agent framework for automating the end-to-end generation of professional, multimodal financial reports.	Jiajie Jin; Yuyao Zhang; Yimeng Xu; Hongjin Qian; Yutao Zhu; Zhicheng Dou;
75	S^4: Operationalizing Speech Act Theory for Strategic Semi-Structured Psychiatric Interview Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce S4, a comprehensive framework grounded in Speech Act Theory, modeling the interview as a unified process of internal strategy (Illocution and Perlocution) and external realization (Locution).	Guanqun Bi; Zhoufu Liu; Zhuang Chen; Dazhen Wan; Xiyao Xiao; Minlie Huang;
76	ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: During the RL stage, we design a novel multi-view ranking reward tailored to the multi-turn nature of listwise ranking.	Wenhan Liu; Xinyu Ma; Weiwei Sun; Yutao Zhu; Yuchen Li; Dawei Yin; Zhicheng Dou;
77	ACIArena: Toward Unified Evaluation for Agent Cascading Injection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing studies consider only limited attack strategies and simplified multi-agent system (MAS) settings, limiting their generalizability and comprehensive evaluation. To bridge this gap, we introduce ACIArena, a unified framework for evaluating the robustness of MAS.	Hengyu An; Minxi Li; Jinghuai Zhang; Naen Xu; Chunyi Zhou; Changjiang Li; Xiaogang Xu; Tianyu Du; Shouling Ji;
78	Probing for Reading Times Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we probe language model representations for human reading times.	Eleftheria Tsipidi; Samuel Kiegeland; Francesco Ignazio Re; Tianyang Xu; Mario Giulianelli; Karolina Stanczak; Ryan Cotterell;
79	AttnPO: Attention-Guided Process Supervision for Efficient Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Meanwhile, process-supervised methods are typically resource-intensive and suffer from inaccurate credit assignment. To address these issues, we propose AttnPO, a low-overhead process-supervised RL framework that leverages the model’s intrinsic attention signals for step-level credit assignment.	Shuaiyi Nie; Dingsiyu; Wenyuan Zhang; Linhao Yu; Tianmeng Yang; Yao Chen; Weichong Yin; Yu Sun; Hua Wu; Tingwen Liu;
80	Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we present a systematic analysis of logical reasoning under controlled increases in logical complexity, and reveal a previously unrecognized phenomenon, which we term Logical Phase Transitions: rather than degrading smoothly, logical reasoning performance remains stable within a regime but collapses abruptly beyond a critical logical depth, mirroring physical phase transitions such as water freezing beyond a critical temperature threshold.	Xinglang Zhang; Yunyao Zhang; ZeLiang Chen; Junqing Yu; Wei Yang; Zikai Song;
81	Libra-VLA: Achieving Learning Equilibrium Via Asynchronous Coarse-to-Fine Dual-System Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This strategy overlooks the inherent hierarchy of robotic manipulation, where complex actions can be naturally modeled in a Hybrid Action Space, decomposing into discrete macro-directional reaching and continuous micro-pose alignment, severely widening the semantic-actuation gap and imposing a heavy representational burden on grounding high-level semantics to continuous actions. To address this, we introduce Libra-VLA, a novel Coarse-to-Fine Dual-System VLA architecture.	Yifei Wei; Linqing Zhong; Yi Liu; Yuxiang Lu; Xindong He; Maoqing Yao; Guanghui Ren;
82	Optimizing Length Compression in Large Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this specific inefficiency, we move beyond the general principles of Efficacy and Efficiency to propose two new, fine-grained principles: Brevity, which advocates for eliminating redundancy, and Sufficiency, which ensures critical reasoning steps are preserved. Guided by these principles, we introduce LC-R1, a post-training method based on Group Relative Policy Optimization (GRPO).	Zhengxiang Cheng; Dongping Chen; Mingyang Fu; Tianyi Zhou;
83	FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present FineSteer, a novel steering framework that decomposes inference-time steering into two complementary stages—conditional steering and fine-grained vector synthesis—allowing fine-grained control over when and how to steer internal representations.	Zixuan Weng; Jinghuai Zhang; Kunlin Cai; Ying Li; Peiran Wang; Yuan Tian;
84	LaMI: Augmenting Large Language Models Via Late Multi-Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a late multi-image fusion method: multiple images are generated from the text prompt with a lightweight parallel sampling, and their prediction probabilities are combined with those of a text-only LLM through a late-fusion layer that integrates projected visual features just before the final prediction.	Guy Yariv; Idan Schwartz; Yossi Adi; Sagie Benaim;
85	MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current approaches face three limitations: causal attention in VLM backbones is suboptimal for embedding tasks; scalability issues due to reliance on high-quality labeled paired data for contrastive learning; and limited diversity in training objectives and data. To address these issues, we propose MoCa, a two-stage framework for transforming pre-trained VLMs into bidirectional multimodal embedding models.	Haonan Chen; Hong Liu; Yuping Luo; Liang Wang; Nan Yang; Furu Wei; Zhicheng Dou;
86	PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our experiments show that current LLM agents perform poorly with high error rates, e.g., Qwen-3-30B-Think has an average error rate of 35%. To address this gap, we propose PEARL, a reinforcement-learning framework that (i) augments the language agent with an external preference memory that stores and updates inferred strategies (e.g., attendee priorities, topic importance, time/location preferences), and (ii) optimizes the agent with round-wise rewards that directly supervise decision correctness, ranking quality, and memory usage across rounds.	Bingxuan Li; Jeonghwan Kim; Cheng Qian; Xiusi Chen; Eitan Anzenberg; Niran Kundapur; Heng Ji;
87	PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we highlight Hierarchical Implicit Intent Alignment for Personalized GUI Agent (PersonalAlign), a new agent task that requires agents to leverage long-term user records as persistent context to resolve omitted preferences in vague instructions and anticipate latent routines by user state for proactive assistance.	Yibo Lyu; Gongwei Chen; Rui Shao; Weili Guan; Liqiang Nie;
88	When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in multi-agent debate (MAD).	Hyeong Kyu Choi; Jerry Zhu; Sharon Li;
89	ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Mode Extraction (ModeX), an evaluator-free Best-of-N selection framework that generalizes majority voting to open-ended text generation by identifying the modal output representing the dominant semantic consensus among generated texts.	Hyeong Kyu Choi; Sharon Li;
90	Analyzing and Internalizing Complex Policy Documents for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our analysis shows that workflow-governing policy specifications are the hardest to reason over, and that SFT on gold trajectories with chain-of-thought is data-hungry and struggles at high complexity. We propose Category-Aware Policy Continued Pretraining, an automated pipeline that analyzes policies, extracts key specifications, categorizes them into factual, behavioral, and conditional types, and isolates those driving workflow complexity.	Jiateng Liu; Zhenhailong Wang; Xiaojiang Huang; Yingjie Li; Xiang Li; Chenlei Guo; Xing Fan; Ruhi Sarikaya; Heng Ji;
91	Domain Generalizable AI Guardrails with Augmented Policy Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Augmented Policy Training (APT), a training recipe that enhances guardrail adaptability to unseen policies by using a suite of policy perturbation strategies during training to reduce overfitting and increase generalization.	Minqian Liu; Ioana Baldini; David Rabinowitz; David S Rosenberg; Sebastian Gehrmann; Mark Dredze;
92	Probing Audio-Visual Reasoning in Multimodal Language Models Through The Lens of Audio Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This raises a fundamental question—does a deficiency in low-level audio perception constrain higher-level audio-visual reasoning? To address this, we introduce AV-Odyssey Bench—a comprehensive benchmark of 4,555 meticulously designed problems that integrate text, audio, and visual modalities.	Kaixiong Gong; Kaituo Feng; Bohao Li; Yibing Wang; Mofan Cheng; Shijia Yang; Jiaming Han; Benyou Wang; Yutong Bai; Zhuoran Yang; Xiangyu Yue;
93	Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Nirvana, an SGM featuring specialized memory, linear-time complexity, and test-time task information extraction.	Yuhua Jiang; Shuang Cheng; Yihao Liu; Ermo Hua; Che Jiang; Weigao Sun; Yu Cheng; Feifei Gao; Biqing Qi; Bowen Zhou;
94	SafeAgent: Safeguarding LLM Agents Via An Automated Risk Simulator Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, in multi-turn, tool-augmented settings, dynamic user interactions, external tool use, and unintended harmful behaviors make robust safety assurance challenging. To address these challenges, we propose SafeAgent, a framework that improves agent safety through fully automated synthetic data generation.	Xueyang Zhou; Weidong Wang; Lin Lu; Jiawen Shi; Guiyao Tie; Xu Yongtian; Lixing Chen; Pan Zhou; Neil Zhenqiang Gong; Lichao Sun;
95	Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Bright-Pro, an evaluation framework that assesses the effectiveness of retrievers in agentic search systems.	Yilun Zhao; Jinbiao Wei; Tingyu Song; Siyue Zhang; Chen Zhao; Arman Cohan;
96	Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs As Semantic Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prior studies often rely on general-purpose embedding benchmarks (e.g., MTEB) when selecting LLMs, overlooking the unique characteristics of recommendation tasks. To address this gap, we introduce BLaIR, a comprehensive benchmark for evaluating LLMs as semantic encoders in recommendation scenarios.	Yupeng Hou; Jiacheng Li; Xiangjun Fu; Zhankui He; An Yan; Xiusi Chen; Julian McAuley;
97	Your Reasoning Model Is Secretly A Reward Model – Optimization-Free Verification from Experience Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study a third source of information—the model’s hidden states—for binary correctness verification in tasks with a reliable success/failure signal (e.g., deterministic checkers or reference-grounded answers).	Zhenwen Liang; Ruosen Li; Yujun Zhou; Linfeng Song; Dian Yu; Xinya Du; Haitao Mi; Dong Yu;
98	Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In such environments, the lack of failure cases causes the advantage signal in group-relative algorithms (e.g., GRPO) to vanish, driving policies into mode collapse. To address this, we propose Constrained Uniform Top-K Sampling (CUTS), a parameter-free decoding strategy enforcing structure-preserving exploration.	Zhenwen Liang; Yujun Zhou; Sidi Lu; Xiangliang Zhang; Haitao Mi; Dong Yu;
99	Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, to date, existing LLM-as-a-judge approaches face two limitations: persona descriptions of agents are often arbitrarily designed, and the frameworks are not generalizable to other tasks. To address these challenges, we propose MAJ-EVAL, a Multi-Agent-as-Judge evaluation framework that can automatically construct multiple evaluator personas with distinct dimensions from relevant text documents (e.g., research papers), instantiate LLM agents with the personas, and engage the agents in in-group debates to generate multi-dimensional feedback.	Jiaju Chen; Yuxuan Lu; Xiaojie Wang; Huimin Zeng; Jing Huang; Jiri Gesi; Ying Xu; Bingsheng Yao; Dakuo Wang;
100	MT3: A Synergistic Multi-Task RL Framework for Specializing MLLMs in Text Image Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent advances in large-scale Reinforcement Learning (RL) have improved reasoning in Large Language Models (LLMs) and Multimodal LLMs (MLLMs), but their application to end-to-end TIMT is still underexplored. To bridge this gap, we introduce MT3, a novel Multi-Task RL framework to specialize MLLMs into end-to-end expert TIMT models.	Zhaopeng Feng; Yupu Liang; Shaosheng Cao; Jiayuan Su; Jiahan Ren; Zhijie Zhou; Wenxuan Huang; Jian Wu; Zuozhu Liu;
101	A Comprehensive Survey of Process Reward Models: Data Generation, Model Construction, and Usage Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our goal is to clarify design spaces, reveal open challenges, and guide future research toward fine-grained, robust reasoning alignment.	Congmin Zheng; Jiachen Zhu; Zhuoying Ou; Yuxiang Chen; Kangning Zhang; Rong Shan; Zeyu Zheng; Mengyue Yang; Jianghao Lin; Yong Yu; Weinan Zhang;
102	SDAR-VL: Stable and Efficient Block-wise Diffusion for Vision-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present SDAR-VL, the first systematic application of block-wise discrete diffusion to large-scale vision-language understanding (VLU), together with an integrated framework for efficient and stable training.	Shuang Cheng; Yuhua Jiang; Zineng Zhou; Dawei Liu; Tao Wang; Linfeng Zhang; Biqing Qi; Bowen Zhou;
103	FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, progress in this domain has been constrained by a reliance on expensive, proprietary, and text-only data, limiting the development of advanced models. To address this gap, we introduce FinCall-Surprise (Financial Conference Call for Earning Surprise Prediction), the first large-scale, open-source, and multi-modal dataset for earnings surprise prediction.	Dong Shu; Yanguang Liu; Huopu Zhang; Mengnan Du;
104	FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce FinChart-Bench, the first benchmark specifically focused on real-world financial charts.	Dong Shu; Haoyang Yuan; Yuchen Wang; Yanguang Liu; Huopu Zhang; Mengnan Du;
105	GLARE: Agentic Reasoning for Legal Judgment Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce GLARE, an agentic legal reasoning framework that enables models to actively retrieve and apply external knowledge during decision-making.	Xinyu Yang; Chenlong Deng; Zhicheng Dou;
106	ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent Via Behavior Calibration Highlight: In this paper, we propose ET-Agent, a training framework for calibrating agent’s tool-use behavior through two synergistic perspectives: Self-evolving Data Flywheel and Behavior Calibration Training.	Yifei Chen; Guanting Dong; Zhicheng Dou;
107	Ted-Tok: Maintaining An Evolving Vocabulary for Lifelong Learning Highlight: The tokenizer, a foundational part of the system, is usually assumed to remain fixed in lifelong learning scenarios. In this work, we challenge the validity of this assumption: as language evolves, a static tokenizer fragments newly emerging lexical items, reducing compression efficiency and consequently degrading the model performance.	Jiameng Huang; Zhi Zhang; Zhenyu He; Jiacheng Sun; Di He;
108	R^3AG: Retriever Routing for Retrieval-Augmented Generation Highlight: This overlooks a critical distinction in RAG: a retrieved document must not only be relevant but also effectively support the generator in producing correct answers. To address this limitation, we propose R³AG, a novel routing framework that explicitly models the dynamic alignment between queries and retriever capabilities.	Tong Zhao; Yutao Zhu; Yucheng Tian; Zhicheng Dou;
109	ATIR: Towards Audio-Text Interleaved Contextual Retrieval Highlight: In this work, we introduce the Audio-Text Interleaved contextual Retrieval (ATIR) task, where queries can alternate between audio and text modalities.	Tong Zhao; Chenghao Zhang; Yutao Zhu; Zhicheng Dou;
110	Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data Highlight: In this work, we take shopping as a case study and present the first large-scale quantitative evaluation of state-of-the-art LLMs’ ability to accurately simulate human behavior.	Yuxuan Lu; Jing Huang; Yan Han; Bingsheng Yao; Sisong Bei; Yaochen Xie; Yisi Sang; Qi He; Dakuo Wang;
111	MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning Highlight: Existing approaches to Visual Chain-of-Thought (VCoT) are often limited by rigid external tools or fail to generate the high-fidelity, strategically-timed diagrams necessary for complex problem-solving. To bridge this gap, we introduce MathCanvas, a comprehensive framework designed to endow unified Large Multimodal Models (LMMs) with intrinsic VCoT capabilities for mathematics.	Weikang Shi; Aldrich Yu; Rongyao Fang; Houxing Ren; Ke Wang; Aojun Zhou; Changyao Tian; Xinyu Fu; Yuxuan Hu; Zimu Lu; Linjiang Huang; Si Liu; Rui Liu; Hongsheng Li;
112	MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application Highlight: Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts.	Xueqing Peng; Lingfei Qian; Yan Wang; Ruoyu Xiang; Yueru He; Yang Ren; Mingyang Jiang; Vincent Jim Zhang; Yuqing Guo; Jeff Zhao; Huan He; Yi Han; Yun Feng; Yuechen Jiang; Yupeng Cao; Haohang Li; Yangyang Yu; Xiaoyu Wang; Penglei Gao; Shengyuan Lin; Keyi Wang; Shanshan Yang; Yilun Zhao; Zhiwei Liu; Peng Lu; Jerry Huang; Suyuchen Wang; Triantafillos Papadopoulos; Polydoros Giannouris; Efstathia Soufleri; Nuo Chen; Zhiyang Deng; Heming Fu; Yijia Zhao; Mingquan Lin; Meikang Qiu; Kaleb E Smith; Arman Cohan; Xiao-Yang Liu; Jimin Huang; Guojun Xiong; Alejandro Lopez-Lira; Xi Chen; Junichi Tsujii; Jian-Yun Nie; Sophia Ananiadou; Qianqian Xie;
113	Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning Highlight: This global treatment results in unnecessary utility degradation and extends optimization to content-agnostic regions. To address these limitations, we propose PALU (Prefix-Aware Localized Unlearning), a framework driven by a local entropy maximization objective across both temporal and vocabulary dimensions.	Naixin Zhai; Pengyang Shao; Binbin Zheng; Yonghui Yang; Fei Shen; Long Bai; Xun Yang;
114	SearchGym: Bootstrapping Real-World Search Agents Via Cost-Effective and High-Fidelity Environment Simulation Highlight: This misalignment generates corrupted reward signals that destabilize training by penalizing correct reasoning or rewarding hallucination. To address this, we propose SearchGym, a simulation environment designed to bootstrap robust search agents.	Xichen Zhang; Ziyi He; Yinghao Zhu; Sitong Wu; Shaozuo Yu; Meng Chu; Wenhu Zhang; Haoru Tan; Jiaya Jia;
115	LLM-Generated Text May Harm Your Retrieval! A Robust Detection Strategy for Retrieval-Augmented Generation Highlight: In this paper, we explore the usage paradigms of LLM text detectors for RAG and highlight key limitations of off-the-shelf or directly fine-tuned detectors.	Zhaoheng Huang; Yutao Zhu; Ji-Rong Wen; Zhicheng Dou;
116	MemRec: Collaborative Memory-Augmented Agentic Recommender System Highlight: Yet, naively utilizing collaborative memory causes severe context overload and introduces noise to downstream LLMs, alongside prohibitive computational costs. To resolve this, we propose MemRec, a framework that architecturally decouples memory management from reasoning.	Weixin Chen; Yuhan Zhao; Jingyuan Huang; Zihe Ye; Mingxuan Ju; Tong Zhao; Neil Shah; Li Chen; Yongfeng Zhang;
117	LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target Highlight: For Bangla, existing work provides valuable resources and models; however, they are mostly single-task (e.g., binary hate/offense) with narrow coverage of key dimensions such as type, severity, and target. We address these gaps by introducing the first multi-task Bangla hate-speech dataset, BanglaMultiHate, one of the largest manually annotated datasets to date.	Md Arid Hasan; Firoj Alam; Md Fahad Hossain; Usman Naseem; Syed Ishtiaque Ahmed;
118	OneRec-Think: In-Text Reasoning for Generative Recommendation Highlight: However, existing generative models (e.g., OneRec) operate as implicit predictors, critically lacking the capacity for explicit and controllable reasoning, a key advantage of LLMs. To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation.	Zhanyu Liu; Shiyao Wang; Xingmei Wang; Rongzhou Zhang; Jiaxin Deng; Honghui Bao; Jinghao Zhang; Wuchao Li; PengFei Zheng; Xiangyu Wu; Yifei Hu; Qigen Hu; Xinchen Luo; Lejian Ren; Zhang Zixing; Qianqian Wang; Kuo Cai; Yunfan Wu; Hongtao Cheng; Zexuan Cheng; Lu Ren; Huanjie Wang; Yi Su; Ruiming Tang; Kun Gai; Guorui Zhou;
119	When Correct Is Not Safe: Can We Trust Functionally Correct Patches Generated By Code Agents? Highlight: In this paper, we reveal a novel type of threat to real-world code-agents: functionally correct yet vulnerable (FCV) patches, which pass all test cases but contain vulnerable code.	Yibo Peng; James Song; Lei Li; Xinyu Yang; Mihai Christodorescu; Ravi Mangal; Corina S. Pasareanu; Haizhong Zheng; Beidi Chen;
120	Social Story Frames: Contextual Reasoning About Narrative Intent and Reception Highlight: Yet, computational models of reader response are limited, preventing nuanced analyses. To address this gap, we introduce SocialStoryFrames, a formalism for distilling plausible inferences about reader response, such as perceived author intent, explanatory and predictive reasoning, affective responses, and value judgments, using conversational context and a taxonomy grounded in narrative theory, linguistic pragmatics, and psychology.	Joel Mire; Maria Antoniak; Steven R Wilson; Zexin Ma; Achyutarama R Ganti; Andrew Piper; Maarten Sap;
121	VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning Highlight: This design is mismatched to LVLMs: an incorrect prediction may arise from perceptual failures or from reasoning errors given correct perception, and a single confidence conflates these sources while visual uncertainty is often dominated by language priors. To address these issues, we propose VL-Calibration, a reinforcement learning framework that explicitly decouples confidence into visual and reasoning confidence.	Wenyi Xiao; Xinchi XU; Leilei Gan;
122	Mathematical Proof As A Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models Highlight: However, the high reported accuracy of these advanced models on popular datasets and reliance on purely numerical evaluation often mask their true reasoning shortcomings. To address this, we propose leveraging the inherent rigor and methodological complexity of mathematical proofs as a diagnostic tool to expose these hidden failures.	Dadi Guo; Jiayu Liu; Zhiyuan Fan; Zhitao He; Haoran Li; Yuxin Li; Yumeng Wang; Yi R. Fung;
123	Enabling Agents to Communicate Entirely in Latent Space Highlight: Further compression not only substantially accelerates inference by up to 24× but also maintains competitive performance through an efficient information-preserving mechanism. We position this work as a feasibility study of entirely latent space inter-agent communication, and our results highlight its potential, offering valuable insights for future research.	Zhuoyun Du; Runze Wang; Huiyu Bai; Zouying Cao; Xiaoyong Zhu; Yu Cheng; Bo Zheng; Wei Chen; Haochao Ying;
124	TEMA: Anchor The Image, Follow The Text for Multi-Modification Composed Image Retrieval Highlight: Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typically cover only a limited range of salient changes, which induces two limitations highly relevant to practical applications, namely Insufficient Entity Coverage and Clause-Entity Misalignment. In order to address these issues and bring CIR closer to real-world use, we construct two instruction-rich multi-modification datasets, M-FashionIQ and M-CIRR.	Zixu Li; Yupeng Hu; Zhiheng Fu; Zhiwei Chen; Yongqi Li; Liqiang Nie;
125	SynthAgent: Adapting Web Agents with Synthetic Supervision Highlight: In this paper, we propose SynthAgent, a fully synthetic supervision framework that aims at improving synthetic data quality via dual refinement of both tasks and trajectories.	Zhaoyang Wang; Yiming Liang; Xuchao Zhang; Qianhui Wu; Siwei Han; Anson Bastos; Rujia Wang; Chetan Bansal; Baolin Peng; Jianfeng Gao; Saravan Rajmohan; Huaxiu Yao;
126	InsideOut: Measuring and Mitigating Insider–Outsider Bias in Interview Script Generation Highlight: In this work, we identify and systematically investigate LLMs’ insider-outsider bias, a phenomenon where models position themselves as insiders of mainstream cultures during generation while externalizing less dominant cultures.	Yixin Wan; Xingrun Chen; Kai-Wei Chang;
127	Crossing The Reward Bridge: Expanding Reinforcement Learning with Verifiable Rewards Across Diverse Domains Highlight: We find their applicability is surprisingly narrow even in structured domains, a limitation that is compounded at scale: rule-based systems can paradoxically degrade in performance as multi-domain, free-form training data increases. To overcome these challenges, we propose a new RLVR framework that uses a generative verifier to provide soft, probabilistic rewards.	Yi Su; Dian Yu; Linfeng Song; Juntao Li; Haitao Mi; Zhaopeng Tu; Min Zhang; Dong Yu;
128	Human or LLM As Standardized Patients? A Comparative Study in Medical Education Highlight: We propose EasyMED, a multi-agent VSP framework that separates case-grounded information disclosure from response generation to support stable, inquiry-conditioned patient behavior.	Bingquan Zhang; Xiaoxiao Liu; Yuchi Wang; Zhou Lei; Qianqian Xie; Benyou Wang;
129	A Goal Without A Plan Is Just A Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Task Highlight: In this paper, we introduce a plan-and-execute framework and propose EAGLET, an efficient and effective planner training method to enhance the executor agent’s planning abilities without human effort.	Shuzheng Si; Haozhe Zhao; Kangyang Luo; Gang Chen; Fanchao Qi; Minjia Zhang; Baobao Chang; Maosong Sun;
130	Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework Highlight: This paper describes the system architecture, agent roles, retrieval and scoring methods, knowledge graph schema, and evaluation interfaces that together form the Paper Circle research workflow.	Komal Kumar; Aman Chadha; Salman Khan; Fahad Shahbaz Khan; Hisham Cholakkal;
131	On The Proper Treatment of Units in Surprisal Theory Highlight: As a result, surprisal-based predictors depend implicitly on ad hoc procedures that conflate two distinct modeling choices: the definition of the unit of analysis and the choice of regions of interest over which predictions are evaluated. In this paper, we disentangle these choices and give a unified framework for reasoning about surprisal over arbitrary unit inventories.	Samuel Kiegeland; Vésteinn Snæbjarnarson; Tim Vieira; Ryan Cotterell;
132	Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Highlight: Drawing inspiration from human auditory perception, which adeptly integrates cross-modal cues and performs sophisticated auditory scene analysis, we introduce a novel two-stage automated pipeline.	Shunian Chen; Xinyuan Xie; Zheshu Chen; Owen Lee; Liyan Zhao; Zhan Su; Qilin Sun; Benyou Wang;
133	Evaluating Language Model Pluralism Through In-the-wild Crowd Discussions Highlight: We propose PLURALEVAL, an evaluation framework that assesses LLM pluralism in open-ended generation by comparing outputs against free-form crowd responses.	Gagan Mundada; Rohan Surana; Nandhini Swaminathan; Bodhisattwa Prasad Majumder; Junda Wu; Julian McAuley; Zhouhang Xie;
134	MedVerse: Efficient and Reliable Medical Reasoning Via DAG-Structured Parallel Execution Highlight: However, their sequential autoregressive decoding forces inherently parallel clinical reasoning, such as differential diagnosis, into a single linear reasoning path, limiting both efficiency and reliability for complex medical problems. To address this, we propose MedVerse, a reasoning framework for complex medical inference that reformulates medical reasoning as a parallelizable directed acyclic graph (DAG) process based on Petri Net theory.	Jianwen Chen; Xinyu Yang; Peng Xia; Arian Azarang; Yueh Z Lee; Gang Li; Hongtu Zhu; Yun Li; Beidi Chen; Huaxiu Yao;
135	Semantic-Aware Logical Reasoning Via A Semiotic Framework Highlight: However, existing studies largely overlook the interplay between logical complexity and semantic complexity, limiting their robustness under abstract propositions, ambiguous contexts, and conflicting stances, which are central to human reasoning. We propose LogicAgent, a semiotic-square–guided framework that jointly addresses these two axes of difficulty.	Yunyao Zhang; Xinglang Zhang; Junxi Sheng; Wenbing Li; Junqing Yu; Yi-Ping Phoebe Chen; Wei Yang; Zikai Song;
136	Tailored Primitive Initialization Is The Secret Key to Reinforcement Learning Highlight: In this work, we describe thinking token patterns with reasoning primitives and argue that initializing LLMs with diverse, high-quality primitives is crucial for stable and efficient RL training.	Yihang Yao; Guangtao Zeng; Raina Wu; Yang Zhang; Ding Zhao; Zhang-Wei Hong; Chuang Gan;
137	Act As You Think: Reinforcing Consistent Reasoning in Medical Visual Question Answering Highlight: This degradation is most pronounced in specialized medical modalities (e.g., Fundus, Ultrasound) where base VLMs lack robust understanding, a failure we attribute to a flawed reward mechanism exacerbated by the scarcity of diverse training data. To tackle this, we introduce Med-Zero-17K, a large-scale dataset spanning over 30 modalities and 24 clinically relevant tasks, and the Multi-Consistency Reward (MCR) framework, which explicitly rewards both perceptual grounding and logical coherence.	Songtao Jiang; Yuan Wang; Ruizhe Chen; Yan Zhang; Ruilin Luo; Bohan Lei; Yeying Jin; Sibo Song; ZhiBo Yang; Jimeng Sun; Jian Wu; Zuozhu Liu;
138	Saber: Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model in Code Generation Highlight: In this paper, we introduce efficient Sampling with Adaptive acceleration and Backtracking Enhanced Remasking (i.e., Saber), a novel training-free sampling algorithm for DLMs that is the first to improve both inference speed and output quality in code generation.	Yihong Dong; Zhaoyu Ma; Xue Jiang; Zhiyuan Fan; Jiaru Qian; Yongmin Li; Jianha Xiao; Zhi Jin; Ge Li;
139	RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization Highlight: Critically, RLVR can lead to capability boundary collapse, narrowing the LLM’s problem-solving scope. To address this problem, we propose RL-PLUS, a novel hybrid-policy optimization approach for LLMs that synergizes internal exploitation with external data to achieve stronger reasoning capabilities and surpass the boundaries of base models.	Yihong Dong; Xue Jiang; Yongding Tao; Huanyu Liu; Kechi Zhang; Lili Mou; Rongyu Cao; Yingwei MA; Jue Chen; Binhua Li; Zhi Jin; Fei Huang; Yongbin Li; Ge Li;
140	Behavior Knowledge Merge in Reinforced Agentic Models Highlight: When standard global averaging is applied under this mismatch, RL’s non-overlapping task vectors that encode critical task-specific behaviors are reduced and parameter updates are diluted. To address this issue, we propose Reinforced Agent Merging (RAM), a distribution-aware merging framework explicitly designed for RL-trained agentic models.	Xiangchi Yuan; Dachuan Shi; Chunhui Zhang; Zheyuan Liu; Shenglong Yao; Soroush Vosoughi; Wenke Lee;
141	Compressing Then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding Highlight: In this paper, we propose CoMa, a compressed pre-training phase, which serves as a warm-up stage for contrastive learning.	Da Li; Yuxiao Luo; Keping Bi; Jiafeng Guo; Wei Yuan; Biao Yang; Yan Wang; Fan Yang; Tingting Gao; Guorui Zhou;
142	Can We Predict Before Executing Machine Learning Agents? Highlight: In this work, we formalize the task of Data-centric Solution Preference and construct a comprehensive corpus of 18,438 pairwise comparisons.	Jingsheng Zheng; Jintian Zhang; Yujie Luo; Yuren Mao; Yunjun Gao; Lun Du; Huajun Chen; Ningyu Zhang;
143	Shanks: Simultaneous Hearing and Thinking for Spoken Language Models Highlight: In this paper, we propose SHANKS, a general inference framework that enables SLMs to generate unspoken chain-of-thought reasoning while listening to user input.	Cheng-Han Chiang; Xiaofei Wang; Linjie Li; Chung-Ching Lin; Kevin Lin; Shujie Liu; Zhendong Wang; Zhengyuan Yang; Hung-yi Lee; Lijuan Wang;
144	Learning Uncertainty from Sequential Internal Dispersion in Large Language Models Highlight: However, they suffer from strict assumptions on how hidden states should evolve across layers, and from information loss by solely focusing on last or mean tokens. To address these issues, we present Sequential Internal Variance Representation (SIVR), a supervised hallucination detection framework that leverages token-wise, layer-wise features derived from hidden states.	Ponhvoan Srey; Xiaobao Wu; Cong-Duy T Nguyen; Anh Tuan Luu;
145	Chaining The Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Highlight: However, existing approaches primarily rely on binary outcome rewards, which fail to capture the comprehensiveness and factuality of agents’ reasoning process, and often lead to undesirable behaviors such as shortcut exploitation and hallucinations. To address these limitations, we propose Citation-aware Rubric Rewards (CaRR), a fine-grained reward framework for deep search agents that emphasizes reasoning comprehensiveness, factual grounding, and evidence connectivity.	Jiajie Zhang; Xin Lv; Ling Feng; Lei Hou; Juanzi Li;
146	Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training Highlight: In this work, we prove that more general heuristics can be parameterized by proposing Data Mixing Agent, the first model-based, end-to-end framework that learns to re-weight domains.	Kailai Yang; Xiao Liu; Lei Ji; Hao Li; Xiao Liang; Zhiwei Liu; Yeyun Gong; Peng Cheng; Mao Yang;
147	Efficient Test-Time Scaling of Multi-Step Reasoning By Probing Internal States of Large Language Models Highlight: We propose a lightweight alternative for step-level reasoning verification based on probing the internal states of LLMs.	Jingwei Ni; Ekaterina Fadeeva; Tianyi Wu; Mubashara Akhtar; Jiaheng Zhang; Elliott Ash; Markus Leippold; Timothy Baldwin; See-Kiong Ng; Artem Shelmanov; Mrinmaya Sachan;
148	VisRet: Visualization Improves Knowledge-Intensive Text-to-Image Retrieval Highlight: Ablation studies show compatibility with different T2I instruction LLMs, T2I generation models, and downstream LLMs.	Di Wu; Yixin Wan; Kai-Wei Chang;
149	Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability Highlight: Although promising, our analysis reveals a fundamental misalignment between general-purpose post-training and DAC-style inference, which limits the model’s capacity to fully leverage this potential. To bridge this gap and fully unlock LLMs’ reasoning capabilities on the most challenging tasks, we propose an end-to-end reinforcement learning (RL) framework to enhance their DAC-style reasoning capacity.	Xiao Liang; Zhong-Zhi Li; Zhenghao Lin; Eric Hanchen Jiang; Hengyuan Zhang; Yelong Shen; Kai-Wei Chang; Ying Nian Wu; Yeyun Gong; Weizhu Chen;
150	KoCo-Bench: Can Large Language Models Leverage Domain Knowledge in Software Development? Highlight: To this end, we present KOCO-bench, a novel benchmark designed for evaluating domain specialization methods in real-world software development.	Xue Jiang; Ge Li; Jiaru Qian; Xianjie Shi; Chenjie Li; Hao Zhu; Ziyu Wang; Jielun Zhang; Zeyu Zhao; Kechi Zhang; Jia Li; Wenpin Jiao; Zhi Jin; Yihong Dong;
151	Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward Highlight: Specifically, we propose a novel trigger mechanism designated as the ASYMMETRIC CHAIN BACKDOOR (ACB).	Weiyang Guo; Zesheng Shi; Zeen Zhu; Yuan Zhou; Min Zhang; Jing Li;
152	Frankentext: Stitching Random Text Fragments Into Long-form Narratives Highlight: We introduce Frankentexts, a long-form narrative generation paradigm that treats an LLM as a composer of existing texts rather than as an author.	Chau Minh Pham; Jenna Russell; Dzung Pham; Mohit Iyyer;
153	MMSearch-R1: Incentivizing LMMs to Search Highlight: We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables LMMs to perform on-demand, multi-turn search in real-world Internet environments.	Jinming Wu; Zihao Deng; Wei Li; Yiding Liu; Bo You; Bo Li; Zejun MA; Ziwei Liu;
154	AlignUSER: Human-Aligned LLM Agents Via World Models for Recommender System Evaluation Highlight: We introduce AlignUSER, a framework that learns world-model-driven agents from human interactions.	Nicolas Bougie; Gian Maria Marconi; Xiaotong Ye; Narimawa Watanabe;
155	EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving Highlight: Specifically, we propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits: (1) a dynamic Chain-of-Thought (CoT) switching mechanism designed to mitigate unnecessary token consumption, and (2) Diverse parallel-scaled reinforcement learning (RL) with trainable prefixes to enhance pass rates under constrained sampling passes.	Mukai Li; Linfeng Song; Zhenwen Liang; Jiahao Xu; Shansan Gong; Qi Liu; Haitao Mi; Dong Yu;
156	Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation Highlight: We identify three key barriers: limited scale of audio-text corpora, limited coverage of audio attributes in existing caption corpora, and lack of systematic exploration and evaluation. To fill this gap, we present the first principled empirical study of ALP.	Wei-Cheng Tseng; Xuanru Zhou; Mingyue Huo; Yiwen Shao; Hao Zhang; Dong Yu;
157	Backdoor Collapse: Eliminating Unknown Threats Via Known Backdoor Aggregation In Language Models Highlight: Backdoor attacks are a significant threat to large language models (LLMs), often embedded via public checkpoints, yet existing defenses rely on impractical assumptions about trigger settings. To address this challenge, we propose Locphylax, a defense framework that requires no prior knowledge of trigger settings.	Liang Lin; Miao Yu; Moayad Aloqaily; Zhenhong Zhou; Kun Wang; Linsey Pang; Prakhar Mehrotra; Qingsong Wen;
158	Challenging The Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models Highlight: The rapid advancement of large reasoning models has saturated existing math benchmarks, underscoring the urgent need for more challenging evaluation frameworks. To address this, we introduce OlymMATH, a rigorously curated, Olympiad-level math benchmark comprising 350 problems, each with parallel English and Chinese versions.	Haoxiang Sun; Yingqian Min; Zhipeng Chen; Xin Zhao; Ji-Rong Wen;
159	LLM Reasoning As Trajectories: Step-Specific Representation Geometry and Correctness Signals Highlight: This work characterizes large language models’ chain-of-thought generation as a structured trajectory through representation space.	Lihao Sun; Hang Dong; Bo Qiao; Qingwei Lin; Dongmei Zhang; Saravan Rajmohan;
160	A Survey of Reasoning-Intensive Retrieval: Progress and Challenges Abstract: Reasoning-Intensive Retrieval (RIR) targets retrieval settings where relevance is mediated by latent inferential links between a query and supporting evidence, rather than …	Yiyang Wei; Tingyu Song; Siyue Zhang; Yilun Zhao;
161	WildReward: Learning Reward Models from In-the-Wild Human Interactions Highlight: This raises the question: Can we develop reward models directly from in-the-wild interactions? In this work, we explore this possibility by adopting WildChat as an interaction source and proposing a pipeline to extract reliable human feedback, yielding 186k high-quality instances for training WildReward via ordinal regression directly on user feedback without preference pairs.	Hao Peng; Yunjia Qi; Xiaozhi Wang; Zijun Yao; Lei Hou; Juanzi Li;
162	The African Languages Lab: A Collaborative Approach to Advancing Low-Resource African NLP Highlight: Despite representing nearly one-third of the world’s languages, African languages remain critically underserved by modern NLP technologies, with 88% classified as severely underrepresented or completely ignored in computational linguistics. We present the African Languages Lab (All Lab), a comprehensive research initiative that addresses this technological gap through systematic data collection, model development, and empirical analysis.	Sheriff Issaka; Keyi Wang; Yinka Ajibola; Oluwatumininu Samuel-Ipaye; Zhaoyi Zhang; Nicte Aguillon Jimenez; Evans Kofi Agyei; Abraham Lin; Rohan Ramachandran; Sadick Abdul Mumin; Faith Nchifor; Mohammed Shuraim Issah; Erick Rosas Gonzalez; Lieqi Liu; Sylvester Kpei; Jemimah Kusi Osei; Carlene Ajeneza; Persis Boateng; Prisca Adwoa Dufie Yeboah; Saadia Gabriel;
163	Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models Highlight: Recently, automated methods (LLM-as-a-judge) have shed light on scalability, but risk bias by relying on one or a few “authority” models. To tackle these issues, we propose Decentralized Arena, a fully automated framework leveraging collective intelligence from all LLMs to evaluate each other.	Yanbin Yin; Kun Zhou; Zhen Wang; Xiangdong Zhang; Yifei Shao; Shibo Hao; Yi Gu; Jieyuan Liu; Somanshu Singla; Tianyang Liu; Eric P. Xing; Zhengzhong Liu; Haojian Jin; Zhiting Hu;
164	MARS2: Scaling Multi-Agent Tree Search Via Reinforcement Learning for Code Generation Highlight: We propose MARS2 (Multi-Agent Reinforced Tree-Search Scaling), a unified RL framework in which multiple independently-optimized agents collaborate within a shared tree-structured search environment.	Pengfei Li; Shijie Wang; Fangyuan Li; Yikun Fu; Kaifeng Liu; Kaiyan Zhang; Dazhi Zhang; Yuqiang Li; Biqing Qi; Bowen Zhou;
165	SciCoQA: Quality Assurance for Scientific Paper–Code Alignment Highlight: We construct SciCoQA from GitHub issues and reproducibility papers, and propose a synthetic generation pipeline to scale beyond AI to Physics, Quantitative Biology, and other computational sciences.	Tim Baumgärtner; Iryna Gurevych;
166	LongVideoAgent: Multi-Agent Reasoning with Long Videos Highlight: We propose a multi-agent framework in which a master LLM coordinates a grounding agent to localize question-relevant segments and a vision agent to extract targeted textual observations.	Runtao Liu; Ziyi Liu; Jiaqi Tang; Yue Ma; Renjie Pi; Jipeng Zhang; Qifeng Chen;
167	Evolving Agents Highlight: We introduce EVA (Evolving Agents), a novel paradigm for autonomous learning driven by pseudo-symbolic abstraction.	Leonardo Ranaldi;
168	Demystifying Data Organization for Enhanced LLM Training Highlight: Guided by them, we introduce two novel data ordering methods termed STR and SAW.	Yalun Dai; Yangyu Huang; Tongshen Yang; Yonghan Wang; Xin Zhang; Wenshan Wu; Qihao Zhao; Hao Li; Yuanyuan Gao; Kim-Hui Yap; Scarlett Li;
169	Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning Highlight: Moreover, real-world spreadsheets are often massive in scale, exceeding the input length that LLMs can efficiently process. To address these challenges, we propose SpreadsheetAgent, a two-stage multi-agent framework for spreadsheet understanding that adopts a step-by-step reading and reasoning paradigm.	Houxing Ren; Mingjie Zhan; Zimu Lu; Ke Wang; Yunqiao Yang; Haotian Hou; Hongsheng Li;
170	TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems Highlight: Existing benchmarks and datasets predominantly focus on single-agent settings, failing to capture the unique vulnerabilities of multi-agent dynamics and co-ordination. To address this gap, we introduce Threats and Attacks in Multi-Agent Systems (TAMAS), a benchmark designed to evaluate the robustness and safety of multi-agent LLM systems.	Ishan Kavathekar; Hemang Jain; Ameya Rathod; Ponnurangam Kumaraguru; Tanuja Ganu;
171	Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation Highlight: Accordingly, we propose Deep-Reporter, a unified agentic framework for grounded multimodal long-form generation.	Fangda Ye; Kuicai Dong; Xie Zhifei; Yuxin Hu; Yihang Yin; Shurui Huang; Shikai Dong; Chen Zhang; Jianzhu Bao; Shuicheng Yan;
172	Mechanisms of Prompt-Induced Hallucination in Vision–Language Models Highlight: Through mechanistic analysis of three VLMs, we identify a small set of attention heads whose ablation substantially reduces prompt-induced hallucinations (PIH) by at least 40% without additional training.	William Rudman; Michal Golovanevsky; Dana Arad; Yonatan Belinkov; Carsten Eickhoff; Ritambhara Singh; Kyle Mahowald;
173	Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models Highlight: While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling, they incur significant memory overhead due to the need to retain all MC samples for the gradient computation of non-linear terms in the RL objective, and thus restrict feasible sample sizes, leading to imprecise likelihood approximations and distorted RL objective. To address this, we propose Boundary-Guided Policy Optimization (BGPO), a memory-efficient RL algorithm that maximizes a specially constructed lower bound of the ELBO-based objective.	Nianyi Lin; Jiajie Zhang; Lei Hou; Juanzi Li;
174	From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts? Highlight: We propose a multi-concept evaluation setting using concepts such as sentiment, domain, voice, and tense.	Aaron Mueller; Andrew Lee; Shruti Joshi; Ekdeep Singh Lubana; Dhanya Sridhar; Patrik Reizinger;
175	DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection Highlight: In this study, we introduce DetectRL-X, a comprehensive multilingual benchmark designed to evaluate advanced detectors across 8 dimensions.	Junchao Wu; Yefeng Liu; Chenyu Zhu; Hao Zhang; Zeyu Wu; Tianqi Shi; Yichao Du; Longyue Wang; Weihua Luo; Jinsong Su; Derek F. Wong;
176	Tears or Cheers? Benchmarking LLMs Via Culturally Elicited Distinct Affective Responses Highlight: These benchmarks remain insufficient to capture the subjective interpretative variance inherent to diverse sociocultural lenses. To address this limitation, we introduce CEDAR, a multimodal benchmark constructed entirely from scenarios capturing Culturally Elicited Distinct Affective Responses.	Chongyuan Dai; Yaling Shen; Zihan Gao; Jia Li; Yishun Jiang; Yaxiong Wang; Liu Liu; Zongyuan Ge; Jinpeng Hu;
177	Evo-Attacker: Memory-Augmented Reinforcement Learning for Long-Horizon Tool Attacks on LLM-MAS Highlight: Existing tool attacks are limited by domain specificity or fixed and static templates. To address these challenges, we propose Evo-Attacker, which formulates the tool attack as a self-evolving, memory-augmented reinforcement learning process.	Bingyu Yan; Xiaoming Zhang; JinYu Hou; Chaozhuo Li; Ziyi Zhou; Yiming Hei; Litian Zhang;
178	Provably Safe Offline-to-Online RL: Decoupling Learning from Data-Driven Safety Enforcement Highlight: We introduce RLPD-GX, a framework that decouples policy optimization from safety enforcement: a reward-seeking learner explores freely, while a projection-based guardian guarantees rule-consistent execution and safe value backups.	Kaitong Cai; Jusheng Zhang; Keze Wang;
179	Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA Highlight: However, beyond general medical knowledge from open-ended datasets, clinical case-based knowledge is also critical for effective medical reasoning, as it provides context grounded in real-world patient experiences. Motivated by this, we propose the Experience Retrieval-Augmentation (ExpRAG) framework based on Electronic Health Records (EHR), aiming to offer the relevant context from other patients’ discharge reports.	Justice Ou; Tinglin Huang; Yilun Zhao; Ziyang Yu; Peiqing Lu; Yifei Shen; Rex Ying;
180	Seeing But Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts Highlight: We then reveal that visual experts and domain experts exhibit layer-wise separation, with image inputs inducing significant routing divergence from text inputs in middle layers where domain experts concentrate. Based on these findings, we propose the Routing Distraction hypothesis: when processing visual inputs, the routing mechanism fails to adequately activate task-relevant reasoning experts.	Haolei Xu; Haiwen Hong; Hongxing Li; Rui Zhou; Yang Zhang; Longtao Huang; Hui Xue; Yongliang Shen; Weiming Lu; Yueting Zhuang;
181	LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification Highlight: Specifically, adapting these methods to long contexts presents three key challenges: (1) the excessive memory demands posed by draft models due to large Key-Value (KV) cache; (2) performance degradation resulting from the mismatch between short-context training and long-context inference; and (3) inefficiencies in tree attention mechanisms when managing long token sequences. This work introduces LongSpec, a framework that addresses these challenges through three core innovations: a memory-efficient draft model with a constant-sized KV cache; novel position indices that mitigate the training–inference mismatch; and an attention aggregation strategy that combines fast prefix computation with standard tree attention to enable efficient decoding.	Penghui Yang; Cunxiao Du; Fengzhuo Zhang; Haonan Wang; Tianyu Pang; Chao Du; Bo An;
182	Learning Diverse Responses with Prefix-Conditioned Supervised Fine-Tuning Highlight: We attribute this issue in part to supervised fine-tuning (SFT): when a single prompt is paired with multiple reference responses, the model is trained to generate diverse outputs under the same prior condition, which induces optimization interference and can lead to diversity collapse. To address this, we propose Prefix-Conditioned SFT (P-SFT), a simple yet effective method that constructs semantically consistent yet distributionally distinct prior contents to different responses, thereby projecting the instruction into distinct latent regions to establish diverse prior distributions and decouple the one-to-many mapping.	Zhiyuan Fan; Guanqiao Chen; Yanyi Huang; Mingkuan Zhao; Dadi Guo; Yi R. Fung;
183	TRAC: Teacher-Guided Token Reward with Adaptive Calibration for Robust Policy Optimization Highlight: In reality, teacher models exhibit capability limitations and uncertainty, producing noisy signals that make student policies susceptible to reward hacking. To address this, we propose Teacher Reward Adaptive Calibration (TRAC), a robust framework that filters noisy supervision by dynamically modulating teacher influence via a multi-granularity calibration mechanism.	Sitong Wu; Haoru Tan; Xichen Zhang; Bin Xia; Wenhu Zhang; Xiaojuan Qi; Bei Yu; Jiaya Jia;
184	ARGUS: Policy-Adaptive Ad Governance Via Evolving Reinforcement with Adversarial Umpiring Highlight: In this paper, we propose ARGUS, a policy-adaptive governance system that enables evolving reinforcement through multi-agent adversarial umpiring.	Deyi Ji; Junyu Lu; Xuanyi Liu; Liqun Liu; Hailong Zhang; Peng Shu; Huan Yu; Jie Jiang; Tianrun Chen; Lanyun Zhu;
185	Benchmarking Web Agent Safety Under E-commerce Deceptive Interfaces Highlight: In this work, we study web agent behavior under realistic deceptive interfaces in the e-commerce domain.	Zijing Shi; Meng Fang; Ling Chen;
186	CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents Highlight: This neglects a crucial capability: agents’ ability to devise and adjust cost-optimal plans in response to changing environments. To bridge this gap, we introduce CostBench, a scalable, cost-centric benchmark designed to evaluate agents’ economic reasoning and replanning abilities.	Jiayu Liu; Cheng Qian; Zhaochen Su; Qing Zong; Shijue Huang; Bingxiang He; Yi R. Fung;
187	ReasonEmbed: Enhanced Text Embeddings for Reasoning-Intensive Document Retrieval Highlight: In this paper, we introduce ReasonEmbed, a novel text embedding model developed for reasoning-intensive document retrieval.	Jianlyu Chen; Junwei Lan; Chaofan Li; Defu Lian; Zheng Liu;
188	Illusions of Confidence? Diagnosing LLM Truthfulness Via Neighborhood Consistency Highlight: To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference.	Haoming Xu; Ningyuan Zhao; Yunzhi Yao; Weihong Xu; Hongru Wang; Xinle Deng; Shumin Deng; Jeff Z. Pan; Huajun Chen; Ningyu Zhang;
189	Lizard: An Efficient Linearization Framework for Large Language Models Highlight: We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures.	Chien Van Nguyen; Huy Huu Nguyen; Ruiyi Zhang; Hanieh Deilamsalehy; Puneet Mathur; Viet Dac Lai; Haoliang Wang; Jayakumar Subramanian; Ryan A. Rossi; Trung Bui; Nikos Vlassis; Franck Dernoncourt; Thien Huu Nguyen;
190	Octopus: Gated Selective Attention for Memory-Bounded Long-Context Inference in Large Language Models Highlight: We propose Octopus, a framework that confers fixed-memory inference onto pretrained Transformers without the information loss of linearization.	Chien Van Nguyen; Ryan A. Rossi; Linh Ngo Van; Franck Dernoncourt; Thien Huu Nguyen;
191	What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time Highlight: In this paper, we propose SCRL (Selective-Complementary Reinforcement Learning), a robust test-time reinforcement learning framework that effectively mitigates label noise amplification.	Dong Yan; Jian Liang; Yanbo Wang; Shuo Lu; Ran He; Tieniu Tan;
192	DPDV: Dual-Pathway and Dual-View Representation Learning for Bridging Information Asymmetry in Text-Video Retrieval Highlight: However, the asymmetry of cross-modal information poses a challenge to accurately establishing retrieval relationships. To overcome this challenge, we propose a novel video retrieval framework, termed the Dual-Pathway and Dual-View model (DPDV), which consists of the Dual-Pathway Partitioning Module (DPPM) for constructing features at an appropriate granularity and the Dual-View Interaction Module (DVIM) for performing effective feature interactions.	Zequn Xie; Xin Liu; Fangming Feng; Boyun Zhang; Tao Jin;
193	Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety Highlight: This oversight may result in hazardous or malicious sources being integrated into the final report. To address this, we introduce DeepResearchGuard, a framework featuring four-stage safeguards with open-domain evaluation, and DRSafeBench, a novel stage-wise safety benchmark.	Wei-Chieh Huang; Henry Peng Zou; Yaozu Wu; Dongyuan Li; Yankai Chen; Weizhi Zhang; Yangning Li; Angelo Zangari; Jizhou Guo; Chunyu Miao; Liancheng Fang; Langzhou He; Yinghui Li; Renhe Jiang; Philip S. Yu;
194	DPC: Training-Free Text-to-SQL Candidate Selection Via Dual-Paradigm Consistency Highlight: We introduce DPC (Dual-Paradigm Consistency), a multi-agent framework that reformulates SQL selection from a probabilistic guessing task on hidden data into a deterministic verification task on visible data.	Boyan Li; Ou Ocean Kun Hei; Yue Yu; Yuyu Luo;
195	Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations Highlight: Existing evaluation frameworks typically focus on diagnostic accuracy and win-rates and often overlook alignment with patient-specific goals, values, and personalities required for meaningful conversations. To address this, we introduce MedAgent, a novel framework for synthetically generating realistic, multi-turn mental health sensemaking conversations and use it to create the Mental Health Sensemaking Dialogue (MHSD) dataset, comprising over 2,200 patient–LLM conversations.	Mohit Chandra; Siddharth Sriraman; Harneet Singh Khanuja; Yiqiao Jin; Munmun De Choudhury;
196	Understanding Emergent Misalignment Via Feature Superposition Geometry Highlight: Despite growing empirical evidence, its underlying mechanism remains unclear. To uncover the reason behind this phenomenon, we propose a mechanistic account based on the geometry of feature superposition.	Gouki Minegishi; Hiroki Furuta; Takeshi Kojima; Yusuke Iwasawa; Yutaka Matsuo;
197	When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation Highlight: We introduce an OCR benchmark for industrial RAG systems covering 11 challenging document types, including extreme layouts, high-resolution pages, complex or watermarked backgrounds, historical documents with non-standard reading orders, visually decorated text, and documents containing tables and mathematical formulas.	Lin Sun; Wangdexian; Jingang Huang; Linglin Zhang; Change Jia; Zhengwei Cheng; Xiangzheng Zhang;
198	Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models Highlight: To address this challenge, we systematically investigate the identification of stable tokens and present three key findings: (1) naive lookahead decoding is unreliable, (2) token stability closely correlates with convergence trend, and (3) historical information is isolated. Building on these insights, we propose Anchor-based History-stable Decoding (AHD), a training-free, plug-and-play dynamic decoding strategy.	Shun Zou; Yong Wang; Zehui Chen; Lin Chen; Chongyang Tao; Feng Zhao; Xiangxiang Chu;
199	CRISP: Persistent Concept Unlearning Via Sparse Autoencoders Highlight: We introduce CRISP, a parameter-efficient method for persistent concept unlearning using SAEs.	Tomer Ashuach; Dana Arad; Aaron Mueller; Martin Tutek; Yonatan Belinkov;
200	Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark Highlight: Unified multimodal models aim to jointly enable visual understanding and generation, yet current benchmarks rarely examine their true integration.	Kai Zou; Ziqi Huang; Yuhao Dong; Shulin Tian; Dian Zheng; Hongbo Liu; Jingwen He; Bin Liu; Yu Qiao; Ziwei Liu;
201	Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting Via Automated Spectral Inspection Highlight: Due to the limited generalization and interpretability of deep learning classifiers, the final vetting of rare celestial object candidates still relies on manually intensive expert visual inspection, which has become a primary bottleneck as modern spectroscopic surveys continue to scale. To bridge this gap, we propose Spec-o3, a tool-augmented vision-language agent that performs astronomer-aligned spectral inspection via interleaved multimodal chain-of-thought reasoning.	Minghui Jia; Qichao Zhang; Ali Luo; Linjing Li; Shuo Ye; Hailing Lu; Wen Hou; Dongbin Zhao;
202	LiGen: Active Lipid Generation Via A Molecular Language Model Highlight: In this work, we propose a method, LiGen, to generate lipid molecules efficiently and actively, facilitating the discovery of high-performing LNP formulations.	Ying Zhan; Xiuqi Tang; Yan Zhang; Xiao Tan; Dian Shen; Zhou Yu; Beilun Wang;
203	Putting HUMANS First: Efficient LAM Evaluation with Human Preference Alignment Highlight: … 85 correlation with human. To better predict preferences, we trained regression models on these selected subsets, achieving 0. …	Woody Haosheng Gan; William Barr Held; Diyi Yang;
204	In-Context Representation Hijacking Highlight: We introduce Doublespeak, a simple in-context representation hijacking attack against language models.	Itay Yona; Amir Sarid; Michael Karasik; Yossi Gandelsman;
205	Don’t Stop Early: Scalable Enterprise Deep Research with Controlled Information Flow and Evidence-Aware Termination Highlight: Enterprise deep research often fails to produce decision-ready reports due to uneven information coverage, context explosion, and premature stopping. We propose a scalable Enterprise Deep Research (EDR) architecture to address these failures.	Prafulla Kumar Choubey; Kung-Hsiang Huang; Pranav Narayanan Venkit; Jiaxin Zhang; Vaibhav Vats; Yu Li; Xiangyu Peng; Chien-Sheng Wu;
206	Nature-Inspired Population-Based Evolution of Large Language Models Highlight: Inspired by this principle, this paper formally defines a newly emerging problem: the population-based evolution of large language models (LLMs). We introduce a novel framework that starts with a population of parent LLMs and allows this population to evolve through four key operations: (i) crossover, merging the weights of different parents to create offspring LLMs, (ii) mutation, introducing small, random changes to model weights to foster diversity, (iii) selection, prioritizing high-performing models, and (iv) succession, transferring the learned experience from parent to offspring LLMs.	Yiqun Zhang; Peng Ye; Xiaocui Yang; Shi Feng; Shufei Zhang; Lei Bai; Wanli Ouyang; Shuyue Hu;
207	MTRouter: Cost-Aware Multi-Turn LLM Routing with History–Model Joint Embeddings Highlight: We propose MTRouter, which encodes the interaction history and candidate models into joint history–model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility.	Yiqun Zhang; Hao Li; Zihan Wang; Shi Feng; Xiaocui Yang; Daling Wang; Bo Zhang; Lei Bai; Shuyue Hu;
208	Closing The Modality Reasoning Gap for Speech Large Language Models Highlight: This gap could be associated with representational drift across Transformer layers and behavior deviations in long-chain reasoning. To address this issue, we introduce TARS, a reinforcement-learning framework that aligns text-conditioned and speech-conditioned trajectories through an asymmetric reward design.	Chaoren Wang; Heng Lu; Xueyao Zhang; Shujie Liu; Yan Lu; Jinyu Li; Zhizheng Wu;
209	MED-COREASONER: Reducing Language Disparities in Medical Reasoning Via Language-Informed Co-Reasoning Highlight: While reasoning-enhanced large language models perform strongly on English medical tasks, a persistent multilingual gap remains, with substantially weaker reasoning in local languages, limiting equitable global medical deployment. To bridge this gap, we introduce Med-CoReasoner, a language-informed co-reasoning framework that elicits parallel English and local-language reasoning, abstracts them into structured concepts, and integrates local clinical knowledge into an English logical scaffold via concept-level alignment and retrieval.	Fan Gao; Sherry T. Tong; Jiwoong Sohn; Jiahao Huang; Junfeng Jiang; Ding Xia; Piyalitt Ittichaiwong; Kanyakorn Veerakanjana; Hyunjae Kim; Qingyu Chen; Edison Marrese-Taylor; Kazuma Kobayashi; Akiko Aizawa; Irene Li;
210	Alexandria: A Multi-Domain Dialectal Arabic Machine Translation Dataset for Culturally Inclusive and Linguistically Diverse LLMs Highlight: Despite this, machine translation (MT) systems often generalize poorly to dialectal input, limiting their utility for millions of speakers. We introduce Alexandria, a large-scale, community-driven, human-translated dataset designed to bridge this gap.	Abdellah EL Mekki; Samar M. Magdy; Houdaifa Atou; Ruwa AbuHweidi; Baraah Qawasmeh; Omer Nacar; Thikra Al-hibiri; Razan Saadie; Hamzah A. Alsayadi; Nadia Ghezaiel Hammouda; Alshima Mohammed Alkhazimi; Aya Hamod; Al-Yas Yaqoob Al-Ghafri; Wesam El-Sayed; Asila Ismail al Sharji; Mohamad Ballout; Anas Belfathi; Karim Ghaddar; Serry Sibaee; Alaa Aoun; Aeej Mohammed Aseri; Lina Abureesh; Ahlam Bashiti; Majdal Yousef; Abdulaziz Hafiz; Yehdih Mohamed; Emira Hamedtou; Brakehe Emehah; Rahaf Alhamouri; Youssef Nafea; Aya El Aatar; Walid Al-Dhabyani; Emhemed S. Hamed; Sara Shatnawi; Fakhraddin Alwajih; Khalid Elkhidir; Ashwag Alasmari; Abdurrahman Gerrio; Omar Said Alshahri; AbdelRahim A. Elmadany; Ismail Berrada; Amir Azad Adli Al-kathiri; Fadi Zaraket; Mustafa Jarrar; Yahya Mohamed EL Hadj; Hassan Alhuzali; Muhammad Abdul-Mageed;
211	Contextual Relevance and Adaptive Sampling for LLM-Based Document Reranking Highlight: To efficiently estimate contextual relevance, we propose TS-SetRank, a sampling-based, uncertainty-aware reranking algorithm.	Jerry Huang; Siddarth Madala; Cheng Niu; Julia Hockenmaier; Tong Zhang;
212	GTA: Generating Long-horizon Tasks for Web Agents at Scale Highlight: We introduce a scalable framework GTA that integrates crawling, retrieval-based seeding, in-context generation, and automated quality control to produce realistic tasks paired with executable trajectories.	Tenghao Huang; Kung-Hsiang Huang; Prafulla Kumar Choubey; Yilun Zhou; Muhao Chen; Jonathan May; Chien-Sheng Wu;
213	ReCreate: Reasoning and Creating Domain Agents Driven By Experience Highlight: Such strategies overlook critical evidence explaining why an agent succeeds or fails, and often require high computational costs. To address these limitations, we propose ReCreate, an experience-driven framework for the automatic creation of domain agents.	Zhezheng Hao; Hong Wang; Jian Luo; Jianqing Zhang; Yuyan Zhou; Qiang Lin; Can Wang; Hande Dong; Jiawei Chen;
214	Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective Highlight: In this work, we conduct comprehensive theoretical and empirical analyses of entropy dynamics in RLVR, offering two main insights: (1) We derive a tight approximation for token-level entropy change at each update step, revealing four governing factors and providing a unified theoretical framework of how existing methods influence entropy; (2) We reveal a fundamental limitation of recent approaches: they rely on heuristic adjustments to one or two of these factors, leaving other relevant factors unconsidered, thus inherently limiting their effectiveness. Motivated by these findings, we propose STEER, a principled entropy-modulation method that adaptively reweighs tokens based on theoretically-estimated entropy variations.	Zhezheng Hao; Hong Wang; Haoyang Liu; Jian Luo; Jiarui Yu; Hande Dong; Qiang Lin; Can Wang; Jiawei Chen;
215	OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation Highlight: While LLMs have shown promising capabilities in generating believable human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPeRA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions.	Ziyi Wang; Yuxuan Lu; Wenbo Li; Amirali Amini; Bo Sun; Yakov Bart; Weimin Lyu; Jiri Gesi; Tian Wang; Jing Huang; Yu Su; Upol Ehsan; Malihe Alikhani; Toby Jia-Jun Li; Lydia Chilton; Dakuo Wang;
216	Trajectory2Task: Training Robust Tool-Calling Agents with Synthesized Yet Verifiable Data for Complex User Intents Highlight: To bridge the gap, we present Trajectory2Task, a verifiable data generation pipeline for studying tool use at scale under three realistic user scenarios: ambiguous intent, changing intent, and infeasible intent.	Ziyi Wang; Yuxuan Lu; Yimeng Zhang; Pei Chen; Ziwei Dong; Jing Huang; Jiri Gesi; Xianfeng Tang; Chen Luo; Qun Liu; Yisi Sang; Hanqing Lu; Manling Li; Jin Lai; Dakuo Wang;
217	Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers Highlight: In all, we challenge claims that partial-input success is always a flaw, so we discuss how reasoning traces could separate problematic data from less problematic reasoning.	Nishant Balepur; Atrey Desai; Rachel Rudinger;
218	RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension Highlight: Understanding research papers remains challenging for foundation models due to specialized scientific discourse and complex figures and tables, yet existing benchmarks offer limited fine-grained evaluation at scale. To address this gap, we introduce RPC-Bench, a large-scale question-answering benchmark built from review–rebuttal exchanges of high-quality computer science papers, containing 15K human-verified QA pairs.	Yelin Chen; Fanjin Zhang; Suping Sun; Yunhe Pang; Yuanchun Wang; Jian Song; XiaoYan Li; Lei Hou; Shu Zhao; Jie Tang; Juanzi Li;
219	CoEvolve: Training LLM Agents Via Agent-Data Mutual Evolution Highlight: Reinforcement learning for LLM agents is typically conducted on a static data distribution, which fails to adapt to the agent’s evolving behavior and leads to poor coverage of complex environment interactions. To address these challenges, we propose CoEvolve, an agent-data mutual evolution framework that enables LLM agents to improve through closed-loop, interaction-driven training.	Shidong Yang; Ziyu Ma; Tongwen Huang; Yiming Hu; Yong Wang; Xiangxiang Chu;
220	FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining Highlight: This paper proposes Fine-grained Language-Audio Pretraining (FineLAP), a novel training paradigm that advances both clip- and frame-level alignment in CLAP with heterogeneous data.	Xiquan Li; Xuenan Xu; Ziyang Ma; Wenxi Chen; Haolin He; Qiuqiang Kong; Xie Chen;
221	MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows Highlight: In this paper, we present MeanAudio, a fast and faithful text-to-audio generator capable of rendering realistic sound with only one function evaluation (1-NFE).	Xiquan Li; Junxi Liu; Yuzhe Liang; Zhikang Niu; Wenxi Chen; Xie Chen;
222	RATE: Reviewer Profiling and Annotation-free Training for Expertise Ranking in Peer Review Systems Highlight: Reviewer assignment is increasingly critical yet challenging in the LLM era, where rapid topic shifts render many pre-2023 benchmarks outdated and where proxy signals poorly reflect true reviewer familiarity. We address this evaluation bottleneck by introducing LR-bench, a high-fidelity, up-to-date benchmark curated from 2024–2025 AI/NLP manuscripts with five-level self-assessed familiarity ratings collected via a large-scale email survey, yielding 1,055 expert-annotated paper–reviewer–score annotations.	Weicong Liu; Zixuan Yang; Yibo Zhao; Xiang Li;
223	AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following Highlight: In this work, we introduce AdvancedIF, a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs’ ability to follow complex, multi-turn, and system-level instructions.	Yun He; Wenzhe Li; Hejia Zhang; Songlin Li; Karishma Mandyam; Sopan Khosla; Yuanhao Xiong; Nanshu Wang; Xiaoliang Peng; Beibin Li; Shengjie Bi; Shishir G Patil; Qi Qi; Shengyu Feng; Julian Katz-Samuels; Richard Yuanzhe Pang; Sujan Kumar Gonugondla; Hunter Lang; Yue Yu; Yundi Qian; Maryam Fazel-Zarandi; Licheng Yu; Amine Benhalloum; Hany Hassan Awadalla; Manaal Faruqui;
224	When One LLM Drools, Multi-LLM Collaboration Rules Highlight: We challenge the status quo of relying solely on a single general-purpose LLM and argue for multi-LLM collaboration to better represent the extensive diversity of data, skills, and people.	Shangbin Feng; Wenxuan Ding; Alisa Liu; Zifeng Wang; Weijia Shi; Yike Wang; Shannon Zejiang Shen; Xiaochuang Han; Hunter Lang; Chen-Yu Lee; Tomas Pfister; Yejin Choi; Yulia Tsvetkov;
225	HopWeaver: Cross-Document Synthesis of High-Quality and Authentic Multi-Hop Questions Highlight: This paper introduces HopWeaver, the first cross-document framework synthesizing authentic multi-hop questions without human intervention.	Zhiyu Shen; Jiyuan Liu; Yunhe Pang; Yanghui Rao; Fu Lee Wang; Jianxing Yu;
226	Understanding The Behaviors of Environment-aware Information Retrieval Highlight: To facilitate learning over multi-retrieval-step trajectories, we introduce a branching-based rollout technique that improves training stability.	Ruifeng Yuan; Chaohao Yuan; David Dai; Yu Rong; Hong Cheng; Hou Pong Chan; Chenghao Xiao;
227	Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present GenCluster, a scalable and reproducible test-time compute framework that attains IOI gold-level performance using open-weight models.	Mehrzad Samadi; Aleksander Ficek; Sean Narenthiran; Siddhartha Jain; Wasi Uddin Ahmad; Somshubra Majumdar; Vahid Noroozi; Boris Ginsburg;
228	Can Persona-Prompted LLMs Emulate Subgroup Values? An Empirical Analysis of Generalisability and Fairness in Cultural Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper investigates the more challenging task of fine-grained value alignment: examining whether LLMs can emulate the distinct cultural values of demographic subgroups.	Bryan Chen Zhengyu Tan; Zhengyuan Liu; Xiaoyuan Yi; Jing Yao; Xing Xie; Nancy F. Chen; Roy Ka-Wei Lee;
229	LLM-MC-Affect: LLM-Based Monte Carlo Modeling of Affective Trajectories and Latent Ambiguity for Interpersonal Dynamic Insight Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce LLM-MC-Affect, a probabilistic framework that characterizes emotion not as a static label, but as a continuous latent probability distribution defined over an affective space.	Yu-Zheng Lin; Bono Po-Jen Shih; John Paul Martin Encinas; Elizabeth Victoria Achom; Karan Patel; Jesus Horacio Pacheco; Sicong Shao; Jyotikrishna Dass; Soheil Salehi; Pratik Satam;
230	Psyche-R1: Towards Reliable Psychological LLMs Through Unified Empathy, Expertise, and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent reasoning-augmented LLMs have achieved remarkable performance in mathematics and programming, while research in the psychological domain has predominantly emphasized emotional support and empathetic dialogue, with limited attention to reasoning mechanisms that are beneficial to generating accurate responses. Therefore, in this paper, we propose Psyche-R1, the first Chinese psychological LLM that jointly integrates empathy, psychological expertise, and reasoning, built upon a novel data curation pipeline.	Chongyuan Dai; Jinpeng Hu; Hongchang Shi; Zhuo Li; Dan Guo; Xun Yang; Meng Wang;
231	LOKA: Conflict-Aware LLM Knowledge Update with Adaptive Knowledge Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate the problem of LLM knowledge updates, which requires simultaneously unlearning unwanted information and learning new knowledge.	Binchi Zhang; Zhengzhang Chen; Zaiyi Zheng; Jundong Li; Haifeng Chen;
232	Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems Via Bi-Level Graph Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, the lack of interpretability in these methods limits their reliability and real-world applicability. To address these limitations, we propose an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.	Junjun Pan; Yixin Liu; Rui Miao; Kaize Ding; Yu Zheng; Quoc Viet Hung Nguyen; Alan Wee-Chung Liew; Shirui Pan;
233	ReCode: Reinforcing Code Generation with Reasoning-Process Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work proposes ReCode (Reasoning-Reinforced Code Generation), a novel RL training framework comprising: (1) Contrastive Reasoning-Process Reward Learning (CRPL), which trains a reward model with synthesized optimized and degraded reasoning variants to assess the quality of the reasoning process; and (2) Consistency-Gated GRPO (CG-GRPO), which integrates the reasoning-process reward model into RL by gating neural reasoning-process rewards with strict execution outcomes, using execution correctness as a hard gate to mitigate reward hacking.	Lishui Fan; Yu Zhang; Mouxiang Chen; Zhongxin Liu;
234	From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While online reinforcement learning (RL), specifically GRPO, offers a promising solution, we find that a naive application of GRPO leads to competitive degradation, as the simple linear aggregation of rewards with static weights causes conflicting gradient signals and a misalignment with the temporal dynamics of the diffusion process. To overcome these limitations, we propose Customized-GRPO, a novel framework featuring two key innovations: (i) Synergy-Aware Reward Shaping (SARS), a non-linear mechanism that explicitly penalizes conflicted reward signals and amplifies synergistic ones, providing a sharper and more decisive gradient.	Ziwei Huang; Ying Shu; Fanghao; Quanyu Long; Wenya Wang; Qiushi Guo; Tiezheng Ge; Leilei Gan;
235	How Should We Enhance The Safety of Large Reasoning Models: An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we present a comprehensive empirical study on how to enhance the safety of LRMs through Supervised Fine-Tuning (SFT).	Zhexin Zhang; Xian Qi Loye; Victor Shea-Jay Huang; Junxiao Yang; Qi Zhu; Shiyao Cui; Fei Mi; Lifeng Shang; Yingkang Wang; Hongning Wang; Minlie Huang;
236	MMSciCode: Real-world Evaluation of Multilingual Multi-Discipline Scientific Research Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce MMSciCode, a comprehensive expert-level, multilingual multi-discipline benchmark for evaluating foundation models in scientific code generation.	Xue Xia; Zheyuan Yang; Arman Cohan; Yilun Zhao;
237	Attention Under Attack: Analog Noise Effects and Mechanistic Vulnerabilities in Transformer Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present the first fine-grained analysis of analog vulnerability in pretrained transformers, examining projection submodules, attention heads, and layer-wise dynamics across multiple NLP tasks.	Mafizur Rahman; Lijun Qian;
238	Visually-Guided Policy Optimization for Multimodal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: More importantly, our empirical analysis reveals that temporal visual forgetting along reasoning steps exacerbates this deficiency. To bridge this gap, we propose Visually-Guided Policy Optimization (VGPO), a novel framework to reinforce visual focus during policy optimization.	Zengbin Wang; Feng Xiong; Liang Lin; Xuecai Hu; Yong Wang; Yanlin Wang; Man Zhang; Xiangxiang Chu;
239	Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The goal of this work is to develop a universal approach for aligning subtitles (i.e., spoken language text with corresponding timestamps) to continuous sign language videos.	Zifan Jiang; Youngjoon Jang; Liliane Momeni; Gül Varol; Sarah Ebling; Andrew Zisserman;
240	Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present Embodied-Reasoner, a reasoning model for interactive embodied tasks.	Wenqi Zhang; Mengna Wang; Gangao Liu; Huixin Xu; Yiwei Jiang; Yongliang Shen; Guiyang Hou; Zhe Zheng; Hang Zhang; Xin Li; Jiajun Liu; Weiming Lu; Peng Li; Yueting Zhuang;
241	Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful Without Hint Verbalization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: With a new faithful@k metric, we show that larger inference-time budgets greatly increase hint verbalization (up to 90% in some settings), suggesting much apparent unfaithfulness is due to tight token limits.	Kerem Zaman; Shashank Srivastava;
242	TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing document OCR largely targets plain text or Markdown, discarding the structural and executable properties that make LaTeX essential for scientific publishing. We study page-level reconstruction of scientific PDFs into compilable LaTeX and introduce TexOCR-Bench, a benchmark, and TexOCR-Train, a large-scale training corpus, for this task.	Chengye Wang; Lin Fu; Zexi Kuang; Yilun Zhao;
243	Breaking The Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This makes generalization to unseen generators a central and challenging problem for AI-text detection. To tackle this challenge, we propose a progressively structured framework that disentangles AI-detection semantics from generator-aware artifacts.	Xiao Pu; Zepeng Cheng; Lin Yuan; Yu Wu; Xiuli Bi;
244	Unified Thinker: A General Reasoning Core for Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows.	Sashuai Zhou; Qiang Zhou; Jijin Hu; Hanqing Yang; Yue Cao; Junpeng Ma; Yinchao Ma; Jun Song; Tiezheng Ge; Cheng Yu; Bo Zheng; Zhou Zhao;
245	Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In response, we introduce SP3F (Self-Play with Privileged Pairwise Feedback), a two-stage framework for enhancing multilingual reasoning without any data in the target language(s).	Lintang Sutawika; Gokul Swamy; Steven Wu; Graham Neubig;
246	MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these approaches often suffer from representation collapse and expert load imbalance, which negatively impact the potential of LLMs. To address these challenges, we propose a heterogeneous Mixture-of-Adapters (MoA) approach.	Jie Cao; Tianwei Lin; Bo Yuan; Rolan Yan; Hongyang He; Wenqiao Zhang; Juncheng Li; Dongping Zhang; Siliang Tang; Yueting Zhuang;
247	MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we propose Retrieval-Augmented Multi-modal Multi-agent Stance Detection (MM-StanceDet), a novel multi-agent framework integrating Retrieval Augmentation for contextual grounding, specialized Multimodal Analysis agents for nuanced interpretation, a Reasoning-Enhanced Debate stage for exploring perspectives, and Self-Reflection for robust adjudication.	Weihai Lu; Zhejun Zhao; Yanshu Li; Huan He;
248	Compatibility-Aware Dynamic Fine-Tuning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Compatibility-Aware Dynamic Fine-Tuning (CADFT), a principled extension of DFT that controls sample-level optimization variance.	Yucheng Zhou; Junwei Sheng; Qianning Wang; Jianbing Shen;
249	Multimodal Large Language Models for Multi-Subject In-Context Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To overcome the data scarcity, we introduce an automatic and scalable data generation pipeline that eliminates the need for manual annotation.	Yucheng Zhou; Dubing Chen; Huan Zheng; Jianbing Shen;
250	Efficient KL Divergence Estimation Via Truncated Top-K Integration for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose TIKE (Top-k Importance-weighted KL Estimator), which exploits the Zipfian structure of language model distributions: by deterministically integrating over only the top-k tokens, TIKE captures most of the probability mass while effectively reducing memory cost.	Xinyuan Wang; Luozhijie Jin; Bo Wang; Yuan Li; Zhangyue Yin; Xipeng Qiu;
251	What Do LLMs Learn First? Asymmetric Learning Dynamics of Input Complexity and Output Ambiguity in Preference Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Through systematic analysis, we demonstrate that these dimensions induce asymmetric learning dynamics, with IC-related competencies developing rapidly in early training while OA-related competencies emerge more gradually. Building on this observation, we propose DECOPO, a training framework that maintains separate, adaptive pacing schedules for each dimension.	Mengyang Li; Jingwen Wang; Pinlong Zhao;
252	Follow The Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we investigate how semantic information is distributed across token representations in text-to-image prompts, analyzing it at two levels: (1) in-item representation—whether individual tokens represent their lexical item (i.e., a word or expression conveying a single concept), and (2) cross-item interaction—whether information flows between tokens of different lexical items.	Guy Kaplan; Michael Toker; Yuval Reif; Yonatan Belinkov; Roy Schwartz;
253	USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Current benchmarks fail to provide reliable assessments due to limited risk coverage, insufficient scale, and the oversight of complex modality combinations (e.g., cross-modal risks). To address this, we introduce the Unified Safety Benchmark (USB), a comprehensive framework covering 61 risk categories across four distinct modality interactions.	Baolin Zheng; Guanlin Chen; Qingyang Teng; Hongqiong Zhong; Yingshui Tan; Zhendong Liu; Weixun Wang; Jiaheng Liu; Jian Yang; Huiyun Jing; Jincheng Wei; Wenbo Su; Xiaoyong Zhu; Bo Zheng; Kaifu Zhang;
254	Mechanistic Interpretability Should Prioritize Feature Consistency in Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose the Pairwise Dictionary Mean Correlation Coefficient (PW-MCC) as an assignment-based metric to quantify consistency and demonstrate that high levels are achievable (PW-MCC ≈ 0.	Xiangchen Song; Aashiq Muhamed; Yujia Zheng; Lingjing Kong; Zeyu Tang; Mona T. Diab; Virginia Smith; Kun Zhang;
255	BaseCal: Unsupervised Confidence Calibration Via Base Model Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work proposes two ways to achieve this.	Hexiang Tan; Wanli Yang; Junwei Zhang; Xin Chen; Rui Tang; Du Su; Jingang Wang; Yuanzhuo Wang; Fei Sun; Xueqi Cheng;
256	Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent Via Systematic Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Crucially, we find that directly applying task-level rewards often leads to convergence difficulties due to the sparse nature of GUI interactions. To address these challenges, we present Mobile-R1, a systematic training recipe that bridges atomic action execution and strategic task completion.	Jihao Gu; Qihang Ai; Yingyao Wang; Pi Bu; Jingxuan Xing; Yue Cao; Zekun Zhu; Wei Jiang; Ziming Wang; Yingxiu Zhao; Ming-Liang Zhang; Jun Song; Yuning Jiang; Bo Zheng;
257	Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To further strengthen jailbreak testing and simulate more realistic attack conditions, we propose a method to generate dynamic adversarial variants.	Zirui Song; Qian Jiang; Mingxuan Cui; Mingzhe Li; Lang Gao; Zeyu Zhang; Zixiang Xu; Yanbo Wang; Guangxian Ouyang; Zhenhao Chen; Xiuying Chen;
258	Why Steering Works: Toward A Unified View of Language Model Parameter Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Methods for controlling large language models (LLMs), including local weight fine-tuning, LoRA-based adaptation, and activation-based interventions, are often studied in isolation, obscuring their connections and making comparison difficult. In this work, we present a unified view that frames these interventions as dynamic weight updates induced by a control signal, placing them within a single conceptual framework.	Ziwen Xu; Chenyan WU; Hengyu Sun; Haiwen Hong; Mengru Wang; Yunzhi Yao; Longtao Huang; Hui Xue; Shumin Deng; Zhixuan Chu; Huajun Chen; Ningyu Zhang;
259	I²B-LPO: Latent Policy Optimization Via Iterative Information Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose Latent Policy Optimization via Iterative Information Bottleneck (I²B-LPO), which shifts from statistical perturbation of token distributions to topological branching of reasoning trajectories.	Huilin Deng; Hongchen Luo; Yue Zhu; Long Li; Zhuoyue Chen; Xinghao Zhao; Ming LI; Chuyang Zhao; Jihai Zhang; MengChang Wang; Yang Cao; Yu Kang;
260	EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce EverMemOS, a self-organizing memory operating system that implements an engram-inspired lifecycle for computational memory.	Chuanrui Hu; Xingze Gao; Zuyi Zhou; Dannong Xu; Yi Bai; Xintong Li; Hui Zhang; Tong Li; Chong Zhang; Lidong Bing; Yafeng Deng;
261	RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset Via Automated Coarse-to-Fine Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While rubric-based evaluation offers a structured proxy for verification, existing methods suffer from scalability bottlenecks and coarse criteria, resulting in a supervision ceiling effect. To address this, we propose an automated Coarse-to-Fine Rubric Generation framework.	Sunzhu Li; Jiale Zhao; Huimin Ren; Zhenlin Wei; Yang Zhou; Jingwen Yang; Shunyu Liu; Kaike Zhang; Chen Wei;
262	CachePrune: Teaching LLMs What Not to Follow Via KV-Cache Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This vulnerability stems from LLMs’ inability to distinguish between data and instructions within a prompt. We propose CachePrune that defends against this attack by identifying and pruning neurons associated with instruction-following, during KV cache encoding of the prompt context.	Rui Wang; Junda Wu; Yu Xia; Tong Yu; Ruiyi Zhang; Ryan A. Rossi; Subrata Mitra; Lina Yao; Julian McAuley;
263	Data Efficient RLVR Via Off-Policy Influence Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work proposes a theoretically-grounded approach using influence functions to estimate the contribution of each data point to the learning objective.	Erle Zhu; Dazhi Jiang; Yuan Wang; Xujun Li; Jiale Cheng; Yuxian Gu; Yilin Niu; Aohan Zeng; Jie Tang; Minlie Huang; Hongning Wang;
264	STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series Via Spatial-Aware Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To promote spatially grounded logic, we introduce S-GRPO, a reinforcement learning algorithm that rewards performance gains specifically attributable to spatial information.	Juntong Ni; Shiyu Wang; Qi He; Ming Jin; Wei Jin;
265	Evaluating Answer Leakage Robustness of LLM Tutors Against Adversarial Student Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study scenarios where students behave adversarially and aim to obtain the correct answer from the tutor.	Jin Zhao; Marta Knežević; Tanja Käser;
266	Think in Latent Thoughts: A New Paradigm for Gloss-Free Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We thus introduce a reasoning-driven SLT framework that uses an ordered sequence of latent thoughts as an explicit middle layer between the video and the generated text.	Yiyang Jiang; Li Zhang; Xiao-Yong Wei; Li Qing;
267	Learning from Near-Misses: Error-Aware Contrastive Few-Shot Learning for NL2Formula Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce an abstract syntax tree (AST)-based error taxonomy that organizes common error modes by the kind of decision that goes wrong in the parse tree. Building on this taxonomy, we propose Error-Aware Contrastive Few-Shot Learning (ECFL), an error-aware framework that unifies training and inference around typed error supervision.	Zhihao Shuai; Yiyun Chen; Maolin Ma; Yutong Chen; Hanjia Qiu; Jing Xu; Ziye Chen; Weikai Yang;
268	TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Given a broad, heterogeneous tool repository, LLM agents must not only select appropriate tools based on task planning analysis but also strategically schedule the execution order to ensure efficiency. This paper introduces TPS-Bench to benchmark the ability of LLM agents in solving such problems that demand Tool Planning and Scheduling.	Hanwen Xu; Xuyao Huang; Yuzhe Liu; Zhijie Deng;
269	From Regulatory Approvals to Patents: Cross-Domain Linking for Cardiovascular Device Traceability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Using cardiovascular devices as a high-impact, representative domain featuring diverse technologies, high recall rates, and abundant disclosures, we construct a benchmark with 434 devices, 698K patents, and 585 high-fidelity expert-verified pairs. To address these challenges, we propose Bridge-MedDevKG, a coarse-to-fine framework that integrates (1) MedDevOnto, a domain-specific ontology that anchors device concepts via three-tier UMLS normalization; (2) Multi-signal candidate generation fusing company affiliation, semantic similarity, and ontology-weighted entity overlap; and (3) Heterogeneous reranking with multi-signal scoring and XGBoost classification on hard negatives.	Qingqing Yang; Haijiang Liu; Moyan Li;
270	Reusable Experiences: Latent Routing and Modular Composition in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ReX (Reusable eXperience), an experience-centric adaptation framework that treats latent experiences — recurring reasoning patterns and skills — as fundamental units for LLM specialization.	Shuai Ling; Lizi Liao; Dongmei Jiang; Weili Guan;
271	POWSM: A Phonetic Open Whisper-Style Speech Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce POWSM (Phonetic Open Whisper-style Speech Model), the first unified framework capable of jointly performing multiple phone-related tasks.	Chin-Jou Li; Kalvin Chang; Shikhar Bharadwaj; Eunjung Yeo; Kwanghee Choi; Jian Zhu; David R. Mortensen; Shinji Watanabe;
272	When Background Matters: Breaking Medical Vision Language Models By Transferable Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing medical attacks focus on secondary objectives such as model stealing or adversarial fine-tuning, while transferable attacks from natural images introduce visible distortions that clinicians can easily detect. To address this, we propose MedFocusLeak, a highly transferable black-box multimodal attack that induces incorrect yet clinically plausible diagnoses while keeping perturbations imperceptible.	Akash Ghosh; Subhadip Baidya; Sriparna Saha; Xiuying Chen;
273	HERMES: KV Cache As Hierarchical Memory for Efficient Streaming Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, extending these capabilities to streaming video inputs remains challenging, as existing models struggle to simultaneously maintain stable understanding performance, real-time responses, and low GPU memory overhead. To address this challenge, we propose HERMES, a novel training-free architecture for real-time and accurate understanding of video streams.	Haowei Zhang; Shudong Yang; Jinlan Fu; See-Kiong Ng; Xipeng Qiu;
274	SimPBL: A Multi-Agent Framework for Project-Based Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SimPBL, a multi-agent framework with an orchestrator agent that provides adaptive scaffolding from interaction logs and collaborator agents that support project work through boundary-aware collaboration.	Daniel Zhang-Li; Joy Jia Yin Lim; Binglin Liu; Shangqing Tu; Zijun Yao; Hao Peng; Jifan Yu; Haoxuan Li; Zhanxin Hao; Ye He; Zekun Li; Jiangyi Wang; Lei Hou; Bin Xu; Xin Cong; Zhiyuan Liu; Huiqin Liu; Yu Zhang; Juanzi Li;
275	ROSE: An Intent-Centered Evaluation Metric for NL2SQL Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: It is sensitive to syntactic variation, ignores that questions may admit multiple interpretations, and is easily misled by erroneous ground-truth SQL. To address this, we introduce ROSE, an intent-centered metric that focuses on whether the predicted SQL answers the question, rather than consistency with the ground-truth SQL under the reference-dependent paradigm.	Wenqi Pei; Shizheng Hou; Boyan Li; Chen Han; Zhichao Shi; Yuyu Luo;
276	Reinforcement Learning for Self-Improving Agent with Skill Library Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we introduce Skill Augmented GRPO for self-Evolution (SAGE), a novel RL framework that systematically incorporates skills into learning.	Jiongxiao Wang; Qiaojing Yan; Yawei Wang; Yijun Tian; Soumya Smruti Mishra; Zhichao Xu; Megha Gandhi; Panpan Xu; Lin Lee Cheong;
277	Beyond Meta-Reasoning: Metacognitive Consolidation for Self-Improving LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Metacognitive Consolidation, a novel framework in which a model consolidates metacognitive experience from past reasoning episodes into reusable knowledge that improves future meta-reasoning.	Ziqing Zhuang; Linhai Zhang; Jiasheng Si; Deyu Zhou; Yulan He;
278	TeamFusion: Supporting Open-ended Teamwork with Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present TeamFusion, a multi-agent system designed to support teamwork in open-ended domains by: 1.	Jiale Liu; Victor Bursztyn; Lin Ai; Haoliang Wang; Sunav Choudhary; Saayan Mitra; Qingyun Wu;
279	SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose SAC, a neural speech codec with semantic-acoustic dual-stream quantization.	Wenxi Chen; Ruiqi Yan; Yushen Chen; Zhikang Niu; Ziyang Ma; Xiquan Li; Yuzhe Liang; Wenhanlin; Shunshun Yin; Ming Tao; Xinsheng Wang; Xie Chen;
280	Guided By Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces Guided by Gut (GG), an efficient self-guided TTS framework that enables LLMs to perform step-by-step reasoning at a low cost, without any reward models or verifiers.	Amirhosein Ghasemabadi; Keith G. Mills; Baochun Li; Di Niu;
281	Reinforcement Learning on Pre-Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The latter incentivizes reasoning capabilities with strong generalization, but is constrained by limited data availability due to its reliance on human annotation. To alleviate these issues, we propose Reinforcement Learning on Pre-Training data (RLPT), which combines the advantages of learning from general data and RL.	Siheng Li; Kejiao Li; Zenan Xu; Guanhua Huang; Kun Li; Haoyuan Wu; Wujiajia; Zihao Zheng; Chenchen Zhang; Kun Shi; Xue Gong; Qi Yi; Ruibin Xiong; Tingqiang Xu; Yuhao Jiang; Jianfeng Yan; Yuyuan Zeng; Guanghui Xu; Jinbao Xue; Zhijiang xu; Zheng Fang; Shuai LI; Qibin Liu; Xiaoxue Li; Zhuoyu Li; Yangyu Tao; Fei Gao; Cheng Jiang; Bochao Wang; Kai Liu; Jianchen Zhu; Wai Lam; Bo Zhou; Di Wang;
282	Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we observe that tokens evolve toward semantic fixing points, making further processing redundant.	Yujie Chen; Tailai Chen; Yifeng Gao; Zoe Wanying He; Yijue Xu; Shaobo Wang; Linfeng Zhang;
283	MTAVG-Bench: A Diagnostic Benchmark for Multi-Talker Dialogue-Centric Audio-Video Generation Highlight: As a result, structural failures in generated multi-talker dialogue videos, such as identity drift, unnatural turn transitions, and audio-visual misalignment, cannot be effectively diagnosed. To address this issue, we introduce MTAVG-Bench, a failure-driven diagnostic benchmark for multi-talker dialogue-centric audio-video generation.	Yanghao Zhou; Haitian Li; Rexar Lin; Heyan Huang; Jinxing Zhou; Changsen Yuan; Tian Lan; Ziqin Zhou; Yudong Li; Jiajun Xu; Jingyun Liao; YiMing Cheng; Xuefeng Chen; Xian-Ling Mao; Yousheng Feng;
284	CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks Highlight: We introduce CodeJudgeBench, a benchmark explicitly designed to evaluate LLM-as-a-Judge models across three critical coding tasks: code generation, code repair, and unit test generation.	Hongchao Jiang; Yiming Chen; Yushi Cao; Hung-yi Lee; Robby T. Tan;
285	MAGMA: A Multi-Graph Based Agentic Memory Architecture for AI Agents Highlight: In this paper, we propose MAGMA, a multi-graph agentic memory architecture that represents each memory item across orthogonal semantic, temporal, causal, and entity graphs.	Dongming Jiang; Yi Li; Guanpeng Li; Bingzhe Li;
286	Parallel Test-Time Scaling for Latent Reasoning Models Highlight: For sampling, we introduce two uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive Gaussian Noise.	Runyang You; Yongqi Li; Meng Liu; Wenjie Wang; Liqiang Nie; Wenjie Li;
287	LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance Highlight: Existing methods struggle with a fundamental trade-off: prioritizing input-language consistency severely hampers reasoning quality, while prioritizing reasoning often leads to unintended language drift toward English. We address this challenge with LANG, a novel framework that leverages language-conditioned hints to guide exploration in non-English reasoning tasks.	Yuchun Fan; Bei Li; Peiguang Li; Yilin Wang; Yongyu Mu; Jian Yang; Xin Chen; Rongxiang Weng; Jingang Wang; Xunliang Cai; JingBo Zhu; Tong Xiao;
288	ContrastKV: Robust KV Cache Eviction Via Contrastive Signal Fusion for Multi-Query Generalization Highlight: We propose ContrastKV, a robust query-agnostic KV cache eviction algorithm for multi-query generalization.	Xingchi Chen; Peiyuan Zong; Ziqiang Gao; Qing Li; Yong Jiang; Fa Zhu; Hui Li;
289	BrowseComp-Plus: A Fair and Disentangled Evaluation Benchmark for Deep Search Agents Highlight: We introduce BrowseComp-Plus, a benchmark derived from BrowseComp that employs a fixed, human-verified corpus, enabling controlled retrieval for deep search agents.	Zijian Chen; Xueguang Ma; Shengyao Zhuang; Ping Nie; Kai Zou; Sahel Sharifymoghaddam; Andrew Liu; Joshua Green; Kshama Patel; Ruoxi Meng; Mingyi Su; Yanxi Li; Haoran Hong; Xinyu Shi; Xuye Liu; Hosna Oyarhoseini; Nandan Thakur; Crystina Zhang; Luyu Gao; Wenhu Chen; Jimmy Lin;
290	Resonating with RoPE: Spectral Quantization for High-Fidelity Key Cache Compression Highlight: The linear growth of KV cache bottlenecks long-context LLMs, yet RoPE-induced oscillations complicate Key cache quantization. To address this issue, we propose SpectrumQuant, a frequency-domain framework that utilizes the Discrete Cosine Transform (DCT) to convert these oscillations into sparse spectral representations.	Xuefei Wang; Haoyu Tang; Tianyuan Liang; Zhibin Wang; Yupeng Hu; Weili Guan;
291	Long-Chain Reasoning Distillation Via Adaptive Prefix Alignment Highlight: This mismatch leads to a gap between the provided supervision signal and the learning capacity of the student model. To address this challenge, we propose Prefix-ALIGNment distillation (P-ALIGN), a framework that fully exploits teacher CoTs for distillation through adaptive prefix alignment.	Zhenghao Liu; Zhuoyang Wu; Xinze Li; Yukun Yan; Shuo Wang; Zulong Chen; Yu Gu; Ge Yu; Maosong Sun;
292	LAFaCT: Attribution-based Localization and Focused Sequential Analysis of Fact-Critical Tokens for Hallucination Detection Highlight: While white-box hallucination detection methods that leverage hidden states prevail, they fail to identify and focus on fact-critical information when analyzing token sequences. To address this, we propose LAFaCT, a Localize-then-Analyze detection framework.	Xin Wang; Jiahao Li; Licheng Zhang; Zhendong Mao;
293	Language Models Don’t Know What You Want: Evaluating Personalization in Deep Research Needs Real Users Highlight: We reveal nine nuanced errors of personalized DR undetectable by our LLM judges, and we study qualitative feedback to form lessons for future DR design.	Nishant Balepur; Malachi Hamada; Varsha Kishore; Sergey Feldman; Amanpreet Singh; Pao Siangliulue; Joseph Chee Chang; Eunsol Choi; Jordan Lee Boyd-Graber; Aakanksha Naik;
294	Explicit Trait Inference for Multi-Agent Coordination Highlight: We propose Explicit Trait Inference (ETI), a psychologically grounded method for improving coordination.	Suhaib Abdurahman; Etsuko Ishii; Katerina Margatina; Divya Bhargavi; Monica Sunkara; Yi Zhang;
295	Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction Highlight: Existing benchmarks primarily evaluate these systems on synthetic speech and single-turn tasks, leaving multi-turn conversational ability underexplored. We introduce Audio MultiChallenge, an open-source benchmark to evaluate these systems under natural multi-turn interaction patterns.	Advait Gosai; Tyler Vuong; Utkarsh Tyagi; Steven Li; Wenjia You; Miheer Bavare; Arda Uçar; Zhongwang Fang; Brian Jang; Bing Liu; Yunzhong He;
296	Empirical Analysis of Decoding Biases in Masked Diffusion Models Highlight: In this work, we reveal that prevalent uncertainty-based decoding strategies induce two decoding biases in MDMs: rigid boundary bias and trivial token bias.	Pengcheng Huang; Tianming Liu; Zhenghao Liu; Yukun Yan; Shuo Wang; Tong Xiao; Zulong Chen; Maosong Sun;
297	GIFT: Guided Fine-Tuning and Transfer for Enhancing Instruction-Tuned Language Models Highlight: We propose GIFT (Guided Fine-Tuning and Transfer), a simple and efficient framework that incorporates instruction-level guidance into task adaptation.	Zhiwen Ruan; Yichao Du; Jianjie Zheng; Longyue Wang; Yun Chen; Peng Li; Jinsong Su; Yang Liu; Guanhua Chen;
298	Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models Highlight: Inspired by this gap, we demonstrate that steering vectors derived solely from text-only LLM backbones can effectively guide and enhance their multimodal counterparts, revealing a novel cross-modal transfer that enables reuse of existing interpretability tools. Using community-standard methods—Sparse Autoencoders (SAE), Mean Shift, and Linear Probing—we validate this transfer effect across diverse MLLM architectures and visual reasoning tasks.	Woody Haosheng Gan; Deqing Fu; Julian Asilis; Ollie Liu; Vatsal Sharan; Robin Jia; Willie Neiswanger;
299	IntrAgent: An LLM Agent for Content-Grounded Information Retrieval Through Literature Review Highlight: Scientific research relies on accurate information retrieval from literature to support analytical decisions. In this work, we introduce a new task, INformation reTRieval through literAture reVIEW (IntraView), which aims to automate fine-grained information retrieval faithfully grounded in the provided content in response to research-driven queries, and propose IntrAgent, an LLM-based agent that addresses this challenging task.	Fengbo Ma; Zixin Rao; Xiaoting Li; Zhetao Chen; Hongyue Sun; Yiping Zhao; Xianyan Chen; Zhen Xiang;
300	Beyond Examples: Towards Automated Thought-level In-Context Reasoning for Large Language Models Highlight: However, traditional ICL struggles with complex reasoning mainly due to superficial, example-level implicit imitation. To address these limitations, we introduce ThoughtICR, an automated Thought-level In-Context Reasoning paradigm that shifts from surface-level examples to more guidance-oriented thought patterns.	Jinyang Wu; Mingkuan Feng; Shuai Zhang; Feihu Che; Zhengqi Wen; Chonghua Liao; Ling Yang; Haoran Luo; Zheng Lian; Jianhua Tao;
301	SPARK: Strategic Policy-Aware Exploration Via Dynamic Branching for Long-Horizon Agentic Learning Highlight: Such attempts inherently waste substantial computation budget on trivial steps while failing to guarantee sample quality. To address this, we propose SPARK (Strategic Policy-Aware exploRation via Key-state dynamic branching), a novel framework that selectively branches at critical decision states for resource-efficient exploration.	Jinyang Wu; Shuo Yang; Yuhao Shen; Shuai Zhang; Zhengqi Wen; Jianhua Tao;
302	PDTrim: Targeted Pruning for Prefill-Decode Disaggregation in Inference Highlight: In this paper, we propose a pruning method that is highly integrated with PD disaggregation, enabling more precise pruning of blocks.	Hao Zhang; Lyu Mengsi; Zhuo Chen; Yulong Ao; Yonghua Lin;
303	TrendFact: A Benchmark Towards Hotspot Perception in Automatic Fact-Checking Highlight: While Hotspot Perception Ability (HPA), the capacity to dynamically allocate reasoning resources based on social impact, is essential to mitigate this risk, existing benchmarks lack the social metadata and evaluation framework to meet this urgent evaluation need, thereby hindering the advancement of these AFC systems. To bridge this gap, we introduce TrendFact, the first benchmark capable of evaluating HPA and three fact-checking tasks.	Xiaocheng Zhang; Xi Wang; Yifei Lu; Jianing Wang; Zhuangzhuang Ye; Mengjiao Bao; Peng Yan; Xiaohong Su;
304	Efficient Self-Evaluation for Diffusion Language Models Via Sequence Regeneration Highlight: In this work, we propose DiSE, a simple yet effective self-evaluation confidence quantification method for dLLMs.	Linhao Zhong; Linyu Wu; Wen Wang; Yuling Xi; Chenchen Jing; Jiaheng Zhang; Hao Chen; Chunhua Shen;
305	Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models Highlight: In this paper, we propose EvoToken-DLM, a novel diffusion-based language modeling approach that replaces hard binary masks with evolving soft token distributions.	Linhao Zhong; Linyu Wu; Bozhen Fang; Tianjian Feng; Chenchen Jing; Wen Wang; Jiaheng Zhang; Hao Chen; Chunhua Shen;
306	ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios Highlight: We introduce ViDoRe V3, a comprehensive multimodal RAG benchmark featuring multi-type queries over visually rich document corpora.	António Loison; Quentin Macé; Antoine Edy; Victor Xing; Tom Balough; Gabriel de Souza P. Moreira; Bo Liu; Manuel Faysse; Celine Hudelot; Gautier Viaud;
307	Automated Creativity Evaluation of Language Models Across Open-Ended Tasks Highlight: However, most existing creativity metrics are tightly coupled to specific tasks, embedding domain assumptions into the evaluation process, and limiting scalability and generality. To address this gap, we introduce an automated, domain-agnostic framework for quantifying LLM creativity across open-ended tasks.	Tan Min Sen; Zachary Choy Kit Chun; Syed Ali Redha Alsagoff; Nadya Yuki Wangsajaya; Banerjee Mohor; Swaagat Bikash Saikia; Alvin Chan;
308	GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities Highlight: While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0.	Diganta Misra; Nizar Islah; Victor May; Brice Rauby; Zihan Wang; Justine Gehring; Antonio Orvieto; Muawiz Sajjad Chaudhary; Eilif B. Muller; Irina Rish; Samira Ebrahimi Kahou; Massimo Caccia;
309	Stop When Enough: Adaptive Early-Stopping for Chain-of-Thought Reasoning Highlight: In this paper, we present REFRAIN (REFlective-Redundancy for Adaptive INference), a training-free framework that adaptively determines when to stop reasoning to mitigate overthinking.	Renliang Sun; Wei Cheng; Dawei Li; Haifeng Chen; Wei Wang;
310	Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models Highlight: In this work, we present an empirical analysis of attention in dLLMs and show that tokens attending more strongly to revealed context exhibit greater generation stability and play a critical role in reasoning.	Jia Deng; Junyi Li; Xin Zhao; Jinpeng Wang; Hongyu Lu; Ji-Rong Wen;
311	The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning Highlight: As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations.	Renmiao Chen; Yida Lu; Shiyao Cui; Xuan Ouyang; Victor Shea-Jay Huang; Shumin Zhang; Chengwei Pan; Han Qiu; Minlie Huang;
312	R1-RE: Cross-Domain Relation Extraction with RLVR Highlight: Inspired by the workflow of human annotators, we reframe RE as a reasoning task guided by annotation guidelines and introduce R1-RE, the first reinforcement learning with verifiable reward (RLVR) framework for RE tasks.	Runpeng Dai; Tong Zheng; Run Yang; Kaixian Yu; Hongtu Zhu;
313	Powering Verifiable Learning Via Automated Evolutionary Data Synthesis Highlight: In this work, we introduce an evolutionary, task-agnostic, strategy-guided, executably-checkable data synthesis framework that, from minimal seed supervision, jointly synthesizes problems, diverse candidate solutions, and verification artifacts, and iteratively discovers strategies via a consistency-based evaluator that enforces agreement between human-annotated and strategy-induced checks.	He Du; Bowen Li; Aijun Yang; Siyang He; Qipeng Guo; Kai Chen; Dacheng Tao;
314	Standard-to-Dialect Transfer Trends Differ Across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects Highlight: We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems where speech first gets automatically transcribed and then further processed by a text model.	Verena Blaschke; Miriam Winkler; Barbara Plank;
315	RouteMoA: Dynamic Routing Without Pre-Inference Boosts Efficient Mixture-of-Agents Highlight: They also lack model selection criteria and struggle with large model pools, where full inference is costly and can exceed context limits. To address this, we propose RouteMoA, an efficient mixture-of-agents framework with dynamic routing.	Jize Wang; Han Wu; Zhiyuan You; Yiming Song; Yijun Wang; Zifei Shan; Yining Li; Songyang Zhang; Xinyi Le; Cailian Chen; Xinping Guan; Dacheng Tao;
316	CUB: Benchmarking Context Utilisation Techniques for Language Models Highlight: In this paper, we develop CUB (Context Utilisation Benchmark) – the first comprehensive benchmark designed to help diagnose CMTs under diverse noisy context conditions within retrieval-augmented generation (RAG).	Lovisa Hagström; Youna Kim; Haeun Yu; Sang-goo Lee; Richard Johansson; Hyunsoo Cho; Isabelle Augenstein;
317	Glyph: Scaling Context Windows Via Visual-Text Compression Highlight: To this end, we introduce Glyph, a framework that renders long texts into compact visual pages and processes them with a vision-language model (VLM), allowing a fixed context window to cover substantially more text.	Jiale Cheng; Yusen Liu; Xinyu Zhang; Yulin Fei; Wenyi Hong; Ruiliang Lyu; Weihan Wang; Zhe Su; Xiaotao Gu; Xiao Liu; Yushi Bai; Jie Tang; Hongning Wang; Minlie Huang;
318	An Exploration of Mamba for Speech Self-Supervised Models Highlight: While Mamba has demonstrated strong performance in language modeling, its potential as a speech self-supervised learning (SSL) model remains underexplored, with prior studies limited to isolated tasks. To address this, we explore Mamba-based HuBERT models as alternatives to Transformer-based SSL architectures.	Tzu-Quan Lin; Heng-Cheng Kuo; Tzu-Chieh Wei; Hsi-Chun Cheng; Chun Wei Chen; Hsien-Fu Hsiao; Yu Tsao; Hung-yi Lee;
319	CT-FineBench: A Diagnostic Fidelity Benchmark for Fine-Grained Evaluation of CT Report Generation Highlight: Conventional evaluation metrics offer only coarse measures of lexical overlap or entity matching and fail to reflect the granular diagnostic accuracy required for clinical use. To address this gap, we propose CT-FineBench, a benchmark built from CT-RATE and Merlin to evaluate the fine-grained factual consistency of CT reports.	Ruifeng Yuan; Wanxing Chang; Weiwei Cao; Bowen Shi; Zhongyu Wei; Ling Zhang; Jianpeng Zhang;
320	LoVeC: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation Highlight: In this paper, we propose LoVeC (Long-form Verbalized Confidence), a novel reinforcement learning (RL)–based method that trains LLMs to append an on-the-fly numerical confidence score to each generated statement during long-form generation.	Caiqi Zhang; Xiaochen Zhu; Chengzu Li; Nigel Collier; Andreas Vlachos;
321	What Deserves Memory: Adaptive Memory Distillation for LLM Agents Highlight: Inspired by cognitive ideas, we propose Nemori, an adaptive memory distillation framework that casts the assessment of the experience’s future utility as a matter of predictability.	Wenquan Ma; Jiayan Nan; WenLong Wu;
322	CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning Highlight: However, we highlight a critical flaw: the inherent general adaptability of LLMs allows them to easily bypass selective unlearning by rapidly relearning or repurposing their general capabilities for harmful tasks. To address this fundamental limitation, we propose a paradigm shift: instead of selective removal, we advocate for inducing model collapse, effectively forcing the model to “unlearn everything”, specifically in response to updates characteristic of malicious adaptation.	Biao Yi; Tiansheng Huang; Baolei Zhang; Tong Li; Lihai Nie; Zheli Liu; Li Shen;
323	NeuReasoner: Towards Explainable, Controllable, and Unified Reasoning Via Mixture-of-Neurons Highlight: To bridge these gaps, we conduct an in-depth white-box analysis, identifying key neurons (Mixture of Neurons, MoN) and their fluctuation patterns associated with distinct failures. Building upon these insights, we propose NeuReasoner, an explainable, controllable, and unified reasoning framework driven by MoN.	Haonan Dong; Kehan Jiang; Haoran Ye; Wenhao Zhu; Zhaolu Kang; Guojie Song;
324	Musical Score Understanding Benchmark: Evaluating Large Language Models’ Comprehension of Complete Musical Scores Highlight: We introduce Musical Score Understanding Benchmark (MSU-Bench), a human-curated benchmark for score-level musical understanding across textual (ABC notation) and visual (PDF) modalities.	Congren Dai; Yue Yang; Krinos Li; Huichi Zhou; Shijie Liang; Zhang Bo; Enyang Liu; Ge Jin; Hongran An; Haosen Zhang; Peiyuan Jing; KinHei Lee; Zhenxuan Zhang; Xiaobing Li; Maosong Sun;
325	Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation Highlight: However, robust concept representation learning is hindered by two key challenges: (i) clinically important cross-type dependencies (e.g., diagnosis-medication and medication-procedure relations) are often missing or incomplete in existing ontology resources, limiting the ability to model complex EHR patterns; and (ii) rich clinical semantics are often missing from structured resources, and even when available as text, are difficult to integrate with KG structure for representation learning. To address these challenges, we present MedCo, an LLM-empowered graph learning framework for medical concept representation.	Mohsen Nayebi Kerdabadi; Arya Hadizadeh Moghaddam; Chen Chen; Dongjie Wang; Zijun Yao;
326	Focusing Condition: Inference-Time Self-Contrastive Steering Elicits Better Conditional Text Embeddings in LLMs Highlight: To this end, we propose an inference-time, plug-and-play Self-Contrastive Steering (SCS) method that constructs unconditional general text embeddings and uses them to refine conditional text embeddings, making them more focused on the target condition.	Zifeng Cheng; Lingyun Qian; Zhiwei Jiang; Cong Wang; Yafeng Yin; Fei Shen; Ao Zhou; Qing Gu;
327	Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training Highlight: Building on FCaps, we propose CLSP, a contrastive language-speech pre-trained model that integrates global and fine-grained supervision, enabling unified representations across multiple granularities.	Yifan Yang; Bing Han; Hui Wang; Wei Wang; Ziyang Ma; Long Zhou; Zengrui Jin; Guanrou Yang; Tianrui Wang; Xu Tan; Xie Chen;
328	LLM As A Risk Manager: LLM Semantic Filtering for Lead–Lag Trading in Prediction Markets Highlight: Prediction markets provide a unique setting where event-level time series are directly tied to natural-language descriptions, yet discovering robust lead–lag relationships remains challenging due to spurious statistical correlations. We propose a hybrid two-stage causal screener to address this challenge: (i) a statistical stage that uses Granger causality to identify candidate leader–follower pairs from market-implied probability time series, and (ii) an LLM-based semantic stage that re-ranks these candidates by assessing whether the proposed direction admits a plausible economic transmission mechanism based on event descriptions.	Sumin Kim; Minjae Kim; Jihoon Kwon; Yoon Kim; Oscar Levy; Alejandro Lopez-Lira; Yongjae Lee; Chanyeol Choi;
329	ImCoref-CeS: An Improved Lightweight Pipeline for Coreference Resolution with LLM-based Checker-Splitter Refinement Highlight: To this end, we propose ImCoref-CeS, a novel framework that integrates an enhanced supervised model with LLM-based reasoning.	Kangyang Luo; Yuzhuo Bai; Shuzheng Si; Cheng Gao; Zhitong Wang; Yingli Shen; Wenhao Li; Zhu Liu; Yufeng Han; Jiayi Wu; Cunliang Kong; Maosong Sun;
330	Controllable Contamination Detection for Reliable LLM Evaluation with Statistical Guarantees Highlight: As a result, contaminated data may be mistakenly retained, leading to unreliable evaluation. To address this challenge, we propose FTD (FDR-controlled Training Data detection), a principled framework that detects and filters contaminated evaluation data while providing a statistical guarantee: the proportion of contaminated samples mistakenly retained as clean, the false discovery rate (FDR), is provably controlled below a user-specified threshold.	Zheng Zhang; Qi Liu; Siyuan Liang; Ning Li; Zirui Hu; Weibo Gao; Rui Li; Zhenya Huang; Leszek Rutkowski; Baosheng Yu; Dacheng Tao;
331	MemBuilder: Reinforcing LLMs for Long-Term Memory Construction Via Attributed Dense Rewards Highlight: We introduce MemBuilder, a reinforcement learning framework that trains models to orchestrate multi-dimensional memory construction with attributed dense rewards.	Zhiyu Shen; Ziming Wu; Fuming Lai; Shaobing Lian; Yanghui Rao;
332	UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents Highlight: UNIKIE-BENCH consists of two complementary tracks: a constrained-category KIE track with scenario-predefined schemas that reflect practical application needs, and an open-category KIE track that extracts any key information that is explicitly present in the document.	Yifan Ji; Zhipeng Xu; Zhenghao Liu; Zulong Chen; Qian Zhang; Zhibo Yang; Junyang Lin; Yu Gu; Ge Yu; Maosong Sun;
333	IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation Highlight: To this end, we propose IF-RewardBench, a comprehensive meta-evaluation benchmark for instruction-following that covers diverse instruction and constraint types.	Bosi Wen; Yilin Niu; Cunxiang Wang; Xiaoying Ling; Ying Zhang; Pei Ke; Hongning Wang; Minlie Huang;
334	IF-CRITIC: Towards A Fine-Grained LLM Critic for Instruction-Following Evaluation Highlight: To this end, we propose IF-CRITIC, an LLM critic for fine-grained, efficient, and reliable instruction-following evaluation.	Bosi Wen; Yilin Niu; Cunxiang Wang; Pei Ke; Xiaoying Ling; Ying Zhang; Aohan Zeng; Hongning Wang; Minlie Huang;
335	Value of Information: A Framework for Human–Agent Communication Highlight: Existing approaches either rely on brittle confidence thresholds that require task-specific tuning, or fail to account for the varying stakes of different decisions. We introduce a decision-theoretic framework that resolves this trade-off through the Value of Information (VoI), enabling agents to dynamically weigh the expected utility gain from asking questions against the cognitive cost imposed on users.	Yijiang River Dong; Tiancheng Hu; Zheng Hui; Caiqi Zhang; Ivan Vulić; Andreea Bobu; Nigel Collier;
336	Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning Highlight: This paper investigates the scaling behavior of Large Language Model (LLM) reinforcement learning post-training, focusing on mathematical reasoning.	Zelin Tan; Hejia Geng; Xiaohang Yu; Mulei Zhang; Guancheng Wan; Yifan Zhou; Qiang He; Xiangyuan Xue; Heng Zhou; Yutao Fan; Zhong-Zhi Li; Zaibin Zhang; Guibin Zhang; Chen Zhang; Zhenfei Yin; Philip Torr; Lei Bai;
337	Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation Highlight: A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs, but leveraging it is challenging due to its ambiguity, noise and massive volume, which hinders efficient preference alignment. To overcome these hurdles, we introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data.	Tianjun Wei; Huizhong Guo; Yingpeng Du; Zhu Sun; Huang Chen; Dongxia Wang; Jie Zhang;
338	CloneMem: Benchmarking Long-Term Memory for AI Clones Highlight: We introduce CloneMem, a benchmark for evaluating long-term memory in AI Clone scenarios grounded in non-conversational digital traces, including diaries, social media posts, and emails, spanning one to three years.	Sen Hu; Zhiyu Zhang; Yuxiang Wei; Xueran Han; Zhenheng Tang; Ronghao Chen; Huacan Wang;
339	NL ⇒ Schedule: Evaluate Multitask Scheduling Capability of Large Language Models Highlight: Our evaluation of nine state-of-the-art LLMs reveals the limitations of different LLMs in procedure grounding and the strengths of advanced LLMs in global planning via local analysis. To address these shortcomings, we propose Mans, a novel multi-agent framework.	Wenrui Liao; Weihong Du; Yi Li; Hongru Liang; Wenqiang Lei;
340	Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations Highlight: Existing resources often fail to provide extensive reasoning problems with coherent CoT processes distilled from multiple teacher models, and do not account for multifaceted properties describing the internal characteristics of CoTs. To address these challenges, we introduce OmniThought, a large-scale dataset featuring 2 million CoT processes generated and validated by multiple powerful LRMs.	Wenrui Cai; Chengyu Wang; Junbing Yan; Jun Huang; Xiangzhong Fang;
341	GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: If a model can think like a human, can we influence its cognitive-stage decisions so that it proactively completes a jailbreak? To validate this idea, we propose GAMBIT (Gamified Adversarial Multimodal Breakout via Instructional Traps), a novel multimodal jailbreak framework that decomposes and reassembles harmful visual semantics, then constructs a gamified scene that drives the model to explore, reconstruct intent, and answer as part of winning the game.	Xiangdong Hu; Yangyang Jiang; Qin Hu; Xiaojun Jia;
342	Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: On the other hand, human evaluations, despite their richness, remain costly, inconsistent, and difficult to scale. We tackle this critical barrier by proposing a Dual-Axis Generative Reward Model, which is trained to understand complex interaction dynamics using a detailed taxonomy and an annotated dataset, produces a single score and, crucially, provides separate evaluations for semantic quality and interaction timing.	Yifu Chen; Shengpeng Ji; Zhengqing Liu; Qian Chen; Wen Wang; Ziqing Wang; Yangzhuo Li; Tianle Liang; Zhou Zhao;
343	Newspaper Eat Means Not Tasty: A Taxonomy and Benchmark for Coded Language in Real-World Chinese Online Reviews Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces CodedLang, a dataset of 7,744 Chinese Google Maps reviews, including 900 reviews with span-level annotations of coded language.	Ruyuan Wan; Changye Li; Ting-Hao Kenneth Huang;
344	Can AI Be A Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and The Future Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this survey, we synthesize techniques for (i) peer review generation, including fine-tuning strategies, agent-based systems, RL-based methods, and emerging paradigms to enhance generation; (ii) after-review tasks including rebuttals, meta-review and revision aligned to reviews; and (iii) evaluation methods spanning human-centered, reference-based, LLM-based and aspect-oriented approaches.	Sihong Wu; Owen Jiang; Yilun Zhao; Tiansheng Hu; Yiling Ma; Kaiyan Zhang; Manasi Patwardhan; Arman Cohan;
345	Improving Long-Context Translation Via Self-Supervised Dual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose LongDu, a self-supervised post-training framework that improves long-document translation reliability via round-trip consistency.	Shanbo Cheng; Shuaijie She; Yu Bao; Jianbing Zhang; Jiajun Chen; Shujian Huang;
346	Chunks As Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the effectiveness of such approaches is often limited by the low diversity and factual inconsistencies in the generated data. To address these challenges, we propose LongMab, a novel framework that leverages a Multi-Armed Bandit (MAB) rollout strategy to identify the most informative chunks from the given long context for sampling high-quality and diverse responses and constructing preference data pairs for Direct Preference Optimization (DPO) training.	Shaohua Duan; Pengcheng Huang; Xinze Li; Zhenghao Liu; Xiaoyuan Yi; Yukun Yan; Shuo Wang; Yu Gu; Ge Yu; Maosong Sun;
347	SciMDR: Advancing Scientific Multimodal Document Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Constructing scientific multimodal document reasoning datasets for foundation model training involves an inherent trade-off among scale, faithfulness, and realism. To address this challenge, we introduce the synthesize-and-reground framework, a two-stage pipeline comprising: (1) Claim-Centric QA Synthesis, which generates faithful, isolated QA pairs and reasoning on focused segments, and (2) Document-Scale Regrounding, which programmatically re-embeds these pairs into full-document tasks to ensure realistic complexity.	Ziyu Chen; Yilun Zhao; Chengye Wang; Rilyn R. Han; Manasi Patwardhan; Arman Cohan;
348	BracketRank: Large Language Model Document Ranking Via Reasoning-based Competitive Elimination Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Third, ranking results depend heavily on initial document order, leading to inconsistent performance. We introduce BracketRank, a reasoning-driven competitive elimination framework that addresses these challenges through systematic group competition.	Abdelrahman Abdallah; Mohammed Ali; Bhawna Piryani; Adam Jatowt;
349	SDiaReward: Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current methods struggle with two critical gaps: the modality gap, involving prosody and emotion, and the colloquialness gap, distinguishing written scripts from natural speech. To address these challenges, we introduce SDiaReward, an end-to-end multi-turn reward model trained on SDiaReward-Dataset, a novel collection of episode-level preference pairs explicitly targeting these gaps.	Jingyu Lu; Yuhan Wang; Fan Zhuo; Xize Cheng; Changhao Pan; Xueyi Pu; Yifu Chen; Chenyuhao Wen; Tianle Liang; Zhou Zhao;
350	Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we investigate knowledge forgetting in large language models with a focus on its generalisation—ensuring that models forget not only specific training samples but also related implicit knowledge.	Huazheng Wang; Yongcheng Jing; Haifeng Sun; Yingjie Wang; Jingyu Wang; Jianxin Liao; Dacheng Tao;
351	LinkQA: Synthesizing Diverse QA from Multiple Seeds Strongly Linked By Knowledge Points Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The advancement of large language models (LLMs) struggles with the scarcity of high-quality, diverse training data. To address this limitation, we propose LinkSyn, a KP-graph-based synthesis framework that for the first time enables flexible control over discipline and difficulty distributions while balancing KP coverage and popularity.	Xuemiao Zhang; Can Ren; Chengying Tu; Rongxiang Weng; Hongfei Yan; Jingang Wang; Xunliang Cai;
352	LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The rapid advancement of large language models (LLMs) has not been matched by their evaluation in low-resource languages, especially Southeast Asian languages like Lao. To fill this gap, we introduce LaoBench, the first large-scale, high-quality, and multidimensional benchmark for assessing LLM language understanding and reasoning in Lao.	Jian Gao; Richeng Xuan; Zhaolu Kang; Dingshi Liao; Wenxin Huang; Zongmou Huang; Yangdi Xu; Bowen Qin; Zheqi He; Xi Yang; Changjinli; Yonghua Lin;
353	AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, manually designing optimal communication topologies is labor-intensive, while automated expansion methods often result in bloated structures with redundant agents, leading to excessive token consumption. To address this problem, we introduce AgentSlimming, a plug-and-play compression framework for graph-structured multi-agent workflows.	Yulang Chen; Haoxuan Peng; Jinyan Liu; Zichen Wen; Dongrui Liu; Linfeng Zhang;
354	LLM-ForcedAligner: A Non-Autoregressive and Accurate LLM-Based Forced Aligner for Multilingual and Long-Form Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To bridge the gap, we propose LLM-ForcedAligner, reformulating FA as a slot-filling paradigm: timestamps are treated as discrete indices, and special timestamp tokens are inserted as slots into the transcript.	Bingshen Mu; Xian Shi; Xiong Wang; Hexin Liu; Jin Xu; Lei Xie;
355	Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we introduce MathIF, a dedicated benchmark for evaluating instruction-following in mathematical reasoning tasks.	Tingchen Fu; Yafu Li; Jiawei Gu; Xiaoye Qu; Yu Cheng;
356	GAM: Hierarchical Graph-based Agentic Memory for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to fluid narrative evolution. To address this, we propose GAM, a hierarchical Graph-based Agentic Memory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention.	Zhaofen Wu; Hanrong Zhang; Fulin Lin; Wujiang Xu; Xinran Xu; Yankai Chen; Henry Peng Zou; Shaowen Chen; Weizhi Zhang; Xue Liu; Philip S. Yu; Hongwei Wang;
357	Bias Fitting to Mitigate Length Bias of Reward Model in RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To accurately model the intricate nature of length bias and facilitate more effective bias mitigation, we propose FiMi-RM (Bias Fitting to Mitigate Length Bias of Reward Model), a framework that autonomously learns and corrects underlying bias patterns.	Kangwen Zhao; Jianfeng Cai; Jinhua Zhu; Ruopei Sun; Dongyun Xue; Wengang Zhou; Li Li; Houqiang Li;
358	Success and Cost Elicit Convention Formation for Efficient Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a method to train large multimodal models to form conventions, enabling efficient communication.	Saujas Vaduguru; Yilun Hua; Yoav Artzi; Daniel Fried;
359	MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing MCP research covers few servers, depends on costly manual curation, and lacks training support, hindering progress toward real-world deployment. To overcome these limitations, we introduce MCP-Flow, an automated web-agent-driven pipeline for large-scale server discovery, data synthesis, and model training.	WenHao Wang; Peizhi Niu; Zhao Xu; Zhaoyu Chen; Jian Du; Yaxin Du; Xianghe Pang; Keduan Huang; Yanfeng Wang; Qiang Yan; Siheng Chen;
360	Read As Human: Compressing Context Via Parallelizable Close Reading and Skimming Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their deployment in long-context scenarios is hindered by two challenges: computational inefficiency and redundant information. We propose RAM (Read As HuMan), a context compression framework that adopts an adaptive hybrid reading strategy, to address these challenges.	Jiwei Tang; Shilei Liu; Zhicheng Zhang; Qingsong Lv; Runsong Zhao; Tingwei Lu; Langming Liu; Haibin Chen; Yujin Yuan; Hai-Tao Zheng; Wenbo Su; Bo Zheng;
361	When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate the mechanisms that VLMs use to resolve cross-modal conflicts by introducing WHOOPS-AHA!	Francesco Ortu; Zhijing Jin; Diego Doimo; Alberto Cazzaniga;
362	SchemaRAG: Dynamic Large Schema Reduction for LLM-driven Structured Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SchemaRAG, a retrieval-augmented generation (RAG) framework that dynamically prunes the output schema space for schema-conditioned information extraction tasks by leveraging schema metadata and few-shot examples (when available).	Sin Yu Bonnie Ho; Arlie Coles; Erik Larsson; Eric Marshall; Nathan Bodenstab; Paul Vozila;
363	D2Plan: Dual-Agent Dynamic Global Planning for Complex Retrieval-Augmented Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, they face two critical failure modes as the accumulating context becomes flooded with both crucial evidence and irrelevant information: (1) ineffective search chain construction that produces incorrect queries or omits retrieval of critical information, and (2) reasoning hijacking by peripheral evidence that causes models to misidentify distractors as valid evidence. To address these challenges, we propose D²Plan, a Dual-agent Dynamic global Planning paradigm for complex retrieval-augmented reasoning.	Kangcheng Luo; Tinglang Wu; Yansong Feng;
364	AgentGL: Towards Agentic Graph Learning with LLMs Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet existing agentic frameworks treat external information as unstructured text and fail to leverage the topological dependencies inherent in real-world data. To bridge this gap, we introduce Agentic Graph Learning (AGL), a paradigm that reframes graph learning as an interleaved process of topology-aware navigation and LLM-based inference.	Yuanfu Sun; Kang Li; Dongzhe Fan; Jiajin Liu; Qiaoyu Tan;
365	Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs Via Interpretable Bi-Causal Steering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Vision-Language Introspection (VLI), a training-free inference framework that simulates a metacognitive self-correction process.	Shuliang Liu; Songbo Yang; Dong Fang; Sihang Jia; Yuqi Tang; Lingfeng Su; Ruoshui Peng; Yibo Yan; Xin Zou; Xuming Hu;
366	The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present a comprehensive evaluation of dLLMs (e.g., LLaDA, Dream) across two distinct agentic paradigms: Embodied Agents (requiring long-horizon planning) and Tool-Calling Agents (requiring precise formatting).	Qingyu Lu; Liang Ding; Kanjian Zhang; Jinxia Zhang; Dacheng Tao;
367	Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose SkillNav, a modular framework that introduces structured, skill-based reasoning into Transformer-based VLN agents.	Tianyi Ma; Yue Zhang; Zehao Wang; Parisa Kordjamshidi;
368	The Imperfective Paradox in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We investigate the Imperfective Paradox, a logical phenomenon where the past progressive aspect entails event realization for activities (e.g., running → ran) but not for accomplishments (e.g., building ↛ built). We introduce ImperfectiveNLI, a diagnostic dataset designed to probe this distinction across diverse semantic classes.	Bolei Ma; Yusuke Miyao;
369	Black-Box Membership Inference Attacks for Video Training Data in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Besides, while some methods mitigate this limitation by capturing relationships between frames, they require a model logit-accessible setting and are impractical in realistic black-box scenarios. To address these challenges, we propose a black-box MIA framework, named VideoMIA, that can provide reliable evidence of specific video data usage for training MLLMs.	Jinrui Wang; Zhenfeng Gao; Wendan Wang; Huili Wang; Zichen Qin; Linjie Zhu; Hongke Fu; Shangguang Wang; Tao Qi;
370	Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite RLVR-based optimization, existing methods still suffer from coarse-grained supervision, reward hacking, high training costs, and poor generalization. To address these issues, we propose the Graph Reasoning Paradigm (GRP), which realizes structured and symbolic reasoning, implemented via graph-structured representations with step-level cognitive labels.	Runxuan Liu; Xianhao Ou; Xinyan Ma; Jiyuan Wang; Jiafeng Liang; Jiaqi Li; Tao He; Zheng Chu; Rongchuan Mu; Zekun Wang; Baoxin Wang; Dayong Wu; Ming Liu; Shijin Wang; Guoping Hu; Bing Qin;
371	Verbal-R3: Verbal Reranker As The Missing Bridge Between Retrieval and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our empirical investigation reveals the potential of Verbal Annotations to substantially enhance the LLM’s ability to generate accurate, contextually-grounded responses. Motivated by this finding, we introduce Verbal-R3, a novel agentic RAG framework that consists of a Generator and a Verbal Reranker.	Sangkwon Park; Donghun Kang; Jisoo Mok; Sungroh Yoon;
372	Projecting Out The Malice: A Global Subspace Approach to LLM Detoxification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These highlight the challenge of identifying a robust toxic subspace and removing it. Therefore, we propose GLOSS (GLobal tOxic Subspace Suppression), a lightweight method that mitigates toxicity by identifying and eliminating this global subspace from FFN parameters.	Zenghao Duan; Zhiyi Yin; Zhichao Shi; Liang Pang; Shaoling Jing; Zihe Huang; Jiayi Wu; Yu Yan; Jingcheng Deng; Huawei Shen; Xueqi Cheng;
373	Polymorphic Universal Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present the first systematic study of Compute Distribution Skew, identifying it as the primary driver of extrapolation failure.	Yilong Chen; Zitian Gao; Yihao Xiao; Jason Klein Liu; Xinyu Yang; Yifan Luo; Haoming Luo; Zhengmao Ye; Tingwen Liu; Ran Tao; Bryan Dai;
374	WebAggregator: Enhancing Compositional Reasoning Capabilities of Deep Research Agent Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current agentic systems are often retrieval-heavy but reasoning-light, where success is predominantly determined by simple entity-seeking rather than the multi-step aggregation of scattered evidence. To address this, we propose a data synthesis pipeline WebAggregator, designed to shift the agentic paradigm from retrieval-centric to compositional aggregation.	Rui Wang; Ce Zhang; Jun-Yu Ma; Jianshu Zhang; Hongru Wang; Yi Chen; Boyang Xue; Tianqing Fang; Zhisong Zhang; Hongming Zhang; Haitao Mi; Dong Yu; Kam-Fai Wong;
375	Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current RL methods, exemplified by Group Relative Policy Optimization (GRPO), suffer from coarse-grained, trajectory-level rewards that provide insufficient learning signals for complex multi-turn interactions, leading to training stagnation. To address this issue, we propose Group Turn Policy Optimization (GTPO), a novel RL algorithm specifically designed for training LLMs on multi-turn TIR tasks.	Yifeng Ding; Hung Le; Songyang Han; Kangrui Ruan; Zhenghui Jin; Varun Kumar; Zijian Wang; Anoop Deoras;
376	MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL Via Agentic Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present MTSQL-R1, an agentic training framework for long-horizon multi-turn Text-to-SQL.	Taicheng Guo; Hai Wang; Chaochun Liu; Mohsen Golalikhani; Xin Chen; Xiangliang Zhang; Chandan K. Reddy;
377	DocLens: A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: They struggle to retrieve relevant pages and overlook fine-grained details within visual elements, leading to limited performance and model hallucination. To address this, we propose DocLens, a tool-augmented multi-agent framework that effectively “zooms in” on evidence like a lens.	Dawei Zhu; Rui Meng; Jiefeng Chen; Sujian Li; Tomas Pfister; Jinsung Yoon;
378	Losing Our Tail, Again: (Un)Natural Selection & Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While they provide us quick access to information and impressively fluent output, beneath their (apparent) sophistication lies a subtle, insidious threat: the gradual decline and loss of linguistic diversity. In this position paper, I explore how model collapse, with a particular focus on translation technology, can lead to the loss of linguistic forms, grammatical features, and cultural nuance.	Eva Vanmassenhove;
379	Efficiently Learning To Reason or Not to Reason: Root-token Policy Optimization for Adaptive Thinking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In response, we introduce RPO (Root-token Policy Optimization), a framework that enables LRMs to self-determine when to reason by training only the initial root token (e.g., whether to invoke the think tag) via group relative reward and group-wise advantages.	Taehyeon Kim; Hyunsoo Lee; Youngsoo Jang; Moontae Lee;
380	BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present BenchMarker, an education-inspired toolkit using LLM judges to flag three common MCQ flaws: 1) contamination—items appearing exactly online; 2) shortcuts—cues in the choices that enable guessing; and 3) writing errors—structural/grammatical issues based on a 19-rule education rubric.	Nishant Balepur; Bhavya Rajasekaran; Hyunjin Jane Oh; Michael Xie; Atrey Desai; Vipul Gupta; Steven James Moore; Eunsol Choi; Rachel Rudinger; Jordan Lee Boyd-Graber;
381	Mind The Gap in Cultural Alignment: Task-Aware Culture Management for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose CultureManager, a novel pipeline for task-specific cultural alignment.	Binchi Zhang; Xujiang Zhao; Jundong Li; Haifeng Chen; Zhengzhang Chen;
382	Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This phenomenon ultimately amplifies computational and financial inequalities between users from different language backgrounds. To remedy this, we introduce Parity-aware Byte Pair Encoding (BPE), a variant of the widely-used BPE algorithm.	Negar Foroutan; Clara Meister; Debjit Paul; Joel Niklaus; Sina Ahmadi; Antoine Bosselut; Rico Sennrich;
383	InferenceDynamics: Adaptive LLM Routing Through Structured Capability and Knowledge Profiling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To overcome those challenges, we propose InferenceDynamics, a flexible and scalable multi-dimensional routing framework by modeling the capability and knowledge of models.	Haochen Shi; Tianshi Zheng; Weiqi Wang; Baixuan Xu; Chunyang Li; Chunkit Chan; Tao Fan; Yangqiu Song;
384	Think Parallax: Solving Multi-Hop Problems Via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ParallaxRAG, a symmetric multi-view framework that decouples queries and KGs into aligned, head-specific retrieval spaces.	Jinliang Liu; Jiale Bai; Shaoning Zeng;
385	DARM: Distribution-Aware Reward Modeling By Alleviating Biases from Low Preference-Context Dependency Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that standard RM training is vulnerable in data subsets where response quality depends only weakly on the context: such instances encourage the RM to ignore the context, leading to context neglect and degraded accuracy. To address this failure mode, we propose Distribution-Aware Reward Modeling (DARM), which augments the RM objective with a conditional mutual information regularizer that maximizes the mutual information between the context and the predicted reward, conditioned on the response.	Shaofan Liu; Guoqiang Zhang; Shihan Dou; Huiyuan Zheng; Yiming Zhou; Junjie Ye; Shaowen Wang; Shichun Liu; Jiazheng Zhang; Tao Gui; Qi Zhang; Xuanjing Huang;
386	MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We demonstrate that rater behavior in re-annotation aligns with our goals, and that re-annotation results in higher-quality annotations, mostly due to finding errors that were missed during the first pass.	Parker Riley; Daniel Deutsch; Mara Finkelstein; Colten DiIanni; Juraj Juraska; Markus Freitag;
387	Thinking Alignment of Scenario-Oriented User Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, to enhance controllability and situational coherence, we introduce scenario settings that describe the global context and user goals throughout multi-turn conversations. Using this dataset, we train user simulators called ThinkingUS on different base models.	Xiaoting Wu; Yi Huang; Chunyang Gao; Mengfei Guo; Jingyu Yao; Junlan Feng;
388	Aligning Backchannel and Dialogue Context Representations Via Contrastive LLM Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a two-stage framework: first, fine-tuning large language models on dialogue transcripts to derive rich contextual representations; and second, learning a joint embedding space for dialogue contexts and backchannel realizations.	Livia Qian; Gabriel Skantze;
389	CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current benchmarks primarily emphasize functional relevance while neglecting code quality. To address this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across four critical dimensions: correctness, efficiency, security, and maintainability.	Jiahui Geng; Fengyu Cai; Shaobo Cui; Qing Li; Liangwei Chen; Chenyang Lyu; Haonan Li; Derui Zhu; Alexander Pretschner; Heinz Koeppl; Fakhri Karray;
390	ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The advancement of large language models (LLMs) has enhanced tabular question answering (Tabular QA), yet they struggle with open-domain queries exhibiting underspecified or uncertain expressions. To address this, we introduce the ODUTQA-MDC task and the first comprehensive benchmark to tackle it.	Zhensheng Wang; ZhanTeng Lin; Wenmian Yang; Kun Zhou; Yiquan Zhang; Weijia Jia;
391	ReCoQA: A Benchmark for Tool-Augmented and Multi-Step Reasoning in Real Estate Question and Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Developing agents capable of navigating fragmented, multi-source information remains challenging, primarily due to the scarcity of benchmarks reflecting hybrid workflows combining database querying with external APIs. To bridge this gap, we introduce ReCoQA, a large-scale benchmark of 29,270 real-estate instances featuring machine-verifiable supervision for intermediate steps, including structured intent labels, SQL queries, and API calls.	Yindong Zhang; Wenmian Yang; Yiquan Zhang; Weijia Jia;
392	ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We attribute this to the underutilization of two key components of human reviewing: explicit rubrics and contextual grounding in existing work. To address this, we introduce ReviewBench, a benchmark evaluating review text according to paper-specific rubrics derived from official guidelines, the paper’s content, and human-written reviews.	Zhuofeng Li; Yi Lu; Dongfu Jiang; Haoxiang Zhang; Yuyang Bai; Chuan Li; Yu Wang; Shuiwang Ji; Jianwen Xie; Yu Zhang;
393	OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce OpenRubrics, a diverse, large-scale collection of (prompt, rubric) pairs for training rubric-generation and rubric-based reward models.	Tianci Liu; Ran Xu; Tony Yu; Ilgee Hong; Carl Yang; Tuo Zhao; Haoyu Wang;
394	MedMCP-Calc: Benchmarking LLMs for Realistic Medical Calculator Scenarios Via MCP Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their real-world use is an adaptive, multi-stage process, requiring proactive EHR data acquisition, scenario-dependent calculator selection, and multi-step computation, whereas current benchmarks focus only on static single-step calculations with explicit instructions. To address these limitations, we introduce MedMCP-Calc, the first benchmark for evaluating LLMs in realistic medical calculator scenarios through Model Context Protocol (MCP) integration.	Yakun Zhu; Yutong Huang; Shengqian Qin; Zhongzhen Huang; Shaoting Zhang; Xiaofan Zhang;
395	Evolving Sparsity: Leveraging Token Importance Dynamics for Efficient LLM Decoding with Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit sparse attention in LLMs and propose to model token importance as a dynamic process that evolves over decoding steps and propagates through model layers.	Ruizi Han; Miao Zhang; Ziyue Qiao; Liqiang Nie;
396	RoBSA: RoPE-based Blockwise Sparse Multi-head Latent Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce RoPE-based Blockwise Sparse Attention (RoBSA), a method designed specifically for MLA during the decoding stage of model inference.	Xinyu Shi; Kairong Luo; Zhen Zheng; Wenguang Chen;
397	Two-Stage Regularization-Based Structured Pruning for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prior structured pruning methods directly remove unimportant parameters based on certain metrics, which often causes knowledge loss and necessitates extensive retraining. To overcome this, we introduce a novel pruning method TRSP: Two-Stage Regularization-Based Structured Pruning for LLMs.	Mingkuan Feng; Jinyang Wu; Siyuan Liu; Shuai Zhang; Hongjian Fang; Ruihan Jin; Feihu Che; Pengpeng Shao; Zhengqi Wen; Jianhua Tao;
398	Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet most benchmarks assume clean input, leaving the robustness of LLMs to typos across languages largely underexplored. To address this gap, we introduce MulTypo, a multilingual typo generation algorithm that simulates human-like errors based on language-specific keyboard layouts and typing behavior.	Raoyuan Zhao; Yihong Liu; Lena Altinger; Hinrich Schuetze; Michael A. Hedderich;
399	Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Today’s benchmarks fail to capture this fragility: agents may perform well under standard evaluations but degrade spectacularly in more realistic and varied settings. We address this robustness testing gap by introducing TraitBasis, a lightweight, model-agnostic method for systematically stress testing AI agents.	Muyu He; Anand Kumar; Soumyadeep Bakshi; James Zou; Nazneen Rajani;
400	GMSA: Enhancing Context Compression Via Group Merging and Layer Semantic Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, in long-context scenarios, they face two challenges: high computational cost and information redundancy. To address these challenges, we propose GMSA, an encoder-decoder context compression framework that generates a compact sequence of soft tokens for downstream tasks.	Jiwei Tang; Zhicheng Zhang; Shunlong Wu; Jingheng Ye; Lichen Bai; Zitai Wang; Tingwei Lu; Lin Hai; Yiming Zhao; Hai-Tao Zheng; Hong-Gee Kim;
401	Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a novel extension of neural scaling laws to Mixture-of-Experts (MoE) models, focusing on the optimal allocation of compute between expert and attention sub-layers.	Junzhuo Li; Peijie Jiang; Changxin Tian; Jia Liu; Zhiqiang Zhang; Xuming Hu;
402	AutoSchemaKG: Autonomous Knowledge Graph Construction Through Dynamic Schema Induction from Web-Scale Corpora Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas.	Jiaxin Bai; Wei Fan; Qi Hu; Qing Zong; Chunyang Li; Hong Ting Tsang; Hongyu Luo; Yauwai Yim; Haoyu Huang; Xiao Zhou; Feng Qin; Tianshi Zheng; Xi Peng; Xin Yao; Huiwen Yang; Leijie Wu; JI Yi; Gong Zhang; Renhai Chen; Yangqiu Song;
403	Distillation Traps and Guards: A Calibration Knob for LLM Distillability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These traps manifest as overconfident hallucinations, self-correction collapse, and local decoding degradation, causing distillation to fail. Motivated by these findings, we propose a post-hoc calibration method that, to the best of our knowledge, for the first time enables control over a teacher’s distillability via reinforcement fine-tuning (RFT).	Weixiao Zhan; Yongcheng Jing; Leszek Rutkowski; Dacheng Tao;
404	CODERL+: Improving Code Generation Via Reinforcement with Execution Semantics Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose CODERL+, a novel approach that integrates execution semantics alignment into the RLVR training pipeline for code generation.	Xue Jiang; Yihong Dong; Mengyang Liu; Deng Hongyi; Tian Wang; Yongding Tao; Zhi Jin; Wenpin Jiao; Ge Li;
405	Can AI-Generated Persuasion Be Detected? Persuaficial Benchmark and AI Vs. Human Linguistic Differences Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Beyond detection performance, we provide the first comprehensive linguistic analysis contrasting human and LLM-generated persuasive texts, offering insights that may guide the development of more interpretable and robust detection tools.	Arkadiusz Modzelewski; Paweł Golik; Anna Kołos; Giovanni Da San Martino;
406	AV-Dialog: Spoken Dialogue Models with Audio-Visual Input Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present AV-Dialog, the first multimodal dialog framework that uses both audio and visual cues to track the target speaker, predict turn-taking, and generate coherent responses.	Tuochao Chen; Bandhav Veluri; Hongyu Gong; Shyamnath Gollakota;
407	Empowering GUI Agents Via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While small open-source MLLMs are cost-efficient and privacy-preserving compared with commercial large models, they suffer from weak planning and limited cross-website generalization. To address these limitations, we introduce the planning experience exploration and utilization (PEEU) method, which autonomously explores environments to discover experiences and utilizes hindsight experience to synthesize strictly aligned, high-level training data.	Tianyi Men; Zhuoran Jin; Pengfei Cao; Yubo Chen; Kang Liu; Jun Zhao;
408	MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing Reinforcement Learning with Verifiable Rewards (RLVR) algorithms, such as GRPO, rely on rigid, uniform, and symmetric trust region mechanisms that are fundamentally misaligned with the complex optimization dynamics of Large Language Models (LLMs). In this paper, we identify three critical challenges in these methods: (1) inefficient gradient utilization caused by the binary cutoff of hard clipping, (2) insensitive probability mass arising from uniform ratio constraints that ignore the token distribution, and (3) asymmetric signal reliability stemming from the disparate credit assignment ambiguity between positive and negative samples.	Xiaoliang Fu; Jiaye Lin; Yangyi Fang; Binbin Zheng; Chaowen Hu; Zekai Shao; Cong Qin; Lu Pan; Ke Zeng; Xunliang Cai;
409	KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present KASER (Knowledge-Aligned Student Error Simulator), a novel approach that aligns errors with student knowledge.	Zhangqi Duan; Nigel Fernandez; Andrew Lan;
410	MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce MobileWorld, a substantially more challenging benchmark with 201 tasks across 20 applications that reflects real-world usage through long-horizon, cross-application workflows requiring nearly twice as many steps (27.	Quyu Kong; Xu Zhang; Zhenyu Yang; Nolan Gao; Chen Liu; Panrong Tong; Chenglin Cai; Hanzhang Zhou; Jianan Zhang; Liangyu Chen; Zhidan Liu; Steven Hoi; Yue Wang;
411	Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Consequently, specialist experts possessing critical long-tail knowledge are often assigned low gating scores and remain dormant—under-prioritized for specific tokens despite their proven causal importance on other inputs. To address this, we propose Counterfactual Routing (CoR), a training-free inference framework designed to awaken these dormant experts.	Wentao Hu; Yanbo Zhai; Xiaohui Hu; Mingkuan Zhao; Shanhong yu; Xue Liu; Kaidong Yu; Shuangyong Song; Xuelong Li;
412	Feeling Right Vs. Being Right: How AI Sycophancy Affects Value-Laden Deliberation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: As people increasingly turn to AI for personal deliberation beyond task-oriented assistance, concerns about sycophancy in these value-laden contexts have grown. Unlike human …	Jeongwoo Ryu; Soomin Kim; Jinsu Eun; Kyusik Kim; Changhoon Oh; Bongwon Suh;
413	Writing-RL: Advancing Long-form Writing Via Adaptive Curriculum Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To further advance long-form writing, we present Writing-RL: an Adaptive Curriculum Reinforcement Learning framework to advance long-form writing capabilities beyond SFT.	Xuanyu Lei; Chenliang Li; Yuning Wu; Kaiming Liu; Weizhou Shen; Peng Li; Ming Yan; Fei Huang; Ya-Qin Zhang; Yang Liu;
414	Agentic Oversight Via Dialectic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To make oversight grounded and scale as capabilities extend, we introduce an Agentic Oversight framework.	Leonardo Ranaldi; Federico Ranaldi;
415	Beyond The Last Frame: Process-aware Evaluation for Generative Video Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While these models show promise for Generative Video Reasoning (GVR), existing evaluation frameworks often rely on single-frame assessments, which can lead to outcome-hacking, where a model reaches a correct conclusion through an erroneous process. To address this, we propose a process-aware evaluation paradigm.	Yifan Li; YuKai Gu; Yingqian Min; Zikang Liu; Yifan Du; Kun Zhou; Min Yang; Xin Zhao; Minghui Qiu;
416	TOWER+: Bridging Generality and Translation Specialization in Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Tower+, a suite of models designed to deliver strong performance on both translation and multilingual general-purpose text capabilities.	Ricardo Rei; Nuno M Guerreiro; José Pombal; João Alves; Amin Farajian; Pedro Teixeirinha; Andre Martins;
417	ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ReGATE (Reference-Guided Adaptive Token Elision), an adaptive token pruning method for accelerating MLLM training.	Chaoyu Li; Yogesh Kulkarni; Pooyan Fazli;
418	MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning Via Bipartite Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation.	Changle Qu; Sunhao Dai; Hengyi Cai; Jun Xu; Shuaiqiang Wang; Dawei Yin;
419	Spectral Disentanglement: Rank-Aware Task Adaptation for Rehearsal-free Continual Learning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While Parameter-Efficient Fine-Tuning methods, such as LoRA, enable efficient adaptation, we identify a critical flaw in current approaches termed Rank-Blindness: the enforcement of a single rank constraint across diverse tasks, which entangles task-shared and task-specific knowledge, leading to catastrophic forgetting of earlier tasks and underfitting on complex new ones. To address this, we propose SpaRTA, a novel rehearsal-free framework guided by a rank-spectrum perspective that explicitly disentangles knowledge into two orthogonal subspaces.	Huanxuan Liao; Shizhu He; Yupu Hao; Yequan Wang; Wenhao Teng; Xiangwen Liao; Jun Zhao; Kang Liu;
420	Do LLMs Really Need 10+ Thoughts for “Find The Time 1000 Days Later”? Towards Structural Understanding of LLM Overthinking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To bridge the gap, this study introduces TRACE, a systematic, fine-grained analyzer of LLMs’ thought process.	Xinliang Frederick Zhang; Anhad Mohananey; Alexandra Chronopoulou; Pinelopi Papalampidi; Somit Gupta; Tsendsuren Munkhdalai; Lu Wang; Shyam Upadhyay;
421	Quantifying and Improving The Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify and study spurious features in the RAG paradigm, a robustness issue caused by the sensitivity of LLMs to semantic-agnostic features.	Shiping Yang; Jie Wu; Wenbiao Ding; Ning Wu; Shining Liang; Ming Gong; Hongzhi Li; Hengyuan Zhang; Angel X Chang; Dongmei Zhang;
422	Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing long-term personalized dialogue systems struggle to reconcile unbounded interaction streams with finite context constraints, often succumbing to memory noise accumulation, reasoning degradation, and persona inconsistency. To address these challenges, this paper proposes Inside Out, a framework that utilizes a globally maintained PersonaTree as the carrier of long-term user profiling.	Jihao Zhao; Ding Chen; Zhaoxin Fan; Kerun Xu; Mengting Hu; Bo Tang; Feiyu Xiong; Zhiyu li;
423	M3-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present M3-VQA, a novel knowledge-based Visual Question Answering (VQA) benchmark, to enhance the evaluation of multimodal large language models (MLLMs) in fine-grained multimodal entity understanding and complex multi-hop reasoning.	Jiatong Ma; Longteng Guo; Yuchen Liu; Zijia Zhao; Dongze Hao; Xuanxu Lin; Jing Liu;
424	Instant Personalized Large Language Model Adaptation Via Hypernetwork Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Profile-to-PEFT, a scalable framework that employs a hypernetwork, trained end-to-end, to map a user’s encoded profile directly to a full set of adapter parameters (e.g., LoRA), eliminating per-user training at deployment.	Zhaoxuan Tan; Zixuan Zhang; Haoyang Wen; Zheng Li; Rongzhi Zhang; Pei Chen; Fengran Mo; Zheyuan Liu; Qingkai Zeng; Qingyu Yin; Meng Jiang;
425	SciPedia: Unlocking The Value of Scientific Data for Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: First, we construct a large-scale raw scientific corpus but identify a critical Learnability Gap, revealing that direct pre-training yields negligible gains. To bridge this, we develop a multi-stage pipeline featuring content cleaning and pedagogical augmentation, resulting in SciPedia, a 900B-token corpus.	Yiwei Qin; Zhen Huang; Tiantian Mi; Weiye Si; Qipeng Guo; Siyuan Feng; Pengfei Liu;
426	RSMeM: Knowledge-Enhanced Memory Evolution for Remote Sensing Agents with Systematic Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, these failures are seldom consolidated into a reusable experience for subsequent analyses. To address this issue, we introduce RSMeM, a knowledge-enhanced memory evolution mechanism that bootstraps RS agents with pre-distilled domain knowledge and iteratively integrates online experience for robust multi-step tool execution.	Bingxian Wu; Yu Zhang; Zonghao Guo; Tang Liu; Chen Qian; Yuxiang Lu; Xingbo Du; Yanghao Li; Yidan Zhang; Chi Chen; Ling Yao; Chenghu Zhou; Maosong Sun;
427	CheckRLM: Effective Knowledge–Thought Coherence Checking in Retrieval-Augmented Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these chains are prone to containing factual errors, particularly in knowledge-intensive tasks. To address this issue, we propose CheckRLM, a framework that improves the reliability of the reasoning process through Retrieval-Augmented Generation (RAG) by timely checking and correcting factual errors.	Dingling Xu; Ruobing Wang; Qingfei Zhao; Yukun Yan; Zhichun Wang; Daren Zha; Shi Yu; Zhenghao Liu; Shuo Wang; Xu Han; Maosong Sun;
428	LearnerCoMPASS: Intelligent Tutoring System with Dynamic Cognitive Diagnosis and Multi-Model Path Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces LearnerCoMPASS (Cognitive Multi-model Planning Adaptive System), an integrated, end-to-end framework for adaptive learning.	Ziji Sheng; Guiyao Tie; Weidong Wang; Pan Zhou; Daizong Liu;
429	The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that identical scenarios paired with different user profiles produce systematically divergent emotional interpretations.	Xi Fang; Weijie Xu; Yuchong Zhang; Scott Nickleach; Stephanie Eckman; Chandan K. Reddy;
430	See The Forest for The Trees: Loosely Speculative Decoding Via Visual-Semantic Guidance for Efficient Inference of Video LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Speculative Decoding (SD) mitigates this by applying a draft-and-verify paradigm, yet existing methods are constrained by rigid exact-match rules, severely limiting the acceleration potential. To bridge this gap, we propose LVSpec, the first training-free loosely SD framework tailored for Video-LLMs.	Yicheng Ji; Jun Zhang; Jinpeng Chen; Cong Wang; Lidan Shou; Gang Chen; Huan Li;
431	HyperMem: Hypergraph Memory for Long-Term Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose HyperMem, a hypergraph-based hierarchical memory architecture that explicitly models such associations using hyperedges.	Juwei Yue; Chuanrui Hu; Jiawei Sheng; Zuyi Zhou; Wenyuan Zhang; Tingwen Liu; Li Guo; Yafeng Deng;
432	Prune As You Generate: Online Rollout Pruning for Faster and Better RLVR Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce ARRoL (Accelerating RLVR via online RoLlout Pruning), an online rollout pruning method that prunes rollouts during generation while explicitly steering the surviving ones to be more correctness-balanced to enhance learning signals.	Haobo Xu; Sirui Chen; Ruizhong Qiu; Yuchen Yan; Chen Luo; Monica Xiao Cheng; Jingrui He; Hanghang Tong;
433	Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we observe that current models are susceptible to reward hacking, leading to a substantial overestimation of a model’s reasoning ability.	Youliang Yuan; Qiuyang Mang; Jingbang Chen; Hong Wan; Xiaoyuan Liu; Junjielong Xu; Jen-tse Huang; Wenxuan Wang; Wenxiang Jiao; Pinjia He;
434	New Terms, New Toxicity: Consensus-based Chinese Neologism Toxicity Detection Via Search-Augmented LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate how to detect implicit toxicity expressed via neologisms.	Shiyao Cui; QingLin Zhang; Di Wang; Yida Lu; Zhexin Zhang; Jinhua Gao; Jinglin Yang; Min He; Han Qiu; Minlie Huang;
435	EfficientLLM: Unified Pruning-Aware Pretraining for Auto-Designed Compact Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In contrast to direct pretraining, which is bounded by the parameter scaling law, this work proposes unified pruning-aware pretraining, focusing on pretraining compact models while preserving the performance of much larger source models, termed EfficientLLM.	Xingrun Xing; Zheng Liu; Shitao Xiao; Boyan Gao; Yiming Liang; Haokun Lin; Xianlin Zeng; Guoqi Li; Jiajun Zhang;
436	JanusMM: A Benchmark for Self-Deprecation Understanding in Real-World Multimodal Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although self-deprecation is widespread in real-world conversations, the ability of multimodal large language models (MLLMs) to understand it remains underexplored. To fill this gap, we introduce JanusMM, the first benchmark designed to evaluate MLLMs’ understanding of self-deprecation in real-world conversations.	Xinyi Xu; Bingguang Hao; Yongyi Xiong; Zimo Chen; Xinchen Liu; Hongxin Guo; Xuelong Wang; Silin Zhou; Shihan Dou;
437	ToolScope: Enhancing LLM Agent Tool Use Through Tool Merging and Context-Aware Filtering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: LLMs also face strict input context limits, preventing efficient consideration of large toolsets. To address these challenges, we propose ToolScope, which includes: (1) ToolScopeMerger with Auto-Correction to automatically audit and fix tool merges, reducing redundancy, and (2) ToolScopeRetriever to rank and select only the most relevant tools for each query, compressing toolsets to fit within context limits without sacrificing accuracy.	Marianne Menglin Liu; Daniel Garcia; Fjona Parllaku; Vikas Upadhyay; Fahad Shah; Dan Roth;
438	SCAN: Structured Capability Assessment and Navigation for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While existing research has focused on approximating model rankings, such benchmarks fail to provide users and developers with a comprehensive and fine-grained understanding of a specific model’s capabilities. To fill this gap, we propose SCAN (Structured Capability Assessment and Navigation), a practical framework that enables detailed characterization of LLM capabilities through comprehensive and fine-grained evaluation.	Zongqi Wang; Tianle Gu; Chen Gong; Xin Tian; Siqi Bao; Yujiu Yang;
439	From Naturalness to Norms: Interactional Cultural Competence for SpeechLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose an evaluation framing that complements WER/MOS and broad capability suites by making speech events and interaction contracts explicit, diagnosing where modern pipelines lose interactional cues, and treating cultural appropriateness as a norm-conditioned target rather than generic “naturalness.”	Santosh T.y.s.s;
440	NSF-SciFy: Mining The NSF Awards Database for Scientific Claims Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce NSF-SciFy, a comprehensive dataset of scientific claims and investigation proposals extracted from National Science Foundation award abstracts.	Delip Rao; Weiqiu You; Eric Wong; Chris Callison-Burch;
441	Doc-V: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Doc-V, an OCR-free agentic framework that casts multi-page DocVQA as sequential evidence aggregation.	Yuanlei Zheng; Pei Fu; Hang Li; Ziyang Wang; Yuyi Zhang; Wenyu Ruan; Xiaojin Zhang; Zhongyu Wei; Zhenbo Luo; Jian Luan; Wei Chen; Xiang Bai;
442	From Synthesis to Clinical Assistance: A Strategy-Aware Agent Framework for Autism Intervention Based on Real Clinical Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, while Applied Behavior Analysis (ABA) serves as the gold standard for clinical intervention, general-purpose Large Language Models (LLMs) struggle to strictly adhere to its standardized procedures, often resulting in interactions that are linguistically fluent but strategically inconsistent. To address these challenges, we introduce ASDAgent, a strategy-aware framework designed to unify high-fidelity intervention dialogue synthesis and clinical decision support.	Junhong Lai; Shuzhong Lai; Yanhao Yu; Wanlin Chen; Chenyu Yan; Haifeng Li; Lin Yao; Yueming Wang;
443	Protecting Multimodal Large Language Models Against Misleading Visualizations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We find that two methods, table-based QA and redrawing the visualization, are effective, with improvements of up to 19.	Jonathan Tonglet; Tinne Tuytelaars; Marie-Francine Moens; Iryna Gurevych;
444	Is This Chart Lying to Me? Automating The Detection of Misleading Visualizations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders.	Jonathan Tonglet; Jan Zimny; Tinne Tuytelaars; Iryna Gurevych;
445	Open Schrödinger’s Closed Box: Identifying Retrieval Augmented Generation in API-Accessible Large Language Model Services Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This opacity also risks misleading users about system capabilities. This work aims to bridge this gap by proposing RAG-ID, a framework for IDentifying RAG properties in LLM services.	Yukun Jiang; Xinyue Shen; Michael Backes; Zheng Li; Yang Zhang;
446	LLM Agents in Law: Taxonomy, Applications, and Challenges Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recently, LLM agents have attracted significant attention as a solution to these challenges, utilizing advanced capabilities such as planning, memory, and tool usage to meet the rigorous standards of legal practice. In this paper, we present a comprehensive survey of LLM agents for legal tasks, analyzing how these architectures bridge the gap between technical capabilities and domain-specific needs.	Shuang Liu; Ruijia Zhang; Ruoyun Ma; Yujia Deng; Lanyi Zhu; Jiayu Li; Zelong Li; Zhibin Shen; Mengnan Du;
447	Robust Tool Use Via Fission-GRPO: Learning to Recover from Execution Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In particular, standard reinforcement learning (RL) collapses rich failure experience into sparse negative rewards, while pre-collected error-correction datasets become mismatched to the policy’s evolving failure modes. To bridge this gap, we propose Fission-GRPO, a framework that converts execution errors into on-policy corrective supervision within the RL training loop.	Zhiwei Zhang; Fei Zhao; Rui Wang; Zezhong Wang; Bin Liang; Jiakang Wang; Yao Hu; Shaosheng Cao; Kam-Fai Wong;
448	Probing Semantic Alignment, Lexical Invariance, and Syntactic Influence in LLM Metaphor Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a diagnostic analysis that examines the limits of behavioral evidence by probing three complementary dimensions: semantic attribute alignment, lexical invariance, and syntactic sensitivity.	Fengying Ye; Shanshan Wang; Lidia S. Chao; Derek F. Wong;
449	G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present G-IdiomAlign, a gloss-pivoted benchmark where each idiom is anchored by an English gloss from Wiktionary.	Fengying Ye; Yanming Sun; Runzhe Zhan; Lidia S. Chao; Zheqi Zhang; Derek F. Wong;
450	Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset with an emphasis on reasoning trace evaluation.	Jinu Lee; Kyoung-Woon On; Sophia Simeng Han; Arman Cohan; Julia Hockenmaier;
451	Scaling External Knowledge Input Beyond Context Windows of LLMs Via Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we develop a multi-agent framework, ExtAgents, to overcome the bottlenecks and enable better scalability in inference-time knowledge integration without longer-context training.	Zijun Liu; Zhennan Wan; Peng Li; Ming Yan; Fei Huang; Yang Liu;
452	Think Before Go: Hierarchical Reasoning for Image-goal Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by the human cognitive principle that deliberate, high-level reasoning guides fast, reactive execution in complex tasks, we propose Hierarchical Reasoning Navigation (HRNav), a framework that decomposes image-goal navigation into high-level planning and low-level execution.	Pengna Li; Kangyi Wu; Shaoqing Xu; Fang Li; Lin Zhao; Long Chen; Zhi-Xin Yang; Nanning Zheng;
453	HiChunk: Evaluating and Enhancing Retrieval Augmented Generation with Hierarchical Chunking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper first analyzes why existing RAG evaluation benchmarks are inadequate for assessing document chunking quality, specifically due to evidence sparsity. Based on this conclusion, we propose HiCBench, which includes manually annotated multi-level document chunking points, synthesized evidence-dense question-answer (QA) pairs, and their corresponding evidence sources.	Wensheng Lu; Keyu Chen; Zhifeng Shen; Ruizhi Qiao; Xing Sun;
454	Lost in Simulation: LLM-Simulated Users Are Unreliable Proxies for Human Users in Agentic Evaluations Highlight: We find that user simulators lack robustness, with agent success rates varying up to 9 percentage points across different user LLMs.	Preethi Seshadri; Samuel Cahyawijaya; Ayomide Odumakinde; Sameer Singh; Seraphina Goldfarb-Tarrant;
455	SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents Via Reconstructing Vulnerability-Introducing Scenarios Highlight: Existing benchmarks have provided valuable insights, but they fail to capture scenarios in which vulnerabilities are actually introduced by human developers, making fair comparisons between humans and agents infeasible. We therefore introduce SecureVibeBench, a benchmark of 105 C/C++ secure coding tasks sourced from 41 projects in OSS-Fuzz for code agents.	Junkai Chen; Huihui Huang; Yunbo Lyu; Junwen An; Jieke Shi; Chengran Yang; Ting Zhang; Haoye Tian; Yikun Li; Zhenhao Li; Xin Zhou; Xing Hu; David Lo;
456	Anchor: Branch-Point Data Generation for GUI Agents Highlight: We present a trajectory expansion framework Anchor that bootstraps scalable desktop supervision from a small set of verified seed demonstrations.	Jinbiao Wei; Yilun Zhao; Kangqi Ni; Arman Cohan;
457	ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs Highlight: Yet existing medical Large Language Models (LLMs) predominantly follow a reactive paradigm, risking diagnostic errors by answering before seeking sufficient details. To bridge this gap, we propose ProMed, a reinforcement learning framework that transitions LLMs toward a proactive paradigm, enabling them to ask clinically valuable questions before decision-making.	Hongxin Ding; Baixiang Huang; Yue Fang; Weibin Liao; Xinke Jiang; Jinyang Zhang; Yinghao Zhu; Zheng Li; Liantao Ma; Junfeng Zhao; Yasha Wang;
458	POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering Highlight: Charts are a universally adopted medium for data communication, yet existing chart understanding benchmarks are overwhelmingly English-centric, limiting their accessibility and relevance to global audiences. To address this limitation, we introduce PolyChartQA, the first large-scale multilingual benchmark for chart question answering, comprising 22,606 charts and 26,151 QA pairs across 10 diverse languages.	Yichen Xu; Liangyu Chen; Liang Zhang; Zihao Yue; Jianzhe Ma; Wenxuan Wang; Qin Jin;
459	Different Types of Syntactic Agreement Recruit The Same Units Within Large Language Models Highlight: We investigate whether different syntactic phenomena recruit shared or distinct components in LLMs.	Daria Kryvosheieva; Andrea Gregor de Varda; Evelina Fedorenko; Greta Tuckute;
460	XY-Tokenizer: Mitigating The Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs Highlight: We propose XY-Tokenizer, a low-bitrate speech codec (around 1 kbps) trained with a structured multi-stage, multi-task strategy that aligns discrete speech representations with text while preserving fine-grained acoustic details for reconstruction.	Yitian Gong; Luozhijie Jin; Kuangwei Chen; Dong Zhang; Ruifan Deng; Xiaogui Yang; Xin Zhang; Zhaoye Fei; Qinyuan Cheng; Shimin Li; Xipeng Qiu;
461	Culinary Crossroads: A RAG Framework for Enhancing Diversity in Cross-Cultural Recipe Adaptation Highlight: This reveals a key limitation of RAG in creative tasks with multiple valid answers: it fails to leverage contextual diversity for generating varied responses. To address this issue, we propose CARRIAGE, a plug-and-play RAG framework for cross-cultural recipe adaptation that enhances diversity in both retrieval and context organization.	Tianyi Hu; Andrea Morales-Garzón; Jingyi Zheng; Maria Maistro; Daniel Hershcovich;
462	CodeEvo: Interaction-Driven Synthesis of Code-centric Data Through Hybrid and Iterative Feedback Highlight: We propose CodeEvo, a dual-agent architecture comprising a Coder for iterative solution synthesis and a Reviewer to orchestrate the generation trajectory.	Qiushi Sun; Jingyang Gong; Lei Li; Qipeng Guo; Fei Yuan;
463	CiteGuard: Faithful Citation Attribution for LLMs Via Retrieval-Augmented Validation Highlight: In this work, we reframe citation evaluation as a problem of citation attribution alignment, which assesses whether LLM-generated citations match those a human author would include for the same text.	Yee Man Choi; Xuehang Guo; Yi R. Fung; Qingyun Wang;
464	A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains Highlight: To generate user queries that cover a broad range of tasks, we propose a data generation pipeline that leverages webpage content and interactive elements (e.g., buttons, check boxes) to create diverse, functionality-grounded user queries covering tasks such as address management, wishlist management, and brand store following.	Xianren Zhang; Shreyas Prasad; Di Wang; Qiuhai Zeng; Suhang Wang; Wenbo Yan; Mat Hans;
465	Please Refuse to Answer Me! Mitigating Over-Refusal in Large Language Models Via Adaptive Contrastive Decoding Highlight: In this paper, we analyze how system prompts with varying safety levels affect LLM refusal behaviors when facing over-refusal queries.	Yupeng Qi; Ziyu Lyu; Lixin Cui; Lu Bai; Feng Xia;
466	Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults Highlight: In this paper, we introduce LinuxFLBench, an FL benchmark constructed from real-world Linux kernel bugs.	Zhenhao Zhou; Zhuochen Huang; Yike He; Chong Wang; Jiajun Wang; Yijian Wu; Xin Peng; Yiling Lou;
467	ServImage: An Image Generation and Editing Benchmark from Real-world Commercial Imaging Services Highlight: These three dimensions are designed to characterize the factors that drive human payment decisions and indicate whether an image is commercially acceptable. (iii) ServImageModel: under this scoring system, we propose a payment prediction model trained on the human-annotated candidate images, achieving 82.	Fengxian Ji; Jingpu Yang; Zirui Song; Lang Gao; Junhong Liang; Zhenhao Chen; Jinghui Zhang; Xiuying Chen;
468	LEDOM: Reverse Language Model Highlight: We propose Reverse Reward, which reranks forward outputs using reverse posterior estimates, and prove that bidirectional scoring penalizes hallucinated reasoning chains whose backward reconstruction degrades.	Xunjian Yin; Sitao Cheng; Yuxi Xie; Xinyu Hu; Li Lin; Xinyi Wang; Liangming Pan; William Yang Wang; Xiaojun Wan;
469	UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data Highlight: Based on UniDataBench, we propose a novel LLM-based agent named ReActInsight, an autonomous agent that performs end-to-end analysis over diverse data sources by automatically discovering cross-source linkages, decomposing goals, and generating robust, self-correcting code to extract actionable insights.	Han Weng; Zhou Liu; Yuanfeng Song; Xiaoming Yin; Xing Chen; Wentao Zhang;
470	SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding Highlight: Yet, many real-world applications, such as navigating text-based maps or interpreting structured tables, rely heavily on precise sub-token understanding. In this regard, we introduce SubTokenTest, a comprehensive benchmark that assesses sub-token understanding through practical, utility-driven tasks.	Shuyang Hou; Yi Hu; Muhan Zhang;
471	XtraGPT: Context-Aware and Controllable Academic Paper Revision Via Human-AI Collaboration Highlight: Furthermore, academic writing is inherently iterative and revision-driven, a process that is not well supported by direct prompting-based paradigms. To address these scenarios, we propose a human-AI collaboration framework for academic paper revision, centered on criteria-guided intent alignment and context-aware modeling.	Nuo Chen; Andre Lin HuiKai; Jiaying Wu; Junyi Hou; Zining Zhang; Qian Wang; Xidong Wang; Bingsheng He;
472	Uncovering Temporal Framing in The News Highlight: We propose a taxonomy of eight temporal frames grounded in prior work on temporality and framing, and we realize it through expert annotation of a multilingual news corpus.	Tarek Mahmoud; Veronika Solopova; Premtim Sahitaj; Ariana Sahitaj; Max Upravitelev; Mervat Abassy; Hana Fatima Shaikh; Neda Foroutan; Vera Schmitt; Preslav Nakov;
473	Mind’s Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs Highlight: We introduce Mind’s Eye, a multiple-choice benchmark of eight visuo-cognitive tasks inspired by classic human intelligence tests and organized under a novel A–R–T taxonomy: Abstraction, Relation, and Transformation.	Rohit Sinha; Aditya Sanjiv Kanade; Sai Srinivas Kancheti; Vineeth N. Balasubramanian; Tanuja Ganu;
474	FoE: Forest of Errors Makes The First Solution The Best in Large Reasoning Models Highlight: Through comprehensive empirical analysis, we characterize errors as a forest-structured Forest of Errors (FoE) and conclude that FoE makes the First the Best, which is underpinned by rigorous theoretical analysis. Leveraging these insights, we propose RED, a self-guided efficient reasoning framework comprising two components: I) Refining First, which suppresses FoE growth in the first solution; and II) Discarding Subs, which prunes subsequent FoE via dual-consistency.	Kehan Jiang; Haonan Dong; Zhaolu Kang; Zhengzhou Zhu; Guojie Song;
475	InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance Via Reinforcement Fine-Tuning Highlight: In this paper, we aim to develop an interactive system that actively seeks human confirmation at critical decision points.	Qihang Ai; Pi Bu; Yue Cao; Yingyao Wang; Jihao Gu; Jingxuan Xing; Zekun Zhu; Wei Jiang; Zhicheng Zheng; Jun Song; Yuning Jiang;
476	For-Value: Efficient Forward-Only Data Valuation for Finetuning LLMs and VLMs Highlight: In this work, we introduce For-Value, a forward-only data valuation framework that enables efficient batch-scalable value estimation while maintaining effectiveness.	Wenlong Deng; Qi Zeng; Jiaming Zhang; Minghui Chen; Zixin Ding; Christos Thrampoulidis; Boying Gong; Xiaoxiao Li;
477	VRPO: Rethinking Value Modeling for Robust RL Under Noisy Supervision in LLM Post-Training Highlight: To better optimize noisy supervision, we propose VRPO, a framework that enhances value modeling for robust RL in LLM post-training.	Dingwei Zhu; Shihan Dou; Zhiheng Xi; Senjie Jin; Guoqiang Zhang; Jiazheng Zhang; Junjie Ye; Mingxu Chai; Enyu Zhou; Ming Zhang; Yuhui Wang; Caishuang Huang; Chenhao Huang; Yunke Zhang; Yuran Wang; Tao Gui; Qi Zhang; Xipeng Qiu; Xuanjing Huang;
478	Adam’s Law: Textual Frequency Law on Large Language Models Highlight: We propose a novel research direction in terms of textual data frequency, which is an understudied topic.	Hongyuan Lu; Zixuan Li; Zefan Zhang; Bowen Cao; Wai Lam;
479	Reason-Code: Reliable Code Generation Via Test-Driven Monte Carlo Tree Search Highlight: In practice, reliability is often improved through multi-sample inference, but its cost grows linearly with the sample size, making it impractical under strict latency constraints. To address this, we propose Reason-Code, an inference-time framework that formulates code generation as a search process guided by execution feedback.	Zixu Li; Zhiqi Peng;
480	Memory Efficiency and Resource-rational Encoding in Sentence Processing Highlight: Here we take a novel yet simple approach to constraining WM in language models, in a way that reflects models of human cognition where memory is treated as a limited resource and deployed strategically.	Weijie Xu; Brian Dillon; Richard Futrell;
481	VecInfer: Efficient LLM Inference with Low-Bit KV Cache Via Outlier-Suppressed Vector Quantization Highlight: Although existing vector quantization (VQ) methods reduce KV cache usage and provide flexible representational capacity across bit-widths, they suffer severe performance degradation at ultra-low bit-widths due to key cache outliers that hinder effective codebook utilization. To address this challenge, we propose VecInfer, a novel VQ method for aggressive KV cache compression while enabling efficient inference.	Dingyu Yao; Chenxu Yang; Zhengyang Tong; Zheng Lin; Wei Liu; Jian Luan; Weiping Wang;
482	Diffuse Thinking: Exploring Diffusion Language Models As Efficient Thought Proposers for Reasoning Highlight: In contrast, diffusion language models (DLMs) adopt a parallel, non-autoregressive generation mechanism that enables the efficient production of diverse candidate outputs. Motivated by this complementarity, we explore a collaborative reasoning framework that combines diffusion-based generation with autoregressive evaluation.	Chenyang Shao; Sijian Ren; Fengli Xu; Yong Li;
483	ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents Highlight: In this work, we propose ProactiveEval, a unified framework for evaluating proactive dialogue capabilities of LLMs.	Tianjian Liu; Fanqi Wan; Jiajian Guo; Xiaojun Quan;
484	AutoGraph-R1: End-to-End Reinforcement Learning for Knowledge Graph Construction Highlight: However, its effectiveness is hindered by a fundamental disconnect: the knowledge graph (KG) construction process is decoupled from its downstream application, yielding suboptimal graph structures. To bridge this gap, we introduce AutoGraph-R1, the first framework to directly optimize KG construction for task performance using Reinforcement Learning (RL).	Hong Ting Tsang; Jiaxin Bai; Haoyu Huang; Qiao Xiao; Tianshi Zheng; Baixuan Xu; Shujie Liu; Yangqiu Song;
485	Stress Testing Factual Consistency Metrics for Long-Document Summarization Highlight: In this work, we systematically evaluate the reliability of six widely used reference-free factuality metrics, originally proposed for short-form summarization, in the long-document setting.	Zain Muhammad Mujahid; Dustin Wright; Isabelle Augenstein;
486	KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality Highlight: Existing Reinforcement Learning (RL) approaches typically rely on outcome-oriented rewards, which can inadvertently reinforce fabricated reasoning paths when the final answer is correct. To address this, we propose Knowledge-enhanced RL, KnowRL, a framework that integrates factual supervision directly into the reasoning process.	Baochang Ren; Shuofei Qiao; Ningyu Zhang; Da Zheng; Huajun Chen;
487	LASA: Language-Agnostic Semantic Alignment at The Semantic Bottleneck for LLM Safety Highlight: Large language models (LLMs) have demonstrated better safety performance in high-resource languages than in low-resource languages.	Junxiao Yang; Haoran Liu; Jinzhe Tu; Jiale Cheng; Zhexin Zhang; Shiyao Cui; Jiaqi Weng; Jialing Tao; Hui Xue; Hongning Wang; Han Qiu; Minlie Huang;
488	NaturalGAIA: A Verifiable Benchmark and Hierarchical Framework for Long-Horizon GUI Tasks Highlight: Despite significant advances in LLM-driven GUI agents, the field remains constrained by the challenge of reconciling high-fidelity realism with verifiable evaluation accuracy. To address this, we introduce NaturalGAIA, a verifiable evaluation dataset grounded in real-world human GUI interaction intents.	Zihan Zheng; Tianle Cui; Taoran Wang; Fengtao Wang; Jiahui Pan; Lewei He; Qianglong Chen;
489	Uncertainty-Aware Routing for Principled Alignment with MoE Dynamics Highlight: Moving beyond static routing, we present a systematic study of the MoE lifecycle using Helmholtz Free Energy and Router Entropy.	Yilong Chen; Junyuan Shang; Yuchen Feng; Zhenyu Zhang; Naibin Gu; Ziqi Wang; Tingwen Liu; Shuohuan Wang; Yu Sun; Hua Wu; Haifeng Wang;
490	Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on The Edge Highlight: To break the memory barrier, we propose Chain Federated Fine-tuning (ChainFed), an innovative paradigm that forgoes end-to-end updates in favor of a sequential, layer-by-layer manner.	Yebo Wu; Jingguang Li; Chunlin Tian; KaHou Tam; Zhijiang Guo; Li Li;
491	Bringing Real-World Relations Into Video Generation with Graph-Structured Knowledge Highlight: Existing approaches rely on scaling laws and large-scale, high-quality video datasets to implicitly learn physical dynamics, yet this paradigm is constrained by prohibitive costs and the burdensome demands of data curation. Motivated by this, we propose a novel framework that integrates graph-structured temporal knowledge into video latent diffusion models to enhance compositional generation and interaction fidelity.	Joonhyung Park; Jaeyun Song; Sihwan Park; Eunho Yang;
492	Reasoning Over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety Highlight: By guiding LLMs with case-augmented reasoning instead of extensive code-like safety rules, we avoid rigid adherence to narrowly enumerated rules and enable broader adaptability. Building on these insights, we propose CADA, a case-augmented deliberative alignment method for LLMs utilizing reinforcement learning on self-generated safety reasoning chains.	Can Jin; Rui Wu; Tong Che; Qixin Zhang; Hongwu Peng; Jiahui Zhao; Zhenting Wang; Wenqi Wei; Ligong Han; Zhao Zhang; Yuan Cao; Ruixiang Tang; Dimitris N. Metaxas;
493	VIGIL: Defending LLM Agents Against Tool-Stream Injection Via Verify-Before-Commit Highlight: Existing defenses encounter a critical dilemma as advanced models prioritize injected rules due to strict alignment while static protection mechanisms sever the feedback loop required for adaptive reasoning. To reconcile this conflict, we propose VIGIL, a framework that shifts the paradigm from restrictive isolation to a verify-before-commit protocol.	Junda Lin; Zhaomeng Zhou; Zhi Zheng; Shuochen Liu; Tong Xu; Yong Chen; Enhong Chen;
494	CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling Highlight: The quadratic complexity and indefinitely growing key-value (KV) cache of standard Transformers pose a major barrier to long-context processing. To overcome this, we introduce the Collaborative Memory Transformer (CoMeT), a novel architecture that enables LLMs to handle arbitrarily long sequences with constant memory usage and linear time complexity.	Runsong Zhao; Shilei Liu; Jiwei Tang; Langming Liu; Haibin Chen; Weidong Zhang; Yujin Yuan; Tong Xiao; JingBo Zhu; Wenbo Su; Bo Zheng;
495	JARVIS or Ultron? A Survey on The Safety and Security Threats of Computer-Using Agents Highlight: In this paper, we present a systematization of knowledge on the safety and security threats of CUAs.	Ada Chen; Yongjiang Wu; Junyuan Zhang; Jingyu Xiao; Shu Yang; Jen-tse Huang; Kun Wang; Wenxuan Wang; Shuai Wang;
496	Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories Highlight: Here we introduce Contrastive Reasoning Path Synthesis (CRPS), a framework that transforms supervision extraction from a filtering process into a synthesis procedure.	Peiyang Liu; Zhirui Chen; Xi Wang; Di Liang; Youru Li; Zhi Cai; Wei Ye;
497	From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment Highlight: This paper introduces a comprehensive framework for scalable personalized alignment of LLMs.	Jia-Nan Li; Jian Guan; Songhao Wu; Wei Wu; Rui Yan;
498	Simple-VGC: Enhancing Visual Grounding in Multimodal Reasoning Via Adaptive Tool Composition Highlight: We identify three fundamental types of visual grounding failures: Long-Context Grounding Error, where visual information gradually decays over long sequences; Fine-Grained Grounding Error, where low-resolution or degraded inputs hinder the recovery of detailed visual information; and Regional Grounding Error, where spatially diffuse attention weakens region-level vision-language alignment. To address these issues, we propose a tool-augmented reasoning framework with three targeted compensation strategies: reuse, which re-injects the original image to mitigate visual forgetting; focus_area, which constrains attention to task-relevant regions; and zoom_in, which enhances visual resolution for fine-grained perception.	Ye Wang; Qianglong Chen; Siyuan Wang; Zejun Li; Shijie Guo; Zhirui Zhang; Zhongyu Wei;
499	VoxMind: An End-to-End Agentic Spoken Dialogue System Highlight: Yet, existing research has largely concentrated on core perception and generation, with comparatively limited exploration of such tool-augmented extensions. To bridge this gap, we present VoxMind, an integrated framework designed to equip end-to-end spoken dialogue models with comprehensive agentic abilities.	Tianle Liang; Yifu Chen; Shengpeng Ji; Yijun Chen; Zhiyang Jia; Jingyu Lu; Fan Zhuo; Xueyi Pu; Yangzhuo Li; Zhou Zhao;
500	Self-SoftCoT: A Self-Consistent Framework Via Position-Aware Latent Space Reinforcement Learning Highlight: Existing continuous reasoning approaches, such as SoftCoT, mitigate this but typically rely on external auxiliary models, resulting in complex deployment and fractured inference pipelines. To address these challenges, we propose Self-SoftCoT, a self-contained framework that enables a frozen LLM to internally generate and consume latent thoughts without external assistants.	Liangliang Dong; Lianlei Shan; Shuaimin Li;

This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~2,400 papers), please visit Paper Digest: ACL-2026 (Full List).