ICLR 2026 Papers with Code & Data
To facilitate rapid community engagement with the presented research, we have compiled an index of accepted papers with associated public code or data repositories, listed in the table below. The index was generated by an automated extraction process; while we strive for completeness, some papers with public resources may have been missed. Please let us know if you discover additional papers that should be included. Note that some code repositories may not become fully public until the conference officially begins.
In addition to this index, we encourage readers to explore our related resources:
- ICLR-2026 Papers & Highlights: curated summaries and key takeaways from this year's conference.
- "Best Paper" Digest (ICLR): a historical overview of the most influential ICLR papers published since 2018.
Since 2018, Paper Digest has built a foundation of data spanning decades of conferences, journals, and research topics. The platform features a daily digest service that sifts through tens of thousands of new papers, clinical trials, news articles, and community posts, filtering the noise to highlight what matters most to specific interests. Beyond daily updates, dozens of built-in research tools streamline the academic workflow, supporting efficient reading and writing, comprehensive literature reviews, and automated research report generation.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: ICLR 2026 Papers with Code & Data
| Paper | Author(s) | Code | |
|---|---|---|---|
| 1 | Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Cosmos Policy, a simple approach for adapting a large pretrained video model (Cosmos-Predict2) into an effective robot policy through a single stage of post-training on the robot demonstration data collected on the target platform, with no architectural modifications. |
Moo Jin Kim; Yihuai Gao; Tsung-Yi Lin; Yen-Chen Lin; Yunhao Ge; Grace Lam; Percy Liang; Shuran Song; Ming-Yu Liu; Chelsea Finn; Jinwei Gu; | code |
| 2 | Revisiting Multimodal Positional Encoding in Vision–Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Through extensive experiments, we identify three key guidelines: positional coherence, full frequency utilization, and preservation of textual priors—ensuring unambiguous layout, rich representation, and faithful transfer from the pre-trained LLM. Based on these insights, we propose Multi-Head RoPE (MHRoPE) and MRoPE-Interleave (MRoPE-I), two simple and plug-and-play variants that require no architectural changes. |
Jie Huang; Xuejing Liu; Sibo Song; RuiBing Hou; Hong Chang; Junyang Lin; Shuai Bai; | code |
| 3 | A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present A$^2$Search, an annotation-free, end-to-end training framework to recognize and handle ambiguity. |
Fengji Zhang; Xinyao Niu; Chengyang Ying; Guancheng Lin; Zhongkai Hao; Zhou Fan; Chengen Huang; Jacky Keung; Bei Chen; Junyang Lin; | code |
| 4 | SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The comparison reveals significant inconsistencies between these metrics, making them impractical and inaccurate criteria for guiding SBDD methods toward synthesizable drug design. Therefore, we propose a simple yet effective SE(3)-invariant \textit{\underline{SYN}thesizability \underline{C}lassifier} (SYNC) to enable better synthesizability estimation in SBDD, which demonstrates superior generalizability and speed compared to existing metrics on five curated datasets. |
Yunfan Liu; Lirong Wu; Zhifeng Gao; Yufei Huang; Cheng Tan; Haitao Lin; Zicheng Liu; Changxi Chi; Chang Yu; Stan Z. Li; | code |
| 5 | Lyra: Generative 3D Scene Reconstruction Via Video Diffusion Model Self-Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a self-distillation framework that aims to distill the implicit 3D knowledge in the video diffusion models into an explicit 3D Gaussian Splatting (3DGS) representation, eliminating the need for multi-view training data. |
Sherwin Bahmani; Tianchang Shen; Jiawei Ren; Jiahui Huang; Yifeng Jiang; Haithem Turki; Andrea Tagliasacchi; David B. Lindell; Zan Gojcic; Sanja Fidler; Huan Ling; Jun Gao; Xuanchi Ren; | code |
| 6 | Ctrl-World: A Controllable Generative World Model for Robot Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we make a step forward by introducing a controllable multi-view world model that can be used to evaluate and improve the instruction-following ability of generalist robot policies. |
Yanjiang Guo; Lucy Xiaoyang Shi; Jianyu Chen; Chelsea Finn; | code |
| 7 | 3D Aware Region Prompted Vision Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Spatial Region 3D (SR-3D) aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. |
An-Chieh Cheng; Yang Fu; Yukang Chen; Zhijian Liu; Xiaolong Li; Subhashree Radhakrishnan; Song Han; Yao Lu; Jan Kautz; Pavlo Molchanov; Hongxu Yin; Xiaolong Wang; Sifei Liu; | code |
| 8 | Scaling Up Memory for Robotic Control Via Experience Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a hierarchical policy framework, where the high-level policy is trained to select and track previous task-relevant keyframes from its experience. |
Ajay Sridhar; Jennifer Pan; Satvik Sharma; Chelsea Finn; | code |
| 9 | Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Detecting these semantic-level anomalies is essential for assessing the trustworthiness of AIGC media, especially in AIGC image analysis, explainable deepfake detection and semantic authenticity assessment.In this paper, we formalize \textbf{semantic anomaly detection and reasoning} for AIGC images and introduce \textbf{AnomReason}, a large-scale benchmark with structured annotations as quadruples \emph{(Name, Phenomenon, Reasoning, Severity)}.We will release code, metrics, data, and task-aligned models to support reproducible research on semantic authenticity and interpretable AIGC forensics. |
Chuangchuang Tan; Xiang Ming; Jinglu Wang; Renshuai Tao; Bin Li; Yunchao Wei; Yao Zhao; Yan Lu; | code |
| 10 | ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present ChronoEdit, a framework that reframes image editing as a video generation problem. |
Jay Zhangjie Wu; Xuanchi Ren; Tianchang Shen; Tianshi Cao; Kai He; Yifan Lu; Ruiyuan Gao; Enze Xie; Shiyi Lan; Jose M. Alvarez; Jun Gao; Sanja Fidler; Zian Wang; Huan Ling; | code |
| 11 | YuE: Scaling Open Foundation Models for Long-Form Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We tackle the task of long-form music generation, particularly the challenging \textbf{lyrics-to-song} problem, by introducing \textbf{YuE (乐)}, a family of open-source music generation foundation models. |
Ruibin Yuan; Hanfeng Lin; Shuyue Guo; Ge Zhang; Jiahao Pan; Yongyi Zang; Haohe Liu; Yiming Liang; Wenye Ma; Xingjian Du; Xeron Du; Zhen Ye; Tianyu Zheng; Zhengxuan Jiang; Yinghao Ma; Minghao Liu; Zeyue Tian; Ziya Zhou; Liumeng Xue; Xingwei Qu; Yizhi LI; Shangda Wu; Tianhao Shen; Ziyang Ma; Jun Zhan; Chunhui Wang; Yatian Wang; Xiaowei Chi; Xinyue Zhang; Zhenzhu Yang; XiangzhouWang; Shansong Liu; Lingrui Mei; Peng Li; Junjie Wang; Jianwei Yu; Guojian Pang; Xu Li; Zihao Wang; Xiaohuan Zhou; Lijun Yu; Emmanouil Benetos; Yong Chen; Chenghua Lin; Xie Chen; Gus Xia; Zhaoxiang Zhang; Chao Zhang; Wenhu Chen; Xinyu Zhou; Xipeng Qiu; Roger Dannenberg; Jiaheng Liu; Jian Yang; Wenhao Huang; Wei Xue; Xu Tan; Yike Guo; | code |
| 12 | Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building on a systematic pipeline of information-based feature selection and additive feature modeling, we introduce RAGLens, a lightweight hallucination detector that accurately flags unfaithful RAG outputs using LLM internal representations. |
Guangzhi Xiong; Zhenghao He; Bohan Liu; Sanchit Sinha; Aidong Zhang; | code |
| 13 | TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, massive model sizes demand heavy compute, restricting practical deployments and real-time applications. To address this, we propose TSPulse, an ultra-light pre-trained model (1M parameters) that performs disentangled masked reconstruction across spaces and abstraction levels, explicitly learning three disentangled views: temporal embeddings for fine-grained time analysis, spectral embeddings for frequency-aware fidelity, and semantic embeddings for high-level task understanding. |
Vijay Ekambaram; Subodh Kumar; Arindam Jati; Sumanta Mukherjee; Tomoya Sakai; Pankaj Dayama; Wesley M. Gifford; Jayant Kalagnanam; | code |
| 14 | Music Flamingo: Scaling Music Understanding in Audio Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Music Flamingo, a novel large audio–language model, designed to advance music (including song) understanding in foundational audio models.We believe this work provides both a benchmark and a foundation for the community to build the next generation of models that engage with music as richly and meaningfully as humans do. |
Sreyan Ghosh; Arushi Goel; Lasha Koroshinadze; Sang-gil Lee; Zhifeng Kong; Joao Felipe Santos; Ramani Duraiswami; Dinesh Manocha; Wei Ping; Mohammad Shoeybi; Bryan Catanzaro; | code |
| 15 | Mixture of Contexts for Long Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We recast long-context video generation as an internal information retrieval task and propose a simple, learnable sparse attention routing module, Mixture of Contexts (MoC), as an effective long-term memory retrieval engine. |
Shengqu Cai; Ceyuan Yang; Lvmin Zhang; Yuwei Guo; Junfei Xiao; Ziyan Yang; Yinghao Xu; Zhenheng Yang; Alan Yuille; Leonidas Guibas; Maneesh Agrawala; Lu Jiang; Gordon Wetzstein; | code |
| 16 | GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We argue that the interpretable nature of language often provides a much richer learning medium for LLMs, compared to policy gradients derived from sparse, scalar rewards. To test this, we introduce GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. |
Lakshya A Agrawal; Shangyin Tan; Dilara Soylu; Noah Ziems; Rishi Khare; Krista Opsahl-Ong; Arnav Singhvi; Herumb Shandilya; Michael J Ryan; Meng Jiang; Christopher Potts; Koushik Sen; Alex Dimakis; Ion Stoica; Dan Klein; Matei Zaharia; Omar Khattab; | code |
| 17 | CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, Constructing such a benchmark is challenging owing to sparse interoperating code in real-world multi-programming-language projects, diverse Inter-process Communication (IPC) mechanisms, vast Foreign Function Interface (FFI) language pairs, and the difficulty of evaluation. To address this gap, we introduce CrossPL, the first benchmark for systematically assessing LLM performance of CPL code generation across two primary interoperation modes and 2534 tasks, specifically 1,982 IPC tasks spanning six languages and 522 Python–C FFI tasks. |
zhanhang xiong; Dongxia Wang; Yuekang Li; Xinyuan An; Wenhai Wang; | code |
| 18 | Does FLUX Already Know How to Perform Physically Plausible Image Composition? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SHINE, a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors. |
Shilin Lu; Zhuming Lian; Zihan Zhou; Shaocong Zhang; Chen Zhao; Adams Wai-Kin Kong; | code |
| 19 | Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce LGTM (Less Gaussians, Texture More), a feed-forward and pose-free framework that predicts both compact geometric primitives and associated per-primitive texture maps in a single forward pass without per-scene optimization. |
Yixing Lao; Xuyang BAI; Xiaoyang Wu; Nuoyuan Yan; Zixin Luo; Tian Fang; Jean-Daniel Nahmias; Yanghai Tsin; Shiwei Li; Hengshuang Zhao; | code |
| 20 | UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce UFO-4D, a unified feedforward framework to reconstruct a dense, explicit 4D representation from just a pair of unposed images. |
Junhwa Hur; Charles Herrmann; Songyou Peng; Philipp Henzler; Zeyu Ma; Todd Zickler; Deqing Sun; | code |
| 21 | Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. |
Zihan Wang; Jiashun Wang; Jeff Tan; Yiwen Zhao; Jessica K. Hodgins; Shubham Tulsiani; Deva Ramanan; | code |
| 22 | AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making Via Multi-Turn RL Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the open-source community currently lacks a unified RL framework capable of training agents from scratch across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a modular and decoupled framework specifically designed for RL-based agent in multi-turn decision-making tasks. |
Zhiheng Xi; Jixuan Huang; Chenyang Liao; Baodai Huang; Jiaqi Liu; Honglin Guo; yajie yang; Rui Zheng; Junjie Ye; Jiazheng Zhang; Wenxiang Chen; Wei He; Yiwen Ding; Guanyu Li; Zehui Chen; Zhengyin Du; Xuesong Yao; Yufei Xu; Jiecao Chen; Tao Gui; Zuxuan Wu; Qi Zhang; Xuanjing Huang; Yu-Gang Jiang; | code |
| 23 | ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A key limitation, however, is their failure to learn from this accumulated experience, forcing them to discard valuable insights and repeat past errors. Unlike prior works that primarily store raw experience or successful routines, we propose ReasoningBank, a novel memory framework that allows an agent to self-curate generalizable reasoning strategies from both its successful and failed experiences for future leverage. |
Siru Ouyang; Jun Yan; I-Hung Hsu; Yanfei Chen; Ke Jiang; Zifeng Wang; Rujun Han; Long Le; Samira Daruki; Xiangru Tang; Vishy Tirumalashetty; George Lee; Mahsan Rofouei; Hangfei Lin; Jiawei Han; Chen-Yu Lee; Tomas Pfister; | code |
| 24 | VADv2: End-to-End Vectorized Autonomous Driving Via Probabilistic Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a probabilistic planning model for end-to-end autonomous driving, termed VADv2.We also provide comprehensive evaluations on the NAVSIM dataset and a large-scale 3DGS-based benchmark, demonstrating its effectiveness in real-world applications. |
Bo Jiang; Shaoyu Chen; Hao Gao; Bencheng Liao; Qian Zhang; Wenyu Liu; Xinggang Wang; | code |
| 25 | Deep Think with Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. |
Yichao Fu; Xuewei Wang; Hao Zhang; Yuandong Tian; Jiawei Zhao; | code |
| 26 | Uniform Discrete Diffusion with Metric Path for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit discrete generative modeling and present Uniform discRete diffuSion with metric pAth (URSA), a simple yet powerful framework that bridges the gap with continuous approaches for the scalable video generation. |
Haoge Deng; Ting Pan; Fan Zhang; Yang Liu; Zhuoyan Luo; Yufeng Cui; Wenxuan Wang; Chunhua Shen; Shiguang Shan; Zhaoxiang Zhang; Xinlong Wang; | code |
| 27 | AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce AlphaBench, the first systematic benchmark for evaluating LLMs in FAFM. |
Haochen Luo; Ho Tin Ko; Jiandong Chen; David Sun; Yuan Zhang; Chen Liu; | code |
| 28 | Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In parallel, text describing the visual world proves crucial, though its performance impact saturates rapidly. Leveraging these insights, we propose a data-centric recipe for pre-training vision-aware LLMs and verify it in 1T token scale pre-training. |
Junlin Han; Shengbang Tong; David Fan; Yufan Ren; Koustuv Sinha; Philip Torr; Filippos Kokkinos; | code |
| 29 | Locality-Attending Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we seek to enhance the segmentation performance of vision transformers after being trained using the usual image-level classification objective. |
Sina Hajimiri; Farzad Beizaee; Fereshteh Shakeri; Christian Desrosiers; Ismail Ben Ayed; Jose Dolz; | code |
| 30 | ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. |
Hanyu Lai; Xiao Liu; Yanxiao Zhao; Han Xu; Hanchen Zhang; Bohao Jing; Yanyu Ren; Shuntian Yao; Yuxiao Dong; Jie Tang; | code |
| 31 | YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images. |
Botao Ye; Boqi Chen; Haofei Xu; Daniel Barath; Marc Pollefeys; | code |
| 32 | A Noise Is Worth Diffusion Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a noise refinement framework where a refining network is trained to minimize the difference between images generated by unguided sampling from the refined noise and those produced by guided sampling from the input Gaussian noise. |
Donghoon Ahn; Jiwon Kang; Sanghyun Lee; Jaewon Min; Minjae Kim; Wooseok Jang; Hyoungwon Cho; Sayak Paul; SeonHwa Kim; Eunju Cha; Kyong Hwan Jin; Seungryong Kim; | code |
| 33 | Learning to Reason Without External Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model’s own confidence—termed self-certainty—as its sole reward signal. |
Xuandong Zhao; Zhewei Kang; Aosong Feng; Sergey Levine; Dawn Song; | code |
| 34 | Geometric-Mean Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we propose Geometric-Mean Policy Optimization (GMPO), with the aim to improve the stability of GRPO through suppressing token reward outliers. |
Yuzhong Zhao; Yue Liu; Junpeng Liu; Jingye Chen; Xun Wu; Yaru Hao; Tengchao Lv; Shaohan Huang; Lei Cui; Qixiang Ye; Fang Wan; Furu Wei; | code |
| 35 | FlashDLM: Accelerating Diffusion Language Model Inference Via Efficient KV Caching and Guided Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, parallel token generation introduces token incoherence problems, and current sampling heuristics suffer from significant quality drops with decreasing denoising steps. We address these limitations with two training-free techniques. |
Zhanqiu Hu; Jian Meng; Yash Akhauri; Mohamed S. Abdelfattah; Jae-sun Seo; Zhiru Zhang; Udit Gupta; | code |
| 36 | MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Scene graphs are a natural choice, yet prior work often separates spatial and functional relations, treats scenes as static snapshots without object states or temporal updates, and overlooks information most relevant for accomplishing the current task. To overcome these shortcomings, we introduce MomaGraph, a unified scene representation for embodied agents that integrates spatial-functional relationships and part-level interactive elements. |
Yuanchen Ju; Yongyuan Liang; Yen-Jen Wang; Gireesh Nandiraju; Yuanliang Ju; Seungjae Lee; Qiao Gu; Elvis Hsieh; Furong Huang; Koushil Sreenath; | code |
| 37 | Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We argue that interleaved image-text inputs offer richer and less biased context and enable robots to better handle unseen tasks with more versatile human-robot interaction. Building on this insight, we introduce Interleave-VLA, a robot learning paradigm extending interleaved image-text instructions from digital world to directly generating continuous action sequences in the physical world. |
Cunxin Fan; Xiaosong Jia; Yihang Sun; Yixiao Wang; Jianglan Wei; Ziyang Gong; Xiangyu Zhao; Masayoshi Tomizuka; Xue Yang; Junchi Yan; Mingyu Ding; | code |
| 38 | VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this, we introduce VideoPhy-2, an action-centric dataset for evaluating physical commonsense in generated videos.We will release the dataset, videos, auto-rater model, and code in the camera-ready version. |
Hritik Bansal; Clark Peng; Yonatan Bitton; Roman Goldenberg; Aditya Grover; Kai-Wei Chang; | code |
| 39 | Manipulation As in Simulation: Enabling Accurate Geometry Perception in Robots Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Camera Depth Models (CDMs) as a simple plugin on daily-use depth cameras, which take RGB images and raw depth signals as input and output denoised, accurate metric depth. |
Minghuan Liu; Zhengbang Zhu; Xiaoshen Han; PengHu; Haotong Lin; Xinyao Li; Jingxiao Chen; Jiafeng Xu; Yichu Yang; Yunfeng Lin; Xinghang Li; Yong Yu; Weinan Zhang; Tao Kong; Bingyi Kang; | code |
| 40 | BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose BFM-Zero, a framework that learns an effective shared latent representation that embeds motions, goals, and rewards into a common space, enabling a single policy to be prompted for multiple downstream tasks without retraining. |
Yitang Li; Zhengyi Luo; Tonghe Zhang; Cunxi Dai; Anssi Kanervisto; Andrea Tirinzoni; Haoyang Weng; Kris Kitani; Mateusz Guzek; Ahmed Touati; Alessandro Lazaric; Matteo Pirotta; Guanya Shi; | code |
| 41 | BOAD: Discovering Hierarchical Software Engineering Agents Via Bandit Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by how human engineers decompose problems into sub-tasks, we argue that SWE agents should be structured as orchestrators coordinating specialized sub-agents, each responsible for a specific sub-task such as bug reproduction, fault localization, code modification, or validation. |
Iris Xu; Guangtao Zeng; Zexue He; Charles Jin; Aldo Pareja; Dan Gutfreund; Chuang Gan; Zhang-Wei Hong; | code |
| 42 | AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce AgentSynth, a scalable and cost-efficient pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents. |
Jingxu Xie; Dylan Xu; Xuandong Zhao; Dawn Song; | code |
| 43 | Agentic Reinforced Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current RL algorithms typically employ trajectory-level rollout sampling, consistently neglecting the fine-grained exploration of multi-turn tool-call steps. To bridge this gap, we propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents. |
Guanting Dong; Hangyu Mao; Kai Ma; Licheng Bao; Yifei Chen; Zhongyuan Wang; Zhongxia Chen; Jiazhen Du; Huiyang Wang; Fuzheng Zhang; Guorui Zhou; Yutao Zhu; Ji-Rong Wen; Zhicheng Dou; | code |
| 44 | WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although multiple prior works have proposed efficient SVD variants to enable low-rank operations, we find that in practice it remains difficult to achieve substantial latency reduction during model execution. To address this limitation, we introduce a new computational pattern and apply SVD at a finer granularity, enabling real and measurable improvements in execution latency. |
Haiyu Wang; Yutong Wang; Jack Jiang; Sai Qian Zhang; | code |
| 45 | Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this issue, we propose the reasoning MLLM, Vision-R1, to improve multimodal reasoning capability.Specifically, we first construct a high-quality multimodal CoT dataset without human annotations by leveraging an existing MLLM and DeepSeek-R1 through modality bridging and data filtering to obtain a 200K multimodal CoT dataset, Vision-R1-cold dataset. |
Wenxuan Huang; Bohan Jia; Shaosheng Cao; Zheyu Ye; Fei zhao; Zhe Xu; Yao Hu; Shaohui Lin; | code |
| 46 | Interleaving Reasoning for Better Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Interleaving Reasoning Generation (IRG), a framework that alternates between text-based thinking and image synthesis: the model first produces a text-based thinking to guide an initial image, then reflects on the result to refine fine-grained details, visual quality, and aesthetics while preserving semantics. |
Wenxuan Huang; Shuang Chen; Zheyong Xie; Shaosheng Cao; SHIXIANG TANG; Yufan Shen; Qingyu Yin; Wenbo Hu; Xiaoman Wang; Yuntian Tang; Junbo Qiao; Hangyu Guo; Yao Hu; Zhenfei Yin; Philip Torr; Yu Cheng; Wanli Ouyang; Shaohui Lin; | code |
| 47 | CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent rotation-based smoothing techniques alleviate the problem by redistributing outlier magnitudes, residual errors remain and continue to impede reliable low-precision deployment. In this work, we tackle this challenge by introducing a unified quantization-and-clustering scheme that contains smoothing activation outliers via learnable rotation and absorbing weight outliers into fine-tuned cluster centroids for MoE. |
Xiangyang Yin; Xingyu Liu; Tianhua Xia; BO BAO; Vithursan Thangarasa; Valavan Manohararajah; Eric Sather; Sai Qian Zhang; | code |
| 48 | From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce AutoExperiment, a benchmark that evaluates AI agents’ ability to implement and run machine learning experiments based on natural language descriptions in research papers. |
Gyeongwon James Kim; Alex Wilf; Louis-Philippe Morency; Daniel Fried; | code |
| 49 | NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms. |
Run Luo; Xiaobo Xia; Lu Wang; Longze Chen; Renke Shan; Jing Luo; Min Yang; Tat-Seng Chua; | code |
| 50 | When AI Agents Collude Online: Financial Fraud Risks By Collaborative LLM Agents on Social Platforms Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate the risks of collective financial fraud in large-scale multi-agent systems, driven by large language model (LLM) agents. |
Qibing Ren; Zhijie Zheng; Jiaxuan Guo; Junchi Yan; Lizhuang Ma; Jing Shao; | code |
| 51 | Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, an overlooked yet potentially powerful question is: can one leverage auxiliary $\textit{unpaired}$ multimodal data to directly enhance representation learning in a $\textit{target}$ modality? We introduce $\textbf{UML}$: $\textbf{U}$npaired $\textbf{M}$ultimodal $\textbf{L}$earner, a modality-agnostic training paradigm in which a single model alternately processes inputs from different modalities while sharing parameters across them. |
Sharut Gupta; Shobhita Sundaram; Chenyu Wang; Stefanie Jegelka; Phillip Isola; | code |
| 52 | Prosperity Before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We revisit this challenge and uncover a \emph{prosperity-before-collapse} phenomenon: stale data can be as informative as on-policy data if exploited properly. Building on this insight, we introduce M2PO (Second-Moment Trust Proxy Optimization), which constrains the second moment of importance weights to suppress only extreme outliers while preserving informative updates. |
Haizhong Zheng; Jiawei Zhao; Beidi Chen; | code |
| 53 | RobotArena $\infty$: Scalable Robot Benchmarking Via Real-to-Sim Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As policies expand in scope and complexity, these barriers only intensify, since defining “success” in robotics often hinges on nuanced human judgments of execution quality. In this paper, we introduce a new benchmarking framework that overcomes these challenges by shifting VLA evaluation into large-scale simulated environments augmented with online human feedback. |
Yash Jangir; Yidi Zhang; Kashu Yamazaki; Chenyu Zhang; Kuan-Hsun Tu; Tsung-Wei Ke; Lei Ke; Yonatan Bisk; Katerina Fragkiadaki; | code |
| 54 | SealQA: Raising The Bar for Reasoning in Search-Augmented Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SealQA, a challenge benchmark for evaluating SEarch-Augmented Language models on fact-seeking questions where web search yields conflicting, noisy, or unhelpful results. |
Thinh Pham; Nguyen Phan Nguyen; Pratibha Zunjare; Weiyuan Chen; Yu-Min Tseng; Tu Vu; | code |
| 55 | Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a theoretical analysis of how a common workaround, training for multiple epochs on the same dataset, reshapes the data scaling laws. |
Tingkai Yan; Haodong Wen; Binghui Li; Kairong Luo; Wenguang Chen; Kaifeng Lyu; | code |
| 56 | TTT3R: 3D Reconstruction As Test-Time Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit the 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. |
Xingyu Chen; Yue Chen; Yuliang Xiu; Andreas Geiger; Anpei Chen; | code |
| 57 | FALCON: Few-step Accurate Likelihoods for Continuous Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Few-step Accurate Likelihoods for Continuous Flows (FALCON), a method which allows for few-step sampling with a likelihood accurate enough for importance sampling applications by introducing a hybrid training objective that encourages invertibility. |
Danyal Rehman; Tara Akhound-Sadegh; Artem Gazizov; Yoshua Bengio; Alexander Tong; | code |
| 58 | GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present GoT-R1, a framework that applies reinforcement learning to enhance semantic-spatial reasoning in autoregressive visual generation models. |
Chengqi Duan; Rongyao Fang; Yuqing Wang; Kun Wang; Linjiang Huang; Xingyu Zeng; Hongsheng Li; Xihui Liu; | code |
| 59 | MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present an online agentic reinforcement learning framework MOBILERL to enhance GUI agents in mobile environments. |
Yifan Xu; Xiao Liu; Xinghan Liu; Jiaqi Fu; Jiayu Huang; Hanchen Zhang; Bohao Jing; Shudan Zhang; Yuting Wang; Zhao wenyi; Yuxiao Dong; | code |
| 60 | Spatially Guided Training for Vision-Language-Action Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce SP-VLA, a dual-system **V**ision–**L**anguage–**A**ction framework that leverages **S**patial **P**riors as a bridge between linguistic instructions and embodiment-specific control. We will release code, data, and model checkpoints to support future research. |
Jinhui Ye; Fangjing Wang; Ning Gao; Junqiu Yu; Zhu Yangkun; Bin Wang; Jinyu Zhang; Weiyang Jin; Yanwei Fu; Feng Zheng; Yilun Chen; Jiangmiao Pang; | code |
| 61 | Large Scale Diffusion Distillation Via Score-Regularized Continuous-Time Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our investigation reveals fundamental quality limitations of sCM in fine-detail generation, which we attribute to error accumulation and the “mode-covering” nature of its forward-divergence objective. To remedy this, we propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer. |
Kaiwen Zheng; Yuji Wang; Qianli Ma; Huayu Chen; Jintao Zhang; Yogesh Balaji; Jianfei Chen; Ming-Yu Liu; Jun Zhu; Qinsheng Zhang; | code |
| 62 | EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we use Apple Vision Pro to collect EgoDex: the largest and most diverse dataset of dexterous human manipulation to date. |
Ryan Hoque; Peide Huang; David J. Yoon; Mouli sivapurapu; Jian Zhang; | code |
| 63 | Should We Still Pretrain Encoders with Masked Language Modeling? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, it remains unclear whether these gains reflect an inherent advantage of the CLM approach or arise from confounding factors such as model and data scale. In this paper, we address this question through a series of large-scale, carefully controlled pretraining ablations, training a total of 38 models ranging from 210 million to 1 billion parameters, and conducting over 15,000 fine-tuning and evaluation runs. |
Hippolyte Gisserot-Boukhlef; Nicolas Boizard; Manuel Faysse; Duarte Miguel Alves; Emmanuel Malherbe; Andre Martins; CELINE HUDELOT; Pierre Colombo; | code |
| 64 | Prompt and Parameter Co-Optimization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, prior work has typically studied them in isolation, leaving their synergistic potential largely underexplored. To bridge this gap, in this paper, we introduce MetaTuner, a novel framework that jointly integrates prompt optimization and fine-tuning for LLM training. |
Xiaohe Bo; Rui Li; Zexu Sun; Quanyu Dai; Zeyu Zhang; Zihang Tian; Xu Chen; Zhenhua Dong; | code |
| 65 | LiveClin: A Live Clinical Benchmark Without Leakage Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The reliability of medical LLM evaluation is critically undermined by data contamination and knowledge obsolescence, leading to inflated scores on static benchmarks. To address these challenges, we introduce LiveClin, a live benchmark designed for the faithful replication of clinical practice. |
Xidong Wang; Guo shuqi; Yue Shen; Junying Chen; Jian Wang; Jinjie Gu; Ping Zhang; Lei Liu; Benyou Wang; | code |
| 66 | MobileLLM-R1: Exploring The Limits of Sub-Billion Language Model Reasoners with Open Training Recipes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit the necessity of scaling to extremely large corpora (>10T tokens) for reasoning emergence. |
Changsheng Zhao; Ernie Chang; Zechun Liu; Chia-Jung Chang; Wei Wen; Chen Lai; Sheng Cao; Yuandong Tian; Raghuraman Krishnamoorthi; Yangyang Shi; Vikas Chandra; | code |
| 67 | Learn to Reason Efficiently with Adaptive Length-based Reward Shaping Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate RL-based approaches to promote reasoning efficiency. |
Wei Liu; Ruochen Zhou; Yiyun Deng; Yuzhen Huang; Junteng Liu; Yuntian Deng; Yizhe Zhang; Junxian He; | code |
| 68 | Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Surprisingly, we reveal a new and concerning risk along with the practice: the provider of the open-source LLMs can later extract the private downstream fine-tuning data through simple backdoor training, only requiring black-box access to the fine-tuned downstream model. |
Zhexin Zhang; Yuhao Sun; Junxiao Yang; Shiyao Cui; yuanchao zhang; Hongning Wang; Minlie Huang; | code |
| 69 | Much Ado About Noising: Dispelling The Myths of Generative Robotic Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. |
Chaoyi Pan; Giri Anantharaman; Nai-Chieh Huang; Claire Jin; Daniel Pfrommer; Chenyang Yuan; Frank Permenter; Guannan Qu; Nicholas Matthew Boffi; Guanya Shi; Max Simchowitz; | code |
| 70 | LogicReward: Incentivizing LLM Reasoning Via Step-Wise Logical Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prior work introduces supervision on intermediate steps but still lacks guarantees of logical soundness, which is crucial in high-stakes scenarios where logical consistency is paramount. To address this, we propose LogicReward, a novel reward system that guides model training by enforcing step-level logical correctness with a theorem prover. |
Jundong Xu; Hao Fei; Huichi Zhou; Xin Quan; Qijun Huang; Shengqiong Wu; William Yang Wang; Mong-Li Lee; Wynne Hsu; | code |
| 71 | AlphaFlow: Understanding and Improving MeanFlow Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that the MeanFlow objective naturally decomposes into two parts: trajectory flow matching and trajectory consistency. |
Huijie Zhang; Aliaksandr Siarohin; Willi Menapace; Michael Vasilkovsky; Sergey Tulyakov; Qing Qu; Ivan Skorokhodov; | code |
| 72 | Self-Forcing++: Towards Minute-Scale High-Quality Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a simple yet effective approach to mitigate quality degradation in long-horizon video generation without requiring supervision from long-video teachers or retraining on long video datasets. |
Justin Cui; Jie Wu; Ming Li; Tao Yang; Xiaojie Li; Rui Wang; Andrew Bai; Yuanhao Ban; Cho-Jui Hsieh; | code |
| 73 | EIP: Weighted Ranking of LLMs By Quantifying Question Difficulty Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing benchmarks fail to differentiate question difficulty, limiting their ability to effectively distinguish models’ capabilities. To address this limitation, we propose RankLLM, a novel framework designed to quantify both question difficulty and model competency. |
Xingjian Hu; Ziqian Zhang; Yue Huang; Kai Zhang; Ruoxi Chen; Yixin Liu; Qingsong Wen; Kaidi Xu; Xiangliang Zhang; Neil Zhenqiang Gong; Lichao Sun; | code |
| 74 | One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While conventional multi-task world models like UniZero excel in single-task settings, we find that when handling a broad and diverse suite of tasks, gradient conflicts and the loss of model plasticity often constrain their sample efficiency. In this work, we address these challenges from two complementary perspectives: the single learning iteration and the overall learning process. |
Yuan Pu; Yazhe Niu; Jia Tang; Junyu Xiong; Shuai Hu; Hongsheng Li; | code |
| 75 | DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This constraint severely degrades code infilling performance when the predefined mask size mismatches the ideal completion length. To address this, we propose DreamOn, a novel diffusion framework that enables dynamic, variable-length generation. |
Zirui Wu; Lin Zheng; Zhihui Xie; Jiacheng Ye; Jiahui Gao; Shansan Gong; Yansong Feng; Zhenguo Li; Wei Bi; Guorui Zhou; Lingpeng Kong; | code |
| 76 | Towards Efficient Constraint Handling in Neural Solvers for Routing Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present Construct-and-Refine (CaR), the first general and efficient constraint-handling framework for neural routing solvers based on explicit learning-based feasibility refinement. |
Jieyi Bi; Zhiguang Cao; Jianan Zhou; Wen Song; Yaoxin Wu; Jie Zhang; Yining Ma; Cathy Wu; | code |
| 77 | Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present a systematic and comprehensive investigation of omni detailed perception from the perspectives of the data pipeline, models, and benchmark. |
Ziyang Ma; Ruiyang Xu; Zhenghao Xing; Yunfei Chu; Yuxuan Wang; Jinzheng He; Jin Xu; Pheng-Ann Heng; Kai Yu; Junyang Lin; Eng Siong Chng; Xie Chen; | code |
| 78 | AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To better address the trade-off between safety and utility, we present a theoretically grounded and empirically effective activation steering method called AlphaSteer. |
Leheng Sheng; Changshuo Shen; Weixiang Zhao; Junfeng Fang; Xiaohao Liu; Zhenkai Liang; Xiang Wang; An Zhang; Tat-Seng Chua; | code |
| 79 | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A key challenge in aligning TTA models lies in creating preference pairs, as TTA lacks structured mechanisms like verifiable rewards or gold-standard answers available for Large Language Models (LLMs). To address this, we propose CLAP-Ranked Preference Optimization (CRPO), a novel framework that iteratively generates and optimizes preference data to enhance TTA alignment. |
Chia-Yu Hung; Navonil Majumder; Zhifeng Kong; Ambuj Mehrish; Amir Zadeh; Chuan Li; Rafael Valle; Bryan Catanzaro; Soujanya Poria; | code |
| 80 | Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This limitation arises from entangled features in overlapping regions, leading to degraded visual fidelity. To address this, we present RoboMaster, a novel framework that models inter-object dynamics via a collaborative trajectory formulation. |
Xiao Fu; Xintao Wang; Xian Liu; Jianhong Bai; Runsen Xu; Pengfei Wan; Di ZHANG; Dahua Lin; | code |
| 81 | Flow Caching for Autoregressive Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present FlowCache, the first caching framework specifically designed for autoregressive video generation. |
Yuexiao Ma; Xuzhe Zheng; Jing Xu; Xiwei Xu; Feng Ling; Xiawu Zheng; Huafeng Kuang; Huixia Li; XING WANG; Xuefeng Xiao; Fei Chao; Rongrong Ji; | code |
| 82 | Enhancing Visual Token Representations for Video Large Language Models Via Training-free Spatial-Temporal Pooling and Gridding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods, such as the LLaVA family, utilize simplistic pooling or interpolation techniques that overlook the intricate dynamics of visual tokens. To bridge this gap, we propose ST-GridPool, a novel training-free visual token enhancement method designed specifically for Video LLMs. |
Bingjun Luo; Tony Wang; Hanqi Chen; Xinpeng Ding; | code |
| 83 | ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these approaches largely overlook a critical dimension of video content, i.e., changes and turning points, and they lack a collaborative model for spatio-temporal relationships. To address this, we propose a new perspective: similarity is for identifying redundancy, while difference is for capturing key events. |
Bingjun Luo; Tony Wang; Chaoqi Chen; Xinpeng Ding; | code |
| 84 | Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel multimodal time series anomaly detection model (MindTS) that focuses on addressing two key challenges: (1) how to achieve semantically consistent alignment across heterogeneous multimodal data, and (2) how to filter out redundant modality information to enhance cross-modal interaction effectively. |
Shiyan Hu; Jianxin Jin; Yang Shu; Peng Chen; Bin Yang; Chenjuan Guo; | code |
| 85 | Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and the actual noise level encoded in intermediate states during sampling. |
Jincheng Zhong; Boyuan Jiang; Xin Tao; Pengfei Wan; Kun Gai; Mingsheng Long; | code |
| 86 | MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce *MathNet*, a large-scale, high-quality, multilingual, and multimodal dataset of Olympiad-level problems. We publicly release both the dataset and benchmark at http://mathnet.netlify.app/. |
Shaden Alshammari; Kevin Wen; Abrar Zainal; Mark Hamilton; Navid Safaei; Sultan Albarakati; William T. Freeman; Antonio Torralba; | code |
| 87 | TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As challenges remain in extracting 3D modal features and bridging the gap between different modalities, we propose TIGaussian, a framework that harnesses 3D Gaussian Splatting (3DGS) characteristics to strengthen cross-modality alignment through multi-branch 3DGS tokenizer and modality-specific 3D feature alignment strategies. |
Jiarun Liu; Qifeng Chen; Yiru Zhao; Minghua Liu; Baorui Ma; Sheng Yang; | code |
| 88 | Exploiting Low-Dimensional Manifold of Features for Few-Shot Whole Slide Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This insight reveals a key potential issue in downstream multiple instance learning models: linear layers are geometry-agnostic and, as we show empirically, can distort the manifold geometry of the features. To address this, we propose the Manifold Residual (MR) block, a plug-and-play module that is explicitly geometry-aware. |
Conghao Xiong; Zhengrui Guo; Zhe Xu; Yifei Zhang; Raymond Kai-yu Tong; Si Yong Yeo; Hao Chen; Joseph JY Sung; Irwin King; | code |
| 89 | DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present DiffWind, a physics-informed differentiable framework that unifies wind–object interaction modeling, video-based reconstruction, and forward simulation. |
Yuanhang Lei; Boming Zhao; Zesong Yang; Xingxuan Li; Tao Cheng; Haocheng Peng; Ru Zhang; yang yang; Siyuan Huang; Yujun Shen; Ruizhen Hu; Hujun Bao; Zhaopeng Cui; | code |
| 90 | Visual Multi-Agent System: Mitigating Hallucination Snowballing Via Visual Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: It leads us to identify a subset of vision tokens with a unimodal attention peak in middle layers that best preserve visual evidence but gradually diminish in deeper agent turns, resulting in the visual hallucination snowballing in MAS. Thus, we propose ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow powered by the selected visual relay tokens and applies attention reallocation to amplify this pattern. |
Xinlei Yu; Chengming Xu; Guibin Zhang; Yongbo He; Zhangquan Chen; Zhucun Xue; Jiangning Zhang; Yue Liao; Xiaobin Hu; Yu-Gang Jiang; Shuicheng YAN; | code |
| 91 | Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present STAR-MD (Spatio-Temporal Autoregressive Rollout for Molecular Dynamics), a scalable SE(3)-equivariant diffusion model that generates physically plausible protein trajectories over microsecond timescales. |
Nima Shoghi; Yuxuan Liu; Yuning Shen; Rob Brekelmans; Pan Li; Quanquan Gu; | code |
| 92 | Next Visual Granularity Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel approach to image generation by decomposing an image into a structured sequence, where each element in the sequence shares the same spatial resolution but differs in the number of unique tokens used, capturing different level of visual granularity. |
Yikai Wang; Zhouxia Wang; Zhonghua Wu; Qingyi Tao; Kang Liao; Chen Change Loy; | code |
| 93 | FACM: Flow-Anchored Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce the Flow-Anchored Consistency Model (FACM), where a Flow Matching (FM) task serves as a dynamic anchor for the primary CM shortcut objective. |
Yansong Peng; Kai Zhu; Yu Liu; Pingyu Wu; Hebei Li; Xiaoyan Sun; Feng Wu; | code |
| 94 | Measuring and Mitigating Rapport Bias of Large Language Models Under Multi-Agent Social Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce KAIROS, a benchmark simulating quiz contests with peer agents of varying reliability, offering fine-grained control over conditions such as expert–novice roles, noisy crowds, and adversarial peers. |
Maojia Song; Tej Deep Pala; Ruiwen Zhou; Weisheng Jin; Amir Zadeh; Chuan Li; Dorien Herremans; Soujanya Poria; | code |
| 95 | MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Cognitive science suggests that humans rely on working memory to buffer short-lived representations for immediate control, while the hippocampal system preserves verbatim episodic details and semantic gist of past experience for long-term memory. Inspired by these mechanisms, we propose MemoryVLA, a Cognition-Memory-Action framework for long-horizon robotic manipulation. |
Hao Shi; Bin Xie; Yingfei Liu; Lin Sun; Fengrong Liu; Tiancai Wang; Erjin Zhou; Haoqiang Fan; Xiangyu Zhang; Gao Huang; | code |
| 96 | SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In scalable applications, fine-tuning-based methods are time-consuming to precisely erase multiple target concepts, while real-time editing-based methods often degrade the generation quality of non-target concepts due to conflicting optimization objectives. To address this dilemma, we introduce SPEED, an efficient concept erasure approach that directly edits model parameters. |
Ouxiang Li; Yuan Wang; Xinting Hu; Houcheng Jiang; Tao Liang; Yanbin Hao; Guojun Ma; Fuli Feng; | code |
| 97 | StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce **StreamSplat**, a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner. |
Zike Wu; Qi Yan; Xuanyu Yi; Lele Wang; Renjie Liao; | code |
| 98 | BiasFreeBench: A Benchmark for Mitigating Bias in Large Language Model Responses Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To enable consistent evaluation across debiasing methods and bridge this gap, we introduce **BiasFreeBench**, an empirical benchmark that comprehensively compares eight mainstream bias mitigation techniques (covering four prompting-based and four training-based methods) on two test scenarios (multi-choice QA and open-ended multi-turn QA) by reorganizing existing datasets into a unified query-response setting. We release our benchmark, aiming to establish a unified testbed for bias mitigation research [https://github.com/xxupiano/BiasFreeBench](https://github.com/xxupiano/BiasFreeBench). |
Xin Xu; Xunzhi He; Churan Zhi; Ruizhe Chen; Julian McAuley; Zexue He; | code |
| 99 | Composition-Grounded Data Synthesis for Visual Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce COGS (COmposition-Grounded instruction Synthesis), a data-efficient framework for equipping MLLMs with advanced reasoning abilities from a small set of seed questions. We release the code and data at https://cogsynthesis.github.io. |
Xinyi Gu; Jiayuan Mao; Zhang-Wei Hong; Zhuoran Yu; Pengyuan Li; Dhiraj Joshi; Rogerio Feris; Zexue He; | code |
| 100 | VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite significant breakthroughs in text-based reasoning with large language models, multi-modal reasoning – especially for videos – remains limited. In this work, we fill this gap by introducing VideoMind, a novel video-language agent for temporal-grounded video reasoning. |
Ye Liu; Kevin Qinghong Lin; Chang Wen Chen; Mike Zheng Shou; | code |
| 101 | LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We formally prove that the outcome-only reward leads to exponentially vanishing gradients for the context grounding process, rendering learning intractable. To overcome this bottleneck, we introduce LongRLVR to augment the sparse answer reward with a dense and verifiable context reward. |
Guanzheng Chen; Michael Qizhe Shieh; Lidong Bing; | code |
| 102 | Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We aim to improve the reasoning capabilities of language models via reinforcement learning with verifiable rewards (RLVR). |
Shubham Parashar; Shurui Gui; Xiner Li; Hongyi Ling; Sushil Vemuri; Blake Olson; Eric Li; Yu Zhang; James Caverlee; Dileep Kalathil; Shuiwang Ji; | code |
| 103 | Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We give the first unified mathematical model that captures the core trade-offs between KV cache eviction and query routing. |
Fangzhou Wu; Sandeep Silwal; Qiuyi Zhang; | code |
| 104 | Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To fill these gaps, we present Orak, a benchmark for training and evaluating LLM agents across 12 popular video games spanning all major genres. We further release a fine-tuning dataset of expert LLM gameplay trajectories spanning multiple genres, turning general LLMs into effective game agents. |
Dongmin Park; Minkyu Kim; Beongjun Choi; Junhyuck Kim; Keon Lee; Jonghyun Lee; Inkyu Park; Byeong-Uk Lee; Jaeyoung Hwang; Jaewoo Ahn; Ameya Sunil Mahabaleshwarkar; Bilal Kartal; Pritam Biswas; Yoshi Suhara; Kangwook Lee; Jaewoong Cho; | code |
| 105 | Fastcar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on The Edge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: With the insights, we propose **FastCar** to accelerate the decode phase for the AR video generation by exploring the temporal redundancy. |
Xuan Shen; Weize Ma; Yufa Zhou; Enhao Tang; Yanyue Xie; Zhengang Li; Yifan Gong; Quanyi Wang; Henghui Ding; Yiwei Wang; Pu Zhao; Jun Lin; Jiuxiang Gu; | code |
| 106 | ReFocusEraser: Refocusing for Small Object Removal with Robust Context-Shadow Repair Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose ReFocusEraser, a two-stage framework for small object removal that combines camera-adaptive zoom-in inpainting with robust context- and shadow-aware repair. |
Qingping Zheng; Bo Huang; Yang Liu; Haoyu Zhao; Ling Zheng; Zengmao Wang; Ying Li; Jiankang Deng; | code |
| 107 | PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a new theoretical framework on SWM through the concept of proxy functions (PFs) — functions that map sentences to scalar values. |
Jiahao Huo; Shuliang Liu; Bin Wang; Junyan Zhang; Yibo Yan; Aiwei Liu; Xuming Hu; Mingxun Zhou; | code |
| 108 | AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce AssetFormer, an autoregressive Transformer-based model designed to generate modular 3D assets from textual descriptions. |
Lingting Zhu; Shengju Qian; Haidi Fan; Jiayu Dong; Zhenchao Jin; Siwei Zhou; Dong Gen; Xin Wang; Lequan Yu; | code |
| 109 | Discrete Diffusion for Bundle Construction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Accordingly, we identify two technical challenges: 1) how to effectively and efficiently model the higher-order intra-bundle relations with the growth of bundle length; and 2) how to learn item embeddings that are sufficiently discriminative while maintaining a relatively smaller search space other than the huge item set. To address these challenges, we propose DDBC, a Discrete Diffusion model for Bundle Construction. |
Teng Tu; Ai Li; Yunshan Ma; Shuo Xu; Xiaohao Liu; Haokai Ma; Liang Pang; Tat-Seng Chua; | code |
| 110 | RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods face two main issues: resolution diversity, where resizing or padding distorts forensic traces and reduces efficiency, and the modality gap, as images and videos often require separate models. To address these challenges, we propose RelayFormer, a unified framework that adapts to varying resolutions and modalities. |
Wen Huang; Jiarui Yang; Tao Dai; Jiawei Li; Shaoxiong Zhan; Bin Wang; Shu-Tao Xia; | code |
| 111 | ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ABBA, a new PEFT architecture that reparameterizes the update as a Hadamard product of two independently learnable low-rank matrices. |
Raghav Singhal; Kaustubh Ponkshe; Rohit Vartak; Praneeth Vepakomma; | code |
| 112 | VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose VisionTrim, a unified framework for training-free MLLM acceleration, integrating two effective plug-and-play modules: 1) the Dominant Vision Token Selection (DVTS) module, which preserves essential visual tokens via global-local view, and 2) the Text-Guided Vision Complement (TGVC) module, which facilitates context-aware token merging guided by textual cues. |
Hanxun Yu; Wentong Li; Xuan Qu; Song Wang; Junbo Chen; Jianke Zhu; | code |
| 113 | Glance and Focus Reinforcement for Pan-cancer Screening Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by radiologists’ glance and focus diagnostic strategy, we introduce GF-Screen, a Glance and Focus reinforcement learning framework for pan-cancer screening. |
Linshan Wu; Jia-Xin Zhuang; Hao Chen; | code |
| 114 | Rate-Distortion Optimized Pragmatic Communication for Collaborative Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While prior work has explored the empirical trade-off between task performance and communication volume, a significant gap remains in the theoretical foundation. To fill this gap, we draw on information theory and introduce a pragmatic rate-distortion theory for multi-agent collaboration, specifically formulated to analyze the performance-communication trade-off in goal-oriented multi-agent systems. |
Genjia Liu; Anning Hu; Yue Hu; Wenjun Zhang; Siheng Chen; | code |
| 115 | Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs’ General Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To obtain training data, we propose Code2Logic, a novel approach that adapts game code to synthesize game reasoning task data, thus obtaining the GameQA dataset of 30 games and 158 tasks with controllable difficulty gradation. |
Jingqi Tong; Jixin Tang; Hangcheng Li; Yurong Mou; Ming Zhang; Jun Zhao; Yanbo Wen; Fan Song; Jiahao Zhan; Yuyang Lu; Chaoran Tao; Zhiyuan Guo; Jizhou Yu; Tianhao Cheng; Zhiheng Xi; Changhao Jiang; Zhangyue Yin; Yining Zheng; Weifeng Ge; Guanhua Chen; Tao Gui; Xipeng Qiu; Qi Zhang; Xuanjing Huang; | code |
| 116 | Full-Graph Vs. Mini-Batch Training: Comprehensive Analysis from A Batch Size and Fan-Out Size Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our key contributions include: 1) We provide a novel generalization analysis using the Wasserstein distance to study the impact of the graph structure, especially the fan-out size. |
Mengfan Liu; Da Zheng; Junwei Su; Chuan Wu; | code |
| 117 | ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose ARTDECO, a unified framework that combines the efficiency of feed-forward models with the reliability of SLAM-based pipelines. |
Guanghao Li; Kerui Ren; Linning Xu; Zhewen Zheng; Changjian Jiang; Xin Gao; Bo Dai; Jian Pu; Mulin Yu; Jiangmiao Pang; | code |
| 118 | Parameters Vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In such cases, LLMs struggle to determine whether to rely more on their own parameters or the conflicting context. To address this, we propose CK-PLUG, a plug-and-play method for controlling LLMs’ reliance on parametric and contextual knowledge. |
Baolong Bi; Shenghua Liu; Yiwei Wang; Yilong Xu; Junfeng Fang; Lingrui Mei; Xueqi Cheng; | code |
| 119 | Towards Understanding The Nature of Attention with Low-Rank Sparse Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Low-Rank Sparse Attention (Lorsa), a sparse replacement model of Transformer attention layers that disentangles the original Multi-Head Self-Attention (MHSA) into individually comprehensible components. |
Zhengfu He; Junxuan Wang; Rui Lin; Xuyang Ge; Wentao Shu; Qiong Tang; Junping Zhang; Xipeng Qiu; | code |
| 120 | Dynamic Speculative Agent Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, they provide minimal user control over the tradeoff between acceleration and other performance metrics. To address these gaps, we introduce **Dynamic Speculative Planning** (DSP), an asynchronous online reinforcement learning framework that provides lossless acceleration with substantially reduced costs without requiring additional pre-deployment preparation. |
Yilin Guan; Qingfeng Lan; Fei Sun; Dujian Ding; Devang Acharya; Chi Wang; William Yang Wang; Wenyue Hua; | code |
| 121 | LoongRL: Reinforcement Learning for Advanced Reasoning Over Long Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce LoongRL, a data-driven RL method for advanced long-context reasoning. |
Siyuan Wang; Gaokai Zhang; Li Lyna Zhang; Ning Shang; Fan Yang; Dongyao Chen; Mao Yang; | code |
| 122 | Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In the meantime, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into operational code repositories. |
Minju Seo; Jinheon Baek; Seongyun Lee; Sung Ju Hwang; | code |
| 123 | Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This imbalance is counterintuitive because the understanding module is typically trained with several times more data on complex reasoning tasks than the generation module. To address this issue, we introduce *Draw-In-Mind* (DIM), a dataset comprising two complementary subsets: (**i**) DIM-T2I, containing 14M long-context image–text pairs to enhance complex instruction comprehension; and (**ii**) DIM-Edit, consisting of 233K chain-of-thought imaginations generated by GPT-4o, serving as explicit design blueprints for image edits. |
Ziyun Zeng; David Junhao Zhang; Wei Li; Mike Zheng Shou; | code |
| 124 | PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unfortunately, the patch-wise fusion framework and model-agnostic supervision hinder the exploitation of informative cues, thereby limiting performance gains. To overcome this deficiency, we introduce PromptHub, a framework that holistically strengthens multi-prompting through locality-aware fusion, concentration and alignment. |
Tianci Luo; Jinpeng Wang; Shiyu Qin; Niu Lian; Yan Feng; Bin Chen; Chun Yuan; Shu-Tao Xia; | code |
| 125 | JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation Under Jailbreak Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we propose JailNewsBench, the first benchmark for evaluating LLM robustness against jailbreak-induced fake news generation. |
Masahiro Kaneko; Ayana Niwa; Timothy Baldwin; | code |
| 126 | ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ProtoTS, a novel interpretable forecasting framework that achieves both high accuracy and transparent decision-making through modeling prototypical temporal patterns. |
Ziheng Peng; Shijie Ren; Xinyue Gu; Linxiao Yang; Xiting Wang; Liang Sun; | code |
| 127 | When to Use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This raises a critical question: Is GraphRAG really effective, and in which scenarios do graph structures provide measurable benefits for RAG systems? To address this, we propose GraphRAG-Bench, a comprehensive benchmark designed to evaluate GraphRAG models on both hierarchical knowledge retrieval and deep contextual reasoning. |
Zhishang Xiang; Chuanjie Wu; Qinggang Zhang; Shengyuan Chen; Zijin Hong; Xiao Huang; Jinsong Su; | code |
| 128 | ProRe: A Proactive Reward System for GUI Agents Via Reasoner–Actor Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing rule-based or model-based reward methods struggle to generalize to GUI agents, where access to ground-truth trajectories or application databases is often unavailable, and static trajectory-based LLM-as-a-Judge approaches suffer from limited accuracy. To address these challenges, we propose ProRe, a proactive reward system that leverages a general-purpose reasoner and domain-specific evaluator agents (actors). |
Gaole Dai; Shiqi Jiang; Ting Cao; Yuqing Yang; Yuanchun Li; Rui Tan; Mo Li; Lili Qiu; | code |
| 129 | Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose Human-MME, a rigorously curated benchmark designed to provide a more holistic evaluation of MLLMs in human-centric scene understanding. |
Yuansen Liu; Haiming Tang; Jinlong Peng; Jiangning Zhang; Xiaozhong Ji; Qingdong He; Donghao Luo; Zhenye Gan; Junwei Zhu; Yunhang Shen; Chaoyou Fu; Chengjie Wang; Xiaobin Hu; Shuicheng YAN; | code |
| 130 | Multiplayer Nash Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Multiplayer Nash Preference Optimization (MNPO), a novel framework that generalizes NLHF to the multiplayer regime. |
Fang Wu; Xu Huang; Weihao Xuan; Zhiwei Zhang; Yijia Xiao; Guancheng Wan; Xiaomin Li; Bing Hu; Peng Xia; Jure Leskovec; Yejin Choi; | code |
| 131 | DualEdit: Mitigating Safety Fallback in LLM Backdoor Editing Via Affirmation-Refusal Regulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose DualEdit, a dual-objective model editing framework that jointly promotes affirmative outputs and suppresses refusal responses. |
Houcheng Jiang; Zetong Zhao; Junfeng Fang; Haokai Ma; Ruipeng Wang; Xiang Wang; Xiangnan He; Yang Deng; | code |
| 132 | SparseD: Sparse Attention for Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These unique findings render well-studied fixed sparse attention methods in ARs largely incompatible with DLMs, as their fixed patterns fail to capture head-specific patterns in DLMs, and sparse attention applied in the early steps can lead to degradation in generation. To address these challenges, we propose **SparseD**, a novel sparse attention method for DLMs. |
Zeqing Wang; Gongfan Fang; Xinyin Ma; Xingyi Yang; Xinchao Wang; | code |
| 133 | The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces DC-CoT, the first data-centric benchmark that investigates data manipulation in chain-of-thought (CoT) distillation from method, model and data perspectives. |
Ruichen Zhang; Rana Shahroz; Zhen Tan; Dawei Li; Song Wang; Tianlong Chen; | code |
| 134 | CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce *mid-training*, the first concept and practical method that inserts a lightweight intermediate stage between the (diffusion) pre-training and the final flow map training (i.e., post-training) for vision generation. |
Zheyuan Hu; Chieh-Hsin Lai; Yuki Mitsufuji; Stefano Ermon; | code |
| 135 | Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce M3-Agent, a novel multimodal agent framework equipped with long-term memory. |
Lin Long; Yichen He; Wentao Ye; Yiyuan Pan; Yuan Lin; Hang Li; Junbo Zhao; Wei Li; | code |
| 136 | No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose an annotation-free training framework that improves both reasoning and grounding. |
Damiano Marsili; Georgia Gkioxari; | code |
| 137 | RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by recent advances in reasoning for language models, we propose RePrompt, a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. |
Mingrui Wu; Lu Wang; Pu Zhao; Fangkai Yang; Jianjin Zhang; Jianfeng Liu; Yuefeng Zhan; Weihao Han; Hao Sun; Jiayi Ji; Xiaoshuai Sun; Qingwei Lin; Weiwei Deng; Dongmei Zhang; Feng Sun; Rongrong Ji; | code |
| 138 | Graph Homophily Booster: Reimagining The Role of Discrete Features in Heterophilic Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present a simple yet effective framework called Graph Homophily Booster (GRAPHITE) to address graph heterophily. |
Ruizhong Qiu; Ting-Wei Li; Gaotang Li; Hanghang Tong; | code |
| 139 | Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose TraceRL, a trajectory-aware reinforcement learning framework for DLMs that incorporates information from inference trajectories into post-training and is applicable to both full-attention and block-attention diffusion models. |
Yinjie Wang; Ling Yang; Bowen Li; Ye Tian; Ke Shen; Mengdi Wang; | code |
| 140 | Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose VLMFP, a Dual-VLM-guided framework that can autonomously generate both PDDL problem and domain files for formal visual planning. |
Yilun Hao; Yongchao Chen; Chuchu Fan; Yang Zhang; | code |
| 141 | RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current approaches rely on natural language planning, which often produces unclear specifications, misaligned components, and brittle designs due to its inherent ambiguity and lack of structure. To address these limitations, we introduce the Repository Planning Graph (RPG), a structured representation that encodes capabilities, file structures, data flows, and functions in a unified graph. |
Jane Luo; Xin Zhang; Steven Liu; Jie Wu; Jianfeng Liu; Yiming Huang; Yangyu Huang; Chengyu Yin; Ying Xin; Yuefeng Zhan; Hao Sun; Qi Chen; Scarlett Li; Mao Yang; | code |
| 142 | UniCA: Unified Covariate Adaptation for Time Series Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their design primarily targets real-valued series, limiting their ability to handle general forecasting tasks involving diverse and often *heterogeneous covariates*—such as categorical variables and multimodal data (e.g., images, text)—which are typically task-specific and difficult to leverage during pretraining. To address this gap, we propose Unified Covariate Adaptation (UniCA), a framework to bridge TSFMs with general covariate-aware forecasting. |
Lu Han; Yu Liu; Lan Li; Qiwen Deng; Jian Jiang; Yinbo Sun; Zhe Yu; Binfeng Wang; Xingyu Lu; Lintao Ma; Han-Jia Ye; De-Chuan Zhan; | code |
| 143 | ViPRA: Video Prediction for Robot Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, most of them lack labeled actions, which limits their use in robot learning. We present *Video Prediction for Robot Actions* (**ViPRA**), a simple pretraining-finetuning framework that learns continuous robot control from these actionless videos. |
Sandeep Routray; Hengkai Pan; Unnat Jain; Shikhar Bahl; Deepak Pathak; | code |
| 144 | ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Among these, computer-using agents, capable of interacting with operating systems as humans do, are paving the way to automated scientific problem-solving and addressing routines in researchers’ workflows. Recognizing the transformative potential of these agents, we introduce ScienceBoard, which encompasses two complementary contributions: (i) a realistic, multi-domain environment featuring dynamic and visually rich scientific workflows with integrated professional software, where agents can autonomously interact via different interfaces to accelerate complex research tasks and experiments; and (ii) a challenging benchmark of 169 high-quality, rigorously validated real-world tasks curated by humans, spanning scientific-discovery workflows in domains such as biochemistry, astronomy, and geoinformatics. |
Qiushi Sun; Zhoumianze Liu; Chang Ma; Zichen Ding; Fangzhi Xu; Zhangyue Yin; Haiteng Zhao; Zhenyu Wu; Kanzhi Cheng; Zhaoyang Liu; Jianing Wang; Qintong Li; Xiangru Tang; Tianbao Xie; Xiachong Feng; Xiang Li; Ben Kao; Wenhai Wang; Biqing Qi; Lingpeng Kong; Zhiyong Wu; | code |
| 145 | DA$^{2}$: Depth Anything in Any Direction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, due to the spherical distortions inherent in panoramas, many approaches rely on perspective splitting (e.g., cubemaps), which leads to suboptimal efficiency. To address these challenges, we propose DA$^{2}$: Depth Anything in Any Direction, an accurate, zero-shot generalizable, and fully end-to-end panoramic depth estimator. |
Haodong Li; Wangguandong Zheng; Jing He; Yuhao LIU; Xin Lin; Xin Yang; Ying-Cong Chen; Chunchao Guo; | code |
| 146 | Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation As Entropy Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ERA, a new paradigm for entropy-constrained policy via output activation. |
Zilin Kang; Chonghua Liao; Tingqiang Xu; Huazhe Xu; | code |
| 147 | Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large-scale pre-training of foundation visual encoders has advanced many fields, as the enormous quantity of data helps to learn the general distribution of normal images. We observe that the anomaly amount in an image directly correlates with the difference in the learnt embeddings and utilize this to design a few-shot anomaly detector termed FoundAD. |
Guangyao Zhai; Yue Zhou; Xinyan Deng; Lars Heckler-Kram; Nassir Navab; Benjamin Busam; | code |
| 148 | The Devil Behind The Mask: An Emergent Safety Vulnerability of Diffusion LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we present **DIJA**, the first systematic study and jailbreak attack framework that exploits unique safety weaknesses of dLLMs. |
Zichen Wen; Jiashu Qu; Zhaorun Chen; Xiaoya Lu; Dongrui Liu; Zhiyuan Liu; Ruixi Wu; Yicun Yang; Xiangqi Jin; Haoyun Xu; Xuyang Liu; Weijia Li; Chaochao Lu; Jing Shao; Conghui He; Linfeng Zhang; | code |
| 149 | CARD: Towards Conditional Design of Multi-agent Topological Structures Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the effectiveness and robustness of these systems critically depend on their communication topology, which is often fixed or statically learned, ignoring real-world dynamics such as model upgrades, API (or tool) changes, or knowledge source variability. To address this limitation, we propose CARD (Conditional Agentic Graph Designer), a conditional graph-generation framework that instantiates AMACP, a protocol for adaptive multi-agent communication. |
Tongtong Wu; Yanming Li; Ziye Tang; Chen Jiang; Linhao Luo; Guilin Qi; Shirui Pan; Gholamreza Haffari; | code |
| 150 | AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To better capture cross-modal correspondence, we introduce a series of temporal consistency mechanisms that guide the vector field estimator toward learning robust audiovisual alignment, enabling accurate and resilient separation in complex scenes. |
Xize Cheng; Chenyuhao Wen; Slytherin Wang; Yongqi Wang; Zehan Wang; Rongjie Huang; Tao Jin; Zhou Zhao; | code |
| 151 | VLMgineer: Vision-Language Models As Robotic Toolsmiths Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present VLMgineer, the first fully automatic framework that designs tools and actions from scratch by harnessing the creativity of Vision–Language Models (VLMs) together with evolutionary search. To facilitate future research on automated tool invention, we will release our benchmark and code. |
George Jiayuan Gao; Tianyu Li; Junyao Shi; Yihan Li; Zizhe Zhang; Nadia Figueroa; Dinesh Jayaraman; | code |
| 152 | Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, purely vision-centric approaches may be constrained by the inherent modality gap still exhibited by modern vision-language models. In this work, we connect these challenges to the paradigm of hybrid retrieval, investigating whether a lightweight dense text retriever can enhance a stronger vision-centric model. |
Omri Uzan; Asaf Yehudai; Roi Pony; Eyal Shnarch; Ariel Gera; | code |
| 153 | From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To validate its causal role, we design training-free interventions that directly manipulate attention allocation at inference time, yielding consistent 1–2% gains without retraining. Building on these insights, we propose Attention-Guided Visual Anchoring and Reflection (AVAR), a comprehensive cold-start framework that integrates visual-anchored data synthesis, attention-guided objectives, and visual-anchored reward shaping. |
Ruilin Luo; Chufan Shi; Yizhen Zhang; Cheng Yang; Songtao Jiang; Tongkun Guan; Ruizhe Chen; Ruihang Chu; Peng Wang; Mingkun Yang; Lei Wang; Yujiu Yang; Junyang Lin; Zhibo Yang; | code |
| 154 | Generalizable End-to-End Tool-Use RL with Synthetic CodeGym Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce **CodeGym**, a scalable framework that synthesizes diverse, verifiable, and controllable multi-turn tool-use environments for agent RL, enabling LLM agents to explore and master various workflows actively. |
Weihua Du; Hailei Gong; Zhan Ling; Kang Liu; Lingfeng Shen; Xuesong Yao; Yufei Xu; Dingyuan Shi; Yiming Yang; Jiecao Chen; | code |
| 155 | Trion: FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a computationally efficient and conceptually simple, two-step procedure to approximate SVD/QR-based gradient projections into lower-dimensional spaces by using a predefined orthogonal matrix of the Discrete Cosine Transform (DCT). |
Ionut-Vlad Modoranu; Mher Safaryan; Erik Schultheis; Max Ryabinin; Artem Chumachenko; Dan Alistarh; | code |
| 156 | VCWorld: A Biological World Model for Virtual Cell Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: More critically, these models often function as black boxes, offering predictions without interpretability or consistency with biological principles, which undermines their credibility in scientific research. To address these challenges, we present VCWorld, a cell-level white-box simulator that integrates structured biological knowledge with the iterative reasoning capabilities of large language models to instantiate a biological world model. |
Zhijian Wei; Runze Ma; Zichen Wang; Zhongmin Li; Shuotong Song; Shuangjia Zheng; | code |
| 157 | Evolution of Concepts in Language Model Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we track linear interpretable feature evolution across pre-training snapshots using a sparse dictionary learning method called crosscoders. |
Xuyang Ge; Wentao Shu; Jiaxing Wu; Yunhua Zhou; Zhengfu He; Xipeng Qiu; | code |
| 158 | Nef-Net V2: Adapting Electrocardio Panorama in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents NEF-NET V2, an enhanced framework for realistic panoramic ECG synthesis that supports arbitrary-length signal synthesis from any desired view, generalizes across ECG devices, and compensates for operator-induced deviations in electrode placement. To rigorously evaluate panoramic ECG synthesis, we construct a new Electrocardio Panorama benchmark, called Panobench, comprising 4470 recordings with 48 views per subject, capturing the full spatial variability of cardiac electrical activity. |
Zehui Zhan; Yaojun Hu; Jiajing Zhang; Wanchen Lian; Wanqing Wu; Jintai Chen; | code |
| 159 | NextQuill: Causal Preference Modeling for Enhancing LLM Personalization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce NextQuill, a novel LLM personalization alignment framework grounded in causal preference modeling. |
Xiaoyan Zhao; Juntao You; Yang Zhang; Wenjie Wang; Hong Cheng; Fuli Feng; See-Kiong Ng; Tat-Seng Chua; | code |
| 160 | Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we provide the first formal treatment of textual sharpness in the discrete, semantic space of prompts, together with an operational robustness criterion over a semantic neighborhood; the design is black-box or API-only, requiring no gradients to update the model’s parameters. |
Guancheng Wan; Lucheng Fu; Haoxin Liu; Yiqiao Jin; Hui Yi Leong; Eric Hanchen Jiang; Hejia Geng; Jinhe Bi; Yunpu Ma; Xiangru Tang; B. Aditya Prakash; Yizhou Sun; Wei Wang; | code |
| 161 | SceneStreamer: Continuous Scenario Generation As Next Token Group Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose InfGen, a scenario generation framework that outputs agent states and trajectories in an autoregressive manner. |
Zhenghao Peng; Yuxin Liu; Bolei Zhou; | code |
| 162 | G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. |
Junfeng Ni; Yixin Chen; Zhifei Yang; Yu Liu; Ruijie Lu; Song-Chun Zhu; Siyuan Huang; | code |
| 163 | CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present CIMemories, a benchmark for evaluating whether LLMs appropriately control information flow from memory based on task context. |
Niloofar Mireshghallah; Neal Mangaokar; Narine Kokhlikyan; Arman Zharmagambetov; Manzil Zaheer; Saeed Mahloujifar; Kamalika Chaudhuri; | code |
| 164 | Flash-Searcher: Fast and Effective Web Agents Via DAG-Based Parallel Execution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces Flash-Searcher, a novel parallel agent reasoning framework that fundamentally reimagines the execution paradigm from sequential chains to directed acyclic graphs (DAGs). |
Tianrui Qin; Qianben Chen; Sinuo Wang; He Xing; King Zhu; He Zhu; Dingfeng Shi; Xinxin Liu; Ge Zhang; Jiaheng Liu; Xitong Gao; Yuchen Eleanor Jiang; Wangchunshu Zhou; | code |
| 165 | Lean Finder: Semantic Search for Mathlib That Understands User Intents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We further align Lean Finder with mathematicians’ preferences using diverse feedback signals, encoding it with a rich awareness of their goals from multiple perspectives. |
Jialin Lu; Kye Emond; Kaiyu Yang; Swarat Chaudhuri; Weiran Sun; Wuyang Chen; | code |
| 166 | Tokenization to Transfer: Do Genomic Foundation Models Learn Good Representations? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To assess the usefulness of pretraining in genomics, we evaluated seven different GFMs across 52 diverse genomic tasks, comparing them to their counterparts with randomly initialized weights. Across benchmarks, we find that randomly initialized models provide surprisingly strong baselines, and that tokenizer and architecture choices strongly shape both these baselines and the gains from pretraining. |
Kirill Vishniakov; Karthik Viswanathan; Aleksandr Medvedev; Praveenkumar Kanithi; Marco AF Pimentel; Ronnie Rajan; Shadab Khan; | code |
| 167 | Compositional Visual Planning Via Inference-Time Diffusion Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose that the key to stable compositional generation lies in enforcing boundary agreement on the estimated clean data (Tweedie estimates) rather than on noisy intermediate states. |
Yixin Zhang; Yunhao Luo; Utkarsh Aashu Mishra; Woo Chul Shin; Yongxin Chen; Danfei Xu; | code |
| 168 | TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While multiple benchmarks have been proposed to answer this fundamental question, most are manually curated and focus on narrow domains or specific skill sets. To address this limitation, we propose scalable methods for creating comprehensive time series reasoning benchmarks that combine the flexibility of templates with the creativity of LLM agents. |
Malgorzata Gwiazda; Yifu Cai; Mononito Goswami; Arjun Choudhry; Artur Dubrawski; | code |
| 169 | Landscape of Thoughts: Visualizing The Reasoning Process of Large Language Models Highlight: However, the reasoning behavior of LLMs remains poorly understood, posing challenges to research, development, and safety. To address this gap, we introduce landscape of thoughts (LoT), the first landscape visualization tool to inspect the reasoning trajectories with certain reasoning methods on any multi-choice dataset. |
Zhanke Zhou; Zhaocheng Zhu; Xuan Li; Mikhail Galkin; Xiao Feng; Sanmi Koyejo; Jian Tang; Bo Han; | code |
| 170 | ACPBench Hard: Unrestrained Reasoning About Action, Change, and Planning Highlight: We introduce ACPBench Hard, a dataset of generative, open-ended questions that LLMs need to answer in order to plan. |
Harsha Kokel; Michael Katz; Kavitha Srinivas; Shirin Sohrabi; | code |
| 171 | ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Highlight: Unified multimodal models (UMMs) have shown remarkable advances in jointly understanding and generating text and images. |
Yongyuan Liang; Wei Chow; Feng Li; Ziqiao Ma; Xiyao Wang; Jiageng Mao; Jiuhai Chen; Jiatao Gu; Yue Wang; Furong Huang; | code |
| 172 | JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Highlight: To bridge the gap, this paper presents JavisDiT++, a concise yet powerful framework for efficient and effective JAVG. |
Kai Liu; Yanhao Zheng; Kai Wang; Shengqiong Wu; Rongjunchen Zhang; Jiebo Luo; Dimitrios Hatzinakos; Ziwei Liu; Hao Fei; Tat-Seng Chua; | code |
| 173 | JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Highlight: This paper introduces JavisDiT, a novel Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG). Furthermore, we propose a new benchmark, JavisBench, which consists of 10,140 high-quality text-captioned sounding videos and focuses on synchronization evaluation in diverse and complex real-world scenarios. |
Kai Liu; Wei Li; Lai Chen; Shengqiong Wu; Yanhao Zheng; Jiayi Ji; Fan Zhou; Jiebo Luo; Ziwei Liu; Hao Fei; Tat-Seng Chua; | code |
| 174 | EgoTwin: Dreaming Body and View in First Person Highlight: To bridge this gap, we introduce a novel task of joint egocentric video and human motion generation, characterized by two key challenges: 1) Viewpoint Alignment: the camera trajectory in the generated video must accurately align with the head trajectory derived from human motion; 2) Causal Interplay: the synthesized human motion must causally align with the observed visual dynamics across adjacent video frames. To address these challenges, we propose EgoTwin, a joint video-motion generation framework built on the diffusion transformer architecture. |
Jingqiao Xiu; Fangzhou Hong; Yicong Li; Mengze Li; Wentao Wang; Sirui Han; Liang Pan; Ziwei Liu; | code |
| 175 | VidGuard-R1: AI-Generated Video Detection and Explanation Via Reasoning MLLMs and RL Highlight: In addition to accurate classification, it is essential that detection models provide interpretable explanations to ensure transparency for regulators and end users. To address these challenges, we introduce VidGuard-R1, the first video authenticity detector that fine-tunes a multi-modal large language model (MLLM) using group relative policy optimization (GRPO). |
Kyoungjun Park; Yifan Yang; Juheon Yi; Shicheng Zheng; Muhammad Muaz; Yifei Shen; Dongqi Han; Caihua Shan; Lili Qiu; | code |
| 176 | Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation Highlight: Modern optimizers like Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces significant memory overhead, which constrains scalability and computational efficiency. In this work, we re-frame the exponential moving average (EMA) used in these momenta as the training of a linear regressor via online gradient flow. |
Zhengbo Wang; Jian Liang; Ran He; Zilei Wang; Tieniu Tan; | code |
| 177 | Beyond Linear Probes: Dynamic Safety Monitoring for Language Models Highlight: We argue that safety monitors should be flexible: costs should rise only when inputs are difficult to assess, or when more compute is available. To achieve this, we introduce Truncated Polynomial Classifiers (TPCs), a natural extension of linear probes for dynamic activation monitoring. |
James Oldfield; Philip Torr; Ioannis Patras; Adel Bibi; Fazl Barez; | code |
| 178 | Reducing Class-Wise Performance Disparity Via Margin Regularization Highlight: In this work, we present Margin Regularization for performance disparity Reduction (MR²), a theoretically principled regularization for classification by dynamically adjusting margins in both the logit and representation spaces. |
Beier Zhu; Kesen Zhao; Jiequan Cui; Qianru Sun; Yuan Zhou; Xun Yang; Hanwang Zhang; | code |
| 179 | Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs Highlight: This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-context dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. |
Xiaoran Liu; Yuerong Song; Zhigeng Liu; Zengfeng Huang; Qipeng Guo; Zhaoxiang Liu; Shiguo Lian; Ziwei He; Xipeng Qiu; | code |
| 180 | GUI-Shift: Enhancing VLM-Based GUI Agents Through Self-supervised Reinforcement Learning Highlight: This approach eliminates the need for natural language instructions and enables scalable dataset construction from existing GUI trajectories or automated exploration. Building on this task, we propose GUI-Shift, a reinforcement learning (RL) framework that combines rule-based optimization with data filtering to improve VLM performance. |
Longxi Gao; Li Zhang; Pengzhi Gao; Wei Liu; Jian Luan; Mengwei Xu; | code |
| 181 | Reinforced Latent Reasoning for LLM-based Recommendation Highlight: In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning. |
Yang Zhang; Wenxin Xu; Xiaoyan Zhao; Wenjie Wang; Fuli Feng; Xiangnan He; Tat-Seng Chua; | code |
| 182 | MR3: Multilingual Rubric-Agnostic Reward Reasoning Models Highlight: In this paper, we introduce mR3, a massively multilingual, rubric-agnostic reward reasoning model trained on 72 languages, achieving the broadest language coverage in reward modeling to date. |
David Anugraha; Shou-Yi Hung; Zilu Tang; En-Shiun Annie Lee; Derry Tanti Wijaya; Genta Indra Winata; | code |
| 183 | There Is No VAE: End-to-End Pixel-Space Generative Modeling Via Self-Supervised Pre-Training Highlight: Pixel-space generative models are often more difficult to train and generally underperform compared to their latent-space counterparts, leaving a persistent performance and efficiency gap. In this paper, we introduce a novel two-stage training framework that closes this gap for pixel-space diffusion and consistency models. |
Jiachen Lei; Keli Liu; Julius Berner; Y HoiM; Hongkai Zheng; Jiahong Wu; Xiangxiang Chu; | code |
| 184 | CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation Highlight: Second, although some methods turn to general text-to-image models without relying on such target-specific training, they suffer from a significant distributional mismatch, as the web-scale priors encapsulated in these foundation models fail to faithfully capture the target-specific semantics, leading to suboptimal performance. To tackle these challenges, we propose Core Distribution Alignment (CoDA), a framework that enables effective DD using only an off-the-shelf text-to-image model. |
Letian Zhou; Songhua Liu; Xinchao Wang; | code |
| 185 | Mathesis: Towards Formal Theorem Proving from Natural Languages Highlight: We propose Mathesis, the first pipeline for the systematic study of formal theorem proving from natural language. |
Yu Xuejun; Jianyuan Zhong; Zijin Feng; Pengyi Zhai; Roozbeh Yousefzadeh; Wei Chong Ng; Haoxiong Liu; Ziyi Shou; Jing Xiong; Yudong Zhou; Claudia Beth Ong; Austen Jeremy Sugiarto; Yaoxi Zhang; Wai Ming Tai; Huan Cao; Dongcai Lu; Jiacheng Sun; Qiang Xu; SHEN XIN; Zhenguo Li; | code |
| 186 | Zephyrus: An Agentic Framework for Weather Science Highlight: Large language models (LLMs) excel at understanding and generating text but cannot reason about high-dimensional meteorological datasets. We bridge this gap by building a novel agentic framework for weather science. |
Sumanth Varambally; Marshall Fisher; Jas Thakker; Yiwei Chen; Zhirui Xia; Yasaman Jafari; Ruijia Niu; Manas Jain; Veeramakali Vignesh Manivannan; Zachary Novack; Luyu Han; Srikar Eranky; Salva Rühling Cachay; Taylor Berg-Kirkpatrick; Duncan Watson-Parris; Yian Ma; Rose Yu; | code |
| 187 | OCR-Reasoning Benchmark: Unveiling The True Capabilities of MLLMs in Complex Text-Rich Image Reasoning Highlight: However, their capabilities in text-rich image reasoning tasks remain understudied due to the absence of a dedicated and systematic benchmark. To address this gap, we propose OCR-Reasoning, a novel benchmark designed to systematically assess Multimodal Large Language Models on text-rich image reasoning tasks. |
Mingxin Huang; Yongxin Shi; Dezhi Peng; Songxuan Lai; Zecheng Xie; Lianwen Jin; | code |
| 188 | Part-X-MLLM: Part-aware 3D Multimodal Large Language Model Highlight: We introduce Part-X-MLLM, a native 3D multimodal large language model that unifies diverse 3D tasks by formulating them as programs in a structured, executable grammar. |
Chunshi Wang; Junliang Ye; Yunhan Yang; YANG LI; Zizhuo Lin; Jun Zhu; Zhuo Chen; Yawei Luo; Chunchao Guo; | code |
| 189 | ARES: Multimodal Adaptive Reasoning Via Difficulty-Aware Token-Level Entropy Shaping Highlight: However, these models tend to overthink on simple problems, producing unnecessarily lengthy reasoning traces, while under-exploring on challenging ones, leading to missed solutions. To address this imbalance, we propose ARES, a unified open-source framework for adaptive reasoning that dynamically allocates exploration effort based on task difficulty. |
Shuang Chen; Hangyu Guo; Yimeng Ye; Shijue Huang; Wenbo Hu; Jiayu Chen; Manyuan Zhang; Haoxi Li; Song Guo; Nanyun Peng; | code |
| 190 | H³DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning Highlight: In this work, we introduce Triply-Hierarchical Diffusion Policy (H³DP), a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. |
Yiyang Lu; Yufeng Tian; Zhecheng Yuan; Xianbang Wang; Pu Hua; Zhengrong Xue; Huazhe Xu; | code |
| 191 | Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering Highlight: To address the aforementioned issues, we propose a conjoint framework called CoCo, which captures compactness and consistency in the learned node representations for deep graph clustering. |
Wei Ju; Siyu Yi; Kangjie Zheng; Yifan Wang; Ziyue Qiao; Li Shen; Yongdao Zhou; Xiaochun Cao; Jiancheng Lv; | code |
| 192 | R1-Code-Interpreter: LLMs Reason with Code Via Supervised and Multi-stage Reinforcement Learning Highlight: We present R1-Code-Interpreter, an extension of a text-only LLM trained via multi-turn supervised fine-tuning (SFT) and reinforcement learning (RL) to autonomously generate multiple code queries during step-by-step reasoning. |
Yongchao Chen; Yueying Liu; Junwei Zhou; Yilun Hao; Jingquan Wang; Yang Zhang; Na Li; Chuchu Fan; | code |
| 193 | PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits Highlight: However, existing resources rarely provide datasets that combine behavioral descriptors with complementary modalities such as facial attributes and biographical information. To address this gap, we present PersonaX, a curated collection of multimodal datasets designed to enable comprehensive analysis of public traits across modalities. |
Loka Li; Wong Yu Kang; Minghao Fu; Guangyi Chen; Zhenhao Chen; Gongxu Luo; Yuewen Sun; Salman Khan; Peter Spirtes; Kun Zhang; | code |
| 194 | LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery Highlight: We present LLM-guided Evolution for Materials design (LLEMA), a unified framework that couples the scientific knowledge embedded in large language models with chemistry-informed evolutionary rules and memory-based refinement. |
Nikhil Abhyankar; Sanchit Kabra; Saaketh Desai; Chandan K. Reddy; | code |
| 195 | SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports Highlight: Current sports benchmarks either cover single sports or lack the detailed reasoning chains and precise visual grounding needed to robustly evaluate these core capabilities in a multi-sport context. To address this gap, we introduce SportR, the first multi-sport, large-scale benchmark designed to train and evaluate MLLMs on the fundamental reasoning required for sports intelligence. |
Haotian Xia; Haonan Ge; Junbo Zou; Hyun Woo Choi; Xuebin Zhang; Danny Suradja; Botao Rui; Ethan Tran; Wendy Jin; Zhen Ye; Xiyang Lin; Christopher Lai; Shengjie Zhang; Junwen Miao; Shichao Chen; Rhys Tracy; Vicente Ordonez; Weining Shen; Hanjie Chen; | code |
| 196 | On The Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Highlight: In this work, we present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs), addressing its limited generalization compared to reinforcement learning (RL). |
Yongliang Wu; Yizhou Zhou; Zhou Ziheng; Yingzhe Peng; Xinyu Ye; Xinting Hu; Wenbo Zhu; Lu Qi; Ming-Hsuan Yang; Xu Yang; | code |
| 197 | FastVGGT: Fast Visual Geometry Transformer Highlight: Our analysis reveals a “token collapse” phenomenon, where many tokens attend to nearly identical regions, resulting in redundant computation and inefficiency. Motivated by this finding, we propose FastVGGT, a training-free framework that strategically prunes these redundant tokens. |
You Shen; Zhipeng Zhang; Yansong Qu; Xiawu Zheng; Jiayi Ji; Shengchuan Zhang; Liujuan Cao; | code |
| 198 | In-Context Watermarks for Large Language Models Highlight: One illustrative example is the use of LLMs by dishonest reviewers in the context of academic peer review, where conference organizers have no access to the model used but still need to detect AI-generated reviews. Motivated by this gap, we introduce In-Context Watermarking (ICW), which embeds watermarks into generated text solely through prompt engineering, leveraging LLMs’ in-context learning and instruction-following abilities. |
Yepeng Liu; Xuandong Zhao; Christopher Kruegel; Dawn Song; Yuheng Bu; | code |
| 199 | Adaptive Hopfield Network: Rethinking Similarities in Associative Memory Highlight: However, existing models evaluate the quality of retrieval based on proximity, which cannot guarantee that the retrieved pattern has the strongest association with the query, failing correctness. We reframe this problem by proposing that a query is a generative variant of a stored memory pattern, and define a variant distribution to model this subtle context-dependent generative process. |
Shurong Wang; Yuqi Pan; Zhuoyang Shen; Meng Zhang; Hongwei Wang; Guoqi Li; | code |
| 200 | How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from A Binary-Matrix Perspective Highlight: In this work, we connect two fundamental questions: (1) What is the minimal set of wrong codes sufficient to represent the entire error space? |
Xianzhen Luo; JinYang Huang; Wenzhen Zheng; Qingfu Zhu; Mingzheng Xu; Yiheng Xu; YuanTao Fan; Wanxiang Che; | code |
| 201 | LiveResearchBench: A Live Benchmark for User-Centric Deep Research in The Wild Highlight: Existing benchmarks fall short of these principles, often focusing on narrow domains or posing ambiguous questions that hinder fair comparison. Guided by these principles, we introduce LiveResearchBench, a benchmark of 100 expert-curated tasks spanning daily life, enterprise, and academia, each requiring extensive, dynamic, real-time web search and synthesis. |
Jiayu Wang; Yifei Ming; Riya Dulepet; Qinglin Chen; Austin Xu; Zixuan Ke; Frederic Sala; Aws Albarghouthi; Caiming Xiong; Shafiq Joty; | code |
| 202 | Bridging Successor Measure and Online Policy Learning with Flow Matching-Based Representations Highlight: Although SM is a powerful predictive object, it lacks compact representations tailored for online RL. To address this, we introduce Successor Flow Features (SF2), a representation learning framework that bridges SM estimation with policy optimization. |
Haosen Shi; Jianda Chen; Sinno Jialin Pan; | code |
| 203 | Referring Layer Decomposition Highlight: To bridge this gap and enable both compositional understanding and controllable editing, we introduce the Referring Layer Decomposition (RLD) task, which predicts complete RGBA layers from a single RGB image, conditioned on flexible user prompts, such as spatial inputs (e.g., points, boxes, masks), natural language descriptions, or combinations thereof. We will release our dataset, evaluation tools, and model for future research. |
Fangyi Chen; Yaojie Shen; Lu Xu; Ye Yuan; Shu Zhang; Yulei Niu; Longyin Wen; | code |
| 204 | FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates Highlight: We suggest that low-frame-rate codecs’ limitations are in both insufficient semantic decoupling and insufficient time resolution at capturing transient phonetic details. This paper introduces FlexiCodec to address this limitation. |
Jiaqi Li; Yao Qian; Yuxuan Hu; leying zhang; Xiaofei Wang; Heng Lu; Manthan Thakker; Jinyu Li; sheng zhao; Zhizheng Wu; | code |
| 205 | ACADREASON: Exploring The Limits of Reasoning Models with Academic Research Problems Highlight: However, existing evaluations focus mainly on math/code contests or general tasks, while existing multi-domain academic benchmarks lack sufficient reasoning depth, leaving the field without a rigorous benchmark for high-level reasoning. To fill this gap, we introduce the ACADREASON benchmark, designed to evaluate the ability of LLMs and agents to acquire and reason over academic knowledge. |
Xin Gui; King Zhu; JinCheng Ren; Qianben Chen; Zekun Moore Wang; Yizhi LI; Xinpeng Liu; REN WENLI; Linyu Miao; Tianrui Qin; Ziqi Shu; He Zhu; Dingfeng Shi; Jiaheng Liu; Yuchen Eleanor Jiang; Minghao Liu; Ge Zhang; Wangchunshu Zhou; | code |
| 206 | UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings Highlight: In this work, we pioneer the exploration of generative embeddings, unifying embedding tasks within a generative paradigm. |
Zhibin Lan; Liqiang Niu; Fandong Meng; Jie Zhou; Jinsong Su; | code |
| 207 | SimpleGVR: A Simple Baseline for Latent-Cascaded Generative Video Super-Resolution Highlight: To address these issues, we introduce SimpleGVR, a lightweight VSR model designed to operate entirely within the latent space. To further enhance the performance and practical applicability of SimpleGVR, we introduce a set of crucial training optimizations: a detail-aware timestep sampler, a suitable noise augmentation range, and an efficient interleaving temporal unit mechanism for long-video handling. |
Liangbin Xie; Yu Li; Shian Du; Menghan Xia; Xintao Wang; Fanghua Yu; Ziyan Chen; Pengfei Wan; Jiantao Zhou; Chao Dong; | code |
| 208 | SAGE: Spatial-visual Adaptive Graph Exploration for Efficient Visual Place Recognition Highlight: We present SAGE (Spatial-visual Adaptive Graph Exploration), a unified training pipeline that enhances granular spatial–visual discrimination by jointly improving local feature aggregation, sample organization during training, and hard sample mining. |
Shunpeng Chen; Changwei Wang; Rongtao Xu; Peixingtian; yukun Song; Jinzhou Lin; Wenhao Xu; jingyizhang; Li Guo; Shibiao Xu; | code |
| 209 | Achieving Low-bit Muon Through Subspace Preservation and Grid Quantization Highlight: In this work, we investigate the low-bit compression of Muon and systematically analyze the quantization error exacerbated by orthogonalization. |
Huaijin Wu; Bingrui Li; Yebin Yang; Yi Tu; Zhanpeng Zhou; Jianfei Chen; Junchi Yan; | code |
| 210 | DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models Highlight: We introduce DriftLite, a lightweight, training-free particle-based approach that steers the inference dynamics on-the-fly with provably optimal stability control. |
Yinuo Ren; Wenhao Gao; Lexing Ying; Grant M. Rotskoff; Jiequn Han; | code |
| 211 | WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM Highlight: We introduce WAVE (unified & versatile audio-visual embeddings), the first LLM-based embedding that creates a unified representation space for text, audio, and video modalities. |
Changli Tang; Qinfan Xiao; Ke Mei; Tianyi Wang; Fengyun Rao; Chao Zhang; | code |
| 212 | VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models Highlight: We introduce VLM4VLA, a minimal adaptation pipeline that converts general-purpose VLMs into VLA policies using only a small set of new learnable parameters for fair and efficient comparison. |
Jianke Zhang; Xiaoyu Chen; Yanjiang Guo; Yucheng Hu; Jianyu Chen; | code |
| 213 | VERINA: Benchmarking Verifiable Code Generation Highlight: In this paper, we introduce VERINA (Verifiable Code Generation Arena), a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions. |
Zhe Ye; Zhengxu Yan; Jingxuan He; Timothe Kasriel; Kaiyu Yang; Dawn Song; | code |
| 214 | Skill Learning Via Policy Diversity Yields Identifiable Representations for Reinforcement Learning Highlight: Self-supervised feature learning and pretraining methods in reinforcement learning (RL) often rely on information-theoretic principles, termed mutual information skill learning (MISL). These methods aim to learn a representation of the environment while also incentivizing exploration thereof. |
Patrik Reizinger; Bálint Mucsányi; Siyuan Guo; Benjamin Eysenbach; Bernhard Schölkopf; Wieland Brendel; | code |
| 215 | TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models Highlight: We introduce TreeGRPO, a novel RL framework that dramatically improves training efficiency by recasting the denoising process as a search tree. |
Zheng Ding; Weirui Ye; | code |
| 216 | AUHead: Realistic Emotional Talking Head Generation Via Action Units Control Highlight: Current methods struggle with nuanced emotional expressions due to the lack of fine-grained emotion control. To address this issue, we introduce a novel two-stage method (AUHead) to disentangle fine-grained emotion control, i.e., Action Units (AUs), from audio and achieve controllable generation. |
Jiayi Lyu; Leigang Qu; Wenjing Zhang; Hanyu Jiang; Kai Liu; Zhenglin Zhou; Xiaobo Xia; Jian Xue; Tat-Seng Chua; | code |
| 217 | Reasoned Safety Alignment: Ensuring Jailbreak Defense Via Answer-Then-Check Highlight: In this paper, we introduce a novel safety alignment approach called Answer-Then-Check, which enhances LLM robustness against malicious prompts by applying thinking ability to mitigate jailbreaking problems before producing a final answer to the user. |
Chentao Cao; Xiaojun Xu; Bo Han; Hang Li; | code |
| 218 | Catching The Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception Highlight: Our work presents a practical and scalable solution for enhancing the fine-grained perception of MLLMs without requiring costly supervision or full model fine-tuning. |
Yuheng Shi; Xiaohuan Pei; Minjing Dong; Chang Xu; | code |
| 219 | Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition By Chain-of-Thought Reasoning Highlight: Moreover, MLLMs tend to overfit to seen sub-categories and generalize poorly to unseen ones. To address these challenges, we propose Fine-R1, an MLLM tailored for FGVR through an R1-style training framework: (1) Chain-of-Thought Supervised Fine-tuning, where we construct a high-quality FGVR CoT dataset with rationales of visual analysis, candidate sub-categories, comparison, and prediction, transitioning the model into a strong open-world classifier; and (2) Triplet Augmented Policy Optimization, where Intra-class Augmentation mixes trajectories from anchor and positive images within the same category to improve robustness to intra-class variance, while Inter-class Augmentation maximizes the response distinction conditioned on images across sub-categories to enhance discriminative ability. |
Hulingxiao He; Zijun Geng; Yuxin Peng; | code |
| 220 | Compositional Diffusion with Guided Search for Long-Horizon Planning Highlight: However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this mode averaging problem by embedding search directly within the diffusion denoising process. |
Utkarsh Aashu Mishra; David He; Yongxin Chen; Danfei Xu; | code |
| 221 | Arbitrary Generative Video Interpolation Highlight: In this work, we present ArbInterp, a novel generative VFI framework that enables efficient interpolation at any timestamp and of any length. Experimentally, we develop comprehensive benchmarks for multi-scale frame interpolation (2× to 32×) to assess generalizability across arbitrary interpolation factors. |
Guozhen Zhang; Haiguang Wang; Chunyu Wang; Yuan Zhou; Qinglin Lu; Limin Wang; | code |
| 222 | Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR Highlight: For online training, because of limited sampling and dynamically shifting output distributions, PBC estimation is difficult. Therefore, we introduce a new online variant, computed from normalized advantage and subjective uncertainty. |
Hao Yi; Yulan Hu; Xin Li; Sheng Ouyang; Lizhong Ding; Yong Liu; | code |
| 223 | Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation Highlight: In this work, we introduce Flow2GAN, a two-stage framework that combines Flow Matching training for learning generative capabilities with GAN fine-tuning for efficient few-step inference. |
Zengwei Yao; Wei Kang; Han Zhu; Liyong Guo; Lingxuan Ye; Fangjun Kuang; Weiji Zhuang; Zhaoqing Li; Zhifeng Han; Long Lin; Daniel Povey; | code |
| 224 | Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction Highlight: We introduce Goedel-Prover-V2, a family of two language models that establish a new state-of-the-art (SOTA) in open-source ATP, using the Lean proof assistant. |
Yong Lin; Shange Tang; Bohan Lyu; Ziran Yang; Jui-Hui Chung; Haoyu Zhao; Lai Jiang; Yihan Geng; Jiawei Ge; Jingruo Sun; Jiayun Wu; Jiri Gesi; Ximing Lu; David Acuna; Kaiyu Yang; Hongzhou Lin; Yejin Choi; Danqi Chen; Sanjeev Arora; Chi Jin; | code |
| 225 | Prior-free Tabular Test-time Adaptation. Highlight: In this paper, we focus on the problem of \textit{prior-free tabular test-time adaptation} where no access to source data and any prior knowledge is allowed, and we propose a novel method, \underline{P}rior-\underline{F}ree \underline{T}abular \underline{T}est-\underline{T}ime \underline{A}daptation (PFT$_3$A), which has three designs to simultaneously address label shift and feature shift without source domain or prior access. |
Rundong He; Jieming Shi; | code |
| 226 | InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models. Highlight: In this work, we introduce InternSpatial, the largest open-source dataset for spatial reasoning in VLMs, along with InternSpatial-Bench, a corresponding evaluation benchmark designed to assess spatial understanding under diverse instruction formats. |
Nianchen Deng; Lixin Gu; Shenglong Ye; Yinan He; Zhe Chen; Songze Li; Haomin Wang; Jinhui Yin; Qi Wei; Tianshuo Yang; Min Dou; Tong He; Wenqi Shao; Kaipeng Zhang; Yi Wang; Botian Shi; Yanting Zhang; Jifeng Dai; Yu Qiao; Wenhai Wang; Hongjie Zhang; | code |
| 227 | The Lie of The Average: How Class Incremental Learning Evaluation Deceives You? Highlight: To this end, we introduce the concept of extreme sequences and provide theoretical justification for their crucial role in the reliable evaluation of CIL. |
Guannan Lai; Da-Wei Zhou; Xin Yang; Han-Jia Ye; | code |
| 228 | Capacity-Aware Inference: Mitigating The Straggler Effect in Mixture of Experts. Highlight: However, under expert parallelism, MoE suffers from inference inefficiencies due to imbalanced token-to-expert assignment, where underloaded experts complete computations early but must wait for overloaded experts, leading to global delays. We define this phenomenon as the \textbf{\textit{Straggler Effect}}, as the most burdened experts dictate the overall inference latency. |
Shwai He; Weilin Cai; Jiayi Huang; Ang Li; | code |
| 229 | Visual Autoregressive Modeling for Instruction-Guided Image Editing. Highlight: In this paper, we present VAREdit, a visual autoregressive (VAR) framework that reframes image editing as a next-scale prediction problem. |
Qingyang Mao; Qi Cai; Yehao Li; Yingwei Pan; Mingyue Cheng; Ting Yao; Qi Liu; Tao Mei; | code |
| 230 | AlignFlow: Improving Flow-based Generative Models with Semi-Discrete Optimal Transport. Highlight: This paper introduces AlignFlow, a new approach using Semi-Discrete Optimal Transport (SDOT) to enhance FGM training by establishing explicit alignment between noise and data pairs. |
Lingkai Kong; Molei Tao; Yang Liu; Bryan Wang; Jinmiao Fu; Chien-Chih Wang; Huidong Liu; | code |
| 231 | Towards Lossless Memory-efficient Training of Spiking Neural Networks Via Gradient Checkpointing and Spike Compression. Highlight: In this work, we propose a novel and broadly applicable pipeline for memory-efficient SNN training that preserves BPTT’s accuracy. |
Yifan Huang; Wei Fang; Zecheng Hao; Zhengyu Ma; Yonghong Tian; | code |
| 232 | Calibrated Information Bottleneck for Trusted Multi-modal Clustering. Highlight: Moreover, unreliable or noisy pseudo-labels may lead to an overconfident clustering outcome. To address these challenges, this paper proposes a novel CaLibrated Information Bottleneck (CLIB) framework designed to learn a clustering that is both accurate and trustworthy. |
Shizhe Hu; Zhangwen Gou; Shuaiju Li; Jin Qin; Xiaoheng Jiang; Pei Lv; Mingliang Xu; | code |
| 233 | EmoPrefer: Can Large Language Models Understand Human Emotion Preferences? Highlight: To answer this, we propose **EmoPrefer**, a pioneering work exploring the potential of LLMs in decoding human emotion preferences. Specifically, we construct the first emotion preference dataset, **EmoPrefer-Data**, featuring high-quality preference annotations from experts. |
Zheng Lian; Licai Sun; Lan Chen; Haoyu Chen; Zebang Cheng; Fan Zhang; Ziyu Jia; Ziyang Ma; Fei Ma; Xiaojiang Peng; Jianhua Tao; | code |
| 234 | PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation. Highlight: We introduce **PrismAudio**, the first framework to integrate Reinforcement Learning into V2A generation with specialized Chain-of-Thought (CoT) planning. |
Huadai Liu; Kaicheng Luo; Wen Wang; Qian Chen; Peiwen Sun; Rongjie Huang; Xiangang Li; Jieping Ye; Wei Xue; | code |
| 235 | MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning. Highlight: We introduce MATA (Multi-Agent hierarchical Trainable Automaton), a multi-agent system presented as hierarchical finite-state automaton for visual reasoning whose top-level transitions are chosen by a trainable hyper agent. |
Zhixi Cai; Fucai Ke; Kevin Leo; Sukai Huang; Maria Garcia de la Banda; Peter J. Stuckey; Hamid Rezatofighi; | code |
| 236 | Joint Optimization for 4D Human-Scene Reconstruction in The Wild. Highlight: In this work, we propose JOSH, a novel optimization-based method for 4D human-scene reconstruction in the wild from monocular videos. |
Zhizheng Liu; Joe Lin; Wayne Wu; Bolei Zhou; | code |
| 237 | Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition. Highlight: Despite recent progress, few existing methods offer the flexibility to adaptively balance CI and CD strategies in response to varying channel dependencies. To address this, we propose a generic plugin, xCPD, that can adaptively model the channel-patch dependencies from the perspective of graph spectral decomposition. |
Dongyuan Li; Shun Zheng; Chang Xu; Jiang Bian; Renhe Jiang; | code |
| 238 | IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs. Highlight: In this paper, we propose a novel KV-cache management strategy called IceCache that integrates semantic token clustering with PagedAttention, a memory-efficient paging mechanism. |
Yuzhen Mao; Qitong Wang; Martin Ester; Ke Li; | code |
| 239 | CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning. Highlight: While Artificial Neural Networks (ANNs) can often omit BN, SNNs critically depend on it, limiting the adoption of SNNs for energy-efficient control on resource-constrained devices. To overcome this, we propose Confidence-adaptive and Re-calibration Batch Normalization (CaRe-BN), which introduces (i) a confidence-guided adaptive update strategy for BN statistics and (ii) a re-calibration mechanism to align distributions. |
Zijie Xu; Xinyu Shi; Yiting Dong; Zihan Huang; Zhaofei Yu; | code |
| 240 | Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models. Highlight: Building on theoretical results from BAM, we propose Plan-and-Budget, a model-agnostic, test-time framework that decomposes complex queries into sub-questions and allocates token budgets based on estimated complexity using adaptive scheduling. |
Junhong Lin; Xinyue Zeng; Jie Zhu; Song Wang; Julian Shun; Jun Wu; Dawei Zhou; | code |
| 241 | Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition. Highlight: However, training deep SNNs has critically depended on explicit normalization schemes, leading to a trade-off between performance and biological realism. To resolve this conflict, we propose a normalization-free learning framework that incorporates lateral inhibition inspired by cortical circuits. |
Peiyu Liu; Jianhao Ding; Zhaofei Yu; | code |
| 242 | SLA: Beyond Sparsity in Diffusion Transformers Via Fine-Tunable Sparse–Linear Attention. Highlight: This naturally suggests applying sparse acceleration to the first part and low-rank acceleration to the second. Based on this finding, we propose SLA (**S**parse-**L**inear **A**ttention), a trainable attention method that fuses sparse and linear attention to accelerate diffusion models. |
Jintao Zhang; Haoxu Wang; Kai Jiang; Shuo Yang; Kaiwen Zheng; Haocheng Xi; Ziteng Wang; Hongzhou Zhu; Min Zhao; Ion Stoica; Joseph E. Gonzalez; Jianfei Chen; Jun Zhu; | code |
| 243 | SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs. Highlight: Based on our findings, we propose $\textbf{S}$parse $\textbf{A}$utoencoder-guided $\textbf{S}$upervised $\textbf{F}$ine$\textbf{t}$uning (SASFT), which teaches LLMs to maintain appropriate pre-activation values of specific language features during training. |
Boyi Deng; Yu Wan; Baosong Yang; Fei Huang; Wenjie Wang; Fuli Feng; | code |
| 244 | To Compress or Not? Pushing The Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration. Highlight: In this paper, we present a theoretical and empirical study of an \emph{exponent concentration} phenomenon in GenAI weights: exponents consistently exhibit low entropy across architectures and modalities. |
Zeyu Yang; Tianyi Zhang; Jianwen Xie; Chuan Li; Zhaozhuo Xu; Anshumali Shrivastava; | code |
| 245 | Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning. Highlight: Large Language Models (LLMs) have the potential to support clinicians in this process; however, most applications of LLMs in clinical decision support suffer from one of two limitations: either they assume the unrealistic scenario of immediate availability of all patient information and do not model the interactive and iterative investigation process, or they restrict themselves to the limited out-of-the-box capabilities of large pre-trained models without performing task-specific training. In contrast, we propose to model clinical decision-making for diagnosis with a hypothesis-driven uncertainty-aware language agent, LA-CDM, that converges towards a diagnosis via repeatedly requesting and interpreting relevant tests. |
David Bani-Harouni; Chantal Pellegrini; Ege Özsoy; Nassir Navab; Matthias Keicher; | code |
| 246 | Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models. Highlight: We propose a novel Reinforcement Learning approach that allows us to directly fine-tune LLMs to express calibrated confidence estimates alongside their answers to factual questions. |
David Bani-Harouni; Chantal Pellegrini; Paul Stangel; Ege Özsoy; Kamilia Zaripova; Nassir Navab; Matthias Keicher; | code |
| 247 | DeepEyes: Incentivizing Thinking with Images Via Reinforcement Learning. Highlight: Large Vision-Language Models excel at multimodal understanding but struggle to deeply integrate visual information into their predominantly text-based reasoning processes, a key challenge in mirroring human cognition. To address this, we introduce DeepEyes, a model that learns to “think with images”, trained end-to-end with reinforcement learning and without pre-collected reasoning data for supervised fine-tuning (SFT) as a cold-start. |
Ziwei Zheng; Michael Yang; Jack Hong; Chenxiao Zhao; Guohai Xu; Le Yang; Chao Shen; XingYu; | code |
| 248 | CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation. Highlight: Methods specifically designed for EICG have also emerged, but they rely excessively on word-level attribute labels for guidance, suffering from semantic incoherence, ambiguity, and limited scalability. To address these challenges, we propose CoEmoGen, a novel pipeline notable for its semantic coherence and high scalability. |
Kaishen Yuan; Yuting Zhang; Shang Gao; Yijie Zhu; Wenshuo Chen; Yutao Yue; | code |
| 249 | Naming to Learn: Class Incremental Learning for Vision-Language Model with Unlabeled Data. Highlight: In this paper, we instead tackle a more realistic scenario in which only unlabeled data and the class-name set are available for each new class. |
Qiwei Li; Xiaochen Yang; Jiahuan Zhou; | code |
| 250 | VenusX: Unlocking Fine-Grained Functional Understanding of Proteins. Highlight: This study introduces VenusX, the first benchmark designed to assess protein representation learning with a focus on fine-grained intra-protein functional understanding. |
Yang Tan; Wenrui Gou; Bozitao Zhong; Huiqun Yu; Liang Hong; Bingxin Zhou; | code |
| 251 | Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation. Highlight: We propose Any-order Any-subset Autoregressive modeling (A3), a novel sequence generation framework that generalizes standard autoregressive (AR) factorization to support the prediction of arbitrary token groups in any order. |
Tianqi Du; Lizhe Fang; Weijie Yang; Chenheng Zhang; Zeming Wei; Yifei Wang; Yisen Wang; | code |
| 252 | A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images. Highlight: To address these limitations, we introduce MIMIC-Ext-CXR-QBA. We automatically generated our VQA dataset from scene graphs (also made available), which we constructed using LLM-based information extraction from radiology reports. |
Philip Müller; Friederike Jungmann; Georgios Kaissis; Daniel Rueckert; | code |
| 253 | OffTopicEval: When Large Language Models Enter The Wrong Chat, Almost Always! Highlight: While most studies and global discussions focus on generic harms, such as models assisting users in harming themselves or others, enterprises face a more fundamental concern: whether LLM-based agents are safe for their intended use case. To address this, we introduce operational safety, defined as an LLM’s ability to appropriately accept or refuse user queries when tasked with a specific purpose. |
Jingdi Lei; Varun Gumma; Rishabh Bhardwaj; Seok Min Lim; Chuan Li; Amir Zadeh; Soujanya Poria; | code |
| 254 | Incentivizing LLM Reasoning Via Reinforcement Learning with Functional Monte Carlo Tree Search. Highlight: In this work, we propose ***R**einforced **F**unctional **T**oken **T**uning* (RFTT), a novel reinforced fine-tuning framework that empowers Large Language Models (LLMs) with learn-to-reason capabilities. |
Kongcheng Zhang; QI YAO; Baisheng Lai; Jiaxing Huang; Wenkai Fang; Dacheng Tao; Mingli Song; Shunyu Liu; | code |
| 255 | Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data. Highlight: We introduce *Distillation from Corrupted Data (DCD)*, a unified framework for learning high-fidelity, one-step generative models using **only** degraded data of the form $ y = \mathcal{A}(x) + \sigma \varepsilon, \ x\sim p_X,\ \varepsilon\sim \mathcal{N}(0,I_m), $ where the mapping $\mathcal{A}$ may be the identity or a non-invertible corruption operator (e.g., blur, masking, subsampling, Fourier acquisition). |
Yasi Zhang; Tianyu Chen; Zhendong Wang; Ying Nian Wu; Mingyuan Zhou; Oscar Leong; | code |
| 256 | EchoMotion: Unified Human Video and Motion Generation Via Dual-Modality Diffusion Transformer. Highlight: This limitation stems from the intrinsic constraints of pixel-only training objectives, which inherently bias models toward appearance fidelity at the expense of learning underlying kinematic principles. To address this, we introduce EchoMotion, a framework designed to model the joint distribution of appearance and human motion, thereby improving the quality of complex human action video generation. |
Yuxiao Yang; Hualian Sheng; Sijia Cai; Jing Lin; Jiahao Wang; Bing Deng; Junzhe Lu; Haoqian Wang; Jieping Ye; | code |
| 257 | MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction. Highlight: In this work, we introduce MetaEmbed, a new framework for multimodal retrieval that rethinks how multimodal embeddings are constructed and interacted with at scale. |
Zilin Xiao; Qi Ma; Mengting Gu; Chun-cheng Jason Chen; Xintao Chen; Vicente Ordonez; Vijai Mohan; | code |
| 258 | GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs. Highlight: However, to support more realistic and challenging applications, routing must extend into agentic LLM settings—where task planning, multi-round cooperation among heterogeneous agents, and memory utilization are indispensable. To address this gap, we propose GraphPlanner, a heterogeneous graph-based agentic router that generates routing workflows for each query and supports both inductive and transductive inference. |
Tao Feng; Haozhen Zhang; Zijie Lei; Peixuan Han; Jiaxuan You; | code |
| 259 | Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model Based on Multiview Images. Highlight: This paper presents Omni-View, which extends the unified multimodal understanding and generation to 3D scenes based on multiview images, exploring the principle that “generation facilitates understanding.” |
JiaKui Hu; Shanshan Zhao; Qing-Guo Chen; Xuerui Qiu; Jialun Liu; Zhao Xu; Weihua Luo; Kaifu Zhang; Yanye Lu; | code |
| 260 | Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs. Highlight: In this study, we identify a critical yet underexplored issue in RL training: low-probability tokens disproportionately influence model updates due to their large gradient magnitudes. |
Zhihe Yang; Xufang Luo; Zilong Wang; Dongqi Han; Zhiyuan He; Dongsheng Li; Yunjian Xu; | code |
| 261 | Group-Relative REINFORCE Is Secretly An Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends. Abstract: Off-policy reinforcement learning (RL) for large language models (LLMs) is attracting growing interest, driven by practical constraints in real-world applications, the complexity … |
Chaorui Yao; Yanxi Chen; Yuchang Sun; Yushuo Chen; Wenhao Zhang; Xuchen Pan; Yaliang Li; Bolin Ding; | code |
| 262 | Uni-X: Mitigating Modality Conflict with A Two-End-Separated Architecture for Unified Multimodal Models. Highlight: We trace this issue to the fundamentally different low-level statistical properties of images and text, while noting that conflicts diminish in middle layers where representations become more abstract and semantically aligned. To overcome this challenge, we propose Uni-X, a two-end-separated, middle-shared architecture. |
Jitai Hao; Hao Liu; Xinyan Xiao; Qiang Huang; Jun Yu; | code |
| 263 | Out of The Memory Barrier: A Highly Memory-Efficient Training System for LLMs with Million-Token Contexts. Highlight: The primary culprits are the activations, whose memory footprints scale linearly with sequence length. We introduce OOMB, a highly memory-efficient training system that directly confronts this barrier. |
Wenhao Li; Daohai Yu; Gen Luo; Yuxin Zhang; Yifan Wu; Jiaxin Liu; Ziyang Gong; Zimu Liao; Fei Chao; Rongrong Ji; | code |
| 264 | Scale-wise Distillation of Diffusion Models. Highlight: Motivated by this perspective, we introduce SwD, a scale-wise diffusion distillation framework that equips few-step models with progressive generation, avoiding redundant computations at intermediate diffusion timesteps. |
Nikita Starodubcev; Ilya Drobyshevskiy; Denis Kuznedelev; Artem Babenko; Dmitry Baranchuk; | code |
| 265 | Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion. Highlight: We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL) that combines subgoal decomposition with factored structure to solve long-horizon tasks in domains with multiple entities. |
Dan Haramati; Carl Qi; Tal Daniel; Amy Zhang; Aviv Tamar; George Konidaris; | code |
| 266 | Demystifying and Enhancing The Efficiency of Large Language Model Based Search Agents. Highlight: Second, we identify inefficiencies in system design, including improper scheduling and frequent retrieval stalls, which lead to cascading latency—where even minor delays in retrieval amplify end-to-end inference time. To address these challenges, we introduce \texttt{SearchAgent-X}, a high-efficiency inference framework for LLM-based search agents. |
Tiannuo Yang; Zebin Yao; Bowen Jin; Lixiao Cui; Yusen Li; Gang Wang; xiaoguang Liu; Willie Neiswanger; | code |
| 267 | GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving. Highlight: However, these approaches rely on fixed problem sets, which causes inefficient training and limits the model to tackle complex problems. To overcome these limitations, we propose **GAR**: *Generative Adversarial Reinforcement learning*, a comprehensive RL training framework that jointly trains the problem composer and solver in an adversarial loop. |
Ruida WANG; Jiarui Yao; Rui Pan; Shizhe Diao; Tong Zhang; | code |
| 268 | Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs Via Visual Perception Reward. Highlight: Through McNemar’s test, we find that existing RLVR methods fail to effectively enhance the multimodal perception capabilities of MLLMs, thereby limiting their further improvement in multimodal reasoning. To address this limitation, we propose Perception-R1, which introduces a novel visual perception reward that explicitly encourages MLLMs to perceive the visual content accurately, thereby effectively incentivizing both their multimodal perception and reasoning capabilities. |
Tong Xiao; Xin Xu; Zhenya Huang; Hongyu Gao; Quan Liu; Qi Liu; Enhong Chen; | code |
| 269 | Joint Selection for Large-Scale Pre-Training Data Via Policy Gradient-based Mask Learning. Highlight: However, in our empirical study, selecting samples based on quality metrics exhibits severe diminishing returns during long-term pre-training, while selecting on diversity metrics removes too many valuable high-quality samples, both of which limit pre-trained LLMs’ capabilities. Therefore, we introduce DATAMASK, a novel and efficient joint learning framework designed for large-scale pre-training data selection that can simultaneously optimize multiple types of metrics in a unified process, with this study focusing specifically on quality and diversity metrics. |
Ziqing Fan; Yuqiao Xian; Yan Sun; Ke Shen; Li Shen; | code |
| 270 | Learning What Reinforcement Learning Can’t: Interleaved Online Fine-Tuning for Hardest Questions. Highlight: Motivated by the complementary strengths of RL and SFT, we introduce \textbf{ReLIFT} (\textbf{Re}inforcement \textbf{L}earning \textbf{I}nterleaved with Online \textbf{F}ine-\textbf{T}uning), a novel training strategy. |
Lu Ma; Hao Liang; Meiyi Qiang; Lexiang Tang; Xiaochen Ma; Zhen Hao Wong; Junbo Niu; Chengyu Shen; Runming He; Yanhao Li; Wentao Zhang; Bin CUI; | code |
| 271 | Rex-Thinker: Grounded Object Referring Via Chain-of-Thought Reasoning. Highlight: In this work, we propose Rex-Thinker, a model that formulates object referring as an explicit CoT reasoning task. |
Qing Jiang; Xingyu Chen; Zhaoyang Zeng; Junzhi Yu; Lei Zhang; | code |
| 272 | MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE. Highlight: This paper proposes \textbf{M}ixture-\textbf{o}f-\textbf{N}ovices-and-\textbf{E}xperts (\textbf{MoNE}), a novel expert pruning method that replaces redundant experts with lightweight novices to achieve effective and robust model compression. |
Geng Zhang; Han Yuxuan; Yuxuan Lou; Yiqi Zhang; Wangbo Zhao; Yang You; | code |
| 273 | Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series. Highlight: This limitation stems from a structural mismatch: ***MedTS signals are inherently centralized, whereas the Transformer’s attention is decentralized***, making it less effective at capturing global synchronization and unified waveform patterns. To bridge this gap, we propose **CoTAR** (Core Token Aggregation-Redistribution), a centralized MLP-based module tailored to replace the decentralized attention. |
Guoqi Yu; Juncheng Wang; Chen Yang; Jing Qin; Angelica I Aviles-Rivero; Shujun Wang; | code |
| 274 | Dragging with Geometry: From Pixels to Geometry-Guided Image Editing. Highlight: As a result, they often produce imprecise and inconsistent edits, particularly in geometry-intensive scenarios such as rotations and perspective transformations. To address these limitations, we propose a novel geometry-guided drag-based image editing method—GeoDrag, which addresses three key challenges: 1) incorporating 3D geometric cues into pixel-level editing, 2) mitigating discontinuities caused by geometry-only guidance, and 3) resolving conflicts arising from multi-point dragging. |
Xinyu Pu; Hongsong Wang; Jie Gui; Pan Zhou; | code |
| 275 | Human Behavior Atlas: Benchmarking Unified Psychological And Social Behavior Understanding. Highlight: On Human Behavior Atlas, we train three models: Omnisapiens-7B SFT, Omnisapiens-7B BAM, and Omnisapiens-7B RL. |
Keane Ong; Wei Dai; Carol Li; Dewei Feng; Hengzhi Li; Jingyao Wu; Jiaee Cheong; Rui Mao; Gianmarco Mengaldo; Erik Cambria; Paul Pu Liang; | code |
| 276 | SongEcho: Towards Cover Song Generation Via Instance-Adaptive Element-wise Linear Modulation. Highlight: In this work, we reformulate cover song generation as conditional generation, which simultaneously generates new vocals and accompaniment conditioned on the original vocal melody and text prompts. |
Sifei Li; Yang Li; Zizhou Wang; Yuxin Zhang; Fuzhang Wu; Oliver Deussen; Tong-Yee Lee; Weiming Dong; | code |
| 277 | Harder Is Better: Boosting Mathematical Reasoning Via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation. Highlight: Data-wise, augmentation approaches primarily rephrase questions to enhance diversity without systematically increasing intrinsic difficulty. To address these issues, we propose a two-dual MathForge framework to improve mathematical reasoning by targeting harder questions from both perspectives, which comprises a Difficulty-Aware Group Policy Optimization (DGPO) algorithm and a Multi-Aspect Question Reformulation (MQR) strategy. |
Yanqi Dai; Yuxiang Ji; Xiao Zhang; Yong Wang; Xiangxiang Chu; Zhiwu Lu; | code |
| 278 | Bridging Radiology and Pathology Foundation Models Via Concept-Based Multimodal Co-Adaptation. Highlight: In contrast, many clinical workflows rely on joint diagnosis from heterogeneous domains, such as radiology and pathology, where fully leveraging the representation capacity of multiple FMs remains an open challenge. To address this gap, we propose Concept Tuning and Fusing (CTF), a parameter-efficient framework that uses clinically grounded concepts as a shared semantic interface to enable cross-modal co-adaptation before fusion. |
Yihang Chen; Yanyan Huang; Fuying Wang; Maximus Yeung; Yuming Jiang; Shujun Wang; Lequan Yu; | code |
| 279 | Demystifying Robot Diffusion Policies: Action Memorization and A Simple Lookup Table Alternative. Highlight: However, the reason for this performance remains a mystery. In this paper, we offer a surprising hypothesis: diffusion policies essentially memorize an action lookup table—\emph{and this is beneficial}. |
Chengyang He; Xu Liu; Gadiel Mark Sznaier Camps; Joseph Bruno; Guillaume Adrien Sartoretti; Mac Schwager; | code |
| 280 | VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations. Highlight: Systematic testing on this benchmark reveals that even the most advanced MLLMs (such as GPT-5) still exhibit significant gaps compared to human experts in judgment, with a Mean Absolute Error (MAE) of 0.553 and a correlation with human ratings of only 0.428. To address this issue, we propose VisJudge, a model specifically designed for visualization aesthetics and quality assessment. |
Yupeng Xie; Zhiyang Zhang; Yifan Wu; Sirong Lu; Jiayi Zhang; Zhaoyang Yu; Jinlin Wang; Sirui Hong; Bang Liu; Chenglin Wu; Yuyu Luo; | code |
| 281 | Towards Self-Evolving Agent Benchmarks: Validatable Agent Trajectory Via Test-Time Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing agent benchmarks are showing a trend of rapid ceiling-hitting by newly developed agents, making it difficult to meet the demands for evaluating agent abilities. To address this problem, we propose the Trajectory-based Reproducible Agent-benchmark Complexity Evolution (TRACE) framework. |
Dadi Guo; Tianyi Zhou; Dongrui Liu; Chen Qian; Qihan Ren; Shuai Shao; Zhiyuan Fan; Yi R. Fung; Kun Wang; Linfeng Zhang; Jing Shao; | code |
| 282 | What Do Large Language Models Know About Opinions? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: What large language models (LLMs) know about human opinions has important implications for aligning LLMs with human values, simulating humans with LLMs, and understanding what … |
Erfan Jahanparast; Zhiqing Hong; Serina Chang; | code |
| 283 | The Geometry of LLM Quantization: GPTQ As Babai’s Nearest Plane Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai’s nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer’s inputs. |
Jiale Chen; Yalda Shabanzadeh; Elvir Crnčević; Torsten Hoefler; Dan Alistarh; | code |
| 284 | Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, they still recompute the attention and feed-forward blocks for every token position at every step—even when many unmasked tokens are essentially fixed, resulting in substantial waste in compute. We propose \textbf{\textsc{SureLock}}: when the posterior at an unmasked position has stabilized across steps (our \emph{sure} condition), we \emph{lock} that position—thereafter skipping its query projection and feed-forward sublayers—while caching its attention keys and values so other positions can continue to attend to it. |
Daisuke Oba; Danushka Bollegala; Masahiro Kaneko; Naoaki Okazaki; | code |
| 285 | DRAGON: Guard LLM Unlearning in Context Via Negative Detection and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although these methods can perform well when both forget and retain data are available, few works have demonstrated equivalent capability in more practical, data-limited scenarios. To overcome these limitations, we propose Detect-Reasoning Augmented GeneratiON (DRAGON), a systematic, reasoning-based framework that utilizes in-context chain-of-thought (CoT) instructions to guard deployed LLMs before inference. |
Yaxuan Wang; Chris Yuhao Liu; Quan Liu; Jinlong Pang; Wei Wei; Yujia Bao; Yang Liu; | code |
| 286 | FlexiVoice: Enabling Flexible Style Control in Zero-Shot TTS with Natural Language Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study proposes FlexiVoice, a text-to-speech (TTS) synthesis system capable of flexible style control with zero-shot voice cloning. |
Dekun Chen; Xueyao Zhang; Yuancheng Wang; Kenan Dai; Li Ma; Zhizheng Wu; | code |
| 287 | A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building on these principles, we introduce a proof-of-concept multi-call framework for MHQA, InfoQA. We construct a stringent and noise-rich benchmark to validate our theory and framework. |
Kaiyang Wan; Lang Gao; Honglin Mu; Preslav Nakov; Yuxia Wang; Xiuying Chen; | code |
| 288 | EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing state-of-the-art feature attribution methods, such as Shapley value and LIME, are computationally impractical and lack robustness guarantees when applied to random subspace methods. In this work, we propose EnsembleSHAP, an intrinsically faithful and robust feature attribution method for random subspace methods that reuses their computational byproducts. |
Yanting Wang; Jinyuan Jia; | code |
| 289 | Instilling An Active Mind in Avatars Via Cognitive Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current video avatar models can generate fluid animations but struggle to capture a character’s authentic essence, primarily synchronizing motion with low-level audio cues instead of understanding higher-level semantics like emotion or intent. To bridge this gap, we propose a novel framework for generating character animations that are not only physically plausible but also semantically rich and expressive. |
Jianwen Jiang; Weihong Zeng; Zerong Zheng; Jiaqi Yang; Chao Liang; Wang Liao; Han Liang; Weifeng Chen; XING WANG; Yuan Zhang; Mingyuan Gao; | code |
| 290 | Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Shop-R1, a novel reinforcement learning (RL) framework aimed at enhancing the reasoning ability of LLMs for simulation of real human behavior in online shopping environments. |
Yimeng Zhang; Tian Wang; Jiri Gesi; Ziyi Wang; Yuxuan Lu; Jiacheng Lin; Simon Sinong Zhan; Vianne R. Gao; Ruochen Jiao; Junze Liu; Kun Qian; Yuxin Tang; Ran Xue; Houyu Zhang; qingjun cui; Yufan Guo; Dakuo Wang; | code |
| 291 | Controllable First-Frame-Guided Video Editing Via Mask-Aware LoRA Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: First-frame-guided editing provides control over the first frame, but lacks fine-grained control over the edit’s subsequent temporal evolution. To address this, we propose a mask-based LoRA (Low-Rank Adaptation) tuning method that adapts pretrained Image-to-Video models for flexible video editing. |
Chenjian Gao; Lihe Ding; Xin Cai; Zhanpeng Huang; Zibin Wang; Tianfan Xue; | code |
| 292 | Exploratory Diffusion Model for Unsupervised Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce the **Ex**ploratory **D**iffusion **M**odel (**ExDM**), which leverages the expressive power of diffusion models to fit diverse replay-buffer distributions, thus providing accurate density estimates and a score-based intrinsic reward that drives exploration into under-visited regions. |
Chengyang Ying; Huayu Chen; Xinning Zhou; Zhongkai Hao; Hang Su; Jun Zhu; | code |
| 293 | DexNDM: Closing The Reality Gap for Dexterous In-Hand Rotation Via Joint-Wise Neural Dynamics Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The complex, contact-rich dynamics of dexterous manipulation create a reality gap that has limited prior work to constrained scenarios involving simple geometries, limited object sizes and aspect ratios, constrained wrist poses, or customized hands. We address this sim-to-real challenge with a novel framework that enables a single policy, trained in simulation, to generalize to a wide variety of objects and conditions in the real world. |
Xueyi Liu; He Wang; Li Yi; | code |
| 294 | Human3R: Everyone Everywhere All at Once Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Human3R, a unified, feed-forward framework for online 4D human-scene reconstruction, in the world frame, from casually captured monocular videos. |
Yue Chen; Xingyu Chen; Yuxuan Xue; Anpei Chen; Yuliang Xiu; Gerard Pons-Moll; | code |
| 295 | ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our analysis identifies two key issues of such systems: important policy-relevant cues such as targets and attack types are not hypothesized by the model as a likely explanation; and the binary reward signal is insufficient to guide reasoning. To address these challenges, we propose ExPO-HM (Explain-then-Detect Policy Optimization for Hateful Memes), inspired by the training and evaluation process of human annotators. |
Jingbiao Mei; Mingsheng Sun; Jinghong Chen; Pengda Qin; Yuhong Li; Da Chen; Bill Byrne; | code |
| 296 | Is Your Paper Being Reviewed By An LLM? Benchmarking AI Text Detection in Peer Review Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this deficiency, we introduce a comprehensive dataset containing a total of 788,984 AI-written peer reviews paired with corresponding human reviews, covering 8 years of papers submitted to each of two leading AI research conferences (ICLR and NeurIPS). To support future research and reproducibility, we will publicly release our dataset upon publication. |
Sungduk Yu; Man Luo; Avinash Madasu; Vasudev Lal; Phillip Howard; | code |
| 297 | TS-DDAE: A Novel Temporal-Spectral Denoising Diffusion AutoEncoder for Wireless Signal Recognition Model Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, they either apply the mask-reconstruction pre-training strategy, which may disrupt intricate local dependencies of signals, or overlook latent spectral characteristics. Therefore, in this paper, we follow the diffusion models and propose a pre-training framework for WSR, named the Temporal-Spectral Denoising Diffusion AutoEncoder (TS-DDAE), which learns signal representations by corrupting signals with temporal and spectral noise, and then reconstructing the original data with a learned neural network. |
Yaoqi Liu; Jin Wang; Hui Wang; Chuan Shi; | code |
| 298 | Escaping Low-Rank Traps: Interpretable Visual Concept Learning Via Implicit Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, we identify a critical and pervasive challenge that undermines this process: \emph{representational collapse}, where visual patch features degenerate into a low-rank subspace during training, severely degrading the quality of learned concept activation vectors, thus hindering both model interpretability and downstream performance. To address these issues, we propose Implicit Vector Quantization (IVQ), a lightweight regularizer that maintains high-rank, diverse representations throughout training. |
Shujian Gao; Yuan Wang; Chenglong Ma; Xin Gao; Jiangtao Yan; Junzhi Ning; Cheng Tang; Changkai Ji; Huihui Xu; Wei Li; Ziyan Huang; Jiashi Lin; Ming Hu; Jiyao Liu; Wenhao Tang; Ye Du; Tianbin Li; Jin Ye; Junjun He; | code |
| 299 | Diverse Text Decoding Via Iterative Reweighting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a Reweighting-based Iterative DEcoding (OverRIDE) approach that dynamically adjusts the decoding process with history responses. |
Ruiqi Shi; Sinno Jialin Pan; | code |
| 300 | PySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Multi-modal Large Language Models (MLLMs) have demonstrated strong capabilities in general-purpose perception and reasoning, but they still struggle with tasks that require spatial understanding of the 3D world. To address this, we introduce pySpatial, a visual programming framework that equips MLLMs with the ability to interface with spatial tools via Python code generation. |
Zhanpeng Luo; Ce Zhang; Silong Yong; Cunxi Dai; Qianwei Wang; Haoxi Ran; Guanya Shi; Katia P. Sycara; Yaqi Xie; | code |
| 301 | STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While naively generating a complete chain-of-thought (CoT) reasoning before starting to talk can enable thinking for SLMs, this induces additional latency for the speech response, as the CoT reasoning can be arbitrarily long. To solve this issue, we propose STITCH, a novel generation method that alternates between the generation of unspoken reasoning chunks and spoken response chunks. |
Cheng-Han Chiang; Xiaofei Wang; Linjie Li; Chung-Ching Lin; Kevin Lin; Shujie LIU; Zhendong Wang; Zhengyuan Yang; Hung-yi Lee; Lijuan Wang; | code |
| 302 | RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these manually predefined, task-agnostic frameworks are applied uniformly across diverse tasks, lacking adaptability. To improve this, we propose **RL-of-Thoughts (RLoT)**, where we train a lightweight navigator model with reinforcement learning (RL) to generate task-adaptive logical structures at inference time, enhancing LLM reasoning. |
Qianyue Hao; Sibo Li; Jian Yuan; Yong Li; | code |
| 303 | UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose UniLIP, a unified framework that adapts CLIP for multimodal understanding, generation and editing. |
Hao Tang; Chen-Wei Xie; Xiaoyi Bao; Tingyu Weng; Pandeng Li; Yun Zheng; Liwei Wang; | code |
| 304 | Improving Long-Range Interactions in Graph Neural Simulators Via Hamiltonian Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these challenges, we propose Information-preserving Graph Neural Simulators (IGNS), a graph-based neural simulator built on the principles of Hamiltonian dynamics. To evaluate these properties systematically, we introduce new benchmarks that target long-range dependencies and challenging external forcing scenarios. |
Tai Hoang; Alessandro Trenta; Alessio Gravina; Niklas Freymuth; Philipp Becker; Davide Bacciu; Gerhard Neumann; | code |
| 305 | Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this, we propose \textbf{Interp3D}, a novel training-free framework for textured 3D morphing. For comprehensive evaluations, we construct a dedicated dataset, Interp3DData, with graded difficulty levels and assess generation results from fidelity, transition smoothness, and plausibility. |
Xiaolu Liu; Yicong Li; Qiyuan He; Jiayin Zhu; Wei Ji; Angela Yao; Jianke Zhu; | code |
| 306 | Constantly Improving Image Models Need Constantly Improving Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this, we present ECHO, a framework for constructing benchmarks directly from real-world evidence of model use: social media posts that showcase novel prompts and qualitative user judgments. Applying this framework to GPT-4o Image Gen, we construct a dataset of over 35,000 prompts curated from such posts. |
Jiaxin Ge; Grace Luo; Heekyung Lee; Nishant Malpani; Long Lian; XuDong Wang; Aleksander Holynski; Trevor Darrell; Sewon Min; David M. Chan; | code |
| 307 | SelfReflect: Can LLMs Communicate Their Internal Answer Distribution? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Instead of generating a single answer and then hedging it, an LLM that is fully transparent to the user needs to be able to reflect on its internal belief distribution and output a summary of all options it deems possible, and how likely they are. To test whether LLMs possess this capability, we develop the SelfReflect metric, an information-theoretic distance between a given summary and a distribution over answers. |
Michael Kirchhof; Luca Füger; Adam Golinski; Eeshan Gunesh Dhekane; Arno Blaas; Seong Joon Oh; Sinead Williamson; | code |
| 308 | Omni-IML: Towards Unified Interpretable Image Manipulation Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose Omni-IML, the first generalist model designed to unify IML across diverse tasks. We will release our code and dataset. |
Chenfan Qu; Yiwu Zhong; Fengjun Guo; Lianwen Jin; | code |
| 309 | Test-Time Training Done Right Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This approach improves hardware utilization by orders of magnitude, and more importantly, facilitates scaling of nonlinear state size (up to 40% of model parameter size), hence substantially improving state capacity, all without requiring cumbersome and error-prone custom kernel implementations. |
Tianyuan Zhang; Sai Bi; Yicong Hong; Kai Zhang; Fujun Luan; Songlin Yang; Kalyan Sunkavalli; William T. Freeman; Hao Tan; | code |
| 310 | Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, they are inherently limited as altering user text to hide sensitive cues still allows attribute inference to occur through models’ reasoning capabilities. To address these limitations, we propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS). |
Dong Yan; Jian Liang; Ran He; Tieniu Tan; | code |
| 311 | FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present FrontierCO, a benchmark for evaluating ML-based CO solvers under real-world structure and extreme scale. |
Shengyu Feng; Weiwei Sun; Shanda Li; Ameet Talwalkar; Yiming Yang; | code |
| 312 | Beyond Visual Reconstruction Quality: Object Perception-aware 3D Gaussian Splatting for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Such a limitation can significantly harm the applicability of reconstruction in the ADS domain. To address this gap, we propose two complementary solutions: a perception-aligned loss, which directly leverages the output differences between reconstructed and ground truth images during the training process; and an object zone quality loss, which specifically reinforces the training on object locations identified by the perception model on ground-truth images. |
Renzhi Wang; Yuxiang Fu; Wuqi Wang; Haigen Min; Wei Feng; Lei Ma; Qing Guo; | code |
| 313 | EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present a comprehensive and professional benchmark for the Earth sciences, designed to evaluate the capabilities of LLMs in scientific exploration within this domain, spanning from fundamental to advanced levels. Leveraging a corpus of 100,000 research papers, we first construct two Question Answering (QA) datasets: Earth-Iron, which offers extensive question coverage for broad assessment, and Earth-Silver, which features a higher level of difficulty to evaluate professional depth. |
Wanghan Xu; Xiangyu Zhao; Yuhao Zhou; Xiaoyu Yue; Ben Fei; Fenghua Ling; Wenlong Zhang; LEI BAI; | code |
| 314 | Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Even with this engineering, these approaches still struggle to reliably solve long-horizon, dexterous manipulation tasks. To provide a seamless tool for robotic data generation in simulation, we introduce a simple framework that enables on-policy reinforcement learning to reliably solve an array of such tasks with a single reward function, set of algorithm hyper-parameters, no auto-curricula, and no human demonstrations. |
Patrick Yin; Tyler Westenbroek; Zhengyu Zhang; Ignacio Dagnino; Eeshani Shilamkar; Numfor Mbiziwo-Tiapo; Simran Bagaria; Xinlei Liu; Galen Mullins; Andrey Kolobov; Abhishek Gupta; | code |
| 315 | ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite notable progress under supervised training, GR still struggles to generalize to zero-shot IR scenarios, which are prevalent in real-world applications. To tackle this challenge, we propose ZeroGR, a zero-shot generative retrieval framework that leverages natural language instructions to extend GR across a wide range of IR tasks. |
Weiwei Sun; Keyi Kong; Xinyu Ma; Shuaiqiang Wang; Dawei Yin; Maarten de Rijke; Zhaochun Ren; Yiming Yang; | code |
| 316 | Attention, Please! Revisiting Attentive Probing Through The Lens of Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we revisit attentive probing through the lens of the accuracy vs. parameter-efficiency trade-off. |
Bill Psomas; Dionysis Christopoulos; Eirini Baltzi; Ioannis Kakogeorgiou; Tilemachos Aravanis; Nikos Komodakis; Konstantinos Karantzalos; Yannis Avrithis; Giorgos Tolias; | code |
| 317 | Attention As A Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a novel PSRL framework (AttnRL), which enables efficient exploration for reasoning models. |
Runze Liu; Jiakang Wang; Yuling Shi; Zhihui Xie; Chenxin An; Kaiyan Zhang; Jian Zhao; Xiaodong Gu; Lei Lin; Wenping Hu; Xiu Li; Fuzheng Zhang; Guorui Zhou; Kun Gai; | code |
| 318 | SAE As A Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs Without Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To open up the black box, we propose the SAE-based Transferability Score (STS), a new metric that leverages sparse autoencoders (SAEs) to forecast post-training transferability. |
Qi Zhang; Yifei Wang; Xiaohan Wang; Jiajun Chai; Guojun Yin; Wei Lin; Yisen Wang; | code |
| 319 | RAP: 3D Rasterization Augmented End-to-End Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we argue that photorealism is unnecessary for training end-to-end planners. |
Lan Feng; Yang Gao; Eloi Zablocki; Quanyi Li; Wuyang Li; Sichao Liu; Matthieu Cord; Alexandre Alahi; | code |
| 320 | InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While trainable sparse attention methods offer a promising solution, existing approaches such as NSA introduce excessive extra parameters and disrupt the conventional pretrain-on-short, finetune-on-long workflow, resulting in slow convergence and difficulty in acceleration. To overcome these limitations, we introduce Dense-Sparse Switchable Attention framework (DSSA), a trainable sparse attention that seamlessly adapts models from short to long sequences. |
Weilin Zhao; Zihan Zhou; Zhou su; Chaojun Xiao; Yuxuan Li; Yanghao Li; Yudi Zhang; Weilun Zhao; Zhen Li; Yuxiang Huang; Ao Sun; Xu Han; Zhiyuan Liu; | code |
| 321 | Urban Socio-Semantic Segmentation with Vision-Language Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we achieve socio-semantic segmentation by vision-language model reasoning. |
Yu Wang; Yi Wang; Rui Dai; Yujie Wang; Kaikui Liu; Xiangxiang Chu; Yansheng Li; | code |
| 322 | Hallucination-aware Intermediate Representation Edit in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These factors hinder their practical applicability. To address the above issue, we propose a framework for dynamically detecting hallucination representations and performing hallucination-eliminating edits on these representations. |
Wei Suo; Hanzu Zhang; Lijun Zhang; Ji Ma; PENG WANG; Yanning Zhang; | code |
| 323 | Empowering Multi-Robot Cooperation Via Sequential World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, extending MBRL to physical multi-robot cooperation remains challenging due to the complexity of joint dynamics. To address this challenge, we propose the Sequential World Model (**SeqWM**), a novel framework that integrates the sequential paradigm into multi-robot MBRL. |
Zijie Zhao; Honglei Guo; Shengqian Chen; Kaixuan Xu; Bo Jiang; Yuanheng Zhu; Dongbin Zhao; | code |
| 324 | PropensityBench: Evaluating Latent Safety Risks in Large Language Models Via An Agentic Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present **PropensityBench**, a novel benchmark framework that assesses the proclivity of models to engage in risky behaviors when equipped with simulated dangerous capabilities using proxy tools. |
Udari Madhushani Sehwag; Shayan Shabihi; Alex McAvoy; Vikash Sehwag; Yuancheng Xu; Dalton towers; Furong Huang; | code |
| 325 | HUMOF: Human Motion Forecasting in Interactive Social Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose an effective method for human motion forecasting in dynamic scenes. |
Caiyi Sun; Yujing Sun; Xiao Han; Zemin Yang; Jiawei Liu; Xinge Zhu; SM Yiu; Yuexin Ma; | code |
| 326 | Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we revisit classical normalizing flows in the context of BGs that offer efficient sampling and likelihoods, but whose training via maximum likelihood is often unstable and computationally challenging. |
Danyal Rehman; Oscar Davis; Jiarui Lu; Jian Tang; Michael M. Bronstein; Yoshua Bengio; Alexander Tong; Joey Bose; | code |
| 327 | Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our findings demonstrate that low-level perception supports faithful high-level reasoning in mathematical MLLMs. We provide both methodological frameworks and empirical evidence to guide future research in this direction. |
Yanpeng Sun; Shan Zhang; Wei Tang; Aotian Chen; Piotr Koniusz; Kai Zou; Yuan Xue; Anton van den Hengel; | code |
| 328 | P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches often struggle with semantic richness, structural nuances, and lack standardized benchmarks for evaluating generated academic posters comprehensively. To address these limitations, we introduce P2P, the first flexible, LLM-based multi-agent framework that generates high-quality, HTML-rendered academic posters directly from research papers. |
Tao Sun; Enhao Pan; Zhengkai Yang; Kaixin Sui; Jiajun Shi; Xianfu Cheng; Tongliang Li; Ge Zhang; Wenhao Huang; Jian Yang; Zhoujun Li; | code |
| 329 | LogiConBench: Benchmarking Logical Consistencies of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To strengthen its evaluative significance, we evaluate 14 frontier LLMs on two tasks with varying difficulty levels, and find that the Enumerative task remains extremely challenging, with the best exact accuracy at only 34%. While we release a 280K-sample corpus in this work, the framework can be scaled to generate unlimited data. |
Zheng CHEN; Chuan Zhou; Fengxiang Cheng; Yip Tin Po; Fenrong Liu; Yisen Wang; Jiajun Chai; Xiaohan Wang; Guojun Yin; Wei Lin; Bo Li; Haoxuan Li; Zhouchen Lin; | code |
| 330 | Horseshoe Splatting: Handling Structural Sparsity for Uncertainty-Aware Gaussian-Splatting Radiance Field Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Horseshoe Splatting, a Bayesian extension of 3D Gaussian Splatting (3DGS) that jointly addresses structured sparsity in per-splat covariances and delivers calibrated uncertainty. |
Feng Wu; Tsai Hor Chan; Yihang Chen; Lingting Zhu; Guosheng Yin; Lequan Yu; | code |
| 331 | Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, due to a lack of controllability, a single observation may yield numerous plausible but redundant or irrelevant hypotheses on large-scale knowledge graphs. To address this limitation, we introduce the task of controllable hypothesis generation to improve the practical utility of abductive reasoning. |
Yisen Gao; Jiaxin Bai; Tianshi Zheng; Ziwei Zhang; Qingyun Sun; Xingcheng Fu; Jianxin Li; Yangqiu Song; | code |
| 332 | Modality-free Graph In-context Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce **M**odality-**F**ree **G**raph **I**n-context **A**lignment (MF-GIA), a framework that makes a pretrained graph encoder promptable for few-shot prediction across heterogeneous domains without modality assumptions. |
Wei Zhuo; Siqiang Luo; | code |
| 333 | Q-RAG: Long Context Multi-Step Retrieval Via Value-Based Embedder Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Q-RAG, a novel approach that fine-tunes the Embedder model for multi-step retrieval using reinforcement learning (RL). |
Artyom Sorokin; Nazar Buzun; Aleksandr Anokhin; Egor KONSTANTINOVICH VEDERNIKOV; Petr Anokhin; Mikhail Burtsev; Evgeny Burnaev; | code |
| 334 | Modeling The Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Accurate detection of all pathological findings in 3D medical images remains a significant challenge, as supervised models are limited to detecting only the few pathology classes annotated in existing datasets. To address this, we frame pathology detection as an unsupervised visual anomaly segmentation (UVAS) problem, leveraging the inherent rarity of pathological patterns compared to healthy ones. |
Mikhail Goncharov; Eugenia Soboleva; Daniil Ignatyev; Mariia Donskova; Mikhail Belyaev; Ivan Oseledets; Marina Munkhoeva; Maxim Panov; | code |
| 335 | PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce PoliCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament over 13 years, ranging from 2009 to 2022, to evaluate the ability of LLMs to draft consensus resolutions based on divergent party positions under varying collective decision-making contexts and political requirements. |
Zhaowei Zhang; Xiaobo Wang; Minghua Yi; Mengmeng Wang; Fengshuo Bai; Zilong Zheng; Yipeng Kang; Yaodong Yang; | code |
| 336 | GeoBench: Rethinking Multimodal Geometric Problem-Solving Via Hierarchical Evaluation Highlight: Current evaluations of geometric reasoning in vision-language models (VLMs) face limitations, including the risk of test data contamination from textbook-based benchmarks, overemphasis on final answers over reasoning processes, and insufficient diagnostic granularity. To address these issues, we present GeoBench, a hierarchical benchmark featuring four reasoning levels in geometric problem-solving: Visual Perception, Goal-Oriented Planning, Rigorous Theorem Application, and Self-Reflective Backtracking. |
Yuan Feng; Yue Yang; Xiaohan He; Jiatong Zhao; Jianlong Chen; Daocheng Fu; Qi Liu; Renqiu Xia; Bo Zhang; Junchi Yan; | code |
| 337 | LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models Highlight: Multimodal Large Language Models (MLLMs) have achieved significant advances in integrating visual and linguistic information, yet their ability to reason about complex and real-world scenarios remains limited. |
Ruilin Yao; Bo Zhang; Jirui Huang; Xinwei Long; Yifang Zhang; Tianyu Zou; Shili Xiong; Yi Rong; Yufei Wu; Shichao Su; Yifan Xu; Wenxi Zeng; Zhaoyu Yang; Guoyou Li; Shilan Zhang; Zichan Li; Yaxiong Chen; Shengwu Xiong; Peng Xu; Jiajun Zhang; Bowen Zhou; David A. Clifton; Luc Van Gool; | code |
| 338 | End-to-end Listen, Look, Speak and Act Highlight: We present ELLSA (End-to-end Listen, Look, Speak and Act), which, to our knowledge, is the first full-duplex, end-to-end model that simultaneously perceives and generates across vision, text, speech, and action within a single architecture, enabling interaction patterns previously out of reach, yielding more natural, human-like behaviors. |
Siyin Wang; Wenyi Yu; Xianzhao Chen; Xiaohai Tian; Jun Zhang; Lu Lu; Yuxuan Wang; Chao Zhang; | code |
| 339 | ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding Highlight: This approach entangles timescales with video content, thereby hindering a clear assessment of MLLM multi-timescale performance. To address this, we introduce ScaleLong, the first benchmark to disentangle these factors by embedding questions targeting four hierarchical timescales, clip (seconds), shot (tens of seconds), event (minutes), and story (hours), all within the same video content. |
David Ma; Huaqing Yuan; Xingjian Wang; Qianbo Zang; Tianci Liu; Xinyang He; Yanbin Wei; Jiawei Guo; nijiahui; Zhenzhu Yang; Meng Cao; Shanghaoran Quan; Yizhi LI; Wangchunshu Zhou; Jiaheng Liu; Wenhao Huang; Ge Zhang; Shiwen Ni; Xiaojie Jin; | code |
| 340 | IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Highlight: Further analysis reveals key factors influencing model performance on IV-Bench, including inference pattern, frame number, and resolution. These findings collectively provide valuable insights for future research. |
David Ma; Yuanxing Zhang; JinCheng Ren; Jiawei Guo; Yifan Yao; Zhenlin Wei; Zhenzhu Yang; Zhongyuan Peng; Boyu Feng; Jun Ma; 顾潇; King Zhu; Zhoufutu Wen; Yancheng He; Meng Cao; Wangchunshu Zhou; Shiwen Ni; Jiaheng Liu; Wenhao Huang; Ge Zhang; Xiaojie Jin; | code |
| 341 | Language in The Flow of Time: Time-Series-Paired Texts Weaved Into A Unified Temporal Narrative Highlight: In this context, we identify that time-series-paired texts may naturally exhibit periodic properties that closely mirror those of the original time series. Building on this insight, we propose a novel framework, Texts as Time Series (TaTS), which considers the time-series-paired texts to be auxiliary variables of the time series. |
Zihao Li; Xiao Lin; Zhining Liu; Jiaru Zou; Ziwei Wu; Lecheng Zheng; Dongqi Fu; Yada Zhu; Hendrik Hamann; Hanghang Tong; Jingrui He; | code |
| 342 | An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM Highlight: Yet progress in this field is constrained by data scarcity and incomplete understanding of mechanisms of action, which limit the transition from experience-based design to AI-driven approaches. To address these challenges, we present the first benchmark dedicated to adjuvants, constructed in an open-ended Q&A format and annotated by domain experts. |
yi chen; Yu Zhang; Jian Xu; Hua Yue; Xinming Wang; Zequan Lyu; Xu-Yao Zhang; Wei Wei; Cheng-Lin Liu; | code |
| 343 | Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards Highlight: This lack of diversity stems primarily from token-level stochastic sampling, where local variations are likely to collapse into near-identical reasoning paths. To address this limitation, we propose Lookahead Tree-Based Rollouts (LATR), a novel rollout strategy designed to explicitly promote trajectory-level diversity by enforcing branching into different candidate tokens likely to yield distinct continuations. |
Shangyu Xing; Siyuan Wang; Chenyuan Yang; Xinyu Dai; Xiang Ren; | code |
| 344 | RFS: Reinforcement Learning with Residual Flow Steering for Dexterous Manipulation Highlight: In this work, we propose an efficient reinforcement learning (RL) framework for fast adaptation of pretrained generative policies. |
Entong Su; Tyler Westenbroek; Anusha Nagabandi; Abhishek Gupta; | code |
| 345 | Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV Highlight: In this paper, we introduce Virne, a comprehensive benchmarking framework designed to accelerate the research and application of deep RL for NFV-RA. |
Tianfu Wang; Liwei Deng; Xi Chen; Junyang Wang; Huiguo He; Zhengyu Hu; Wei Wu; Leilei Ding; Qilin Fan; Hui Xiong; | code |
| 346 | Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility Highlight: In this paper, we analyze Transformers through the lens of rank structure. |
Annan Yu; Danielle C. Maddix; Boran Han; Xiyuan Zhang; Abdul Fatir Ansari; Oleksandr Shchur; Christos Faloutsos; Andrew Gordon Wilson; Michael W. Mahoney; Bernie Wang; | code |
| 347 | $\pi^3$: Permutation-Equivariant Visual Geometry Learning Highlight: We introduce $\pi^3$, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. |
Yifan Wang; Jianjun Zhou; Haoyi Zhu; Wenzheng Chang; Yang Zhou; Zizun Li; Junyi Chen; Jiangmiao Pang; Chunhua Shen; Tong He; | code |
| 348 | MENLO: From Preferences to Proficiency – Evaluating and Modeling Native-like Quality Across 47 Languages Highlight: To address this, we introduce MENLO, a framework that operationalizes the evaluation of native-like response quality based on audience design-inspired mechanisms. Using MENLO, we create a dataset of 6,423 human-annotated prompt–response preference pairs covering four quality dimensions with high inter-annotator agreement in 47 language varieties. We release our dataset and evaluation framework to support further research in multilingual LLM evaluation. |
Chenxi Whitehouse; Sebastian Ruder; Tony Zhiyang Lin; Oksana Kurylo; Haruka Takagi; Janice Lam; Nicolò Busetto; Denise Diaz; | code |
| 349 | TianQuan-S2S: A Subseasonal-to-Seasonal Global Weather Model Via Incorporate Climatology State Highlight: Recent data-driven studies have shown promising results, but their performance is limited by the inadequate incorporation of climate states and a model tendency to degrade, progressively losing fine-scale details and yielding over-smoothed forecasts. To overcome these limitations, we propose TianQuan-S2S, a global S2S forecasting model that integrates initial weather states with climatological means via incorporating climatology into patch embedding and enhancing variability capture through an uncertainty-augmented Transformer. |
Guowen Li; Xintong Liu; Yang Liu; Mengxuan Chen; Shilei Cao; Xuehe Wang; Juepeng Zheng; Jinxiao Zhang; Haoyuan Liang; Lixian Zhang; Jiuke Wang; Meng Jin; Hong Cheng; Haohuan Fu; | code |
| 350 | Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation Highlight: For the general case, we introduce a rigorous lossless procedure that leverages BPE recursive structure, complemented by a fast approximation that keeps large-vocabulary settings practical. |
Buu Phan; Ashish J Khisti; Karen Ullrich; | code |
| 351 | Pretrain Value, Not Reward: Decoupled Value Policy Optimization Highlight: In this paper, we explore how directly pretraining a value model simplifies and stabilizes reinforcement learning from human feedback (RLHF). |
Chenghua Huang; Lu Wang; Fangkai Yang; Pu Zhao; Qingwei Lin; Dongmei Zhang; Saravan Rajmohan; | code |
| 352 | Preference Leakage: A Contamination Problem in LLM-as-a-judge Highlight: In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators. |
Dawei Li; Renliang Sun; Yue Huang; Ming Zhong; Bohan Jiang; Jiawei Han; Xiangliang Zhang; Wei Wang; huan liu; | code |
| 353 | GenCompositor: Generative Video Compositing with Diffusion Transformer Highlight: This new task strives to adaptively inject identity and motion information of foreground video to the target video in an interactive manner, allowing users to customize the size, motion trajectory, and other attributes of the dynamic elements added in the final video. Specifically, we designed a novel Diffusion Transformer (DiT) pipeline based on its intrinsic properties. |
Shuzhou Yang; Xiaoyu Li; Xiaodong Cun; Guangzhi Wang; Lingen Li; Ying Shan; Jian Zhang; | code |
| 354 | JanusCoder: Towards A Foundational Visual-Programmatic Interface for Code Intelligence Highlight: However, progress has been impeded by the scarcity of high-quality multimodal code data, a bottleneck stemming from challenges in synthesis and quality assessment. To address these challenges, we make contributions from both a data and modeling perspective. |
Qiushi Sun; Jingyang Gong; Yang Liu; Qiaosheng Chen; Lei Li; Kai Chen; Qipeng Guo; Ben Kao; Fei Yuan; | code |
| 355 | Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection Highlight: This limitation stems from existing AIGI detection benchmarks, which, despite featuring a broad collection of synthetic images, remain restricted in their coverage of artifact diversity and lack detailed, localized annotations. To bridge this gap, we introduce a fine-grained benchmark towards eXplainable AI-Generated image Detection, named X-AIGD, which provides pixel-level, categorized annotations of perceptual artifacts, spanning low-level distortions, high-level semantics, and cognitive-level counterfactuals. |
Yao Xiao; Weiyan Chen; Jiahao Chen; Zijie Cao; Weijian Deng; Binbin Yang; ZiYi Dong; Xiangyang Ji; Wei Ke; Pengxu Wei; Liang Lin; | code |
| 356 | Einstein Fields: A Neural Perspective To Computational General Relativity Highlight: We introduce *Einstein Fields*, a neural representation designed to compress computationally intensive *four-dimensional* numerical relativity simulations into compact implicit neural network weights. |
Sandeep Suresh Cranganore; Andrei Bodnar; Arturs Berzins; Johannes Brandstetter; | code |
| 357 | ProxyThinker: Test-Time Guidance Through Small Visual Reasoners Highlight: In this work, we propose ProxyThinker, an inference-time technique that enables large models to inherit the visual reasoning capabilities from small, slow-thinking visual reasoners without any training. |
Zilin Xiao; Jaywon Koo; Siru Ouyang; Jefferson Hernandez; Yu Meng; Vicente Ordonez; | code |
| 358 | End-to-End Probabilistic Framework for Learning with Hard Constraints Highlight: We present ProbHardE2E, a probabilistic forecasting framework that incorporates hard operational/physical constraints and provides uncertainty quantification. |
Utkarsh Utkarsh; Danielle C. Maddix; Ruijun Ma; Michael W. Mahoney; Bernie Wang; | code |
| 359 | MindMix: A Multimodal Foundation Model for Auditory Perception Decoding Via Deep Neural-Acoustic Alignment Highlight: Specifically, the lack of deep coupling between neural signals and auditory inputs hampers the models’ ability to generalize effectively across diverse auditory tasks. To bridge this gap, we introduce MindMix, a multimodal foundation model designed to bridge the gap between unimodal EEG foundations and task-specific auditory decoders. |
RUI LIU; Zhige Chen; Pengshu; Wenlong You; Zhi-An Huang; Jibin Wu; KC Tan; | code |
| 360 | AudioTrust: Benchmarking The Multifaceted Trustworthiness of Audio Large Language Models Highlight: We find that significant trustworthiness risks in ALLMs arise from non-semantic acoustic cues, such as timbre, accent, and background noise, which can be used to manipulate model behavior. To address this gap, we propose AudioTrust, the first framework for large-scale and systematic evaluation of ALLM trustworthiness concerning these audio-specific risks. |
Kai Li; Can Shen; Yile Liu; Jirui Han; Kelong zheng; Xuechao Zou; Lionel Z. WANG; Shun Zhang; Xingjian Du; Hanjun Luo; Yingbin Jin; Xinxin Xing; Ziyang Ma; Yue Liu; YiFan Zhang; Junfeng Fang; Kun Wang; Yibo Yan; Gelei Deng; Haoyang LI; Yiming Li; Xiaobin Zhuang; Tianlong Chen; Qingsong Wen; Tianwei Zhang; Yang Liu; Haibo Hu; Zhizheng Wu; Xiaolin Hu; Eng Siong Chng; Wenyuan Xu; XiaoFeng Wang; Wei Dong; Xinfeng Li; | code |
| 361 | ProxyAttn: Guided Sparse Attention Via Representative Heads Highlight: In this work, we propose ProxyAttn, a training-free sparse attention algorithm that achieves token-level estimation by compressing the dimension of attention heads. |
Yixuan Wang; Huang He; Siqi Bao; Hua Wu; Haifeng Wang; Qingfu Zhu; Wanxiang Che; | code |
| 362 | MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents Highlight: We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant context size when solving long multi-turn tasks. |
Zijian Zhou; Ao Qu; Zhaoxuan Wu; Sunghwan Kim; Alok Prakash; Daniela Rus; Bryan Kian Hsiang Low; Paul Pu Liang; | code |
| 363 | Generalization of Diffusion Models Arises with A Balanced Representation Space Highlight: Practically, we propose a representation-based memorization detection method and a simple representation-steering method that enables controllable editing of generalized samples. |
Zekai Zhang; Xiao Li; Xiang Li; Lianghe Shi; Meng Wu; Molei Tao; Qing Qu; | code |
| 364 | Hilbert: Recursively Building Formal Proofs with Informal Reasoning Highlight: However, a significant gap remains: current prover LLMs solve substantially fewer problems than general-purpose LLMs operating in natural language. We introduce Hilbert, an agentic framework that bridges this gap by combining the complementary strengths of informal reasoning and formal verification. |
Sumanth Varambally; Thomas Voice; Yanchao Sun; Zhifeng Chen; Rose Yu; Ke Ye; | code |
| 365 | LEXam: Benchmarking Legal Reasoning on 340 Law Exams Highlight: Long-form legal reasoning remains a key challenge for large language models (LLMs) in spite of recent advances in test-time scaling. To address this, we introduce ***LEXam***, a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels. |
Yu Fan; Jingwei Ni; Jakob Merane; Yang Tian; Yoan Hermstrüwer; Yinya Huang; Mubashara Akhtar; Etienne Salimbeni; Florian Geering; Oliver Dreyer; Daniel Brunner; Markus Leippold; Mrinmaya Sachan; Alexander Stremitzer; Christoph Engel; Elliott Ash; Joel Niklaus; | code |
| 366 | Do We Need All The Synthetic Data? Targeted Image Augmentation Via Diffusion Models Highlight: In this work, we show that synthetically augmenting part of the data that is not learned early in training with faithful images—containing same features but different noise—outperforms augmenting the entire dataset. |
Dang Nguyen; Jiping Li; Jinghao Zheng; Baharan Mirzasoleiman; | code |
| 367 | HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from The Energy Spectra of Non-Hermitian Crystals Highlight: Despite their significance as fingerprints for electronic behavior, their systematic study has been intractable due to the reliance on manual extraction. To unlock this potential, we introduce **Poly2Graph** (https://github.com/sarinstein-yan/Poly2Graph): a high-performance, open-source pipeline that automates the mapping of 1-D crystal Hamiltonians to spectral graphs. |
Xianquan Yan; Hakan Akgün; Kenji Kawaguchi; N. Duane Loh; Ching Hua Lee; | code |
| 368 | Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling Highlight: In this paper, we propose PAINET, a principled SE(3)-equivariant neural architecture for learning all-pair interactions in multi-body systems. |
Kai Yang; Yuqi Huang; Junheng Tao; Wanyu Wang; Qitian Wu; | code |
| 369 | Toward Complex-Valued Neural Networks for Waveform Generation Highlight: We present ComVo, a Complex-valued neural Vocoder whose generator and discriminator use native complex arithmetic. |
Hyung-Seok Oh; Deok-Hyeon Cho; Seung-Bin Kim; Seong-Whan Lee; | code |
| 370 | MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval Highlight: Zero-shot anomaly detection (ZSAD) often leverages pretrained vision or vision-language models, but many existing methods use prompt learning or complex modeling to fit the data distribution, resulting in high training or inference cost and limited cross-domain stability. To address these limitations, we propose the Memory-Retrieval Anomaly Detection (MRAD) method, a unified framework that replaces parametric fitting with direct memory retrieval. |
Chaoran Xu; Chengkan Lv; Qiyu Chen; Feng Zhang; Zhengtao Zhang; | code |
| 371 | DeepAFL: Deep Analytic Federated Learning Highlight: In this paper, to enable representable analytic models while preserving the ideal invariance to data heterogeneity for FL, we propose our Deep Analytic Federated Learning approach, named DeepAFL. |
Jianheng Tang; Yajiang Huang; Kejia Fan; Feijiang Han; Jiaxu Li; Jinfeng Xu; Run He; Anfeng Liu; Houbing Herbert Song; Huiping Zhuang; Yunhuai Liu; | code |
| 372 | Plan-R1: Safe and Feasible Trajectory Planning As Language Modeling Highlight: Inspired by the success of large language models, we propose Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning. |
Xiaolong Tang; Meina Kan; Shiguang Shan; Xilin Chen; | code |
| 373 | SliderQuant: Accurate Post-Training Quantization for LLMs Highlight: In this paper, we address post-training quantization (PTQ) for large language models (LLMs) from an overlooked perspective: given a pre-trained high-precision LLM, the predominant sequential quantization framework treats different layers equally, but this may not be optimal in challenging bit-width settings. |
Shigeng Wang; Chao Li; Yangyuxuan Kang; Jiawei Fan; Zhonghong Ou; Anbang Yao; | code |
| 374 | MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation Highlight: This paper introduces Mixture-of-Groups Attention (MoGA), an efficient sparse attention that uses a lightweight, learnable token router to precisely match tokens without blockwise estimation. |
Weinan Jia; Yuning Lu; Mengqi Huang; Hualiang Wang; Binyuan Huang; Nan Chen; Mu Liu; Jidong Jiang; Zhendong Mao; | code |
| 375 | Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning Highlight: In this work, we introduce **OC-STORM**, an object-centric MBRL framework that enhances a learned world model with object representations extracted by a pretrained segmentation network. |
Weipu Zhang; Adam Jelley; Trevor McInroe; Amos Storkey; Gang Wang; | code |
| 376 | Enhancing Geometric Perception in VLMs Via Translator-Guided Reinforcement Learning Highlight: Vision-language models (VLMs) often struggle with geometric reasoning due to their limited perception of fundamental diagram elements. To tackle this challenge, we introduce GeoPerceive, a benchmark comprising diagram instances paired with domain-specific language (DSL) representations, along with an efficient automatic data generation pipeline. |
Hao Yu; Shuning Jia; Guanghao Li; Wenhao Jiang; Chun Yuan; | code |
| 377 | LLaVA-4D: Embedding SpatioTemporal Prompt Into LMMs for 4D Scene Understanding Highlight: In this paper, we propose LLaVA-4D, a general LMM framework with a novel spatiotemporal prompt for visual representation in 4D scene understanding. Additionally, we construct a 4D vision-language dataset with spatiotemporal coordinate annotations for instruction fine-tuning LMMs. |
Hanyu Zhou; Gim Hee Lee; | code |
| 378 | MMPD: Diverse Time Series Forecasting Via Multi-Mode Patch Diffusion Loss Highlight: We propose the Multi-Mode Patch Diffusion (MMPD) loss, which can be applied to any patch-based backbone that outputs latent tokens for the future. |
Yunhao Zhang; Wenyao Hu; Jiale Zheng; Lujia Pan; Junchi Yan; | code |
| 379 | Earth-Agent: Unlocking The Full Landscape of Earth Observation with Agents Highlight: Agent-based methods offer a promising direction, but current attempts remain in their infancy, confined to RGB perception, shallow reasoning, and lacking systematic evaluation protocols. To overcome these limitations, we introduce Earth-Agent, the first agentic framework that unifies RGB and spectral EO data within an MCP-based tool ecosystem, enabling cross-modal, multi-step, and quantitative spatiotemporal reasoning beyond pretrained MLLMs. |
Peilin Feng; Zhutao Lv; Junyan Ye; Xiaolei Wang; Xinjie Huo; Jinhua Yu; Wanghan Xu; Wenlong Zhang; LEI BAI; Conghui He; Weijia Li; | code |
| 380 | SpinBench: Perspective and Rotation As A Lens on Spatial Reasoning in VLMs Highlight: We present SpinBench, a cognitively grounded diagnostic benchmark for evaluating spatial reasoning in vision language models (VLMs). |
Yuyou Zhang; Radu Corcodel; Chiori Hori; Anoop Cherian; Ding Zhao; | code |
| 381 | Generation Then Reconstruction: Accelerating Masked Autoregressive Models Via Two-Stage Sampling Highlight: Masked Autoregressive (MAR) models promise better efficiency in visual generation than continuous autoregressive (AR) models owing to their ability to generate in parallel, yet their acceleration potential remains constrained by the modeling complexity of spatially correlated visual tokens in a single step. To address this limitation, we introduce Generation then Reconstruction (GtR), a training-free hierarchical sampling strategy that decomposes generation into two stages: structure generation establishing global semantic scaffolding, followed by detail reconstruction efficiently completing remaining tokens. |
Feihong Yan; Yao Zhu; Peiru Wang; Pang Kaiyu; Qingyan Wei; Huiqi Li; Linfeng Zhang; | code |
| 382 | ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models Highlight: Recent related methods often fail in the first stage after input-image downsampling, due to perception-driven reasoning, where clear visual information is required for effective reasoning. To address this issue, we propose ERGO (Efficient Reasoning & Guided Observation) that performs reasoning-driven perception—leveraging multimodal context to determine where to focus. |
Jewon Lee; Wooksu Shin; Seungmin Yang; Ki-Ung Song; DongUk Lim; Jaeyeon Kim; Tae-Ho Kim; Bo-Kyeong Kim; | code |
| 383 | SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation Highlight: We propose a stage-aware, video-based reward modeling framework that jointly predicts task stage and fine-grained progress, using natural-language subtask annotations to derive consistent labels across variable-length demonstrations. |
Qianzhong Chen; Justin Yu; Mac Schwager; Pieter Abbeel; Fred Shentu; Philipp Wu; | code |
| 384 | StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models Highlight: To this end, we introduce **StepORLM**, a novel self-evolving framework with generative process supervision. |
Chenyu Zhou; Tianyi Xu; Jianghao Lin; Dongdong Ge; | code |
| 385 | MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation Highlight: We present MolLangBench, a comprehensive benchmark designed to evaluate fundamental molecule-language interface tasks: language-prompted molecular structure recognition, editing, and generation. |
Feiyang Cai; Jiahui Bai; Tao Tang; Guijuan He; Joshua Luo; Tianyu Zhu; Srikanth Pilla; Gang Li; Ling Liu; Feng Luo; | code |
| 386 | KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model Highlight: In this work, we propose KaLM-Embedding-V2, a series of versatile and compact embedding models, systematically incentivizing advanced embedding capability in LLMs by superior training techniques and high-quality data. |
Xinping Zhao; Xinshuo Hu; Zifei Shan; Shouzheng Huang; Yao Zhou; Xin Zhang; Zetian Sun; zhenyu liu; Dongfang Li; Xinyuan Wei; Youcheng Pan; Yang Xiang; Meishan Zhang; Haofen Wang; Jun Yu; Baotian Hu; Min Zhang; | code |
| 387 | OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment Highlight: In this paper, we present OrthAlign, an innovative approach that pioneers a new paradigm by leveraging orthogonal subspace decomposition to fundamentally resolve gradient-level conflicts in multi-objective preference alignment. |
Liang Lin; Zhihao Xu; Junhao Dong; Jian Zhao; Yuchen Yuan; Guibin Zhang; Miao Yu; Yiming Zhang; Zhengtao Yao; Huahui Yi; HAICHUAN TANG; Dongrui Liu; Xinfeng Li; Kun Wang; | code |
| 388 | RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data Highlight: In this work, we hope to provide insights from real-world data, advancing scientific ML toward bridging the sim-to-real gap and real-world deployment. |
Peiyan Hu; Haodong Feng; Hongyuan Liu; Tongtong Yan; Wenhao Deng; Tianrun Gao; Rong Zheng; Haoren Zheng; Chenglei Yu; Chuanrui Wang; Kaiwen Li; Zhi-Ming Ma; Dezhi Zhou; Xingcai Lu; Dixia Fan; Tailin Wu; | code |
| 389 | Beyond Markovian: Reflective Exploration Via Bayes-Adaptive RL for LLM Reasoning Highlight: Our resulting algorithm, BARL, instructs the LLM to stitch and switch strategies based on the observed outcomes, offering principled guidance on when and how the model should reflectively explore. |
Shenao Zhang; Yaqing Wang; Yinxiao Liu; Tianqi Liu; Peter Grabowski; Eugene Ie; Zhaoran Wang; Yunxuan Li; | code |
| 390 | DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies Highlight: Conversely, a vision encoder trained via contrastive learning aligns well with language but struggles to decode back into the pixel space for generation tasks. To bridge this gap, we propose DualToken, a method that unifies representations for both understanding and generation within a single tokenizer. |
Wei Song; Yuran Wang; Zijia Song; Yadong Li; Zenan Zhou; Long Chen; Xu Jhua; Jiaqi Wang; Kaicheng Yu; | code |
| 391 | More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences Highlight: Building on this perspective, we propose Conflict-Aware Direct Preference Optimization (C-APO), an LLM-Rec framework that jointly optimizes RP and CP while adaptively reconciling their agreement and conflict, delivering robust recommendation performance and logically consistent rationales. |
Chung Park; Hyeongjun Yun; Taesan Kim; Junui Hong; Dongjoon Hong; Mira Myong; Jihoon Oh; MinCheol Cho; Kijung Park; Min sung Choi; Jihwan Seok; Jaegul Choo; | code |
| 392 | SsToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite their strong empirical performance, existing token-level selection methods share two key limitations: (1) requiring training or accessing an additional reference model, and (2) relying solely on loss information for token selection, which cannot well preserve semantically important tokens that are not favored by loss-based metrics. To address these challenges, we propose **ssToken**, a **S**elf-modulated and **S**emantic-aware **Token** Selection approach. |
Xiaohan Qin; Xiaoxing Wang; Ning Liao; Cancheng Zhang; Xiangdong Zhang; Mingquan Feng; Jingzhi Wang; Junchi Yan; | code |
| 393 | Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose CalibRL, a hybrid-policy RLVR framework that supports controllable exploration with expert guidance, enabled by two key mechanisms. |
Zhuoxu Huang; Mengxi Jia; Hao Sun; Xuelong Li; Jungong Han; | code |
| 394 | PepTri: Tri-Guided All-Atom Diffusion for Peptide Design Via Physics, Evolution, and Mutual Information Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While deep generative models have shown promise for peptide design, existing approaches are often structure-centric and therefore generate sequences and structures in a decoupled manner, failing to ensure that designs are simultaneously physically stable, evolutionarily plausible, and internally coherent. To overcome this limitation, we introduce PepTri, a novel diffusion framework that jointly generates peptide sequences and 3D structures within a unified, SE(3)-equivariant latent space. |
Ngoc-Quang Nguyen; Jaeyoon Jung; Seijung Kim; Sunkyu Kim; Jaewoo Kang; | code |
| 395 | OmniCT: Towards A Unified Slice-Volume LVLM for Comprehensive CT Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present **OmniCT**, a powerful unified slice–volume LVLM for CT scans, which makes three contributions: **(i) Spatial Consistency Enhancement (SCE):** volumetric slice composition combined with tri-axial positional encoding introduces volumetric consistency, and an MoE hybrid projection enables efficient slice–volume adaptation; **(ii) Organ-level Semantic Enhancement (OSE):** segmentation and ROI localization explicitly align anatomical regions, emphasizing lesion- and organ-level semantics; **(iii) MedEval-CT:** the largest slice–volume CT dataset and hybrid benchmark, which integrates multi-level metrics for unified evaluation. |
Tianwei Lin; Zhongwei Qiu; Wenqiao Zhang; Jiang Liu; Yihan Xie; Mingjian Gao; Zhenxuan Fan; Zhaocheng Li; Sijing Li; Zhongle Xie; Peng LU; Yueting Zhuang; Ling Zhang; Beng Chin Ooi; Yingda Xia; | code |
| 396 | Metis: Training LLMs with FP4 Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work presents \emph{Metis}, a spectral-domain quantization framework that partitions anisotropic spectra into narrower sub-distributions for independent quantization, thereby reducing errors and preserving spectral structure. |
Hengjie Cao; Mengyi Chen; Yifeng Yang; Fang Dong; Ruijun Huang; Jixian Zhou; Anrui Chen; Mingzhi Dong; Yujiang Wang; Jinlong Hou; Yuan Cheng; FAN WU; Fan Yang; Tun Lu; Ning Gu; Li Shang; | code |
| 397 | AdAEM: An Adaptively and Automated Extensible Measurement of LLMs’ Value Difference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Nevertheless, current value measurement methods face the informativeness challenge: with often outdated, contaminated, or generic test questions, they can only capture the orientations on common safety values, e.g., HHH, shared among different LLMs, leading to indistinguishable and uninformative results. To address this problem, we introduce AdAEM, a novel, self-extensible evaluation algorithm for revealing LLMs’ inclinations. |
Jing Yao; Shitong Duan; Xiaoyuan Yi; Dongkuan Xu; Peng Zhang; Tun Lu; Ning Gu; Zhicheng Dou; Xing Xie; | code |
| 398 | VTool-R1: VLMs Learn to Think with Images Via Reinforcement Learning on Multimodal Tool Use Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce VTool-R1, the first RFT framework that trains VLMs to generate multimodal chains of thought by interleaving text and intermediate visual reasoning steps. |
Mingyuan Wu; Jingcheng Yang; Jize Jiang; Meitang Li; Kaizhuo Yan; Hanchao Yu; Minjia Zhang; ChengXiang Zhai; Klara Nahrstedt; | code |
| 399 | Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Sat3DGen, which embodies a geometry-first methodology, to address these fundamental challenges. For validation, we first constructed a new benchmark by pairing the VIGOR-OOD test set with high-resolution DSM data. |
Ming Qian; Zimin Xia; Changkun Liu; Shuailei Ma; Wen Wang; Zeran Ke; Bin Tan; Hang Zhang; Gui-Song Xia; | code |
| 400 | CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a hybrid framework that combines verbal and numerical guidance, the latter achieved by fine-tuning the LLM via reinforcement learning (RL) based on the quality of generated heuristics. |
Ziyao Huang; Weiwei Wu; Kui Wu; Wei-Bin Lee; Jianping Wang; | code |
| 401 | Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The aim of this contribution is to study the learning process of a Transformer when applied to MTS by revealing connections to statistical time series methods. |
Charalampos Shimillas; Kleanthis Malialis; Konstantinos Fokianos; Marios Polycarpou; | code |
| 402 | TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent Table-as-Multimodality strategies attempt to combine textual and visual views, but they (1) statically process both modalities for every query-table pair within large multimodal LLMs (MLLMs), inevitably introducing redundancy and even conflicts, and (2) depend on costly fine-tuning of MLLMs. In light of this, we propose TableDART, a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models. |
Xiaobo Xing; Wei Yuan; Tong Chen; Quoc Viet Hung Nguyen; Xiangliang Zhang; Hongzhi Yin; | code |
| 403 | MOLM: Mixture of LoRA Markers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a general watermarking framework that formulates the encoding problem as key-dependent perturbation of the parameters of a generative model. |
Samar Fares; Nurbek Tastan; Noor Hazim Hussein; Karthik Nandakumar; | code |
| 404 | SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes a novel and practical method, SNAP-UQ, for single-pass, label-free uncertainty estimation based on depth-wise next-activation prediction. |
Ismail Lamaakal; Chaymae Yahyati; Khalid El Makkaoui; Ibrahim Ouahbi; Yassine Maleh; | code |
| 405 | RRNCO: Towards Real-World Routing with Neural Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce RRNCO, a novel architecture specifically designed to address these complexities. Moreover, we introduce a new VRP benchmark grounded in real-world data crucial for bridging this sim-to-real gap, featuring asymmetric distance and duration matrices from 100 diverse cities, enabling the training and validation of NCO solvers on tasks that are more representative of practical settings. |
Jiwoo Son; Zhikai Zhao; Federico Berto; Chuanbo Hua; Zhiguang Cao; Changhyun Kwon; Jinkyoo Park; | code |
| 406 | Scalable In-Context Q-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In the paper, we propose **S**calable **I**n-**C**ontext **Q**-**L**earning (**S-ICQL**), an innovative framework that harnesses dynamic programming and world modeling to steer ICRL toward efficient reward maximization and task generalization, while retaining the scalability and stability of supervised pretraining. |
Jinmei Liu; Fuhong Liu; Zhenhong Sun; Jianye HAO; Huaxiong Li; Bo Wang; Daoyi Dong; Chunlin Chen; Zhi Wang; | code |
| 407 | Play to Generalize: Learning to Reason Through Game Play Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by literature suggesting that gameplay promotes transferable reasoning skills, we propose a novel post-training method, Visual Game Learning (ViGaL), where MLLMs develop generalizable reasoning skills through playing arcade-like games. |
Yunfei Xie; Yinsong Ma; Shiyi Lan; Alan Yuille; Junfei Xiao; Chen Wei; | code |
| 408 | Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing training objectives such as mean squared error mostly treat each future step as an independent, equally weighted task, which we found leads to two issues: (1) they overlook the *label autocorrelation effect* among future steps, resulting in a biased training objective; and (2) they fail to set *heterogeneous task weights* for the forecasting tasks corresponding to different future steps, limiting forecasting performance. To fill this gap, we propose a novel quadratic-form weighted training objective that addresses both issues simultaneously. |
Eric Wang; Licheng Pan; Yuan Lu; Zi Ciu Chan; Tianqiao Liu; Shuting He; Zhixuan Chu; Qingsong Wen; Haoxuan Li; Zhouchen Lin; | code |
| 409 | DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose DistDF, which achieves alignment by alternately minimizing a discrepancy between the conditional forecast and label distributions. |
Eric Wang; Licheng Pan; Yuan Lu; Zhixuan Chu; Xiaoxi Li; Shuting He; Zi Ciu Chan; Qingsong Wen; Haoxuan Li; Zhouchen Lin; | code |
| 410 | Do LLMs Forget What They Should? Evaluating In-Context Forgetting in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ICF-Bench, a comprehensive benchmark for evaluating In-Context Forgetting (ICF). |
Yuli Qian; Zechuan Yang; Wenbiao Ding; Hongzhi Li; Yutao Xie; | code |
| 411 | Error As Signal: Stiffness-Aware Diffusion Sampling Via Embedded Runge-Kutta Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose **E**mbedded **R**unge–**K**utta based **Guid**ance (ERK-Guid), which exploits detected stiffness to reduce LTE and stabilize sampling. |
Inho Kong; Sojin Lee; Youngjoon Hong; Hyunwoo J. Kim; | code |
| 412 | LightCtrl: Training-free Controllable Video Relighting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although these methods can relight videos under various conditions, their ability to explicitly control the illumination in the relighted video remains limited. Therefore, we present LightCtrl, the first controllable video relighting method that offers explicit control over the video illumination through a user-supplied light trajectory in a training-free manner. |
Yizuo Peng; Xuelin Chen; Kai Zhang; Xiaodong Cun; | code |
| 413 | Weak-to-Strong Generalization with Failure Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by the human learning process, we propose to generalize not only successful knowledge but also failed experiences so that the strong model can learn from the failed trajectories accumulated by weak models. |
Ruimeng Ye; Zihan Wang; Yang Xiao; Zinan Ling; Manling Li; Bo Hui; | code |
| 414 | Your Language Model Secretly Contains Personality Subnetworks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that LLMs already contain persona-specialized subnetworks in their parameter space. |
Ruimeng Ye; Zihan Wang; Zinan Ling; Yang Xiao; Manling Li; Xiaolong Ma; Bo Hui; | code |
| 415 | SHIELD: Suppressing Hallucinations In LVLM Encoders Via Bias and Vulnerability Defense Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In contrast to previous work focusing on LLM components, this paper is the first to trace LVLM hallucinations to visual encoders and identifies three key issues: statistical bias, inherent bias, and vulnerability. To address these challenges, we propose SHIELD, a training-free framework that mitigates hallucinations through three strategies: re-weighting visual tokens to reduce statistical bias, introducing noise-derived tokens to counter inherent bias, and applying adversarial attacks with contrastive decoding to address vulnerability. |
Yiyang Huang; Liang Shi; Yitian Zhang; Yi Xu; Yun Fu; | code |
| 416 | Diffusion Alignment As Variational Expectation-Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV), a framework that formulates diffusion alignment as an iterative process alternating between two complementary phases: the E-step and the M-step. |
Jaewoo Lee; Minsu Kim; Sanghyeok Choi; Inhyuck Song; Sujin Yun; Hyeongyu Kang; Woocheol Shin; Taeyoung Yun; Kiyoung Om; Jinkyoo Park; | code |
| 417 | Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose STAMP, a Spatial Transcriptomics-Augmented Multimodal Pathology representation learning framework that integrates spatially-resolved gene expression profiles to enable molecule-guided joint embedding of pathology images and transcriptomic data. |
Minghao Han; Dingkang Yang; Linhao Qu; Zizhi Chen; Gang Li; Han Wang; Jiacong Wang; Lihua Zhang; | code |
| 418 | ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To better understand and address these challenges, we first establish a theoretical link between policy entropy and training stability of tool-use tasks, which reveals that structured, low-entropy tokens are primary determinants of rewards. Motivated by this insight, we propose Reshaped Token-level policy gradients (ResT) for tool-use tasks. |
Zihan Lin; Xiaohan Wang; Jie Cao; Jiajun Chai; Guojun Yin; Wei Lin; Ran He; | code |
| 419 | Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we show that the key to successful transfer lies in the sign structure of the gradients of the new model. |
Filippo Rinaldi; Aniello Panariello; Giacomo Salici; Fengyuan Liu; Marco Ciccone; Angelo Porrello; Simone Calderara; | code |
| 420 | Align-Then-stEer: Adapting The Vision-Language Action Models Through Unified Latent Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our work presents a general and lightweight solution that greatly enhances the practicality of deploying VLA models to new robotic platforms and tasks. |
Yang Zhang; Chenwei Wang; ouyang lu; Yuan Zhao; Yunfei Ge; Zhenglong Sun; Xiu Li; Chi Zhang; Chenjia Bai; Xuelong Li; | code |
| 421 | A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we propose a two-phase machine learning method, known as ShockCast, to model high-speed flows with adaptive time-stepping. |
Jacob Helwig; Sai Sreeharsha Adavi; Xuan Zhang; Yuchao Lin; Felix S. Chim; Luke Takeshi Vizzini; Haiyang Yu; Muhammad Hasnain; Saykat Kumar Biswas; John J. Holloway; Narendra Singh; N. K. Anand; Swagnik Guhathakurta; Shuiwang Ji; | code |
| 422 | Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a unified benchmark, **Spatial-DISE**, based on a cognitively grounded taxonomy that categorizes tasks into four fundamental quadrants: **I**ntrinsic-**S**tatic, Intrinsic-**D**ynamic, **E**xtrinsic-Static, and Extrinsic-Dynamic spatial reasoning. |
Xinmiao Huang; Qisong He; Zhenglin Huang; Boxuan Wang; Zhuoyun Li; Guangliang Cheng; Yi Dong; Xiaowei Huang; | code |
| 423 | Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current Parameter-Efficient Fine-Tuning (PEFT) methods, designed for vision or language tasks, fail to address the unique challenges of weather downstream tasks, such as variable heterogeneity, resolution diversity, and spatiotemporal coverage variations, leading to suboptimal performance when applied to WFMs. To bridge this gap, we introduce WeatherPEFT, a novel PEFT framework for WFMs incorporating two synergistic innovations. |
Shilei Cao; Hehai Lin; Jiashun Cheng; Yang Liu; Guowen Li; Xuehe Wang; Juepeng Zheng; Haoyuan Liang; Meng Jin; Chengwei Qin; Hong Cheng; Haohuan Fu; | code |
| 424 | WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here we introduce **WFR Flow Matching (WFR-FM)**, a simulation-free training algorithm that unifies flow matching with dynamic unbalanced OT. |
Qiangwei Peng; Zihan Wang; Junda Ying; Yuhao Sun; Qing Nie; Lei Zhang; Tiejun Li; Peijie Zhou; | code |
| 425 | PolyGraph Discrepancy: A Classifier-based Metric for Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Their values are also highly sensitive to extrinsic parameters, namely kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses these limitations. |
Markus Krimmel; Philip Hartout; Karsten Borgwardt; Dexiong Chen; | code |
| 426 | Distillation of Large Language Models Via Concrete Score Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Concrete Score Distillation (CSD), a discrete score-matching objective that overcomes both softmax-induced smoothing and restrictions on the optimal solution set. |
Yeongmin Kim; Donghyeok Shin; Mina Kang; Byeonghu Na; Il-chul Moon; | code |
| 427 | WAFT: Warping-Alone Field Transforms for Optical Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Warping-Alone Field Transforms (WAFT), a simple and effective method for optical flow. |
Yihan Wang; Jia Deng; | code |
| 428 | CodeSense: A Real-World Benchmark and Dataset for Code Semantic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To bridge this gap, we propose CodeSense, the first benchmark that makes available a spectrum of fine-grained code reasoning tasks concerned with the software engineering of real-world code. We executed tests from these repositories, collected their execution traces, and constructed a ground truth dataset for fine-grained semantic reasoning tasks. |
Monoshi Kumar Roy; Simin Chen; Benjamin Steenhoek; Jinjun Peng; Gail Kaiser; Baishakhi Ray; Wei Le; | code |
| 429 | CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving Via Diffusion Posterior Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce CL-DPS, a contrastively trained likelihood for diffusion posterior sampling that requires no knowledge of the operator parameters at inference. |
Linfeng Ye; Shayan Mohajer Hamidi; Mert Pilanci; Konstantinos N. Plataniotis; | code |
| 430 | ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This instability adds to two previously reported challenges: overfitting and over-concentrated attention distribution. To simultaneously overcome these three limitations, we introduce attention-stabilized multiple instance learning (ASMIL), a novel unified framework. |
Linfeng Ye; Shayan Mohajer Hamidi; Zhixiang Chi; Guang Li; Mert Pilanci; Takahiro Ogawa; Miki Haseyama; Konstantinos N. Plataniotis; | code |
| 431 | PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce PLANETALIGN, a comprehensive Python library for network alignment that features a rich collection of built-in datasets, methods, and evaluation pipelines with easy-to-use APIs. |
Qi Yu; Zhichen Zeng; Yuchen Yan; Zhining Liu; Baoyu Jing; Ruizhong Qiu; Ariful Azad; Hanghang Tong; | code |
| 432 | Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches often underperform due to limited capacity to model visual transitions or fragmented architectures. To overcome this limitation, we introduce Uni-CoT, a Unified Chain-of-Thought framework that captures structured visual transitions and seamlessly aligns them with textual logic, enabling coherent multimodal reasoning. |
Luozheng Qin; GONG JIA; Yuqing Sun; Tianjiao Li; Haoyu Pan; Mengping Yang; Xiaomeng Yang; Chao Qu; Zhiyu Tan; Hao Li; | code |
| 433 | From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In practice, two core challenges hinder deployment: query ambiguity from vague user questions and value mismatch between user terminology and database entries. To address this, we introduce EHR-ChatQA, an interactive database question answering benchmark that evaluates the end-to-end workflow of database agents: clarifying user questions, using tools to resolve value mismatches, and generating correct SQL to deliver accurate answers. |
Gyubok Lee; Woosog Chay; Heeyoung Kwak; Yeong Hwa Kim; Haanju Yoo; Oksoon Jeong; Meong Hi Son; Edward Choi; | code |
| 434 | V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Nevertheless, existing video benchmarks predominantly rely on text prompts for evaluation, which often require complex referential language and diminish both the accuracy and efficiency of human–model interaction in turn. To address this limitation, we propose V2P-Bench, a robust and comprehensive benchmark for evaluating the ability of LVLMs to understand Video Visual Prompts in human–model interaction scenarios. |
Yiming Zhao; Yu Zeng; Yukun Qi; YaoYang Liu; Xikun Bao; Lin Chen; Zehui Chen; Qing Miao; Chenxi Liu; Jie Zhao; Feng Zhao; | code |
| 435 | Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose ConvRec-R1, a two-stage framework for end-to-end training of LLM-based conversational recommender systems. In Stage 1, we construct a behavioral-cloning dataset with a Remap-Reflect-Adjust pipeline, which produces high-quality, catalog-grounded demonstrations from powerful blackbox LLMs to warm-start the RL training. |
Yaochen Zhu; Harald Steck; Dawen Liang; Yinhan He; Vito Claudio Ostuni; Jundong Li; Nathan Kallus; | code |
| 436 | Steerable Adversarial Scenario Generation Through Test-Time Preference Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). |
Tong Nie; Yuewen Mei; Yihong Tang; Junlin He; Jie Sun; Haotian Shi; Wei Ma; Jian Sun; | code |
| 437 | Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present Color3D, a highly adaptable framework for colorizing both static and dynamic 3D scenes from monochromatic inputs, delivering visually diverse and chromatically vibrant reconstructions with flexible user-guided control. |
Yecong Wan; Mingwen Shao; Renlong Wu; Wangmeng Zuo; | code |
| 438 | NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we push the autoregressive paradigm forward with NextStep-1, a 14B autoregressive model paired with a 157M flow matching head, training on discrete text tokens and continuous image tokens with next-token prediction objectives. |
Chunrui Han; Guopeng Li; Jingwei Wu; Quan Sun; Yan Cai; Yuang Peng; Zheng Ge; Deyu Zhou; Haomiao Tang; Hongyu Zhou; Kenkun Liu; Shu-Tao Xia; Binxing Jiao; Daxin Jiang; Xiangyu Zhang; Yibo Zhu; | code |
| 439 | Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the issue, in this paper, we propose Hierarchy-of-Groups Policy Optimization (HGPO) for long-horizon agentic tasks. |
Shuo He; Lang Feng; Qi Wei; Xin Cheng; Lei Feng; Bo An; | code |
| 440 | Aligning Deep Implicit Preferences By Learning to Reason Defensively Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this, we propose Critique-Driven Reasoning Alignment (CDRA), which reframes alignment from a scalar reward-matching task into a structured reasoning process. First, to bridge the preference inference gap, we introduce the DeepPref benchmark. |
Peiming Li; Zhiyuan Hu; Shiyu Li; Xi Chen; Yang Tang; | code |
| 441 | SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose SPELL, a multi-role self-play reinforcement learning framework that enables scalable, label-free optimization for long-context reasoning. |
Ziyi Yang; Weizhou Shen; Chenliang Li; Ruijun Chen; Fanqi Wan; Ming Yan; Xiaojun Quan; Fei Huang; | code |
| 442 | NetArena: Dynamic Benchmarks for AI Agents in Network Automation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce NetArena, a dynamic benchmark generation framework for network applications. |
Yajie Zhou; Jiajun Ruan; Eric S. Wang; Sadjad Fouladi; Francis Y. Yan; Kevin Hsieh; Zaoxing Liu; | code |
| 443 | Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack Against Image Generation Model Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Machine unlearning (MU) has emerged as a promising mitigation by selectively removing undesirable concepts from pretrained models, yet the robustness of existing methods, particularly under multi-modal adversarial inputs, remains insufficiently explored. To address this gap, we propose RECALL, a multi-modal adversarial framework for systematically evaluating and compromising the robustness of unlearned IGMs. |
Renyang Liu; Guanlin Li; Tianwei Zhang; See-Kiong Ng; | code |
| 444 | Accelerated Co-design of Robots Through Morphological Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here we show that a universal, morphology-agnostic controller can be rapidly and directly obtained by gradient-based optimization through differentiable simulation. |
Luke Strgar; Sam Kriegman; | code |
| 445 | AVEX: What Matters for Animal Vocalization Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present a large-scale empirical study that covers aspects of bioacoustics that are relevant to research but have previously been scarcely considered: training data diversity and scale, model architectures and training recipes, and the breadth of evaluation tasks and datasets. |
Marius Miron; David Robinson; Milad Alizadeh; Ellen Gilsenan-McMahon; Gagan Narula; Emmanuel Chemla; Maddie Cusimano; Felix Effenberger; Masato Hagiwara; Benjamin Hoffman; Sara Keen; Diane Kim; Jane K. Lawton; Jen-Yu Liu; Aza Raskin; Olivier Pietquin; Matthieu Geist; | code |
| 446 | FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present FSOD-VFM: Few-Shot Object Detectors with Vision Foundation Models, a framework that leverages vision foundation models to tackle the challenge of few-shot object detection. |
Chen-Bin Feng; Youyang Sha; Longfei Liu; Yongjun YU; Chi Man VONG; Xuanlong Yu; Xi SHEN; | code |
| 447 | VGR: Visual Grounded Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This narrow focus limits their ability to handle complex visual reasoning tasks that demand comprehensive understanding of image details. To address these limitations, this paper introduces VGR, a novel reasoning multimodal large language model (MLLM) that can replay the visual memory during thinking just like humans. |
Jiacong Wang; Zijian Kang; Haochen Wang; LiangXiao; Ya Wang; Jiawen Li; Bohong Wu; Ran Jiao; Haiyong Jiang; ChaoFeng; Jun Xiao; | code |
| 448 | Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, its policies, represented as opaque neural networks, are often difficult for humans to understand, verify, and debug, which undermines trust and hinders real-world deployment. This work addresses this challenge by introducing a novel approach for programmatic control policy discovery, called **M**ultimodal Large **L**anguage Model-assisted **E**volutionary **S**earch (MLES). |
Qinglong Hu; Tong Xialiang; Mingxuan Yuan; Fei Liu; Zhichao Lu; Qingfu Zhang; | code |
| 449 | OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding Highlight: We introduce spatio-temporal omni-object video grounding, dubbed $\textbf{OmniSTVG}$, a new STVG task aiming to localize spatially and temporally all targets mentioned in the textual query within videos. |
Jiali Yao; Xin Gu; Xinran Deng; Mengrui Dai; Bing Fan; Zhipeng Zhang; Yan Huang; Heng Fan; Libo Zhang; | code |
| 450 | Towards Sequence Modeling Alignment Between Tokenizer and Autoregressive Model Highlight: However, this process is challenged by the bidirectional dependencies inherent in conventional image tokenizations, which creates a fundamental misalignment with the unidirectional nature of autoregressive models. To resolve this, we introduce AliTok, a novel Aligned Tokenizer that alters the dependency structure of the token sequence. |
Pingyu Wu; Kai Zhu; Yu Liu; Longxiang Tang; Jian Yang; Yansong Peng; Wei Zhai; Yang Cao; Zheng-Jun Zha; | code |
| 451 | CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark Highlight: As a reference, we provide a collaborative baseline based on a finetuned Qwen2.5-VL-3B. |
Tianhang Wang; Xinhai Li; Fan Lu; Tianshi Gong; Jiankun Dong; Weiyi Xue; Sanqing Qu; Chenjia Bai; Guang Chen; | code |
| 452 | Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World Highlight: In this paper, we introduce TaR-ViR (Table Retrieval via Visual Representations), a new benchmark that reformulates table retrieval as a multimodal task by treating tables as images. |
Da Li; Keping Bi; Jiafeng Guo; Wei Yuan; Fan Yang; Tingting Gao; Xueqi Cheng; | code |
| 453 | Exploring The Potential of Encoder-free Architectures in 3D LMMs Highlight: In this paper, we present the first comprehensive investigation into the potential of encoder-free architectures to alleviate the challenges of encoder-based 3D LMMs. |
Yiwen Tang; Ziyu Guo; Zhuhao Wang; Renrui Zhang; Qizhi Chen; Junli Liu; Delin Qu; Dong Wang; Bin Zhao; Xuelong Li; | code |
| 454 | NewtonGen: Physics-consistent and Controllable Text-to-Video Generation Via Neural Newtonian Dynamics Highlight: In this work, we propose NewtonGen, a framework that integrates data-driven synthesis with learnable physical principles. |
Yu Yuan; Xijun Wang; Tharindu Wickremasinghe; Zeeshan Nadir; Bole Ma; Stanley H. Chan; | code |
| 455 | PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs Highlight: (2) Short-context calibration: Due to Rotary Positional Embedding (RoPE), the use of short-context data during calibration fails to account for the distribution of less frequent channels in the Key Cache, resulting in performance loss. We propose Progressive Mixed-Precision KV Cache Quantization (PM-KVQ) for long-CoT LLMs to address the above issues in two folds: (1) To reduce cumulative error, we design a progressive quantization strategy to gradually lower the bit-width of KV Cache in each block. |
Tengxuan Liu; Shiyao Li; Jiayi Yang; Tianchen Zhao; Feng Zhou; Xiaohui Song; Guohao Dai; Shengen Yan; Huazhong Yang; Yu Wang; | code |
| 456 | Generalizable Heuristic Generation Through LLMs with Meta-Optimization Highlight: However, existing approaches often rely on manually predefined evolutionary computation (EC) heuristic-optimizers and single-task training schemes, which may constrain the exploration of diverse heuristic algorithms and hinder the generalization of the resulting heuristics. To address these issues, we propose Meta-Optimization of Heuristics (MoH), a novel framework that operates at the optimizer level, discovering effective heuristic-optimizers through the principle of meta-learning. |
Yiding Shi; Jianan Zhou; Wen Song; Jieyi Bi; Yaoxin Wu; Zhiguang Cao; Jie Zhang; | code |
| 457 | FastGRPO: Accelerating Policy Optimization Via Concurrency-aware Speculative Decoding and Online Draft Learning Highlight: Although speculative decoding presents a promising direction for acceleration, its direct application in GRPO achieves limited speedup under high-concurrency training conditions. To overcome this limitation, we propose a concurrency-aware speculative decoding framework that dynamically adjusts the drafting and verification strategy according to real-time concurrency levels, thereby maximizing the acceleration of the generation process. |
Yizhou Zhang; Ning Lv; Teng Wang; Jisheng Dang; | code |
| 458 | GIR-Bench: Versatile Benchmark for Generating Images with Reasoning Highlight: To this end, we introduce \textbf{GIR-Bench}, a comprehensive benchmark that evaluates unified models across three complementary perspectives. |
Hongxiang Li; Yaowei Li; Bin Lin; Yuwei Niu; Yuhang Yang; Xiaoshuang Huang; Jiayin Cai; Xiaolong Jiang; Yao Hu; Long Chen; | code |
| 459 | Relatron: Automating Relational Machine Learning Over Relational Databases Highlight: We present a comprehensive study that unifies RDL and DFS in a shared design space and conducts large-scale architecture-centric searches across diverse RDB tasks. |
Zhikai Chen; Han Xie; Jian Zhang; Jiliang Tang; Xiang song; Huzefa Rangwala; | code |
| 460 | Beyond The Known: An Unknown-Aware Large Language Model for Open-Set Text Classification Highlight: In this work, we present UnLLM, an Unknown-aware Large Language Model for OSTC. |
Xi Chen; Chuan Qin; Ziqi Wang; Shasha Hu; Chao Wang; Hengshu Zhu; Hui Xiong; | code |
| 461 | One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs Highlight: In this paper, we introduce OFTSR, a novel flow-based framework for one-step image super-resolution that can produce outputs with tunable levels of fidelity and realism. |
Yuanzhi Zhu; Ruiqing Wang; Shilin Lu; Hanshu Yan; Junnan Li; Kai Zhang; | code |
| 462 | Contamination Detection for VLMs Using Multi-Modal Semantic Perturbations Highlight: While prior work has proposed mitigation strategies such as decontamination of pretraining data and benchmark redesign for LLMs, the complementary direction of developing detection methods for \emph{contaminated VLMs} remains underexplored. To address this gap, we deliberately contaminate open-source VLMs on popular benchmarks and show that existing detection approaches either fail outright or exhibit inconsistent behavior. |
Jaden Park; Mu Cai; Feng Yao; Jingbo Shang; Soochahn Lee; Yong Jae Lee; | code |
| 463 | Calibrating Verbalized Confidence with Self-Generated Distractors Highlight: We hypothesize that this overconfidence often stems from a given LLM’s heightened suggestibility when faced with claims that it encodes little information about; we empirically validate this hypothesis, finding more suggestibility on lower-accuracy claims. Building on this finding, we introduce Distractor-Normalized Coherence (DINCO), which estimates and accounts for an LLM’s suggestibility bias by having the model verbalize its confidence independently across several self-generated distractors (i.e. alternative claims), and normalizes by the total verbalized confidence. |
Victor Wang; Elias Stengel-Eskin; | code |
| 464 | SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation Highlight: Together with a scaled VFM-based discriminator, our final model, dubbed **SenseFlow**, achieves superior performance in distillation for both diffusion based text-to-image models such as SDXL, and flow-matching models such as SD 3.5 Large and FLUX.1 dev. |
Xingtong Ge; Xin Zhang; Tongda Xu; Yi Zhang; Xinjie Zhang; Yan Wang; Jun Zhang; | code |
| 465 | Forest-Based Graph Learning for Semi-Supervised Node Classification Highlight: In this work, we break the dilemma by proposing a novel forest-based graph learning (FGL) paradigm that enables efficient long-range information propagation. |
Jin Li; Shenghao Gao; Kaichen Zhang; Xinlong Chen; Ying Sun; Hui Xiong; | code |
| 466 | RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers Highlight: In this work, we introduce RouterArena, the first open platform enabling comprehensive comparison of LLM routers. |
Yifan Lu; Rixin Liu; Jiayi Yuan; Xingqi Cui; Shenrun Zhang; Hongyi Liu; Jiarong Xing; | code |
| 467 | Part-level Semantic-guided Contrastive Learning for Fine-grained Visual Classification Highlight: Existing methods often struggle to effectively capture both part-level detail and spatial relational features, particularly across rigid and non-rigid object categories. To address these issues, we propose Part-level Semantic-guided Contrastive Learning (PSCL), a novel framework that integrates three key components. |
Zhijian Lin; Hong Han; | code |
| 468 | LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context Highlight: To this end, we propose LumiTex, an end-to-end framework that comprises three key components: (1) a multi-branch generation scheme that disentangles albedo and metallic–roughness under shared illumination priors for robust material understanding, (2) a lighting-aware material attention mechanism that injects illumination context into the decoding process for physically grounded generation of albedo, metallic, and roughness maps, and (3) a geometry-guided inpainting module based on a large view synthesis model that enriches texture coverage and ensures seamless, view-consistent UV completion. |
Jingzhi Bao; HONGZE CHEN; Lingting Zhu; Chenyu Liu; Runze Zhang; keyang luo; Zeyu HU; Weikai Chen; Yingda Yin; Xin Wang; Zehong Lin; Jun Zhang; Xiaoguang Han; | code |
| 469 | Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters Highlight: On-device inference offers privacy, offline use, and instant response, but consumer hardware restricts large language models (LLMs) to low throughput and capability. To overcome this challenge, we present prima.cpp, a distributed on-device inference system that runs 30-70B LLMs on consumer home clusters with mixed CPUs/GPUs, insufficient RAM/VRAM, slow disks, Wi-Fi links, and heterogeneous OSs. |
Zonghang Li; Tao Li; Wenjiao Feng; Rongxing Xiao; Jianshu She; Hong Huang; Mohsen Guizani; Hongfang Yu; Qirong Ho; Wei Xiang; Xue Liu; | code |
| 470 | SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting Highlight: However, existing orthogonal projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to appropriately partition the gradient space. In this work, we consider a continual learning paradigm based on Low-Rank Adaptation (LoRA), which has gained considerable attention due to its efficiency and wide applicability, and propose a novel approach for continual learning, called SplitLoRA. |
Haomiao Qiu; Miao Zhang; Ziyue Qiao; Weili Guan; Min Zhang; Liqiang Nie; | code |
| 471 | Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs Highlight: On the system side, the training system needs to support MAS-workflow-based rollouts and on-policy updates for both single and multiple policy models. To address these issues, we introduce AT-GRPO, consisting of (i) an Agent- and Turn-wise grouped RL algorithm tailored for MAS and (ii) a system to support both single-policy and multi-policy training. |
Yujie Zhao; Lanxiang Hu; Yang Wang; Minmin Hou; Hao Zhang; Ke Ding; Jishen Zhao; | code |
| 472 | The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum Highlight: In this work, we introduce a novel family of Predictor–Corrector (PC) samplers for discrete diffusion models that generalize prior methods and apply to arbitrary noise processes. |
Justin Deschenaux; Caglar Gulcehre; Subham Sekhar Sahoo; | code |
| 473 | Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention Highlight: However, these methods usually involve a large number of parameters and require high computational cost, which is unacceptable in many applications where speech separation serves as only a preprocessing step for further speech processing. To address this issue, we propose an efficient AVSS method, named **Dolphin**. |
Kai Li; Gao Kejun; Xiaolin Hu; | code |
| 474 | Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment Highlight: We propose Contrastive Object-centric Diffusion Alignment (CODA), a simple extension that (i) employs register slots to absorb residual attention and reduce interference between object slots, and (ii) applies a contrastive alignment loss to explicitly encourage slot–image correspondence. |
Bac Nguyen; Yuhta Takida; Naoki Murata; Chieh-Hsin Lai; Toshimitsu Uesaka; Stefano Ermon; Yuki Mitsufuji; | code |
| 475 | HOG-Diff: Higher-Order Guided Diffusion for Graph Generation Highlight: In this work, we propose Higher-order Guided Diffusion (HOG-Diff), a principled framework that progressively generates plausible graphs with inherent topological structures. |
Yiming Huang; Tolga Birdal; | code |
| 476 | Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure Highlight: Specifically, we propose the first method tailored to *label unlearning* in VFL, where labels play a dual role as both essential inputs and sensitive information. |
Hanlin Gu; Hong Xi Tae; Lixin Fan; Chee Seng Chan; | code |
| 477 | ASCIIEval: Benchmarking Models’ Visual Perception in Text Strings Via ASCII Art Highlight: In this work, we select ASCII art as a representative artifact. We frame the problem as a recognition task, and construct a novel benchmark, ASCIIEval. |
Qi Jia; Xiang Yue; Shanshan Huang; Ziheng Qin; Yizhu Liu; Bill Yuchen Lin; Yang You; Guangtao Zhai; | code |
| 478 | Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models Highlight: While high-quality vision-language data can enhance these capabilities, its scarcity and limited scalability impose significant constraints. To address this, we propose AGILE, an Agentic jiGsaw Interaction Learning for Enhancing visual perception and reasoning in VLMs. |
Yu Zeng; Wenxuan Huang; Shiting Huang; Xikun Bao; Yukun Qi; Yiming Zhao; Qiuchen Wang; Lin Chen; Zehui Chen; Huaian Chen; Wanli Ouyang; Feng Zhao; | code |
| 479 | Token-Guard: Towards Token-Level Hallucination Control Via Self-Checking Decoding Highlight: Decoding-based methods are lighter yet lack explicit hallucination control. To address this, we present \textbf{Token-Guard}, a token-level hallucination control method based on self-checking decoding. |
Yifan Zhu; Huiqiang Rong; Haoran Luo; | code |
| 480 | CerebraGloss: Instruction-Tuning A Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation Highlight: We first introduce a novel, automated data generation pipeline, featuring a bespoke YOLO-based waveform detector, to programmatically create a large-scale corpus of EEG-text instruction data. Using this data, we develop CerebraGloss, the first model of its kind capable of unified, generative analysis—performing tasks from detailed waveform description to multi-turn, context-aware dialogue. |
Wei Gu; Luo Tianming; Qiran Zhang; Mohan Ye; Xiao Shen; Wenxin Chen; Yunhuan Li; Yichen Zhang; Jing Hong; Bao-liang Lu; Wei-Long Zheng; | code |
| 481 | REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering Highlight: In this work, we introduce REAL, a novel framework for identifying behavior-relevant modules (heads or layers) in Transformers. |
Li-Ming Zhan; Bo LIU; Yujie Feng; Chengqiang Xie; Jiannong Cao; Xiao-Ming Wu; | code |
| 482 | Dual-Path Condition Alignment for Diffusion Transformers Highlight: Inspired by the observation that REPA primarily aids early layers in capturing robust semantics, we propose an unsupervised alternative that avoids an external visual encoder and the assumption of a consistent data distribution. |
Changhao Peng; Yuqi Ye; Shuangjun Du; Wenxu Gao; Wei Gao; | code |
| 483 | MTVCraft: Tokenizing 4D Motion for Arbitrary Character Animation Highlight: However, existing methods rely largely on 2D-rendered pose images for motion guidance, which limits generalization and discards essential 4D information for open-world animation. To address this, we propose MTVCraft (Motion Tokenization Video Crafter), the first framework that directly models raw 3D motion sequences (i.e., 4D motion) for character image animation. |
Yanbo Ding; Xirui Hu; Guo Zhi Zhi; Yan Zhang; Xinrui Wang; Zhixiang He; Chi Zhang; Yali Wang; Xuelong Li; | code |
| 484 | Reforming The Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping Highlight: Through systematic investigation, we uncover the Circuit-Interference Law: edit interference between reasoning patterns is proportional to the overlap of their neural circuits. Guided by this principle, we propose REdit, the first framework to actively reshape neural circuits before editing, thereby modulating interference between reasoning patterns and mitigating the trade-off. |
Zhenyu Lei; Qiong Wu; JIANXIONG DONG; Yinhan He; Emily Dodwell; Yushun Dong; Jundong Li; | code |
| 485 | FlashWorld: High-quality 3D Scene Generation Within Seconds Highlight: We propose FlashWorld, a generative model that produces 3D scenes from a single image or text prompt in seconds, $10 \sim 100\times$ faster than previous works while possessing superior rendering quality. |
Xinyang Li; Tengfei Wang; Zixiao Gu; Shengchuan Zhang; Chunchao Guo; Liujuan Cao; | code |
| 486 | MASAM: Multimodal Adaptive Sharpness-Aware Minimization for Heterogeneous Data Fusion Highlight: However, its application in multimodal scenarios is challenging: 1) SAM pays excessive attention to the dominant modality, exacerbating modality imbalance, and 2) the perturbation gradient calculation is affected by interference from other modalities. To address these issues, we propose Multimodal Adaptive Sharpness-Aware Minimization (MASAM), which optimizes different modalities based on their dominance. |
Zijie Chen; Kejing Yin; Wenfang Yao; William K. Cheung; Jing Qin; | code |
| 487 | Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models Highlight: In this study, we introduce **MaskGRPO**, the first viable approach to enable scalable multimodal reinforcement learning in discrete diffusion with effective importance sampling and modality-specific adaptations. |
Tianren Ma; Mu Zhang; Yibing Wang; Qixiang Ye; | code |
| 488 | On The Thinking-Language Modeling Gap in Large Language Models Highlight: To understand the gap, we construct structural causal models of next-token predictors in human languages. |
Chenxi Liu; Yongqiang Chen; Tongliang Liu; James Cheng; Bo Han; Kun Zhang; | code |
| 489 | Scalable and Adaptive Trust-Region Learning Via Projection Convex Hull Highlight: We propose Projection Convex Hull (PCH), a scalable framework for learning polyhedral trust regions in high-dimensional spaces. |
Hongyang Jia; Qingchun Hou; Bojun Du; Xiao Cai; Ning Zhang; Chongqing Kang; | code |
| 490 | Rethinking Code Similarity for Automated Algorithm Design with LLMs Highlight: However, directly applying existing code similarity metrics to algorithms raises a critical limitation: they do not necessarily reflect the similarity between algorithms. To address this, we introduce a novel perspective that defines algorithm similarity through the lens of its problem-solving behavior. |
Rui Zhang; Zhichao Lu; | code |
| 491 | EigenBench: A Comparative Behavioral Measure of Value Alignment Highlight: To address the lack of quantitative metrics for value alignment, we propose EigenBench: a black-box method for comparatively benchmarking language models’ values. |
Jonathn Chang; Leonhard Piff; Suvadip Sana; Jasmine Xinze Li; Lionel Levine; | code |
| 492 | VER: Vision Expert Transformer for Robot Learning Via Foundation Distillation and Dynamic Routing Highlight: We propose VER, a Vision Expert transformer for Robot learning. |
Yixiao Wang; Mingxiao Huo; Zhixuan Liang; Yushi Du; Lingfeng Sun; Haotian Lin; Jinghuan Shang; Chensheng Peng; Mohit Bansal; Mingyu Ding; Masayoshi Tomizuka; | code |
| 493 | From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents Highlight: This approach falls short in capturing deep memory connections, leading to partial retrieval of useful information or substantial noise, resulting in suboptimal performance. To tackle these limits, we propose MemGAS, a framework that enhances memory consolidation by constructing multi-granularity association, adaptive selection, and retrieval. |
Derong Xu; Yi Wen; Pengyue Jia; Yingyi Zhang; Wenlin Zhang; Yichao Wang; Huifeng Guo; Ruiming Tang; Xiangyu Zhao; Enhong Chen; Tong Xu; | code |
| 494 | Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning Highlight: Inspired by the fly olfactory circuit, we propose Fly-CL, a bio-inspired framework compatible with a wide range of pretrained backbones. |
Heming Zou; Yunliang Zang; Wutong Xu; Xiangyang Ji; | code |
| 495 | FrameThinker: Learning to Think with Long Videos Via Multi-Turn Frame Spotlighting Highlight: While Large Vision-Language Models (LVLMs) have achieved substantial progress in video understanding, their application to long video reasoning is hindered by uniform frame sampling and static textual reasoning, which are inefficient and struggle to handle visually intensive video tasks. To overcome these challenges, in this paper, we introduce the concept of thinking with long videos and propose a novel framework FrameThinker. |
Zefeng He; Xiaoye Qu; Yafu Li; Siyuan Huang; Daizong Liu; Yu Cheng; | code |
| 496 | TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization Highlight: This limitation stems from the fact that most existing architectures employ static or modality-agnostic fusion, which fails to account for the dynamic and frame-dependent variation in modality saliency that naturally occurs within a video. To overcome these limitations, we propose a novel architecture, TripleSumm, which adaptively weights and fuses the contributions of the three modalities at the frame level. |
Sumin Kim; Hyemin Jeong; Mingu Kang; Yejin Kim; Yoori Oh; Joonseok Lee; | code |
| 497 | FullPart: Generating Each 3D Part at Full Resolution Highlight: In this paper, we propose FullPart, a novel framework that combines both implicit and explicit paradigms. We will release all code, data, and model to benefit future research in 3D part generation. |
Lihe Ding; Shaocong Dong; Yaokun Li; Chenjian Gao; Xiao Chen; Rui Han; Yihao Kuang; Hong Zhang; Bo Huang; Zhanpeng Huang; Zibin Wang; Dan Xu; Tianfan Xue; | code |
| 498 | Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing Highlight: While transformer-based architectures have demonstrated considerable potential for such tasks, several key limitations remain unaddressed: 1) existing query initialization mechanisms rely primarily on semantic cues or learnable parameters, demonstrating limited adaptability to changing active objects across varying input scenes; 2) previous transformer-based methods utilize pixel-level semantic features to iteratively refine queries during mask generation, which may introduce interaction-irrelevant content into the final embeddings; and 3) prevailing models are susceptible to “interaction illusion”, producing physically inconsistent predictions. To address these issues, we propose an end-to-end Interaction-aware Transformer (InterFormer), which integrates three key components, i.e., a Dynamic Query Generator (DQG), a Dual-context Feature Selector (DFS), and the Conditional Co-occurrence (CoCo) loss. |
YUEJIAO SU; Yi Wang; Lei Yao; Yawen Cui; Lap-Pui Chau; | code |
| 499 | HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding Highlight: A well-disguised wrong solution program may only be detected by carefully human-written edge cases that are difficult to synthesize automatically. To address this issue, we propose HardTestsGen, an approach to synthesize high-quality test cases for algorithmic coding problems. |
Zhongmou He; Yee Man Choi; Kexun Zhang; Ivan Bercovich; Jiabao Ji; Junting Zhou; Dejia Xu; Aidan Zhang; Yixiao Zeng; Lei Li; | code |
| 500 | Rethinking LLM Reasoning: From Explicit Trajectories to Latent Representations Highlight: In this work, we challenge the necessity of generating full reasoning trajectories and empirically demonstrate that LLMs can generate accurate answers using only fragmental reasoning paths, without relying on complete token-by-token sequences. |
Cong Jiang; Xiaofeng Zhang; Fangzhi Zhu; XiaoWei Chen; Junxiong Zhu; Zheng Zhang; | code |
| 501 | Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems Highlight: A promising, yet unverified, hypothesis is that the architectural diversity of multi-agent systems (MAS)—where LLM-based agents with different roles and perspectives interact—could naturally mitigate this amplification. In this work, we rigorously test this hypothesis and investigate the phenomenon of bias amplification in MAS across sensitive attributes, including gender, age, and race. |
Keyu Li; Jin Gao; Dequan Wang; | code |
| 502 | Neural Compression of 3D Meshes Using Sparse Implicit Representation Highlight: However, existing methods often exhibit suboptimal compression performance due to the inefficient representation of mesh data. To address this issue, we propose a novel neural mesh compression method based on Sparse Implicit Representation (SIR). |
Jianqiang Wang; Siyu Ren; Junhui Hou; | code |
| 503 | Huxley-Gödel Machine: Human-Level Coding Agent Development By An Approximation of The Optimal Self-Improving Machine Highlight: However, we identify a mismatch between the agent’s self-improvement potential (metaproductivity) and its coding benchmark performance, namely the \emph{Metaproductivity-Performance Mismatch}. Inspired by Huxley’s concept of clade, we propose a metric ($\mathrm{CMP}$) that aggregates the benchmark performances of the \emph{descendants} of an agent as an indicator of its potential for self-improvement. |
Wenyi Wang; Piotr Piękos; Li Nanbo; Firas Laakom; Yimeng Chen; Mateusz Ostaszewski; Mingchen Zhuge; Jürgen Schmidhuber; | code |
| 504 | THOR: Tool-Integrated Hierarchical Optimization Via RL for Mathematical Reasoning Highlight: Despite recent advances, existing methods struggle with three key challenges: constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference. To overcome these limitations, we propose THOR (Tool-Integrated Hierarchical Optimization via RL). |
Qikai Chang; Zhenrong Zhang; Pengfei Hu; Jun Du; Jiefeng Ma; Yicheng Pan; Jianshu Zhang; Quan Liu; Jianqing Gao; | code |
| 505 | Detecting and Mitigating Memorization in Diffusion Models Through Anisotropy of The Log-Probability Highlight: Recent memorization detection methods are primarily based on the norm of score difference as indicators of memorization. We prove that such norm-based metrics are mainly effective under the assumption of isotropic log-probability distributions, which generally holds at high or medium noise levels. |
Rohan Asthana; Vasileios Belagiannis; | code |
| 506 | JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation Highlight: Generative models often treat continuous data and discrete events as separate processes, creating a gap in modeling complex systems where they interact synchronously. To bridge this gap, we introduce **JointDiff**, a novel diffusion framework designed to unify these two processes by simultaneously generating continuous spatio-temporal data and synchronous discrete events. |
Guillem Capellera; Luis Ferraz; Antonio Rubio Romano; Alexandre Alahi; Antonio Agudo; | code |
| 507 | Cache-to-Cache: Direct Semantic Communication Between Large Language Models Highlight: Thus, we propose Cache-to-Cache (C2C), a new paradigm for direct semantic communication between LLMs. |
Tianyu Fu; Zihan Min; Hanling Zhang; Jichao Yan; Guohao Dai; Wanli Ouyang; Yu Wang; | code |
| 508 | LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora Highlight: In this paper, we revisit the pipeline of existing GraphRAG systems and propose Linear Graph-based Retrieval-Augmented Generation (LinearRAG), an efficient framework that enables reliable graph construction and precise passage retrieval. |
Luyao Zhuang; Shengyuan Chen; Yilin Xiao; Huachi Zhou; Yujing Zhang; Hao Chen; Qinggang Zhang; Xiao Huang; | code |
| 509 | Rethinking Driving World Model As Synthetic Data Generator for Perception Tasks Highlight: To thoroughly demonstrate the benefit of synthetic data, we introduce Dream4Drive, a novel synthetic data generation framework designed to enhance downstream perception tasks. |
Kai Zeng; Zhanqian Wu; Kaixin Xiong; Xiaobao Wei; Xiangyu Guo; Zhenxin Zhu; Kalok Ho; Lijun Zhou; Bohan Zeng; Ming Lu; Haiyang Sun; BING WANG; Guang Chen; Hangjun Ye; Wentao Zhang; | code |
| 510 | WALT: Web Agents That Learn Tools Highlight: We introduce WALT (Web Agents that Learn Tools), a framework that reverse-engineers latent website functionality into deterministic, callable tools. |
Viraj Prabhu; Yutong Dai; Matthew Fernandez; Krithika Ramakrishnan; Jing Gu; Yanqi Luo; silvio savarese; Caiming Xiong; Junnan Li; Zeyuan Chen; Ran Xu; | code |
| 511 | Structured Reasoning for LLMs: A Unified Framework for Efficiency and Explainability Highlight: In this paper, inspired by cognitive science and neurosymbolic AI, we introduce Structured Reasoning, which aims at enhancing the reasoning capabilities of LLMs at the step level. |
Yubo Dong; Hehe Fan; Linchao Zhu; Yi Yang; | code |
| 512 | Improving 2D Diffusion Models for 3D Medical Imaging with Inter-Slice Consistent Stochasticity Highlight: In this work, we revisit the origin of stochasticity in diffusion sampling and introduce Inter-Slice Consistent Stochasticity (ISCS), a simple yet effective strategy that encourages inter-slice consistency during diffusion sampling. |
Chenhe Du; Qing Wu; Xuanyu Tian; Jingyi Yu; Hongjiang Wei; Yuyao Zhang; | code |
| 513 | Pay Less Attention to Function Words for Free Robustness of Vision-Language Models Highlight: We experimentally demonstrate the scalability, generalization, and zero-shot performance of FDA, and provide in-depth ablation studies and analysis. |
Qiwei Tian; Chenhao Lin; Zhengyu Zhao; Chao Shen; | code |
| 514 | Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion Highlight: In this paper, we propose a **Speech-guided Multimodal Machine Translation (SMMT)** framework that integrates speech and text as fused inputs into an MLLM to improve translation quality. |
Yexing Du; Youcheng Pan; Zekun Wang; Zheng Chu; Yichong Huang; Kaiyuan Liu; Bo Yang; Yang Xiang; Ming Liu; Bing Qin; | code |
| 515 | Revisiting Weight Regularization for Low-Rank Continual Learning Highlight: In this paper, we revisit weight regularization in low-rank CL as a new perspective for mitigating task interference in PECL. |
Yaoyue Zheng; Yin Zhang; Joost van de Weijer; Gido M van de Ven; Shaoyi Du; Xuetao Zhang; Zhiqiang Tian; | code |
| 516 | D$^2$Cache: Accelerating Diffusion-Based LLMs Via Dual Adaptive Caching Highlight: This is because dLLMs rely on bidirectional attention and cannot directly benefit from the standard key-value (KV) cache as autoregressive models (ARMs) do. To tackle this issue, we introduce *Dual aDaptive Cache* (d$^2$Cache), which is a training-free approximate KV cache framework for accelerating dLLM inference. |
Yuchu Jiang; Yue Cai; Xiangzhong Luo; Jiale Fu; Jiarui Wang; Chonghan Liu; Xu Yang; | code |
| 517 | S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion Highlight: Additionally, we develop an efficient rendering pipeline to generate realistic HDR images. To further mitigate the domain gap between synthetic and real-world data, we introduce S2R-Adapter, a domain adaptation method designed to bridge this gap and enhance the generalization ability of models. |
Yujin Wang; Jiarui Wu; Yichen Bian; Fan Zhang; Tianfan Xue; | code |
| 518 | PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts Highlight: We introduce PuzzleWorld, a comprehensive benchmark of 667 puzzlehunt-style problems designed to assess step-by-step, open-ended, and creative multimodal reasoning. |
Hengzhi Li; Justin Zhang; Brendon Jiang; Alexander Naehu; Regan Song; Megan Tjandrasuwita; Chanakya Ekbote; Steven-Shine Chen; Adithya Balachandran; Wei Dai; Rebecca Chang; Paul Pu Liang; | code |
| 519 | GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space Highlight: Existing approaches often require retraining encoders or designing interpreter modules for pairwise feature alignment, but these solutions are not scalable in practice. To address this, we propose GT-Space, a flexible and scalable collaborative perception framework for heterogeneous agents. |
Wentao Wang; Haoran Xu; Guang Tan; | code |
| 520 | Mini-cluster Guided Long-tailed Deep Clustering Highlight: How to re-weight the training of deep clustering models in an unsupervised setting remains an open challenge. To address this, we propose a mini-cluster guided long-tailed deep clustering method, termed MiniClustering. |
Zhixin Li; Yuheng Jia; Guanliang Chen; Hui LIU; Junhui Hou; | code |
| 521 | Diffusion Language Model Knows The Answer Before It Decodes Highlight: In this work, we highlight and leverage an overlooked property of DLMs, **early answer convergence**: in many cases, the correct answer can be internally identified by the halfway point of decoding, well before the final decoding step, both under semi-autoregressive and random re-masking schedules. |
Pengxiang Li; Yefan Zhou; Dilxat Muhtar; Lu Yin; Shilin Yan; Li Shen; Yi Liang; Soroush Vosoughi; Shiwei Liu; | code |
| 522 | ProteinAE: Protein Diffusion Autoencoders for Structure Encoding Highlight: Current approaches often grapple with the complexities of the $\operatorname{SE}(3)$ manifold, rely on discrete tokenization, or require multiple training objectives, all of which can hinder model optimization and generalization. We introduce ProteinAE, a novel and streamlined protein diffusion autoencoder designed to overcome these challenges by directly mapping protein backbone coordinates from $\operatorname{E}(3)$ into a continuous, compact latent space. |
Shaoning Li; Le Zhuo; Yusong Wang; Mingyu Li; Xinheng He; Fandi Wu; Hongsheng Li; Pheng-Ann Heng; | code |
| 523 | ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation Highlight: We structurally decouple each recurrent forward of the RNN cell from the backtracked state and propose the second version of ADM (ADM-v2), making the direct prediction more flexible. |
Haoxin Lin; Siyuan Xiao; Yi-Chen Li; Zhilong Zhang; Yihao Sun; Chengxing Jia; Yang Yu; | code |
| 524 | HFSTI-Net: Hierarchical Frequency-spatial-temporal Interactions for Video Polyp Segmentation Highlight: However, its clinical application is hampered by two key challenges: shape collapse, which compromises structural integrity, and episodic amnesia, which causes instability in challenging video sequences. To address these challenges, we present a novel video segmentation network, *HFSTI-Net*, which integrates global perception with spatiotemporal consistency in spatial, temporal, and frequency domains. |
Yuanqin He; Guilian Chen; Yuhua Zhang; Huisi Wu; Jing Qin; | code |
| 525 | Teach2Eval: An Interaction-Driven LLMs Evaluation Method Via Teaching Effectiveness Highlight: We introduce Teach2Eval, which reframes evaluation as teaching: a candidate model guides weaker students, and the students’ gains constitute the score. |
Yuhang Zhou; Xutian Chen; Yixin Cao; Yuchen Ni; Yu He; Siyu Tian; Xiang Liu; Yunwen Chen; Guangnan Ye; Xipeng Qiu; Hongfeng Chai; | code |
| 526 | Boolean Satisfiability Via Imitation Learning Highlight: We propose ImitSAT, a branching policy for conflict-driven clause learning (CDCL) solvers based on imitation learning for the Boolean satisfiability problem (SAT). |
Zewei Zhang; Huan Liu; YUANHAO YU; Jun Chen; Xiangyu Xu; | code |
| 527 | VIRTUE: Visual-Interactive Text-Image Universal Embedder Highlight: In this paper, we propose a novel **V**isual-**I**nte**R**active **T**ext-Image **U**niversal **E**mbedder (**VIRTUE**) that extends the capabilities of the segmentation model and the vision-language model to the realm of representation learning. |
Wei-Yao Wang; Kazuya Tateishi; Qiyu Wu; Shusuke Takahashi; Yuki Mitsufuji; | code |
| 528 | RedacBench: Can AI Erase Your Secrets? Highlight: However, existing benchmarks and evaluation methods for redaction are often limited to predefined categories of data like personally identifiable information (PII), or particular techniques like masking. To bridge this gap, we introduce RedacBench, a novel benchmark for a comprehensive evaluation of redaction capabilities, independent of specific domains or redaction strategies. |
Hyunjun Jeon; Kyuyoung Kim; Jinwoo Shin; | code |
| 529 | CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection Highlight: In this work, we present CGSA, the first framework that brings Object-Centric Learning (OCL) into SF-DAOD by integrating slot-aware adaptation into the DETR-based detector. |
Boyang Dai; Zeng Fan; Zihao Qi; Meng Lou; Yizhou Yu; | code |
| 530 | ZeroTuning: Unlocking The Initial Token’s Power to Enhance Large Language Models Without Training Highlight: Empirically, we find that this tuning can improve LLM performance and better elicit pretrained knowledge, with stronger effects in early layers and distinct scaling preferences across attention heads. Building on these findings, we introduce ZeroTuning, a training-free method that improves LLM performance by applying head-specific attention adjustments to the initial token, requiring no parameter updates. |
Feijiang Han; Xiaodong Yu; Jianheng Tang; Delip Rao; Weihua Du; Lyle Ungar; | code |
| 531 | GarmentGPT: Compositional Garment Pattern Generation Via Discrete Latent Tokenization Highlight: We present GarmentGPT, the first framework to operationalize latent space generation for sewing patterns. |
Fangsheng Weng; Junhao Chen; Xiang Li; Jie Qin; Hanzhong Guo; ShaochunHao; Xiaoguang Han; | code |
| 532 | OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents Highlight: We present OSWorld-MCP, the first comprehensive and fair benchmark for assessing computer-use agents’ tool invocation, GUI operation, and decision-making abilities in a real-world environment. We will release all code and data to the community. |
Hongrui Jia; Jitong Liao; Xi Zhang; Haiyang Xu; Tianbao Xie; Chaoya Jiang; Ming Yan; Si Liu; Wei Ye; Fei Huang; | code |
| 533 | ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents Highlight: To this end, we introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates, thereby exploiting the model’s inherent instruction-following tendencies. |
Hwan Chang; Yonghyun Jun; Hwanhee Lee; | code |
| 534 | Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks Highlight: We introduce Ref-Adv, a modern REC benchmark that suppresses shortcuts by pairing linguistically nontrivial expressions with only the information necessary to uniquely identify the target. |
Qihua Dong; Kuo Yang; Lin Ju; Handong Zhao; Yitian Zhang; Yizhou Wang; Huimin Zeng; Jianglin Lu; Yun Fu; | code |
| 535 | Seeing Through Words: Controlling Visual Retrieval Quality with Language Models Highlight: We introduce a training framework that conditions query completion on discretized quality levels, derived from relevance and aesthetic scoring models, so that query enrichment is not only semantically meaningful but also quality-aware. |
Jianglin Lu; Simon Jenni; Kushal Kafle; Jing Shi; Handong Zhao; Yun Fu; | code |
| 536 | VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs Highlight: Such fine-grained comparative reasoning is central to real-world tasks, especially in mathematics and education, where learners must often distinguish between nearly identical diagrams to identify correct solutions. To address this gap, we present VisioMath, a curated benchmark of 1,800 high-quality K–12 mathematics problems in which all candidate answers are diagrams with subtle visual similarities. |
Can Li; Ying Liu; Ting Zhang; Mei Wang; Hua Huang; | code |
| 537 | Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty Highlight: Our observation reveals that increasing problem complexity induces more excessive and unnecessary reflection, which in turn reduces accuracy and increases token overhead. To address this challenge, we propose Adaptive Reflection and Length Coordinated Penalty (ARLCP), a novel reinforcement learning framework designed to dynamically balance reasoning efficiency and solution accuracy. |
Zewei Yu; Lirong Gao; Yuke Zhu; Bo Zheng; Junbo Zhao; Sheng Guo; Haobo Wang; | code |
| 538 | REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning Highlight: Meanwhile, online reinforcement learning mainly adopts a length reward to encourage short reasoning responses, but it tends to lose reflection ability and harm performance. To address these issues, we propose REA-RL, which introduces a small reflection model for efficient scaling in online training, offering both parallel sampling and sequential revision. |
Hexuan Deng; Wenxiang Jiao; Xuebo Liu; Jun Rao; Min Zhang; | code |
| 539 | AsyncBEV: Cross-modal Flow Alignment in Asynchronous 3D Object Detection Highlight: Such asynchrony degrades perception performance, especially for dynamic objects. To address this challenge, we propose AsyncBEV, a trainable lightweight and generic module to improve the robustness of 3D Bird’s Eye View (BEV) object detection models against sensor asynchrony. |
Shiming Wang; Holger Caesar; Liangliang Nan; Julian F. P. Kooij; | code |
| 540 | Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction Highlight: In this work, we repurpose AlphaFold 3 for representation learning to predict binding affinity, a non-trivial task that requires shifting from generative structure prediction to encoding observed geometry, simplifying the heavily conditioned trunk module, and designing a framework to jointly capture sequence and structural information. |
Liang Shi; Zuobai Zhang; Huiyu Cai; Santiago Miret; Zhi Yang; Jian Tang; | code |
| 541 | SparseEval: Efficient Evaluation of Large Language Models By Sparse Optimization Highlight: In this paper, we revisit the model-item performance matrix and show that it exhibits sparsity, that representative items can be selected as anchors, and that the task of efficient benchmarking can be formulated as a sparse optimization problem. |
Taolin Zhang; Hang Guo; Wang Lu; Tao Dai; Shu-Tao Xia; Jindong Wang; | code |
| 542 | VFScale: Intrinsic Reasoning Through Verifier-Free Test-time Scalable Diffusion Model Highlight: In this paper, we introduce the Verifier-free Test-time Scalable Diffusion Model (VFScale) to achieve scalable intrinsic reasoning, which equips number-of-sample test-time scaling with the intrinsic energy function of diffusion models as the verifier. |
Tao Zhang; Jia-Shu Pan; Ruiqi Feng; Tailin Wu; | code |
| 543 | Threading Keyframe with Narratives: MLLMs As Strong Long Video Comprehenders Highlight: In this paper, we propose _Narrating KeyFrames Capturing_ (Nar-KFC), a plug-and-play module to facilitate effective and efficient long video perception. |
Bo Fang; YuXin Song; Haoyuan Sun; Qiangqiang Wu; Wenhao Wu; Antoni B. Chan; | code |
| 544 | Reassessing Layer Pruning in LLMs: New Insights and Methods Abstract: Although large language models (LLMs) have achieved remarkable success across various domains, their considerable scale necessitates substantial computational resources, posing … |
Yao Lu; Hao Cheng; Yujie Fang; Zeyu Wang; Jiaheng Wei; Dongwei Xu; Qi Xuan; Zhaowei Zhu; | code |
| 545 | PepBenchmark: A Standardized Benchmark for Peptide Machine Learning Highlight: Here we present **PepBenchmark**, which standardizes datasets, preprocessing, and evaluation protocols for peptide drug discovery. |
Jiahui Zhang; Rouyi Wang; Kuangqi Zhou; Tianshu Xiao; Lingyan Zhu; Yaosen Min; Yang Wang; | code |
| 546 | Task-Aware Data Selection Via Proxy-Label Enhanced Distribution Matching for LLM Finetuning Highlight: While prevailing data selection methods rely exclusively on instruction instances X to approximate the target distribution, we argue that selection should align with the joint distribution of instructions and task-specific labels (X, Y). However, task-specific labels Y are typically unavailable in practice. To address this, we reformulate the task-specific data selection problem and present a novel pipeline that leverages the reasoning capabilities of large language models (LLMs) to infer proxy labels, thereby facilitating joint distribution alignment. |
Hao Cheng; Rui Zhang; Ling Li; Na Di; Jiaheng Wei; Zhaowei Zhu; Bo Han; | code |
| 547 | HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space Highlight: In this work, we introduce HEAPr, a novel pruning algorithm that decomposes experts into smaller, indivisible atomic experts, enabling more precise and flexible atomic expert pruning. |
Ke Li; Zheng Yang; Zhongbin Zhou; Xuefeng; Zhonglin Jiang; Wenxiao Wang; | code |
| 548 | Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs Highlight: However, existing approaches for vision tasks often rely on indirect representations, such as generating coordinates as text for detection, which limits performance and prevents dense prediction tasks like segmentation. To overcome these challenges, we introduce Patch-as-Decodable Token (PaDT), a unified paradigm that enables MLLMs to directly generate both textual and diverse visual outputs. |
Yongyi Su; Haojie Zhang; Shijie Li; Nanqing Liu; Jingyi Liao; Junyi Pan; Yuan Liu; Xiaofen Xing; Chong Sun; Chen Li; Nancy F. Chen; Shuicheng YAN; Xulei Yang; Xun Xu; | code |
| 549 | Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models Via Semantic Capacity Asymmetry Highlight: In this work, we investigate whether smaller models can serve as efficient evaluators by leveraging internal representations instead of surface generation. |
Zhuochun Li; Yong Zhang; Ming Li; Yuelyu Ji; Yiming Zeng; Ning Cheng; Yun Zhu; Yanmeng Wang; Shaojun Wang; Jing Xiao; Daqing He; | code |
| 550 | PMDformer: Patch-Mean Decoupling Information Transformer for Long-term Forecasting Highlight: And the latter focuses cross-variable attention on the most relevant, recent time segments to avoid overfitting on outdated correlations. Combining these components, we propose PMDformer, a model designed to effectively capture shape similarity in long-term forecasting scenarios. |
Ao Hu; Liangjian Wen; Jiang Duan; Yong Dai; YAN HE; Dongkai Wang; Jun Wang; Yukun Zhang; Ruoxi Jiang; Zenglin Xu; | code |
| 551 | WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks Highlight: In this work, we propose Wavelet-Aware Temperature Scaling (WATS), a post-hoc calibration framework for node classification that assigns node-specific temperatures based on tunable heat-kernel graph wavelet features. |
Xiaoyang Li; Linwei Tao; Haohui Lu; Minjing Dong; Junbin Gao; Chang Xu; | code |
| 552 | ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving Highlight: In this paper, we propose Temporal Residual World Model (TR-World), which focuses on dynamic object modeling. |
Jinqing Zhang; Zehua Fu; zelinxu; wenying.dai; Qingjie Liu; Yunhong Wang; | code |
| 553 | SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models Via Multi-task Meta-Learning Highlight: In this paper, we propose **SwiftTS**, a swift selection framework for time series pre-trained models. |
Tengxue Zhang; Biao Ouyang; Yang Shu; Xinyang Chen; Chenjuan Guo; Bin Yang; | code |
| 554 | Safety Subspaces Are Not Linearly Distinct: A Fine-Tuning Case Study Highlight: In this work, we conduct a comprehensive empirical study of this perspective. |
Kaustubh Ponkshe; Shaan Shah; Raghav Singhal; Praneeth Vepakomma; | code |
| 555 | QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response Highlight: However, existing approaches often rely on a flawed, query-agnostic “change-is-important” principle, which conflates visual dynamics with semantic relevance, leading to computational waste and interaction errors. To address this, we propose QueryStream, a novel framework that instills query-awareness into the core of video processing and response scheduling. |
Kairui Zhang; Zhenyu Yang; Bing Wang; Shengsheng Qian; Changsheng Xu; | code |
| 556 | Bridging Piano Transcription and Rendering Via Disentangled Score Content and Style Highlight: In this paper, we propose a unified framework that jointly models EPR and APT by disentangling note-level score content and global performance style representations from both paired and unpaired data. |
Wei Zeng; JUNCHUAN ZHAO; Ye Wang; | code |
| 557 | Enhanced Continual Learning of Vision-Language Models with Model Fusion Highlight: In this paper, we propose a novel Continual Decoupling-Unifying (ConDU) approach that pioneers the use of model fusion for continual learning in VLMs. |
Haoyuan Gao; Zicong Zhang; Yuqi Wei; Linglan Zhao; Guilin Li; Yexin Li; Bo Wang; Linghe Kong; Weiran Huang; | code |
| 558 | Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment Highlight: To address this limitation, we introduce a new image quality assessment (IQA) task paradigm, **grounding-IQA**. To realize grounding-IQA, we construct a corresponding dataset, GIQA-160K, through our proposed automated annotation pipeline. Furthermore, we develop a well-designed benchmark, GIQA-Bench. |
Zheng Chen; Xun Zhang; Wenbo Li; Renjing Pei; Fenglong Song; Xiongkuo Min; Xiaohong Liu; Xin Yuan; Yong Guo; Yulun Zhang; | code |
| 559 | PCPO: Proportionate Credit Policy Optimization for Preference Alignment of Image Generation Models Highlight: Our analysis identifies a key cause of this instability: disproportionate credit assignment, in which the mathematical structure of the generative sampler produces volatile and non-proportional feedback across timesteps. To address this, we introduce Proportionate Credit Policy Optimization (PCPO), a framework that enforces proportional credit assignment through a stable objective reformulation and a principled reweighting of timesteps. |
Jeongjae Lee; Jong Chul Ye; | code |
| 560 | Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing Highlight: We cast the problem of LLM router training with combined Gold-standard and preference-based data into a causal inference framework by viewing the response evaluation mechanism as the treatment assignment. |
Yichi Zhang; Fangzheng Xie; Shu Yang; Chong Wu; | code |
| 561 | TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis Highlight: Here, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline, spanning findings, impressions, and pathology predictions. We release the task, benchmark, and evaluation protocol to advance safe, explainable, and reproducible multimodal reasoning for high-stakes tumor analysis. |
Sijing Li; Zhongwei Qiu; Jiang Liu; Wenqiao Zhang; Tianwei Lin; Yihan Xie; Jianxiang An; Boxiang Yun; Chenglin Yang; Jun Xiao; Guangyu Guo; Jiawen Yao; Wei Liu; Yuan gao; Ke Yan; Weiwei Cao; Zhilin Zheng; Tony C. W. MOK; Kai Cao; Yu Shi; Jiuyu Zhang; Jian Zhou; Beng Chin Ooi; Yingda Xia; Ling Zhang; | code |
| 562 | Enabling True Global Perception in State Space Models for Visual Tasks Highlight: Based on in-depth analysis of SSMs and frequency-domain modeling principles, we construct a complete theoretical framework that overcomes the limitations imposed by SSMs’ recursive modeling mechanism from a frequency perspective, thereby adapting SSMs for global perception in image modeling. |
Jie Hui; Zhenxiang Zhang; Wenyu Mi; Jianji Wang; | code |
| 563 | Context and Diversity Matter: The Emergence of In-Context Learning in World Models Highlight: We investigate in-context learning (ICL) of world models, shifting attention from zero-shot performance to the growth and asymptotic limits of the world model. |
Fan Wang; ZHIYUAN CHEN; YUXUAN ZHONG; Sunjian Zheng; Pengtao Shao; Bo Yu; Shaoshan Liu; Jianan Wang; Ning Ding; Yang Cao; Yu Kang; | code |
| 564 | Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation Highlight: We introduce **Doloris**, a generative framework that defines a new paradigm for modeling unpaired, high-dimensional, and sparse single-cell perturbation data. |
Changxi Chi; Jun Xia; Yufei Huang; Zhuoli Ouyang; Cheng Tan; Yunfan Liu; Jingbo Zhou; Chang Yu; Liangyu Yuan; Siyuan Li; Zelin Zang; Stan Z. Li; | code |
| 565 | PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing Highlight: However, their ability to understand and operate on real-world engineering problems—such as Printed Circuit Board (PCB) placement and routing—remains underexplored due to the lack of standardized benchmarks and high-fidelity datasets. To address this gap, we introduce PCB-Bench, the first comprehensive benchmark designed to systematically evaluate LLMs in the context of PCB design. |
Jindong Li; Lianrong Chen; Bin Yang; Jiadong Zhu; Ying Wang; Yuzhe Ma; Menglin Yang; | code |
| 566 | Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration Via Cumulative Error Minimization Highlight: However, their fixed caching strategy fails to adapt to the complex error variations during denoising, which limits the full potential of error correction. To tackle this challenge, we propose a novel fidelity-optimization plugin for existing error correction methods via cumulative error minimization, named CEM. |
Tong Shao; Yusen Fu; Guoying Sun; Jingde Kong; Zhuotao Tian; Jingyong Su; | code |
| 567 | G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation Highlight: In this paper, we propose an innovative graph model merging framework called G-Merging for merging multiple task-specific fine-tuned GNN models. | Jun Chen; Ziyue Qiao; Qin Zhang; Kaize Ding; Xiao Luo; | code |
| 568 | Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control Highlight: To overcome this, we propose \textbf{HVD} (Hierarchical Value-Decomposed Offline Reinforcement Learning), an offline reinforcement learning framework that learns effective policies directly from suboptimal, reward-labeled trajectories. To enable realistic evaluation and training, we further introduce \textbf{WB-50}, a 50-hour dataset of teleoperated and policy rollout trajectories annotated with rewards and preserving natural imperfections — including partial successes, corrections, and failures. | Zhilong Zhang; Yunpeng Mei; Xinghao Du; Hongjie Cao; Haonan Wang; Pengyuan Min; Chenyu Wang; Pengfei Chen; Chenbo Xin; Yijie Wang; Wenyu Luo; Yihao Sun; Yidi Wang; Lei Yuan; Gang Wang; Yang Yu; | code |
| 569 | PA3FF: Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation Highlight: When lifting these 2D features to geometry-profound 3D space, challenges arise, such as long runtimes, multi-view inconsistencies, and low spatial resolution with insufficient geometric information. To address these issues, we propose \textbf{Part-Aware 3D Feature Field (PA3FF)}, a novel dense 3D feature with part awareness for generalizable articulated object manipulation. | Yue Chen; Muqing Jiang; Kaifeng Zheng; Jiaqi Liang; Chenrui Tie; Haoran Lu; Ruihai Wu; Hao Dong; | code |
| 570 | Reference-guided Policy Optimization for Molecular Optimization Via LLM Reasoning Highlight: However, adapting them to scientific domains like molecular optimization is challenging: its datasets provide only reference molecules, lacking the reasoning traces for SFT, while its competitive objectives hinder RLVR. To address these issues, we introduce Demonstration-guided Policy Optimization (DePO). | Xuan Li; Zhanke Zhou; Zongze Li; Jiangchao Yao; Yu Rong; Lu Zhang; Bo Han; | code |
| 571 | PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting Highlight: This paper provides, for the first time, a clear explanation of why patch-level processing is inherently inefficient, supported by strong evidence from real-world data. To address these limitations, we introduce a phase perspective for modeling periodicity and present an efficient yet effective solution, PhaseFormer. | Yiming Niu; Jinliang Deng; Yongxin Tong; | code |
| 572 | AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference Via Adaptive Block Size Highlight: Through a statistical analysis of confidence dynamics during the denoising process, we identify a volatility band (VB) region during dLLM decoding, which encodes local semantic structure and can be used to guide adaptive block sizing. Leveraging these insights, we introduce AdaBlock-dLLM, a training-free, plug-and-play scheduler that adaptively aligns block boundaries with semantic steps by adjusting block size during runtime. | Guanxi Lu; Hao Mark Chen; Yuto Karashima; Zhican Wang; Daichi Fujiki; Hongxiang Fan; | code |
| 573 | A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame Highlight: These include traditional numerical methods and deep learning models, which struggle with limited model expressiveness and poor generalization across resolutions. To solve this issue, we propose InertialGenome, a novel transformer-based framework for robust and resolution-agnostic chromosome reconstruction. | Yize Zhou; Haorui Li; Shengchao Liu; | code |
| 574 | Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles Highlight: We introduce RigidSSL, a rigidity-based pretraining framework for proteins. | Zhanghan Ni; Yanjing Li; Zeju Qiu; Bernhard Schölkopf; Hongyu Guo; Weiyang Liu; Shengchao Liu; | code |
| 575 | ADEPT: Continual Pretraining Via Adaptive Expansion and Dynamic Decoupled Tuning Highlight: Our pilot studies reveal that LLMs exhibit functional specialization, where layers and units differentially encode general-critical capabilities, suggesting that parameter expansion and optimization should be function-aware. | Jinyang Zhang; Yue Fang; Hongxin Ding; Weibin Liao; Muyang Ye; Junfeng Zhao; Yasha Wang; Xu Chu; | code |
| 576 | AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs Highlight: The application of large language models (LLMs) in the medical field has garnered significant attention, yet their reasoning capabilities in more specialized domains like anesthesiology remain underexplored. To bridge this gap, we introduce AnesSuite, the first comprehensive dataset suite specifically designed for anesthesiology reasoning in LLMs. | Xiang Feng; Wentao Jiang; Zengmao Wang; Yong Luo; Pingbo Xu; Baosheng Yu; Hua Jin; Jing Zhang; | code |
| 577 | The Curious Case of In-Training Compression of State Space Models Highlight: State Space Models (SSMs), developed to tackle long sequence modeling tasks efficiently, offer both parallelizable training and fast inference. | Makram Chahine; Philipp Nazari; Daniela Rus; T. Konstantin Rusch; | code |
| 578 | From Evaluation to Defense: Advancing Safety in Video Large Language Models Highlight: Based on this, \textit{we reveal that integrating video modality degrades safety performance by an average of 34.2\%, exposing systemic risks in multimodal attack exploitation.} To address this vulnerability, we propose \textbf{VideoSafety-R1}, a dual-stage framework achieving unprecedented safety gains through three innovations: (1) VideoSafetyThinking dataset contains 46k video-query–thinking response triplets. | Yiwei Sun; Peiqi Jiang; Chuanbin Liu; Luohao Lin; Zhiying Lu; Hongtao Xie; | code |
| 579 | D&R: Recovery-based AI-Generated Text Detection Via A Single Black-box LLM Call Highlight: We propose Disrupt-and-Recover (D\&R), a recovery-based detection framework grounded in posterior concentration. | Yuxia Sun; Ran Zhang; Aoxiang Sun; Xu Li; Zitao Liu; Jingcai Guo; | code |
| 580 | One2Scene: Geometric Consistent Explorable 3D Scene Generation from A Single Image Highlight: Existing methods struggle to support free exploration, often producing severe geometric distortions and noisy artifacts when the viewpoint moves far from the original perspective. We introduce One2Scene, an effective framework that decomposes this ill-posed problem into three tractable sub-tasks to enable immersive explorable scene generation. | Pengfei Wang; Liyi Chen; Zhiyuan Ma; Yanjun Guo; Guowen Zhang; Lei Zhang; | code |
| 581 | Generalized Spherical Neural Operators: Green’s Function Formulation Highlight: Existing spherical operators rely on rotational equivariance but often lack the flexibility for real-world complexity. We propose a generalized operator-design framework based on \textbf{designable Green’s function} and its harmonic expansion, establishing a solid operator-theoretic foundation for spherical learning. | Hao Tang; Hao Chen; Chao Li; | code |
| 582 | IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra Highlight: In this paper, we propose IR-Agent, a novel multi-agent framework for molecular structure elucidation from IR spectra. | Heewoong Noh; Namkyeong Lee; Gyoung S. Na; Kibum Kim; Chanyoung Park; | code |
| 583 | RESCUE: Retrieval Augmented Secure Code Generation Highlight: However, the conventional RAG design struggles with the noise of raw security-related documents, and existing retrieval methods overlook the significant security semantics implicitly embedded in task descriptions. To address these issues, we propose \textsc{Rescue}, a new RAG framework for secure code generation with two key innovations. | Jiahao Shi; Tianyi Zhang; | code |
| 584 | Relational Feature Caching for Accelerating Diffusion Transformers Highlight: Through a detailed analysis, we find that 1) these errors stem from the irregular magnitude of changes in the output features, and 2) an input feature of a module is strongly correlated with the corresponding output. Based on this, we propose relational feature caching (RFC), a novel framework that leverages the input-output relationship to enhance the accuracy of the feature prediction. | Byunggwan Son; Jeimin Jeon; Jeongwoo Choi; Bumsub Ham; | code |
| 585 | Test-Time Iterative Error Correction for Efficient Diffusion Models Highlight: Through an analysis of error propagation across diffusion timesteps, we reveal that these approximation errors can accumulate exponentially, severely impairing output quality. Motivated by this insight, we propose Iterative Error Correction (IEC), a novel test-time method that mitigates inference-time errors by iteratively refining the model’s output. | Yunshan Zhong; Weiqi Yan; Yuxin Zhang; | code |
| 586 | Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation Highlight: As a result, during generation, the prompt-related regions can only reference the unrelated regions at the same noise level, failing to obtain clear context and ultimately impairing text-to-image alignment. To address this issue, we propose asynchronous diffusion models, a novel framework that allocates distinct timesteps to different pixels and reformulates the pixel-wise denoising process. | Zijing Hu; Yunze Tong; Fengda Zhang; Junkun Yuan; Jun Xiao; Kun Kuang; | code |
| 587 | Autoencoding-Free Context Compression for LLMs Via Contextual Semantic Anchors Highlight: Context compression presents a promising approach for accelerating large language model (LLM) inference by compressing long contexts into compact representations. Current context compression methods predominantly rely on autoencoding tasks to train context-agnostic compression tokens to compress contextual semantics. While autoencoding tasks enable compression tokens to acquire compression capabilities, compression via autoencoding tasks creates a fundamental mismatch: the models are optimized for reconstruction objectives that diverge from actual downstream tasks, thereby weakening the features more beneficial for real-world usage. We propose Semantic-Anchor Compression (SAC), a novel method that shifts from autoencoding-task-based compression to an architecture that is equipped with this compression capability $\textit{a priori}$. | Xin Liu; Runsong Zhao; Pengcheng Huang; Xinyu Liu; Junyi Xiao; Chunyang Xiao; Tong Xiao; Shengxiang Gao; Zhengtao Yu; JingBo Zhu; | code |
| 588 | Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities Highlight: However, we observe that entities with semantic similarities often exhibit comparable interaction histories, suggesting the presence of transferable temporal patterns. Inspired by this insight, we propose TransFIR (Transferable Inductive Reasoning), a novel framework that leverages historical interaction sequences from semantically similar known entities to support inductive reasoning. | Ze Zhao; He Yuhui; Lyuwen Wu; Gu Tang; Bin Lu; Xiaoying Gan; Luoyi Fu; Xinbing Wang; Chenghu Zhou; | code |
| 589 | ReSplat: Degradation-agnostic Feed-forward Gaussian Splatting Via Self-guided Residual Diffusion Highlight: While some approaches address NVS under specific degradation types, they are often tailored to narrow cases, lacking the generalizability needed for broader scenarios. To address this issue, we propose Restoration-based feed-forward Gaussian Splatting, named ReSplat, a novel framework capable of handling degraded multi-view inputs. | Youngho Yoon; Kuk-Jin Yoon; | code |
| 590 | LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals Highlight: We propose LUMINA, a novel framework that detects hallucinations in RAG systems through context–knowledge signals: external context utilization is quantified via distributional distance, while internal knowledge utilization is measured by tracking how predicted tokens evolve across transformer layers. | Samuel Yeh; Sharon Li; Tanwi Mallick; | code |
| 591 | CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction Highlight: However, off-the-shelf Gaussian splatting methods are designed for photorealistic view synthesis and remain incompatible with cryo-EM due to mismatches in the image formation physics, reconstruction objectives, and coordinate systems. Addressing these issues, we propose cryoSplat, a GMM-based method that integrates Gaussian splatting with the physics of cryo-EM image formation. | Suyi Chen; Haibin Ling; | code |
| 592 | CogFlow: Bridging Perception and Reasoning Through Knowledge Internalization for Visual Mathematical Problem Solving Highlight: Notably, they all ignore the key issue of whether the extracted visual cues are faithfully integrated and properly utilized in subsequent reasoning. Motivated by this, we present CogFlow, a novel cognitive-inspired three-stage framework that incorporates a knowledge internalization stage, explicitly simulating the hierarchical flow of human reasoning: perception$\Rightarrow$internalization$\Rightarrow$reasoning. | Shuhang Chen; Yunqiu Xu; Junjie Xie; Aojun Lu; Tao Feng; ZEYING HUANG; ZHANG NING; Yi Sun; Yi Yang; Hangjie Yuan; | code |
| 593 | Squeeze The Soaked Sponge: Efficient Off-policy RFT for Large Language Model Highlight: In this work, we explore the potential of \textit{off-policy} RL to leverage historical data for rollout-efficient RFT. | Jing Liang; Jinyi Liu; Yi Ma; Hongyao Tang; YAN ZHENG; Shuyue Hu; LEI BAI; Jianye HAO; | code |
| 594 | Understanding and Improving Continuous LLM Adversarial Training Via In-context Learning Theory Highlight: Further, the robust bound shows that the robustness of an adversarially trained LLM is closely related to the singular values of its embedding matrix. Based on this, we propose to improve LLM CAT by introducing an additional regularization term, which depends on singular values of the LLM’s embedding matrix, into the objective function of CAT. | Shaopeng Fu; Di Wang; | code |
| 595 | Disentangled Representation Learning for Parametric Partial Differential Equations Highlight: However, as black-box solvers, they offer limited insight into the underlying physical mechanism, due to the lack of interpretable representations of the physical parameters that drive the system. To tackle this challenge, we propose a new paradigm for learning disentangled representations from NO parameters, thereby effectively solving an inverse problem. | Ning Liu; Lu Zhang; Tian Gao; Yue Yu; | code |
| 596 | Generalization of RLVR Using Causal Reasoning As A Testbed Highlight: This paper provides an empirical study of RLVR generalization in the setting of probabilistic inference over causal graphical models. We construct datasets of causal graphs and queries spanning these difficulty axes and fine-tune Qwen-2.5-Instruct models using RLVR or supervised fine-tuning (SFT). | Brian Lu; Hongyu Zhao; Shuo Sun; Hao Peng; Rui Ding; Hongyuan Mei; | code |
| 597 | When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond? Highlight: We instead frame abstention as a teachable skill and introduce a pipeline that couples Chain-of-Thought (CoT) supervision with Reinforcement Learning (RL) guided by abstention-aware rewards. | Xinyu Zhou; Chang Jin; Carsten Eickhoff; Zhijiang Guo; Seyed Ali Bahrainian; | code |
| 598 | Dual Distillation for Few-Shot Anomaly Detection Highlight: We introduce D$^2$4FAD, a novel dual distillation framework for few-shot anomaly detection that identifies anomalies in previously unseen tasks using only a small number of normal reference images. | Le Dong; Qinzhong Tan; Chunlei Li; Jingliang Hu; Yilei Shi; Weisheng Dong; Xiao Xiang Zhu; Lichao Mou; | code |
| 599 | What Generative Search Engines Like and How to Optimize Web Content Cooperatively Highlight: Their rapid adoption also drives the need for Generative Engine Optimization (GEO), as content providers are eager to gain more traction from them. In this paper, we introduce AutoGEO, a framework to automatically learn generative engine preferences when using retrieved contents for response generation, and rewrite web contents for more such traction. | Yujiang Wu; Shanshan Zhong; Yubin Kim; Chenyan Xiong; | code |
| 600 | Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts Highlight: In this paper, rather than learning abstract prompt embeddings, we propose the first framework, named Interpretable Visual Prompt Tuning (IVPT), to explore interpretability for visual prompts by introducing cross-layer concept prototypes. | Yubin Wang; XINYANG JIANG; De Cheng; Xiangqian Zhao; Zilong Wang; Dongsheng Li; Cairong Zhao; | code |
| 601 | Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction Highlight: This limitation leads to distinct crystals being mapped to identical representations, hindering accurate property prediction. To address this, we introduce PRDNet, which leverages unique reciprocal-space diffraction in addition to graph representations. | Bin Cao; Yang Liu; Longhan Zhang; Yifan Wu; Zhixun Li; Yuyu Luo; Hong Cheng; Yang Ren; Tongyi ZHANG; | code |
| 602 | Rethinking Continual Learning with Progressive Neural Collapse Highlight: Thus inspired, several studies have emerged very recently to leverage a fixed global ETF in CL, which however suffers from key drawbacks, such as *impracticability* and *limited performance*. To address these challenges and fully unlock the potential of ETF in CL, we propose **Progressive Neural Collapse (ProNC)**, a novel framework that completely removes the need of a fixed global ETF in CL. | Zheng Wang; Wanhao Yu; Li Yang; Sen Lin; | code |
| 603 | Solving Football By Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information Highlight: For a two-player imperfect-information extensive-form game (IIEFG) with $K$ time steps and a player action space of size $U$, the game tree complexity is $U^{2K}$, causing existing IIEFG solvers to struggle with large or infinite $(U,K)$, e.g., differential games with continuous action spaces. To partially address this scalability challenge, we focus on an important class of 2p0s games where the informed player (P1) knows the payoff while the uninformed player (P2) only has a belief over the set of $I$ possible payoffs. | Mukesh Ghimire; Lei Zhang; Zhe Xu; Yi Ren; | code |
| 604 | DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents Highlight: We introduce DreamPhase, a modular framework that plans through offline imagination. | Shayan Mohajer Hamidi; Linfeng Ye; Konstantinos N. Plataniotis; | code |
| 605 | How Far Can Unsupervised RLVR Scale LLM Training? Highlight: In this work, we revisit URLVR through the lens of intrinsic rewards. | Bingxiang He; Yuxin Zuo; Zeyuan Liu; Shangziqi Zhao; Zixuan Fu; Junlin Yang; Cheng Qian; Kaiyan Zhang; Yuchen Fan; Ganqu Cui; Xiusi Chen; Youbang Sun; Xingtai Lv; Xuekai Zhu; Li Sheng; Ran Li; Huan-ang Gao; Yuchen Zhang; Lifan Yuan; Bowen Zhou; Zhiyuan Liu; Ning Ding; | code |
| 606 | Multimodal Classification Via Total Correlation Maximization Highlight: In this paper, we theoretically analyze modality competition and propose a method for multimodal classification by maximizing the total correlation between multimodal features and labels. | Feng Yu; Xiangyu Wu; Yang Yang; Jianfeng Lu; | code |
| 607 | Latent Denoising Makes Good Tokenizers Highlight: Motivated by this insight, we propose aligning tokenizer embeddings directly with the downstream denoising objective, encouraging latent embeddings that remain reconstructable even under significant corruption. To achieve this, we introduce the Latent Denoising Tokenizer, a simple yet highly effective tokenizer trained to reconstruct clean images from latent embeddings corrupted via interpolative noise or random masking. | Jiawei Yang; Tianhong Li; Lijie Fan; Yonglong Tian; Yue Wang; | code |
| 608 | IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs Highlight: To address this gap, we introduce IndicVisionBench, the first large-scale benchmark centered on the Indian subcontinent. In addition, we release a paired parallel corpus of annotations across 10 Indic languages, creating a unique resource for analyzing cultural and linguistic biases in VLMs. | Ali Faraz; Akash; Shaharukh Khan; Raja Kolla; Akshat Patidar; Suranjan Goswami; Abhinav Ravi; Chandra Khatri; Shubham Agarwal; | code |
| 609 | LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures Highlight: In this work, we propose a first step in that direction where we develop LLM-JEPA, a JEPA-based solution for LLMs applicable both to finetuning and pretraining. | Hai Huang; Yann LeCun; Randall Balestriero; | code |
| 610 | Policy Contrastive Decoding for Robotic Foundation Models Highlight: Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities during inference. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy’s focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. | Shihan Wu; Xu Luo; Ji Zhang; Junlin Xie; Jingkuan Song; Heng Tao Shen; Lianli Gao; | code |
| 611 | Characterizing Human Semantic Navigation in Concept Production As Trajectories in Embedding Space Highlight: Semantic representations can be framed as a structured, dynamic knowledge space through which humans navigate to retrieve and manipulate meaning. To investigate how humans traverse this geometry, we introduce a framework that represents concept production as navigation through embedding space. | Felipe Diego Toro-Hernández; Jesuino Vieira Filho; Rodrigo M. Cabral-Carvalho; | code |
| 612 | When Would Vision-Proprioception Policies Fail in Robotic Manipulation? Highlight: However, recent studies have reported inconsistent observations on the generalization of vision-proprioception policies. In this work, we investigate this by conducting temporally controlled experiments. | Jingxian Lu; Wenke Xia; Yuxuan Wu; Zhiwu Lu; Di Hu; | code |
| 613 | Language-guided Open-world Video Anomaly Detection Under Weak Supervision Highlight: This paradigm necessitates establishing a robust mapping from video and textual definition to anomaly scores. Therefore, we propose LaGoVAD (**La**nguage-**g**uided **O**pen-world **V**ideo **A**nomaly **D**etector), a model that dynamically adapts anomaly definitions under weak supervision with two regularization strategies: diversifying the relative durations of anomalies via dynamic video synthesis, and enhancing feature robustness through contrastive learning with negative mining. | Zihao Liu; Xiaoyu Wu; Jianqin Wu; Xuxu Wang; Linlin Yang; | code |
| 614 | Low-Latency Neural LiDAR Compression with 2D Context Models Highlight: In this work, we propose a neural LiDAR compressor based on 2D context models that simultaneously supports high-efficiency compression, fast coding, and universal geometry-intensity compression. | Rui Song; Yan Wang; Tongda Xu; Zhening Liu; Zehong Lin; Jun Zhang; | code |
| 615 | AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms Highlight: We introduce AutoEP, a novel framework that bypasses training entirely by leveraging Large Language Models (LLMs) as zero-shot reasoning engines for algorithm control. | Zhenxing Xu; Yizhe Zhang; Weidong Bao; Hao Wang; Ming Chen; Haoran Ye; Wenzheng Jiang; Hui Yan; Ji Wang; | code |
| 616 | Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs Highlight: We introduce Bias Similarity Measurement (BSM), which treats fairness as a relational property between models, unifying scalar, distributional, behavioral, and representational signals into a single similarity space. | Hyejun Jeong; Shiqing Ma; Amir Houmansadr; | code |
| 617 | AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning Highlight: Current safety alignment methods often result in superficial refusal shortcuts or rely on intensive supervision for reasoning-based approaches, failing to fully leverage the model’s intrinsic safety self-awareness. We propose \textbf{AlphaAlign}, a simple yet effective pure reinforcement learning (RL) framework with verifiable safety reward designed to incentivize this latent safety awareness through proactive safety reasoning. | Yi Zhang; An Zhang; XiuYu Zhang; Leheng Sheng; Yuxin Chen; Zhenkai Liang; Xiang Wang; | code |
| 618 | CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model Highlight: While EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models, current approaches still yield clinically uninterpretable and weakly discriminative representations, inefficiently capture global dependencies, and neglect important local neural events. We present CodeBrain, a two-stage EFM designed to fill this gap. | Jingying Ma; Feng Wu; Qika Lin; Yucheng Xing; Chenyu Liu; Ziyu Jia; Mengling Feng; | code |
| 619 | CTBench: Cryptocurrency Time Series Generation Benchmark Highlight: Most prior work targets non-financial or traditional financial domains, focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and lacks critical financial evaluations, particularly for trading applications. To bridge these gaps, we introduce \textbf{CTBench}, the first \textbf{C}ryptocurrency \textbf{T}ime series generation \textbf{Bench}mark. | Yihao Ang; Qiang Wang; Qiang Huang; Yifan Bao; Xinyu Xi; Anthony Kum Hoe Tung; Chen Jin; Zhiyong Huang; | code |
| 620 | FakeXplain: AI-Generated Image Detection Via Human-Aligned Grounded Reasoning Highlight: To address these issues, we construct the \textbf{FakeXplained} dataset of AI-generated images annotated with bounding boxes and descriptive captions that highlight synthesis artifacts, forming the basis for human-aligned, visually grounded reasoning. Leveraging \textbf{FakeXplained}, we develop \textbf{FakeXplainer}, which fine-tunes MLLMs with a progressive training pipeline, enabling accurate detection, artifact localization, and coherent textual explanations. | Yikun Ji; Yan Hong; Qi Fan; jun lan; Huijia Zhu; Weiqiang Wang; Liqing Zhang; Jianfu Zhang; | code |
| 621 | QVGen: Pushing The Limit of Quantized Video Generative Models Highlight: In this paper, we present *QVGen*, a novel quantization-aware training (QAT) framework tailored for high-performance and inference-efficient video DMs under extremely low-bit quantization (*e.g.*, $4$-bit or below). | Yushi Huang; Ruihao Gong; Jing Liu; Yifu Ding; Chengtao Lv; Haotong Qin; Jun Zhang; | code |
| 622 | Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we reveal and study a highly practical yet under-explored problem in MMEA, termed Dual-level Noisy Correspondence (DNC). |
Haobin Li; Yijie Lin; Peng Hu; Mouxing Yang; Xi Peng; | code |
| 623 | Spike-based Digital Brain: A Novel Fundamental Model for Brain Activity Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Traditional methods often ignore the biological spike characteristics of brain activity and find it difficult to reveal the dynamic dependencies and causal interactions between brain regions, limiting their effectiveness in brain function research and clinical applications. To address this issue, we propose a Spike-based Digital Brain (Spike-DB), a novel fundamental model that introduces the spike computing paradigm into brain time series modeling. |
Shaolong Wei; Qiyu Sun; Mingliang Wang; Liang Sun; Weiping Ding; Jiashuang Huang; | code |
| 624 | Mapping Overlaps in Benchmarks Through Perplexity in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We construct benchmark signatures that capture the capacity required for strong performance to characterize large language model (LLM) benchmarks and their meaningful overlaps. |
Siyang Wu; Honglin Bao; Sida Li; Ari Holtzman; James Evans; | code |
| 625 | Inconsistency Biases in Dynamic Data Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, comparing importance scores across different model states introduces inconsistency (score context drift), and variable selection rates bias gradient dynamics over time (temporal gradient bias). We introduce RePB (Resolving Pruning Biases), a framework addressing these issues. |
Qing Zhou; Tao Yang; Bingxuan Zhao; Hongyuan Zhang; Junyu Gao; Qi Wang; | code |
| 626 | Flatter Tokens Are More Valuable for Speculative Draft Model Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work introduces an effective, data-centric approach that substantially improves the training efficiency for Speculative Decoding. |
Jiaming Fan; CAO DAMING; Xiangzhong Luo; Jiale Fu; Chonghan Liu; Xu Yang; | code |
| 627 | Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To solve this problem, we propose to incorporate one special token |
Jingyao Wu; Bin Lu; Zijun Di; Xiaoying Gan; Meng Jin; Luoyi Fu; Xinbing Wang; Chenghu Zhou; | code |
| 628 | PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a novel encoding operator dubbed as **P**ositional **P**reservation **E**mbedding (**PPE**), which has the main hallmark of preservation of spatiotemporal structure during visual token compression. |
Mouxiao Huang; Borui Jiang; Dehua Zheng; Hailin Hu; Kai Han; Xinghao Chen; | code |
| 629 | Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Expert Merging, a training-light method that learns a small set of layer-wise coefficients using only unlabeled calibration data. |
Dengming Zhang; Xiaowen Ma; Zhen-Liang Ni; Zhenkai Wu; Han Shu; Xin Jiang; Xinghao Chen; | code |
| 630 | Text2Grad: Reinforcement Learning from Natural Language Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce **Text2Grad**, a reinforcement-learning paradigm that *turns free-form textual feedback into span-level gradients*. |
Hanyang Wang; Lu Wang; Chaoyun Zhang; Tianjun Mao; Si Qin; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; | code |
| 631 | Trajectory-aware Shifted State Space Models for Online Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recently, state space models (SSMs) have been proposed with linear computational complexity and a global receptive field, which significantly improve computational efficiency and performance. In this context, this paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), leveraging both long-term trajectory modeling and low-complexity Mamba to achieve efficient spatio-temporal information aggregation. |
Qiang Zhu; Xiandong MENG; Yuxuan Jiang; Fan Zhang; David Bull; Shuyuan Zhu; Bing Zeng; Ronggang Wang; | code |
| 632 | Boomerang Distillation Enables Zero-Shot Model Size Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify a novel phenomenon that we call boomerang distillation: starting from a large base model (the teacher), one first distills down to a small student and then progressively reconstructs intermediate-sized models by re-incorporating blocks of teacher layers into the student without any additional training. |
Sara Kangaslahti; Nihal V. Nayak; Jonathan Geuter; Marco Fumero; Francesco Locatello; David Alvarez-Melis; | code |
| 633 | Learning to Reason in Structured In-context Environments with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing mathematical and coding environments are difficult to scale due to heavy reliance on expert annotation, while the skills learned in game-based environments are too specialized to generalize. To bridge this gap, we introduce the \textbf{S}tructured \textbf{I}n-context \textbf{E}nvironment (SIE) framework. |
Peng Yu; Zeyuan Zhao; Shao Zhang; Luoyi Fu; Xinbing Wang; Ying Wen; | code |
| 634 | Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Mainstream Test-Time Adaptation (TTA) methods for adapting vision-language models, e.g., CLIP, typically rely on Shannon Entropy (SE) at test time to measure prediction uncertainty and inconsistency. |
Xiangyu Wu; Dongming Jiang; Feng Yu; Yueying Tian; Jiaqi Tang; Qing-Guo Chen; Yang Yang; Jianfeng Lu; | code |
| 635 | Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a novel Multi-modal Structural Representation Learning (MSRL) framework for data-efficient precision oncology. |
Kun Wu; Zhiguo Jiang; Xinyu Zhu; Jun Shi; Yushan Zheng; | code |
| 636 | Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA Via Data and Sampling Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose \textit{Wiki-R1}, a data-generation-based curriculum reinforcement learning framework that systematically incentivizes reasoning in MLLMs for KB-VQA. |
Shan Ning; Longtian Qiu; Xuming He; | code |
| 637 | Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Stylos, a single-forward 3D Gaussian framework for 3D style transfer that operates on unposed content, from a single image to a multi-view collection, conditioned on a separate reference style image. |
Hanzhou Liu; Jia Huang; Mi Lu; Srikanth Saripalli; Peng Jiang; | code |
| 638 | PICS: Pairwise Image Compositing with Spatial Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PICS, a self-supervised composition-by-decomposition paradigm that composes objects in parallel while explicitly modeling the compositional interactions among (fully-/partially-)visible objects and background. |
Hang Zhou; Xinxin Zuo; Sen Wang; Li cheng; | code |
| 639 | Tequila: Trapping-free Ternary Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This occurs because these weights receive only noisy, less informative gradients, preventing stable escape from the deadzone and severely impeding model capacity and optimization. To address this issue, we propose **Tequila**, a trapping-free quantization optimization method that reactivates deadzone-trapped weights by repurposing them as dynamic biases. |
Hong Huang; Decheng Wu; Rui Cen; Guanghua Yu; Zonghang Li; Kai Liu; Jianchen Zhu; Peng Chen; Xue Liu; Dapeng Wu; | code |
| 640 | HiddenEcho: Mitigating Noise Amplification in Differentially Private LLMs with Hidden-State Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, these approaches suffer notable limitations: DNN-based methods often require task-specific pre-training, and conventional DP techniques, though privacy-preserving, suffer from noise amplification as perturbed inputs propagate through the deep transformer layers, leading to significant degradation in downstream task performance. To alleviate this, we propose HiddenEcho, an end-to-end framework with client noise correction, where hidden states are sent from the server to the client and refined by a lightweight module using both embeddings and intermediate representations. |
Wenhao Li; Kunhao Li; Lei Yang; | code |
| 641 | FlashVID: Efficient Video Large Language Models Via Training-free Tree-based Spatiotemporal Token Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The highly correlated visual features are likely to change in spatial position, scale, orientation, and other attributes over time due to the dynamic nature of video. Building on this insight, we introduce FlashVID, a training-free inference acceleration framework for VLLMs. |
Ziyang Fan; Keyu Chen; Ruilong Xing; Yulin Li; Li Jiang; Zhuotao Tian; | code |
| 642 | Towards Bridging The Gap Between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we find that off-policy Soft Actor-Critic (SAC), with large-batch update and a high Update-To-Data (UTD) ratio, reliably supports large-scale pretraining of humanoid locomotion policies, achieving zero-shot deployment on real robots. |
Weidong Huang; Zhehan Li; Hangxin Liu; Biao Hou; Yao Su; Jingwen Zhang; | code |
| 643 | STEDiff: Revealing The Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: **Regarding temporal redundancy**, we observed a marginal effect associated with specific time steps, indicating that only a limited subset of time steps plays a critical role in backdoor injection. Building on these findings, we present a novel framework, *STEDiff*, comprising two key components: *STEBA* and *STEDF*. |
Yu Pan; Jiahao Chen; Lin Wang; Bingrong Dai; Wenjie Wang; | code |
| 644 | ViPO: Visual Preference Optimization at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To enhance the robustness of preference algorithms against noise, we propose Poly-DPO, which extends the DPO objective with an additional polynomial term that dynamically adjusts model confidence during training based on dataset characteristics, enabling effective learning across diverse data distributions from noisy to trivially simple patterns. |
Ming Li; Jie Wu; Justin Cui; Xiaojie Li; Rui Wang; Chen Chen; | code |
| 645 | Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To date, existing DFL methods typically rely on deterministic point predictions, which are often insufficient to capture the intrinsic stochasticity of real-world environments. To address this challenge, we propose the first diffusion-based DFL approach, which trains a diffusion model to represent the distribution of uncertain parameters and optimizes the decision by solving a stochastic optimization with samples drawn from the diffusion model. |
Zihao Zhao; Christopher Yeh; Lingkai Kong; Kai Wang; | code |
| 646 | Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Vivid-VR, a DiT-based generative video restoration method built upon an advanced T2V foundation model, where ControlNet is leveraged to control the generation process, ensuring content consistency. |
Haoran Bai; Xiaoxu Chen; Canqian Yang; Zongyao He; Sibin Deng; Ying Chen; | code |
| 647 | From Cheap Geometry to Expensive Physics: A Physics-agnostic Pretraining Framework for Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: At the same time, large numbers of geometry-only candidate designs are readily available but remain largely untapped. We propose a two-stage framework to better exploit this abundant, physics-agnostic resource and improve supervised operator learning under limited labeled data. |
Zhizhou Zhang; Youjia Wu; Kaixuan Zhang; Yanjia Wang; | code |
| 648 | Soft Equivariance Regularization for Invariant Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We conjecture that enforcing invariance and equivariance to the same layer is inherently difficult and, if handled naively, may even hinder learning. To overcome this, we propose soft equivariance regularization (SER), a simple yet scalable method that decouples the two objectives: learning invariant representations via standard SSL, while softly regularizing intermediate features with an equivariance loss. |
Joohyung Lee; Changhun Kim; Hyunsu Kim; Kwanhyung Lee; Juho Lee; | code |
| 649 | Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches often rely on pose estimators to extract intermediate representations, but such signals are prone to errors under occlusion or complex poses. Building on these observations, we present DirectAnimator, a framework that bypasses pose extraction and directly learns from raw driving videos. |
Yuan Zeng; Yujia Shi; Yuhao Yang; Dongxia Liu; Zongqing Lu; Wenming Yang; Qingmin Liao; | code |
| 650 | LVTINO: LAtent Video ConsisTency INverse SOlver for High Definition Video Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We address this challenge by leveraging recent advances in Video Consistency Models (VCMs), which distill video latent diffusion models into fast generators that explicitly capture temporal causality. Building on this foundation, we propose LVTINO, the first zero-shot, plug-and-play inverse solver for high definition video restoration with priors encoded by VCMs. |
Alessio Spagnoletti; Andres Almansa; Marcelo Pereyra; | code |
| 651 | DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Extensive experiments reveal both the strengths and limitations of existing algorithms, offering valuable insights and directions for future research. |
Fan Li; Xiaoyang Wang; Wenjie Zhang; Ying Zhang; Xuemin Lin; | code |
| 652 | Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To fill this gap, this paper presents the first comprehensive benchmark for evaluating the temporal stability of online HD mapping models. We propose a multi-dimensional stability evaluation framework with novel metrics for Presence, Localization, and Shape Stability, integrated into a unified mean Average Stability (mAS) score. To encourage broader focus on stability, we will release a public benchmark. |
Hao Shan; Ruikai Li; Han Jiang; Yizhe Fan; Ziyang Yan; Bohan Li; Xiaoshuai Hao; Hao Zhao; Zhiyong Cui; Yilong Ren; Haiyang Yu; | code |
| 653 | Null-Space Filtering for Data-Free Continual Model Merging: Preserving Stability, Promoting Plasticity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose NUFILT (NUll-space FILTering), a data-free framework that directly links these desiderata to optimization. |
Zihuan Qiu; Lei Wang; Yang Cao; Runtong ZHANG; Bing Su; Yi Xu; Fanman Meng; Linfeng Xu; Qingbo Wu; Hongliang Li; | code |
| 654 | TurboBoA: Faster and Exact Attention-aware Quantization Without Backpropagation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose TurboBoA, a new backpropagation-free PTQ algorithm that preserves the accuracy benefits of BoA while significantly accelerating the process. |
Junhan Kim; Yeo Jeong Park; Seungwoo Son; Chungman Lee; Ho-young Kim; Joonyoung Kim; Yongkweon Jeon; | code |
| 655 | $AutoDrive\text{-}P^3$: Unified Chain of Perception–Prediction–Planning Thought Via Reinforcement Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current VLM-based approaches suffer from two major limitations: 1) Some VLMs directly output planning results without chain-of-thought (CoT) reasoning, bypassing crucial perception and prediction stages, which creates a significant domain gap and compromises decision-making capability; 2) Other VLMs can generate outputs for perception, prediction, and planning tasks but employ a fragmented decision-making approach where these modules operate separately, leading to a significant lack of synergy that undermines true planning performance. To address these limitations, we propose ${AutoDrive\text{-}P^3}$, a novel framework that seamlessly integrates $\underline{\textbf{P}}$erception, $\underline{\textbf{P}}$rediction, and $\underline{\textbf{P}}$lanning through structured reasoning. |
Yuqi Ye; Zijian Zhang; Junhong Lin; Shangkun Sun; Changhao Peng; Wei Gao; | code |
| 656 | MedLesionVQA: A Multimodal Benchmark Emulating Clinical Visual Diagnosis for Body Surface Health Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet existing medical benchmarks are either built from publicly available sources with limited expert curation or focus narrowly on disease classification, failing to reflect the stepwise recognition and reasoning processes physicians follow in real practice. To address this gap, we introduce MedLesionVQA, the first benchmark explicitly designed to evaluate MLLMs on the visual diagnostic workflow for body-surface conditions at large scale. |
Deli Yu; Shengzhi Wang; Kai WU; Xiaozhong Ji; Bo Cui; Jieqiong Cao; Huichao Wang; Boyuan Jiang; Xu Wang; Qian Xu; Chao Gao; Yi Zhao; Dian Chen; Meng Li; Haifeng Wu; Yijun He; Haihua Yang; | code |
| 657 | EgoBrain: Synergizing Minds and Eyes For Human Action Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we present EgoBrain—the world’s first large-scale, temporally aligned multimodal dataset that synchronizes egocentric vision and EEG of the human brain over extended periods of time, establishing a new paradigm for human-centered behavior analysis. |
Nie Lin; Yansen Wang; Dongqi Han; Weibang Jiang; Jingyuan Li; Ryosuke Furuta; Yoichi Sato; Dongsheng Li; | code |
| 658 | RedSage: A Cybersecurity Generalist LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To rigorously evaluate the models, we introduce RedSage-Bench, a benchmark with 30K multiple-choice and 240 open-ended Q\&A items covering cybersecurity knowledge, skills, and tool expertise. |
Naufal Suryanto; Muzammal Naseer; Pengfei Li; Syed Talal Wasim; Jinhui Yi; Juergen Gall; Paolo Ceravolo; Ernesto Damiani; | code |
| 659 | GenCP: Towards Generative Modeling Paradigm of Coupled Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here we propose GenCP, a novel and elegant generative paradigm for coupled multiphysics simulation. |
Tianrun Gao; Haoren Zheng; Wenhao Deng; Haodong Feng; Tao Zhang; Ruiqi Feng; Qianyi Chen; Tailin Wu; | code |
| 660 | UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce UniQL, a unified post-training quantization and low-rank compression framework, with on-device configurable pruning rates for edge LLMs. |
Hung-Yueh Chiang; Chi-Chih Chang; Yu-Chen Lu; Chien-Yu Lin; Kai-Chiang Wu; Mohamed S. Abdelfattah; Diana Marculescu; | code |
| 661 | CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While Machine Learning (ML) has shown promise in various research domains, the lack of large-scale, open datasets limits its application in chip design. To address this limitation, we introduce CircuitNet 3.0, a large-scale, comprehensive, and open-source dataset curated to facilitate the evaluation of ML models on challenging timing and power prediction tasks. |
Mingjun Wang; Yihan Wen; Yuntao Lu; Fengrui Liu; Yuxiang Zhao; Boyu Han; Jianan Mu; Yibo Lin; Runsheng Wang; Bei Yu; Huawei Li; | code |
| 662 | VoG: Enhancing LLM Reasoning Through Stepwise Verification on Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing KG-augmented LLM frameworks still rely on static integration mechanisms that cannot adjust reasoning in response to evolving context and retrieved evidence, resulting in error propagation and incomplete reasoning. To alleviate these issues, we propose **V**erify-**o**n-**G**raph (**VoG**), a scalable and model-agnostic framework to enhance LLM reasoning via iterative retrieval, stepwise verification, and adaptive revision. |
Wenxin Zhao; Jiachuan Wang; Yongqi Zhang; Shuangyin Li; Cheng Deng; Jun Wang; Lei Chen; | code |
| 663 | LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose LaplacianFormer, a Transformer variant that employs a Laplacian kernel as a principled alternative to softmax, motivated by empirical observations and theoretical analysis. |
Zhe Feng; Sen Lian; Changwei Wang; Muyang Zhang; Tianlong Tan; Rongtao Xu; Weiliang Meng; Xiaopeng Zhang; | code |
| 664 | Traceable Black-Box Watermarks For Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on the problem, we propose a novel server-side watermarking method, $\mathbf{TraMark}$, which creates a traceable watermarked model for each client, enabling verification of model leakage in black-box settings. |
Jiahao Xu; Rui Hu; Olivera Kotevska; Zikai Zhang; | code |
| 665 | Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, practical deployments face challenges when clients have heterogeneous resources and thus adopt different LoRA ranks, leading to substantial initialization and aggregation noise that undermines performance. To address these challenges, we propose Fed-PLoRA, a novel lightweight heterogeneous federated fine-tuning (FFT) framework. |
Zikai Zhang; Rui Hu; Jiahao Xu; | code |
| 666 | IDER: IDempotent Experience Replay for Reliable Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing uncertainty-aware continual learning methods suffer from high computational overhead and incompatibility with mainstream replay methods. To address this, we propose idempotent experience replay (IDER), a novel approach based on the idempotent property where repeated function applications yield the same output. |
Zhanwang Liu; Yuting Li; Haoyuan Gao; Yexin Li; Linghe Kong; Lichao Sun; Weiran Huang; | code |
| 667 | EgoNight: Towards Egocentric Vision Understanding at Night with A Challenging Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Most existing benchmarks for egocentric vision understanding focus primarily on daytime scenarios, overlooking the low-light conditions that are inevitable in real-world applications. To investigate this gap, we present EgoNight, the first comprehensive benchmark for nighttime egocentric vision, with visual question answering (VQA) as the core task. |
Deheng Zhang; Yuqian Fu; Runyi Yang; Yang Miao; Tianwen Qian; Xu Zheng; Guolei Sun; Ajad Chhatkuli; Xuanjing Huang; Yu-Gang Jiang; Luc Van Gool; Danda Pani Paudel; | code |
| 668 | VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose VisualPrompter, a novel training-free prompt engineering framework that refines user inputs to model-preferred sentences. |
Shiyu Wu; Mingzhen Sun; Weining Wang; Yequan Wang; Jing Liu; | code |
| 669 | OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Ultrasound computed tomography (USCT) provides quantitative tissue characterization, but its clinical implementation faces significant challenges, particularly under anatomically constrained limited-angle acquisition conditions specific to prostate imaging. To address these unmet needs, we introduce OpenPros, the first large-scale benchmark dataset for limited-angle prostate USCT designed to systematically evaluate ML methods for inverse problems. |
Hanchen Wang; Yixuan Wu; Yinan Feng; Peng Jin; Luoyuan Zhang; Shihang Feng; James Wiskin; Baris Turkbey; Peter Pinto; Bradford J Wood; Songting Luo; Yinpeng Chen; Emad Boctor; Youzuo Lin; | code |
| 670 | Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Latent Particle World Model (LPWM), a self-supervised object-centric world model scaled to real-world multi-object datasets and applicable in decision-making. |
Tal Daniel; Carl Qi; Dan Haramati; Amir Zadeh; Chuan Li; Aviv Tamar; Deepak Pathak; David Held; | code |
| 671 | HiDrop: Hierarchical Vision Token Reduction in MLLMs Via Late Injection, Concave Pyramid Pruning, and Early Exit Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While progressive vision token pruning is a promising solution, we find that its full potential has been unrealized due to two key limitations: it misinterprets the role of shallow layers as being crucial for fusion and employs overly rigid, non-adaptive pruning schedules. To address these flaws, we introduce HiDivDrop, a framework that tailors token pruning to the true hierarchical function of MLLM layers. |
Hao Wu; Yingqi Fan; Dai Jinyang; Junlong Tong; Yunpu Ma; Xiaoyu Shen; | code |
| 672 | DeepPrim: A Physics-Driven 3D Short-term Weather Forecaster Via Primitive Equation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Meanwhile, existing deep learning-based models mostly focus on pure data-driven paradigms, overlooking the fundamental physical principles that govern atmospheric dynamics. To address these challenges, we present DeepPrim, a novel 3D \underline{deep} weather forecaster designed to learn \underline{prim}itive equations of the Earth’s atmosphere. |
Jiawei Chen; Weiqi Chen; Rong Hu; Peiyuan Liu; Haifan Zhang; Liang Sun; | code |
| 673 | Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose Information Gain-based Policy Optimization (IGPO), a simple yet effective RL framework that provides dense and intrinsic supervision for multi-turn agent training. |
Guoqing Wang; Sunhao Dai; Guangze Ye; Zeyu Gan; Wei Yao; Yong Deng; Xiaofeng Wu; Zhenzhe Ying; | code |
| 674 | Local Geometry Attention for Time Series Forecasting Under Realistic Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we address this gap with two key contributions. Second, we introduce TSRBench, the first comprehensive benchmark for evaluating forecasting robustness under realistic, statistically-grounded corruptions. |
Dongbin Kim; Youngjoo Park; Woojin Jeong; Jaewook Lee; | code |
| 675 | Lost in The Non-convex Loss Landscape: How to Fine-tune The Large Time Series Model? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This severely diminishes the value of the pre-trained LTSM. To address this, we propose a new fine-tuning method called Smoothed Full Fine-tuning (SFF). |
Xu Zhang; Peng Wang; Wei Wang; | code |
| 676 | WOW-Seg: A Word-free Open World Segmentation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To bridge discrepancies, we propose WOW-Seg, a Word-free Open World Segmentation model for segmenting and recognizing objects from open-set categories. We further construct an open world region recognition test benchmark: the Region Recognition Dataset (RR-7K). |
Danyang Li; Tianhao Wu; Bin Lin; Zhenyuan Chen; Yang Zhang; Yuxuan Li; Ming-Ming Cheng; Xiang Li; | code |
| 677 | HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the *Hallucination Risk Bound*, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. |
Xinyue Zeng; Junhong Lin; Yujun Yan; Feng Guo; Liang Shi; Jun Wu; Dawei Zhou; | code |
| 678 | FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, challenges such as unbearable inference time and compromised fidelity still limit the full potential of the diffusion models. To address this, we introduce FideDiff, a novel single-step diffusion model designed for high-fidelity deblurring. |
Xiaoyang Liu; Zhengyan Zhou; Zihang Xu; Jiezhang Cao; Zheng Chen; Yulun Zhang; | code |
| 679 | DepthLM: Metric Depth from Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Vision language models (VLMs) can flexibly address various vision tasks through text interactions. |
Zhipeng Cai; Ching-Feng Yeh; Hu Xu; Zhuang Liu; Gregory P. Meyer; Xinjie Lei; Changsheng Zhao; Shang-Wen Li; Vikas Chandra; Yangyang Shi; | code |
| 680 | R2-Dreamer: Redundancy-Reduced World Models Without Decoders or Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose R2-Dreamer, a decoder-free MBRL framework with a self-supervised objective that serves as an internal regularizer, preventing representation collapse without resorting to DA. |
Naoki Morihira; Amal Nahar; Kartik Bharadwaj; Yasuhiro Kato; Akinobu Hayashi; Tatsuya Harada; | code |
| 681 | Reasoning Models Can Be Accurately Pruned Via Chain-of-Thought Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a simple, drop-in fix: during pruning we jointly reconstruct activations from the input and the model’s on-policy chain-of-thought traces. |
Ryan Lucas; Kayhan Behdin; Zhipeng Wang; Qingquan Song; Shao Tang; Rahul Mazumder; | code |
| 682 | Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, its reliance on spherical harmonics for color encoding inherently limits its ability to separate diffuse and specular components, making it challenging to accurately represent complex reflections. To address this, we propose a novel enhanced Gaussian kernel that explicitly models specular effects through view-dependent opacity. |
Yixin Yang; Bojian Wu; Yang Zhou; Hui Huang; | code |
| 683 | Continuous Space-Time Video Super-Resolution with 3D Fourier Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a novel formulation for continuous space-time video super-resolution. |
Alexander Becker; Julius Erbach; Dominik Narnhofer; Konrad Schindler; | code |
| 684 | CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose CLEAR, a calibration method with two distinct parameters, $\gamma_1$ and $\gamma_2$, to combine the two uncertainty components and improve the conditional coverage of predictive intervals for regression tasks. |
Ilia Azizi; Juraj Bodik; Jakob Heiss; Bin Yu; | code |
| 685 | Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To mitigate the vulnerability in the feature representation, we propose **DEFEAT** (**D**iscrete Lat**E**nt **F**eatur**E** based **A**dversarial **T**raining), a robust prompt tuning framework for VLMs. |
Yang Chen; Yanbin Wei; James Kwok; Yu Zhang; | code |
| 686 | WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Training-free sparse activation, in contrast, offers a plug-and-play pathway to efficiency; however, existing methods often rely solely on hidden state magnitudes, leading to significant approximation error and performance degradation. To address this, we introduce WINA (Weight-Informed Neuron Activation): a simple framework for training-free sparse activation that incorporates both hidden state magnitudes and weight matrix structure. |
Sihan Chen; Dan Zhao; Jongwoo Ko; Colby Banbury; Huiping Zhuang; Luming Liang; Pashmina Cameron; Tianyi Chen; | code |
| 687 | WavePolyp: Video Polyp Segmentation Via Hierarchical Wavelet-Based Feature Aggregation and Inter-Frame Divergence Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel segmentation network, WavePolyp, which consists of two innovative components: a hierarchical wavelet-based feature aggregation (HWFA) module and inter-frame divergence perception (IDP) blocks. |
Yuhua Zhang; Guilian Chen; Yuanqin He; Huisi Wu; Jing Qin; | code |
| 688 | Random Label Prediction Heads for Studying Memorization in Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a straightforward yet effective method to empirically measure and regularize memorization in deep neural networks for classification tasks. |
Marlon Becker; Jonas Konrad; Luis Garcia Rodriguez; Benjamin Risse; | code |
| 689 | Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing evaluation methods, however, are confined to natural language based scenarios. To bridge this gap, we introduce EAPrivacy, a comprehensive evaluation benchmark designed to quantify the physical-world privacy awareness of LLM-powered agents. |
Xinjie Shen; Mufei Li; Pan Li; | code |
| 690 | ResiliBench: Evaluating Agentic Workflow Adaptation in Stochastic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PILOT-Bench, a benchmark that evaluates LLM workflow execution under simulated realistic conditions of instruction quality variability and tool execution uncertainty. |
Ruicheng Ao; Zeping Min; Tingyu Zhu; Wotao Yin; Xinshang Wang; | code |
| 691 | UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present **UltraGauss**: an ultrasound-specific Gaussian Splatting framework that serves as an efficient approximation to acoustic image formation. |
Mark C. Eid; Ana Namburete; Joao F. Henriques; | code |
| 692 | GOT-Edit: Geometry-Aware Generic Object Tracking Via Online Model Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In contrast, most generic object tracking (GOT) methods primarily rely on 2D features of the target and its surroundings while neglecting 3D geometric cues, which makes them susceptible to partial occlusion, distractors, and variations in geometry and appearance. To address this limitation, we introduce GOT-Edit, an online cross-modality model editing approach that integrates geometry-aware cues into a generic object tracker from a 2D video stream. |
Shih-Fang Chen; Jun-Cheng Chen; I-hong Jhuo; Yen-Yu Lin; | code |
| 693 | Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a novel framework, DECS, built on our theoretical discovery of two previously unaddressed flaws in current length rewards: (1) the erroneous penalization of essential exploratory tokens and (2) the inadvertent rewarding of partial redundancy. |
Shuyang Jiang; Yusheng Liao; Ya Zhang; Yanfeng Wang; Yu Wang; | code |
| 694 | UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes Through Human Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various domains, existing benchmarks that explore their performance in urban environments remain limited, lacking systematic exploration of temporal evolution and subjective perception of urban environments that aligns with human perception. To address these limitations, we propose UrbanFeel, a comprehensive benchmark designed to evaluate the performance of MLLMs in urban development understanding and subjective environmental perception. |
Jun He; Yi Lin; Zilong Huang; Jiacong Yin; Junyan Ye; Yuchuan Zhou; Weijia Li; Xiang Zhang; | code |
| 695 | COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce the Context-aware Output-Space Adapter (COSA), a minimal, plug-and-play adapter that directly corrects predictions of a frozen base model. |
Jeonghwan Im; Hyuk-Yoon Kwon; | code |
| 696 | GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by recent successes in multimodal molecular property prediction, we introduce GRAM-DTI, a pre-training framework that integrates multimodal small molecule and protein inputs into a unified representation. |
Feng Jiang; Amina Mollaysa; Hehuan Ma; Yuzhi Guo; Tommaso Mansi; Junzhou Huang; Mangal Prakash; Rui Liao; | code |
| 697 | Unleashing Perception-Time Scaling to Multimodal Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by this success, similar strategies have been applied to multimodal reasoning, yet their impact on visual perception remains unclear. To investigate this gap, we introduce DisTANCE, a perception-centric benchmark for visual estimation tasks. |
Yifan Li; Zhenghao Chen; Ziheng Wu; Kun Zhou; Ruipu Luo; Can Zhang; Zhentao he; Yufei Zhan; Xin Zhao; Minghui Qiu; | code |
| 698 | Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We formalize a capacity-1 spike-retiming threat model with a unified trio of budgets: per-spike jitter $B_{\infty}$, total delay $B_{1}$, and tamper count $B_{0}$. |
Yi Yu; Qixin Zhang; Shuhan Ye; Xun Lin; Qianshan Wei; Kun Wang; Wenhan Yang; Dacheng Tao; Xudong Jiang; | code |
| 699 | Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we conduct the first systematic study of the relationship between explainability and fairness in hate speech detection, focusing on both encoder- and decoder-only models. |
Yifan Wang; Mayank Jobanputra; Ji-Ung Lee; Soyoung Oh; Isabel Valera; Vera Demberg; | code |
| 700 | Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Speculative decoding offers a promising avenue for mitigating this inefficiency, yet its efficacy in the structured and repetition-rich context remains unexplored. To bridge this gap, we introduce the first comprehensive benchmark designed to evaluate speculative decoding methods in LLM test-time scaling. |
Shengyin Sun; Yiming Li; Xing Li; Yingzhao Lian; Weizhe Lin; Hui-Ling Zhen; Zhiyuan Yang; Xianzhi Yu; Chen Chen; Mingxuan Yuan; Chen Ma; | code |
| 701 | FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by the fruit fly’s hierarchical memory system characterized by sparse expansion and modular ensembles, we propose FlyPrompt, a brain-inspired framework that decomposes GCL into two subproblems: expert routing and expert competence improvement. |
Hongwei Yan; Guanglong Sun; Kanglei Zhou; Qian Li; Liyuan Wang; Yi Zhong; | code |
| 702 | Uncover Underlying Correspondence for Robust Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we identify two critical forms of NC that particularly harm clustering: i) category-level mismatch, where semantically consistent samples from the same class are mistakenly treated as negatives; and ii) sample-level mismatch, where collected cross-view pairs are misaligned and some samples may even lack any valid counterpart. |
Haochen Zhou; Guofeng Ding; Mouxing Yang; Peng Hu; Yijie Lin; Xi Peng; | code |
| 703 | PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction For Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PolySkill, a new framework that enables agents to learn generalizable and compositional skills. |
Simon Yu; Gang Li; Weiyan Shi; Peng Qi; | code |
| 704 | Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by the success of self-supervised learning, we propose *Co-rewarding*, a novel self-supervised RL framework that improves training stability by seeking complementary supervision from other views. |
Zizhuo Zhang; Jianing Zhu; Xinmu Ge; Zihua Zhao; Zhanke Zhou; Xuan Li; Xiao Feng; Jiangchao Yao; Bo Han; | code |
| 705 | Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We argue that this inconsistency stems partly from constraints in existing evaluation methods, including the oversight of plausible responses, limited emotional taxonomies, neglect of contextual factors, and labor-intensive annotations. To facilitate customized visual emotion evaluation for MLLMs, we propose an Emotion Statement Judgment task that overcomes these constraints. |
Daiqing Wu; Dongbao Yang; Sicheng Zhao; Can Ma; Yu ZHOU; | code |
| 706 | Echo: Towards Advanced Audio Comprehension Via Audio-Interleaved Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To instantiate it, we introduce a two-stage training framework, first teaching LALMs to localize informative audio segments through supervised fine-tuning, and then incentivizing proficient revisiting via reinforcement learning. |
Daiqing Wu; Xuan Zhang; Dongbao Yang; Jiashu Yao; Longfei Chen; Qingsong Liu; Sicheng Zhao; Can Ma; Yangyang Kang; Yu ZHOU; | code |
| 707 | TD-MoE: Tensor Decomposition for MoE Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce TD-MoE (Tensor Decomposition for MoE Compression), a data-aware framework that jointly and holistically factorizes expert weights. |
Yuebin XU; YANHONG WANG; Xuemei Peng; Hui Zang; Chen Minghao; Pengfei Xia; Zeyi Wen; | code |
| 708 | Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This presents a challenge for domain-robust video understanding in safety-critical space applications. To address this, we introduce MicroG-4M, the first benchmark for spatio-temporal and semantic understanding of human activities in microgravity. |
Di Wen; Lei Qi; Kunyu Peng; Kailun Yang; Fei Teng; Ao Luo; Jia Fu; Yufan Chen; Ruiping Liu; Yitian Shi; M. Saquib Sarfraz; Rainer Stiefelhagen; | code |
| 709 | Co-LoRA: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To move towards realistic scenarios, we propose FedMosaic, a method that jointly addresses data and model heterogeneity with a task-relevance-aware model aggregation strategy to reduce parameter interference, and a dimension-invariant module that enables knowledge sharing across heterogeneous architectures without huge computational cost. To mimic real-world task diversity, we propose a multi-modal PFL benchmark spanning 40 distinct tasks with distribution shifts over time. |
Minhyuk Seo; Taeheon Kim; Hankook Lee; Jonghyun Choi; Tinne Tuytelaars; | code |
| 710 | PEAR: Phase Entropy Aware Reward for Efficient Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Through a systematic empirical analysis, we reveal a consistent positive correlation between model entropy and response length at different reasoning stages across diverse LRMs: the thinking phase exhibits higher entropy, reflecting the exploratory behavior of longer responses, while the final answer phase shows lower entropy, indicating a more deterministic solution. This observation suggests that entropy at different reasoning stages can serve as a control knob for balancing conciseness and performance. Based on this insight, this paper introduces Phase Entropy Aware Reward (PEAR), a reward mechanism that incorporates phase-dependent entropy into the reward design. |
Chen Huang; Wei Lu; Wenxuan Zhang; | code |
| 711 | Ice Cream Doesn’t Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This oversight limits the applicability of LLMs in the real world. To address these limitations, we propose **CausalPitfalls**, a comprehensive benchmark designed to rigorously evaluate the capability of LLMs in overcoming common causal inference pitfalls. |
Jin Du; Li Chen; Xun Xian; An Luo; Fangqiao Tian; Ganghua Wang; Charles Doss; Xiaotong Shen; Jie Ding; | code |
| 712 | PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, we find that naively applying listwise RL fails to produce meaningful improvements, as the model is overwhelmed by a complex, coarse-grained reward signal. To address this challenge, we introduce PoLi-RL, a novel Point-to-List Reinforcement Learning framework. |
Zixin Song; Bowen Zhang; Qian-Wen Zhang; di yin; Xing Sun; Chunping Li; | code |
| 713 | DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Within this framework, we introduce **logical closeness**, a metric that quantifies how well a model’s CoT trajectory (i.e., the LLM’s output) adheres to the DAG structure, providing evaluation beyond classical PASS@$k$ metrics. |
Yuanhe Zhang; Ilja Kuzborskij; Jason D. Lee; Chenlei Leng; Fanghui Liu; | code |
| 714 | Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, independent MARL methods fail to capture global information and exhibit poor cooperation among workers, while Centralized Training Decentralized Execution (CTDE) MARL methods suffer from the curse of dimensionality. To overcome these challenges, we propose Triple-BERT, a centralized Single Agent Reinforcement Learning (SARL) method designed specifically for large-scale order dispatching on ride-sharing platforms. |
Zijian Zhao; Sen Li; | code |
| 715 | Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present **FlowGuide**, a unified framework that achieves fine-grained control over face editing in diffusion models. |
Yan Li; Zhenyi Wang; Guanghao Li; Wei Xue; Wenhan Luo; Yike Guo; | code |
| 716 | Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing visual programming methods are often constrained by fixed toolsets or offline tool induction, which leads to suboptimal solutions and poor tool reuse. We introduce Transductive Visual Programming (TVP), a novel framework that dynamically evolves a library of reusable tools by learning from its problem-solving experience. |
Shengguang Wu; Xiaohan Wang; Yuhui Zhang; Hao Zhu; Serena Yeung-Levy; | code |
| 717 | LLaVAction: Evaluating and Training Multi-modal Large Language Models for Action Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we reformulate EPIC-KITCHENS-100, one of the largest and most challenging egocentric action recognition datasets, into a MLLM benchmark (EPIC-KITCHENS-100-MQA). |
Haozhe Qi; Shaokai Ye; Alexander Mathis; Mackenzie W Mathis; | code |
| 718 | Revisiting Parameter Server in LLM Post-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose **On-Demand Communication (ODC)**, which adapts PS into Fully Sharded Data Parallel (FSDP) by replacing collective all-gather and reduce-scatter with direct point-to-point communication. |
Xinyi Wan; Penghui Qi; Guangxing Huang; Chaoyi Ruan; Min Lin; Jialin Li; | code |
| 719 | Fast and Interpretable Protein Substructure Alignment Via Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study presents PLASMA, the first deep learning framework for efficient and interpretable residue-level protein substructure alignment. |
Zhiyu Wang; Bingxin Zhou; Weishu Zhao; Yang Tan; Jing Wang; Pietro Lio; Liang Hong; | code |
| 720 | Exposing Weaknesses of Large Reasoning Models Through Graph Algorithm Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce GrAlgoBench, a benchmark designed to evaluate LRMs through graph algorithm problems. |
Qifan Zhang; Jianhao Ruan; Aochuan Chen; Kang ZENG; Nuo Chen; Jing Tang; Jia Li; | code |
| 721 | Hierarchical Semantic-Acoustic Modeling Via Semi-Discrete Residual Representations for Expressive End-to-End Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Generative models for speech synthesis face a fundamental trade-off: discrete tokens ensure stability but sacrifice expressivity, while continuous signals retain acoustic richness but suffer from error accumulation due to task entanglement. This challenge has driven the field towards multi-stage pipelines that rely on pre-trained discrete speech tokenizers, but these create a semantic-acoustic divide, limiting holistic and expressive speech generation. |
Yixuan Zhou; Guoyang Zeng; Xin Liu; Xiang Li; Renjie Yu; Ziyang Wang; Runchuan Ye; Weiyue Sun; Jiancheng Gui; Kehan Li; Zhiyong Wu; Zhiyuan Liu; | code |
| 722 | Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Naïve solutions, such as extending the historical window, lead to severe drawbacks, including overfitting, prohibitive computational costs, and redundant information processing. To address these challenges, we introduce the Global Temporal Retriever (GTR), a lightweight and plug-and-play module designed to extend any forecasting model’s temporal awareness beyond the immediate historical context. |
Fanpu Cao; Lu Dai; Jindong Han; Hui Xiong; | code |
| 723 | SkyEvents: A Large-Scale Event-enhanced UAV Dataset for Robust 3D Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Nevertheless, dedicated event datasets specifically tailored for large-scale UAV 3D scene reconstruction remain limited. To bridge this gap, we introduce **SkyEvents**, a pioneering large-scale event-enhanced UAV dataset for 3D scene reconstruction, incorporating RGB, event, and LiDAR data. |
Wenzong Ma; Zhuoxiao Li; Jinjing Zhu; Tongyan Hua; Kanghao Chen; Zidong Cao; Da Yang; Peilun Shi; Yibo Zhou; Wufan Zhao; Hui Xiong; | code |
| 724 | Not All Documents Are What You Need for Extracting Instruction Tuning Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, we propose to extract instruction tuning data from a web corpus rich in knowledge. |
Chi Zhang; Huaping Zhong; Hongtao Li; Chengliang Chai; Hongjiawei; Yu-Ping Wang; Yuhao Deng; Jiacheng Wang; Yizhou Yan; Qiu Jiantao; Conghui He; Lei Cao; | code |
| 725 | Bilateral Information-aware Test-time Adaptation for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we analyze the weaknesses of the previous selection criterion and find that selecting only a fixed proportion of low-entropy samples fails to ensure optimal performance across various datasets and can lead the model to become over-confident in wrongly classified samples, showing unexpected overfitting to atypical features and compromising effective adaptation. |
Jingwei Sun; Jianing Zhu; Jiangchao Yao; Gang Niu; Masashi Sugiyama; Bo Han; | code |
| 726 | Test-Time Scaling with Reflective Generative Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a new Reflective Generative Model (RGM), which obtains OpenAI o3-mini’s performance via a novel Reflective Generative Form. |
Zixiao Wang; Yuxin Wang; Xiaorui Wang; Mengting Xing; Jie Gao; Jianjun Xu; Guangcan Liu; Chenhui Jin; zhuo wang; Shengzhuo zhang; Hongtao Xie; | code |
| 727 | NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a continuous spatio-temporal MoD representation based on implicit neural functions that directly map coordinates to the parameters of a Semi-Wrapped Gaussian Mixture Model. |
Yufei Zhu; Shih-Min Yang; Andrey Rudenko; Tomasz Piotr Kucner; Achim J. Lilienthal; Martin Magnusson; | code |
| 728 | PEERING INTO THE UNKNOWN: ACTIVE VIEW SELECTION WITH NEURAL UNCERTAINTY MAPS FOR 3D RECONSTRUCTION Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Instead of learning radiance fields, like NeRF or 3D Gaussian Splatting, from a current observation and computing uncertainty for each candidate viewpoint, we introduce a novel AVS approach guided by neural uncertainty maps predicted by a lightweight feedforward deep neural network, named UPNet. We will release all code, models, and datasets. |
Zhengquan Zhang; Feng Xu; Mengmi Zhang; | code |
| 729 | Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient mass across accessible branches. |
Yuxuan Zhou; Fei Huang; Heng Li; Fengyi Wu; Tianyu Wang; jianwei zhang; Junyang Lin; Zhi-Qi Cheng; | code |
| 730 | Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify the primary bottleneck as the challenge of learning discriminative representations under unreliable supervision. To tackle this challenge, we propose NcPU, a non-contrastive PU learning framework that requires no auxiliary information. |
Hengwei Zhao; Zhengzhong Tu; Zhuo Zheng; Wei Wang; Junjue Wang; Rusty Feagin; Wenzhe Jiao; | code |
| 731 | A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning Across Broad Atlases and Disorders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce the Brain Graph Foundation Model, termed BrainGFM, a unified framework that leverages graph contrastive learning and graph masked autoencoders for large-scale fMRI-based pre-training. |
Xinxu Wei; kanhao zhao; Yong Jiao; Lifang He; Yu Zhang; | code |
| 732 | SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SpectralGCD, an efficient and effective multimodal approach to GCD that uses CLIP cross-modal image-concept similarities as a unified cross-modal representation. |
Lorenzo Caselli; Marco Mistretta; Simone Magistri; Andrew D. Bagdanov; | code |
| 733 | AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present AutoDV, an end-to-end deep learning model, for high-dimensional data visualization. |
Wei Dai; Jicong Fan; | code |
| 734 | When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that compression distortion unevenly impacts different-level image features, leading to varying effects on MLLMs’ downstream tasks depending on their feature-level reliance. Motivated by this discovery, we propose an image Codec TAilored to MLLMs (CoTAM) designed to adaptively protect multi-level features and suit different demands of downstream tasks. |
Jinming Liu; Zhaoyang Jia; Jiahao Li; Bin Li; Xin Jin; Wenjun Zeng; Yan Lu; | code |
| 735 | Physics Vs Distributions: Pareto Optimal Flow Matching with Physics Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on the insight of inherently conflicting objectives, we introduce Physics-Based Flow Matching (PBFM), a method that enforces physical constraints at training time using conflict-free gradient updates and unrolling to mitigate Jensen’s gap. |
Giacomo Baldan; Qiang Liu; Alberto Guardone; Nils Thuerey; | code |
| 736 | Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, DPO suffers from the recently identified squeezing effect (also known as likelihood displacement), where the probability of preferred responses decreases unintentionally during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise dynamics in the logit space. |
Haocheng Luo; Zehang Deng; Thanh-Toan Do; Mehrtash Harandi; Dinh Phung; Trung Le; | code |
| 737 | Enhancing LLMs for Knowledge Base Question Answering By Chain-of-Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Chain-of-Decomposition (CoD), a novel framework that decomposes KBQA into three modular steps: (1) an LLM-free retrieval module to extract query-relevant subgraphs from the knowledge base, (2) a parameter-free reformulation step that transforms retrieved contexts into structured reasoning paths, and (3) a lightweight LLM-based reasoning module trained to evaluate the logical validity of each path. |
Yonggang Zhang; Jianqi Gao; Jie Lu; | code |
| 738 | Tell Me Habibi, Is It Real or Fake? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This linguistic mixing poses extra challenges for deepfake detection, as it can confuse models trained mostly on monolingual data. To address this, we introduce ArEnAV, the first large-scale Arabic-English audio-visual deepfake dataset featuring intra-utterance code-switching, dialectal variation, and monolingual Arabic content. |
Kartik Kuckreja; Parul Gupta; Injy Hamed; Thamar Solorio; Muhammad Haris Khan; Abhinav Dhall; | code |
| 739 | DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing Highlight: However, these methods face scalability challenges, as they require time-consuming optimization for each image separately, taking hours for small batches. To address these challenges, we introduce DiffVax, a scalable, lightweight, and optimization-free framework for image immunization, specifically designed to prevent diffusion-based editing. |
Tarik Can Ozden; Ozgur Kara; Oguzhan Akcin; Kerem Zaman; Shashank Srivastava; Sandeep P. Chinchali; James Matthew Rehg; | code |
| 740 | Triangle Multiplication Is All You Need for Biomolecular Structure Representations Highlight: We introduce Pairmixer, a streamlined alternative that eliminates triangle attention while preserving higher-order geometric reasoning capabilities that are critical for structure prediction. |
Jeffrey Ouyang-Zhang; Pranav Murugan; Daniel Jesus Diaz; Gianluca Scarpellini; Richard Strong Bowen; Nate Gruver; Adam Klivans; Philipp Kraehenbuehl; Aleksandra Faust; Maruan Al-Shedivat; | code |
| 741 | Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting Highlight: To address the challenge, we propose ReIMTS, a **Re**cursive multi-scale modeling approach for **I**rregular **M**ultivariate **T**ime **S**eries forecasting. |
Boyuan Li; Zhen Liu; Yicheng Luo; Qianli Ma; | code |
| 742 | Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis Highlight: Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present **_Resp-Agent_**, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A²CA). |
Pengfei ZHANG; Tianxin Xie; Minghao Yang; Li Liu; | code |
| 743 | Towards Better Branching Policies: Leveraging The Sequential Nature of Branch-and-Bound Tree Highlight: While recent deep learning approaches have shown promise in learning branching policies using instance-independent features, they often struggle to capture the sequential decision-making nature of B\&B, particularly over long horizons with complex inter-step dependencies and intra-step variable interactions. To address these challenges, we propose Mamba-Branching, a novel learning-based branching policy that leverages the Mamba architecture for efficient long-sequence modeling, enabling effective capture of temporal dynamics across B\&B steps. |
Ce Zhang; Bin Zhang; Guoliang Fan; | code |
| 744 | Diffusion Fine-Tuning Via Reparameterized Policy Gradient of The Soft Q-Function Highlight: To mitigate over-optimization, we propose Soft Q-based Diffusion Finetuning (SQDF), a novel KL-regularized RL method for diffusion alignment that applies a reparameterized policy gradient of a training-free, differentiable estimation of the soft Q-function. |
Hyeongyu Kang; Jaewoo Lee; Woocheol Shin; Kiyoung Om; Jinkyoo Park; | code |
| 745 | Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling Via Functional Scaling Laws Highlight: In this work, we show that the **functional scaling law (FSL)** framework introduced in [Li et al. (2025a)](https://arxiv.org/abs/2509.19189) provides a principled lens for analyzing BSS. |
Jinbo Wang; Binghui Li; Zhanpeng Zhou; Mingze Wang; Yuxuan Sun; Jiaqi Zhang; Xunliang Cai; Lei Wu; | code |
| 746 | When Priors Backfire: On The Vulnerability of Unlearnable Examples to Pretraining Highlight: In this paper, we reveal a fundamental vulnerability of UEs that emerges when learning starts from a pretrained model. |
Zhihao Li; Gezheng Xu; Jiale Cai; Ruiyi Fang; Di Wu; Qicheng Lao; Charles Ling; Boyu Wang; | code |
| 747 | Towards Improved Sentence Representations Using Token Graphs Highlight: However, standard pooling methods like mean or max aggregation treat tokens as an independent set, discarding the rich relational structure captured by the model’s self-attention layers and making them susceptible to signal dilution. To address this, we introduce GLOT, a lightweight, structure-aware pooling module that reframes pooling as relational learning followed by aggregation. |
Krishna Sri Ipsit Mantri; Carola-Bibiane Schönlieb; Zorah Lähner; Moshe Eliasof; | code |
| 748 | Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis Highlight: Prior works have identified individual patterns—such as retrieval heads, sink heads, and diagonal traces—but these observations remain fragmented and lack a unifying explanation. To bridge this gap, we provide a unifying framework to explain the existence of diverse attention patterns by analyzing their underlying mathematical formulations with a temporal continuous perspective. |
Qingyue Yang; Jie Wang; Xing Li; Yinqi Bai; Tong Xialiang; Hui-Ling Zhen; Jianye HAO; Mingxuan Yuan; Bin Li; | code |
| 749 | Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration Highlight: In this work, we aim to extend AiOIR to multiple domains and propose the first multi-domain all-in-one image restoration method, DATPRL-IR, based on our proposed Domain-Aware Task Prompt Representation Learning. |
Guanglu Dong; Chunlei Li; Chao Ren; Jingliang Hu; Yilei Shi; Xiao Xiang Zhu; Lichao Mou; | code |
| 750 | Getting Your LLMs Ready for Reinforcement Learning with Lightweight SFT Highlight: Our findings show that SFT checkpoints with peak diversity consistently lead to superior post-RL results. Building on these insights, we introduce Adaptive Early-Stop Loss (AESL), a lightweight and dynamic cold-start method that balances the acquisition of new patterns with the preservation of the base model’s distribution. |
Xinran Li; Guangda Huzhang; Siqi Shen; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; Jun Zhang; | code |
| 751 | MARS-Sep: Multimodal-Aligned Reinforced Sound Separation Highlight: We introduce a preference alignment perspective, analogous to aligning LLMs with human intent. To address this, we introduce MARS-Sep, a reinforcement learning framework that reformulates separation as decision making. |
Zihan Zhang; Xize Cheng; Zhennan Jiang; Dongjie Fu; Jingyuan Chen; Zhou Zhao; Tao Jin; | code |
| 752 | PT$^2$-LLM: Post-Training Ternarization for Large Language Models Highlight: However, its potential in the post-training quantization (PTQ) setting remains underexplored, due to the challenge of training-free parameter optimization and the quantization difficulty posed by outliers and dispersed weights. To address these issues, we propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. |
Xianglong Yan; ChengZhu Bao; Zhiteng Li; Tianao Zhang; Kaicheng Yang; Haotong Qin; Ruobing Xie; Xingwu Sun; Yulun Zhang; | code |
| 753 | BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving Highlight: Recent work attempts to use typical expert driving behaviors (i.e., anchors) to guide diffusion models but relies on a truncated schedule, which introduces theoretical inconsistencies and can compromise performance. To address this, we introduce BridgeDrive, a novel anchor-guided diffusion bridge policy for closed-loop trajectory planning. |
Shu Liu; Wenlin Chen; Weihao Li; Zheng Wang; Lijin Yang; Jianing Huang; Yipin Zhang; Zhongzhan Huang; Ze Cheng; Hao Yang; | code |
| 754 | ComGS: Efficient 3D Object-Scene Composition Via Surface Octahedral Probes Highlight: We introduce Surface Octahedral Probes (SOPs), which store lighting and occlusion information and allow efficient 3D querying via interpolation, avoiding expensive ray tracing. |
Jian Gao; Mengqi Yuan; Yifei Zeng; Chang Zeng; Zhihao Li; Zhenyu Chen; Weichao Qiu; Xiao-Xiao Long; Hao Zhu; Xun Cao; Yao Yao; | code |
| 755 | Learning Concept Bottleneck Models from Mechanistic Explanations Highlight: As a result, these CBMs often significantly trail their black-box counterpart when controlling for information leakage. To address this, we introduce a novel CBM pipeline named Mechanistic CBM (M-CBM), which builds the bottleneck directly from a black-box model’s own learned concepts. |
Antonio De Santis; Schrasing Tong; Marco Brambilla; Lalana Kagal; | code |
| 756 | Imagine How To Change: Explicit Procedure Modeling for Change Captioning Highlight: We introduce ProCap, a novel framework that reformulates change modeling from static image comparison to dynamic procedure modeling. |
Jiayang Sun; Zixin Guo; Min Cao; Guibo Zhu; Jorma Laaksonen; | code |
| 757 | Hyper-SET: Designing Transformers Via Hyperspherical Energy Minimization Highlight: Transformer-based models have achieved remarkable success, but their core components, Transformer layers, are largely heuristics-driven and engineered from the bottom up, calling for a prototypical model with high interpretability and practical competence. |
Yunzhe Hu; Difan Zou; Dong Xu; | code |
| 758 | Fair Decision Utility in Human-AI Collaboration: Interpretable Confidence Adjustment for Humans with Cognitive Disparities Highlight: Our analysis suggests that achieving utility fairness in AI-assisted decision-making requires both *human-alignment* and *inter-group-alignment*. Building on these objectives, we propose a multicalibration-based AI confidence adjustment approach tailored to scenarios involving human decision-makers with heterogeneous cognitive capacities. |
Jiashi Gao; Kexin Liu; Xinwei Guo; Junlei Zhou; Jiaxin Zhang; Xiangyu Zhao; Guanhua Chen; Xin Yao; Xuetao Wei; | code |
| 759 | Embedding-Based Context-Aware Reranker Highlight: Many state-of-the-art (SOTA) reranking methods, despite utilizing powerful large pretrained language models with potentially high inference costs, still neglect the aforementioned challenges. Therefore, we propose Embedding-Based Context-Aware Reranker (EBCAR), a lightweight reranking framework operating directly on embeddings of retrieved passages with enhanced cross-passage understanding through the structural information of the passages and a hybrid attention mechanism, which captures both high-level interactions across documents and low-level relationships within each document. |
Ye Yuan; Mohammad Amin Shabani; Siqi Liu; | code |
| 760 | PAGE-4D: Disentangled Pose and Geometry Estimation for VGGT-4D Perception Highlight: However, since they are typically trained on static datasets, these models often struggle in real-world scenarios involving complex dynamic elements, such as moving humans or deformable objects like umbrellas. To address this limitation, we introduce PAGE-4D, a feedforward model that extends VGGT to dynamic scenes, enabling camera pose estimation, depth prediction, point cloud reconstruction, and point tracking—all without post-processing. |
Kaichen Zhou; Yuhan Wang; Grace Chen; Gaspard Beaudouin; Fangneng Zhan; Paul Pu Liang; Mengyu Wang; | code |
| 761 | Proximal Diffusion Neural Sampler Highlight: However, the training of neural samplers can be challenging when the target distribution is multimodal with significant barriers separating the modes, potentially leading to mode collapse. We propose a framework named **Proximal Diffusion Neural Sampler (PDNS)** that addresses these challenges by tackling the stochastic optimal control problem via proximal point method on the space of path measures. |
Wei Guo; Jaemoo Choi; Yuchen Zhu; Molei Tao; Yongxin Chen; | code |
| 762 | Samples Are Not Equal: A Sample Selection Approach for Deep Clustering Highlight: Such redundant learning often drives models to overemphasize simple feature patterns in high-density regions, weakening their ability to capture complex yet diverse ones in low-density regions. To address this issue, we propose a novel plug-in designed to mitigate overfitting to simple and redundant feature patterns while encouraging the learning of more complex yet diverse ones. |
Zhengxing Jiao; Yaxin Hou; Jun Ma; Yuhang Li; Ding Ding; Yuheng Jia; Hui LIU; Junhui Hou; | code |
| 763 | Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction Highlight: Conversely, dynamic representations, which rely on runtime profiling, provide profound insights into performance bottlenecks but are often impractical for large-scale tasks due to prohibitive overhead and inherent non-determinism. This paper transcends this trade-off by proposing a novel quasi-dynamic framework for program representation. |
Haolin Pan; Dong Jinyuan; Hongbin Zhang; Hongyu Lin; Mingjie Xing; Yanjun Wu; | code |
| 764 | PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery Highlight: Moreover, some irrelevant instructions may also introduce negative effects to model capacity recovery. To address these challenges, we propose the **P**ost-training d**A**ta **S**election method for **E**fficient pruned large language model **R**ecovery (**PASER**). |
Bowei He; Lihao Yin; Hui-Ling Zhen; Xiaokun Zhang; Mingxuan Yuan; Chen Ma; | code |
| 765 | Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion Highlight: We extend manifold theory to tabular data and expand its scope to handle diverse inference-time objectives. On this foundation, we introduce Harpoon, a tabular diffusion method that guides unconstrained samples along the manifold geometry to satisfy diverse tabular conditions at inference. |
Aditya Shankar; Yuandou Wang; Rihan Hai; Lydia Y. Chen; | code |
| 766 | Robust LLM Unlearning Via Post Judgment and Multi-round Thinking Highlight: However, they exhibit significant robustness deficiencies against adversarial attacks: in the worst case, simple prefix attacks can induce up to a 1,150-fold surge in information leakage for fictitious entity knowledge, while composite question attacks can cause accuracy on hazardous knowledge to rebound from the 25% random-guess baseline to as high as 67.0%. To address this, we propose a new unlearning framework via post judgment and multi-round thinking (PoRT), which consists of three key modules. |
Xinrui Chen; Xu Cao; Jianhao Zhang; Pinlong Zhao; Di Gao; Ou Wu; | code |
| 767 | Constraint-guided Hardware-aware NAS Through Gradient Modification Highlight: This can either result in overly penalizing resource-intensive architectures or architectures failing to meet the hardware constraints of the target device. To address these challenges, we propose ConNAS, a novel gradient-based NAS framework that enforces hardware constraints directly through gradient modification. |
Gregory De Ruyter; Mathias Verbeke; Hans Hallez; | code |
| 768 | FeDaL: Federated Dataset Learning for General Time Series Foundation Models Highlight: We propose a novel Federated Dataset Learning (FeDaL) approach to tackle heterogeneous time series by learning dataset-agnostic temporal representations. |
Shengchao Chen; Guodong Long; Michael Blumenstein; Jing Jiang; | code |
| 769 | OD$^3$: Optimization-free Dataset Distillation for Object Detection Highlight: In this paper, we introduce OD$^3$, a novel optimization-free data distillation framework specifically designed for object detection. |
Salwa K. Al Khatib; Ahmed Elhagry; Shitong Shao; Zhiqiang Shen; | code |
| 770 | K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model Highlight: This fragmentation contrasts sharply with clinical practice, where experts seamlessly integrate diverse knowledge: anatomical priors from training, exemplar-based reasoning from reference cases, and iterative refinement through real-time interaction. We present $\textbf{K-Prism}$, a unified segmentation framework that mirrors this clinical flexibility by systematically integrating three knowledge paradigms: (i) $\textit{semantic priors}$ learned from annotated datasets, (ii) $\textit{in-context knowledge}$ from few-shot reference examples, and (iii) $\textit{interactive feedback}$ from user inputs like clicks or scribbles. |
Bangwei Guo; Yunhe Gao; Meng Ye; Difei Gu; Yang Zhou; Leon Axel; Dimitris N. Metaxas; | code |
| 771 | FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents Highlight: Besides, previous works focus on optimizing the success rate during task execution, but pay less attention to the personalized execution trajectory, thereby neglecting potentially vast differences in user preferences. To address these challenges, we introduce the FingerTip 20K benchmark. |
Qinglong Yang; Haoming Li; Haotian Zhao; Xiaokai Yan; Jingtao Ding; Fengli Xu; Yong Li; | code |
| 772 | Any-step Generation Via N-th Order Recursive Consistent Velocity Field Estimation Highlight: These limitations impede their scalability and stability, especially when applied to large-scale models. To address these issues, we introduce **$N$-th order Recursive Consistent velocity field estimation for Generative Modeling (RCGM)**, a novel framework that unifies many existing approaches. |
Peng Sun; Tao Lin; | code |
| 773 | Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification Highlight: Existing incomplete MDL methods either discard missing modalities, risking the loss of valuable task-relevant information, or recover them, potentially introducing irrelevant noise, leading to the discarding-imputation dilemma. To address this dilemma, in this paper, we propose DyMo, a new inference-time dynamic modality selection framework that adaptively identifies and integrates reliable recovered modalities, fully exploring task-relevant information beyond the conventional discard-or-impute paradigm. |
Siyi Du; Xinzhe Luo; Declan O’Regan; Chen Qin; | code |
| 774 | Tokenizing Single-Channel EEG with Time-Frequency Motif Learning Highlight: Foundation models are reshaping EEG analysis, yet the important problem of EEG tokenization remains a challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from *single-channel* EEG signals and encodes them into discrete tokens. |
Jathurshan Pradeepkumar; Xihao Piao; Zheng Chen; Jimeng Sun; | code |
| 775 | Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs Highlight: We propose a two-pillar framework, LiteCoST, to achieve both high accuracy and low latency with small language models (SLMs). |
Zhuowen Liang; Xiaotian Lin; Zhengxuan Zhang; Yuyu Luo; Haixun Wang; Nan Tang; | code |
| 776 | GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models Highlight: To address the gaps, we introduce the Geo-Temporal Reasoning benchmark (GTR-Bench), a novel challenge for geographic temporal reasoning of moving targets in a large-scale camera network. |
Qinghongbing Xie; Zhaoyuan Xia; Feng Zhu; Lijun GONG; Ziyue Li; Rui Zhao; Long ZENG; | code |
| 777 | Libra: Effective Yet Efficient Load Balancing for Large-scale MoE Inference Highlight: To this end, we propose Libra, a system that achieves near-optimal load balancing with minimal overhead. |
Jaehoon Yang; Yushin Kim; Seokwon Moon; Yeonhong Park; Jae W. Lee; | code |
| 778 | Token-level Data Selection for Safe LLM Fine-tuning Highlight: To address this limitation, we perform a systematic token-level diagnosis of safety degradation during fine-tuning. Based on this, we propose token-level data selection for safe LLM fine-tuning (TOSS), a novel framework that quantifies the safety risk of each token by measuring the loss difference between a safety-degraded model and a utility-oriented model. |
Yanping Li; Zhening Liu; Zijian Li; Zehong Lin; Jun Zhang; | code |
| 779 | Automatic Stage Lighting Control: Is It A Rule-Driven Process or Generative Task? Highlight: To address this gap, this paper presents Skip-BART, an end-to-end model that directly learns from experienced lighting engineers and predicts vivid, human-like stage lighting. To address the lack of available datasets, we create the first stage lighting dataset, along with several pre-training and transfer learning techniques to improve model training with limited data. |
Zijian Zhao; Dian Jin; Zijing Zhou; Xiaoyu Zhang; | code |
| 780 | QuestA: Expanding Reasoning Capacity in LLMs Via Question Augmentation Highlight: This raises a key challenge: how can RL be adapted to solve harder reasoning problems more effectively? To address this challenge, we propose a simple yet effective strategy via Question Augmentation: introduce partial solutions during training to reduce problem difficulty and provide more informative learning signals. |
Jiazheng Li; Hongzhou Lin; Hong Lu; Kaiyue Wen; Zaiwen Yang; Jiaxuan Gao; Yi Wu; Jingzhao Zhang; | code |
| 781 | RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks Highlight: In this work, to provide more reliable sub-goals, we introduce a novel reliability-driven decision mechanism and propose Reliability-Driven HRL (RD-HRL) as the solution. |
Yixiang Shan; Haipeng Liu; Ting Long; Yi Chang; | code |
| 782 | From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors Highlight: In this work, we introduce **FALCON (From Spatial to Action)**, a novel paradigm that injects rich 3D spatial tokens into the action head. |
Zhengshen Zhang; Hao Li; Yalun Dai; Zhengbang Zhu; Lei Zhou; Chenchen Liu; Dong Wang; Francis E. H. Tay; Sijin Chen; Ziwei Liu; Yuxiao Liu; Xinghang Li; Pan Zhou; | code |
| 783 | Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos Highlight: To tackle such a challenging problem, we present a unified framework with a two-stage optimization approach based on Gaussian Splatting. Since our task has not been studied before, we construct a new evaluation benchmark using publicly available datasets for HDR video reconstruction. |
Jinfeng Liu; Lingtong Kong; Mi Zhou; Jinwei Chen; Dan Xu; | code |
| 784 | Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design Via Graph Transformers Highlight: In this paper, we propose Si-GT, a novel transformer-based model for fast and accurate signal integrity analysis in IC interconnects. |
Yuting Hu; Tarek Mohamed; Chenhui Xu; Hua Xiang; Hussam Amrouch; Gi-Joon Nam; Jinjun Xiong; | code |
| 785 | DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning Highlight: In recent years, foundation models for PDEs have largely adopted multi-scale windowed self-attention, with the scOT backbone in Poseidon serving as a representative example. |
Jiayi Li; Flora D. Salim; | code |
| 786 | Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting Highlight: To this end, we introduce TimaAlign, a lightweight, plug-and-play framework that establishes a new representation paradigm, distinct from contrastive learning, by aligning auxiliary features via a simple reconstruction task and feeding them back into any base forecaster. |
Yifan Hu; Jie Yang; Tian Zhou; Peiyuan Liu; Yujin Tang; Rong Jin; Liang Sun; | code |
| 787 | Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets Highlight: Scalable robot policy pre-training has been hindered by the high cost of collecting high-quality demonstrations for each platform. In this study, we address this issue by uniting offline reinforcement learning (offline RL) with cross-embodiment learning. |
Haruki Abe; Takayuki Osa; Yusuke Mukuta; Tatsuya Harada; | code |
| 788 | ETGS: Explicit Thermodynamics Gaussian Splatting for Dynamic Thermal Reconstruction Highlight: We propose ETGS, a method for reconstructing dynamic thermal scenes by embedding explicit thermodynamic modeling into 3D Gaussian Splatting. |
Zhongwen Wang; Han Ling; Weihao Zhang; Yinghui Sun; Quansen Sun; | code |
| 789 | GradPruner: Gradient-guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs Highlight: To simultaneously enhance the training and inference efficiency of downstream task fine-tuning, we introduce GradPruner, which can prune layers of LLMs guided by gradients in the early stages of fine-tuning. |
Wei Huang; Anda Cheng; Yinggui Wang; | code |
| 790 | Detecting Misbehaviors of Large Vision-Language Models By Evidential Uncertainty Quantification Highlight: However, existing uncertainty quantification methods, which typically capture only overall epistemic uncertainty, have shown limited effectiveness in identifying such issues. To address this gap, we propose Evidential Uncertainty Quantification (EUQ), a fine-grained method that captures both information conflict and ignorance for effective detection of LVLM misbehaviors. |
Tao Huang; Rui Wang; Xiaofei Liu; Yi Qin; Li Duan; Liping Jing; | code |
| 791 | Unlocking The Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting Highlight: To unlock the Value of Text, we propose VoT, a method with Event-driven Reasoning and Multi-level Alignment. |
Siyuan Wang; Peng Chen; Yihang Wang; Wanghui Qiu; Chenjuan Guo; Bin Yang; Yang Shu; | code |
| 792 | Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models Highlight: The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively hacking the reward signal and undermining genuine reasoning development. To address this challenge, we introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. |
Yuhui Wang; Changjiang Li; Guangke Chen; Jiacheng Liang; Ting Wang; | code |
| 793 | Self-Destructive Language Models Highlight: While existing defenses attempt to reinforce LLM alignment, they fail to address models’ inherent ‘trainability’ on harmful data, leaving them vulnerable to stronger attacks with increased learning rates or larger harmful datasets. To overcome this limitation, we introduce SEAM, a novel alignment-enhancing defense that transforms LLMs into self-destructive models with intrinsic resilience to misalignment attempts. |
Yuhui Wang; Rongyi Zhu; Ting Wang; | code |
| 794 | Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models Highlight: We propose Horizon Imagination (HI), an on-policy imagination process for discrete stochastic policies that denoises multiple future observations in parallel. |
Lior Cohen; Ofir Nabati; Kaixin Wang; Navdeep Kumar; Shie Mannor; | code |
| 795 | Dynamic Novel View Synthesis in High Dynamic Range Highlight: However, real-world scenarios frequently feature dynamic elements, such as moving objects, varying lighting conditions, and other temporal events, thereby presenting a significantly more challenging scenario. To address this gap, we propose a more realistic problem named HDR Dynamic Novel View Synthesis (HDR DNVS), where the additional dimension “Dynamic” emphasizes the necessity of jointly modeling temporal radiance variations alongside sophisticated 3D translation between LDR and HDR. |
Kaixuan Zhang; Zhipeng Xiong; Minxian Li; Mingwu Ren; Jiankang Deng; Xiatian Zhu; | code |
| 796 | PCLR: Progressively Compressed LoRA for Multimodal Continual Instruction Tuning. Highlight: To jointly address forgetting and memory explosion, we propose the Compression–Integration–Learning (CIL) pipeline, which draws on the memory consolidation processes during human sleep. |
Weicheng Meng; Jingyang Qiao; Zhizhong Zhang; Shaohui Liu; Yuan Xie; | code |
| 797 | Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection. Highlight: Given their diverse designs, we first place representative metric-based methods within a unified framework, enabling a clear assessment of their advantages and limitations. Our analysis identifies a core challenge across these methods: the token-level detection score is easily biased by the inherent randomness of the MGT generation process. |
Chenwang Wu; Yiu-ming Cheung; Shuhai Zhang; Bo Han; Defu Lian; | code |
| 798 | Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring. Highlight: This results in further challenges: explaining prediction changes is non-trivial, methods fail to leverage online dynamics, and evaluation remains difficult. To address these challenges, we propose Delta-XAI, which adapts 14 existing XAI methods through a wrapper function and introduces a principled evaluation suite for the online setting, assessing diverse aspects, such as faithfulness, sufficiency, and coherence. |
Changhun Kim; Yechan Mun; Hyeongwon Jang; Eunseo Lee; Sangchul Hahn; Eunho Yang; | code |
| 799 | Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance Via CoIPO. Highlight: However, these approaches overlook the intrinsic robustness of LLMs, and their reliance on external components introduces additional computational overhead and uncertainty. In this work, we propose a Contrastive Learning-based Inverse Direct Preference Optimization (CoIPO) method that minimizes the discrepancy between the label-aligned logits produced by the model under a clean prompt and its noisy counterpart, and conduct a detailed analysis using mutual information theory. |
Xin Yang; Letian Li; Abudukelimu Wuerkaixi; Xuxin Cheng; Cao Liu; Ke Zeng; Xunliang Cai; Wenyuan Jiang; | code |
| 800 | An Information-Theoretic Parameter-Free Bayesian Framework for Probing Labeled Dependency Trees from Attention Score. Highlight: We propose a method capable of estimating mutual information (MI) and directly extracting dependency trees from attention scores in a mathematically rigorous way, requiring no additional network training effort. |
Hongxu Liu; Jing Ma; Xiaojie Wang; Caixia Yuan; Fangxiang Feng; | code |
| 801 | Learning from Label Proportions Via Proportional Value Classification. Highlight: In this paper, we propose a novel LLP approach that can mitigate the over-smoothing problems with theoretical guarantees. |
Tianhao Ma; Wei Wang; Ximing Li; Gang Niu; Masashi Sugiyama; | code |
| 802 | CSRv2: Unlocking Ultra-Sparse Embeddings. Highlight: In this paper, we introduce CSRv2, a principled training approach designed to make ultra-sparse embeddings viable. |
Lixuan Guo; Yifei Wang; Tiansheng Wen; Yifan Wang; Aosong Feng; Bo Chen; Stefanie Jegelka; Chenyu You; | code |
| 803 | SpecBranch: Speculative Decoding Via Hybrid Drafting and Rollback-Aware Branch Parallelism. Highlight: However, existing SD methods remain fundamentally constrained by their serialized execution, which inevitably causes mutual waiting bubbles between the draft and target models. To address this critical challenge, we draw inspiration from sophisticated branch prediction mechanisms in modern processors and propose a novel framework, **SpecBranch**, to fully unlock branch parallelism in SD. |
Yuhao Shen; Junyi Shen; Quan Kong; Tianyu Liu; Yao Lu; Cong Wang; | code |
| 804 | FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation. Highlight: We propose FSD-CAP, a two-stage framework designed to improve imputation quality under extreme sparsity. |
Xin Qiao; Shijie Sun; Anqi Dong; Cong Hua; Xia Zhao; Longfei Zhang; Guangming Zhu; zhang liang; | code |
| 805 | FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation. Highlight: In this paper, we find two key problems: (1) a student model distilled with equal class-proportion data behaves significantly differently across distinct categories; and (2) the robustness of the student model is not stable across different attack targets. |
Zhengxiao Li; Liming Lu; Xu Zheng; Si Yuan Liang; Taric Chen; Yongbin Zhou; Shuchao Pang; | code |
| 806 | Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation. Highlight: In this paper, we introduce the challenge of unsupervised categorical prior learning in pose estimation, where AI models learn a general pose prior for an object category from images in a self-supervised manner. |
Ziyu Wang; Shuangpeng Han; Mengmi Zhang; | code |
| 807 | WorldTree: Towards 4D Dynamic Worlds from Monocular Video Using Tree-Chains. Highlight: To this end, we propose WorldTree, a unified framework comprising Temporal Partition Tree (TPT) that enables coarse-to-fine optimization based on the inheritance-based partition tree structure for hierarchical temporal decomposition, and Spatial Ancestral Chains (SAC) that recursively query ancestral hierarchical structure to provide complementary spatial dynamics while specializing motion representations across ancestral nodes. |
Qisen Wang; Yifan Zhao; Jia Li; | code |
| 808 | Multi-Head Low-Rank Attention. Highlight: In this work, we propose Multi-Head Low-Rank Attention (MLRA), a TP-friendly attention mechanism that slashes the per-device KV cache under TP to just $1.5 d_h$. |
Songtao Liu; Hongwu Peng; Zhiwei Zhang; Zhengyu Chen; Yue Guo; | code |
| 809 | AlphaSAGE: Structure-Aware Alpha Mining Via GFlowNets for Robust Exploration. Highlight: Third, the standard RL objective of maximizing expected returns inherently drives policies towards a single optimal mode, directly contradicting the practical need for a diverse portfolio of non-correlated alphas. To overcome these challenges, we introduce **AlphaSAGE** (**S**tructure-**A**ware Alpha Mining via **G**enerative Flow Networks for Robust **E**xploration), a novel framework built upon three cornerstone innovations: (1) a structure-aware encoder based on Relational Graph Convolutional Network (RGCN); (2) a new framework with Generative Flow Networks (GFlowNets); and (3) a dense, multi-faceted reward structure. |
Binqi Chen; Hongjun Ding; Ning Shen; Taian Guo; Jinsheng Huang; Luchen Liu; Ming Zhang; | code |
| 810 | Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning. Highlight: Such queries may no longer represent the agent’s evolving behavior patterns, reducing the informativeness of human feedback. To address this issue, we propose a policy likelihood-based query sampling and critic-exploited reset (PoLiCER). |
Jongkook Heo; Jaehoon Kim; Young Jae Lee; Min Gu Kwak; Youngjoon Park; Seoung Bum Kim; | code |
| 811 | Behavior Learning (BL). Highlight: Grounded in behavioral science, we propose Behavior Learning (BL), a novel general-purpose machine learning framework that unifies predictive performance, intrinsic interpretability, and identifiability for scientifically credible modeling. |
Zhenyao Ma; Yue Liang; Dongxu Li; | code |
| 812 | You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging. Highlight: Moreover, once a model is deployed, user corrections can be used to adapt the network parameters to the new data distribution, mitigating distribution shift. Based on these insights, we aim to develop a practical, effective method for improving the adaptive capabilities of interactive segmentation models to new data distributions in medical imaging. |
Wentian Xu; Ziyun Liang; Harry Anthony; Yasin Ibrahim; Felix Cohen; Guang Yang; Konstantinos Kamnitsas; | code |
| 813 | LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning. Highlight: We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning by aligning the optimizer’s internal dynamics with those of updating all model weights. |
Nurbek Tastan; Stefanos Laskaridis; Martin Takáč; Karthik Nandakumar; Samuel Horváth; | code |
| 814 | Diversity-Incentivized Exploration for Versatile Reasoning. Highlight: In this paper, we propose **DIVER** (**D**iversity-**I**ncentivized Exploration for **V**ersatil**E** **R**easoning), an innovative framework that highlights the pivotal role of global sequence-level diversity to incentivize deep exploration for versatile reasoning. |
Zican Hu; Shilin Zhang; Yafu Li; Jianhao Yan; Xuyang Hu; Leyang Cui; Xiaoye Qu; Chunlin Chen; Yu Cheng; Zhi Wang; | code |
| 815 | Enhancing Communication Compression Via Discrepancy-aware Calibration for Federated Learning. Highlight: However, these rules do not account for the discrepancy between the compressed and the original outputs, which can lead to the loss of important information. To address this issue, we propose a novel discrepancy-aware communication compression method that enhances performance under severely constrained communication conditions. |
Zhiyi Wan; Yijia Chi; Liang Li; Wanrou Du; Miao Pan; Xiaoqi Qin; | code |
| 816 | AC-Sampler: Accelerate and Correct Diffusion Sampling with Metropolis-Hastings Algorithm. Highlight: We introduce the Accelerator-Corrector Sampler (AC-Sampler), which accelerates and corrects diffusion sampling without fine-tuning. |
Minsang Park; Gyuwon Sim; Hyungho Na; Jiseok Kwak; Sumin Lee; Richard Lee Kim; Donghyeok Shin; Byeonghu Na; Yeongmin Kim; Il-chul Moon; | code |
| 817 | ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity. Highlight: Existing metrics often fail to balance semantic and structural information: string-based methods neglect semantics, whereas proof-based approaches offer no graded similarity when proofs fail. To address these issues, we introduce ASSESS (A Semantic and Structural Evaluation Framework for Statement Similarity), which captures syntactic structure by transforming formal statements into operator trees and computes a real-valued similarity score using our novel TransTED (Transformation Tree Edit Distance) Similarity metric by incorporating semantic transformations. |
Xiaoyang Liu; Tao Zhu; Zineng Dong; Yuntian Liu; Guo qingfeng; Liu ZhaoXuan; Yu Chen; Tao Luo; | code |
| 818 | BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models. Highlight: This makes it unclear whether models are truly reasoning or just recalling answers. In this paper, we introduce **BeyondBench**, an evaluation framework that avoids this problem by using **algorithmic problem generation**. |
Gaurav Srivastava; Aafiya Shamshad Hussain; Zhenyu Bi; Swastik Roy; Priya Pitre; Meng Lu; Morteza Ziyadi; Xuan Wang; | code |
| 819 | MoSA: Mosaic Shared Adaptation of Large Language Models. Highlight: We introduce MoSA, a new parameter-efficient fine-tuning (PEFT) method that replaces low-rank factorization with randomized, fine-grained sharing of weight updates. |
Xiequn Wang; Zhan Zhuang; Shengda Luo; Yu Zhang; | code |
| 820 | Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping. Highlight: We introduce AttWarp, a lightweight method that allocates more resolution to query-relevant content while compressing less informative areas, all while preserving global context. |
Dwip Dalal; Gautam Vashishtha; Utkarsh Mishra; Jeonghwan Kim; Madhav Kanda; Hyeonjeong Ha; Svetlana Lazebnik; Heng Ji; Unnat Jain; | code |
| 821 | TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning. Highlight: In this work, we investigate a novel semi-supervised RLVR paradigm that utilizes a small labeled set to **guide** RLVR training on unlabeled samples. |
Shenzhi Yang; Guangcheng Zhu; Haobo Wang; Xing Zheng; Yingfan MA; Zhongqi Chen; Bowen Song; Weiqiang Wang; Junbo Zhao; Gang Chen; | code |
| 822 | MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models. Highlight: Although the new FP4 Tensor Cores in NVIDIA’s Blackwell architecture offer up to 4$\times$ speedup over FP16, existing INT4-based kernels fail to fully exploit this capability due to mismatched data formats. To bridge this gap, we propose MicroMix, a co-designed mixed-precision quantization algorithm and GEMM kernel based on Microscaling (MX) data formats. |
Wenyuan Liu; Haoqian Meng; Yilun Luo; Peng Zhang; Xindian Ma; | code |
| 823 | WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions. Highlight: We introduce WARC-Bench (Web Archive Benchmark), a novel web navigation benchmark featuring 438 tasks designed to evaluate multimodal AI agents on subtasks. |
Sanjari Srivastava; Gang Li; Cheng Chang; Rishu Garg; Manpreet Kaur; Charlene Y. Lee; Yuezhang Li; Yining Mao; Ignacio Cases; Yanan Xie; Peng Qi; | code |
| 824 | Understanding and Improving Hyperbolic Deep Reinforcement Learning. Highlight: In this work, we identify key factors that determine the success and failure of training hyperbolic deep RL agents. |
Timo Klein; Thomas Lang; Andrii Shkabrii; Alexander Sturm; Kevin Sidak; Lukas Miklautz; Claudia Plant; Yllka Velaj; Sebastian Tschiatschek; | code |
| 825 | CLARC: C/C++ Benchmark for Robust Code Search. Highlight: To address the gap, we introduce an automated pipeline for code search datasets and present CLARC, a C/C++ benchmark built from real-world GitHub repositories. |
Kaicheng Wang; Liyan Huang; Weike Fang; Weihang Wang; | code |
| 826 | Human-Object Interaction Via Automatically Designed VLM-Guided Motion Policy. Highlight: In this work, we introduce the first unified physics-based HOI framework that leverages Vision-Language Models (VLMs) to enable long-horizon interactions with diverse object types — including static, dynamic, and articulated objects. |
Zekai Deng; Ye Shi; Kaiyang Ji; Lan Xu; Shaoli Huang; Jingya Wang; | code |
| 827 | Rethinking Residual Errors in Compensation-based LLM Quantization. Highlight: Methods based on weight compensation, which iteratively apply quantization and weight compensation to minimize the output error, have recently demonstrated remarkable success in quantizing Large Language Models (LLMs). The representative work, GPTQ, introduces several key techniques that make such iterative methods practical for LLMs with billions of parameters. |
Shuaiting Li; Juncan Deng; Kedong Xu; Rongtao Deng; Hong Gu; Minghan Jiang; Haibin Shen; Kejie Huang; | code |
| 828 | Read The Room: Video Social Reasoning with Mental-Physical Causal Chains. Highlight: In this work, we introduce $R^3$-Bench, an evaluation benchmark with fine-grained annotations of belief, intent, desire, emotion, and their causal chains in complex scenarios; and $R^3$-FDT, a large-scale training set generated through a novel automated pipeline with the same structure. We will release our dataset, code and models upon acceptance. |
Lixing Niu; Jiapeng Li; Xingping Yu; Xinyi Dong; Shu Wang; Ruining Feng; Bo Wu; Ping Wei; Yisen Wang; Lifeng Fan; | code |
| 829 | From Observations to Events: Event-Aware World Models for Reinforcement Learning. Highlight: From a cognitive science perspective, humans segment continuous sensory streams into discrete events and rely on these key events for decision-making. Motivated by this principle, we propose the Event-Aware World Model (EAWM), a general framework that learns event-aware representations to streamline policy learning without requiring handcrafted labels. |
Zhao-Han Peng; Shaohui Li; Zhi Li; Shulan Ruan; Yu LIU; You He; | code |
| 830 | Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation. Highlight: In this paper, we study how to redesign the framework based on the characteristics of high-dimensional 3D images, and explore data synergy to overcome the fragile representation of lightweight methods. |
Jinpeng Lu; Linghan Cai; Yinda Chen; Guo Tang; Songhan Jiang; Haoyuan Shi; Zhiwei Xiong; | code |
| 831 | Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning. Highlight: Existing parameter-efficient methods often limit model expressivity or introduce new parameters per task, creating scalability issues. To address these limitations, we introduce **Orthogonal Subspace Fine-Tuning (OSFT)**, a novel parameter-efficient approach for continual learning. |
Nikhil Shivakumar Nayak; Krishnateja Killamsetty; Ligong Han; Abhishek Bhandwaldar; Prateek Chanda; Kai Xu; Oleg Silkin; Mustafa Eyceoz; Hao Wang; Aldo Pareja; Akash Srivastava; | code |
| 832 | Flower: A Flow-Matching Solver for Inverse Problems. Highlight: We introduce Flower, a solver for linear inverse problems. |
Mehrsa Pourya; Bassam El Rawas; Michael Unser; | code |
| 833 | DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences. Highlight: However, most existing methods overlook the degradations (e.g., noise and blur) in LDR frames, focusing only on the brightness and position differences between them. To address this gap, we propose DeAltHDR, a novel framework for high-quality HDR video reconstruction from degraded sequences. |
Shuohao Zhang; Zhilu Zhang; RongJian Xu; Xiaohe Wu; Wangmeng Zuo; | code |
| 834 | From Atom to Space: A Region-based Readout Function for Spatial Properties of Materials. Highlight: In this work, we propose a region-based decomposition perspective, reformulating material properties as integrals over space and pooling contributions from spatial regions rather than atoms. |
Jiawen Zou; Weimin Tan; Zhongyao Wang; Hao Qi; Bo Yan; | code |
| 835 | Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models. Highlight: Our experiments demonstrate that even state-of-the-art MLLMs perform poorly on these foundational tasks. To address this limitation, we propose Scene Dynamic Field (SDF), a concise approach that leverages physics simulators within a multi-task fine-tuning framework. |
Nanxi Li; Xiang Wang; Yuanjie Chen; Haode Zhang; Hong Li; Yong-Lu Li; | code |
| 836 | SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling. Highlight: In this work, we introduce SpaceControl, a training-free test-time method for explicit spatial control of 3D generation. |
Elisabetta Fedele; Francis Engelmann; Ian Huang; Or Litany; Marc Pollefeys; Leonidas Guibas; | code |
| 837 | Let’s Split Up: Zero-Shot Classifier Edits for Fine-Grained Video Understanding. Highlight: We propose a zero-shot editing method that leverages the latent compositional structure of video models to expose fine-grained distinctions without extra data. |
Kaiting Liu; Hazel Doughty; | code |
| 838 | AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models. Highlight: This paper introduces AEGIS (Adversarial Erasure with Gradient-Informed Synergy), a retention-data-free framework that advances both robustness and retention. |
Fengpeng Li; Kemou Li; Qizhou Wang; Bo Han; Jiantao Zhou; | code |
| 839 | Self-Consistency Improves The Trustworthiness of Self-Interpretable GNNs. Highlight: Empirical analysis further reveals that self-inconsistency predominantly occurs on unimportant features, linking it to redundancy-driven explanation inconsistency observed in recent work and suggesting untapped potential for improving explanation quality. Building on these insights, we introduce a simple, model-agnostic self-consistency (SC) training strategy. |
Wenxin Tai; Ting Zhong; Goce Trajcevski; Fan Zhou; | code |
| 840 | On Robustness of Vision-Language-Action Model Against Multi-Modal Perturbations. Highlight: To build multi-modal robust VLAs, we propose RobustVLA, designed to withstand perturbations in VLA inputs and outputs. |
Jianing Guo; Zhenhong Wu; Chang Tu; Yiyao Ma; Xiangqi Kong; Zhiqian Liu; Jiaming Ji; Shuning Zhang; Yuanpei Chen; Kai Chen; Qi Dou; Yaodong Yang; Xianglong Liu; Huijie Zhao; Weifeng Lv; Simin Li; | code |
| 841 | Zeros Can Be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores. Highlight: (2) Quantization sensitivity is uniform across layers. Motivated by these findings, we introduce Masked Binary U-Net (MBU-Net), obtained through a cost-aware masking strategy that prioritizes masking where it yields the highest accuracy-per-cost, reconciling accuracy with near-binary efficiency. |
Chunshu Wu; Ruibing Song; Sushant Kondguli; Tong Geng; Ang Li; | code |
| 842 | StreamingThinker: Large Language Models Can Think While Reading. Highlight: Large language models (LLMs) have demonstrated remarkable capabilities in chain of thought (CoT) reasoning. |
Junlong Tong; Yingqi Fan; Anhao Zhao; Yunpu Ma; Xiaoyu Shen; | code |
| 843 | Faithfulness Under The Distribution: A New Look at Attribution Evaluation. Highlight: We propose FUD, a novel evaluation framework that reconstructs masked regions using score-based diffusion models to produce in-distribution, semantically coherent inputs. |
Zhiyu Zhu; Zhibo Jin; Jiayu Zhang; Bartlomiej Sobieski; Przemyslaw Biecek; Fang Chen; Jianlong Zhou; | code |
| 844 | Inferring Brain Plasticity Rule Under Long-term Stimulation with Structured Recurrent Dynamics. Highlight: Here, we formalize these principles as a latent dynamical law that governs how recurrent connectivity evolves under repeated interventions. To capture this law, we introduce the Stimulus-Evoked Evolution Recurrent dynamics (STEER) framework, a dual-timescale model that disentangles fast neural activity from slow plastic changes. |
Zhichao Liang; Jingzhe Lin; Xinyi Li; Guanyi Zhao; Quanying Liu; | code |
| 845 | Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation. Highlight: To address the challenges, we propose SIREN, a framework that enables effective LLM-based rating prediction via long-term interest sketching and internalized reasoning. |
Zhihao Ding; Jinming Li; Shuai Mu; Jieming Shi; | code |
| 846 | CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting. Highlight: Channel-independent models treat each channel in isolation to increase flexibility, yet this neglects inter-channel dependencies and limits performance. To address these limitations, we propose CPiRi, a channel permutation invariant (CPI) framework that infers cross-channel structure from data rather than memorizing a fixed ordering, enabling deployment in settings with structural and distributional co-drift without retraining. |
Jiyuan Xu; Wenyu Zhang; Xin Jing; Jiahao Nie; Shuai Chen; Shuai Zhang; | code |
| 847 | Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models Via Extra Forward Passes. Highlight: To address the issues above, we propose a new alternating optimization framework called LMAO (Low-rank and Memory-efficient Zeroth-Order Alternating Optimization), which combines the advantages of LoRA and MeZO. |
Jia Zhang; Yu Bai; Hualin Zhang; Tianshuo Chen; Zhaogeng Liu; zhiqiang xu; Yi Chang; Bin Gu; | code |
| 848 | Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models. Highlight: However, they often rely on predefined, high-action trajectories, which limits both sampling efficiency and final restoration quality. To address this, we propose a Consistency Geodesic Bridge (CGB) framework to construct a lower-action, geodesic trajectory. |
Jinhui HOU; Zhiyu Zhu; Junhui Hou; | code |
| 849 | Towards A Foundation Model for Crowdsourced Label Aggregation. Highlight: Recent efforts toward universal aggregation models do not account for the structural and behavioral complexities of human-annotated crowdsourcing, resulting in poor real-world performance. To address this gap, we introduce CrowdFM, a foundation model for crowdsourced label aggregation. |
Hao Liu; Jiacheng Liu; Feilong Tang; Long Chen; Jiadi Yu; Yanmin Zhu; Qiwen Dong; Yichuan Yu; Xiaofeng Hou; | code |
| 850 | MARS – A Foundational Map Auto-Regressor. Highlight: Motivated by the recent huge success of auto-regressive visual-language modeling, we propose the first map foundational model: Map Auto-Regressor (MARS), that is capable of generating both multi-polyline road networks and polygon buildings in a unified manner. |
Qi Zhang; Suvam Bag; Rupanjali Kukal; Mikael Figueroa; Rishi Madhok; Nikolaos Karianakis; Fuxun Yu; | code |
| 851 | Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model. Highlight: In this work, we propose a one-step diffusion framework for generating temporally coherent, depth-aware video bokeh rendering. |
Yang Yang; Siming Zheng; Qirui Yang; Jinwei Chen; Boxi Wu; Xiaofei He; Deng Cai; Bo Li; Peng-Tao Jiang; | code |
| 852 | Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text. Highlight: In this paper, we start by presenting a geometric approach to demystify rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their generalization ability. |
Hongyi Zhou; Jin Zhu; Kai Ye; Ying Yang; Erhan Xu; Chengchun Shi; | code |
| 853 | Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning. Highlight: Contemporary prospective learning-based space construction methods struggle to balance old and new knowledge, as prototype bias and rigid structures limit the expressive capacity of the embedding space. Different from these strategies, we rethink the optimization dilemma from the perspective of feature-structure dual consistency, and propose a Consistency-driven Calibration and Matching (ConCM) framework that systematically mitigates the knowledge conflict inherent in FSCIL. |
Qinzhe Wang; Zixuan Chen; Keke Huang; Xiu Su; Chunhua Yang; Chang Xu; | code |
| 854 | GeoFAR: Geography-Informed Frequency-Aware Super-Resolution for Climate Data. Highlight: To improve the fidelity of climate super-resolution (SR), we introduce GeoFAR: by explicitly encoding climatic patterns at different frequencies, while learning implicit geographical neural representations (i.e., related to location and elevation), our approach provides frequency-aware and geography-informed representations for climate SR, thereby reconstructing fine-grained climate information at high resolution. |
Chang Xu; Gencer Sumbul; Li Mi; Robin Zbinden; Devis Tuia; | code |
| 855 | Accelerating Benchmarking of Functional Connectivity Modeling Via Structure-aware Core-set Selection. Highlight: To break this bottleneck, we reframe the challenge of FC benchmarking by selecting a small, representative *core-set* whose sole purpose is to preserve the relative performance ranking of FC operators. We formalize this as a ranking-preserving subset selection problem and propose **S**tructure-aware **C**ontrastive **L**earning for **C**ore-set **S**election (**SCLCS**), a self-supervised framework to select these core-sets. |
Ling Zhan; Zhen Li; Junjie Huang; Tao Jia; | code |
| 856 | MambaVoiceCloning: Efficient and Expressive Text-to-Speech Via State-Space Modeling and Diffusion Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: MambaVoiceCloning (MVC) asks whether the conditioning path of diffusion-based TTS can be made fully SSM-only at inference—removing all attention and explicit RNN-style recurrence … |
Sahil Kumar; Namrataben Patel; Honggang Wang; Youshan Zhang; | code |
| 857 | Lookup Multivariate Kolmogorov-Arnold Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce lookup multivariate Kolmogorov-Arnold Networks (lmKANs), which deliver a substantially better trade-off between capacity and inference cost. |
Sergey Pozdnyakov; Philippe Schwaller; | code |
| 858 | Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Their robustness is reduced when objects cannot be clearly visible in both states. To address these issues, in this paper, we present a novel framework, *Articulation in Motion (AiM)*. |
Hao Ai; Wenjie Chang; Jianbo Jiao; Ales Leonardis; Eyal Ofek; | code |
| 859 | Attack-Resistant Watermarking for AIGC Image Forensics Via Diffusion-based Semantic Deflection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PAI, a training-free inherent watermarking framework for AIGC copyright protection, plug-and-play with diffusion-based AIGC services. |
Qingyu Liu; Yitao Zhang; Zhongjie Ba; Chao Shuai; Peng Cheng; Tianhang Zheng; Zhibo Wang; | code |
| 860 | Gen2seg: Generative Models Enable Generalizable Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: By pretraining to synthesize coherent images from perturbed inputs, generative models inherently learn to understand object boundaries and scene compositions. |
Om Khangaonkar; Hamed Pirsiavash; | code |
| 861 | From Data Statistics to Feature Geometry: How Correlations Shape Superposition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Bag-of-Words Superposition (BOWS), a framework in which autoencoders (AEs) with a non-linearity are trained to compress sparse, binary bag-of-words vectors drawn from Internet-scale text. |
Lucas Prieto; Edward Stevinson; Melih Barsbey; Tolga Birdal; Pedro A. M. Mediano; | code |
| 862 | UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we address these challenges by introducing UniSS, a novel single-stage framework for expressive S2ST. Furthermore, we construct and release a large-scale, high-quality expressive S2ST dataset, UniST, comprising 44.8k hours of data. |
Sitong Cheng; Bianweizhen; Xinsheng Wang; Ruibin Yuan; Jianyi Chen; Shunshun Yin; Yike Guo; Wei Xue; | code |
| 863 | MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce MoRA, a parameter-efficient fine-tuning method that explicitly models cross-modal interactions while maintaining modality-specific adaptations. |
Shu Zhao; Nilesh Ahuja; Tan Yu; Tianyi Shen; Vijaykrishnan Narayanan; | code |
| 864 | JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms Might Be Unsafe Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose JailbreakLoRA, a multi-task jailbreak LoRA training method that balances task utility and attack capability; it resolves training interference by uncertainty-weighting losses and mitigating gradient conflicts. |
Fanjunduo Wei; Zhenheng Tang; Rongfei Zeng; Tongliang Liu; Chengqi Zhang; Xiaowen Chu; Bo Han; | code |
| 865 | When Agents “Misremember” Collectively: Exploring The Mandela Effect in LLM-based Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This vulnerability limits our understanding of memory bias in multi-agent systems and raises ethical concerns about the potential spread of misinformation. In this paper, we conduct a comprehensive study on the Mandela effect in LLM-based multi-agent systems, focusing on its existence, causing factors, and mitigation strategies. |
Naen Xu; Hengyu An; Shuo Shi; Jinghuai Zhang; Chunyi Zhou; Changjiang Li; Tianyu Du; Zhihui Fu; Jun Wang; Shouling Ji; | code |
| 866 | DefensiveKV: Taming The Fragility of KV Cache Eviction in LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, prior work has largely focused on refining importance indicators for scoring, while defaulting to mean aggregation due to a faithful trust in the stability assumption. In this work, we argue that this underlying assumption is inherently fragile, making mean aggregation highly vulnerable in extreme cases. |
Yuan Feng; Haoyu Guo; Junlin Lv; S Kevin Zhou; Xike Xie; | code |
| 867 | Automated Stateful Specialization for Adaptive Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce **ASpec**, a framework that manages this full agent lifecycle by first autonomously **discovering** specialist archetypes via evolutionary search and then **cultivating** their expertise through experience, mirroring how human experts learn through practice and reflection. |
Myan Vu; Harrish Ayyanar; PANG JIANG; Anwiketh Reddy; Mayank Goel; | code |
| 868 | NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, this scheme incurs an optimization error that scales with the ratio of dataset size to batch size, limiting effectiveness for large datasets or small batches. To overcome this limitation, we propose NeuCLIP, a novel and elegant optimization framework based on two key ideas: (i) **reformulating** the contrastive loss for each sample **via convex analysis** into a minimization problem with an auxiliary variable representing its log-normalizer; and (ii) **transforming** the resulting minimization over $n$ auxiliary variables (where $n$ is the dataset size) via **variational analysis** into the minimization over a compact neural network that predicts the log-normalizers. |
Xiyuan Wei; Chih-Jen Lin; Tianbao Yang; | code |
| 869 | PU-Bench: A Unified Benchmark for Rigorous and Reproducible PU Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inconsistent data generation, disparate experimental settings, and divergent metrics have led to irreproducible findings and unsubstantiated performance claims. To address this foundational challenge, we introduce **PU-Bench**, the first unified open-source benchmark for PU learning. |
Qiuyi Chen; Haiyang Zhang; Leqi Zhang; Changchun Li; Jia Wang; Wei Wang; | code |
| 870 | Causally Robust Reward Learning from Reason-Augmented Preference Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ReCouPLe, a lightweight framework that uses natural language rationales to provide the missing causal signal. |
Minjune Hwang; Yigit Korkmaz; Daniel Seita; Erdem Biyik; | code |
| 871 | Reconstruct Anything Model: A Lightweight General Model for Computational Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a novel non-iterative, lightweight architecture that incorporates knowledge about the forward operator (acquisition physics and noise parameters) without relying on unrolling. |
Matthieu Terris; Samuel Hurault; Maxime Song; Julián Tachella; | code |
| 872 | Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces a new benchmark construction philosophy, Dual Scaling, designed to systematically address both limitations. |
Zhe Zhang; Runlin Liu; Aishan Liu; Xingyu Liu; Xiang Gao; Hailong Sun; | code |
| 873 | DragFlow: Unleashing DiT Priors with Region-Based Supervision for Drag Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, drag-based editing has yet to benefit from these stronger priors. This work introduces DragFlow, the first framework to effectively harness FLUX’s rich prior via region-based supervision, enabling full use of its finer-grained, spatially precise features for drag-based editing and achieving substantial improvements over existing baselines. |
Zihan Zhou; Shilin Lu; Shuli Leng; Shaocong Zhang; Zhuming Lian; Xinlei Yu; Adams Wai-Kin Kong; | code |
| 874 | Towards Understanding Valuable Preference Data for Large Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce a set of candidate scoring functions (SFs) that are computationally simpler than TIF and positively correlated with it. |
Zizhuo Zhang; Qizhou Wang; Shanshan Ye; Jianing Zhu; Jiangchao Yao; Bo Han; Masashi Sugiyama; | code |
| 875 | Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce USplat4D, a novel Uncertainty-aware dynamic Gaussian Splatting framework that propagates reliable motion cues to enhance 4D reconstruction. |
Fengzhi Guo; Chih-Chuan Hsu; Sihao Ding; Cheng Zhang; | code |
| 876 | Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, we also claim that the non-zero initialization of $A$ could potentially compromise self-stability. To address this issue, we propose Stable-LoRA, a weight-shrinkage optimization strategy that enhances stability of LoRA feature learning. |
Yize Wu; KE GAO; Ling Li; Yanjun Wu; | code |
| 877 | Fed-Duet: Dual Expert-Orchestrated Framework for Continual Federated Vision-Language Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Fed-Duet, a novel Dual Expert-orchestrated framework for efficient federated continual learning in vision-language models. |
Tao GUO; Junwei Chen; Laizhong Cui; | code |
| 878 | Decoupling The Class Label and The Target Concept in Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we expand the scope by considering the label domain mismatch and investigate three problems beyond the conventional all-matched forgetting, i.e., target mismatch, model mismatch, and data mismatch forgetting. |
Jianing Zhu; Bo Han; Jiangchao Yao; Jianliang Xu; Gang Niu; Masashi Sugiyama; | code |
| 879 | Seeing Through The Brain: New Insights from Decoding Visual Stimuli with fMRI Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Second, text representations and the generative model should be adapted to capture the compositional nature of visual stimuli, including objects, their detailed attributes, and relationships. Building on these insights, we propose PRISM, a model that Projects fMRI sIgnals into a Structured text space as an interMediate representation for visual stimuli reconstruction. |
Zheng Huang; Enpei Zhang; Weikang Qiu; Yinghao Cai; Carl Yang; Elynn Chen; Xiang Zhang; Rex Ying; Dawei Zhou; Yujun Yan; | code |
| 880 | Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: On the other hand, latent dynamics modeling has been used in decision making to learn latent representations of observations and their transformations over time for control and planning tasks. In this work, we present Midway Network, a new self-supervised learning architecture that is the first to learn strong visual representations for both object recognition and motion understanding solely from natural videos, by extending latent dynamics modeling to this domain. |
Christopher Hoang; Mengye Ren; | code |
| 881 | SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Sensitive Region Identification (SeRI), the first decision-based method that assigns a continuous sensitivity score to each image pixel. |
Feiyang Wang; Xingquan Zuo; Hai Huang; Gang Chen; Hangwei Qian; | code |
| 882 | RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this issue, we propose RoboPARA, a novel large language model (LLM)-driven framework for dual-arm task parallelism planning. In addition, we introduce the Cross-Scenario Dual-Arm Parallel Task dataset (X-DAPT dataset), the first dataset specifically designed to evaluate dual-arm task parallelism across diverse scenarios and difficulty levels. |
Shiying Duan; Pei Ren; Nanxiang Jiang; Zhengping Che; Jian Tang; Zhaoxin Fan; Yifan Sun; wenjun wu; | code |
| 883 | APT: Towards Universal Scene Graph Generation Via Plug-in Adaptive Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These semantic priors, while beneficial in other domains, are inherently misaligned with the dynamic, context-sensitive nature of visual relationships, leading to biased and suboptimal performance. In this paper, we transcend the traditional one-stage vs. two-stage architectural debate and identify this representational bottleneck as the core issue. |
Ruikun Luo; Changwei Gu; Jing Yang; Yuan Gao; Jieming Yang; Song Wu; Hai Jin; Xiaoyu Xia; | code |
| 884 | Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: EEG spatial super-resolution methods aim to recover high-density EEG signals from sparse measurements, yet they are often hindered by distribution shift and signal distortion, which reduce fidelity and usability for EEG analysis and visualization. To overcome these challenges, we introduce SRGDiff, a step-aware residual-guided diffusion model that formulates EEG spatial super-resolution as dynamic conditional generation. |
Hongjun Liu; Leyu Zhou; Zijianghao Yang; Chao Yao; | code |
| 885 | CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes a Tree-RAG acceleration method based on the improved Cuckoo Filter, which optimizes entity localization during the retrieval process to achieve significant performance improvements. |
Zihang Li; Yangdong Ruan; Wenjun Liu; Zhengyang Wang; Tong Yang; | code |
| 886 | Trust But Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution Via Implicit Reference Correlation Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods either ignore LQ–Ref correlations or rely on brittle explicit matching, leading to over-reliance on misleading references or under-utilization of valuable cues. To address this, we propose Ada-RefSR, a single-step diffusion framework guided by a Trust but Verify principle: reference information is leveraged when reliable and suppressed otherwise. |
Yuan Wang; Yuhao Wan; Siming Zheng; Bo Li; Qibin Hou; Peng-Tao Jiang; | code |
| 887 | Web-CogReasoner: Towards Multimodal Knowledge-Induced Cognitive Reasoning for Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Therefore, we decompose a web agent’s capabilities into two essential stages: knowledge content learning and cognitive processes. To formalize this, we propose Web-CogKnowledge Framework, which categorizes knowledge into Factual, Conceptual, and Procedural domains. |
Yuhan Guo; Guocong; Aiwen Sun; Hongliang He; Xinyu Yang; Yue Lu; Yingji Zhang; Xuntao Guo; Dong Zhang; Jianzhuang Liu; Jiang Duan; Yijia Xiao; Liangjian Wen; Haiming Xu; Yong Dai; | code |
| 888 | GGBall: Graph Generative Model on Poincaré Ball Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here we introduce GGBall, a novel hyperbolic framework for graph generation that integrates geometric inductive biases with modern generative paradigms. |
Tianci Bu; Chuanrui Wang; Hao Ma; Haoren Zheng; Xin Lu; Tailin Wu; | code |
| 889 | ScDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present **scDFM**, a generative framework based on conditional flow matching that models the full distribution of perturbed cells conditioned on control states. |
Chenglei Yu; Chuanrui Wang; Bangyan Liao; Tailin Wu; | code |
| 890 | Helmsman: Autonomous Synthesis of Federated Learning Systems Via Collaborative LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The need to select, combine, and tune strategies for multifaceted challenges like data heterogeneity and system constraints has become a critical bottleneck, resulting in brittle, bespoke solutions. To address this, we introduce Helmsman, a novel LLM-based multi-agent framework that automates the end-to-end synthesis of federated learning systems from high-level user specifications. |
Haoyuan Li; Mathias Funk; Aaqib Saeed; | code |
| 891 | A Recovery Guarantee for Sparse Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. |
Sara Fridovich-Keil; Mert Pilanci; | code |
| 892 | GNN Explanations That Do Not Explain and How to Find Them Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify a critical failure of SE-GNN explanations: *explanations can be unambiguously unrelated to how the SE-GNNs infer labels*. |
Steve Azzolin; Stefano Teso; Bruno Lepri; Andrea Passerini; Sagar Malhotra; | code |
| 893 | Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs) Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present **RealUID**, a unified distillation framework for all matching models that seamlessly incorporates real data into the distillation procedure without GANs. |
Nikita Maksimovich Kornilov; David Li; Tikhon Mavrin; Aleksei Leonov; Nikita Gushchin; Evgeny Burnaev; Iaroslav Sergeevich Koshelev; Alexander Korotin; | code |
| 894 | Entering The Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent advances in discrete diffusion and flow models have sparked growing interest in applying SB methods to discrete domains, there is still no reliable way to evaluate how well these methods actually solve the underlying problem. We address this challenge by introducing a benchmark for SB on discrete spaces. |
Xavier Aramayo Carrasco; Grigoriy Ksenofontov; Aleksei Leonov; Iaroslav Sergeevich Koshelev; Alexander Korotin; | code |
| 895 | A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Importantly, the bounds we derive depend solely on some standard statistical and mathematical properties of the considered functional classes (neural nets). |
Roman Tarasov; Petr Mokrov; Milena Gazdieva; Evgeny Burnaev; Alexander Korotin; | code |
| 896 | FlowSearcher: Synthesizing Memory-Guided Agentic Workflows for Web Information Seeking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce **FlowSearcher**, a novel web search framework built on agentic workflow synthesis. |
Keyi Xiang; Zeyu Feng; Zhuoyi Lin; Yueming Lyu; Shi Boyuan; Yew-Soon Ong; Ivor Tsang; Haiyan Yin; | code |
| 897 | HDR-NSFF: High Dynamic Range Neural Scene Flow Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these limitations, we present HDR-NSFF, a novel framework for reconstructing dynamic HDR radiance fields from alternatively exposed monocular videos. To enable systematic evaluation, we construct a real-world GoPro dataset with synchronized multi-exposure captures. |
Shin Dong-Yeon; Kim Jun-Seong; Kwon Byung-Ki; Tae-Hyun Oh; | code |
| 898 | Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this, we analyze the knowledge distribution of ViT and reveal a knowledge disentanglement within it: neurons in the feed-forward network (FFN) modules encode class-specific knowledge, while the multi-head attention (MHA) modules capture class-agnostic patterns. Building on this insight, we introduce Vulcan, a pruning-oriented post-training method for deriving compact class-specific models from a pre-trained ViT under given resource budgets. |
Ziteng Wei; Qiang He; Feifei Chen; Ranjie Duan; Xiaodan Li; Bin Li; YueFeng Chen; Hui Xue; Hai Jin; Yun Yang; | code |
| 899 | ContextIF: Enhancing Instruction-Following Through Context Reward Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In-context learning (ICL) emerges as a promising alternative due to its strong generalization without modifying the model’s parameters, but its effectiveness is constrained by the reliance on high-quality, manually curated demonstration pools. To overcome this limitation, we propose ContextIF, a reinforcement learning (RL) framework for automatic context generation. |
Yule Zhong; Jiacheng Yao; Guoxiu He; | code |
| 900 | Incomplete Multi-View Multi-Label Classification Via Shared Codebook and Fused-Teacher Self-Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods mainly rely on contrastive learning or information bottleneck theory to learn consistent representations under missing-view conditions, but relying solely on loss-based constraints limits the ability to capture stable and discriminative shared semantics. To address this issue, we introduce a more structured mechanism for consistent representation learning: we learn discrete consistent representations through a multi-view shared codebook and cross-view reconstruction, which naturally align different views within the limited shared codebook embeddings and reduce redundant features. |
Xu Yan; Jun Yin; Shiliang Sun; Minghua Wan; | code |
| 901 | CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods require known physical properties as supervision or inputs, and this dependence limits their applicability under unknown conditions. To explore this challenge, we introduce Cloth Dynamics Grounding (CDG), a novel scenario that involves unsupervised learning of cloth dynamics from sparse multi-view visual observations. |
Yu-Liang Zhan; Jian Li; Wenbing Huang; Yang Liu; Hao Sun; | code |
| 902 | SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies Via Velocity-Reparameterized Sequential Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Training expressive flow-based policies with off-policy reinforcement learning is notoriously unstable due to gradient pathologies in the multi-step action sampling process. We … |
Yixian Zhang; Shu’ang Yu; Tonghe Zhang; Mo Guang; Haojia Hui; Kaiwen Long; Yu Wang; Chao Yu; Wenbo Ding; | code |
| 903 | TRACE: Your Diffusion Model Is Secretly An Instance Edge Detector Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present TRACE (TRAnsforming diffusion Cues to instance Edges), showing that text-to-image diffusion models secretly function as instance edge annotators. |
Sanghyun Jo; Ziseok Lee; Wooyeol Lee; Jonghyun Choi; Jaesik Park; Kyungsu Kim; | code |
| 904 | Plan Then Act: Bi-level CAD Command Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we propose PTA, a new bi-level CAD command sequence generation method. |
Qiangya Guo; Gang Dai; Zhuoman Liu; Shuangping Huang; Yunqing Hu; Huiyuan Zhang; Tianshui Chen; | code |
| 905 | USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning Capabilities of LLMs As Urban Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce USTBench, the first benchmark to evaluate LLMs’ spatiotemporal reasoning abilities as urban agents across four decomposed dimensions: spatiotemporal understanding, forecasting, planning, and reflection. |
Siqi Lai; Yansong Ning; Zirui Yuan; Zhixi Chen; Hao Liu; | code |
| 906 | Copy-Paste to Mitigate Large Language Model Hallucinations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To elucidate CopyPasteLLM’s effectiveness, we propose the Context-Parameter Copying Capturing algorithm. |
Yongchao Long; Yingying Zhang; Xianbin Wen; Xian Wu; Yuxi Zhou; Shenda Hong; | code |
| 907 | PRISM: Festina Lente Proactivity—Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We formulate the problem as cost-sensitive selective intervention and present PRISM, a novel framework that couples a decision-theoretic gate with a dual-process reasoning architecture. |
Yuxuan Fu; Xiaoyu Tan; Teqi Hao; Chen Zhan; Xihe Qiu; | code |
| 908 | MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, in this paper, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual clues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. |
Yilian Liu; Xiaojun Jia; Guoshun Nan; Jiuyang Lyu; Zhican Chen; Tao Guan; Shuyuan Luo; Zhongyi Zhai; Yang Liu; | code |
| 909 | Latent Speech-Text Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce the Latent Speech-Text Transformer (LST), which makes pre-training speech-text models more data-efficient by dynamically and inexpensively aggregating speech tokens into latent speech patches. We will release our models, code, and the evaluation data to facilitate further research. |
Yen-Ju Lu; Yashesh Gaur; Wei Zhou; Benjamin Muller; Jesus Villalba; Najim Dehak; Luke Zettlemoyer; Gargi Ghosh; Mike Lewis; Srini Iyer; Duc Le; | code |
| 910 | BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This phenomenon can degrade model robustness and fairness, undermining the benefits of efficient adaptation. To address this, we introduce Bias-Alleviating Low-Rank Adaptation (BA-LoRA). |
Yupeng Chang; Yi Chang; Yuan Wu; | code |
| 911 | HiTeA: Hierarchical Temporal Alignment for Training-Free Long-Video Temporal Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose HiTeA (Hierarchical Temporal Alignment), a novel, training-free framework explicitly designed for long-video temporal grounding. |
Xinyi Xu; Hongsong Wang; Guo-Sen Xie; Caifeng Shan; Fang Zhao; | code |
| 912 | FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose **FlowBind**, an efficient framework for any-to-any generation. |
Yeonwoo Cha; Semin Kim; Jinhyeon Kwon; Seunghoon Hong; | code |
| 913 | Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a CT-MARL framework that uses physics-informed neural networks (PINNs) to approximate HJB-based value functions at scale. |
Xuefeng Wang; Lei Zhang; Henglin Pu; Ahmed H Qureshi; Husheng Li; | code |
| 914 | EMFuse: Energy-based Model Fusion for Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate the fusion of models specifically adapted for decision-making tasks. |
Kejie He; Yi-Chen Li; Yang Yu; | code |
| 915 | TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this problem, we introduce TPRU, a large-scale dataset sourced from diverse embodied scenarios such as robotic manipulation and GUI navigation. We will release our dataset and models to the community. |
Zhenkun Gao; Xuhong Wang; Xin Tan; Yuan Xie; | code |
| 916 | The Spacetime of Diffusion Models: An Information Geometry Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a novel geometric perspective on the latent space of diffusion models. |
Rafal Karczewski; Markus Heinonen; Alison Pouplin; Søren Hauberg; Vikas K Garg; | code |
| 917 | DRBench: A Realistic Benchmark for Enterprise Deep Research Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce DRBench, a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. |
Amirhossein Abaskohi; Tianyi Chen; Miguel Muñoz-Mármol; Curtis Fox; Amrutha Varshini Ramesh; Étienne Marcotte; Xing Han Lù; Nicolas Chapados; Spandana Gella; Christopher Pal; Alexandre Drouin; Issam H. Laradji; | code |
| 918 | LogART: Pushing The Limit of Efficient Logarithmic Post-Training Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces learnable Logarithmic Adaptive Rounding Techniques (LogART) that pioneer task-aware learnable rounding specifically for the logarithmic domain. |
Jiawei Xu; Yi Zheng; Chenghe Sun; Taiyu Zhou; Zuqi Zhang; Jie Li; Lirong Zheng; Zhuo Zou; | code |
| 919 | Low-Rank Few-Shot Node Classification By Node-Level Graph Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel node-level graph diffusion method with low-rank feature learning for few-shot node classification (FSNC), termed Low-Rank Few-Shot Graph Diffusion Model or LR-FGDM. |
Yancheng Wang; Chengshuai Zhao; Dongfang Sun; Huan Liu; Yingzhen Yang; | code |
| 920 | MoRA: Mobility As The Backbone for Geospatial Representation Learning at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present MoRA, a human-centric geospatial framework that leverages a mobility graph as its core backbone to fuse various data modalities, aiming to learn embeddings that represent the socio-economic context and functional role of a location. To rigorously evaluate the effectiveness of MoRA, we construct a benchmark dataset composed of 9 downstream prediction tasks across social and economic domains. |
Ya Wen; Jixuan Cai; Qiyao Ma; Linyan Li; Xinhuan Chen; Chris Webster; Yulun Zhou; | code |
| 921 | APPLE: Toward General Active Perception Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current methods are often bound to specific tasks or make strong assumptions, which limit their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning) – a novel framework that leverages reinforcement learning (RL) to address a range of different active perception problems. |
Tim Schneider; Cristiana de Farias; Roberto Calandra; Liming Chen; Jan Peters; | code |
| 922 | Explain in Your Own Words: Improving Reasoning Via Token-Selective Dual Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Token-Selective Dual Knowledge Distillation (TSD-KD), a framework for student-centric distillation. |
Minsang Kim; Seung Jun Baek; | code |
| 923 | Arbitrary-Shaped Image Generation Via Spherical Neural Field Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing diffusion models excel at generating diverse content, but remain confined to fixed image shapes and lack the ability to flexibly control spatial attributes such as viewpoint, field-of-view (FOV), and resolution. To fill this gap, we propose Arbitrary-Shaped Image Generation (ASIG), the first generative framework that enables precise spatial attribute control while supporting high-quality synthesis across diverse image shapes (e.g., perspective, panoramic, and fisheye). |
Jiyuan Xia; Yuanshen Guan; Ruikang Xu; Zhiwei Xiong; | code |
| 924 | TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value‑function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. |
Geonwoo Cho; Jaegyun Im; Jihwan Lee; Hojun Yi; Sejin Kim; Sundong Kim; | code |
| 925 | AMPED: Adaptive Multi-objective Projection for Balancing Exploration and Skill Diversification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a new method, Adaptive Multi-objective Projection for balancing Exploration and skill Diversification (AMPED), which explicitly addresses both: during pre-training, a gradient-surgery projection balances the exploration and diversity gradients, and during fine-tuning, a skill selector exploits the learned diversity by choosing skills suited to downstream tasks. |
Geonwoo Cho; Jaemoon Lee; Jaegyun Im; Subi Lee; Jihwan Lee; Sundong Kim; | code |
| 926 | Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent approaches like EAGLE-2 and EAGLE-3 improve speculative decoding using dynamic tree structures, they often neglect the impact of crucial system variables such as GPU devices and batch sizes. Therefore, we introduce a new dynamic tree decoding approach called CAST that takes into account inference costs, including factors such as GPU configurations and batch sizes, to dynamically refine the tree structure. |
Yinrong Hong; Zhiquan Tan; Kai Hu; | code |
| 927 | Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To overcome the challenges, we propose the Recursive Likelihood Ratio (RLR) optimizer, a Half-Order (HO) fine-tuning paradigm for DM. |
Tao Ren; Zishi Zhang; Jinyang Jiang; Zehao Li; Shentao Qin; Yi Zheng; Guanghao Li; Qianyou Sun; Yan Li; Jiafeng Liang; Xinping Li; Yijie Peng; | code |
| 928 | RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we introduce a Mixed Value-at-Risk objective that integrates weighted attention over multiple regions of the reward distribution, thereby amplifying gradient signals on challenging instances and preventing overconfident convergence. |
Tao Ren; Jinyang Jiang; Hui Yang; Wan Tian; Minhao Zou; Guanghao Li; Zishi Zhang; Qinghao Wang; Shentao Qin; Yanjun Zhao; Rui Tao; Hui Shao; Yijie Peng; | code |
| 929 | Content-Aware Mamba for Learned Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This rigidity hinders its ability to effectively eliminate redundancy between tokens that are content-correlated but spatially distant. We introduce Content-Aware Mamba (CAM), an SSM that dynamically adapts its processing to the image content. |
Yunuo Chen; Zezheng Lyu; Bing He; Hongwei Hu; Qi Wang; Yuan Tian; Li Song; Wenjun Zhang; Guo Lu; | code |
| 930 | Graph Tokenization for Bridging Graphs and Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a graph tokenization framework that generates sequential representations of graphs by combining reversible graph serialization, which preserves graph information, with Byte Pair Encoding (BPE), a widely adopted tokenizer in large language models (LLMs). |
Zeyuan Guo; Enmao Diao; Cheng Yang; Chuan Shi; | code |
| 931 | RFEval: Benchmarking Reasoning Faithfulness Under Counterfactual Reasoning Intervention in Large Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a formal framework for *reasoning faithfulness*, defined by two testable conditions: *stance consistency* (a coherent stance linking reasoning to answer) and *causal influence* (the stated reasoning causally drives the answer under output-level interventions), explicitly decoupled from accuracy. |
Yunseok Han; Yejoon Lee; Jaeyoung Do; | code |
| 932 | The Self-Re-Watermarking Trap: From Exploit to Resilience Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we make two key contributions. First, we introduce the self-re-watermarking threat model as a novel attack vector and demonstrate that existing state-of-the-art watermarking methods consistently fail under such attacks. |
Vithurabiman Senthuran; Yong Xiang; Iynkaran Natgunanathan; Uthayasanker Thayasivam; | code |
| 933 | Efficient Adversarial Attacks on High-dimensional Offline Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the adversarial robustness of offline bandit evaluation remains largely unexplored, particularly when an attacker perturbs the reward model (rather than the training data) prior to bandit training. In this work, we fill this gap by investigating, both theoretically and empirically, the vulnerability of offline bandit training to adversarial manipulations of the reward model. |
Seyed Mohammad Hadi Hosseini; Amir Najafi; Mahdieh Soleymani Baghshah; | code |
| 934 | SUSD: Structured Unsupervised Skill Discovery Through State Factorization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce SUSD, a novel framework that harnesses the compositional structure of environments by factorizing the state space into independent components (e.g., objects or controllable entities). |
Seyed Mohammad Hadi Hosseini; Mahdieh Soleymani Baghshah; | code |
| 935 | MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Masked autoencoder based methods have garnered significant attention, yet their application to volumetric medical images faces fundamental limitations from the discrete voxel-level reconstruction objective, which neglects comprehensive anatomical structure continuity. To address this challenge, we propose MedGMAE, a novel framework that replaces traditional voxel reconstruction with 3D Gaussian primitive reconstruction, offering a new perspective on representation learning. |
Xueming Fu; Fenghe Tang; Rongsheng Wang; Yingtai Li; Lixia Han; Jian Lu; Zihang Jiang; S Kevin Zhou; | code |
| 936 | When LLMs Get Significantly Worse: A Statistical Approach to Detect Model Degradations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a statistically sound hypothesis testing framework based on McNemar’s test that efficiently detects model degradations while guaranteeing a controlled rate of false positives. |
Jonas M. Kübler; Kailash Budhathoki; Matthäus Kleindessner; Xiong Zhou; Junming Yin; Ashish Khetan; George Karypis; | code |
| 937 | NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: NC-Bench provides 925 curated RNA sequences with 6,708 high-quality NC annotations, fine-grained edge and orientation classification tasks, and IsoScore-based embedding evaluation, offering a rigorous foundation for systematic assessment. Building on this, we propose NCfold, a dual-branch framework that couples sequence features with structural priors derived from RNA foundation models (RFMs) via Representative Embedding Fusion (REF) and REF-weighted self-attention. |
Heqin Zhu; Ruifeng Li; Ao Chang; Mingqian Li; Hongyang Chen; Peng Xiong; S Kevin Zhou; | code |
| 938 | Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in The Internal Circuitry of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we draw inspiration from prior work on edge attribution patching (EAP) to investigate the internal differences of LLMs before and after RL fine-tuning. |
Honglin Zhang; Qianyue Hao; Fengli Xu; Yong Li; | code |
| 939 | Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by biological neural mechanisms, we propose the Unified Neural Topological Foundation Model (Uni-NTFM), an architecture rooted in three core neuroscience principles. |
Zhisheng Chen; Yingwei Zhang; Qizhen Lan; Tianyu Liu; Huacan Wang; Yi Ding; Ziyu Jia; Ronghao Chen; Kun Wang; Xinliang Zhou; | code |
| 940 | Practical Estimation of The Optimal Classification Error with Soft Labels and Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While the performance of machine learning systems has experienced significant improvement in recent years, relatively little attention has been paid to the fundamental question: to what extent can we improve our models? This paper provides a means of answering this question in the setting of binary classification, which is practical and theoretically supported. |
Ryota Ushio; Takashi Ishida; Masashi Sugiyama; | code |
| 941 | Advancing Spatiotemporal Representations in Spiking Neural Networks Via Parametric Invertible Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing SNNs suffer from two fundamental limitations: (1) the constrained representational space imposed by binary spike firing mechanisms, which restricts the network’s capacity to encode complex spatiotemporal patterns, and (2) the ineffective design of surrogate gradient functions that leads to gradient mismatch issues and suboptimal learning dynamics. To address these challenges, we propose the Parametric Invertible Transformation (PIT), which operates in a conjugate manner with neuronal dynamics to achieve adaptive modulation and augmented spike representations simultaneously. |
Yinsong Yan; Yujie Wu; Jibin Wu; | code |
| 942 | CircuitSense: A Hierarchical MLLM Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a hierarchical synthetic generation pipeline consisting of a grid-based schematic generator and a block diagram generator with auto-derived symbolic equation labels. |
Arman Akbari; Jian Gao; Yifei Zou; Mei Yang; Jinru Duan; Dmitrii Torbunov; Yanzhi Wang; Yihui Ren; Xuan Zhang; | code |
| 943 | Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose OrbEvo, which is based on an equivariant graph transformer architecture and learns to evolve the full electronic wavefunction coefficients across time steps. To evaluate our approach, we generate TDDFT datasets consisting of 5,000 different molecules in the QM9 dataset and 1,500 molecular configurations of the malonaldehyde molecule in the MD17 dataset. |
Xuan Zhang; Haiyang Yu; Chengdong Wang; Jacob Helwig; Shuiwang Ji; Xiaofeng Qian; | code |
| 944 | Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While numerous evaluation studies have emerged, assessing LVLMs both holistically and on specialized tasks, fine-grained image tasks—fundamental to computer vision—remain largely unexplored. To fill this gap, we introduce a comprehensive fine-grained evaluation benchmark, i.e., FG-BMK, comprising 1.01 million questions and 0.28 million images. |
Hongtao Yu; Yuxin Peng; Serge Belongie; Xiu-Shen Wei; | code |
| 945 | Beyond Uniformity: Regularizing Implicit Neural Representations Through A Lipschitz Lens Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we reframe Lipschitz regularization as a flexible *Lipschitz budget framework*. |
Julian McGinnis; Suprosanna Shit; Florian A. Hölzl; Paul Friedrich; Paul Büschl; Vasiliki Sideri-Lampretsa; Mark Mühlau; Philippe C. Cattin; Bjoern Menze; Daniel Rueckert; Benedikt Wiestler; | code |
| 946 | KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes KnowledgeSmith, a unified framework to systematically understand the updating mechanism of LLMs. |
Yinyi Luo; Zhexian Zhou; Hao Chen; Kai Qiu; Marios Savvides; Sharon Li; Jindong Wang; | code |
| 947 | Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To mitigate more complicated inductive bias of reward modeling, inspired by the information bottleneck, we introduce a novel information-theoretic debiasing method called **D**ebiasing via **I**nformation optimization for **R**M (DIR). |
Zhuo Li; Pengyu Cheng; Zhechao Yu; Feifei Tong; Anningzhe Gao; Tsung-Hui Chang; Xiang Wan; erchao.zec; Xiaoxi Jiang; Guanjun Jiang; | code |
| 948 | SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. |
Xinjie Zhu; Zijing Zhao; Hui Jin; Qingxiao Guo; Yilong Ma; Yunhao Wang; Xiaobing Guo; Weifeng Zhang; | code |
| 949 | MergePRAG: Orthogonal Merging of Passage-experts for Multi-hop Parametric RAG Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose **MergePRAG** (*Orthogonal Merging of Passage-experts for Multi-hop PRAG*), a novel framework that sequentially integrates retrieved passages into LLM parameters through a continual merging mechanism, which is advanced by two key proposals: (1) **orthogonal merging** using the Gram–Schmidt process to minimize conflicts between passage experts, and (2) **critical-layer parameterization** to efficiently encode in-context passages. |
Xuebing Liu; Shanbao Qiao; Roseline Nyange; Dongwook Min; Hyun Kim; Seung-Hoon Na; | code |
| 950 | Efficient Resource-Constrained Training of Transformers Via Subspace Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although prior work has concentrated on compact convolutional architectures, we instead apply subspace-based training to transformer models. Motivated by the idea that a model’s essential information lies in a fixed subspace, we introduce Weight-Activation Subspace Iteration (WASI), a method that mitigates the memory bottleneck of backpropagation and boosts inference efficiency in transformer models by restricting training to this subspace. |
Le-Trung Nguyen; Enzo Tartaglione; Van-Tam Nguyen; | code |
| 951 | KeepLoRA: Continual Learning with Residual Gradient Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents a simple but effective approach called KeepLoRA to effectively balance these objectives. |
Mao-Lin Luo; Zi-Hao Zhou; Yi-Lin Zhang; Yuanyu Wan; Min-Ling Zhang; Tong Wei; | code |
| 952 | Block-wise Adaptive Caching for Accelerating Diffusion Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose **B**lock-wise **A**daptive **C**aching (**BAC**), a method to accelerate Diffusion Policy by caching intermediate action features. |
Kangye Ji; Yuan Meng; Hanyun Cui; Ye Li; Jianbo Zhou; Shengjia Hua; Lei Chen; Zhi Wang; | code |
| 953 | Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language Via Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To tackle these challenges, we propose a robust adversarial training framework named **T**ranslation-based **A**ttacker **S**trengthens Mul**T**ilingual Def**E**nder (\detectorname). We will release our code, models, and dataset upon acceptance. |
Yuanfan Li; Qi Zhou; Zexuan Xie; | code |
| 954 | Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we reveal that DLMs have a critical vulnerability stemming from their iterative denoising process and propose a countermeasure. |
Shojiro Yamabe; Jun Sakuma; | code |
| 955 | EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these methods suffer from (1) inefficient and coarse-grained optimization with (2) high memory consumption. In this work, we first theoretically and empirically identify the *key reason* for these limitations: the recursive dependence between different steps in the denoising trajectory. |
Xiaofeng Tan; Wanjiang Weng; Haodong Lei; Hongsong Wang; | code |
| 956 | LookaheadKV: Fast and Accurate KV Cache Eviction By Glimpsing Into The Future Without Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose LookaheadKV, a lightweight eviction framework that leverages the strength of surrogate future response without the need for costly draft generation. |
Jinwoo Ahn; Ingyu Seong; Akhil Kedia; Junhan Kim; Hyemi Jang; Kangwook Lee; Yongkweon Jeon; | code |
| 957 | DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Lie groups provide a principled way to represent continuous geometric transformations, making them well-suited for enforcing spatial and temporal consistency in video modeling. Building on this insight, we propose DeLiVR, an efficient video deraining method that injects spatiotemporal Lie-group differential biases directly into the attention scores of the network. |
Shuning Sun; Jialang Lu; Xiang Chen; Jichao Wang; Dianjie Lu; Guijuan Zhang; Guangwei Gao; Zhuoran Zheng; | code |
| 958 | From Fields to Random Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study introduces a novel method for performing Maximum A Posteriori (MAP) estimation on Markov Random Fields (MRFs) that are defined on locally and sparsely connected graphs, broadly existing in real-world applications. |
Yaomin Wang; Xiaodong Luo; Tianshu Yu; | code |
| 959 | Cannistraci-Hebb Training on Ultra-Sparse Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This limitation presents a critical challenge: how to achieve high levels of structural connection sparsity while maintaining performance comparable to fully connected networks. To address this challenge, we propose the Cannistraci-Hebb Spiking Neural Network (CH-SNN), a novel and generalizable dynamic sparse training framework for SNNs consisting of four stages. |
Yuan Hua; Jilin Zhang; Yingtao Zhang; Leyi You; Baobo Xiong; Carlo Vittorio Cannistraci; Hong Chen; | code |
| 960 | Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Under sparse preference data and with overly expressive decoders, VPL may cause latent variables to be ignored, reverting to a single-reward model. To overcome this limitation, we propose Swap-guided Preference Learning (SPL). |
Gihoon Kim; Euntai Kim; | code |
| 961 | MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present **MCP-SafetyBench**, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. |
Xuanjun Zong; Zhiqi Shen; Lei Wang; Yunshi Lan; Chao Yang; | code |
| 962 | Point-Focused Attention Meets Context-Scan State Space: Robust Biological Visual Perception for Point Cloud Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Synergistically capturing intricate local structures and global contextual dependencies has become a critical challenge in point cloud representation learning. To address this, we introduce PointLearner, a point cloud representation learning network that closely aligns with biological vision which employs an active, foveation-inspired processing strategy, thus enabling local geometric modeling and long-range dependency interactions simultaneously. |
Kanglin Qu; Pan Gao; Qun Dai; Yuanhao Sun; | code |
| 963 | DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this challenge, we formulate the backward pass of deterministic attention as a scheduling problem on a Directed Acyclic Graph (DAG) and derive schedules that minimize the critical path length. Building on this formulation, we present DASH (Deterministic Attention Scheduling for High-Throughput), which encapsulates two complementary scheduling strategies: (i) Descending Q‑Tile Iteration, a reversed query‑block traversal that shrinks pipeline stalls in causal attention, and (ii) Shift Scheduling, a theoretically optimal schedule within our DAG model that reduces pipeline stalls for both full and causal masks. |
Xinwei Qiang; Hongmin Chen; Shixuan Sun; Jingwen Leng; Xin Liu; Minyi Guo; | code |
| 964 | Context Learning for Multi-Agent Discussion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a multi-LLM context learning method (M2CL) that learns a context generator for each agent, capable of dynamically generating context instructions per discussion round via automatic information organization and refinement. |
Xingyuan Hua; Sheng Yue; Xinyi Li; Yizhe Zhao; Jinrui Zhang; Ju Ren; | code |
| 965 | InclusiveVidPose: Bridging The Pose Estimation Gap for Individuals with Limb Deficiencies in Video-Based Motion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To bridge this gap, we introduce the InclusiveVidPose Dataset, the first large-scale video-based HPE dataset specifically for individuals with limb deficiencies. We collect 313 videos, totaling 327k frames and covering nearly 400 individuals with amputations, congenital limb differences, and prosthetic limbs. We also provide a rigorous benchmark for evaluating inclusive and robust pose estimation algorithms, demonstrating that our dataset poses significant challenges. |
Heming Du; Jiaying Ying; Sen Wang; Xue Li; Kaihao Zhang; Xin Yu; | code |
| 966 | ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing benchmarks primarily operate on a slot-filling paradigm, restricting agents to synthetic queries with pre-defined constraint menus, which fails to capture the open-ended nature of natural language interaction, where user requirements are compositional, diverse, and often implicitly expressed. To address this gap, we introduce *ChinaTravel*, with four key contributions: 1) a practical sandbox aligned with multi-day, multi-POI travel planning, 2) a compositionally generalizable domain-specific language (DSL) for scalable evaluation, covering feasibility, constraint satisfaction, and preference comparison, 3) an open-ended dataset that integrates diverse travel requirements and implicit intent from 1,154 human participants, and 4) a fine-grained analysis revealing the potential of neuro-symbolic agents in travel planning, achieving a 37.0% constraint satisfaction rate on human queries, a 10× improvement over purely neural models, yet highlighting significant challenges in compositional generalization. |
Jie-Jing Shao; Bo-Wen Zhang; Xiao-Wen Yang; Baizhi Chen; Siyu Han; Pang Jinghao; Wen-Da Wei; Guohao Cai; Zhenhua Dong; Lan-Zhe Guo; Yu-Feng Li; | code |
| 967 | Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we present the first theoretical analysis suggesting that adversarially pretrained transformers can serve as universally robust foundation models, i.e., models that can robustly adapt to diverse downstream tasks with only lightweight tuning. |
Soichiro Kumano; Hiroshi Kera; Toshihiko Yamasaki; | code |
| 968 | RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present Rotary Ray Embedding (RoRE), an approach that embeds image patches directly as rays, using a learning-based rotary positional embedding (RoPE). |
Ryan Griffiths; Donald G. Dansereau; | code |
| 969 | Follow-Your-Preference: Towards Preference-Aligned Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper investigates image inpainting with preference alignment. Instead of introducing a novel method, we go back to basics and revisit fundamental problems in achieving such alignment. |
Yutao Shen; Junkun Yuan; Toru Aonishi; Hideki Nakayama; Jack Ma; | code |
| 970 | Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce **Fisale**, a data-driven framework for handling complex two-way **FSI** problems. |
Shilong Tao; Zhe Feng; Shaohan Chen; Weichen Zhang; Zhanxing Zhu; Yunhuai Liu; | code |
| 971 | GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Additionally, prior work mainly focuses on discriminative tasks on DyTAGs, resulting in a lack of standardized task formulations and evaluation protocols tailored for DyTAG generation. To address these critical issues, we propose the **G**enerative **D**yTA**G** **B**enchmark (GDGB), which comprises eight meticulously curated DyTAG datasets with high-quality textual features for both nodes and edges, overcoming limitations of prior datasets. |
Jie Peng; Jiarui Ji; Runlin Lei; Zhewei Wei; Yongchao Liu; Chuntao Hong; | code |
| 972 | MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While effective, these methods often overlook the multi-scale temporal structure inherent in trajectories, resulting in suboptimal performance. To overcome these limitations, we propose MAGE, a Multi-scale Autoregressive GEneration-based offline RL method. |
Chenxing Lin; Xinhui Gao; Haipeng Zhang; Xinran Li; Haitao Wang; Songzhu Mei; Chenglu Wen; Weiquan Liu; Siqi Shen; Cheng Wang; | code |
| 973 | FARTrack: Fast Autoregressive Visual Tracking with High Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, high-performance trackers often suffer from slow processing speeds, making them impractical for deployment on resource-constrained devices. To alleviate this issue, we propose **FARTrack**, a **F**ast **A**uto-**R**egressive **T**racking framework. |
Guijie Wang; Tong Lin; Yifan Bai; Anjia Cao; Shiyi Liang; Wangbo Zhao; Xing Wei; | code |
| 974 | Distribution-Aware Multi-Granularity Phase Coding: Towards Lower Conversion Error for Spike-Driven Large Language Models Highlight: However, existing conversion frameworks neglect activation distributions, as reflected in SNN neurons with rate or temporal coding to map uniformly distributed rather than distribution-aligned discrete values, thus causing latent conversion error arising from distribution misalignment. To tackle this problem, we propose a distribution-aware multi-granularity phase coding approach, which achieves reasonable discrete value allocation by minimizing conversion error relative to activation distributions. |
Hanyuan Zheng; Haozhen Zhang; Tianshuo Chen; Zhaogeng Liu; Yi Chang; Bin Gu; | code |
| 975 | Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models Highlight: Instead of unceasingly breeding new models for diverse domains, this paper proposes a novel framework, time-series adaptive transformation optimization (TATO), that enables a frozen pre-trained LTM to fit various downstream domains through an empirically optimal time-series transformation pipeline. |
Yunzhong Qiu; Zhiyao Cen; Zhongyi Pei; Chen Wang; Jianmin Wang; | code |
| 976 | From Pixels to Semantics: Unified Facial Action Representation Learning for Micro-Expression Analysis Highlight: In this work, we propose D-FACE, a Discrete Facial ACtion Encoding framework that leverages large-scale facial video data to pretrain an identity- and domain-invariant facial action tokenizer, for MER. |
Yicheng Deng; Hideaki Hayashi; Hajime Nagahara; | code |
| 977 | DeRaDiff: Denoising Time Realignment of Diffusion Models Highlight: We introduce _DeRaDiff_, a _denoising-time realignment_ procedure that, after aligning a pretrained model once, modulates the regularization strength _during sampling_ to emulate models trained at other regularization strengths—_without any additional training or fine-tuning_. |
Ratnavibusena Don Shahain Manujith; Teoh Tze Tzun; Kenji Kawaguchi; Yang Zhang; | code |
| 978 | Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation Highlight: In this paper, we embed domain-shift dynamics directly into the generative process. |
Zihao WANG; Yuzhou Chen; Shaogang Ren; | code |
| 979 | Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention Highlight: To validate our findings, we introduce a minimal modification to the flash attention that mitigates the bias in rounding errors. |
Haiquan Qiu; Quanming Yao; | code |
| 980 | Robust Adversarial Attacks Against Unknown Disturbance Via Inverse Gradient Sample Highlight: In this paper, we propose a novel and robust attack called IGSA (**I**nverse **G**radient **S**ample-based **A**ttack), capable of generating adversarial examples that remain effective under diverse unknown disturbances. |
Zhaoyang Zhang; Shen Wang; Runze Liu; Guopu Zhu; Fanghui Sun; Ye Lu; Zeyue Wang; Yihan Yan; | code |
| 981 | Children’s Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs Highlight: Inspired by the Wechsler Intelligence Scales, we introduce KidGym, a comprehensive 2D grid-based benchmark for assessing five essential capabilities of MLLMs: execution, perception reasoning, learning, memory, and planning. We release our benchmark at: https://kidgym.github.io/KidGym-Website/. |
Hengwei Ye; Yuanting Guan; Yuxuan Ge; Tianying Zhu; Yijia Zhong; YiJing Zhang; Han Zhang; Yingna Wu; Zheng Tian; | code |
| 982 | InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement Highlight: Unlike human–object interaction (HOI) and human–scene interaction (HSI), HOSI generation requires reasoning over dynamic object–scene changes, yet suffers from limited annotated data. To address these issues, we propose a coarse‑to‑fine instruction‑conditioned interaction generation framework that is explicitly aligned with the iterative denoising process of a consistency model. |
Yude Zou; Junji Gong; Xing Gao; Zixuan Li; Tianxing Chen; Guanjie Zheng; | code |
| 983 | Shortcut Diffusion Training with Cumulative Consistency Loss: An Optimal Control View Highlight: In this paper, we formulate few-step generation as a controlled base generative process, and show that self-consistency loss can be understood through the lens of optimal control. |
Paribesh Regmi; Sandesh Ghimire; Rui Li; | code |
| 984 | NAB: Neural Adaptive Binning for Sparse-View CT Reconstruction Highlight: Motivated by the observation that numerous industrial objects exhibit rectangular structures, we propose a novel \textbf{N}eural \textbf{A}daptive \textbf{B}inning (\textbf{NAB}) method that effectively integrates rectangular priors into the reconstruction process. |
Wangduo Xie; Matthew B. Blaschko; | code |
| 985 | Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems Highlight: To bridge the gap between our goal and theoretical support, we propose Rejuvenated Cross-Entropy for Knowledge Distillation (RCE-KD). |
Zhangchi Zhu; Wei Zhang; | code |
| 986 | Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton Highlight: This limitation can be attributed to a fundamental cause: neglecting global evolution inherently overlooks the temporal regularities encoded in continuous dynamics. To address this, we propose the **T**emporal **G**raph **T**humbnail (**TGT**), encapsulating a temporal graph’s global evolutionary skeleton as a thumbnail to characterize temporal regularities and enhance model robustness. |
Weining Shi; Zhisen Wen; Qinggang Zhang; Chentao Zhang; Zhihong Zhang; | code |
| 987 | SUIT: Knowledge Editing with Subspace-Aware Key-Value Mappings Highlight: However, existing methods without any constraints on the key and value vectors cause significant perturbations to the edited model. To address this, we propose Subspace Knowledge Edit (SUIT), a method that identifies and modifies only the subspace of critical features relevant to the edit. |
Haewon Park; Sangwoo Kim; Yohan Jo; | code |
| 988 | Learning from Historical Activations in Graph Neural Networks Highlight: This gap is particularly pronounced in cases where a node’s representation can shift significantly over the course of many graph neural layers, and is worsened by graph-specific challenges such as over-smoothing in deep architectures. To bridge this gap, we introduce HistoGraph, a novel two‑stage attention‑based final aggregation layer that first applies a unified layer-wise attention over intermediate activations, followed by node-wise attention. |
Yaniv Galron; Hadar Sinai; Haggai Maron; Moshe Eliasof; | code |
| 989 | Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction Highlight: However, there exists a trade-off in text-aligned scene modeling: sparse Gaussian representation struggles to capture small objects in the scene, while dense representation incurs significant computational overhead. To address these limitations, we present **PG-Occ**, an innovative **P**rogressive **G**aussian Transformer Framework that enables open-vocabulary 3D occupancy prediction. |
Chi Yan; Dan Xu; | code |
| 990 | Detection of Unknown Unknowns in Autonomous Systems Highlight: Thus, existing multi-variate time series anomaly detection (MTAD) methods—which rely on distribution-shift cues—are ill-suited for U2 detection. Specifically: (i) we show most anomaly datasets exhibit distribution shift between normal and anomalous data and therefore are not representative of U2s; (ii) we introduce eight U2 benchmarks where training data contain OOD anomalies but no U2s, while test sets contain both OOD anomalies and U2s; (iii) we demonstrate that state-of-the-art (SOTA) MTAD results often depend on impractical enhancements: point adjustment (PA) (uses ground truth to flip false negatives to true positives, inflating precision) and threshold learning with data leakage (TL) (tuning thresholds on test data and labels); (iv) with PA+TL, even untrained deterministic methods can match or surpass MTAD baselines; (v) without PA/TL, existing MTAD methods degrade sharply on U2 benchmarks. |
Ayan Banerjee; Sandeep Gupta; | code |
| 991 | Large Depth Completion Model from Sparse Observations Highlight: This work presents the Large Depth Completion Model (LDCM), a simple, effective, and robust framework for single-view metric depth estimation with sparse observations. |
Zhu Yu; zhengyi zhao; Runmin Zhang; Lingteng Qiu; Kejie Qiu; Yisheng He; Siyu Zhu; Zilong Dong; Si-Yuan Cao; Hui-liang Shen; | code |
| 992 | Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models Highlight: To address the challenge, we propose the first inference-time soup strategy, named Sample Reward Soups (SRSoup), for Pareto-optimal sampling across the entire space of preferences. |
Yinghua Yao; Yuangang Pan; Guoji Fu; Ivor Tsang; | code |
| 993 | ReVeal: Self-Evolving Code Agents Via Reliable Self-Verification Highlight: We introduce ReVeal, a multi-turn Reinforcement learning framework that evolves code generation through self-Verification and tool-based evaluation. |
Yiyang Jin; Kunzhao Xu; Hang Li; Xueting Han; Yanmin Zhou; Cheng Li; Jing Bai; | code |
| 994 | Antithetic Noise in Diffusion Models Highlight: To explain it, we combine experiments and theory and propose a \textit{symmetry conjecture} that the learned score function is approximately affine antisymmetric (odd symmetry up to a constant shift), supported by empirical evidence. |
Jing Jia; Sifan Liu; Bowen Song; Wei Yuan; Liyue Shen; Guanyang Wang; | code |
| 995 | CheckMate! Watermarking Graph Diffusion Models in Polynomial Time Highlight: In this work, we propose CheckWate: the first watermarking framework for graph diffusion models embedding a checkerboard watermark and providing polynomial-time verification. |
Roberto Gheda; Abele Mălan; Robert Birke; Maksim Kitsak; Lydia Y. Chen; | code |
| 996 | Scaling Sequence-to-Sequence Generative Neural Rendering Highlight: We present Kaleido, a family of generative models designed for photorealistic, unified object- and scene-level neural rendering. |
Shikun Liu; Kam Woh Ng; Wonbong Jang; Jiadong Guo; Junlin Han; Haozhe Liu; Yiannis Douratsos; Juan Camilo Perez; Zijian Zhou; Khanh Chi Phung; Tao Xiang; Juan-Manuel Perez-Rua; | code |
| 997 | Optimal Transport Unlocks End-to-end Learning for Single-molecule Localization Highlight: In this presentation, we reformulate the SMLM training objective as a set‑matching problem, deriving an optimal‑transport loss that eliminates the need for NMS during inference and enables end‑to‑end training. |
Romain Seailles; Jean Baptiste Masson; Jean Ponce; Julien Mairal; | code |
| 998 | ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction Highlight: Sparse crime records alone are insufficient to capture latent high-order patterns shaped by heterogeneous contextual factors with spatial and criminal specificity, while high non-stationarity renders conventional offline models ineffective against concept drift. To tackle these challenges, we propose a Spatio-Temporal Hierarchical Hypergraph Online Learning framework named ST-HHOL. |
Keqing Du; Yufan Kang; Xinyu Yang; Wei Shao; | code |
| 999 | SpatiaLab: Can Vision–Language Models Perform Spatial Reasoning in The Wild? Highlight: Prior work largely relied on synthetic or LLM-generated environments with limited task designs and puzzle-like setups, failing to capture the real-world complexity, visual noise, and diverse spatial relationships that VLMs encounter. To address this, we introduce **_SpatiaLab_**, a comprehensive benchmark for evaluating VLMs’ spatial reasoning in realistic, unconstrained contexts. |
Azmine Toushik Wasi; Wahid Faisal; Abdur Rahman; Mahfuz Ahmed Anik; Munem Shahriar; Mohsin Mahmud Topu; Sadia Tasnim Meem; Rahatun Nesa Priti; Sabrina Afroz Mitu; Md. Iqramul Hoque; Shahriyar Zaman Ridoy; Mohammed Eunus Ali; Majd Hawasly; Mohammad Raza; Md Rizwan Parvez; | code |
| 1000 | Enhancing Image-Conditional Coverage in Segmentation: Adaptive Thresholding Via Differentiable Miscoverage Loss Highlight: While Conformal Risk Control (CRC) offers marginal statistical guarantees, achieving image-conditional coverage, which ensures prediction sets reliably capture ground truth for individual images, remains a significant challenge. This paper introduces a novel approach to address this gap by learning image-adaptive thresholds for conformal image segmentation. |
Rui Luo; Jie Bao; Xiaoyi Su; Wen Jung Li; Suqun Cao; | code |
| 1001 | Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning Highlight: We introduce bidirectional projector alignment during training: two maps, old$\to$new and new$\to$old, are trained during each new task with stop-gradient gating and a cycle-consistency objective so that transport and representation co-evolve. |
Hongye Xu; Bartosz Krawczyk; | code |
| 1002 | TreeGrad-Ranker: Feature Ranking Via $O(L)$-Time Gradients for Decision Trees Highlight: Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves; these gradients include weighted Banzhaf values. |
Weida Li; Yaoliang Yu; Bryan Kian Hsiang Low; | code |
| 1003 | Characteristic Root Analysis and Regularization for Linear Time Series Forecasting Highlight: This paper presents a systematic study of linear models for time series forecasting, with a focus on the role of characteristic roots in temporal dynamics. |
Zheng Wang; Kaixuan Zhang; Wanfang Chen; Xiaonan Lu; Longyuan Li; Tobias Schlagenhauf; | code |
| 1004 | Query-Guided Spatial–Temporal–Frequency Interaction for Music Audio–Visual Question Answering Highlight: However, in those methods, the audio input is primarily treated as complementary to video analysis, and the textual question information contributes minimally to audio–visual understanding, as it is typically integrated only in the final stages of reasoning. To address these limitations, we propose a novel Query-guided Spatial–Temporal–Frequency (QSTar) interaction method, which effectively incorporates question-guided clues and exploits the distinctive frequency-domain characteristics of audio signals, alongside spatial and temporal perception, to enhance audio–visual understanding. |
Kun Li; Michael Ying Yang; Sami Sebastian Brandt; | code |
| 1005 | FlowGen: Synthesizing Diverse Flowcharts to Enhance and Benchmark MLLM Reasoning Highlight: Existing flowchart datasets often lack fine-grained control over key properties such as graph complexity and rendering style, limiting their utility for training and testing of multimodal large language models (MLLMs) on visual reasoning tasks. To address these limitations, we introduce FlowGen, a controllable synthesizer that generates flowcharts that have customizable structural features and supports multiple renderer backends. |
Kaiwen Shi; Sichen Liu; Ziyue Lin; Hangrui Guo; Gong Cheng; | code |
| 1006 | PARD: Accelerating LLM Inference with Low‑Cost PARallel Draft Model Adaptation Highlight: In this work, we propose \textbf{PARD (PARallel Draft)}, a novel speculative decoding method featuring \textit{target-independence} and \textit{parallel token prediction}. |
Zihao An; Huajun Bai; Ziqiong Liu; Dong Li; Emad Barsoum; | code |
| 1007 | OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models Highlight: We introduce the $\textbf{Spatial-Acoustic Geometry Encoder (SAGE)}$, a geometry-aware audio encoder that aligns binaural acoustic features with 3D spatial structure using panoramic depth images and room-impulse responses at training time, while requiring only audio at inference. |
Subrata Biswas; Mohammad Nur Hossain Khan; Bashima Islam; | code |
| 1008 | Seeing What’s Wrong: A Trajectory-Guided Approach to Caption Error Detection Highlight: Correct captions typically stabilize after minor edits, while erroneous captions undergo substantial improvements. Building on these insights, we introduce TRACED, a cost-efficient and model-agnostic framework that leverages trajectory statistics for more accurate caption error detection. |
Gabriel Afriat; Ryan Lucas; Xiang Meng; Yufang Hou; Yada Zhu; Rahul Mazumder; | code |
| 1009 | Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving Highlight: At the core of the challenge lies the absence of transferability of neural operators to new geometries. To tackle this issue, we propose operator learning with domain decomposition, a local-to-global framework to solve PDEs on arbitrary geometries. |
Jianing Huang; Kaixuan Zhang; Youjia Wu; Ze Cheng; | code |
| 1010 | Divide and Abstract: Autoformalization Via Decomposition and Abstraction Learning Abstract: Existing approaches to autoformalization—the task of translating informal mathematics into formal machine-verifiable languages—rely heavily on pre-defined libraries and expect … |
Marcus J. Min; Yeqi Gao; Wilson Sy; Zhaoyu Li; Xujie Si; Osbert Bastani; | code |
| 1011 | EGG-SR: Embedding Symbolic Equivalence Into Symbolic Regression Via Equality Graph Highlight: We introduce EGG-SR, a unified framework that integrates symbolic equivalence into a class of modern symbolic regression methods, including Monte Carlo Tree Search (MCTS), Deep Reinforcement Learning (DRL), and Large Language Models (LLMs). |
Nan Jiang; Ziyi Wang; Yexiang Xue; | code |
| 1012 | Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis Highlight: In this work, we shift perspective by introducing a principled definition of an event as the smallest semantically self-contained action or state change in a text prompt that can be temporally aligned with a motion segment. |
Seong-Eun Hong; Jaeyoung Seon; JuYeong Hwang; JongHwan Shin; HyeongYeop Kang; | code |
| 1013 | TS$^2$: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning Highlight: In this paper, we propose “Training with Sparsemax+, Testing with Softmax (TS$^2$)”. |
XuZiyang; Ananthu Rajendran Pillai; Yinghua Yao; Yuangang Pan; | code |
| 1014 | A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models Highlight: Large vision-language models (LVLMs) achieve impressive performance, yet their internal decision-making processes remain opaque, making it difficult to determine if the success stems from true multimodal fusion or reliance on unimodal priors. To address this attribution gap, we introduce a novel framework using partial information decomposition (PID) to quantitatively measure the “information spectrum” of LVLMs—decomposing a model’s decision-relevant information into redundant, unique, and synergistic components. |
Lixin Xiu; Xufang Luo; Hideki Nakayama; | code |
| 1015 | Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset Highlight: How can large language models (LLMs) serve users with varying preferences that may conflict across cultural, political, or other dimensions? |
Lily H Zhang; Smitha Milli; Karen Long Jusko; Jonathan Smith; Brandon Amos; Wassim Bouaziz; Manon Revel; Jack Kussman; Yasha Sheynin; Lisa Titus; Bhaktipriya Radharapu; Jane Yu; Vidya Sarma; Kristopher Rose; Maximilian Nickel; | code |
| 1016 | Relationship Alignment for View-aware Multi-view Clustering Highlight: However, existing methods often suffer from two limitations: i) the neglect of preserving sample neighborhood structures, which weakens the consistency of inter-sample relationships across views; and ii) inability to adaptively utilize inter-view similarity, resulting in representation conflicts and semantic degradation. To address these issues, we propose a novel framework named Relationship Alignment for View-aware Multi-view Clustering (RAV). |
Shuangmei Peng; Zhe Chen; Tianyang Xu; Xiaojun Wu; | code |
| 1017 | 3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations Highlight: We introduce 3DCS, the first benchmark for 3D Conformational Sensitivity in MRs. 3DCS evaluates whether representations within the same molecule (i) preserve geometric variation, (ii) capture chirality, and (iii) reflect the energy landscape. To enable this, we curate three large-scale datasets ($>$1M molecules, $\sim$10M conformers) spanning relaxed torsional scans, chiral drug candidates, and AIMD trajectories, and propose a unified Geometry–Chirality–Energy (GCE) evaluation framework. |
Xi Wang; Yang Zhang; Yingjia Zhang; Yejia Cai; Shenji Wan; | code |
| 1018 | SlotGCG: Exploiting The Positional Vulnerability in LLMs for Jailbreak Attacks Highlight: In this paper, we empirically investigate slots, i.e., candidate positions within a prompt where tokens can be inserted. |
Seungwon Jeong; Jiwoo Jeong; Hyeonjin Kim; Yunseok Lee; Woojin Lee; | code |
| 1019 | Towards Scalable Oversight Via Partitioned Human Supervision Highlight: For example, a cardiologist could state that “this is not related to cardiology,” even if they cannot identify the true disease. Based on this weak signal, we propose a scalable oversight framework that enables us to evaluate frontier AI systems without the need to prepare the ground truth. |
Ren Yin; Takashi Ishida; Masashi Sugiyama; | code |
| 1020 | CARPRT: Class-Aware Zero-Shot Prompt Reweighting for Vision-Language Model Highlight: For instance, a prompt like “an aerial view of” might be apt for “airport” but ill-suited for “apple”. To address this, we propose class-aware zero-shot prompt reweighting (CARPRT), a scoring scheme that adjusts the weighting vector for each class by capturing the class-specific relevance of different prompts in a training-free manner. |
Ruijiang Dong; Zesheng Ye; Jianzhong Qi; Lei Feng; Feng Liu; Gang Niu; Masashi Sugiyama; | code |
| 1021 | Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs Highlight: We trace its root cause to the dual role of … |
BumJun Kim; Dongjae Jeon; Dueun Kim; Wonje Jeung; Albert No; | code |
| 1022 | Many Eyes, One Mind: Temporal Multi-Perspective and Progressive Distillation for Spiking Neural Networks Highlight: We propose **MEOM** (**M**any **E**yes, **O**ne **M**ind), a unified KD framework that enriches supervision with diverse temporal perspectives through mask-weighted teacher features and progressively aligns truncated predictions with the full-length prediction, thereby enabling more reliable inference across all timesteps. |
Kai Sun; Peibo Duan; Yongsheng Huang; Nanxu Gong; Levin Kuhlmann; | code |
| 1023 | Fetal-Gauge: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound Highlight: This gap is primarily due to the modality’s challenging nature, operator dependency, and the limited public availability of datasets. To address this gap, we present Fetal-Gauge, the first and largest visual question answering benchmark specifically designed to evaluate VLMs across various fetal ultrasound tasks. |
Hussain Alasmawi; Numan Saeed; Mohammad Yaqub; | code |
| 1024 | Spatial Structure and Selective Text Jointly Facilitate Image Clustering Highlight: Moreover, existing approaches often assume that textual features are universally beneficial, overlooking their varying suitability for different datasets. To address these issues, we propose to use spatial structure and selective text to jointly facilitate image clustering (SATC). |
Zizheng Jiu; Feijiang Li; Jieting Wang; Yuhua Qian; Lu Chen; | code |
| 1025 | TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding Via Self-Verification Reinforcement Learning Highlight: In this paper, we propose **TimeSearch-R**, which reformulates temporal search as interleaved text–video thinking, seamlessly integrating searching video clips into the reasoning process through reinforcement learning (RL). Additionally, we construct datasets specifically designed for the SFT cold-start and RL training of GRPO-CSV, filtering out samples with weak temporal dependencies to enhance task difficulty and improve temporal search capabilities. |
Junwen Pan; Qizhe Zhang; Rui Zhang; Ming Lu; Xin Wan; Yuan Zhang; Chang Liu; Qi She; | code |
| 1026 | SiMO: Single-Modality-Operable Multimodal Collaborative Perception Highlight: Collaborative perception integrates multi-agent perspectives to enhance the sensing range and overcome occlusion issues. |
Jiageng Wen; Shengjie Zhao; Bing Li; Jiafeng Huang; Kenan Ye; Hao Deng; | code |
| 1027 | Learnable Fractional Superlets with A Spectro-Temporal Emotion Encoder for Speech Emotion Recognition Highlight: We establish admissibility (zero mean) and continuity in order and frequency, and characterize approximate analyticity by bounding negative-frequency leakage as a function of an effective cycle parameter. Building on these results, we introduce the Learnable Fractional Superlet Transform (LFST), a fully differentiable front-end that jointly optimizes (i) a monotone, log-spaced frequency grid, (ii) frequency-dependent base cycles, and (iii) learnable fractional-order weights, all trained end-to-end. |
Alaa Nfissi; Wassim Bouachir; Nizar Bouguila; Brian L Mishara; | code |
| 1028 | DiscoX: Benchmarking Discourse-Level Translation in Expert Domains Highlight: While these translations demand discourse-level coherence and strict terminological precision, current evaluation methods predominantly focus on segment-level accuracy and fluency. To address this limitation, we introduce DiscoX, a new benchmark for discourse-level and expert-level Chinese-English translation. |
Xiying ZHAO; Zhoufutu Wen; Zhixuan Chen; Jingzhe Ding; Jianpeng Jiao; Shuai Li; Xi Li; Danni Liang; Shengda Long; Qianqian Liu; Xianbo Wu; Hongwan Gao; Xiang Gao; LIANG HU; Jiashuo Liu; Liumengyun; Weiran Shi; Chenghao Yang; Qianyu Yang; Xuanliang Zhang; Ge Zhang; Wenhao Huang; | code |
| 1029 | Multimodal Dataset Distillation Via Phased Teacher Models Highlight: To address critical challenges such as pronounced cross-stage performance gaps and unstable teacher trajectories, we propose Phased Teacher Model with Shortcut Trajectory (PTM-ST)—a novel phased distillation framework. |
Shengbin Guo; Hang Zhao; Senqiao Yang; Chenyang Jiang; Yuhang Cheng; Xiangru Peng; Rui Shao; Zhuotao Tian; | code |
| 1030 | Proper Velocity Neural Networks Highlight: In this work, we explore the Proper Velocity (PV) space, an unconstrained representation of hyperbolic space rooted in Einstein’s special relativity, as a stable alternative. |
Ziheng Chen; Zihan Su; Bernhard Schölkopf; Nicu Sebe; | code |
| 1031 | Fast and Stable Riemannian Metrics on SPD Manifolds Via Cholesky Product Geometry Highlight: Motivated by this, we revisit the geometry of the Cholesky factors and uncover a simple product structure that enables convenient metric design. Building on this insight, we propose two fast and stable SPD metrics, Power–Cholesky Metric (PCM) and Bures–Wasserstein–Cholesky Metric (BWCM), derived via Cholesky decomposition. |
Ziheng Chen; Yue Song; Xiaojun Wu; Nicu Sebe; | code |
| 1032 | When and Where to Reset Matters for Long-Term Test-Time Adaptation Highlight: To this end, we propose 1) an Adaptive and Selective Reset (ASR) scheme that dynamically determines when and where to reset, 2) an importance-aware regularizer to recover essential knowledge lost from reset, and 3) an on-the-fly adaptation adjustment scheme to enhance adaptability under challenging domain shifts. |
Taejun Lim; Joong-Won Hwang; Kibok Lee; | code |
| 1033 | DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification Highlight: This paper introduces DeepFRC, an end-to-end deep learning framework that jointly learns diffeomorphic warping functions and a classifier within a unified architecture. |
Siyuan Jiang; Yihan Hu; Wenjie Li; Pengcheng Zeng; | code |
| 1034 | In Context Semi-Supervised Learning Highlight: We introduce and study in-context semi-supervised learning (IC-SSL), where a small set of labeled examples is accompanied by many unlabeled points, and show that Transformers can leverage the unlabeled context to learn a robust, context-dependent representation. |
Jiashuo Fan; Paul Rosu; Aaron T Wang; Lawrence Carin; Xiang Cheng; | code |
| 1035 | T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation Highlight: We introduce T1 (Time series imputation with 1-to-1 channel-head binding), a CNN-Transformer hybrid architecture that achieves robust imputation through Channel-Head Binding—a mechanism creating one-to-one correspondence between CNN channels and attention heads. |
Dongik Park; Hyunwoo Ryu; Suahn Bae; Keondo Park; Hyung-Sin Kim; | code |
| 1036 | Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation Highlight: We present a principled Bayesian evaluation framework that replaces Pass@$k$ and average accuracy over $N$ trials (avg@$N$) with posterior estimates of a model’s underlying success probability and credible intervals, yielding stable rankings and a transparent decision rule for differences. |
Mohsen Hariri; Amirhossein Samandar; Michael Hinczewski; Vipin Chaudhary; | code |
| 1037 | Selective Rotary Position Embedding Highlight: Selectivity has generally been shown to improve language-related tasks. Inspired by this, we introduce \textit{Selective RoPE}, an \textit{input-dependent} rotary embedding mechanism that generalizes \textit{RoPE} and enables rotation in \textit{arbitrary angles} for both linear and softmax transformers. |
Sajad Movahedi; Timur Carstensen; Arshia Afzal; Frank Hutter; Antonio Orvieto; Volkan Cevher; | code |
| 1038 | Search Self-Play: Pushing The Frontier of Agent Capability Without Supervision Highlight: To achieve agentic RLVR with higher scalability, we explore self-play training for deep search agents, in which the learning LLM utilizes multi-turn search engine calling and acts simultaneously as both a task proposer and a problem solver. |
Hongliang Lu; Yuhang Wen; Pengyu Cheng; Ruijin Ding; Jiaqi Guo; Haotian Xu; Chutian Wang; Haonan Chen; Xiaoxi Jiang; Guanjun Jiang; | code |
| 1039 | IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation Highlight: In this paper, we propose a novel approach, Intrinsic Mixture of Spectral Experts (IMSE), that leverages the spectral experts inherently embedded in Vision Transformers. |
Sunghyun Baek; Jaemyung Yu; Seunghee Koh; Minsu Kim; Hyeonseong Jeon; Junmo Kim; | code |
| 1040 | SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation Highlight: We revisit this limitation from an information-theoretic perspective and deduce that ensuring that each scale contributes high-frequency content not explained by earlier scales mitigates the train–inference discrepancy. With this insight, we propose Scaled Spatial Guidance (SSG), a training-free, inference-time guidance that steers generation toward the intended hierarchy while maintaining global coherence. |
Youngwoo Shin; Jiwan Hur; Junmo Kim; | code |
| 1041 | MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents Highlight: We present MSB (MCP Security Benchmark), the first end-to-end evaluation suite that systematically measures how well LLM agents resist MCP-specific attacks throughout the full tool-use pipeline: task planning, tool invocation, and response handling. |
Dongsen Zhang; Zekun Li; Xu Luo; Xuannan Liu; Pei Pei Li; Wenjun Xu; | code |
| 1042 | HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design Highlight: This paper investigates the application of Large Language Models (LLMs) in Automated Heuristic Design (AHD), where their integration into evolutionary frameworks reveals a significant gap in global control and long-term learning. We propose the Hindsight-Foresight Prompt (HiFo-Prompt), a novel framework for LLM-based AHD designed to overcome these limitations. |
Chentong Chen; Mengyuan Zhong; Jialong Shi; Jianyong Sun; Ye Fan; | code |
| 1043 | DiffuDETR: Rethinking Detection Transformers with Denoising Diffusion Process Highlight: In this paper, we present DiffuDETR, a novel approach that formulates object detection as a conditional object query generation task, conditioned on the image and a set of noisy reference points. |
Youssef Ahmed Nawar; Mohamed Badran; Marwan Torki; | code |
| 1044 | Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning Highlight: In this work, we theoretically analyze the impact of inter-policy diversity on learning efficiency in policy ensembles, and propose Coupled Policy Optimization (CPO), which regulates diversity through KL constraints between policies. |
Naoki Shitanda; Motoki Omura; Tatsuya Harada; Takayuki Osa; | code |
| 1045 | GraphUniverse: Synthetic Graph Generation for Evaluating Inductive Generalization Highlight: While synthetic benchmarks offer controlled settings for analysis, existing approaches are confined to single-graph, transductive settings where models train and test on the same graph structure. Addressing this gap, we introduce GraphUniverse, a framework for generating entire families of graphs to enable the first systematic evaluation of inductive generalization at scale. |
Louis Van Langendonck; Guillermo Bernardez; Nina Miolane; Pere Barlet-Ros; | code |
| 1046 | Contractive Diffusion Policies Highlight: We introduce **C**ontractive **D**iffusion **P**olicies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. |
Amin Abyaneh; Charlotte Morissette; Mohamad H. Danesh; Anas Houssaini; David Meger; Gregory Dudek; Hsiu-Chin Lin; | code |
| 1047 | CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis Highlight: However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities. |
Alexander Baumann; Leonardo Ayala; Silvia Seidlitz; Jan Sellner; Alexander Studier-Fischer; Berkin Özdemir; Lena Maier-Hein; Slobodan Ilic; | code |
| 1048 | Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match Highlight: We propose Training-Free Loosely Speculative Decoding (FLy), a novel method that loosens the rigid verification criterion by leveraging the target model’s own corrective behavior to judge whether a draft–target mismatch remains semantically valid. |
Jinze Li; Yixing Xu; Guanchen Li; Shuo Yang; Jinfeng Xu; Xuanwu Yin; Dong Li; Edith C. H. Ngai; Emad Barsoum; | code |
| 1049 | Safe Continuous-time Multi-Agent Reinforcement Learning Via Epigraph Form Highlight: However, they rarely account for safety constraints such as collision penalties, since these introduce discontinuities that make HJB-based learning difficult. To address this challenge, we propose a continuous-time constrained MDP (CT-CMDP) formulation and a novel MARL framework that transforms discrete MDPs into CT-CMDPs via an epigraph-based reformulation. |
Xuefeng Wang; Lei Zhang; Henglin Pu; Husheng Li; Ahmed H Qureshi; | code |
| 1050 | A Foundation Model with Multi-variate Parallel Attention to Generate Neuronal Activity Highlight: In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. To support this and future efforts by the community, we release the Long-term iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of recordings from heterogeneous clinical sources. |
Francesco S. Carzaniga; Michael Hersche; Abu Sebastian; Kaspar Schindler; Abbas Rahimi; | code |
| 1051 | PAT3D: Physics-Augmented Text-to-3D Scene Generation Highlight: We introduce PAT3D, the first physics-augmented text-to-3D scene generation framework that integrates vision–language models with physics-based simulation to produce physically plausible, simulation-ready, and intersection-free 3D scenes. |
Guying Lin; Kemeng Huang; Michael Liu; Ruihan Gao; Hanke Chen; Lyuhao Chen; Beijia Lu; Taku Komura; Yuan Liu; Jun-Yan Zhu; Minchen Li; | code |
| 1052 | Learning Data-Efficient and Generalizable Neural Operators Via Fundamental Physics Knowledge Highlight: Inspired by how numerical solvers are compatible with simulations of different settings of PDEs, we propose a multiphysics training framework that jointly learns from both the original PDEs and their simplified basic forms. |
Siying Ma; Mehrdad Momeni Zadeh; Mauricio Soroco; Wuyang Chen; Jiguo Cao; Vijay Ganesh; | code |
| 1053 | A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models Highlight: However, these methods may not always have optimal angular separation between class-wise textual features, which implies overlooking the critical role of angular diversity. To address this, we propose A-TPT, a novel TPT framework that introduces angular diversity to encourage uniformity in the distribution of normalized textual features induced by corresponding learnable prompts. |
Shihab Aaqil Ahamed; Udaya Sampath K. Perera Miriya Thanthrige; Ranga Rodrigo; Muhammad Haris Khan; | code |
| 1054 | Boosting Medical Visual Understanding From Multi-Granular Language Learning Highlight: However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple labels across different levels of granularity. To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. |
Zihan Li; Yiqing Wang; Sina Farsiu; Paul Kinahan; | code |
| 1055 | Reverse Distillation: Consistently Scaling Protein Language Model Representations Highlight: For example, the ESM-2 family of protein language models plateaus at 650M-3B parameters on ProteinGym benchmarks. We address this limitation by introducing Reverse Distillation, a principled framework that decomposes large protein language model representations into orthogonal subspaces guided by smaller models of the same family. |
Darius Catrina; Christian Bepler; Samuel Sledzieski; Rohit Singh; | code |
| 1056 | Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses Highlight: To overcome the identified trade-offs, we introduce SFT-auto, a novel framework that delivers superior and balanced robustness against both textual and structural attacks within a single model. |
Runlin Lei; Lu Yi; Mingguo He; Pengyu Qiu; Zhewei Wei; Yongchao Liu; Chuntao Hong; | code |
| 1057 | Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation Highlight: To this end, we introduce Splat and Distill, a framework that instills robust 3D awareness into 2D VFMs by augmenting the teacher model with a fast, feed-forward 3D reconstruction pipeline. |
David Shavin; Sagie Benaim; | code |
| 1058 | GAVEL: Towards Rule-Based Safety Through Activation Monitoring Highlight: We propose modeling activations as cognitive elements (CEs), fine-grained, interpretable factors such as “making a threat” and “payment processing”, that can be composed to capture nuanced, domain-specific behaviors with higher precision. Building on this representation, we present a practical framework that defines predicate rules over CEs and detects violations in real time. |
Shir Rozenfeld; Rahul Pankajakshan; Itay Zloczower; Eyal Lenga; Gilad Gressel; Yisroel Mirsky; | code |
| 1059 | SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality Highlight: In this work, we present SSDi8, the first post-training quantization framework specifically designed for SSD to maintain a persistent INT8 path. |
Hyunwoo Kim; Byoungchan Ko; Minseok Kang; Minwoo Kim; Dongjin Lee; Jaehoon Lee; Sungroh Yoon; Dahuin Jung; | code |
| 1060 | ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks Highlight: Furthermore, the necessity of automating such a benchmark introduces a critical requirement for player-centric authenticity to ensure generated questions are realistic. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. |
Liyang He; Yuren Zhang; Ziwei Zhu; Zhenghui Li; Shiwei Tong; | code |
| 1061 | Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data Highlight: We propose the _Relational Transformer (RT)_, a cell-level architecture pretrained on diverse relational databases and directly applicable to unseen datasets and tasks, without any need for task- or dataset-specific fine-tuning or retrieval of in-context examples. |
Rishabh Ranjan; Valter Hudovernik; Mark Znidar; Charilaos I. Kanatsoulis; Roshan Reddy Upendra; Mahmoud Mohammadi; Joe Meyer; Tom Palczewski; Carlos Guestrin; Jure Leskovec; | code |
| 1062 | ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling Highlight: We develop a prediction benchmark reporting prediction error metrics, along with feature importance analysis, and introduce a novel multiple-stage ML model. We also provide a scheduling benchmark for minimizing the total completion time, max-stretch, and makespan. |
Zhiyun Jiang; Tianming Zhao; Chunqiu Xia; Albert Zomaya; | code |
| 1063 | Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in The Wild Highlight: This phenomenon becomes pronounced when reconstructing 3D humans with dynamic or challenging poses, which we attribute to the limited scale of available 3D human datasets with diverse poses. To address this limitation, we introduce DrPose, a Direct Reward fine-tuning algorithm on Poses, which enables post-training of a multi-view diffusion model on diverse poses without requiring expensive 3D human assets. |
Seunguk Do; Minwoo Huh; Joonghyuk Shin; Jaesik Park; | code |
| 1064 | Evaluating GFlowNet from Partial Episodes for Stable and Flexible Policy-based Training Highlight: This work bridges the two perspectives by showing that flow balance also yields a principled policy evaluator that measures the policy divergence, and an evaluation balance objective over partial episodes is proposed for learning the evaluator. |
Puhua Niu; Shili Wu; Xiaoning Qian; | code |
| 1065 | Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection Highlight: Biomolecular interaction modeling has been substantially advanced by foundation models, yet they often produce all-atom structures that violate basic steric feasibility. We address this limitation by enforcing physical validity as a strict constraint during both training and inference with a unified module. |
Siyuan Chen; Minghao Guo; Caoliwen Wang; Anka He Chen; Yikun Zhang; Jingjing Chai; Yin Yang; Wojciech Matusik; Peter Yichen Chen; | code |
| 1066 | Peng’s Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning Highlight: We propose a model-free offline multi-step reinforcement learning (RL) algorithm, Conservative Peng’s Q($\lambda$) (CPQL). |
Byeongchan Kim; Min-hwan Oh; | code |
| 1067 | QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture Highlight: In this paper, we propose QuaMo, a novel Quaternion Motions method using quaternion differential equations (QDE) for human kinematics capture. |
Cuong Le; Pavlo Melnyk; Urs Waldmann; Mårten Wadenbäck; Bastian Wandt; | code |
| 1068 | TrainRef: Curating Data with Label Distribution and Minimal Reference for Accurate Prediction and Reliable Confidence Highlight: In this work, we propose a training-time data-curation framework, TrainRef, to uniformly address predictive accuracy and confidence calibration by (1) using an extrinsic small set of reference samples $D_{ref}$ to avoid normality pollution and (2) curating labels into a class distribution instead of a categorical class to handle sample ambiguity. |
Murong Ma; Ruofan Liu; Yun Lin; Zhiyong Huang; Jin Song Dong; | code |
| 1069 | A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting Highlight: Although Continual Spatio-Temporal Forecasting methods have been proposed to tackle these issues, they often adopt backbones with limited modeling capacity and lack effective mechanisms to balance stability and adaptability. To overcome these limitations, we propose STBP, a novel framework that integrates a general spatio-temporal backbone with a scalable contextual pattern bank. |
Aoyu Liu; Yaying Zhang; | code |
| 1070 | Hey, That’s My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique Highlight: We present a novel fingerprinting framework that provides verifiable proof of ownership while preserving fingerprint integrity. |
Mark Russinovich; Yanan Cai; Ahmed Salem; | code |
| 1071 | SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding Highlight: We present SEED ($\textbf{Se}$mantic $\textbf{E}$valuation for Visual Brain $\textbf{D}$ecoding), a novel metric for evaluating the semantic decoding performance of visual brain decoding models. |
Juhyeon Park; Peter Yongho Kim; Jiook Cha; Shinjae Yoo; Taesup Moon; | code |
| 1072 | Flock: A Knowledge Graph Foundation Model Via Learning on Random Walks Highlight: However, the conventional notion of deterministic equivariance inherently limits the expressive power of KGFMs, as it prevents them from distinguishing relations that are structurally similar but semantically distinct. To overcome this limitation, we propose to leverage probabilistic node-relation equivariance, which preserves equivariance in distribution while using structured randomness to break symmetries at inference time. |
Jinwoo Kim; Xingyue Huang; Krzysztof Olejniczak; Kyungbin Min; Michael M. Bronstein; Seunghoon Hong; Ismail Ilkan Ceylan; | code |
| 1073 | Adaptive Concept Discovery for Interpretable Few-Shot Text Classification Highlight: While Concept Bottleneck Models (CBMs) offer an efficient and interpretable alternative, their reliance on training surrogate models makes them incompatible with few-shot scenarios. To bridge this gap, we introduce a novel CBM paradigm that relies solely on sample-concept similarity to make predictions. |
ZHENG Lifang; Hanmo Liu; Kani Chen; | code |
| 1074 | Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content Highlight: To address this gap, we introduce a new comprehensive dataset – $\textbf{StreamSR}$ – sourced from YouTube, covering a wide range of video genres and resolutions representative of real-world streaming scenarios. We made the dataset, the code, and the benchmark available at $\textit{https://github.com/EvgeneyBogatyrev/EfRLFN}$. |
Evgeney Bogatyrev; Khaled Abud; Ivan Molodetskikh; Nikita Alutis; Dmitriy S. Vatolin; | code |
| 1075 | FAST‑DIPS: Adjoint‑Free Analytic Steps and Hard‑Constrained Likelihood Correction for Diffusion‑Prior Inverse Problems Highlight: The correction is implemented via an adjoint-free ADMM with a closed-form projection onto the Euclidean ball and a few steepest-descent updates whose step size is analytic and computable from one VJP and one JVP—or a forward-difference surrogate—followed by decoupled re-annealing. We show this step minimizes a local quadratic model (with backtracking-based descent), any ADMM fixed point satisfies KKT for the hard-constraint, and mode substitution yields a bounded time-marginal error. |
Minwoo Kim; Seunghyeok Shin; Hongki Lim; | code |
| 1076 | Directional Textual Inversion for Personalized Text-to-Image Generation Highlight: We propose Directional Textual Inversion (DTI), which fixes the embedding magnitude to an in‑distribution scale and optimizes only direction on the unit hypersphere via Riemannian SGD. |
Kunhee Kim; NaHyeon Park; Kibeom Hong; Hyunjung Shim; | code |
| 1077 | HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning Highlight: During cognition, the hippocampal EC–DG–CA3–CA1 circuit engages in multiple rounds of associative recall, and its pattern-separation and memory-completion mechanisms excel at activating historical information. Inspired by this mechanism, we propose HippoTune, a latent-space iterative retrieval strategy that embeds a query–retrieve–feedback loop within each Transformer layer. |
Chen Yanxi; Xiuxing Li; Han Yuyang; Zhuo Wang; Qing Li; Ziyu Li; Xiang Li; Chen Wei; Xia Wu; | code |
| 1078 | DiaBlo: Diagonal Blocks Are Sufficient For Finetuning Highlight: In this work, we present *DiaBlo*, a simple yet effective PEFT approach that updates only the diagonal blocks of selected model weight matrices. |
Selcuk Gurses; Aozhong Zhang; Yanxia Deng; Xun Dong; Xin Li; Naigang Wang; Penghang Yin; Zi Yang; | code |
| 1079 | The Achilles’ Heel of LLMs: How Altering A Handful of Neurons Can Cripple Language Abilities Highlight: Neuroscience research has found that a small subset of biological neurons in the human brain are crucial for core cognitive functions, which raises a fundamental question: do LLMs also contain a small subset of critical neurons? In this paper, we investigate this question by proposing a Perturbation-based Causal Identification of Critical Neurons method to systematically locate such critical neurons in LLMs. |
Zixuan Qin; Qingchen Yu; Kunlin Lyu; Zhaoxin Fan; Yifan Sun; | code |
| 1080 | VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip Highlight: We introduce Value Sign Flip (VSF), a simple and efficient method for incorporating negative prompt guidance in few-step (1-8 steps) diffusion and flow-matching image generation models. |
Wenqi Marshall Guo; Shan Du; | code |
| 1081 | Characterizing The Discrete Geometry of ReLU Networks Highlight: However, relatively little is known about the geometry of these complexes beyond bounds on the total number of regions, and calculating the complex exactly is intractable for most networks. In this work, we prove new theoretical results about these complexes that hold for all fully-connected ReLU networks, specifically about their connectivity graphs in which nodes correspond to regions and edges exist between each pair of regions connected by a face. |
Blake B. Gaines; Jinbo Bi; | code |
| 1082 | Random Controlled Differential Equations Highlight: We introduce a training-efficient framework for time-series learning that combines random features with controlled differential equations (CDEs). |
Francesco Piatti; Thomas Cass; William F. Turner; | code |
| 1083 | Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments Highlight: Building on this framework, we present practical RL algorithms — Partially Equivariant (PE)-DQN for discrete control and PE-SAC for continuous control — that combine the benefits of equivariance with robustness to symmetry-breaking. |
Junwoo Chang; Minwoo Park; Joohwan Seo; Roberto Horowitz; Jongmin Lee; Jongeun Choi; | code |
| 1084 | Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals Highlight: However, existing VTOFF approaches face two major limitations: (i) they are fundamentally constrained by their exclusive reliance on ambiguous visual information from the source image, and (ii) they frequently produce images with severely degraded details, preventing their use in practical applications. To overcome these challenges, we present Text-Enhanced MUlti-category Virtual Try-Off (TEMU-VTOFF), a novel architecture featuring a dual DiT-based backbone. |
Davide Lobba; Fulvio Sanguigni; Bin Ren; Marcella Cornia; Rita Cucchiara; Nicu Sebe; | code |
| 1085 | Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning Highlight: Failures in the real world are typically subtle, combinatorial, and difficult to enumerate, whereas rich reasoning labels are expensive to acquire. We address this problem by introducing ARMOR: Adaptive Round-based Multi-task mOdel for Robotic failure detection and reasoning. |
Carl Qi; Xiaojie Wang; Silong Yong; Stephen Sheng; Huitan Mao; Sriram Srinivasan; Manikantan Nambi; Amy Zhang; Yesh Dattatreya; | code |
| 1086 | A Training-Free Framework for Long Video Understanding Via Video-Query-Options Similarity Highlight: In this paper, we propose a training-free framework for long video understanding, integrating three key innovations: Adaptive Frame Sampling (AFS), Dynamic Resolution Allocation (DRA), and Video-Query-Options Similarity (VQOS). |
Zhirong Wu; Xiaodong Wang; Langling Huang; Teng Xu; Peixi Peng; | code |
| 1087 | Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment Highlight: In this paper, we investigate the semantic collapsing problem in generative personalization, an under-explored topic where the learned visual concept ($V$) gradually shifts from its original textual meaning and comes to dominate other concepts in multi-concept input prompts. |
Anh Tuan Bui; Thuy-Trang Vu; Trung Le; Junae Kim; Tamas Abraham; Rollin Omari; Amardeep Kaur; Dinh Phung; | code |
| 1088 | Spilled Energy in Large Language Models Highlight: Similar to Orgad et al. (2025), we localize the exact token associated with the answer; yet, unlike their approach, which requires training a classifier and ablating which activations to feed to it, we propose a method to detect hallucinations *completely training-free that naturally generalizes across tasks and LLMs* by using the output logits across subsequent generation steps. |
Adrian Robert Minut; Hazem Dewidar; Iacopo Masi; | code |
| 1089 | StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs Highlight: This instability stems from two flaws: a brittle single-path quantization architecture and a distant training signal indifferent to intermediate token stability. To address this, we introduce StableToken, a tokenizer that achieves stability through a consensus-driven mechanism. |
Yuhan Song; Linhao Zhang; Chuhan Wu; Aiwei Liu; Wei Jia; Houfeng Wang; Zhou Xiao; | code |
| 1090 | A Statistical Benchmark for Diffusion-Posterior-Sampling Algorithms Highlight: We propose a statistical benchmark for diffusion posterior sampling (DPS) algorithms in linear inverse problems. |
Martin Zach; Youssef Haouchat; Michael Unser; | code |
| 1091 | Boosting Entropy with Bell Box Quantization Highlight: We propose BBQ, the first ITO quantization method that is also compute-efficient. |
Ningfeng Yang; Tor M. Aamodt; | code |
| 1092 | TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation Highlight: Additionally, in motion generation frameworks, motion-irrelevant cues caused by noise are often entangled with features that contribute positively to generation, thereby leading to motion distortion. To address these issues, we propose Tri-Domain Causal Text-to-Motion Generation (TriC-Motion), a novel diffusion-based framework integrating spatial-temporal-frequency-domain modeling with causal intervention. |
Yiyang Cao; Yunze Deng; Ziyu Lin; Bin Feng; Xinggang Wang; Wenyu Liu; DanDan Zheng; Jingdong Chen; | code |
| 1093 | DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs Highlight: Prior learning-based approaches face three limitations: (1) reliance on bulk-synchronous frameworks that under-utilize devices, (2) learning a single placement policy without modeling the system dynamics, and (3) depending solely on reinforcement learning in pre-training while ignoring optimization during deployment. We propose Doppler, a three-stage framework with two policies: SEL for selecting operations and PLC for placing them on devices. |
Xinyu Yao; Daniel Bourgeois; Abhinav Jain; Yuxin Tang; Jiawen Yao; Zhimin Ding; Arlei Silva; Chris Jermaine; | code |
| 1094 | SafeMoE: Safe Fine-Tuning for MoE LLMs By Aligning Harmful Input Routing Highlight: Existing defenses, primarily designed for monolithic LLMs, are less effective for MoE LLMs as they fail to prevent drift in harmful input routing. To address this limitation, we propose SafeMoE, a safe fine-tuning method tailored to MoE LLMs. |
Jaehan Kim; Minkyoo Song; Seungwon Shin; Sooel Son; | code |
| 1095 | VITA: Zero-Shot Value Functions Via Test-Time Adaptation of Vision–Language Models Highlight: We introduce VITA, a zero-shot value function learning method that enhances both capabilities via test-time adaptation. |
Christos Ziakas; Alessandra Russo; | code |
| 1096 | Uncertainty As Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering Highlight: In this work, we focus on UQ for the contextual QA task and propose a theoretically grounded approach to quantify *epistemic uncertainty*. |
Yavuz Faruk Bakman; Sungmin Kang; Zhiqi Huang; Duygu Nur Yaldiz; Catarina G Belém; Chenyang Zhu; Anoop Kumar; Alfy Samuel; Daben Liu; Salman Avestimehr; Sai Praneeth Karimireddy; | code |
| 1097 | Multi-LLM Adaptive Conformal Inference for Reliable LLM Response Highlight: Conformal inference provides distribution-free guarantees, but existing approaches are either overly conservative, discarding many true claims, or rely on adaptive error rates and simple linear models that fail to capture complex group structures. To address these challenges, we reformulate conformal inference in a multiplicative filtering setting, modeling factuality as a product of claim-level scores. |
Kangjun Noh; Seongchan Lee; Ilmun Kim; Kyungwoo Song; | code |
| 1098 | Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness Highlight: They lack a theoretical motivation connecting the embedding space representations with worst-group error. To address this limitation, we propose Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness (SCER), a novel approach that directly regularizes feature representations to suppress spurious cues. |
Subeen Park; JOOWANG KIM; Hakyung Lee; Sunjae yoo; Kyungwoo Song; | code |
| 1099 | Reliable Probabilistic Forecasting of Irregular Time Series Through Marginalization-Consistent Flows Highlight: We propose MOSES (Mixtures of Separable Flows), a novel model that parametrizes a stochastic process via a mixture of normalizing flows, where each component combines a latent multivariate Gaussian with separable univariate transformations. |
Vijaya Krishna Yalavarthi; Randolf Scholz; Christian Klötergens; Kiran Madhusudhanan; Stefan Born; Lars Schmidt-Thieme; | code |
| 1100 | Test-Time Alignment for Large Language Models Via Textual Model Predictive Control Highlight: Conversely, when actions are at the response level, as in traditional iterative refinement, the curse of dimensionality emerges. To resolve this trade-off, we draw inspiration from Model Predictive Control (MPC) in control theory to propose Textual Model Predictive Control (TMPC), a novel predictive planning framework adapted for aligning LLMs at inference time. |
Kuang-Da Wang; Teng-Ruei Chen; Yu Heng Hung; Guo-Xun Ko; Shuoyang Ding; Yueh-Hua Wu; Yu-Chiang Frank Wang; Chao-Han Huck Yang; Wen-Chih Peng; Ping-Chun Hsieh; | code |
| 1101 | Cross-Domain Policy Optimization Via Bellman Consistency and Hybrid Critics Highlight: Despite its potential, cross-domain transfer in RL is known to have two fundamental and intertwined challenges: (i) the source and target domains can have distinct state or action spaces, which makes direct transfer infeasible and thereby requires more sophisticated inter-domain mappings; (ii) the transferability of a source-domain model in RL is not easily identifiable a priori, and hence CDRL can be prone to negative effects during transfer. In this paper, we propose to jointly tackle these two challenges through the lens of *cross-domain Bellman consistency* and a *hybrid critic*. |
Ming-Hong Chen; Kuan-Chen Pan; You-De Huang; Xi Liu; Ping-Chun Hsieh; | code |
| 1102 | CogMoE: Signal-Quality–Guided Multimodal MoE for Cognitive Load Prediction Highlight: In safety-critical tasks such as driving, degraded signal quality can severely compromise prediction accuracy, limiting the deployment of existing models outside controlled lab conditions. To address this challenge, we propose CogMoE, a signal quality–guided Mixture-of-Experts (MoE) framework that dynamically adapts to heterogeneous and noisy inputs. |
Aamir Bader Shah; Yu Wen; Renjie Hu; Jiefu Chen; Jose L Contreras-Vidal; Xuqing Wu; Xin Fu; | code |
| 1103 | Adaptive Mamba Neural Operators Highlight: In this paper, we propose Adaptive Fourier Mamba Operators (AFMO), which integrates reproducing kernels for state-space models (SSMs) rather than the kernel integral formulation of SSMs. |
Zeyuan Song; Zheyu Jiang; | code |
| 1104 | Sheaves Reloaded: A Direction Awakening Highlight: While the GNN literature has shown that incorporating directionality can substantially boost performance in many real-world applications, no SNN approaches with such a capability are known. To address this limitation, we introduce the Directed Cellular Sheaf, a generalized cellular sheaf designed to explicitly account for edge orientations. |
Stefano Fiorini; Hakan Aktas; Iulia Duta; Pietro Morerio; Alessio Del Bue; Pietro Lio; Stefano Coniglio; | code |
| 1105 | SPREAD: Sampling-based Pareto Front Refinement Via Efficient Adaptive Diffusion Highlight: Developing efficient multi-objective optimization methods to compute the Pareto set of optimal compromises between conflicting objectives remains a key challenge, especially for large-scale and expensive problems. To bridge this gap, we introduce SPREAD, a generative framework based on Denoising Diffusion Probabilistic Models (DDPMs). |
Sedjro Salomon Hotegni; Sebastian Peitz; | code |
| 1106 | Features Emerge As Discrete States: The First Application of SAEs to 3D Representations Highlight: We present the first application of SAEs to the 3D domain, analyzing the features used by a state-of-the-art 3D reconstruction VAE applied to 53k 3D models from the Objaverse dataset. |
Albert Miao; Chenliang Zhou; Jiawei Zhou; Cengiz Oztireli; | code |
| 1107 | Neuron-Level Analysis of Cultural Understanding in Large Language Models Highlight: However, LLMs exhibit cultural bias and limited awareness of underrepresented cultures, while the mechanisms underlying their cultural understanding remain underexplored. To fill this gap, we conduct a neuron-level analysis to identify neurons that drive cultural behavior, introducing a gradient-based scoring method with additional filtering for precise refinement. |
Taisei Yamamoto; Ryoma Kumon; Danushka Bollegala; Hitomi Yanaka; | code |
| 1108 | GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models Highlight: We introduce GeoDiv, a framework leveraging large language and vision-language models to assess geographical diversity along two complementary axes: the Socio-Economic Visual Index (SEVI), capturing economic and condition-related cues, and the Visual Diversity Index (VDI), measuring variation in primary entities and backgrounds. |
Abhipsa Basu; Mohana Singh; Shashank Agnihotri; Margret Keuper; Venkatesh Babu Radhakrishnan; | code |
| 1109 | FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models Towards Adam-Scale Speed Highlight: Normalized-SGD, for instance, demonstrates strong empirical performance with greater memory efficiency than Adam. In light of this, we introduce FZOO, a Fast Zeroth-Order Optimizer towards Adam-Scale Speed. |
Sizhe Dang; yangyangGuo; Yanjun Zhao; Xiaodong Zheng; Guang Dai; Ivor Tsang; Haishan Ye; | code |
| 1110 | Neural Sum-of-Squares: Certifying The Nonnegativity of Polynomials with Transformers Highlight: In this work, we introduce the first learning-augmented algorithm to certify the SOS criterion. |
Nico Pelleriti; Christoph Spiegel; Shiwei Liu; David Martínez-Rubio; Max Zimmer; Sebastian Pokutta; | code |
| 1111 | CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models Highlight: We propose CardioComposer: a programmable, inference-time framework for generating multi-class anatomical label maps based on interpretable ellipsoidal primitives. |
Karim Kadry; Shoaib A. Goraya; Ajay Manicka; Abdalla Abdelwahed; Naravich Chutisilp; Farhad R. Nezami; Elazer R Edelman; | code |
| 1112 | DUET: Optimizing LLM Training Data Mixtures Via Noisy Feedback from Unseen, Downstream Evaluation Tasks Highlight: Our paper presents DUET, a novel global-to-local algorithm that optimizes training data mixtures by interleaving data selection with Bayesian optimization to exploit coarse and noisy feedback from a downstream evaluation task. |
Zhiliang Chen; Gregory Kang Ruey Lau; Chuan-Sheng Foo; Bryan Kian Hsiang Low; | code |
| 1113 | ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems Highlight: This work proposes ViTSP, a novel framework that leverages pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. |
Zhuoli Yin; Yi Ding; Reem Khir; Hua Cai; | code |
| 1114 | Aligning Collaborative View Recovery and Tensorial Subspace Learning Via Latent Representation for Incomplete Multi-View Clustering Highlight: To this end, this study proposes a novel IMVC method to Align collaborative view Recovery and tensorial Subspace Learning via latent representation (ARSL-IMVC). |
Youqing Wang; Yu Cao; Jinlu Wang; Xiang Xu; Jiapu Wang; Tengfei Liu; Junbin Gao; Jipeng Guo; | code |
| 1115 | Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks Highlight: In this paper, we provide a challenging benchmark of real-world large-scale fairness-constrained learning tasks, built on top of the US Census (Folktables; Ding et al., 2021). |
Andrii Kliachkin; Jana Lepšová; Gilles Bareilles; Jakub Marecek; | code |
| 1116 | EgoWorld: Translating Exocentric View to Egocentric View Using Rich Exocentric Observations Highlight: However, current exocentric-to-egocentric translation methods are limited by their dependence on 2D cues, synchronized multi-view settings, and unrealistic assumptions such as the necessity of an initial egocentric frame and relative camera poses during inference. To overcome these challenges, we introduce *EgoWorld*, a novel two-stage framework that reconstructs an egocentric view from rich exocentric observations, including projected point clouds, 3D hand poses, and textual descriptions. |
Junho Park; Andrew Sangwoo Ye; Taein Kwon; | code |
| 1117 | ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization Highlight: To address this, we introduce ProofFlow, a novel pipeline that treats structural fidelity as a primary objective. To facilitate evaluation, we present a new benchmark of 184 undergraduate-level problems, manually annotated with step-by-step solutions and logical dependency graphs, and introduce ProofScore, a new composite metric to evaluate syntactic correctness, semantic faithfulness, and structural fidelity. |
Rafael Medeiros Cabral; Tuan Manh Do; Yu Xuejun; Wai Ming Tai; Zijin Feng; SHEN XIN; | code |
| 1118 | Channel-Aware Mixed-Precision Quantization for Efficient Long-Context Inference Highlight: Our analysis reveals that quantization sensitivity varies across individual KV channels, presenting an opportunity for non-uniform bit allocation. Following this finding, we propose ChanMix, a mixed-precision quantization framework that supports channel-wise quantization in a 2-bit setting with FP8 precision, backed by a custom Triton kernel implementation. |
Chengxi Liao; Zeyi Wen; | code |
| 1119 | Detecting Temporal Misalignment Attacks in Multimodal Fusion for Autonomous Driving Highlight: However, its reliance on precise temporal synchronization introduces a vulnerability: adversaries can exploit network-induced delays to subtly misalign sensor streams, degrading MMF performance. To address this, we propose AION, a lightweight, plug-in defense tailored for the autonomous driving scenario. |
Md Hasan Shahriar; Md Mohaimin Al Barat; Harshavardhan Sundar; Ning Zhang; Naren Ramakrishnan; Thomas Hou; Wenjing Lou; | code |
| 1120 | Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling Highlight: This principle motivates atomic HIN, a canonical representation that makes all modeling choices explicit and achieves maximal expressiveness. Building on this foundation, we propose a systematic framework for task-specific schema refinement. |
Shao-En Lin; Ming-Yi Hong; Miao-Chen Chiang; Chih-Yu Wang; Che Lin; | code |
| 1121 | Why We Need New Benchmarks for Local Intrinsic Dimension Estimation Highlight: Our approach employs several techniques to create LID benchmarks for arbitrary domains, including the introduction of a method to transform any manifold into the domain while preserving the manifold structure, thereby addressing challenges posed by biases in neural network-based methods. |
Piotr Tempczyk; Dominik Filipiak; Łukasz Garncarek; Ksawery Smoczyński; Adam Kurpisz; | code |
| 1122 | Let’s Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification Highlight: We show that agreement bias is pervasive across models, resilient to test-time scaling, and can impact existing methods relying on MLLMs as evaluators. We discuss metrics to measure and strategies to mitigate this bias, and introduce Self-Grounded Verification (SGV), a lightweight method that harnesses MLLMs’ own sampling mechanisms by modulating (un)conditional generation to better leverage their knowledge, alignment, and reasoning. |
Moises Andrade; Joonhyuk Cha; Brandon Ho; Vriksha Srihari; Karmesh Yadav; Zsolt Kira; | code |
| 1123 | TusoAI: Agentic Optimization for Scientific Methods Highlight: Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function and autonomously develops and optimizes computational methods for the application. |
Alistair Turcan; Kexin Huang; Lei Li; Martin Jinye Zhang; | code |
| 1124 | Temporal Slowness in Central Vision Drives Semantic Object Learning Highlight: This study investigates the role of central vision and slowness learning in the formation of semantic object representations in humans. |
Timothy Schaumlöffel; Arthur Aubret; Gemma Roig; Jochen Triesch; | code |
| 1125 | ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection Highlight: This limitation underscores the need for under-canopy perspectives better suited for detecting missing persons in such environments. To address this gap, we introduce ForestPersons, a novel large-scale dataset specifically designed for under-canopy person detection. |
Deokyun Kim; Jeongjun Lee; Jungwon Choi; Jonggeon Park; Giyoung Lee; Yookyung Kim; Myungseok Ki; Juho Lee; Jihun Cha; | code |
| 1126 | Bridging Explainability and Embeddings: BEE Aware of Spuriousness Highlight: We introduce BEE (Bridging Explainability and Embeddings), a framework that shifts the focus from model predictions to the weight space and embedding geometry underlying decisions. |
Cristian Daniel Paduraru; Antonio Barbalau; Radu Filipescu; Andrei Liviu Nicolicioiu; Elena Burceanu; | code |
| 1127 | AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models Highlight: In this work, we conduct thorough empirical analysis using effective rank (erank) as a measure of feature diversity and attention score entropy to investigate visual token processing mechanisms and analyze the strengths and weaknesses of each approach. |
Changwoo Baek; Jouwon Song; Sohyeon Kim; Kyeongbo Kong; | code |
| 1128 | Test-time Domain Generalization for Image Super-resolution Highlight: However, existing TTDG methods primarily rely on style transfer strategies operating at a coarse granularity, which prove ineffective for pixel-level prediction tasks such as image super-resolution (SR). To address this limitation, we propose a multi-codebook based test-time domain generalization framework (MC-TTDG). |
Zaizuo Tang; Yu-Bin Yang; | code |
| 1129 | LDT: Layer-Decomposition Training Makes Networks More Generalizable Highlight: We first provide a theoretical analysis of gradient perturbations caused by unstable parameters. Based on this foundation, we propose Layer-Decomposition Training (LDT), which conducts fine-grained layer-wise partitioning guided by parameter instability levels, substantially improving parameter update stability. |
Zaizuo Tang; Zongqi Yang; Yu-Bin Yang; | code |
| 1130 | BrowseNet: Graph-Based Associative Memory for Contextual Information Retrieval Highlight: Traditional retrieval-augmented generation (RAG) approaches often struggle to capture intricate associative patterns and relationships embedded within textual data. To address this limitation, we propose BrowseNet, a novel associative memory framework that leverages query-specific subgraph exploration within a named-entity based graph for enhanced information retrieval. |
PAVAN KUMAR S; Kiran Kumar Nakka; C Vamshi Krishna Reddy; Divyateja Pasupuleti; Prakhar Agarwal; Harpinder Jot Singh; Anshu Avinash; Nirav Pravinbhai Bhatt; | code |
| 1131 | Dual Randomized Smoothing: Beyond Global Noise Variance Highlight: To break through the global variance limitation, we propose a dual RS framework which enables input-dependent noise variances. |
Chenhao Sun; Yuhao Mao; Martin Vechev; | code |
| 1132 | Time-to-Move: Training-Free Motion-Controlled Video Generation Via Dual-Clock Denoising Highlight: We introduce Time-to-Move (TTM), a training-free, plug-and-play framework for motion- and appearance-controlled video generation with image-to-video (I2V) diffusion models. |
Assaf Singer; Noam Rotstein; Amir Mann; Ron Kimmel; Or Litany; | code |
| 1133 | Sharp Monocular View Synthesis in Less Than A Second Highlight: We present SHARP, an approach to photorealistic view synthesis from a single image. |
Lars Mescheder; Wei Dong; Shiwei Li; Xuyang BAI; Marcel Santos; Peiyun Hu; Bruno Lecouat; Mingmin Zhen; Amaël Delaunoy; Tian Fang; Yanghai Tsin; Stephan Richter; Vladlen Koltun; | code |
| 1134 | InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search Highlight: These tasks are highly challenging even for frontier systems like OpenAI o3, which only obtains 42.8% accuracy on o3-bench. To tackle these tasks, we propose InSight-o3, a multi-agent framework that divides labor between a visual reasoning agent (vReasoner) and a visual search agent (vSearcher). |
Kaican Li; Lewei Yao; Jiannan Wu; Tiezheng YU; Jierun Chen; Haoli Bai; Lu Hou; Lanqing HONG; Wei Zhang; Nevin L. Zhang; | code |
| 1135 | It’s All Just Vectorization: Einx, A Universal Notation for Tensor Operations Highlight: Building on the universal nature of vectorization, we introduce einx, a universal notation for tensor operations. |
Florian Fervers; Sebastian Bullinger; Christoph Bodensteiner; Michael Arens; | code |
| 1136 | Probabilistic Kernel Function for Fast Angle Testing Highlight: In this paper, we study the angle testing problem in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. |
Kejing Lu; Chuan Xiao; Yoshiharu Ishikawa; | code |
| 1137 | TabStruct: Measuring Structural Fidelity of Tabular Data Highlight: In this paper, we propose a novel evaluation framework that jointly considers structural fidelity and conventional evaluation dimensions. |
Xiangjian Jiang; Nikola Simidjievski; Mateja Jamnik; | code |
| 1138 | FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference Highlight: We propose FastFlow, a plug-and-play adaptive inference framework that accelerates generation in flow matching models. |
Divya Jyoti Bajpai; Dhruv Bhardwaj; Soumya Roy; Tejas Duseja; Harsh Agarwal; Aashay Sandansing; Manjesh Kumar Hanawal; | code |
| 1139 | Best-of-Infinity: Asymptotic Performance of Test-Time LLM Ensembling Highlight: While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects N based on answer agreement, thereby efficiently allocating inference-time computation. |
Junpei Komiyama; Daisuke Oba; Masafumi Oyamada; | code |
| 1140 | COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations Highlight: In this paper, we explore the underlying mechanism of INR signal representation, leveraging harmonic analysis and Chebyshev polynomials. |
Pandula Thennakoon; Avishka Ranasinghe; Mario De Silva; Buwaneka Epakanda; Roshan Godaliyadda; Mervyn Parakrama Bandara Ekanayake; Vijitha R. Herath; | code |
| 1141 | Thicker and Quicker: The Jumbo Token for Fast Plain Vision Transformers Highlight: We make ViTs faster by reducing patch token width while increasing global token width by adding a new Jumbo token. |
Anthony Fuller; Yousef Yassin; Daniel Kyrollos; Evan Shelhamer; James R Green; | code |
| 1142 | Implicit Regularization of SGD Reduces Shortcut Learning Highlight: In this work, we identify batch size as an additional critical factor and show that robustness gains arise from the implicit regularization of SGD, which intensifies with larger learning rates and smaller batch sizes. |
Nahal Mirzaie; Alireza Alipanah; Ali Abbasi; Amirmahdi Farzane; Hossein Jafarinia; Erfan Sobhaei; Mahdi Ghaznavi; Amir Najafi; Mahdieh Soleymani Baghshah; Mohammad Hossein Rohban; | code |
| 1143 | OmniMouse: Scaling Properties of Multi-modal, Multi-task Brain Models on 150B Neural Tokens Highlight: We train multi-modal, multi-task transformer models (1M–300M parameters) that support three regimes flexibly at test time: neural prediction (predicting neuronal responses from sensory input and behavior), behavioral decoding (predicting behavior from neural activity), neural forecasting (predicting future activity from current neural dynamics), or any combination of the three. |
Konstantin Friedrich Willeke; Polina Turishcheva; Alex Gilbert; Goirik Chakrabarty; Hasan Atakan Bedel; Paul G. Fahey; Yongrong Qiu; Marissa A. Weis; Michaela Vystrčilová; Taliah Muhammad; Lydia Ntanavara; Rachel E Froebe; Kayla Ponder; Zheng Huan Tan; Emin Orhan; Erick Cobos; Sophia Sanborn; Katrin Franke; Fabian H. Sinz; Alexander S. Ecker; Andreas S. Tolias; | code |
| 1144 | LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design Highlight: Low-discrepancy point sets and digital sequences underpin quasi-Monte Carlo (QMC) methods for high-dimensional integration. We cast two long-standing QMC design problems as program synthesis and solve them with an LLM-guided evolutionary loop that mutates and selects code under task-specific fitness: (i) constructing finite 2D/3D point sets with low star discrepancy, and (ii) choosing Sobol’ direction numbers that minimize randomized quasi-Monte Carlo (rQMC) error on downstream integrands. |
Amir Sadikov; | code |
| 1145 | HUME: Measuring The Human-Model Performance Gap in Text Embedding Tasks Highlight: However, such comparisons are rarely made, as human performance on embedding tasks is difficult to measure. To fill this gap, we introduce HUME: Human Evaluation Framework for Text Embeddings. |
Adnan El Assadi; Isaac Chung; Roman Solomatin; Niklas Muennighoff; Kenneth Enevoldsen; | code |
| 1146 | Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning Highlight: However, these methods are highly susceptible to noisy offline demonstrations, leading to unstable skill learning and degraded performance. To address this, we propose Self-Improving Skill Learning (SISL), which performs self-guided skill refinement using decoupled high-level and skill-improvement policies and applies skill prioritization via maximum-return relabeling to focus updates on task-relevant trajectories, resulting in robust and stable adaptation even under noisy and suboptimal data. |
Seungyul Han; Sanghyeon Lee; Sangjun Bae; Yisak Park; | code |
| 1147 | Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX Highlight: We introduce Octax, a high-performance suite of classic arcade game environments implemented in JAX, based on emulation of CHIP-8, a predecessor to Atari, which is widely adopted as a benchmark in RL research. |
Waris Radji; Thomas Michel; Hector Piteau; | code |
| 1148 | CFO: Learning Continuous-Time PDE Dynamics Via Flow-Matched Neural Operators Highlight: We introduce the Continuous Flow Operator (CFO), a framework that learns continuous-time PDE dynamics without the computational burden of standard continuous approaches, e.g., neural ODE. |
Xianglong Hou; Xinquan Huang; Paris Perdikaris; | code |
| 1149 | Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning LLMs Highlight: Quantization-aware training (QAT) enables ultra-low-bit compression (<4 bits per weight), but existing QAT methods often degrade reasoning capability, partly because complex knowledge structures are introduced during the post-training process in LLMs. In this paper, through a systematic investigation of how quantization affects different data domains, we find that its impact on pre-training and reasoning capabilities differs. |
Yasuyuki Okoshi; Hikari Otsuka; Daichi Fujiki; Masato Motomura; | code |
| 1150 | Talking Points: Describing and Localizing Pixels Highlight: We introduce a novel framework for pixel-level grounding. |
Matan Rusanovsky; Shimon Malnick; Shai Avidan; | code |
| 1151 | Implicit Inversion Turns CLIP Into A Decoder Highlight: Our approach optimizes a frequency-aware implicit neural representation that encourages coarse-to-fine generation by stratifying frequencies across network layers. To stabilize this inverse mapping, we introduce adversarially robust initialization, a lightweight Orthogonal Procrustes projection to align local text and image embeddings, and a blending loss that anchors outputs to natural image statistics. |
Antonio D’Orazio; Maria Rosaria Briglia; Donato Crisostomi; Dario Loi; Emanuele Rodolà; Iacopo Masi; | code |
| 1152 | Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning Highlight: Despite this progress, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often converging to suboptimal policies. To address this limitation, we propose Successive Sub-value Q-learning (S2Q), a framework that successively learns multiple sub-value functions to retain information about alternative high-value actions. |
Yonghyeon Jo; Sunwoo Lee; Seungyul Han; | code |
| 1153 | Fused-Planes: Why Train A Thousand Tri-Planes When You Can Share? Highlight: This is because the current approaches independently train one Tri-Plane per object, hence overlooking structural similarities in large classes of objects. In response to this issue, we introduce Fused-Planes, a novel object representation that improves the resource efficiency of Tri-Planes when reconstructing object classes, all while retaining the same planar structure. |
Karim Kassab; Antoine Schnepf; Jean-Yves Franceschi; Laurent Caraffa; Flavian Vasile; Jeremie Mary; Andrew I. Comport; Valerie Gouet-Brunet; | code |
| 1154 | CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints Highlight: We present CryoNet.Refine, an end-to-end, deep learning framework that automates and accelerates molecular structure refinement. |
Fuyao Huang; Xiaozhu Yu; Kui Xu; Qiangfeng Cliff Zhang; | code |
| 1155 | Omni-iEEG: A Large-Scale, Comprehensive IEEG Dataset and Benchmark for Epilepsy Research Highlight: With extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present **Omni-iEEG**, a large-scale, pre-surgical iEEG resource comprising **302 patients** and **178 hours** of high-resolution recordings. |
Chenda Duan; Yipeng Zhang; Sotaro Kanai; Yuanyi Ding; Atsuro Daida; Pengyue Yu; Tiancheng Zheng; Naoto Kuroda; Shaun A. Hussain; Eishi Asano; Hiroki Nariai; Vwani Roychowdhury; | code |
| 1156 | STEER AWAY FROM MODE COLLISIONS: IMPROVING COMPOSITION IN DIFFUSION MODELS Highlight: We propose to improve multi-concept prompt fidelity in text-to-image diffusion models. |
Debottam Dutta; Jianchong Chen; RAJALAXMI RAJAGOPALAN; Yu-Lin Wei; Romit Roy Choudhury; | code |
| 1157 | Toward Conservative Planning from Human-AI Preferences in Reinforcement Learning Highlight: We propose a novel **M**odel-based **C**onservative **P**lanning (MCP) algorithm for offline PbRL, which leverages a general function class and uses a tractable conservative learning framework to improve the policy upon an arbitrary reference policy. |
Huazhong Wang; Wenzhuo Zhou; | code |
| 1158 | Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems Highlight: We introduce an end-to-end approach to learn the evolution operators of large-scale non-linear dynamical systems, such as those describing complex natural phenomena. |
Giacomo Turri; Luigi Bonati; Kai Zhu; Massimiliano Pontil; Pietro Novelli; | code |
| 1159 | LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning Highlight: We present a **L**ightweight **E**fficient **G**r**A**dient **C**ompression strategy**Y** or LEGACY, which, in theory, can work with any compression technique to produce a simple dynamic counterpart. |
Mostapha Essoullami; El houcine Bergou; Aritra Dutta; | code |
| 1160 | Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model Highlight: We propose **E**volutionary **C**aching to **A**ccelerate **D**iffusion models (ECAD), a genetic algorithm that learns efficient per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. |
Anirud Aggarwal; Abhinav Shrivastava; Matthew Gwilliam; | code |
| 1161 | TRIBE: TRImodal Brain Encoder for Whole-brain FMRI Response Prediction Highlight: Here, we introduce TRIBE, the first deep neural network trained to predict brain responses to stimuli across multiple modalities, cortical areas and individuals. |
Stéphane d’Ascoli; Jérémy Rapin; Yohann Benchetrit; Hubert Banville; Jean-Remi King; | code |
| 1162 | ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation Highlight: We formalise this approach as *context modification* and present ContextBench — a benchmark with tasks assessing core method capabilities and potential safety applications. |
Robert Graham; Edward Stevinson; Leo Richter; Alexander Chia; Joseph Miller; Joseph Isaac Bloom; | code |
| 1163 | Why Adversarially Train Diffusion Models? Highlight: Adversarial Training (AT) is a known, powerful, well-established technique for improving classifier robustness to input perturbations, yet its applicability beyond discriminative settings remains limited. Motivated by the widespread use of score-based generative models and their need to operate robustly under substantial noisy or corrupted input data, we propose an adaptation of AT for these models, providing a thorough empirical assessment. |
Maria Rosaria Briglia; Mujtaba Hussain Mirza; Giuseppe Lisanti; Iacopo Masi; | code |
| 1164 | Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning Highlight: We propose Noise-space Hamiltonian Monte Carlo (N-HMC), a posterior sampling method that treats reverse diffusion as a deterministic mapping from initial noise to clean images. |
Yingzhi Xia; Setthakorn Tanomkiattikun; Liangli Zhen; ZAIWANG GU; | code |
| 1165 | D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Highlight: We present D2E (Desktop to Embodied AI), a framework that demonstrates desktop interactions can serve as an effective pretraining substrate for robotics embodied AI tasks. We will make all our work public, including the OWA toolkit, human-collected and pseudo-labeled datasets, and VAPT-trained models. |
Suhwan Choi; Jaeyoon Jung; Haebin Seong; Minchan Kim; Minyeong Kim; Yongjun Cho; Yoonshik Kim; Park Yu Been; Youngjae Yu; Yunsung Lee; | code |
| 1166 | Context Tokens Are Anchors: Understanding The Repeat Curse in DMLLMs from An Information Flow Perspective Highlight: Our work reveals three key findings: (1) context tokens aggregate semantic information as anchors and guide the final predictions; (2) as information propagates across layers, the entropy of context tokens converges in deeper layers, reflecting the model’s growing prediction certainty; (3) repetition is typically linked to disruptions in the information flow of context tokens and to the inability of their entropy to converge in deeper layers. Based on these insights, we present CoTA, a plug-and-play method for mitigating repetition. |
Qiyan Zhao; Xiaofeng Zhang; Shuochen Chang; Qianyu Chen; Xiaosong Yuan; Xuhang Chen; Luoqi Liu; Jiajun Zhang; Xu-Yao Zhang; Da-Han Wang; | code |
| 1167 | BigMaQ: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations Highlight: Especially for non-human primates, the animals phylogenetically closest to humans, mesh-based tracking efforts lag behind those for other species, leaving pose descriptions restricted to sparse keypoints that are unable to fully capture the richness of action dynamics. To address this gap, we introduce the *Big Macaque 3D Motion and Animation Dataset* (`BigMac3D`), a large-scale dataset comprising more than 750 scenes of interacting rhesus macaques with detailed 3D pose descriptions of skeletal joint rotations. |
Lucas Martini; Alexander Lappe; Anna Bognár; Rufin Vogels; Martin A. Giese; | code |
| 1168 | Self-Guided Low Light Object Detection Framework Highlight: Object detection in low-light environments is inherently challenging due to limited contrast and heavy noise, both of which significantly degrade feature representations. In this paper, we propose a novel self-guided low-light object detection framework that effectively addresses these issues without introducing additional parameters or increasing inference time. |
Gwangik Shin; Jaeha Song; Soonmin Hwang; | code |
| 1169 | SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting Highlight: We present SSD-GS, a physically-based relighting framework built upon 3D Gaussian Splatting (3DGS) that achieves high-quality reconstruction and photorealistic relighting under novel lighting conditions. |
Iris Zheng; Guojun Tang; Alexander Doronin; Paul D. Teal; Fang-Lue Zhang; | code |
| 1170 | Patronus: Interpretable Diffusion Models with Prototypes Highlight: With a critical question — how can the diffusion generation process be interpreted and understood? — we propose *Patronus*, an interpretable diffusion model that incorporates a prototypical network to encode semantics in visual patches, revealing *what* visual patterns are learned and *where* and *when* they emerge throughout denoising. |
Nina Weng; Aasa Feragen; Siavash Bigdeli; | code |
| 1171 | Regret-Guided Search Control for Efficient Learning in AlphaZero Highlight: We propose Regret-Guided Search Control (RGSC), which extends AlphaZero with a regret network that learns to identify high-regret states, where the agent’s evaluation diverges most from the actual outcome. |
Yun-Jui Tsai; Wei-Yu Chen; Yan-Ru Ju; Yu-Hung Chang; Ti-Rong Wu; | code |
| 1172 | UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking Highlight: We present UniTrack, a plug-and-play graph-theoretic loss function designed to significantly enhance multi-object tracking (MOT) performance by directly optimizing tracking-specific objectives through unified differentiable learning. |
Bishoy Galoaa; Xiangyu Bai; Utsav Nandi; Sai Siddhartha Vivek Dhir Rangoju; Somaieh Amraee; Sarah Ostadabbas; | code |
| 1173 | Revisiting Tree-Sliced Wasserstein Distance Through The Lens of The Fermat–Weber Problem Highlight: This enhanced spatial sensitivity enables TSW to reflect the geometric structure of the underlying data more accurately. Building on this insight, we propose a novel variant of TSW that explicitly leverages positional information in its design. |
Viet-Hoang Tran; Thanh Tran; Thanh Chu; Trung-Khang Tran; Duy-Tung Pham; Tam Le; Tan Minh Nguyen; | code |
| 1174 | Tree-sliced Sobolev IPM Highlight: In this work, we revisit Sobolev integral probability metrics (IPM) on trees to obtain a practical generalization of TSW. |
Viet-Hoang Tran; Thanh Tran; Thanh Chu; Duy-Tung Pham; Trung-Khang Tran; Tam Le; Tan Minh Nguyen; | code |
| 1175 | Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells Highlight: Specifically, we introduce topological descriptors for graphs, simplices, and cells that interleave a sequence of inclusions with a sequence of contractions and related families parametrized by two functions. |
Mattie Ji; Indradyumna Roy; Vikas K Garg; | code |
| 1176 | Let OOD Feature Exploring Vast Predefined Classifiers Highlight: We propose Vast Predefined Classifiers (VPC), which constructs a pre-specified Orthogonal Equiangular Feature Space (OEFS) to explicitly separate ID and OOD representations while capturing the rich variability of OOD features. |
Kewen Xia; Xiaodong Yue; Wei Zhipeng; Yaxin Peng; Zihao Li; Jianxiang Zhu; Jie Shi; Peilin Xu; | code |
| 1177 | SPICE: Submodular Penalized Information–Conflict Selection for Efficient Large Language Model Training Highlight: Guided by this analysis, we propose SPICE, a conflict-aware selector that maximizes information while penalizing misalignment, and that supports early stopping and proxy models for efficiency. |
Powei Chang; Jinpeng Zhang; Bowen Chen; Chenyu Wang; Chenlu Guo; Yixing Zhang; Yukang Gao; JianXiang Xiang; Yue Gao; Chaoqun Sun; Yiyi Chen; Dongying Kong; | code |
| 1178 | TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with Temporal-Aware Multimodal Models Highlight: To explore how to improve the performance of methods on both tasks simultaneously by enhancing long-range temporal understanding capabilities, we introduce **TAMMs**, the first unified framework designed to jointly perform TCD and FSIF within a single MLLM-diffusion architecture. |
Zhongbin Guo; Yuhao Wang; Ping Jian; Chengzhi Li; Xinyue Chen; Zhen Yang; Ertai E; | code |
| 1179 | Composition of Pretrained Diffusion Models: A Logic-Based Calculus Highlight: We expose the inadequacy and inconsistency of combining these operators in terms of limited mode coverage, biased sampling, instability under negation queries, and failure to satisfy basic compositional laws such as idempotency and distributivity. We introduce a principled calculus grounded in fuzzy logic that resolves these issues. |
Peter Blohm; Vikas K Garg; | code |
| 1180 | Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models Highlight: We introduce a forecast-first UQ framework that blends the empirical distribution with a frozen pretrained generator using a unique Dirichlet schedule, ensuring time-consistent forecasts. |
Fernando Ruiz-Mazo; Vikas K Garg; | code |