NeurIPS 2025 Papers with Code & Data
To facilitate rapid community engagement with the presented research, we have compiled an extensive index of accepted papers with associated public code or data repositories, all of which are listed in the table below. This index was generated using an automated extraction process; while we strive for completeness, some papers with public resources may have been missed. Please let us know if you discover any additional papers that should be included. Note that some code repositories may not be made fully public until the conference officially begins.
In addition to this index, we encourage readers to explore our related resources:
- NeurIPS-2025 papers & highlights: curated summaries and key takeaways from this year’s conference.
- “Best Paper” Digest (NeurIPS): a historical overview of the most influential NeurIPS papers published since 1987.
This curated list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that delivers personalized, comprehensive daily paper digests on the latest research in your field. It also empowers you to read and write articles, get answers, conduct literature reviews, and generate research reports.
Experience the full potential of our services today!
TABLE 1: NeurIPS 2025 Papers with Code & Data
| # | Paper | Author(s) | Code |
|---|---|---|---|
| 1 | Perception Encoder: The Best Visual Embeddings Are Not at The Output of The Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To draw them out, we introduce two alignment methods: language alignment for multimodal language modeling, and spatial alignment for dense prediction. We release our models, code, and a novel dataset of synthetically and human-annotated videos: https://github.com/facebookresearch/perception_models |
Daniel Bolya; Po-Yao Huang; Peize Sun; Jang Hyun Cho; Andrea Madotto; Chen Wei; Tengyu Ma; Jiale Zhi; Jathushan Rajasegaran; Hanoona Abdul Rasheed; Junke Wang; Marco Monteiro; Hu Xu; Shiyu Dong; Nikhila Ravi; Shang-Wen Li; Piotr Dollar; Christoph Feichtenhofer; | code |
| 2 | Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct comprehensive experiments to systematically investigate gating-augmented softmax attention variants. |
Zihan Qiu; Zekun Wang; Bo Zheng; Zeyu Huang; Kaiyue Wen; Songlin Yang; Rui Men; Le Yu; Fei Huang; Suozhi Huang; Dayiheng Liu; Jingren Zhou; Junyang Lin; | code |
| 3 | Large Language Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing *LLaDA*, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. |
Shen Nie; Fengqi Zhu; Zebin You; Xiaolu Zhang; Jingyang Ou; Jun Hu; JUN ZHOU; Yankai Lin; Ji-Rong Wen; Chongxuan Li; | code |
| 4 | Efficient Part-level 3D Object Generation Via Dual Volume Packing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key challenge is that different objects may have a varying number of parts. To address this, we propose a new end-to-end framework for part-level 3D object generation. |
Jiaxiang Tang; Ruijie Lu; Max Li; Zekun Hao; Xuan Li; Fangyin Wei; Shuran Song; Gang Zeng; Ming-Yu Liu; Tsung-Yi Lin; | code |
| 5 | Composing Global Solutions to Reasoning Tasks Via Algebraic Objects in Neural Nets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove rich algebraic structures of the solution space for 2-layer neural networks with quadratic activation and $L_2$ loss, trained on reasoning tasks in Abelian group (e.g., modular addition). |
Yuandong Tian; | code |
| 6 | Show-o2: Improved Native Unified Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents improved native unified multimodal models, i.e., Show-o2, that leverage autoregressive modeling and flow matching. |
Jinheng Xie; Zhenheng Yang; Mike Zheng Shou; | code |
| 7 | Compute-Optimal Scaling for Value-Based Deep RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate compute scaling for online, value-based deep RL. |
Preston Fu; Oleh Rybkin; Zhiyuan Zhou; Michal Nauman; Pieter Abbeel; Sergey Levine; Aviral Kumar; | code |
| 8 | MoBA: Mixture of Block Attention for Long-Context LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a solution that adheres to the “less structure” principle, allowing the model to determine where to attend autonomously, rather than introducing predefined biases. |
Enzhe Lu; Zhejun Jiang; Jingyuan Liu; Yulun Du; Tao Jiang; Chao Hong; Shaowei Liu; Weiran He; Enming Yuan; Yuzhi Wang; Zhiqi Huang; Huan Yuan; Suting Xu; Xinran Xu; Guokun Lai; Yanru Chen; Huabin Zheng; Junjie Yan; Jianlin Su; Yuxin Wu; Yutao Zhang; Zhilin Yang; Xinyu Zhou; Mingxing Zhang; Jiezhong Qiu; | code |
| 9 | Tiled Flash Linear Attention: More Efficient Linear RNN and XLSTM Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Tiled Flash Linear Attention (TFLA), a novel kernel algorithm for linear RNNs, that enables arbitrary large chunk sizes and high arithmetic intensity by introducing an additional level of sequence parallelization within each chunk. |
Maximilian Beck; Korbinian Pöppel; Phillip Lippe; Sepp Hochreiter; | code |
| 10 | PrefixKV: Adaptive Prefix KV Cache Is What Vision Instruction-Following Models Need for Efficient Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in significant contextual information loss for certain layers, leading to a notable performance decline. To address this, we present PrefixKV. It reframes the challenge of determining KV cache sizes for all layers into the task of searching for the optimal global prefix configuration. |
Ao Wang; Hui Chen; Jianchao Tan; Kefeng Zhang; Xunliang Cai; Zijia Lin; Jungong Han; Guiguang Ding; | code |
| 11 | Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models Via Monocular Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Fin3R, a simple, effective, and general fine-tuning method for feed-forward 3D reconstruction models. |
Weining Ren; Hongjun Wang; Xiao Tan; Kai Han; | code |
| 12 | Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these modules improve real-time and control capabilities, it remains an open question whether they preserve or degrade the semantic knowledge contained in the pretrained VLM, and what effect they have on the VLA training dynamics. In this paper, we study this question in the context of VLAs that include a continuous diffusion or flow matching action expert, showing that naively including such experts significantly harms both training speed and knowledge transfer. |
Danny Driess; Jost Tobias Springenberg; brian ichter; LILI YU; Adrian Li-Bell; Karl Pertsch; Allen Z. Ren; Homer Walke; Quan Vuong; Lucy Xiaoyang Shi; Sergey Levine; | code |
| 13 | Sampling 3D Molecular Conformers with Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, applying DiTs to molecules introduces novel challenges, such as integrating discrete molecular graph information with continuous 3D geometry, handling Euclidean symmetries, and designing conditioning mechanisms that generalize across molecules of varying sizes and structures. We propose DiTMC, a framework that adapts DiTs to address these challenges through a modular architecture that separates the processing of 3D coordinates from conditioning on atomic connectivity. |
Thorben Frank; Winfried Ripken; Gregor Lied; Klaus Robert Muller; Oliver T. Unke; Stefan Chmiela; | code |
| 14 | FlashBias: Fast Computation of Attention with Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diving into the computation of FlashAttention, we prove that its optimal efficiency is determined by the rank of the attention weight matrix. Inspired by this theoretical result, this paper presents FlashBias based on the low-rank compressed sensing theory, which can provide fast-exact computation for many widely used attention biases and a fast-accurate approximation for biases in general formalizations. |
Haixu Wu; Minghao Guo; Yuezhou Ma; Yuanxu Sun; Jianmin Wang; Wojciech Matusik; Mingsheng Long; | code |
| 15 | VividFace: A Robust and High-Fidelity Video Face Swapping Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further enhance identity and pose disentanglement, we introduce and release the Attribute-Identity Disentanglement Triplet (AIDT) dataset, comprising a large-scale collection of triplets where each set contains three face images—two sharing the same pose and two sharing the same identity. |
Hao Shao; Shulun Wang; Yang Zhou; Guanglu Song; Dailan He; Zhuofan Zong; Shuo Qin; Yu Liu; Hongsheng Li; | code |
| 16 | FastVID: Dynamic Density Pruning for Fast Video Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge this gap, we perform a systematic analysis of video redundancy from two perspectives: temporal context and visual context. Leveraging these insights, we propose Dynamic Density Pruning for Fast Video LLMs termed FastVID. |
Leqi Shen; Guoqiang Gong; Tao He; Yifeng Zhang; pengzhang liu; Sicheng Zhao; Guiguang Ding; | code |
| 17 | Diffusion Beats Autoregressive in Data-Constrained Settings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings—where training involves repeated passes over limited data—and find that they significantly outperform AR models when compute is abundant but data is scarce. |
Mihir Prabhudesai; Mengning Wu; Amir Zadeh; Katerina Fragkiadaki; Deepak Pathak; | code |
| 18 | Mulberry: Empowering MLLM with O1-like Reasoning and Reflection Via Collective Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. |
Huanjin Yao; Jiaxing Huang; Wenhao Wu; Jingyi Zhang; Yibo Wang; Shunyu Liu; Yingjie Wang; YuXin Song; Haocheng Feng; Li Shen; Dacheng Tao; | code |
| 19 | Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce BTB3D (Better Tokens for Better 3D), a causal convolutional encoder-decoder that unifies 2D and 3D training and inference while producing compact, frequency-aware volumetric tokens. |
Ibrahim Ethem Hamamci; Sezgin Er; Suprosanna Shit; Hadrien Reynaud; Dong Yang; Pengfei Guo; Marc Edgar; Daguang Xu; Bernhard Kainz; Bjoern Menze; | code |
| 20 | FEAT: Free Energy Estimators with Adaptive Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Free energy Estimators with Adaptive Transport (FEAT), a novel framework for free energy estimation—a critical challenge across scientific domains. |
Yuanqi Du; Jiajun He; Francisco Vargas; Yuanqing Wang; Carla P Gomes; José Miguel Hernández-Lobato; Eric Vanden-Eijnden; | code |
| 21 | Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Retrieval-Augmented Generation (RAG) has demonstrated effectiveness in processing long context for Large Language Models (LLMs); however, applying RAG to long video faces challenges such as disrupted temporal dependencies and inclusion of irrelevant information that can hinder accurate reasoning. To address these limitations, we propose Vgent, a novel **graph-based retrieval-reasoning-augmented generation framework** to enhance LVLMs for long video understanding. |
Xiaoqian Shen; Wenxuan Zhang; Jun Chen; Mohamed Elhoseiny; | code |
| 22 | RAD: Training An End-to-End Driving Policy Via Large-Scale 3DGS-based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose RAD, a 3DGS-based closed-loop Reinforcement Learning (RL) framework for end-to-end Autonomous Driving. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. |
Hao Gao; Shaoyu Chen; Bo Jiang; Bencheng Liao; Yiang Shi; Xiaoyang Guo; Yuechuan Pu; haoran yin; Xiangyu Li; xinbang zhang; ying zhang; Wenyu Liu; Qian Zhang; Xinggang Wang; | code |
| 23 | BiggerGait: Unlocking Gait Recognition with Layer-wise Representations from Large Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our analysis reveals that LVM’s intermediate layers offer complementary properties across tasks; integrating them yields an impressive improvement even without rich, well-designed gait priors. Building on this insight, we propose a simple and universal baseline for LVM-based gait recognition, termed BiggerGait. |
Dingqiang Ye; Chao Fan; Zhanbo Huang; Chengwen Luo; Jianqiang Li; Shiqi Yu; Xiaoming Liu; | code |
| 24 | R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models Via Share-GRPO Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to incentivize the reasoning ability of Multimodal Large Language Models (MLLMs) via reinforcement learning (RL) and develop an effective approach that mitigates the sparse reward and advantage vanishing issues during RL. |
Huanjin Yao; Qixiang Yin; Jingyi Zhang; Min Yang; Yibo Wang; Wenhao Wu; Fei Su; Li Shen; Minghui Qiu; Dacheng Tao; Jiaxing Huang; | code |
| 25 | Scalable Best-of-N Selection for Large Language Models Via Self-Certainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Reward-free alternatives, like self-consistency and universal self-consistency, are limited in their ability to handle open-ended generation tasks or scale effectively. To address these limitations, we propose self-certainty, a novel and efficient metric that leverages the inherent probability distribution of LLM outputs to estimate response quality without requiring external reward models. |
Zhewei Kang; Xuandong Zhao; Dawn Song; | code |
| 26 | QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose leveraging Singular-Value Decomposition (SVD) over the joint query (Q), key (K), and value (V) weight matrices to reduce KV cache size and computational overhead. |
Yutong Wang; Haiyu Wang; Sai Qian Zhang; | code |
| 27 | Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers. |
Liliang Ren; Congcong Chen; Haoran Xu; Young Jin Kim; Adam Atkinson; Zheng Zhan; Jiankai Sun; Baolin Peng; Liyuan Liu; Shuohang Wang; Hao Cheng; Jianfeng Gao; Weizhu Chen; yelong shen; | code |
| 28 | Reward Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Reward Reasoning Models (RRMs), which are specifically designed to execute a deliberate reasoning process before generating final rewards. |
Jiaxin Guo; Zewen Chi; Li Dong; Qingxiu Dong; Xun Wu; Shaohan Huang; Furu Wei; | code |
| 29 | Exploring Diffusion Transformer Designs Via Grafting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present *grafting*, a simple approach for editing pretrained diffusion transformers (DiTs) to materialize new architectures under small compute budgets. |
Keshigeyan Chandrasegaran; Michael Poli; Daniel Y Fu; Dongjun Kim; Lea M. Hadzic; Manling Li; Agrim Gupta; Stefano Massaroli; Azalia Mirhoseini; Juan Carlos Niebles; Stefano Ermon; Li Fei-Fei; | code |
| 30 | Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning Via Selective Rollouts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our analysis of reward dynamics reveals a strong temporal consistency in prompt value: prompts that are uninformative in one epoch of training are likely to remain uninformative in near future epochs. Based on these insights, we propose GRESO (GRPO with Efficient Selective Rollout), an online, lightweight pre-rollout filtering algorithm that predicts and skips uninformative prompts using reward training dynamics. |
Haizhong Zheng; Yang Zhou; Brian R. Bartoldson; Bhavya Kailkhura; Fan Lai; Jiawei Zhao; Beidi Chen; | code |
| 31 | Faster Video Diffusion with Trainable Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Scaling video diffusion transformers (DiTs) is limited by their quadratic 3D attention, even though most of the attention mass concentrates on a small subset of positions. We turn this observation into VSA, a trainable, hardware-efficient sparse attention that replaces full attention at both training and inference. |
Peiyuan Zhang; Yongqi Chen; Haofeng Huang; Will Lin; Zhengzhong Liu; Ion Stoica; Eric P. Xing; Hao Zhang; | code |
| 32 | Continuous Concepts Removal in Text-to-image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, these methods lead to poor alignment between the text prompts and the generated image after the continuous removal process. To address this issue, we propose a novel concept removal approach called CCRT that includes a designed knowledge distillation paradigm. |
Tingxu Han; Weisong Sun; Yanrong Hu; Chunrong Fang; Yonglong zhang; Shiqing Ma; Tao Zheng; Zhenyu Chen; Zhenting Wang; | code |
| 33 | A Smooth Sea Never Made A Skilled SAILOR: Robust Imitation Via Learning to Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we explore *learning to search* (L2S) from expert demonstrations, i.e. learning the components required to, at test time, plan to match expert outcomes, even after making a mistake. |
Arnav Kumar Jain; Vibhakar Mohta; Subin Kim; Atiksh Bhardwaj; Juntao Ren; Yunhai Feng; Sanjiban Choudhury; Gokul Swamy; | code |
| 34 | Scaling Speculative Decoding with Lookahead Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is that reasoning models generate step-by-step, and each step needs only to be semantically correct, not exact token matching. |
Yichao Fu; Rui Ge; Zelei Shao; Zhijie Deng; Hao Zhang; | code |
| 35 | Efficiently Scaling LLM Reasoning Programs with Certaindex Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the same time, we observe that these algorithms exhibit answer stabilization: their intermediate solutions often cease to change after a certain point, and further investment of compute does not change their final answer. To quantify this phenomenon, we introduce Certaindex, an algorithm-agnostic metric measuring this evolving stability, signaling when further computation is unlikely to alter the final result. |
Yichao Fu; Junda Chen; Siqi Zhu; Zheyu Fu; Zhongdongming Dai; Yonghao Zhuang; Yian Ma; Aurick Qiao; Tajana Rosing; Ion Stoica; Hao Zhang; | code |
| 36 | BLEUBERI: BLEU Is A Surprisingly Effective Reward for Instruction Following Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this insight, we develop BLEUBERI, a method that first identifies challenging instructions and then applies Group Relative Policy Optimization (GRPO) using BLEU directly as the reward function. We release our code and data at https://github.com/lilakk/BLEUBERI. |
Yapei Chang; Yekyung Kim; Michael Krumdick; Amir Zadeh; Chuan Li; Chris Tanner; Mohit Iyyer; | code |
| 37 | A-Mem: Agentic Memory for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, these systems’ fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. |
Wujiang Xu; Zujie Liang; Kai Mei; Hang Gao; Juntao Tan; Yongfeng Zhang; | code |
| 38 | YOLOv12: Attention-Centric Real-Time Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms. |
Yunjie Tian; Qixiang Ye; David Doermann; | code |
| 39 | Multi-Agent Collaboration Via Evolving Orchestration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator (puppeteer) dynamically directs agents (puppets) in response to evolving task states. |
Yufan Dang; Chen Qian; Xueheng Luo; Jingru Fan; Zihao Xie; Ruijie Shi; Weize Chen; Cheng Yang; Xiaoyin Che; Ye Tian; Xuantang Xiong; Lei Han; Zhiyuan Liu; Maosong Sun; | code |
| 40 | FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce FlexWorld, a novel framework that progressively constructs a persistent 3D Gaussian splatting representation by synthesizing and integrating new 3D content. |
Luxi Chen; Zihan Zhou; Min Zhao; Yikai Wang; Ge Zhang; Wenhao Huang; Hao Sun; Ji-Rong Wen; Chongxuan Li; | code |
| 41 | Anchored Diffusion Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that this performance gap arises when important tokens (e.g., key words or low-frequency words that anchor a sentence) are masked early in the forward process, limiting contextual information for accurate reconstruction. To address this, we introduce the *Anchored Diffusion Language Model (ADLM)*, a novel two-stage framework that first predicts distributions over important tokens via an anchor network, and then predicts the likelihoods of missing tokens conditioned on the anchored predictions. |
Litu Rout; Constantine Caramanis; Sanjay Shakkottai; | code |
| 42 | Thinkless: LLM Learns When to Think Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates an open question: Can LLMs learn when to think? To answer this, we propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning, based on both task complexity and the model’s ability. |
Gongfan Fang; Xinyin Ma; Xinchao Wang; | code |
| 43 | Meta CLIP 2: A Worldwide Scaling Recipe Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present Meta CLIP 2, the first recipe training CLIP from scratch on worldwide web-scale image-text pairs. |
Yung-Sung Chuang; Yang Li; Dong Wang; Ching-Feng Yeh; Kehan Lyu; Ramya Raghavendra; James R. Glass; LIFEI HUANG; Jason E Weston; Luke Zettlemoyer; Xinlei Chen; Zhuang Liu; Saining Xie; Wen-tau Yih; Shang-Wen Li; Hu Xu; | code |
| 44 | LaViDa: A Large Diffusion Model for Vision-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LaViDa, a family of VLMs built on DMs. |
Shufan Li; Konstantinos Kallidromitis; Hritik Bansal; Akash Gokul; Yusuke Kato; Kazuki Kozuka; Jason Kuen; Zhe Lin; Kai-Wei Chang; Aditya Grover; | code |
| 45 | Latent Policy Barrier: Learning Robust Visuomotor Policies By Staying In-Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Latent Policy Barrier, a framework for robust visuomotor policy learning. |
Zhanyi Sun; Shuran Song; | code |
| 46 | Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove convergence to the near-optimal KL-regularized policy from arbitrary off-policy data via an improved change-of-trajectory-measure analysis. |
Yurun Yuan; Fan Chen; Zeyu Jia; Alexander Rakhlin; Tengyang Xie; | code |
| 47 | Magical: Medical Lay Language Generation Via Semantic Invariance and Layperson-tailored Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, through a series of exploratory experiments, we reveal that standard LoRA fails to meet the requirements for semantic fidelity and diverse lay-style generation in the MLLG task. To address these limitations, we propose Magical, an asymmetric LoRA architecture tailored for MLLG under heterogeneous data scenarios. |
Weibin Liao; Tianlong Wang; Yinghao Zhu; Yasha Wang; Junyi Gao; Liantao Ma; | code |
| 48 | SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SeerAttention, a simple yet effective attention mechanism that directly learns the block-level attention sparsity from the LLM itself. |
Yizhao Gao; Zhichen Zeng; DaYou Du; Shijie Cao; Peiyuan Zhou; Jiaxing Qi; Junjie Lai; Hayden Kwok-Hay So; Ting Cao; Fan Yang; Mao Yang; | code |
| 49 | SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, most problem synthesis strategies indiscriminately expand the problem set without considering the model’s capabilities, leading to low efficiency in generating useful questions. To mitigate this issue, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. |
Xiao Liang; Zhong-Zhi Li; Yeyun Gong; Yang Wang; Hengyuan Zhang; yelong shen; Ying Nian Wu; Weizhu Chen; | code |
| 50 | SimpleStrat: Diversifying Language Model Generation with Stratification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the model’s next-token probabilities may not be representative of the true answer distribution. To combat these challenges, we propose SimpleStrat, an alternative that uses the language model itself to partition the solution space into strata from which to sample. |
Justin Wong; Yury Orlovskiy; Alexander Shypula; Michael Luo; Sanjit A. Seshia; Joseph E. Gonzalez; | code |
| 51 | Radial Attention: $\mathcal{O}(n \log n)$ Sparse Attention for Long Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify a phenomenon we term Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as the spatial and temporal distance between tokens increases, akin to the physical decay of signal or waves over space and time in nature. |
Xingyang Li; Muyang Li; Tianle Cai; Haocheng Xi; Shuo Yang; Yujun Lin; Lvmin Zhang; Songlin Yang; Jinbo Hu; Kelly Peng; Maneesh Agrawala; Ion Stoica; Kurt Keutzer; Song Han; | code |
| 52 | DecompNet: Enhancing Time Series Forecasting Models with Implicit Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we pioneer the idea of implicit decomposition. |
Donghao Luo; Xue Wang; | code |
| 53 | GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **GUI-Actor**, a VLM-based method for coordinate-free GUI grounding. |
Qianhui Wu; Kanzhi Cheng; Rui Yang; Chaoyun Zhang; Jianwei Yang; Huiqiang Jiang; Jian Mu; Baolin Peng; Bo Qiao; Reuben Tan; Si Qin; Lars Liden; Qingwei Lin; Huan Zhang; Tong Zhang; Jianbing Zhang; Dongmei Zhang; Jianfeng Gao; | code |
| 54 | RLVR-World: Training World Models with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, standard training objectives such as maximum likelihood estimation (MLE) often misalign with task-specific goals of world models, i.e., transition prediction metrics like accuracy or perceptual quality. In this paper, we present RLVR-World, a unified framework that leverages reinforcement learning with verifiable rewards (RLVR) to directly optimize world models for such metrics. |
Jialong Wu; Shaofeng Yin; Ningya Feng; Mingsheng Long; | code |
| 55 | MR. Video: MapReduce As An Effective Principle for Long Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This intuition is noted in the well-established MapReduce principle in big data processing and is naturally compatible with inference scaling at the system level. Motivated by this, we propose MR. Video (pronounced as mister video), a long video understanding framework adopting the MapReduce principle. |
Ziqi Pang; Yu-Xiong Wang; | code |
| 56 | OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce **Workforce**, a hierarchical multi-agent framework that decouples strategic planning from specialized execution through a modular architecture comprising: *(i)* a *domain-agnostic* **Planner** for task decomposition, *(ii)* a **Coordinator** for subtask management, and *(iii)* specialized **Workers** with *domain-specific* tool-calling capabilities. |
Mengkang Hu; Yuhang Zhou; Wendong Fan; Yuzhou Nie; Ziyu Ye; Bowei Xia; Tao Sun; Zhaoxuan Jin; Yingru Li; Zeyu Zhang; Yifeng Wang; Qianshuo Ye; Bernard Ghanem; Ping Luo; Guohao Li; | code |
| 57 | First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning Via Unsupervised Post-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent efforts have explored this direction, their methods are complex and difficult to iterate. To address this, we propose MM-UPT, a simple yet effective framework for unsupervised post-training of MLLMs, enabling continual self-improvement without any external supervision. |
Lai Wei; Yuting Li; Chen Wang; Yue Wang; Linghe Kong; Weiran Huang; Lichao Sun; | code |
| 58 | ContextAgent: Context-Aware Proactive LLM Agents with Open-world Sensory Perceptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce ContextAgent, the first context-aware proactive agent that incorporates extensive sensory contexts surrounding humans to enhance the proactivity of LLM agents. |
Bufang Yang; Lilin Xu; Liekang Zeng; Kaiwei Liu; Siyang Jiang; Wenrui Lu; Hongkai Chen; Xiaofan Jiang; Guoliang Xing; Zhenyu Yan; | code |
| 59 | WebThinker: Empowering Large Reasoning Models with Deep Research Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their reliance on static internal knowledge limits their performance on complex, knowledge-intensive tasks and hinders their ability to produce comprehensive research reports requiring synthesis of diverse web information. To address this, we propose WebThinker, a deep research agent that empowers LRMs to autonomously search the web, navigate among web pages, and draft reports during the reasoning process. |
Xiaoxi Li; Jiajie Jin; Guanting Dong; Hongjin Qian; Yongkang Wu; Ji-Rong Wen; Yutao Zhu; Zhicheng Dou; | code |
| 60 | DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: System-level defenses have recently shown promise by enforcing static or predefined policies, but they still face two key challenges: the ability to dynamically update security rules and the need for memory stream isolation. To address these challenges, we propose DRIFT, a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control- and data-level constraints. |
Hao Li; Xiaogeng Liu; CHIU Hung Chun; Dianqi Li; Ning Zhang; Chaowei Xiao; | code |
| 61 | Interpretable Next-token Prediction Via The Generalized Induction Head Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While large transformer models excel in predictive performance, their lack of interpretability restricts their usefulness in high-stakes domains. To remedy this, we propose the Generalized Induction-Head Model (GIM), an interpretable model for next-token prediction inspired by the observation of “induction heads” in LLMs. |
Eunji Kim; Sriya Mantena; Weiwei Yang; Chandan Singh; Sungroh Yoon; Jianfeng Gao; | code |
| 62 | UniTraj: Learning A Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its potential, data preparation, pre-training strategy development, and architectural design present significant challenges in constructing this model. Therefore, we introduce **UniTraj**, a Universal Trajectory foundation model that aims to address these limitations through three key innovations. |
Yuanshao Zhu; James Jianqiao Yu; Xiangyu Zhao; Xun Zhou; Liang Han; Xuetao Wei; Yuxuan Liang; | code |
| 63 | Reasoning Beyond Points: A Visual Introspective Approach for Few-Shot 3D Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, scene complexity and insufficient representation of local geometric structures pose significant challenges to PC-FSS. To address these issues, we propose a novel pre-training-free Visual Introspective Prototype Segmentation network (VIP-Seg). |
Changshuo Wang; Shuting He; Xiang Fang; Zhijian Hu; JIA-HONG HUANG; Yixian Shen; Prayag Tiwari; | code |
| 64 | FedWMSAM: Fast and Flat Federated Learning Via Weighted Momentum and Sharpness-Aware Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose **FedWMSAM** to address both failure modes. |
Tianle Li; Yongzhi Huang; Linshan Jiang; Chang Liu; Qipeng Xie; Wenfeng Du; Lu Wang; Kaishun Wu; | code |
| 65 | HumanoidGen: Data Generation for Bimanual Dexterous Manipulation Via LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents HumanoidGen, an automated task creation and demonstration collection framework that leverages atomic dexterous operations and LLM reasoning to generate relational constraints. In experiments, we create a novel benchmark with augmented scenarios to evaluate the quality of the collected data. |
Zhi Jing; Siyuan Yang; Jicong Ao; Ting Xiao; Yu-Gang Jiang; Chenjia Bai; | code |
| 66 | Test3R: Learning to Reconstruct 3D at Test Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **Test3R**, a surprisingly simple test-time learning technique that significantly boosts geometric accuracy. |
Yuheng Yuan; Qiuhong Shen; Shizun Wang; Xingyi Yang; Xinchao Wang; | code |
| 67 | SAFE: Multitask Failure Detection for Vision-Language-Action Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the multitask failure detection problem and propose SAFE, a failure detector for generalist robot policies such as VLAs. |
Qiao Gu; Yuanliang Ju; Shengxiang Sun; Igor Gilitschenski; Haruki Nishimura; Masha Itkina; Florian Shkurti; | code |
| 68 | Think or Not? Selective Reasoning Via Reinforcement Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the human-like thinking process—where people skip reasoning for easy questions but think carefully when needed—we explore how to enable VLMs to first decide *when reasoning is necessary*. To realize this, we propose a two-stage training strategy: **(i)** a supervised fine-tuning (SFT) stage with a simple yet effective “**thought dropout**” operation, where reasoning traces are randomly replaced with empty thoughts. |
Jiaqi WANG; Kevin Qinghong Lin; James Cheng; Mike Zheng Shou; | code |
| 69 | QuadricFormer: Scene As Superquadrics for 3D Semantic Occupancy Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a probabilistic superquadric mixture model, which interprets each superquadric as an occupancy probability distribution with a corresponding geometry prior, and calculates semantics through probabilistic mixture. Building on this, we present QuadricFormer, a superquadric-based model for efficient 3D occupancy prediction, and introduce a pruning-and-splitting module to further enhance modeling efficiency by concentrating superquadrics in occupied regions. |
Sicheng Zuo; Wenzhao Zheng; Xiaoyong Han; Longchao Yang; Yong Pan; Jiwen Lu; | code |
| 70 | VCM: Vision Concept Modeling with Adaptive Vision Token Compression Via Instruction Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this limitation, we introduce the concept of a vision concept model, a novel paradigm that enables LVLMs to dynamically extract the most relevant vision concepts from complex inputs, based on task-specific instructions. To optimize this vision concept modeling process, we propose VCM, a self-supervised framework that leverages vision-language correlations across diverse instances. |
Run Luo; Renke Shan; Longze Chen; Ziqiang Liu; Lu Wang; Min Yang; Xiaobo Xia; | code |
| 71 | Self-Adapting Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce **Se**lf-**A**dapting **L**LMs (SEAL), a framework that enables LLMs to self-adapt by generating their own finetuning data and update directives. |
Adam Zweiger; Jyothish Pari; Han Guo; Yoon Kim; Pulkit Agrawal; | code |
| 72 | Curly Flow Matching for Learning Non-gradient Field Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Curly Flow Matching (Curly-FM), a novel approach that is capable of learning non-gradient field dynamics by designing and solving a Schrödinger bridge problem with a non-zero drift reference process—in stark contrast to typical zero-drift reference processes—which is constructed using inferred velocities in addition to population snapshot data. |
Katarina Petrović; Lazar Atanackovic; Viggo Moro; Kacper Kapuśniak; Ismail Ilkan Ceylan; Michael M. Bronstein; Joey Bose; Alexander Tong; | code |
| 73 | Geometry Aware Operator Transformer As An Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the existence of many operator learning algorithms to approximate such PDEs, we find that accurate models are not necessarily computationally efficient and vice versa. We address this issue by proposing a geometry aware operator transformer (GAOT) for learning PDEs on arbitrary domains. |
Shizheng Wen; Arsh Kumbhat; Levi Lingsch; Sepehr Mousavi; Yizhou Zhao; Praveen Chandrashekar; Siddhartha Mishra; | code |
| 74 | Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. |
Zinan Lin; Enshu Liu; Xuefei Ning; Junyi Zhu; Wenyu Wang; Sergey Yekhanin; | code |
| 75 | UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce UI-Genie, a self-improving framework addressing two key challenges in GUI agents: verification of trajectory outcomes is challenging, and high-quality training data are not scalable. We open-source our complete framework implementation and generated datasets to facilitate further research at https://github.com/Euphoria16/UI-Genie. |
Han Xiao; Guozhi Wang; Yuxiang Chai; Zimu Lu; Weifeng Lin; Hao He; Lue Fan; Liuyang Bian; Rui Hu; Liang Liu; Shuai Ren; Yafei Wen; Xiaoxin Chen; Aojun Zhou; Hongsheng Li; | code |
| 76 | PLSTM: Parallelizable Linear Source Transition Mark Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extend the notion of multi-dimensionality to linear RNNs. |
Korbinian Pöppel; Richard Freinschlag; Thomas Schmied; Wei Lin; Sepp Hochreiter; | code |
| 77 | ESCA: Contextualizing Embodied Agents Via Scene-Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing MLLMs do not reliably capture fine-grained links between low-level visual features and high-level textual semantics, leading to weak grounding and inaccurate perception. To overcome this challenge, we propose ESCA, a framework that contextualizes embodied agents by grounding their perception in spatial-temporal scene graphs. |
Jiani Huang; Amish Sethi; Matthew Kuo; Mayank Keoliya; Neelay Velingker; JungHo Jung; Ser-Nam Lim; Ziyang Li; Mayur Naik; | code |
| 78 | UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the second challenge, we propose a frequency-aware post-training method that enhances fine-detail generation in T2I diffusion models. |
Chen Zhao; En Ci; Yunzhe Xu; Tiehan Fan; Shanyan Guan; Yanhao Ge; Jian Yang; Ying Tai; | code |
| 79 | SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methodologies primarily utilize supervised fine-tuning (SFT) to train the NL2SQL model, which may limit adaptability and interpretability in new environments (e.g., finance and healthcare). In order to enhance the reasoning performance of the NL2SQL model in the above complex situations, we introduce SQL-R1, a novel NL2SQL reasoning model trained by the reinforcement learning (RL) algorithms. |
Peixian MA; Xialie Zhuang; Chengjin Xu; Xuhui Jiang; Ran Chen; Jian Guo; | code |
| 80 | OASIS: One-Shot Federated Graph Learning Via Wasserstein Assisted Knowledge Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, they struggle to integrate the intricate correlations necessary and transfer subtle structural insights from each client to the global model. To address these issues, we introduce **OASIS**, an innovative one-shot FGL framework. |
Guancheng Wan; Jiaru Qian; Wenke Huang; Qilin Xu; Xianda Guo; Boheng Li; Guibin Zhang; Bo Du; Mang Ye; | code |
| 81 | Multi-order Orchestrated Curriculum Distillation for Model-Heterogeneous Federated Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing model-heterogeneous approaches primarily target Euclidean data and fail to account for a crucial aspect of graph-structured data: topological relationships. To address this limitation, we propose **TRUST**, a novel knowledge distillation-based **model-heterogeneous FGL** framework. |
Guancheng Wan; Xu Cheng; Run Liu; Wenke Huang; Zitong Shi; Pinyi Jin; Guibin Zhang; Bo Du; Mang Ye; | code |
| 82 | MOTION: Multi-Sculpt Evolutionary Coarsening for Federated Continual Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods, however, neither preserve graph topology during task transitions nor mitigate parameter conflicts in server‐side aggregation. To overcome these challenges, we introduce **MOTION**, a generalizable FCGL framework that integrates two complementary modules: the Graph Topology‐preserving Multi‐Sculpt Coarsening (G‐TMSC) module, which maintains the structural integrity of past graphs through a multi‐expert, similarity‐guided fusion process, and the Graph‐Aware Evolving Parameter Adaptive Engine (G‐EPAE) module, which refines global model updates by leveraging a topology‐sensitive compatibility matrix. |
Guancheng Wan; Fengyuan Ran; Ruikang Zhang; Wenke Huang; Xuankun Rong; Guibin Zhang; Yuxin Wu; Bo Du; Mang Ye; | code |
| 83 | Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts. |
Zheng Zhan; Liliang Ren; Shuohang Wang; Liyuan Liu; Yang Liu; Yeyun Gong; Yanzhi Wang; yelong shen; | code |
| 84 | KGGen: Extracting Knowledge Graphs from Plain Text with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present KGGen, a novel text-to-knowledge-graph generator that uses language models to extract high-quality graphs from plain text with a novel entity resolution approach that clusters related entities, significantly reducing the sparsity problem that plagues existing extractors. Along with KGGen, we release Measure of Information in Nodes and Edges (MINE), the first benchmark to test an extractor’s ability to produce a useful KG from plain text. |
Belinda Mo; Kyssen Yu; Joshua Kazdan; Proud Mpala; Lisa Yu; Charilaos I. Kanatsoulis; Sanmi Koyejo; | code |
| 85 | DynaAct: Large Language Model Reasoning with Dynamic Action Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework named DynaAct for automatically constructing a compact action space to enhance sequential reasoning in complex problem-solving scenarios. |
Xueliang Zhao; Wei Wu; Jian Guan; Qintong Li; Lingpeng Kong; | code |
| 86 | Spot The Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the effectiveness of existing methods in evaluating image authenticity and locating forgeries, these approaches often lack human interpretability and do not fully address the growing complexity of synthetic data. To tackle these challenges, we introduce FakeVLM, a specialized large multimodal model designed for both general synthetic image and DeepFake detection tasks. |
Siwei Wen; Junyan Ye; Peilin Feng; Hengrui Kang; Zichen Wen; Yize Chen; Jiang Wu; wenjun wu; Conghui He; Weijia Li; | code |
| 87 | Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the first training-free algorithm for online routing scenarios. |
Fangzhou Wu; Sandeep Silwal; | code |
| 88 | CPPO: Accelerating The Training of Group Relative Policy Optimization-Based Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO). |
ZhiHang Lin; Mingbao Lin; Yuan Xie; Rongrong Ji; | code |
| 89 | PANDA: Towards Generalist Video Anomaly Detection Via Agentic AI Engineer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PANDA, an agentic AI engineer based on MLLMs. |
Zhiwei Yang; Chen Gao; Mike Zheng Shou; | code |
| 90 | Web-Shepherd: Advancing PRMs for Reinforcing Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the importance of speed and cost-effectiveness, prior works have utilized MLLMs as reward models, which poses significant constraints for real-world deployment. To address this, in this work, we propose the first process reward model (PRM) called Web-Shepherd, which can assess web navigation trajectories at the step level. |
Hyungjoo Chae; Sunghwan Kim; Junhee Cho; Seungone Kim; Seungjun Moon; Gyeom Hwangbo; Dongha Lim; Minjin Kim; Yeonjun Hwang; Minju Gwak; Dongwook Choi; Minseok Kang; Gwanhoon Im; ByeongUng Cho; Hyojun Kim; Jun Hee Han; Taeyoon Kwon; Minju Kim; Beong-woo Kwak; Dongjin Kang; Jinyoung Yeo; | code |
| 91 | RAST: Reasoning Activation in LLMs Via Small-model Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To verify our hypothesis, we conduct a token-level analysis of decoding trajectories and find high alignment in RL-induced output distributions across model scales, validating our hypothesis. Motivated by this, we propose RAST, a simple yet effective method that transfers reasoning behaviors by injecting RL-induced probability adjustments from a small RL-trained model into larger models. |
Siru Ouyang; Xinyu Zhu; Zilin Xiao; Minhao Jiang; Yu Meng; Jiawei Han; | code |
| 92 | OPHR: Mastering Volatility Trading with Multi-Agent Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first reinforcement learning (RL) framework specifically designed for volatility trading through options, focusing on profit from the difference between implied volatility and realized volatility. |
Zeting Chen; Xinyu Cai; Molei Qin; Bo An; | code |
| 93 | Red-Teaming Text-to-Image Systems By Rule-based Preference Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A significant challenge is how to evade unknown and diverse defense mechanisms. To overcome this difficulty, we propose a novel Rule-based Preference modeling Guided Red-Teaming (RPG-RT), which iteratively employs LLM to modify prompts to query and leverages feedback from T2I systems for fine-tuning the LLM. |
Yichuan Cao; Yibo Miao; Xiao-Shan Gao; Yinpeng Dong; | code |
| 94 | STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance on high-resolution image synthesis. Building on this foundation, we introduce a set of architectural and algorithmic innovations that significantly enhance the scalability: (1) a deep-shallow design where a deep Transformer block captures most of the model’s capacity, followed by a few shallow Transformer blocks that are computationally cheap yet contribute non-negligibly, (2) learning in the latent space of pretrained autoencoders, which proves far more effective than modeling pixels directly, and (3) a novel guidance algorithm that substantially improves sample quality. |
Jiatao Gu; Tianrong Chen; David Berthelot; Huangjie Zheng; Yuyang Wang; Ruixiang ZHANG; Laurent Dinh; Miguel Ángel Bautista; Joshua M. Susskind; Shuangfei Zhai; | code |
| 95 | Scaling RL to Long Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a full-stack framework that scales up reasoning in vision-language models (VLMs) to long videos, leveraging reinforcement learning. |
Yukang Chen; Wei Huang; Baifeng Shi; Qinghao Hu; Hanrong Ye; Ligeng Zhu; Zhijian Liu; Pavlo Molchanov; Jan Kautz; XIAOJUAN QI; Sifei Liu; Hongxu Yin; Yao Lu; Song Han; | code |
| 96 | Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to enable the collaborative co-evolution of visual comprehension and generation, advancing image generation into an iterative introspective process. |
Kaihang Pan; Yang Wu; Wendong Bu; Kai Shen; Juncheng Li; Yingting Wang; liyunfei; Siliang Tang; Jun Xiao; Fei Wu; ZhaoHang; Yueting Zhuang; | code |
| 97 | GuardReasoner-VL: Safeguarding VLMs Via Reinforced Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the safety of VLMs, this paper introduces a novel reasoning-based VLM guard model dubbed GuardReasoner-VL. We release data, code, and models (3B/7B) of GuardReasoner-VL: https://github.com/yueliu1999/GuardReasoner-VL. |
Yue Liu; Shengfang Zhai; Mingzhe Du; Yulin Chen; Tri Cao; Hongcheng Gao; Cheng Wang; Xinfeng Li; Kun Wang; Junfeng Fang; Jiaheng Zhang; Bryan Hooi; | code |
| 98 | MacOSWorld: A Multilingual Interactive Benchmark for GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gaps, we present macOSWorld, the first comprehensive benchmark for evaluating GUI agents on macOS. |
Pei Yang; Hai Ci; Mike Zheng Shou; | code |
| 99 | SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Self-play Reinforcement Learning (SeRL) to bootstrap LLM training with limited initial data. |
Wenkai Fang; Shunyu Liu; Yang Zhou; Kongcheng Zhang; Tongya Zheng; Kaixuan Chen; Mingli Song; Dacheng Tao; | code |
| 100 | TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Speech tokenizers serve as foundational components for speech language models, yet current designs exhibit several limitations, including: (1) dependence on multi-layer residual vector quantization structures or high frame rates, (2) reliance on auxiliary pre-trained models for semantic distillation, and (3) requirements for complex two-stage training processes. In this work, we introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (***TaDiCodec***), a novel approach designed to overcome these challenges. |
Yuancheng Wang; Dekun Chen; Xueyao Zhang; Junan Zhang; Jiaqi Li; Zhizheng Wu; | code |
| 101 | Metis: A Foundation Speech Generation Model with Masked Generative Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ***Metis***, a foundation model for unified speech generation. |
Yuancheng Wang; Jiachen Zheng; Junan Zhang; Xueyao Zhang; Huan Liao; Zhizheng Wu; | code |
| 102 | OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The lack of high-quality omnimodal datasets and the challenges of real-time emotional speech synthesis have notably hindered progress in open-source research. To address these limitations, we introduce OpenOmni, a two-stage training framework that integrates omnimodal alignment and speech generation to develop a state-of-the-art omnimodal large language model. |
Run Luo; Ting-En Lin; Haonan Zhang; Yuchuan Wu; Xiong Liu; Yongbin Li; Longze Chen; Jiaming Li; Lei Zhang; Xiaobo Xia; Hamid Alinejad-Rokny; Fei Huang; Min Yang; | code |
| 103 | MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we propose Coral, a novel fine-tuning framework that adapts pre-trained MLLMs by integrating embedding reconstruction to preserve fine-grained conditional elements and contrastive learning to extract comprehensive global semantics. |
Wei Chow; Yuan Gao; Linfeng Li; Xian Wang; Qi Xu; Hang Song; Lingdong Kong; Ran Zhou; Yi Zeng; Yidong Cai; Botian Jiang; Shilin Xu; Jiajunzhang; Minghui Qiu; Xiangtai Li; Tianshu Yang; Siliang Tang; Juncheng Li; | code |
| 104 | OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose OmniCast, a scalable and skillful probabilistic model that unifies weather forecasting across timescales. |
Tung Nguyen; Tuan Pham; Troy Arcomano; Rao Kotamarthi; Ian Foster; Sandeep Madireddy; Aditya Grover; | code |
| 105 | Parallel Scaling Law for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We theoretically propose a new scaling law and validate it through large-scale pre-training, which shows that a model with $P$ parallel streams is similar to scaling the parameters by $\mathcal O(\log P)$ while showing superior inference efficiency. |
Mouxiang Chen; Binyuan Hui; Zeyu Cui; Jiaxi Yang; Dayiheng Liu; Jianling Sun; Junyang Lin; Zhongxin Liu; | code |
| 106 | Reinforcement Learning for Reasoning in Large Language Models with One Training Example Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the math reasoning capabilities of large language models (LLMs). |
Yiping Wang; Qing Yang; Zhiyuan Zeng; Liliang Ren; Liyuan Liu; Baolin Peng; Hao Cheng; Xuehai He; Kuan Wang; Jianfeng Gao; Weizhu Chen; Shuohang Wang; Simon Shaolei Du; yelong shen; | code |
| 107 | MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, existing methods often rely on computationally expensive sampling procedures and classifier-free guidance (CFG), resulting in slow inference. To address these limitations, we propose **MiniMax-Remover**, a novel two-stage video object removal approach. |
Bojia Zi; Weixuan Peng; Xianbiao Qi; Jianan Wang; Shihao Zhao; Rong Xiao; Kam-Fai Wong; | code |
| 108 | Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we go beyond attention or similarity by proposing a novel visual token pruning method named **CDPruner**, which maximizes the conditional diversity of retained tokens. |
Qizhe Zhang; Mengzhen Liu; Lichen Li; Ming Lu; Yuan Zhang; Junwen Pan; Qi She; Shanghang Zhang; | code |
| 109 | On Reasoning Strength Planning in Large Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we provide explanations for this phenomenon from the perspective of model activations. |
Leheng Sheng; An Zhang; Zijian Wu; Weixiang Zhao; Changshuo Shen; Yi Zhang; Xiang Wang; Tat-Seng Chua; | code |
| 110 | Thoughts Are All Over The Place: On The Underthinking of Long Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. |
Yue Wang; Qiuzhi Liu; Jiahao Xu; Tian Liang; Xingyu Chen; Zhiwei He; Linfeng Song; Dian Yu; Juntao Li; Zhuosheng Zhang; Rui Wang; Zhaopeng Tu; Haitao Mi; Dong Yu; | code |
| 111 | Zip2zip: Inference-Time Adaptive Tokenization Via Online Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce zip2zip, a novel method for achieving context-adaptive tokenization in LLMs at inference time. |
Saibo Geng; Nathan Ranchin; Yunzhen Yao; Maxime Peyrard; Chris Wendler; Michael Gastpar; Robert West; | code |
| 112 | Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we delve into chain of step reasoning for vision-language models, enabling accurate assessment of reasoning step quality and leading to effective reinforcement learning and inference-time scaling with fine-grained rewards. |
Honghao Chen; Xingzhou Lou; Xiaokun Feng; Kaiqi Huang; Xinlong Wang; | code |
| 113 | AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, draft models often struggle to fully assimilate the target model’s knowledge due to capacity constraints, leading to suboptimal performance. To address this challenge, we propose AdaSPEC, a novel method that incorporates selective token filtering into the KD process. |
Yuezhou Hu; Jiaxin Guo; Xinyu Feng; Tuo Zhao; | code |
| 114 | Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models Via Multi-Stage RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on R1-style distilled models, we observe that inserting a simple ellipsis (…) into the prompt can stochastically trigger either a thinking or no-thinking mode, revealing a latent controllability in the reasoning behavior. Leveraging this property, we propose AutoThink, a multi-stage reinforcement learning (RL) framework that progressively optimizes reasoning policies via stage-wise reward shaping. |
Songjun Tu; Jiahao Lin; Qichao Zhang; Xiangyu Tian; Linjing Li; Xiangyuan Lan; Dongbin Zhao; | code |
| 115 | Teaching Language Models to Reason with Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While integrating computational tools such as Code Interpreters (CIs) offers a promising solution, it introduces a critical challenge: a conflict between the model’s internal, probabilistic reasoning and the external, deterministic knowledge provided by the CI, which often leads models to unproductive deliberation. To overcome this, we introduce CoRT (Code-Optimized Reasoning Training), a post-training framework designed to teach LRMs to effectively utilize CIs. |
Chengpeng Li; Zhengyang Tang; Ziniu Li; Mingfeng Xue; Keqin Bao; Tian Ding; Ruoyu Sun; Benyou Wang; Xiang Wang; Junyang Lin; Dayiheng Liu; | code |
| 116 | PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. |
Xiaogang Jia; Qian Wang; Anrui Wang; Han A. Wang; Balázs Gyenes; Emiliyan Gospodinov; Xinkai Jiang; Ge Li; Hongyi Zhou; Weiran Liao; Xi Huang; Maximilian Beck; Moritz Reuss; Rudolf Lioutikov; Gerhard Neumann; | code |
| 117 | ConfTuner: Training Large Language Models to Express Their Confidence Verbally Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the notion of proper scoring rules for calibration in classical machine learning models, we introduce ConfTuner, a simple and efficient fine-tuning method that introduces minimal overhead and does not require ground-truth confidence scores or proxy confidence estimates. |
Yibo Li; Miao Xiong; Jiaying Wu; Bryan Hooi; | code |
| 118 | ChemOrch: Empowering LLMs with Chemical Intelligence Via Groundbreaking Synthetic Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Empowering large language models (LLMs) with chemical intelligence remains a challenge due to the scarcity of high-quality, domain-specific instruction-response datasets and the misalignment of existing synthetic data generation pipelines with the inherently hierarchical and rule-governed structure of chemical information. To address this, we propose ChemOrch, a framework that synthesizes chemically grounded instruction–response pairs through a two-stage process: task-controlled instruction generation and tool-aware response construction. |
Yue Huang; Zhengzhe Jiang; Xiaonan Luo; Kehan Guo; Haomin Zhuang; Yujun Zhou; Zhengqing Yuan; Xiaoqi Sun; Jules Schleinitz; Yanbo Wang; Shuhao Zhang; Mihir Surve; Nitesh V Chawla; Olaf Wiest; Xiangliang Zhang; | code |
| 119 | HYPERION: Fine-Grained Hypersphere Alignment for Robust Federated Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Combining strategies at both levels, we present our robust FGL framework, **HYPERION**, which operates all components within a unified hyperspherical space. |
Guancheng Wan; Xiaoran Shang; Yuxin Wu; Guibin Zhang; Jinhe Bi; Liangtao Zheng; Xin Lin; Yue Liu; Yanbiao Ma; Wenke Huang; Bo Du; | code |
| 120 | Activity Pruning for Efficient Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we specifically target the reduction of neuronal activity, which directly leads to lower computational cost and facilitates efficient SNN deployment on neuromorphic hardware. |
Tong Bu; Xinyu Shi; Zhaofei Yu; | code |
| 121 | Token Bottleneck: One Token to Remember Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Token Bottleneck (ToBo), a simple yet intuitive self-supervised learning pipeline that squeezes a scene into a bottleneck token and predicts the subsequent scene using minimal patches as hints. |
Taekyung Kim; Dongyoon Han; Byeongho Heo; Jeongeun Park; Sangdoo Yun; | code |
| 122 | Machine Unlearning in 3D Generation: A Perspective-Coherent Acceleration Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable a more efficient unlearning process, we introduce a skip-acceleration mechanism, which leverages the similarity between multi-view generated images to bypass redundant computations. |
Shixuan Wang; Jingwen Ye; Xinchao Wang; | code |
| 123 | SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the problem hindering VLMs’ spatial understanding abilities and propose SD-VLM, a novel framework that significantly enhances fundamental spatial perception abilities of VLMs through two key contributions: (1) we propose the Massive Spatial Measuring and Understanding (MSMU) dataset with precise spatial annotations, and (2) we introduce a simple depth positional encoding method that strengthens VLMs’ spatial awareness. |
Pingyi Chen; Yujing Lou; Shen Cao; Jinhui Guo; Lubin Fan; Yue Wu; Lin Yang; Lizhuang Ma; Jieping Ye; | code |
| 124 | Adversarial Graph Fusion for Incomplete Multi-view Semi-supervised Learning with Tensorial Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate SCP, we propose a novel incomplete multi-view semi-supervised learning method, termed AGF-TI. |
Zhangqi Jiang; Tingjin Luo; Xu Yang; Xinyan Liang; | code |
| 125 | LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems Via Linked Entities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our approach, LaM-SLidE (Latent Space Modeling of Spatial Dynamical Systems via Linked Entities), bridges the gap between: (1) keeping the traceability of individual entities in a latent system representation, and (2) leveraging the efficiency and scalability of recent advances in image and video generation, where pre-trained encoder and decoder enable generative modeling directly in latent space. |
Florian Sestak; Artur P. Toshev; Andreas Fürst; Günter Klambauer; Andreas Mayr; Johannes Brandstetter; | code |
| 126 | PhySense: Sensor Placement Optimization for Accurate Physics Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placements, leaving the mutual enhancement between reconstruction and placement on the shelf. To change this suboptimal practice, we propose PhySense, a synergistic two-stage framework that learns to jointly reconstruct physical fields and to optimize sensor placements, both aiming for accurate physics sensing. |
Yuezhou Ma; Haixu Wu; Hang Zhou; Huikun Weng; Jianmin Wang; Mingsheng Long; | code |
| 127 | Group-Level Data Selection for Efficient Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *Group-MATES*, an efficient group-level data selection approach to optimize the speed-quality frontier of language model pretraining. To train this model, we sample training trajectories of the language model and collect oracle data influences alongside. |
Zichun Yu; Fei Peng; Jie Lei; Arnold Overwijk; Wen-tau Yih; Chenyan Xiong; | code |
| 128 | RAT: Bridging RNN Efficiency and Attention Accuracy Via Chunk-based Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recurrent models offer high efficiency, but compressing the full sequence into a fixed-size and holistic representation can suffer from memory degradation in long contexts and limit fine-grained retrieval. To address this, we propose RAT, an intermediate design that bridges the efficiency of RNNs and capacity of attention. |
Xiuying Wei; Anunay Yadav; Razvan Pascanu; Caglar Gulcehre; | code |
| 129 | Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel self-rewarding reinforcement learning framework to enhance Large Language Model (LLM) reasoning by leveraging the consistency of intermediate reasoning states across different reasoning trajectories. |
Kongcheng Zhang; QI YAO; Shunyu Liu; Yingjie Wang; Baisheng Lai; Jieping Ye; Mingli Song; Dacheng Tao; | code |
| 130 | P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce the Performance Law (P-Law) for SR models, which predicts model performance across various settings, intending to provide a quantitative framework for guiding the parameter optimization of future models. |
Tingjia Shen; Hao Wang; Chuhan Wu; Jin Yao Chin; Wei Guo; Yong Liu; Huifeng Guo; Defu Lian; Ruiming Tang; Enhong Chen; | code |
| 131 | Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Taking language-vision learning as an example, we show here how scaling law derivation can also be used for model and dataset comparison, allowing one to decide which procedure is to be preferred for pre-training. |
Marianna Nezhurina; Tomer Porian; Giovanni Puccetti; Tommie Kerssies; Romain Beaumont; Mehdi Cherti; Jenia Jitsev; | code |
| 132 | PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite advancements in learning-based methods, existing approaches often face critical limitations, including suboptimal agent coordination, poor generalization, and high computational latency. To address these issues, we propose PARCO (Parallel AutoRegressive Combinatorial Optimization), a general reinforcement learning framework designed to construct high-quality solutions for multi-agent combinatorial tasks efficiently. |
Federico Berto; Chuanbo Hua; Laurin Luttmann; Jiwoo Son; Junyoung Park; Kyuree Ahn; Changhyun Kwon; Lin Xie; Jinkyoo Park; | code |
| 133 | AnimateQR: Bridging Aesthetics and Functionality in Dynamic QR Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce AnimateQR, **the first generative framework** for creating **animated QR codes** that balance aesthetic flexibility with scannability. |
Guangyang Wu; Huayu Zheng; Siqi Luo; Guangtao Zhai; Xiaohong Liu; | code |
| 134 | Transformers for Mixed-type Event Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a simple yet powerful Marked Temporal Point Process (MTPP) framework for modeling event sequences with flexible structure, using a single unified model. |
Felix Draxler; Yang Meng; Kai Nelson; Lukas Laskowski; Yibo Yang; Theofanis Karaletsos; Stephan Mandt; | code |
| 135 | ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce ReasonFlux-PRM, a novel trajectory-aware PRM explicitly designed to evaluate the trajectory-response type of reasoning traces. |
Jiaru Zou; Ling Yang; Jingwen Gu; Jiahao Qiu; Ke Shen; Jingrui He; Mengdi Wang; | code |
| 136 | OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-modal learning framework, termed OmniSegmentor. It has two key innovations: 1) Based on ImageNet, we assemble a large-scale dataset for multi-modal pretraining, called OmniSegmentor, which contains five popular visual modalities; 2) We provide an efficient pretraining approach to endow the model with the capacity to encode different modality information in the OmniSegmentor. |
Bo-Wen Yin; Jiao-Long Cao; Xuying Zhang; Yuming Chen; Ming-Ming Cheng; Qibin Hou; | code |
| 137 | L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their success, next-token prediction (NTP), the dominant method for LLM training and inference, is constrained in both contextual coverage and inference efficiency due to its inherently sequential process. To overcome these challenges, we propose leap multi-token prediction (L-MTP), an innovative token prediction method that extends the capabilities of multi-token prediction (MTP) by introducing a leap-based mechanism. |
Xiaohao Liu; Xiaobo Xia; Weixiang Zhao; Manyi Zhang; Xianzhi Yu; Xiu Su; Shuo Yang; See-Kiong Ng; Tat-Seng Chua; | code |
| 138 | Continual Multimodal Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate CMCL through two specialized principles of stability and plasticity. |
Xiaohao Liu; Xiaobo Xia; See-Kiong Ng; Tat-Seng Chua; | code |
| 139 | $Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce $Q\sharp$, a value-based algorithm for KL-regularized RL that guides the reference policy using the optimal regularized $Q$ function. |
Jin Peng Zhou; Kaiwen Wang; Jonathan Daniel Chang; Zhaolin Gao; Nathan Kallus; Kilian Q Weinberger; Kianté Brantley; Wen Sun; | code |
| 140 | Image Editing As Programs with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our research highlights a key challenge: these models particularly struggle with structurally-inconsistent edits that involve substantial layout changes. To address this gap, we introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. |
Yujia Hu; Songhua Liu; Zhenxiong Tan; Xingyi Yang; Xinchao Wang; | code |
| 141 | Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contributions include: 1) we bridge the independent modulation between temporal and channel dimensions, facilitating joint feature correlation learning, and 2) we access the global self-similar patterns in large-scale remote sensing imagery to infer spatial attention weights, incorporating effective priors for realistic and faithful reconstruction. |
Yi Xiao; Qiangqiang Yuan; Kui Jiang; Wenke Huang; Qiang Zhang; Tingting Zheng; Chia-Wen Lin; Liangpei Zhang; | code |
| 142 | Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Virtual Fitting Room (VFR), a novel video generative model that produces arbitrarily long virtual try-on videos. |
Jun-Kun Chen; Aayush Bansal; Minh Phuoc Vo; Yu-Xiong Wang; | code |
| 143 | Right Question Is Already Half The Answer: Fully Unsupervised LLM Reasoning Incentivization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Entropy Minimized Policy Optimization (EMPO), which makes an early attempt at fully unsupervised LLM reasoning incentivization. |
Qingyang Zhang; Haitao Wu; Changqing Zhang; Peilin Zhao; Yatao Bian; | code |
| 144 | Walking The Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging this framework, we propose a novel autonomous counterfact-aware RFT that systematically decouples beneficial distribution adaptation from harmful concept drift through concept graph-empowered LLM experts generating counterfactual reasoning trajectories. |
Xiaoyu Yang; Jie Lu; En Yu; | code |
| 145 | TS-RAG: Retrieval-Augmented Generation Based Time Series Foundation Models Are Stronger Zero-Shot Forecaster Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present TS-RAG, a retrieval-augmented generation framework for time series forecasting that enhances the generalization and interpretability of TSFMs. |
Kanghui Ning; Zijie Pan; Yu Liu; Yushan Jiang; James Y. Zhang; Kashif Rasul; Anderson Schneider; Lintao Ma; Yuriy Nevmyvaka; Dongjin Song; | code |
| 146 | TreeSynth: Synthesizing Diverse Data from Scratch Via Tree-Guided Subspace Partitioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the great potential of large language models (LLMs) for data synthesis, current approaches are constrained by limited seed data, model biases and low-variation prompts, resulting in limited diversity and biased distribution with the increase of data scales. To tackle this challenge, we introduce TreeSynth, a tree-guided subspace-based data synthesis approach inspired by decision trees. |
Sheng Wang; Pengan CHEN; Jingqi Zhou; Qintong Li; Jingwei Dong; Jiahui Gao; Boyang XUE; Jiyue Jiang; Lingpeng Kong; Chuan Wu; | code |
| 147 | Adjoint Schrödinger Bridge Sampler Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **Adjoint Schrödinger Bridge Sampler (ASBS)**, a new diffusion sampler that employs simple and scalable matching-based objectives yet without the need to estimate target samples during training. |
Guan-Horng Liu; Jaemoo Choi; Yongxin Chen; Benjamin Kurt Miller; Ricky T. Q. Chen; | code |
| 148 | Learning Conformational Ensembles of Proteins Based on Backbone Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a flow matching model for sampling protein conformations based solely on backbone geometry – BBFlow. |
Nicolas Wolf; Leif Seute; Vsevolod Viliuga; Simon Wagner; Jan Stühmer; Frauke Gräter; | code |
| 149 | Boosting Knowledge Utilization in Multimodal Large Language Models Via Adaptive Logits Fusion and Attention Reallocation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we design Adaptive Logits Fusion and Attention Reallocation (ALFAR), a training-free and plug-and-play approach that improves MLLM responses by maximizing the utility of the retrieved knowledge. |
Wenbin An; Jiahao Nie; Feng Tian; Haonan Lin; mingxiang cai; Yaqiang Wu; QianYing Wang; Xiaoqin Zhang; Shijian Lu; | code |
| 150 | STAR: Efficient Preference-based Reinforcement Learning Via Dual Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This data scarcity introduces two key inefficiencies: (1) the reward model overfits to the limited feedback, leading to poor generalization to unseen samples, and (2) the agent exploits the learned reward model, exacerbating overestimation of action values in temporal difference (TD) learning. To address these issues, we propose STAR, an efficient PbRL method that integrates preference margin regularization and policy regularization. |
Fengshuo Bai; Rui Zhao; Hongming Zhang; Sijia Cui; Shao Zhang; bo xu; Lei Han; Ying Wen; Yaodong Yang; | code |
| 151 | Neural Emulator Superiority: When Machine Learning for PDEs Surpasses Its Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our theoretical analysis reveals how the interplay between emulator inductive biases, training objectives, and numerical error characteristics enables superior performance during multi-step rollouts. |
Felix Koehler; Nils Thuerey; | code |
| 152 | DesignX: Human-Competitive Algorithm Designer for Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present DesignX, the first automated algorithm design framework that generates an effective optimizer specific to a given black-box optimization problem within seconds. |
Hongshu Guo; Zeyuan Ma; Yining Ma; Xinglin Zhang; Wei-Neng Chen; Yue-Jiao Gong; | code |
| 153 | LeVo: High-Quality Song Generation with Multi-Preference Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches still struggle with the complex composition of songs and the scarcity of high-quality data, leading to limitations in audio quality, musicality, instruction following, and vocal-instrument harmony. To address these challenges, we introduce LeVo, a language model based framework consisting of LeLM and Music Codec. |
Shun Lei; Yaoxun Xu; ZhiweiLin; Huaicheng Zhang; Wei tan; Hangting Chen; Yixuan Zhang; Chenyu Yang; Haina Zhu; Shuai Wang; Zhiyong Wu; Dong Yu; | code |
| 154 | SensorLM: Learning The Language of Wearable Sensors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data. |
Yuwei Zhang; Kumar Ayush; Siyuan Qiao; A. Ali Heydari; Girish Narayanswamy; Maxwell A Xu; Ahmed Metwally; Jinhua Xu; Jake Garrison; Xuhai Xu; Tim Althoff; Yun Liu; Pushmeet Kohli; Jiening Zhan; Mark Malhotra; Shwetak Patel; Cecilia Mascolo; Xin Liu; Daniel McDuff; Yuzhe Yang; | code |
| 155 | Generative Graph Pattern Machine Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we explore pathways beyond message-passing and introduce Generative Graph Pattern Machine (G$^2$PM), a generative Transformer pre-training framework for graphs. |
Zehong Wang; Zheyuan Zhang; Tianyi Ma; Chuxu Zhang; Yanfang Ye; | code |
| 156 | LoRATv2: Enabling Low-Cost Temporal Modeling in One-Stream Trackers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches rely on a standard attention mechanism, which incurs quadratic token complexity, making real-time inference computationally expensive. In this paper, we introduce LoRATv2, a novel tracking framework that addresses these limitations with three main contributions. First, LoRATv2 integrates frame-wise causal attention, which ensures full self-attention within each frame while enabling causal dependencies across frames, significantly reducing computational overhead. |
Liting Lin; Heng Fan; Zhipeng Zhang; Yuqing Huang; Yaowei Wang; Yong Xu; Haibin Ling; | code |
| 157 | U-CAN: Unsupervised Point Cloud Denoising with Consistency-Aware Noise2Noise Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce U-CAN, an Unsupervised framework for point cloud denoising with Consistency-Aware Noise2Noise matching. |
Junsheng Zhou; XingYu Shi; Haichuan Song; Yi Fang; Yu-Shen Liu; Zhizhong Han; | code |
| 158 | Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing positional encoding methods demand predefined token/feature order, rendering them unsuitable for real-world data with non-sequential yet causally-related features. To address this limitation, we propose CAPE, a novel method that identifies underlying causal structure over non-sequential features as a weighted directed acyclic graph (DAG) using generalized structural equation modeling. |
Kaichen Xu; Yihang Du; Mianpeng Liu; Zimu Yu; Xiaobo Sun; | code |
| 159 | Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Reasoning Path Compression (RPC), a training-free method that accelerates inference by leveraging the semantic sparsity of reasoning paths. |
Jiwon Song; Dongwon Jo; Yulhwa Kim; Jae-Joon Kim; | code |
| 160 | UltraLED: Learning to See Everything in Ultra-High Dynamic Range Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we rely solely on a single short-exposure frame, which inherently avoids ghosting and motion blur, making it particularly robust in dynamic scenes. |
Yuang Meng; Xin Jin; Lina Lei; Chun-Le Guo; Chongyi Li; | code |
| 161 | Vision Foundation Models As Effective Visual Tokenizers for Autoregressive Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel direction to build an image tokenizer directly on top of a frozen vision foundation model, which is a largely underexplored area. |
Anlin Zheng; Xin Wen; Xuanyang Zhang; Chuofan Ma; Tiancai Wang; Gang YU; Xiangyu Zhang; XIAOJUAN QI; | code |
| 162 | Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents the first systematic investigation into how numerical precision affects reproducibility in LLM inference. |
Jiayi Yuan; Hao Li; Xinheng Ding; Wenya Xie; Yu-Jhe Li; Wentian Zhao; Kun Wan; Jing Shi; Xia Hu; Zirui Liu; | code |
| 163 | CORE: Collaborative Optimization with Reinforcement Learning and Evolutionary Algorithm for Floorplanning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve efficient floorplanning, we propose **CORE**, a general and effective solution optimization framework that synergizes Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for high-quality layout search and optimization. |
Pengyi Li; Shixiong Kai; Jianye HAO; Ruizhe Zhong; Hongyao Tang; Zhentao Tang; Mingxuan Yuan; Junchi Yan; | code |
| 164 | Don’t Trade Off Safety: Diffusion Regularization for Constrained Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on an offline setting where the agent learns from a fixed dataset—a common requirement in realistic tasks to prevent unsafe exploration. |
Junyu Guo; Zhi Zheng; Donghao Ying; Ming Jin; Shangding Gu; Costas Spanos; Javad Lavaei; | code |
| 165 | Intermediate Domain Alignment and Morphology Analogy for Patent-Product Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: PPIR, which retrieves patent images based on product images to identify potential infringements, presents unique challenges: (1) both product and patent images often contain numerous categories of artificial objects, but models pre-trained on standard datasets exhibit limited discriminative power to recognize some of those unseen objects; and (2) the significant domain gap between binary patent line drawings and colorful RGB product images further complicates similarity comparisons for product-patent pairs. To address these challenges, we formulate it as an open-set image retrieval task and introduce a comprehensive Patent-Product Image Retrieval Dataset (PPIRD) including a test set with 439 product-patent pairs, a retrieval pool of 727,921 patents, and an unlabeled pre-training set of 3,799,695 images. |
Haifan Gong; Xuanye Zhang; Ruifei Zhang; Yun Su; Zhuo Li; Yuhao Du; Anningzhe Gao; Xiang Wan; Haofeng Li; | code |
| 166 | MMaDA: Multimodal Large Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (iii) We propose UniGRPO, a unified policy-gradient-based RL algorithm specifically tailored for diffusion foundation models. |
Ling Yang; Ye Tian; Bowen Li; Xinchen Zhang; Ke Shen; Yunhai Tong; Mengdi Wang; | code |
| 167 | Learning to Integrate Diffusion ODEs By Averaging The Derivatives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For accelerating diffusion model inference, numerical solvers perform poorly at extremely small steps, while distillation techniques often introduce complexity and instability. |
Wenze Liu; Xiangyu Yue; | code |
| 168 | Towards Reliable Identification of Diffusion-based Image Manipulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel approach for ReliAble iDentification of inpainted AReas (RADAR). |
Alex Costanzino; Woody Bayliss; Juil Sock; Marc Gorriz Blanch; Danijela Horak; Ivan Laptev; Philip Torr; Fabio Pizzati; | code |
| 169 | NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the robustness of underwater scene understanding, we introduce physical priors derived from underwater imaging models and propose a plug-and-play vision feature enhancement (VFE) module, which explicitly restores clear underwater information. |
Wei Xu; Cheng Wang; Dingkang Liang; Zongchuang Zhao; Xingyu Jiang; Peng Zhang; Xiang Bai; | code |
| 170 | Efficient Utility-Preserving Machine Unlearning with Implicit Gradient Surgery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To resolve the additional computational cost brought by gradient surgery, we propose an implicit gradient surgery method, which approximates the solution to the aforementioned constrained optimization problem via only one backpropagation, thereby achieving efficient utility-preserving MU. |
Shiji Zhou; Tianbai Yu; Zhi Zhang; Heng Chang; Xiao Zhou; Dong Wu; Han Zhao; | code |
| 171 | Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Point3R, an online framework targeting dense streaming 3D reconstruction. |
Yuqi Wu; Wenzhao Zheng; Jie Zhou; Jiwen Lu; | code |
| 172 | Less Is More: Unlocking Specialization of Time Series Foundation Models Via Structured Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through empirical studies on various TSFMs, we find that the pre-trained models often exhibit inherent sparsity and redundancy in computation, suggesting that TSFMs have learned to activate task-relevant network substructures to accommodate diverse forecasting tasks. To preserve this valuable prior knowledge, we propose a structured pruning method to regularize the subsequent fine-tuning process by focusing it on a more relevant and compact parameter space. |
Lifan Zhao; Yanyan Shen; Zhaoyang Liu; Xue Wang; Jiaji Deng; | code |
| 173 | DETree: DEtecting Human-AI Collaborative Texts Via Tree-Structured Hierarchical Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that representations of texts generated through different processes exhibit inherent clustering relationships. Therefore, we propose DETree, a novel approach that models the relationships among different processes as a Hierarchical Affinity Tree structure, and introduces a specialized loss function that aligns text representations with this tree. |
Yongxin He; Shan Zhang; Yixuan Cao; Lei Ma; Ping Luo; | code |
| 174 | IGD: Token Decisiveness Modeling Via Information Gain in LLMs for Personalized Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To quantify token decisiveness, we propose a novel perspective that models item generation as a decision process, measuring token decisiveness by the Information Gain (IG) each token provides in reducing uncertainty about the generated item. |
Zijie Lin; Yang Zhang; Xiaoyan Zhao; Fengbin ZHU; Fuli Feng; Tat-Seng Chua; | code |
| 175 | Synthesize Privacy-Preserving High-Resolution Images Via Private Textual Intermediaries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel method, referred to as Synthesis via Private Textual Intermediaries (SPTI), that can generate high-resolution DP images with easy adoption. |
Haoxiang Wang; Zinan Lin; Da Yu; Huishuai Zhang; | code |
| 176 | Brain-tuning Improves Generalizability and Efficiency of Brain Alignment in Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches for both estimating and improving this brain alignment are participant-dependent and highly affected by the amount of data available per participant, hindering both generalization to new participants and population-level analyses. In this work, we address these limitations by introducing a scalable, generalizable brain-tuning method, in which we fine-tune pretrained speech language models to jointly predict fMRI responses from multiple participants. |
Omer Moussa; Mariya Toneva; | code |
| 177 | AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant Adversarial Patches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we comprehensively study the angle robustness of T2I adversarial patches, revealing their angle-robustness issues, demonstrating that texts significantly affect the angle robustness of generated patches, and showing that task-specific linguistic instructions fail to enhance this robustness. |
Wenjun Ji; Yuxiang Fu; Luyang Ying; Deng-Ping Fan; Yuyi Wang; Ming-Ming Cheng; Ivor Tsang; Qing Guo; | code |
| 178 | Towards Resilient Safety-driven Unlearning for Diffusion Models Against Downstream Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent safety-driven unlearning methods have made promising progress in suppressing model toxicity, they are found to be fragile to downstream fine-tuning, as we reveal that state-of-the-art methods largely fail to retain their effectiveness even when fine-tuned on entirely benign datasets. To mitigate this problem, in this paper, we propose ResAlign, a safety-driven unlearning framework with enhanced resilience against downstream fine-tuning. |
Boheng Li; Renjie Gu; Junjie Wang; Leyi Qi; Yiming Li; Run Wang; Zhan Qin; Tianwei Zhang; | code |
| 179 | Touch in The Wild: Learning Fine-Grained Manipulation with A Portable Visuo-Tactile Gripper Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a portable, lightweight gripper with integrated tactile sensors that enables synchronized collection of visual and tactile data in diverse, real-world, and in-the-wild settings. Building on this hardware, we propose a cross-modal representation learning framework that integrates visual and tactile signals while preserving their distinct characteristics. |
Xinyue Zhu; Binghao Huang; Yunzhu Li; | code |
| 180 | FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose FSDrive, a visual spatio-temporal CoT framework that enables VLAs to think in images. |
Shuang Zeng; Xinyuan Chang; Mengwei Xie; Xinran Liu; Yifan Bai; Zheng Pan; Mu Xu; Xing Wei; | code |
| 181 | Abstain Mask Retain Core: Time Series Prediction By Adaptive Masking Loss with Representation Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building upon information bottleneck theory, we propose an innovative solution termed Adaptive Masking Loss with Representation Consistency (AMRC), which features two core components: 1) Dynamic masking loss, which adaptively identifies highly discriminative temporal segments to guide gradient descent during model training; 2) Representation consistency constraint, which stabilizes the mapping relationships among inputs, labels, and predictions. |
Renzhao Liang; Sizhe Xu; Chenggang Xie; Jingru Chen; Feiyang Ren; Shu Yang; Takahiro Yabe; | code |
| 182 | Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, on-policy algorithms used for post-training are not naturally robust to a diversified content of experience replay buffers, which asynchronous off-policy actors can efficiently populate in parallel to training. We propose efficiently learning on such off-policy data via Trajectory Balance with Asynchrony (TBA), an approach to asynchronous RL for LLMs that leverages the principled off-policy TB objective. |
Brian R. Bartoldson; Siddarth Venkatraman; James Diffenderfer; Moksh Jain; Tal Ben-Nun; Seanie Lee; Minsu Kim; Johan Obando-Ceron; Yoshua Bengio; Bhavya Kailkhura; | code |
| 183 | Detoxifying Large Language Models Via Autoregressive Reward Guided Representation Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current approaches often suffer from imprecise interventions, primarily due to their insufficient exploration of the transition space between toxic and non-toxic outputs. To address this challenge, we propose Autoregressive Reward Guided Representation Editing (ARGRE), a novel test-time detoxification framework that explicitly models toxicity transitions within the latent representation space, enabling stable and precise reward-guided editing. |
Yisong Xiao; Aishan Liu; Siyuan Liang; Zonghao Ying; Xianglong Liu; Dacheng Tao; | code |
| 184 | TADA: Improved Diffusion Sampling with Training-free Augmented DynAmics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new sampling method that is up to 186% faster than the current state-of-the-art solver for comparative FID on ImageNet512. |
Tianrong Chen; Huangjie Zheng; David Berthelot; Jiatao Gu; Joshua M. Susskind; Shuangfei Zhai; | code |
| 185 | Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Diffusion Adaptive Text Embedding (DATE), which dynamically updates text embeddings at each diffusion timestep based on intermediate perturbed data. |
Byeonghu Na; Minsang Park; Gyuwon Sim; Donghyeok Shin; HeeSun Bae; Mina Kang; Se Jung Kwon; Wanmo Kang; Il-chul Moon; | code |
| 186 | Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Safe Text embedding Guidance (STG), a training-free approach to improve the safety of diffusion models by guiding the text embeddings during sampling. |
Byeonghu Na; Mina Kang; Jiseok Kwak; Minsang Park; Jiwoo Shin; SeJoon Jun; Gayoung Lee; Jin-Hwa Kim; Il-chul Moon; | code |
| 187 | Order-Level Attention Similarity Across Language Models: A Latent Commonality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore an important yet previously neglected question: Do context aggregation patterns across Language Models (LMs) share commonalities? |
Jinglin Liang; Jin Zhong; Shuangping Huang; Yunqing Hu; Huiyuan Zhang; Huifang Li; Lixin Fan; Hanlin Gu; | code |
| 188 | Predictable Scale (Part II) — Farseer: A Refined Scaling Law in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. |
Houyi Li; Wenzhen Zheng; Qiufeng Wang; Zhenyu Ding; Haoying Wang; Zili Wang; Shijie Xuyang; Ning Ding; Shuigeng Zhou; Xiangyu Zhang; Daxin Jiang; | code |
| 189 | TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing long-to-short (Long2Short) methods aim to reduce inference length but often sacrifice accuracy, revealing a need for an approach that maintains performance while lowering token costs. To address this efficiency-accuracy tradeoff, we propose TokenSqueeze, a novel Long2Short method that condenses reasoning paths while preserving performance and relying exclusively on self-generated data. |
Yuxiang Zhang; Zhengxu Yu; Weihang Pan; Zhongming Jin; Qiang Fu; Deng Cai; Binbin Lin; Jieping Ye; | code |
| 190 | Can Large Language Models Master Complex Card Games? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the potential of LLMs in mastering complex card games. |
Wei Wang; Fuqing Bie; Junzhe Chen; Dan Zhang; Shiyu Huang; Evgeny Kharlamov; Jie Tang; | code |
| 191 | The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It refines the model’s existing knowledge rather than introducing entirely new behaviors. Building on this insight, we propose a simple variant of the RL objective that upweights NSR, and show that it consistently improves overall Pass@$k$ performance on MATH, AIME 2025, and AMC23. |
Xinyu Zhu; Mengzhou Xia; Zhepei Wei; Wei-Lin Chen; Danqi Chen; Yu Meng; | code |
| 192 | Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we advance the latter category by tailoring the first extension of high-order numerical inference schemes to discrete diffusion models, enabling larger step sizes while reducing error. |
Yinuo Ren; Haoxuan Chen; Yuchen Zhu; Wei Guo; Yongxin Chen; Grant M. Rotskoff; Molei Tao; Lexing Ying; | code |
| 193 | Time-Evolving Dynamical System for Learning Latent Representations of Mouse Visual Neural Activity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the study of the biological visual system, naturalistic visual stimuli are inherently high-dimensional and time-dependent, leading to intricate dynamics within visual neural activity. |
Liwei Huang; Zhengyu Ma; Liutao Yu; Huihui Zhou; Yonghong Tian; | code |
| 194 | EchoShot: Multi-Shot Portrait Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose EchoShot, a native and scalable multi-shot framework for portrait customization built upon a foundation video diffusion model. |
Jiahao Wang; Hualian Sheng; Sijia Cai; Weizhan Zhang; Caixia Yan; Yachuang Feng; Bing Deng; Jieping Ye; | code |
| 195 | DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it often faces challenges in striking a trade-off between aligning with the target distribution (learning a novel concept from a limited image for personalization) and retaining the instruction ability needed for unifying multiple tasks, all while maintaining editability (aligning with a variety of prompts or in-context generation). In this work, we introduce DEFT, Decompositional Efficient Fine-Tuning, an efficient fine-tuning framework that adapts a pre-trained weight matrix by decomposing its update into two components with two trainable matrices: (1) a projection onto the complement of a low-rank subspace spanned by a low-rank matrix, and (2) a low-rank update. |
Komal Kumar; Rao Muhammad Anwer; Fahad Shahbaz Khan; Salman Khan; Ivan Laptev; Hisham Cholakkal; | code |
| 196 | ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Shortlisting Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. |
Yuxuan Song; Zhe Zhang; Yu Pei; Jingjing Gong; Qiying Yu; Zheng Zhang; Mingxuan Wang; Hao Zhou; Jingjing Liu; Wei-Ying Ma; | code |
| 197 | Real-World Reinforcement Learning of Active Perception Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple real-world robot learning recipe to efficiently train active perception policies. |
Edward S. Hu; Jie Wang; Xingfang Yuan; Fiona Luo; Muyao Li; Gaspard Lambrechts; Oleh Rybkin; Dinesh Jayaraman; | code |
| 198 | 4KAgent: Agentic Any Image to 4K Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). |
Yushen Zuo; Qi Zheng; Mingyang Wu; Xinrui Jiang; Renjie Li; Jian Wang; Yide Zhang; Gengchen Mai; Lihong Wang; James Zou; Xiaoyu Wang; Ming-Hsuan Yang; Zhengzhong Tu; | code |
| 199 | CSGO: Content-Style Composition in Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Empowered by IMAGStyle, we propose CSGO, a unified, end-to-end trainable framework that decouples content and style representations via independent feature injection. |
Peng Xing; Haofan Wang; Yanpeng Sun; wangqixun; Baixu; Hao Ai; Jen-Yuan Huang; Zechao Li; | code |
| 200 | On The Edge of Memorization in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a scientific and mathematical “laboratory” for investigating memorization and generalization in diffusion models trained on fully synthetic or natural image-like structured data. |
Sam Buchanan; Druv Pai; Yi Ma; Valentin De Bortoli; | code |
| 201 | Neural-Driven Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging recent advances in brain-computer interfaces (BCIs) and generative models, we propose LoongX, a hands-free image editing approach driven by multimodal neurophysiological signals. |
Pengfei Zhou; Jie Xia; Xiaopeng Peng; Wangbo Zhao; Zilong Ye; Zekai Li; Suorong Yang; Jiadong Pan; Yuanxiang Chen; Ziqiao Wang; Kai Wang; Qian Zheng; Xiaojun Chang; Gang Pan; Shurong Dong; Kaipeng Zhang; Yang You; | code |
| 202 | FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, robotic manipulation tasks of varying complexity demand different levels of modeling precision across these frequency bands. Motivated by this, we propose a novel paradigm for visuomotor policy learning that progressively models hierarchical frequency components. |
Yiming Zhong; Yumeng Liu; Chuyang Xiao; Zemin Yang; Youzhuo Wang; Yufei Zhu; Ye Shi; Yujing Sun; Xinge Zhu; Yuexin Ma; | code |
| 203 | A Pre-training Framework for Relational Data with Information-theoretic Principles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By incorporating knowledge of the underlying distribution that drives label generation, downstream tasks can benefit from relevant side-channel information. To bridge this gap, we introduce Task Vector Estimation (TVE), a novel pre-training framework that constructs predictive supervisory signals via set-based aggregation over schema traversal graphs, explicitly modeling next-window relational dynamics. |
Quang Truong; Zhikai Chen; Mingxuan Ju; Tong Zhao; Neil Shah; Jiliang Tang; | code |
| 204 | Proxy Target: Bridging The Gap Between Discrete Spiking Neural Networks and Continuous Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap between discrete SNNs and continuous-control algorithms, we propose a novel proxy target framework. |
Zijie Xu; Tong Bu; Zecheng Hao; Jianhao Ding; Zhaofei Yu; | code |
| 205 | Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This divergence reveals a critical limitation of previous global normalization strategies in TTA, which inevitably distort the original data characteristics. Based on this insight, we propose Feature-based Instance Neighbor Discovery (FIND), which comprises three key components: Layer-Wise Feature Disentanglement (LFD), Feature-Aware Batch Normalization (FABN) and Selective FABN (S-FABN). |
Qinting Jiang; Chuyang Ye; Dongyan Wei; Bingli Wang; Yuan Xue; Jingyan Jiang; Zhi Wang; | code |
| 206 | Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we disentangle MAD into two key components–Majority Voting and inter-agent Debate–and assess their respective contributions. Through extensive experiments across seven NLP benchmarks, we find that Majority Voting alone accounts for most of the performance gains typically attributed to MAD. |
Hyeong Kyu Choi; Jerry Zhu; Sharon Li; | code |
| 207 | Learning to Instruct for Visual Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose L2T, an advancement of visual instruction tuning (VIT). |
Zhihan Zhou; Feng Hong; Jiaan Luo; Yushi Ye; Jiangchao Yao; Dongsheng Li; Bo Han; Ya Zhang; Yanfeng Wang; | code |
| 208 | GauSAM: Contour‑Guided 2D Gaussian Fields for Multi‑Scale Medical Image Segmentation with Segment Anything Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Effective multiscale medical image segmentation requires simultaneously preserving smooth spatial continuity and accurately delineating high-frequency boundaries, yet pixel-wise decoders often fail to maintain this balance consistently across varying resolutions. We introduce GauSAM, which seamlessly integrates contour‑guided 2D Gaussian probability fields into the Segment Anything Model to address these challenges. |
Jinxuan Wu; Jiange Wang; Dongdong Zhang; | code |
| 209 | Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a Suggestion-aware Group Relative Policy Optimization (S-GRPO) strategy to construct our pre-operative critic model GUI-Critic-R1, incorporating a novel suggestion reward to enhance the reliability of the model’s feedback. |
Yuyang Wanyan; Xi Zhang; Haiyang Xu; Haowei Liu; Junyang Wang; Jiabo Ye; Yutong Kou; Ming Yan; Fei Huang; Xiaoshan Yang; Weiming Dong; Changsheng Xu; | code |
| 210 | GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce GeoSVR, an explicit voxel-based framework that explores and extends the under-investigated potential of sparse voxels for achieving accurate, detailed, and complete surface reconstruction. |
Jiahe Li; Jiawei Zhang; Youmin Zhang; Xiao Bai; Jin Zheng; Xiaohan Yu; Lin Gu; | code |
| 211 | LoMix: Learnable Weighted Multi-Scale Logits Mixing for Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, the decoder output misses the complementary cues that arise only when coarse and fine predictions are fused. To address this issue, we introduce LoMix ($\underline{Lo}$gits $\underline{Mix}$ing), a Neural Architecture Search (NAS)‑inspired, differentiable plug-and-play module that **generates** new mixed‑scale outputs and **learns** how exactly each of them should guide the training process. |
Md Mostafijur Rahman; Radu Marculescu; | code |
| 212 | BioReason: Incentivizing Multimodal Biological Reasoning Within A DNA-LLM Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While current DNA foundation models excel at representing sequences, they struggle with multi-step reasoning and lack transparent, biologically meaningful explanations. BioReason addresses this by tightly integrating a DNA foundation model with a large language model (LLM), enabling the LLM to directly interpret and reason over genomic information. |
Adibvafa Fallahpour; Andrew Magnuson; Purav Gupta; Shihao Ma; Jack Naimer; Arnav Shah; Haonan Duan; Omar Ibrahim; Hani Goodarzi; Chris J. Maddison; BO WANG; | code |
| 213 | Polyline Path Masked Attention for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Polyline Path Masked Attention (PPMA) that integrates the self-attention mechanism of ViTs with an enhanced structured mask of Mamba2, harnessing the complementary strengths of both architectures. |
Zhongchen Zhao; Chaodong Xiao; Hui LIN; Qi Xie; Lei Zhang; Deyu Meng; | code |
| 214 | DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, they struggle to balance the commonality of degradation representations with restoration quality, often depending on complex compensation mechanisms that enhance fidelity at the expense of efficiency. To address these challenges, we introduce \textbf{DGSolver}, a diffusion generalist solver with universal posterior sampling. |
Hebaixu Wang; Jing Zhang; Haonan Guo; Di Wang; Jiayi Ma; Bo Du; | code |
| 215 | PC-Net: Weakly Supervised Compositional Moment Retrieval Via Proposal-Centric Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a new task: weakly supervised compositional moment retrieval (WSCMR). |
Mingyao Zhou; Hao Sun; Wei Xie; Ming Dong; Chengji Wang; Mang Ye; | code |
| 216 | A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified reasoning framework that bridges the gap between temporal detection, spatial localization, and textual explanation. |
Dongheng Lin; Mengxue Qu; Kunyang Han; Jianbo Jiao; Xiaojie Jin; Yunchao Wei; | code |
| 217 | TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces TempSamp-R1, a new reinforcement fine-tuning framework designed to improve the effectiveness of adapting multimodal large language models (MLLMs) to video temporal grounding tasks. |
Yunheng Li; JingCheng; Shaoyong Jia; Hangyi Kuang; Shaohui Jiao; Qibin Hou; Ming-Ming Cheng; | code |
| 218 | Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents **Pixel-Perfect Depth**, a monocular depth estimation model based on pixel-space diffusion generation that produces high-quality, flying-pixel-free point clouds from estimated depth maps. |
Gangwei Xu; Haotong Lin; Hongcheng Luo; Xianqi Wang; Jingfeng Yao; Lianghui Zhu; Yuechuan Pu; Cheng Chi_; Haiyang Sun; BING WANG; Guang Chen; Hangjun Ye; Sida Peng; Xin Yang; | code |
| 219 | EA3D: Online Open-World 3D Object Extraction from Streaming Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ExtractAnything3D (EA3D), a unified online framework for open-world 3D object extraction that enables simultaneous geometric reconstruction and holistic scene understanding. |
Xiaoyu Zhou; Jingqi Wang; Yuang Jia; Yongtao Wang; Deqing Sun; Ming-Hsuan Yang; | code |
| 220 | Toward Relative Positional Encoding in Spiking Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce several strategies to approximate relative positional encoding (RPE) in spiking Transformers while preserving the binary nature of spikes. |
Changze Lv; Yansen Wang; Dongqi Han; Yifei Shen; Xiaoqing Zheng; Xuanjing Huang; Dongsheng Li; | code |
| 221 | SimWorld: An Open-ended Simulator for Agents in Physical and Social Worlds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SimWorld, a new simulator built on Unreal Engine 5, designed for developing and evaluating LLM/VLM agents in rich, real-world-like settings. |
Xiaokang Ye; Jiawei Ren; Yan Zhuang; Xuhong He; Yiming Liang; Yiqing Yang; Mrinaal Dogra; Xianrui Zhong; Eric Liu; Kevin Benavente; Rajiv Mandya Nagaraju; Dhruv Vivek Sharma; Ziqiao Ma; Tianmin Shu; Zhiting Hu; Lianhui Qin; | code |
| 222 | Online Optimization for Offline Safe Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. |
Yassine Chemingui; Aryan Deshwal; Alan Fern; Thanh Nguyen-Tang; Jana Doppa; | code |
| 223 | Delving Into Large Language Models for Effective Time-Series Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an in-depth analysis that identifies two core challenges in understanding complex temporal dynamics and accurately localizing anomalies. |
Junwoo Park; Kyudan Jung; Dohyun Lee; Hyuck Lee; Daehoon Gwak; ChaeHun Park; Jaegul Choo; Jaewoong Cho; | code |
| 224 | MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the problem from a multi-view perspective and show that multi-view consistent material inference with more physically-based environment modeling is key to learning accurate reflections with Gaussian Splatting. |
Wenyuan Zhang; Jimin Tang; Weiqi Zhang; Yi Fang; Yu-Shen Liu; Zhizhong Han; | code |
| 225 | Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we pioneer the elucidation of the optimization challenge caused by sparse-attention MIL and propose a novel MIL method called ABMILX. |
Wenhao Tang; Rong Qin; Heng Fang; Fengtao Zhou; Hao Chen; Xiang Li; Ming-Ming Cheng; | code |
| 226 | Linear Differential Vision Transformer: Learning Visual Contrasts Via Pairwise Differentials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce \emph{Visual–Contrast Attention} (VCA), a drop-in replacement for MHSA that injects an explicit notion of discrimination while reducing the theoretical complexity from $\mathcal{O}(N^{2}C)$ to $\mathcal{O}(N n C)$ with $n \ll N$. |
Yifan Pu; Jixuan Ying; Qixiu Li; Tianzhu Ye; Dongchen Han; Xiaochen Wang; Ziyi Wang; Xinyu Shao; Gao Huang; Xiu Li; | code |
| 227 | Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although large molecular language models have achieved notable success in task transfer, they often struggle to accurately analyze molecular features due to limited knowledge and reasoning capabilities. To address this issue, we present Mol-LLaMA, a large molecular language model that grasps the general knowledge centered on molecules and exhibits explainability and reasoning ability. |
Dongki Kim; Wonbin Lee; Sung Ju Hwang; | code |
| 228 | ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present **ThinkSound**, a novel framework that leverages Chain-of-Thought (CoT) reasoning to enable stepwise, interactive audio generation and editing for videos. |
Huadai Liu; Kaicheng Luo; Jialei Wang; Wen Wang; Qian Chen; Zhou Zhao; Wei Xue; | code |
| 229 | SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent efforts have attempted to directly map source and target distributions via ODE-based approaches without inversion; however, these methods still yield suboptimal editing quality. In this work, we propose a flow decomposition-and-aggregation framework built upon an inversion-free formulation to address these limitations. |
Sung-Hoon Yoon; Minghan Li; Gaspard Beaudouin; Congcong Wen; Muhammad Rafay Azhar; Mengyu Wang; | code |
| 230 | Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Treating the original transformer-based model as the Pilot, we correspondingly design a Copilot model to refine the Pilot’s inference performance via logits rectification. |
Jiaru Zou; Yikun Ban; Zihao Li; Yunzhe Qi; Ruizhong Qiu; Ling Yang; Jingrui He; | code |
| 231 | EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents EgoBridge, a unified co-training framework that explicitly aligns the policy latent spaces between human and robot data using domain adaptation. |
Ryan Punamiya; Dhruv Patel; Patcharapong Aphiwetsa; Pranav Kuppili; Lawrence Y. Zhu; Simar Kareer; Judy Hoffman; Danfei Xu; | code |
| 232 | FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, insights from these experiments highlight that sharing statistical information can mitigate heterogeneity by enabling clients to update with a global perspective. Motivated by this, we propose **FedGPS** (**Fed**erated **G**oal-**P**ath **S**ynergy), a novel framework that seamlessly integrates statistical distribution and gradient information from others. |
Zhiqin Yang; Yonggang Zhang; Chenxin Li; Yiu-ming Cheung; Bo Han; Yixuan Yuan; | code |
| 233 | BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new paradigm for constructing 3D VLAs. |
Peiyan Li; Yixiang Chen; Hongtao Wu; Xiao Ma; Xiangnan Wu; Yan Huang; Liang Wang; Tao Kong; Tieniu Tan; | code |
| 234 | Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity at the neuron level in visual representations. To ensure that our evaluation aligns with human perception, we propose a benchmark derived from a large-scale user study. |
Mateusz Pach; Shyamgopal Karthik; Quentin Bouniot; Serge Belongie; Zeynep Akata; | code |
| 235 | Quartet: Native FP4 Training Can Be Optimal for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate hardware-supported FP4 training and introduce a new approach for accurate, end-to-end FP4 training with all the major computations (i.e., linear layers) in low precision. |
Roberto L. Castro; Andrei Panferov; Soroush Tabesh; Oliver Sieberling; Jiale Chen; Mahdi Nikdan; Saleh Ashkboos; Dan Alistarh; | code |
| 236 | VORTA: Efficient Video Diffusion Via Routing Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent acceleration methods enhance the efficiency by exploiting the local sparsity of attention scores; yet they often struggle with accelerating the long-range computation. To address this problem, we propose VORTA, an acceleration framework with two novel components: 1) a sparse attention mechanism that efficiently captures long-range dependencies, and 2) a routing strategy that adaptively replaces full 3D attention with specialized sparse attention variants. |
Wenhao Sun; Rong-Cheng Tu; Yifu Ding; Jingyi Liao; Zhao Jin; Shunyu Liu; Dacheng Tao; | code |
| 237 | LaRes: Evolutionary Reinforcement Learning with LLM-based Adaptive Reward Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous research focuses on exploration in policy parameter space, while overlooking the reward function search. To bridge this gap, we propose **LaRes**, a novel hybrid framework that achieves efficient policy learning through reward function search. |
Pengyi Li; Hongyao Tang; Jinbin Qiao; YAN ZHENG; Jianye HAO; | code |
| 238 | Learning to Insert for Constructive Neural Vehicle Routing Solver Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing constructive NCO methods typically follow an appending-based paradigm that sequentially adds unvisited nodes to partial solutions, this rigid approach often leads to suboptimal results. To overcome this limitation, we explore the idea of the insertion-based paradigm and propose Learning to Construct with Insertion-based Paradigm (L2C-Insert), a novel learning-based method for constructive NCO. |
Fu Luo; Xi Lin; Mengyuan Zhong; Fei Liu; Zhenkun Wang; Jianyong Sun; Qingfu Zhang; | code |
| 239 | Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Hierarchical Balance Packing (HBP), which designs a novel batch-construction method and training recipe to address those inefficiencies. |
Yongqiang Yao; Jingru Tan; Kaihuan Liang; Feizhao Zhang; Jiahao Hu; Shuo Wu; Yazhe Niu; Ruihao Gong; Dahua Lin; Ningyi Xu; | code |
| 240 | MOSDT: Self-Distillation-Based Decision Transformer for Multi-Agent Offline Safe Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MOSDT, the first algorithm designed for multi-agent offline safe reinforcement learning (MOSRL), alongside MOSDB, the first dataset and benchmark for this domain. |
Yuchen Xia; Yunjian Xu; | code |
| 241 | DictPFL: Efficient and Private Federated Learning on Encrypted Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present **DictPFL**, a practical framework that achieves full gradient protection with minimal overhead. |
Jiaqi Xue; Mayank Kumar; Yuzhang Shang; Shangqian Gao; Rui Ning; Mengxin Zheng; Xiaoqian Jiang; Qian Lou; | code |
| 242 | Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate using physical forces as a control signal for video generation and propose force prompts which enable users to interact with images through both localized point forces, such as poking a plant, and global wind force fields, such as wind blowing on fabric. |
Nate Gillman; Charles Herrmann; Michael Freeman; Daksh Aggarwal; Evan Luo; Deqing Sun; Chen Sun; | code |
| 243 | Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Bifrost-1, a unified framework that bridges pretrained multimodal LLMs (MLLMs) and diffusion models using patch-level CLIP image embeddings as latent variables, which are natively aligned with the MLLM’s CLIP visual encoder. |
Han Lin; Jaemin Cho; Amir Zadeh; Chuan Li; Mohit Bansal; | code |
| 244 | $\texttt{G1}$: Teaching LLMs to Reason on Graphs with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce $\texttt{G1}$, a simple yet effective approach demonstrating that Reinforcement Learning (RL) on synthetic graph-theoretic tasks can significantly scale LLMs’ graph reasoning abilities. |
Xiaojun Guo; Ang Li; Yifei Wang; Stefanie Jegelka; Yisen Wang; | code |
| 245 | CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce CoVoMix2, a fully non-autoregressive framework for zero-shot multi-talker dialogue generation. |
leying zhang; Yao Qian; Xiaofei Wang; Manthan Thakker; Dongmei Wang; Jianwei Yu; Haibin Wu; Yuxuan Hu; Jinyu Li; Yanmin Qian; sheng zhao; | code |
| 246 | Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate domain specialization and expert redundancy in large-scale MoE models and uncover a consistent behavior we term \emph{few-shot expert localization}: with only a few in-domain demonstrations, the model consistently activates a sparse and stable subset of experts on tasks within the same domain. |
zican Dong; Han Peng; Peiyu Liu; Xin Zhao; Dong Wu; Feng Xiao; Zhifeng Wang; | code |
| 247 | Practical and Effective Code Watermarking for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Code watermarking offers a potential solution but faces unique challenges due to programming languages’ strict syntactic constraints and semantic requirements. To address these challenges, we introduce ACW (AST-guided Code Watermarking), a novel adaptive framework that leverages Abstract Syntax Tree (AST) analysis during training to learn watermark embedding strategies. |
Zhimeng Guo; Minhao Cheng; | code |
| 248 | Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing strategies, such as copy-paste augmentation and text-to-image generation, often fail to preserve the correct object category or produce backgrounds coherent with the target domain, making them non-trivial to apply directly to CD-FSOD. To address these challenges, we propose Domain-RAG, a training-free, retrieval-guided compositional image generation framework tailored for CD-FSOD. |
Yu Li; Xingyu Qiu; Yuqian Fu; Jie Chen; Tianwen Qian; Xu Zheng; Danda Pani Paudel; Yanwei Fu; Xuanjing Huang; Luc Van Gool; Yu-Gang Jiang; | code |
| 249 | PLANA3R: Zero-shot Metric Planar 3D Reconstruction Via Feed-forward Planar Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using planar 3D primitives — a well-suited representation for man-made environments — we introduce PLANA3R, a pose-free framework for metric $\underline{Plana}$r $\underline{3}$D $\underline{R}$econstruction from unposed two-view images. |
Changkun Liu; Bin Tan; Zeran Ke; Shangzhan Zhang; Jiachen Liu; Ming Qian; Nan Xue; Yujun Shen; Tristan Braud; | code |
| 250 | Panoptic Captioning: An Equivalence Bridge for Image and Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces panoptic captioning, a novel task striving to seek the minimum text equivalent of images, which has broad potential applications. |
Kun-Yu Lin; Hongjun Wang; Weining Ren; Kai Han; | code |
| 251 | Towards Unsupervised Open-Set Graph Domain Adaptation Via Dual Reprogramming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the problem of unsupervised open-set graph domain adaptation, where the goal is to not only correctly classify target nodes into the known classes, but also recognize previously unseen node types into the unknown class. Towards this end, we propose a novel framework called GraphRTA, which conducts reprogramming on both the graph and model sides. |
Zhen Zhang; Bingsheng He; | code |
| 252 | Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify this as a new and stealthy threat surface that allows malicious tools to be preferentially selected by LLM agents, without requiring prompt injection or access to model internals. To demonstrate and exploit this vulnerability, we propose the Attractive Metadata Attack (AMA), a black-box in-context learning framework that generates highly attractive but syntactically and semantically valid tool metadata through iterative optimization. |
Kanghua Mo; Li Hu; Yucheng Long; Zhihao li; | code |
| 253 | Towards Robust Parameter-Efficient Fine-Tuning for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) reduce resource demands but suffer from aggregation discrepancies and heightened vulnerability to label noise, particularly in heterogeneous federated settings. In this paper, we introduce RFedLR, a robust federated PEFT framework designed to overcome these challenges. |
Xiuwen Fang; Mang Ye; | code |
| 254 | HOI-Dyn: Learning Interaction Dynamics for Human-Object Motion Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present HOI-Dyn, a novel framework that formulates HOI generation as a driver-responder system, where human actions drive object responses. |
Lin Wu; Zhixiang Chen; Jianglin Lan; | code |
| 255 | COLA: Towards Efficient Multi-Objective Reinforcement Learning with Conflict Objective Regularization in Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper introduces a novel framework, Conflict Objective Regularization in Latent Space (**COLA**). |
Pengyi Li; Hongyao Tang; Yifu Yuan; Jianye HAO; Zibin Dong; YAN ZHENG; | code |
| 256 | MiniF2F-Lean Revisited: Reviewing Limitations and Charting A Path Forward Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We proceed with correcting all the errors, discrepancies and simplifications in formal and informal statements, and present the _miniF2F-v2_ with fully verified formal and informal statements and proofs. |
Azim Ospanov; Farzan Farnia; Roozbeh Yousefzadeh; | code |
| 257 | APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present APOLLO (**A**utomated **P**r**O**of repair via **L**LM and **L**ean c**O**llaboration), a modular, model-agnostic agentic framework that combines the strengths of the Lean compiler with an LLM’s reasoning abilities to achieve better proof-generation results at low token and sampling budgets. |
Azim Ospanov; Farzan Farnia; Roozbeh Yousefzadeh; | code |
| 258 | WISA: World Simulator Assistant for Physics-aware Text-to-video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limitation stems primarily from the lack of explicit physical guidance, caused by a significant gap between high-level physical concepts and the generative capabilities of current models. To address this challenge, we propose the **W**orld **S**imulator **A**ssistant (**WISA**), a novel framework designed to systematically decompose and integrate physical principles into T2V models. |
Jing Wang; Ao Ma; Ke Cao; Jun Zheng; Jiasong Feng; Zhanjie Zhang; Wanyuan Pang; Xiaodan Liang; | code |
| 259 | AgentAuditor: Human-level Safety and Security Evaluation for LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing rule-based or LLM-based evaluators often miss dangers in agents’ step-by-step actions, overlook subtle meanings, fail to see how small issues compound, and get confused by unclear safety or security rules. To overcome this evaluation crisis, we introduce AgentAuditor, a universal, training-free, memory-augmented reasoning framework that empowers LLM evaluators to emulate human expert evaluators. |
Hanjun Luo; Shenyu Dai; Chiming Ni; Xinfeng Li; Guibin Zhang; Kun Wang; Tongliang Liu; Hanan Salam; | code |
| 260 | SEMPO: Lightweight Foundation Models for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite impressive performance across diverse downstream forecasting tasks, existing time series FMs possess massive network architectures and require substantial pre-training on large-scale datasets, which significantly hinders their deployment in resource-constrained environments. In response to this growing tension between versatility and affordability, we propose **SEMPO**, a novel lightweight foundation model that requires pretraining on relatively small-scale data, yet exhibits strong general time series forecasting performance. |
Hui He; Kun Yi; Yuanchi Ma; Qi Zhang; Zhendong Niu; Guansong Pang; | code |
| 261 | BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a surprisingly simple method to enhance the robustness of image protection methods against noise reversal techniques. |
Jinsu Kim; Yunhun Nam; Minseon Kim; Sangpil Kim; Jongheon Jeong; | code |
| 262 | VA-GS: Enhancing The Geometric Representation of Gaussian Splatting Via View Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment (VA). |
Qing Li; Huifang Feng; Xun Gong; Yu-Shen Liu; | code |
| 263 | Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI Models in Sound Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, AI shows a pronounced bias toward vision, often failing to suppress irrelevant or conflicting visual input, leading to chance-level performance. To bridge this gap, we present EchoPin, a neuroscience-inspired multimodal model for SSL that emulates human auditory perception. |
Yanhao Jia; Ji Xie; S Jivaganesh; Li Hao; Xu Wu; Mengmi Zhang; | code |
| 264 | TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **TC-Light**, a novel paradigm characterized by the proposed two-stage post optimization mechanism. |
Yang Liu; Chuanchen Luo; Zimo Tang; Yingyan Li; yuran Yang; Yuanyong Ning; Lue Fan; Junran Peng; Zhaoxiang Zhang; | code |
| 265 | TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing efforts towards transferability primarily involve learning embedding vectors for trajectories, which perform poorly in region transfer and require retraining of prediction modules for task transfer. To address these challenges, we propose $\textit{TransferTraj}$, a vehicle GPS trajectory learning model that excels in both region and task transferability. |
Tonglong Wei; Yan Lin; Zeyu Zhou; Haomin Wen; Jilin Hu; Shengnan Guo; Youfang Lin; Gao Cong; Huaiyu Wan; | code |
| 266 | PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Third, extracting road conditions for missing points is non-trivial. To address these challenges, we propose $\textit{PLMTrajRec}$, a novel trajectory recovery model. |
Tonglong Wei; Yan Lin; Youfang Lin; Shengnan Guo; Jilin Hu; Haitao Yuan; Gao Cong; Huaiyu Wan; | code |
| 267 | Learning and Planning Multi-Agent Tasks Via An MoE-based World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we find that many tasks exhibit **bounded similarity** in their underlying dynamics: highly similar within certain groups (e.g., door-open/close), yet diverging significantly between unrelated tasks (e.g., door-open & object-catch). To leverage this property, we reconsider the role of modularity in multi-task learning and propose **M3W**, a novel approach that applies mixture-of-experts (MoE) to the world model instead of the policy, enabling both learning and planning. |
Zijie Zhao; Zhongyue Zhao; Kaixuan Xu; Yuqian Fu; Jiajun Chai; Yuanheng Zhu; Dongbin Zhao; | code |
| 268 | FairNet: Dynamic Fairness Correction Without Performance Loss Via Contrastive Conditional LoRA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, their utilization of sensitive attributes is often suboptimal, either depending excessively on complete attribute labeling or disregarding these attributes entirely. To overcome these limitations, we propose FairNet, a novel framework for dynamic, instance-level fairness correction. |
Songqi Zhou; Zeyuan Liu; Benben Jiang; | code |
| 269 | Mitigating Overthinking in Large Reasoning Models Via Manifold Steering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to mitigate overthinking by investigating the underlying mechanisms from the perspective of mechanistic interpretability. |
Yao Huang; Huanran Chen; Shouwei Ruan; Yichi Zhang; Xingxing Wei; Yinpeng Dong; | code |
| 270 | Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our framework enhances both approaches by reducing training variance and providing a provably lower-variance gradient estimator. |
Vinh Tong; Dung Trung Hoang; Anji Liu; Guy Van den Broeck; Mathias Niepert; | code |
| 271 | RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches struggle to effectively utilize complex radiology reports for learning and offer limited interpretability through attention probability visualizations. To address these challenges, we introduce $\textbf{RadZero}$, a novel framework for VL alignment in chest X-ray with zero-shot multi-task capability. |
Jonggwon Park; Byungmu Yoon; Soobum Kim; Kyoyun Choi; | code |
| 272 | Hyperbolic Dataset Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing DM methods, constrained to Euclidean space, treat data as independent and identically distributed points, overlooking complex geometric and hierarchical relationships. To overcome this limitation, we propose a novel hyperbolic dataset distillation method, termed HDD. |
Wenyuan Li; Guang Li; Keisuke Maeda; Takahiro Ogawa; Miki Haseyama; | code |
| 273 | CellCLIP – Learning Perturbation Effects in Cell Painting Via Text-Guided Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the application of such methods to HCS data is not straightforward due to substantial differences in the semantics of Cell Painting images compared to natural images, and the difficulty of representing different classes of perturbations (e.g. small molecule vs CRISPR gene knockout) in a single latent space. In response to these challenges, here we introduce CellCLIP, a cross-modal contrastive learning framework for HCS data. |
MingYu Lu; Ethan Weinberger; Chanwoo Kim; Su-In Lee; | code |
| 274 | Neural Fractional Attention Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a generalized neural Fractional Attention Differential Equation (FADE), which combines the memory-retention capabilities of fractional calculus with contextual learnable attention mechanisms. |
Qiyu Kang; Wenjun Cui; Xuhao Li; Yuxin Ma; Xueyang Fu; Wee Peng Tay; Yidong Li; Zheng-Jun Zha; | code |
| 275 | Multi-agent KTO: Enhancing Strategic Interactions of Large Language Model in Language Game Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by Wittgenstein’s language game theory, we propose that language agents can learn through in-context interaction rather than traditional multi-stage frameworks that separate decision-making from language expression. |
Rong Ye; Yongxin Zhang; Yikai Zhang; Haoyu Kuang; peng sun; zhongyu wei; | code |
| 276 | VGGT-SLAM: Dense RGB SLAM Optimized on The SL(4) Manifold Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present VGGT-SLAM, a dense RGB SLAM system constructed by incrementally and globally aligning submaps created from the feed-forward scene reconstruction approach VGGT using only uncalibrated monocular cameras. |
Dominic Rosario Maggio; Hyungtae Lim; Luca Carlone; | code |
| 277 | LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven By Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the analyzing and reasoning capability of large language models (LLMs), we design **LLM-Explorer** to adaptively generate task-specific exploration strategies with LLMs, enhancing the policy exploration in RL. |
Qianyue Hao; Yiwen Song; Qingmin Liao; Jian Yuan; Yong Li; | code |
| 278 | Rethinking Fair Federated Learning from Parameter and Client View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a dynamic aggregation strategy that adaptively weights clients based on local update directions and performance variations. |
Kaiqi Guan; Wenke Huang; Xianda Guo; Yueyang Yuan; Bin Yang; Mang Ye; | code |
| 279 | Un$^2$CLIP: Improving CLIP’s Visual Detail Capturing Ability Via Inverting UnCLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that a specific type of generative models, unCLIP, provides a suitable framework for achieving our goal. |
Yinqi Li; Jiahe Zhao; Hong Chang; RuiBing Hou; Shiguang Shan; Xilin Chen; | code |
| 280 | Panacea: Mitigating Harmful Fine-tuning for Large Language Models Via Post-fine-tuning Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a by-product, we analyze the adaptive perturbation and show that different layers in various LLMs have distinct safety coefficients. |
Yibo Wang; Tiansheng Huang; Li Shen; Huanjin Yao; Haotian Luo; Rui Liu; Naiqiang Tan; Jiaxing Huang; Dacheng Tao; | code |
| 281 | Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens Are Information Peaks in LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the reasoning trajectories of LRMs from an information-theoretic perspective. |
Chen Qian; Dongrui Liu; Haochen Wen; Zhen Bai; Yong Liu; Jing Shao; | code |
| 282 | SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To explore whether low-bit attention can be effectively applied to training tasks, we design an accurate and efficient $\texttt{8-bit}$ attention for both forward and backward propagation. |
Jintao Zhang; Jia wei; Haoxu Wang; Pengle Zhang; Xiaoming Xu; Haofeng Huang; Kai Jiang; Jun Zhu; Jianfei Chen; | code |
| 283 | Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we propose Neptune-X, a data-centric generative-selection framework that enhances training effectiveness by leveraging synthetic data generation with task-aware sample selection. To support robust benchmarking, we construct the Maritime Generation Dataset, the first dataset tailored for generative maritime learning, encompassing a wide range of semantic conditions. |
Yu Guo; Shengfeng He; Yuxu Lu; Haonan An; Yihang Tao; Huilin Zhu; Jingxian Liu; Yuguang Fang; | code |
| 284 | A Token Is Worth Over 1,000 Tokens: Efficient Knowledge Distillation Through Low-Rank Clone Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches often face three key challenges: (1) information loss from hard pruning, (2) inefficient alignment of representations, and (3) underutilization of informative activations, particularly from Feed-Forward Networks (FFNs). To address these challenges, we introduce \textbf{Low-Rank Clone (LRC)}, an efficient pre-training method that constructs SLMs aspiring to behavioral equivalence with strong teacher models. |
Jitai Hao; Qiang Huang; Hao Liu; Xinyan Xiao; Zhaochun Ren; Jun Yu; | code |
| 285 | GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a principled approach inspired by universal guidance. |
Sayan Deb Sarkar; Sinisa Stekovic; Vincent Lepetit; Iro Armeni; | code |
| 286 | Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our central contribution is to demonstrate that, as RL training progresses, key metrics scale predictably. |
Xinji Mai; Haotian Xu; Xing W; Weinong Wang; Yingying Zhang; Wenqiang Zhang; | code |
| 287 | Preference Optimization By Estimating The Ratio of The Data Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose \textit{Bregman preference optimization} (BPO), a generalized framework for ratio matching that provides a family of objective functions achieving target policy optimality. |
Yeongmin Kim; HeeSun Bae; Byeonghu Na; Il-chul Moon; | code |
| 288 | Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel test-time iterative optimization framework to address this, employing a closed-loop system where LLMs iteratively refine code based on empirical performance feedback from an execution sandbox. We released our code and data at https://github.com/Elfsong/Afterburner. |
Mingzhe Du; Anh Tuan Luu; Yue Liu; Yuhao QING; Dong HUANG; Xinyi He; Qian Liu; Zejun MA; See-Kiong Ng; | code |
| 289 | DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the success of recent state space models in efficiently capturing long-term dependencies, we propose DyG-Mamba by translating dynamic graph modeling into a long-term sequence modeling problem. |
Dongyuan Li; Shiyin Tan; Ying Zhang; Ming Jin; Shirui Pan; Manabu Okumura; Renhe Jiang; | code |
| 290 | Incentivizing LLMs to Self-Verify Their Answers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that the limited improvement stems from distribution discrepancies between the specific post-trained generator and the general reward model. To address this, we propose a framework that incentivizes LLMs to self-verify their own answers. |
Fuxiang Zhang; Jiacheng Xu; Chaojie Wang; Ce Cui; Yang Liu; Bo An; | code |
| 291 | VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While VLMs perform well on such benchmarks, it is unclear whether they grasp underlying visual and contextual signals or simply exploit visual-language correlations. To fill this gap, we propose incorporating negative-control tests, i.e., videos depicting physically impossible or logically inconsistent scenarios, and evaluating whether models can recognize these violations. |
Zongxia Li; Xiyang Wu; Guangyao Shi; Yubin Qin; Hongyang Du; Tianyi Zhou; Dinesh Manocha; Jordan Lee Boyd-Graber; | code |
| 292 | VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding Via Iterative Reasoning with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As reinforcement learning (RL) has been proven to be beneficial for model reasoning, we introduce VRAG-RL, a novel RL framework tailored for complex reasoning across visually rich information. |
Qiuchen Wang; Ruixue Ding; Yu Zeng; Zehui Chen; Lin Chen; Shihang Wang; Pengjun Xie; Fei Huang; Feng Zhao; | code |
| 293 | Adaptive Defense Against Harmful Fine-Tuning for Large Language Models Via Bayesian Data Scheduler Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing defense strategies preemptively build robustness via attack simulation but suffer from fundamental limitations: (i) the infeasibility of extending attack simulations beyond bounded threat models due to the inherent difficulty of anticipating unknown attacks, and (ii) limited adaptability to varying attack settings, as simulation fails to capture their variability and complexity. To address these challenges, we propose Bayesian Data Scheduler (BDS), an adaptive tuning-stage defense strategy with no need for attack simulation. |
Zixuan Hu; Li Shen; Zhenyi Wang; Yongxian Wei; Dacheng Tao; | code |
| 294 | UFO: A Unified Approach to Fine-grained Visual Perception Via Open-ended Language Interface Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is primarily because these tasks often rely heavily on task-specific designs and architectures that can complicate the modeling process. To address this challenge, we present UFO, a framework that unifies fine-grained visual perception tasks through an open-ended language interface. |
Hao Tang; Chen-Wei Xie; Haiyang Wang; Xiaoyi Bao; Tingyu Weng; Pandeng Li; Yun Zheng; Liwei Wang; | code |
| 295 | Breakthrough Sensor-Limited Single View: Towards Implicit Temporal Dynamics for Time Series Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the limitation, we propose **EDEN** (multiple **E**xplicit **D**omain **E**nhanced adaptation **N**etwork), expanding the raw dataset to multi-scale explicit domains, multi-subspace explicit domains and multi-segment explicit domains. |
Mingyang Liu; Xinyang Chen; Xiucheng Li; Weili Guan; Liqiang Nie; | code |
| 296 | Accelerating RL for LLM Reasoning with Optimal Advantage Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose $A^\star$-PO, a novel two-stage policy optimization framework that directly approximates the optimal advantage function and enables efficient training of LLMs for reasoning tasks. |
Kianté Brantley; Mingyu Chen; Zhaolin Gao; Jason D. Lee; Wen Sun; Wenhao Zhan; Xuezhou Zhang; | code |
| 297 | UniMRSeg: Unified Modality-Relax Segmentation Via Hierarchical Self-Supervised Compensation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a unified modality-relax segmentation network (UniMRSeg) through hierarchical self-supervised compensation (HSSC). |
Xiaoqi Zhao; Youwei Pang; Chenyang Yu; Lihe Zhang; Huchuan Lu; Shijian Lu; Georges El Fakhri; Xiaofeng Liu; | code |
| 298 | How to Build A Consistency Model: Learning Flow Maps Via Self-distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a systematic algorithmic framework for directly learning the flow map associated with a flow or diffusion model. |
Nicholas Matthew Boffi; Michael Samuel Albergo; Eric Vanden-Eijnden; | code |
| 299 | Covariances for Free: Exploiting Mean Distributions for Training-free Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a training-free method based on an unbiased estimator of class covariance matrices which only uses first-order statistics in the form of class means communicated by clients to the server. |
Dipam Goswami; Simone Magistri; Kai Wang; Bartłomiej Twardowski; Andrew D. Bagdanov; Joost van de Weijer; | code |
| 300 | Fast Training of Large Kernel Models with Delayed Projections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. |
Amirhesam Abedsoltan; Siyuan Ma; Parthe Pandit; Mikhail Belkin; | code |
| 301 | ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ChunkKV, which fundamentally reimagines KV cache compression by treating semantic chunks – rather than isolated tokens – as basic compression units. |
Xiang Liu; Zhenheng Tang; Peijie Dong; Zeyu Li; Liuyue; Bo Li; Xuming Hu; Xiaowen Chu; | code |
| 302 | Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods using Direct Preference Optimization (DPO) constrain optimization to a solitary image reference within the input sequence, neglecting holistic context modeling. To address this, we propose Context-to-Cue Direct Preference Optimization (CcDPO), a multi-level preference optimization framework that enhances per-image perception in multi-image settings by zooming into visual clues—from sequential context to local details. |
Xudong Li; Mengdan Zhang; Peixian Chen; Xiawu Zheng; Yan Zhang; Jingyuan Zheng; Yunhang Shen; Ke Li; Chaoyou Fu; Xing Sun; Rongrong Ji; | code |
| 303 | NeuSymEA: Neuro-symbolic Entity Alignment Via Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose NeuSymEA, a unified neuro-symbolic reasoning framework that combines the strengths of both methods to fully exploit the cross-KG structural pattern for robust entity alignment. |
Shengyuan Chen; Zheng Yuan; Qinggang Zhang; Wen Hua; Jiannong Cao; Xiao Huang; | code |
| 304 | VTON-VLLM: Aligning Virtual Try-On Models with Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To benchmark VTON models more holistically, we introduce VITON-Bench, a challenging test suite of complex try-on scenarios, and human-preference–aware metrics. |
Siqi Wan; Jingwen Chen; Qi Cai; Yingwei Pan; Ting Yao; Tao Mei; | code |
| 305 | Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we propose a novel post-training framework that enhances the generalization capabilities of LVLMs via reinforcement learning (RL). (3) TVGBench: we carefully construct a small but comprehensive and balanced benchmark suitable for LVLM evaluation, which is sourced from available public benchmarks. |
Ye Wang; Ziheng Wang; Boshen Xu; Yang Du; Kejun Lin; Zihan Xiao; Zihao Yue; Jianzhong Ju; Liang Zhang; Dingyi Yang; Xiangnan Fang; Zewen He; Zhenbo Luo; Wenxuan Wang; Junqi Lin; Jian Luan; Qin Jin; | code |
| 306 | Time-o1: Time-Series Forecasting Needs Transformed Label Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) excessive amount of tasks, which increases with the forecast horizon and complicates optimization. To address these challenges, we propose Time-o1, a transformation-augmented learning objective for training time-series forecasting models. |
Hao Wang; Licheng Pan; Zhichao Chen; Xu Chen; Qingyang Dai; Lei Wang; Haoxuan Li; Zhouchen Lin; | code |
| 307 | Inverse Methods for Missing Data Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing iterative imputation methods exhibit two critical defects: (1) model misspecification, where a uniform parametric form of model is applied across different features, conflicting with heterogeneous data generation processes; (2) underuse of oracle features, where all features are treated as potentially missing, neglecting the valuable information in fully observed features. In this work, we propose kernel point imputation (KPI), a bi-level optimization framework designed to address these issues. |
Hao Wang; zhengnan li; Zhichao Chen; Xu Chen; Shuting He; Guangyi Liu; Haoxuan Li; Zhouchen Lin; | code |
| 308 | Personalized Subgraph Federated Learning with Differentiable Auxiliary Projections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce **Fed**erated learning with **Aux**iliary projections (FedAux), a personalized subgraph FL framework that learns to align, compare, and aggregate heterogeneously distributed local models without sharing raw data or node embeddings. |
Wei Zhuo; Zhaohuan Zhan; Han Yu; | code |
| 309 | Accelerating Multimodal Large Language Models Via Dynamic Visual-Token Exit and The Empirical Findings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the visual redundancy problem of multimodal large language models (MLLMs) from the perspective of attention behaviors. |
Qiong Wu; Wenhao Lin; Yiyi Zhou; Weihao Ye; Zhanpeng Zeng; Xiaoshuai Sun; Rongrong Ji; | code |
| 310 | Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite their strong generalization capabilities, these models often lack 3D consistency, a fundamental requirement for understanding scene geometry and motion, thereby causing severe spatial misalignment and temporal flickering in complex 3D environments. In this paper, we present Motion4D, a novel framework that addresses these challenges by integrating 2D priors from foundation models into a unified 4D Gaussian Splatting representation. |
Haoran Zhou; Gim Hee Lee; | code |
| 311 | Improving Retrieval-Augmented Generation Through Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although recent efforts have explored using reinforcement learning (RL) to optimize specific RAG components, these approaches often focus on simple pipelines with only two components or do not adequately address the complex interdependencies and collaborative interactions among the modules. To overcome these limitations, we propose treating the complex RAG pipeline with multiple components as a multi-agent cooperative task, in which each component can be regarded as an RL agent. |
Yiqun Chen; Lingyong Yan; Weiwei Sun; Xinyu Ma; Yi Zhang; Shuaiqiang Wang; Dawei Yin; Yiming Yang; Jiaxin Mao; | code |
| 312 | Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify directional gradient conflicts during PINN training as a critical bottleneck. |
Sifan Wang; Ananyae Kumar bhartari; Bowen Li; Paris Perdikaris; | code |
| 313 | Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To resolve this, we propose Dual Data Alignment (DDA), which aligns both the pixel and frequency domains. Moreover, we introduce two new test sets: DDA-COCO, containing DDA-aligned synthetic images, and EvalGEN, featuring the latest generative models. |
Ruoxin Chen; Junwei Xi; Zhiyuan Yan; Ke-Yue Zhang; Shuang Wu; Jingyi Xie; Xu Chen; Lei Xu; Isabel Guan; Taiping Yao; Shouhong Ding; | code |
| 314 | StelLA: Subspace Learning in Low-rank Adaptation Using Stiefel Manifold Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a geometry-aware extension of LoRA that uses a three-factor decomposition $USV^\top$. |
Zhizhong Li; Sina Sajadmanesh; Jingtao Li; Lingjuan Lyu; | code |
| 315 | Generalizable Reasoning Through Compositional Energy Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach to reasoning generalization by learning energy landscapes over the solution spaces of smaller, more tractable subproblems. |
Alexandru Oarga; Yilun Du; | code |
| 316 | Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel, unified theoretical framework for watermarking Large Language Models (LLMs) that jointly optimizes both the watermarking scheme and detector. |
Haiyun He; Yepeng Liu; Ziqiao Wang; Yongyi Mao; Yuheng Bu; | code |
| 317 | Injecting Frame-Event Complementary Fusion Into Diffusion for Optical Flow in Challenging Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on diffusion models, we propose a Multi-Condition Iterative Denoising Decoder. In addition, we propose a dual-modal optical flow dataset for generalization experiments. |
Haonan Wang; Hanyu Zhou; Haoyue Liu; Luxin Yan; | code |
| 318 | KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a physics-based humanoid control framework, aiming to master highly-dynamic human behaviors such as Kungfu and dancing through multi-steps motion processing and adaptive motion tracking. |
Weiji Xie; Jinrui Han; Jiakun Zheng; Huanyu Li; Xinzhe Liu; Jiyuan Shi; Weinan Zhang; Chenjia Bai; Xuelong Li; | code |
| 319 | STree: Speculative Tree Decoding for Hybrid State Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Along with the algorithm, we describe a hardware-aware implementation that improves upon the naive application of AR Transformer tree-based speculative decoding methods to SSMs. |
Yangchao Wu; Zongyue Qin; Alex Wong; Stefano Soatto; | code |
| 320 | Unbiased Prototype Consistency Learning for Multi-Modal and Multi-Task Object Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the practical requirements for unified retrieval, we introduce Multi-Modal and Multi-Task object ReID ($\rm {M^3T}$-ReID). |
Zhongao Zhou; Bin Yang; Wenke Huang; Jun Chen; Mang Ye; | code |
| 321 | DKDR: Dynamic Knowledge Distillation for Reliability in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose DKDR (Dynamic Knowledge Distillation for Reliability in Federated Learning), which dynamically assigns weights to forward and reverse KLD based on knowledge discrepancies. |
Yueyang Yuan; Wenke Huang; Guancheng Wan; Kaiqi Guan; He Li; Mang Ye; | code |
| 322 | DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, for the first time, we discover that introducing regular intervals among the repeated textures, creating a grid structure, significantly enhances the patch attack performance. |
Yun Xing; Yue Cao; Nhat Chung; Jie Zhang; Ivor Tsang; Ming-Ming Cheng; Yang Liu; Lei Ma; Qing Guo; | code |
| 323 | Non-Line-of-Sight 3D Reconstruction with Radar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HoloRadar, a practical system that reconstructs both line-of-sight (LOS) and non-line-of-sight (NLOS) 3D scenes using a single mmWave radar. |
Haowen Lai; Zitong Lan; Mingmin Zhao; | code |
| 324 | FP4 All The Way: Fully Quantized Training of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we identify a theoretical and empirical threshold for effective quantized training: when the gradient norm falls below approximately $\sqrt{3}$ times the quantization noise, quantized training becomes less effective. Leveraging these insights, we successfully train a 7-billion-parameter model on 256 Intel Gaudi2 accelerators. |
Brian Chmiel; Maxim Fishman; Ron Banner; Daniel Soudry; | code |
| 325 | RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce RepLDM, a novel reprogramming framework for pretrained LDMs that enables high-quality, high-efficiency, high-resolution image generation; see Fig. 1. |
Boyuan Cao; Jiaxin Ye; Yujie Wei; Hongming Shan; | code |
| 326 | Bridging The Gap to Real-world Language-grounded Visual Concept Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches to language-grounded visual concept learning are limited to a few predefined primitive axes, such as color and shape, and are typically explored in synthetic datasets. In this work, we propose a scalable framework that adaptively identifies image-related concept axes and grounds visual concepts along these axes in real-world scenes. |
Whie Jung; Semin Kim; Junee Kim; Seunghoon Hong; | code |
| 327 | Disentangled Representation Learning Via Modular Compositional Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such divergent approaches result in significant overhead when novel factors of variation do not align with prior assumptions, such as statistical independence or spatial exclusivity, or when multiple factors coexist, as practitioners must redesign architectures or objectives. To address this, we propose a compositional bias, a modular inductive bias decoupled from both objectives and architectures. |
Whie Jung; Dong Hoon Lee; Seunghoon Hong; | code |
| 328 | EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most previous works learn from 1D text or 2D visual cues, such as bounding boxes, which inherently lack 3D understanding. To bridge this gap, we introduce EgoDTM, an Egocentric **D**epth- and **T**ext-aware **M**odel, jointly trained through large-scale 3D-aware video pretraining and video-text contrastive learning. |
Boshen Xu; Yuting Mei; Xinbi Liu; Sipeng Zheng; Qin Jin; | code |
| 329 | FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches face two major challenges: high computational cost during training and inference, and limited scalability due to reliance on U-Net inductive bias. To address these challenges, we propose **FlashMo**, a frequency-aware sparse motion diffusion model that prunes low-frequency tokens to enhance efficiency without custom kernel design. |
Zeyu Zhang; Yiran Wang; Danning Li; Dong Gong; Ian Reid; Richard Hartley; | code |
| 330 | The Promise of RL for Autoregressive Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to study all these components in one consistent framework, we adopt an autoregressive multimodal model that processes textual and visual tokens in a unified manner. We release our code, training data, and trained models at [https://github.com/mair-lab/EARL](https://github.com/mair-lab/EARL). |
Saba Ahmadi; Rabiul Awal; Ankur Sikarwar; Amirhossein Kazemnejad; Ge Ya Luo; Juan A. Rodriguez; Sai Rajeswar; Siva Reddy; Christopher Pal; Benno Krojer; Aishwarya Agrawal; | code |
| 331 | Towards Irreversible Attack: Fooling Scene Text Recognition Via Multi-Population Coevolution Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These attack results still carry partial information about the original prediction and could be easily corrected by an external dictionary or a language model. Therefore, we propose the Multi-Population Coevolution Search (MPCS) method to attack each character in the image. |
Jingyu Li; Pengwen Dai; Mingqing Zhu; Chengwei Wang; Haolong Liu; Xiaochun Cao; | code |
| 332 | REArtGS: Reconstructing and Generating Articulated Objects Via 3D Gaussian Splatting with Geometric and Motion Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present REArtGS, a novel framework that introduces additional geometric and motion constraints to 3D Gaussian primitives, enabling realistic surface reconstruction and generation for articulated objects. |
Di Wu; Liu Liu; Zhou Linli; Anran Huang; Liangtu Song; Qiaojun Yu; Qi Wu; Cewu Lu; | code |
| 333 | Multi-step Visual Reasoning with Visual Tokens Scaling and Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel framework for inference-time visual token scaling that enables MLLMs to perform iterative, verifier-guided reasoning over visual content. To support this, we present a new dataset, VTS, comprising supervised reasoning trajectories (VTS-SFT) and preference-labeled reasoning comparisons (VTS-DPO). |
Tianyi Bai; Zengjie Hu; Fupeng Sun; Qiu Jiantao; Yizhen Jiang; Guangxin He; Bohan Zeng; Conghui He; Binhang Yuan; Wentao Zhang; | code |
| 334 | Scaling Language-centric Omnimodal Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through analysis of anisotropy and kernel similarity structure, we empirically confirm that latent alignment emerges within MLLM representations, allowing CL to serve as a lightweight refinement stage. Leveraging this insight, we propose a Language-Centric Omnimodal Embedding framework, termed LCO-Embed. |
Chenghao Xiao; Hou Pong Chan; Hao Zhang; Weiwen Xu; Mahani Aljunied; Yu Rong; | code |
| 335 | Brain Network Science Modelling of Sparse Neural Networks Enables Transformers and LLMs to Perform As Fully Connected Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study aims to enlarge our current knowledge on the application of brain-inspired network science principles for training artificial neural networks (ANNs) with sparse connectivity. |
Yingtao Zhang; Diego Cerretti; Jialin Zhao; Wenjing Wu; Ziheng Liao; Umberto Michieli; Carlo Vittorio Cannistraci; | code |
| 336 | Enhancing Training Data Attribution with Representational Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Representation-based approaches are far more scalable, but typically rely on heuristic embeddings that are not optimized for attribution, limiting their fidelity. To address these challenges, we propose AirRep, a scalable, representation-based approach that closes this gap by learning task-specific and model-aligned representations optimized explicitly for TDA. |
Weiwei Sun; Haokun Liu; Nikhil Kandpal; Colin Raffel; Yiming Yang; | code |
| 337 | TARFVAE: Efficient One-Step Generative Time Series Forecasting Via TARFLOW Based VAE Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents TARFVAE, a novel generative framework that combines the Transformer-based autoregressive flow (TARFLOW) and variational autoencoder (VAE) for efficient one-step generative time series forecasting. |
Jiawen Wei; Lan Jiang; Pengbo Wei; Ziwen Ye; Teng Song; Chen Chen; Guangrui Ma; | code |
| 338 | OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our constructed data, we develop an Image-Video Transfer Mixed (IVTM) training with image editing data to enable instructive editing for the subject in the customized video. |
Yuanhao Cai; He Zhang; Xi Chen; Jinbo Xing; Yiwei Hu; Yuqian Zhou; Kai Zhang; Zhifei Zhang; Soo Ye Kim; Tianyu Wang; Yulun Zhang; Xiaokang Yang; Zhe Lin; Alan Yuille; | code |
| 339 | SANSA: Unleashing The Hidden Semantics in SAM2 for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is that, despite its class-agnostic pretraining, SAM2 already encodes rich semantic structure in its features. We propose SANSA (Semantically AligNed Segment Anything 2), a framework that makes this latent structure explicit, and repurposes SAM2 for few-shot segmentation through minimal task-specific modifications. |
Claudia Cuttano; Gabriele Trivigno; Giuseppe Averta; Carlo Masone; | code |
| 340 | Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose a novel TTA method tailored to adapting VLMs for segmentation during test time. |
Mehrdad Noori; David OSOWIECHI; Gustavo Adolfo Vargas Hakim; Ali Bahri; Moslem Yazdanpanah; Sahar Dastani; Farzad Beizaee; Ismail Ben Ayed; Christian Desrosiers; | code |
| 341 | Accident Anticipation Via Temporal Occurrence Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches rely on ambiguous binary supervision—labeling all frames in accident videos as positive—despite the fact that risk varies continuously over time, leading to unreliable learning and false alarms. To address this, we propose a novel paradigm that shifts the prediction target from current-frame risk scoring to directly estimating accident scores at multiple future time steps (e.g., 0.1s–2.0s ahead), leveraging precisely annotated accident timestamps as supervision. |
Tianhao Zhao; Yiyang Zou; Zihao Mao; Peilun Xiao; Yulin Huang; Hongda Yang; Yuxuan Li; Qun Li; Guobin Wu; Yutian Lin; | code |
| 342 | A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we propose Unified Video Fusion (UniVF), a novel and unified framework for video fusion that leverages multi-frame learning and optical flow-based feature warping for informative, temporally coherent video fusion. To support its development, we also introduce Video Fusion Benchmark (VF-Bench), the first comprehensive benchmark covering four video fusion tasks: multi-exposure, multi-focus, infrared-visible, and medical fusion. |
Zixiang Zhao; Haowen Bai; Bingxin Ke; Yukun Cui; Lilun Deng; Yulun Zhang; Kai Zhang; Konrad Schindler; | code |
| 343 | S$^2$M-Former: Spiking Symmetric Mixing Branchformer for Brain Auditory Attention Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent advancements, EEG-based AAD remains hindered by the absence of synergistic frameworks that can fully leverage complementary EEG features under energy-efficiency constraints. We propose ***S$^2$M-Former***, a novel ***s***piking ***s***ymmetric ***m***ixing framework to address this limitation through two key innovations: i) Presenting a spike-driven symmetric architecture composed of parallel spatial and frequency branches with mirrored modular design, leveraging biologically plausible token-channel mixers to enhance complementary learning across branches; ii) Introducing lightweight 1D token sequences to replace conventional 3D operations, reducing parameters by 14.7$\times$. |
Jiaqi Wang; Zhengyu Ma; Xiongri Shen; Chenlin Zhou; Leilei Zhao; Han Zhang; Yi Zhong; Siqi Cai; Zhenxi Song; Zhiguo Zhang; | code |
| 344 | Projection-Manifold Regularized Latent Diffusion for Robust General Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes PDFuse, a robust, general training-free image fusion framework built on pre-trained latent diffusion models with projection–manifold regularization. |
Lei Cao; Hao Zhang; Chunyu Li; Jiayi Ma; | code |
| 345 | Towards Implicit Aggregation: Robust Image Representation for Place Recognition in The Transformer Era Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Visual place recognition (VPR) is typically regarded as a specific image retrieval task, whose core lies in representing images as global descriptors. Over the past decade, … |
Feng Lu; Tong Jin; Canming Ye; Xiangyuan Lan; Yunpeng Liu; Chun Yuan; | code |
| 346 | A Signed Graph Approach to Understanding and Mitigating Oversmoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Structural Balanced Propagation (SBP), a plug-and-play method that assigns signed edges based on either labels or feature similarity to explicitly enhance structural balance in the constructed signed graphs. |
Jiaqi WANG; Xinyi Wu; James Cheng; Yifei Wang; | code |
| 347 | Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This overemphasis on future frames can result in erroneous vision-language associations, as actions may terminate early or include irrelevant moments in the end. To address this issue, we propose Action Temporal Coherence Learning (AcTOL) to learn ordered and continuous vision-language representations without rigid goal-based constraints. |
Zhizhen Zhang; Lei Zhu; Zhen Fang; Zi Huang; Yadan Luo; | code |
| 348 | Evolutionary Multi-View Classification Via Eliminating Individual Fitness Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This makes it difficult for the multi-view model (MVM) to achieve optimal performance during convergence, which in turn leads to FE failing to accurately reflect individual performance rankings and ultimately triggering FEB. To address this issue, we propose an evolutionary multi-view classification via eliminating individual fitness bias (EFB-EMVC) method, which alleviates the FEB issue by introducing evolutionary navigators for each MVM, thereby providing more accurate individual ranking. |
Xinyan Liang; Shuai Li; Qian Guo; Yuhua Qian; Bingbing Jiang; Tingjin Luo; Liang Du; | code |
| 349 | MobileUse: A Hierarchical Reflection-Driven GUI Agent for Autonomous Mobile Operation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, applying these models to real-world mobile scenarios remains a significant challenge due to the long-horizon task execution, difficulty in error recovery, and the cold-start problem in unfamiliar environments. To address these challenges, we propose MobileUse, a GUI agent designed for robust and adaptive mobile task execution. |
Ning Li; Xiangmou Qu; Jiamu Zhou; Jun Wang; Muning Wen; Kounianhua Du; Xingyu Lou; Qiuying Peng; Jun Wang; Weinan Zhang; | code |
| 350 | RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent multi-modal large language models (MLLMs) often struggle to generate personalized image captions, even when trained on high-quality captions. In this work, we observe that such limitations persist in existing post-training-based MLLM personalization methods. |
Yeongtak Oh; Dohyun Chung; Juhyeon Shin; Sangha Park; Johan Barthelemy; Jisoo Mok; Sungroh Yoon; | code |
| 351 | LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While progressive reasoning is crucial, the functional elements significantly increase computational demands during test-time inference. We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step based on its impact on answer prediction confidence. |
Yang Xiao; Jessie Wang; Ruifeng Yuan; Chunpu Xu; Kaishuai Xu; Wenjie Li; Pengfei Liu; | code |
| 352 | BADiff: Bandwidth Adaptive Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel framework to enable diffusion models to adapt their generation quality based on real-time network bandwidth constraints. |
Xi Zhang; Hanwei Zhu; Yan Zhong; Jiamang Wang; Weisi Lin; | code |
| 353 | Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook, defined by a learnable generation matrix. |
Xi Zhang; Xiaolin Wu; Jiamang Wang; Weisi Lin; | code |
| 354 | Guided Diffusion Sampling on Function Spaces with Applications to PDEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a general framework for conditional sampling in PDE-based inverse problems, targeting the recovery of whole solutions from extremely sparse or noisy measurements. |
Jiachen Yao; Abbas Mammadov; Julius Berner; Gavin Kerrigan; Jong Chul Ye; Kamyar Azizzadenesheli; Anima Anandkumar; | code |
| 355 | Multi-Scale Finetuning for Encoder-based Time Series Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Focusing on encoder-based TSFMs, we propose Multiscale finetuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process. |
Zhongzheng Qiao; Chenghao Liu; Yiming Zhang; Ming Jin; Quang Pham; Qingsong Wen; Ponnuthurai Nagaratnam Suganthan; Xudong Jiang; Savitha Ramasamy; | code |
| 356 | Semi-supervised Graph Anomaly Detection Via Robust Homophily Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this assumption often does not hold well since normal nodes in a graph can exhibit diverse homophily in real-world GAD datasets. In this paper, we propose RHO, namely Robust Homophily Learning, to adaptively learn such homophily patterns. |
Guoguo Ai; Hezhe Qiao; Hui Yan; Guansong Pang; | code |
| 357 | SongBloom: Coherent Song Generation Via Interleaved Autoregressive Sketching and Diffusion Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces SongBloom, a novel framework for full-length song generation that leverages an interleaved paradigm of autoregressive sketching and diffusion-based refinement. |
Chenyu Yang; Shuai Wang; Hangting Chen; Wei Tan; Jianwei Yu; Haizhou Li; | code |
| 358 | Overcoming Challenges of Long-Horizon Prediction in Driving World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a model using simple design choices, without additional supervision or sensors such as maps, depth, or multiple cameras. |
Arian Mousakhan; Sudhanshu Mittal; Silvio Galesso; Karim Farid; Thomas Brox; | code |
| 359 | One for All: Universal Topological Primitive Transfer for Graph Structure Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address three critical barriers – the absence of specialized benchmarks, aligned semantic representations, and systematic transfer methodologies – we present G²SN-Transfer, a unified framework comprising: (i) TopoGraph-Mapping that transforms non-Euclidean graphs into transferable sequences via topological primitive distribution dictionaries; (ii) G²SN, a dual-stream architecture learning text-topology aligned representations through contrastive alignment; and (iii) AdaCross-Transfer, a data-adaptive knowledge transfer mechanism leveraging cross-attention for both full-parameter and parameter-frozen scenarios. We construct STA-18, the first large-scale benchmark with aligned topological primitive-text pairs across 18 diverse graph datasets. |
Yide Qiu; Tong Zhang; Xing Cai; Hui Yan; Zhen Cui; | code |
| 360 | GraphTOP: Graph Topology-Oriented Prompting for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we conduct a pioneering investigation of graph prompting in terms of graph topology. |
Xingbo Fu; Zhenyu Lei; Zihan Chen; Binchi Zhang; Chuxu Zhang; Jundong Li; | code |
| 361 | Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation Within Slow Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Fast-in-Slow (FiS), a unified dual-system vision-language-action (VLA) model that embeds the System 1 execution module within the VLM-based System 2 by partially sharing parameters. |
Hao Chen; Jiaming Liu; Chenyang Gu; Zhuoyang Liu; Renrui Zhang; Xiaoqi Li; Xiao He; Yandong Guo; Chi-Wing Fu; Shanghang Zhang; Pheng-Ann Heng; | code |
| 362 | MonoLift: Learning 3D Manipulation Policies from Monocular RGB Via Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An intuitive alternative is to incorporate a pre-trained depth estimator; however, this often incurs substantial inference-time cost. To address this, we propose MonoLift, a tri-level knowledge distillation framework that transfers spatial, temporal, and action-level knowledge from a depth-guided teacher to a monocular RGB student. |
Ziru Wang; Mengmeng Wang; Guang Dai; Yongliu Long; Jingdong Wang; | code |
| 363 | LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42× speedup and a 47× boost over LangSplat, respectively, along with improved query accuracy. |
Wanhua Li; Yujie Zhao; Minghan Qin; Yang Liu; Yuanhao Cai; Chuang Gan; Hanspeter Pfister; | code |
| 364 | Sequential Multi-Agent Dynamic Algorithm Configuration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, many complex algorithms have inherent inter-dependencies among multiple parameters (e.g., determining the operator type first and then the operator’s parameter), which previous approaches do not consider, leading to sub-optimal results. In this paper, we propose the sequential multi-agent DAC (Seq-MADAC) framework to address this issue by considering the inherent inter-dependencies of multiple parameters. |
Chen Lu; Ke Xue; Lei Yuan; Yao Wang; Yaoyuan Wang; Fu Sheng; Chao Qian; | code |
| 365 | CPO: Condition Preference Optimization for Controllable Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to uncertainty in generative models, it is difficult to ensure that win–lose image pairs differ only in controllability while keeping other factors, such as image quality, fixed. To address this, we propose performing preference learning over control conditions rather than generated images. |
Zonglin Lyu; Ming Li; Xinxin Liu; Chen Chen; | code |
| 366 | Tensor Product Attention Is All You Need Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, substantially shrinking the KV cache size at inference time. |
Yifan Zhang; Yifeng Liu; Huizhuo Yuan; Zhen Qin; Yang Yuan; Quanquan Gu; Andrew C Yao; | code |
| 367 | Merging on The Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a training-free projection-based continual merging method that processes models sequentially through orthogonal projections of weight matrices and adaptive scaling mechanisms. |
Anke Tang; Enneng Yang; Li Shen; Yong Luo; Han Hu; Lefei Zhang; Bo Du; Dacheng Tao; | code |
| 368 | On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing PEFT methods commonly treat points as orderless tokens, neglecting important local spatial structures and global geometric contexts in 3D modeling. To bridge this gap, we introduce the Geometric Encoding Mixer (GEM), a novel geometry-aware PEFT module specifically designed for 3D point cloud transformers. |
Liyao Tang; Zhe Chen; Dacheng Tao; | code |
| 369 | SSR: Enhancing Depth Perception in Vision-Language Models Via Rationale-Guided Spatial Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel Spatial Sense and Reasoning framework, dubbed SSR, that transforms raw depth data into structured, interpretable textual rationales. To enable comprehensive evaluation, we introduce a new dataset named SSR-CoT, a million-scale visual-language reasoning dataset enriched with intermediate spatial reasoning annotations, and present SSRBench, a comprehensive multi-task benchmark. |
Yang Liu; Ming Ma; Xiaomin Yu; Pengxiang Ding; Han Zhao; Mingyang Sun; Siteng Huang; Donglin Wang; | code |
| 370 | TOMCAT: Test-time Comprehensive Knowledge Accumulation for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the challenge, we propose a novel approach that accumulates comprehensive knowledge in both textual and visual modalities from unsupervised data to update multimodal prototypes at test time. |
Xudong Yan; Songhe Feng; | code |
| 371 | Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper takes the capacity-constrained vehicle routing problem (CVRP) as an example to empirically analyze the NCO performance under different tightness degrees of the capacity constraint. |
Fu Luo; Yaoxin Wu; Zhi Zheng; Zhenkun Wang; | code |
| 372 | Instance-Level Composed Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an instance-level class definition. |
Bill Psomas; George Retsinas; Nikos Efthymiadis; Panagiotis Filntisis; Yannis Avrithis; Petros Maragos; Ondrej Chum; Giorgos Tolias; | code |
| 373 | Rectified Point Flow: Generic Point Cloud Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Rectified Point Flow, a unified parameterization that formulates pairwise point cloud registration and multi-part shape assembly as a single conditional generative problem. |
Tao Sun; Liyuan Zhu; Shengyu Huang; Shuran Song; Iro Armeni; | code |
| 374 | SparseMVC: Probing Cross-view Sparsity Variations for Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Cross-view sparsity variations lead to encoding discrepancies, heightening sample-level semantic heterogeneity and making view-level dynamic weighting inappropriate. To tackle these challenges, we propose Adaptive Sparse Autoencoders for Multi-View Clustering (SparseMVC), a framework with three key modules. |
Ruimeng Liu; Xin Zou; Chang Tang; Xiao Zheng; Xingchen Hu; Kun Sun; Xinwang Liu; | code |
| 375 | ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches aiming to improve faithfulness primarily focus on enhancing the utilization of external context, but often overlook the persistent influence of internal parametric knowledge during generation. In this work, we investigate the internal mechanisms behind unfaithful generation and identify a subset of mid-to-deep feed-forward networks (FFNs) that are disproportionately activated in such cases. |
Pengcheng Huang; Zhenghao Liu; Yukun Yan; Haiyan Zhao; Xiaoyuan Yi; Hao Chen; Zhiyuan Liu; Maosong Sun; Tong Xiao; Ge Yu; Chenyan Xiong; | code |
| 376 | Hallucination at A Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we propose a controlled data generation pipeline that produces minimally edited image pairs with semantically aligned captions. Using this pipeline, we construct the Micro Edit Dataset (MED), containing over 50K image-text pairs spanning 11 fine-grained edit categories, including attribute, count, position, and object presence changes. |
Tianyi Bai; Yuxuan Fan; Qiu Jiantao; Fupeng Sun; Jiayi Song; Junlin Han; Zichen Liu; Conghui He; Wentao Zhang; Binhang Yuan; | code |
| 377 | LiveStar: Live Streaming Assistant for Real-World Online Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these limitations, we introduce LiveStar, a pioneering live streaming assistant that achieves always-on proactive responses through adaptive streaming decoding. We also construct OmniStar, a comprehensive dataset for training and benchmarking that encompasses 15 diverse real-world scenarios and 5 evaluation tasks for online video understanding. |
Zhenyu Yang; Kairui Zhang; Yuhang Hu; Bing Wang; Shengsheng Qian; Bin Wen; Fan Yang; Tingting Gao; Weiming Dong; Changsheng Xu; | code |
| 378 | Failure Prediction at Runtime for Generative Robot Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. |
Ralf Römer; Adrian Kobras; Luca Worbis; Angela P. Schoellig; | code |
| 379 | RankSEG-RMA: An Efficient Segmentation Algorithm Via Reciprocal Moment Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, RankSEG is only applicable to overlapping segmentation settings, where multiple classes can occupy the same pixel, which contrasts with standard benchmarks that typically assume non-overlapping segmentation. In this paper, we overcome these two drawbacks via a *reciprocal moment approximation* (RMA) of RankSEG with the following contributions: (i) we improve RankSEG using RMA, yielding RankSEG-RMA, which reduces the complexity of both algorithms to $\mathcal{O}(d)$ while maintaining comparable performance; (ii) inspired by RMA, we develop a pixel-wise score function that allows efficient implementation for non-overlapping segmentation settings. |
Zixun Wang; Ben Dai; | code |
| 380 | MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we introduce Multi-query Scene Text retrieval with Attention Recycling (MSTAR), a box-free approach for scene text retrieval. Furthermore, we build the Multi-Query Text Retrieval (MQTR) dataset, the first benchmark designed to evaluate the multi-query scene text retrieval capability of models, comprising four query types and $16k$ images. |
Liang Yin; Xudong Xie; Zhang Li; Xiang Bai; Yuliang Liu; | code |
| 381 | Train on Pins and Test on Obstacles for Rectilinear Steiner Minimum Tree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods either suffer from excessive exploration of the search space or rely on heuristic combinations that compromise effectiveness and efficiency, and this limitation becomes notably exacerbated when extended to the obstacle-avoiding RSMT (OARSMT). To address this, we propose OAREST, a reinforcement learning-based framework for constructing an Obstacle-Avoiding Rectilinear Edge Sequence (RES) Tree. |
Xingbo Du; Ruizhe Zhong; Junchi Yan; | code |
| 382 | $\mathcal{X}^2$-DFD: A Framework for E$\mathcal{X}$plainable and E$\mathcal{X}$tendable Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes **$\mathcal{X}^2$-DFD**, an **e$\mathcal{X}$plainable** and **e$\mathcal{X}$tendable** framework based on multimodal large-language models (MLLMs) for deepfake detection, consisting of three key stages. |
Yize Chen; Zhiyuan Yan; Guangliang Cheng; KANGRAN ZHAO; Siwei Lyu; Baoyuan Wu; | code |
| 383 | RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this limitation, we propose **R**etrieval-**A**ugmented **D**iagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Moreover, recognizing the lack of quantitative evaluation of interpretability for multimodal diagnostic models, we introduce a set of criteria to assess the interpretability from both image and text perspectives. |
Haolin Li; Tianjie Dai; Zhe Chen; Siyuan Du; Jiangchao Yao; Ya Zhang; Yanfeng Wang; | code |
| 384 | A Fair Federated Learning Method for Handling Client Participation Probability Inconsistencies in Heterogeneous Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing FL research has primarily focused on optimizing learning performance based on the assumption of uniform client participation, with few studies delving into performance fairness under inconsistent client participation, particularly in model-heterogeneous FL environments. In view of this challenge, we propose PHP-FL, a novel model-heterogeneous FL method that explicitly addresses scenarios with varying client participation probabilities to enhance both model accuracy and performance fairness. |
Siyuan Wu; Yongzhe Jia; Haolong Xiang; Xiaolong Xu; Xuyun Zhang; Lianyong Qi; Wanchun Dou; | code |
| 385 | Unified Transferability Metrics for Time Series Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce TEMPLATE, a transferability estimation framework specifically tailored for versatile time series analysis, comprising three complementary metrics: (1) Dependency Learning Score quantifies a model’s capacity to capture temporal dependencies. |
Weiyang Zhang; Xinyang Chen; Xiucheng Li; Kehai Chen; Weili Guan; Liqiang Nie; | code |
| 386 | $\text{S}^2$Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, we observe that the joint modeling of spatial and temporal information in video diffusion models (V-DMs) leads to extremely long token sequences, which introduces high calibration variance and learning challenges. To address these issues, we propose **$S^2$Q-VDiT**, a post-training quantization framework for V-DMs that leverages **S**alient data and **S**parse token distillation. |
Weilun Feng; Haotong Qin; Chuanguang Yang; Xiangqi Li; Han Yang; Yuqi Li; Zhulin An; Libo Huang; Michele Magno; Yongjun Xu; | code |
| 387 | Model Editing for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that ViT predictions are more strongly influenced by the multi-head self-attention (MSA) modules than by the MLPs. |
Xinyi Huang; Kangfei Zhao; Long-Kai Huang; | code |
| 388 | AdaMSS: Adaptive Multi-Subspace Approach for Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AdaMSS, an adaptive multi-subspace approach for parameter-efficient fine-tuning of large models. |
Jingjing Zheng; Wanglong Lu; Yiming Dong; Chaojie Ji; Yankai Cao; Zhouchen Lin; | code |
| 389 | OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The interactive nature of egocentric videos is critical for applications like embodied intelligence, but introduces complex visual contexts that conventional models struggle to capture. To bridge this gap, we introduce OpenMMEgo with innovations across three dimensions: data, model, and training strategy. |
Hao Luo; Zihao Yue; Wanpeng Zhang; Yicheng Feng; Sipeng Zheng; Deheng Ye; Zongqing Lu; | code |
| 390 | AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models Through Contrastive Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Audio-Visual Contrastive Decoding (AVCD)—a novel, training-free decoding framework designed to model trimodal interactions and suppress modality-induced hallucinations in AV-LLMs. |
Chaeyoung Jung; Youngjoon Jang; Joon Son Chung; | code |
| 391 | PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation. Based on this insight, we propose PAID (Pairwise Angular Invariant Decomposition), a prior-driven CTTA method that decomposes weight into magnitude and direction, and introduces a learnable orthogonal matrix via Householder reflections to globally rotate direction while preserving the pairwise angular structure. |
Kunyu Wang; Xueyang Fu; Yuanfei Bao; Chengjie Ge; Chengzhi Cao; Wei Zhai; Zheng-Jun Zha; | code |
| 392 | Improving Model Representation and Reducing KV Cache Via Skip Connections with First Value Heads Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose SkipV1Former, a Transformer variant that uses skip connections from the first layer’s Value heads to strengthen model representation and reduce KV cache. |
Zhoutong Wu; Yuan Zhang; Yiming Dong; Chenheng Zhang; Cong Fang; Kun Yuan; Zhouchen Lin; | code |
| 393 | AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite their strong visual understanding, current MLMMs still face two major challenges: (1) insufficient region-level understanding and interaction, and (2) limited accuracy and interpretability due to single-step prediction. In this paper, we address these challenges by empowering MLMMs with anatomy-centric reasoning capabilities to enhance their interactivity and explainability. |
Qingqiu Li; Zihang Cui; Seongsu Bae; Jilan Xu; Runtian Yuan; Yuejie Zhang; Rui Feng; Quanli Shen; Xiaobo Zhang; Shang Gao; Junjun He; Shujun Wang; | code |
| 394 | Open-World Drone Active Tracking with Goal-Centered Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by curriculum learning, we introduce a Curriculum-Based Training strategy that progressively enhances the tracking performance in complex environments. |
Haowei Sun; Jinwu Hu; Zhirui Zhang; Haoyuan Tian; Xinze Xie; Yufeng Wang; Xiaohua Xie; Yun Lin; Zhuliang Yu; Mingkui Tan; | code |
| 395 | Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a straightforward method called **R**epresentation **E**ntanglement for **G**eneration (**REG**), which entangles low-level image latents with a single high-level class token from pretrained foundation models for denoising. |
Ge Wu; Shen Zhang; Ruijing Shi; Shanghua Gao; Zhenyuan Chen; Lei Wang; Zhaowei Chen; Hongcheng Gao; Yao Tang; Jian Yang; Ming-Ming Cheng; Xiang Li; | code |
| 396 | Towards Robust Zero-Shot Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused by out-of-distribution (OOD) actions during offline learning sometimes lead to biased representations, ultimately resulting in suboptimal performance. To address these issues, we propose Behavior-REgularizEd Zero-shot RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality. |
Kexin ZHENG; Lauriane Teyssier; Yinan Zheng; Yu Luo; Xianyuan Zhan; | code |
| 397 | VideoREPA: Learning Physics for Video Generation Through Relational Alignment with Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel framework called {VideoREPA}, which distills physics understanding capability from video understanding foundation models into T2V models by aligning token-level relations. |
Xiangdong Zhang; Jiaqi Liao; Shaofeng Zhang; Fanqing Meng; Xiangpeng Wan; Junchi Yan; Yu Cheng; | code |
| 398 | Continual Knowledge Adaptation for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although Continual Reinforcement Learning facilitates learning across multiple tasks, existing methods often suffer from catastrophic forgetting and inefficient knowledge utilization. To address these challenges, we propose Continual Knowledge Adaptation for Reinforcement Learning (CKA-RL), which enables the accumulation and effective utilization of historical knowledge. |
Jinwu Hu; Zihao Lian; Zhiquan Wen; Chenghao Li; Guohao Chen; Xutao Wen; Bin Xiao; Mingkui Tan; | code |
| 399 | DIPO: Dual-State Images Controlled Articulated Object Generation Powered By Diverse Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a dual-image diffusion model that captures relationships between the image pair to generate part layouts and joint parameters. |
Ruiqi Wu; Xinjie Wang; Liu Liu; Chun-Le Guo; Jiaxiong Qiu; Chongyi Li; Lichao Huang; Zhizhong Su; Ming-Ming Cheng; | code |
| 400 | MDNS: Masked Diffusion Neural Sampler Via Stochastic Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $\pi\propto\mathrm{e}^{-U}$ is known up to a normalizing constant, which is an important task in fields such as statistical physics, machine learning, combinatorial optimization, etc. To better address this challenging task when the state space has a large cardinality and the distribution is multi-modal, we propose **M**asked **D**iffusion **N**eural **S**ampler (**MDNS**), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives, theoretically grounded in the stochastic optimal control of the continuous-time Markov chains. |
Yuchen Zhu; Wei Guo; Jaemoo Choi; Guan-Horng Liu; Yongxin Chen; Molei Tao; | code |
| 401 | Zebra-Llama: Towards Extremely Efficient Hybrid Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a practical and scalable alternative: composing efficient hybrid language models from existing pre-trained models. |
Mingyu Yang; Mehdi Rezagholizadeh; Guihong Li; Vikram Appia; Emad Barsoum; | code |
| 402 | Less Is More: An Attention-free Sequence Prediction Modeling for Offline Embodied Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct a thorough entropy-based analysis of the representative Decision Transformer (DT) model and identify inconsistencies in state-action-reward ($\langle s, a, R \rangle$) distributions that cause attention “dispersal.” To address this, we propose a hierarchical framework that decomposes sequence modeling into intra-step relational modeling—handled by a Token Merger that fuses each $\langle s, a, R \rangle$ triplet—and inter-step modeling—handled by a Token Mixer across timesteps. |
Wei Huang; Jianshu Zhang; Leiyu Wang; Heyue Li; Luoyi Fan; Yichen Zhu; Nanyang Ye; Qinying Gu; | code |
| 403 | A Dynamic Learning Strategy for Dempster-Shafer Theory with Applications in Classification and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current methods often neglect the inherent a priori information within data during modelling, and imbalanced data lead to insufficient attention to key information in the model. To address these limitations, this paper presents a dynamic learning strategy based on a nonuniform splitting mechanism and Hilbert space mapping. |
Linlin Fan; Xingyu Liu; Mingliang Zhou; Xuekai Wei; Weizhi Xian; Jielu Yan; Weijia Jia; | code |
| 404 | Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MGAudio, a novel flow-based framework for open-domain video-to-audio generation, which introduces model-guided dual-role alignment as a central design principle. |
Kang Zhang; Trung X. Pham; Suyeon Lee; Axi Niu; Arda Senocak; Joon Son Chung; | code |
| 405 | A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This new lens shows that SPO uses biased temporal weighting, giving too little weight to later generative steps, and unlike likelihood-centric views it reveals substantial noise in the gradient estimates. Leveraging these insights, our GradSPO algorithm introduces a simplified loss and a targeted, variance-informed noise reduction strategy, enhancing training stability. |
Joshua Tian Jin Tee; Hee Suk Yoon; Abu Hanif Muhammad Syarubany; Eunseop Yoon; Chang D. Yoo; | code |
| 406 | Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, since many existing methods do not explicitly enforce optimality conditions, their solutions often struggle to satisfy the principle of least action and meet challenges to converge in a stable and reliable way. To address these issues, we propose Variational RUOT (Var-RUOT), a new framework to solve the RUOT problem. |
Yuhao Sun; Zhenyi Zhang; Zihan Wang; Tiejun Li; Peijie Zhou; | code |
| 407 | How Particle System Theory Enhances Hypergraph Message Passing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel hypergraph message passing framework inspired by interacting particle systems, where hyperedges act as fields inducing shared node dynamics. |
Yixuan Ma; Kai Yi; Pietro Lio; Shi Jin; Yu Guang Wang; | code |
| 408 | Fast and Fluent Diffusion Language Models Via Convolutional Decoding and Rejective Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, semi-AR eliminates the main advantages of diffusion models. To overcome this, we propose Convolutional decoding (*Conv*), a normalization-based method that narrows the decoding window without hard segmentation, leading to better fluency and flexibility. |
Yeongbin Seo; Dongha Lee; Jaehyung Kim; Jinyoung Yeo; | code |
| 409 | Who You Are Matters: Bridging Interests and Social Roles Via LLM-Enhanced Logic Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge this gap, we introduce the user role identification task and the behavioral logic modeling task, which aim to explicitly model user roles and learn the logical relations between item topics and user social roles. We show that it is possible to explicitly solve these tasks through an efficient framework that integrates Large Language Models (LLMs) with recommendation systems, for which we propose TagCF. |
Qing Yu; Xiaobei Wang; Shuchang Liu; Cheng Feng; Xiaoyu Yang; Xueliang Wang; Chang Meng; Shanshan Wu; Hailan Yang; Bin Wen; Huihui Xiao; Xiang Li; Fan Yang; Xiaoqiang Feng; Lantao Hu; Han Li; Kun Gai; Lixin Zou; | code |
| 410 | Towards Reliable LLM-based Robots Planning Via Combined Uncertainty Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately. |
Shiyuan Yin; Chenjia Bai; Zhang Zihao; Junwei Jin; Xinxin Zhang; Chi Zhang; Xuelong Li; | code |
| 411 | Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through empirical studies, we observe that existing methods often overlook the joint impact of pruning on both the current layer’s output (local) and the outputs of subsequent layers (global), leading to suboptimal pruning decisions. To address this challenge, we propose Balanced Token Pruning (BTP), a plug-and-play method for pruning vision tokens. |
kaiyuan Li; Xiaoyue Chen; Chen Gao; Yong Li; Xinlei Chen; | code |
| 412 | DCA: Graph-Guided Deep Embedding Clustering for Brain Atlases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Deep Cluster Atlas (DCA), a graph-guided deep embedding clustering framework for generating individualized, voxel-wise brain parcellations. |
Mo Wang; Kaining Peng; Jingsheng Tang; Hongkai Wen; Quanying Liu; | code |
| 413 | Multi-dataset Joint Pre-training of Emotional EEG Enables Generalizable Affective Computing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to develop a task-specific multi-dataset joint pre-training framework for cross-dataset emotion recognition, tackling problems of large inter-dataset distribution shifts, inconsistent emotion category definitions, and substantial inter-subject variability. |
Qingzhu Zhang; Jiani Zhong; Li ZongSheng; Xinke Shen; Quanying Liu; | code |
| 414 | Depth-Supervised Fusion Network for Seamless-Free Image Stitching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The significant variations in object depth often lead to large parallax, resulting in ghosting and misalignment in the stitched results. To address this, we propose a depth-consistency-constrained seamless-free image stitching method. |
Zhiying Jiang; Ruhao Yan; Zengxi Zhang; Bowei Zhang; Jinyuan Liu; | code |
| 415 | 3DOT: Texture Transfer for 3DGS Objects from A Single Reference Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, 2D editing typically involves frame-by-frame manipulation, often resulting in inconsistencies across views, while text-driven 3D editing struggles to preserve texture characteristics from reference images. To tackle these challenges, we introduce \textbf{3DOT}, a \textbf{3D} Gaussian Splatting \textbf{O}bject \textbf{T}exture Transfer method based on a single reference image, integrating: 1) progressive generation, 2) view-consistency gradient guidance, and 3) prompt-tuned gradient guidance. |
Xiao Cao; Beibei Lin; Bo Wang; Zhiyong Huang; Robby T. Tan; | code |
| 416 | TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce TRIDENT, a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. |
Feng Jiang; Mangal Prakash; Hehuan Ma; Jianyuan Deng; Yuzhi Guo; Amina Mollaysa; Tommaso Mansi; Rui Liao; Junzhou Huang; | code |
| 417 | Efficient Speech Language Modeling Via Energy Distance in Continuous Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce \emph{SLED}, an alternative approach to speech language modeling by encoding speech waveforms into sequences of continuous latent representations and modeling them autoregressively using an energy distance objective. |
Zhengrui Ma; Yang Feng; Chenze Shao; Fandong Meng; Jie Zhou; Min zhang; | code |
| 418 | Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose learning beneficial noise for CIL guided by information theory and propose Mixture of Noise (MiN), aiming to mitigate the degradation of backbone generalization from adapting new tasks. |
Kai Jiang; Zhengyan Shi; Dell Zhang; Hongyuan Zhang; Xuelong Li; | code |
| 419 | StateSpaceDiffuser: Bringing Long Context to Diffusion World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limitation, common in state-of-the-art world models, which are diffusion-based, stems from the lack of a lasting environment state. To address this problem, we introduce StateSpaceDiffuser, where a diffusion model is enabled to perform long-context tasks by integrating features from a state-space model, representing the entire interaction history. |
Nedko Savov; Naser Kazemi; Deheng Zhang; Danda Pani Paudel; Xi Wang; Luc Van Gool; | code |
| 420 | ForceVLA: Enhancing VLA Models with A Force-aware MoE for Contact-rich Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they struggle with contact-rich tasks that require fine-grained control involving force, especially under visual occlusion or dynamic uncertainty. To address these limitations, we propose \textbf{ForceVLA}, a novel end-to-end manipulation framework that treats external force sensing as a first-class modality within VLA systems. |
Jiawen Yu; Hairuo Liu; Qiaojun Yu; Jieji Ren; Ce Hao; Haitong Ding; Guangyu Huang; Guofan Huang; Yan Song; Panpan Cai; Wenqiang Zhang; Cewu Lu; | code |
| 421 | Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This integration is essential in real-world scenarios since intercellular communications are fundamental life processes and can influence cell state-transition dynamics. To address this challenge, we formulate the Unbalanced Mean-Field Schrödinger Bridge (UMFSB) framework to model unbalanced stochastic interaction dynamics from snapshot data. |
Zhenyi Zhang; Zihan Wang; Yuhao Sun; Tiejun Li; Peijie Zhou; | code |
| 422 | Missing Data Imputation By Reducing Mutual Information with Rectified Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and the corresponding missingness mask. |
Jiahao Yu; Qizhen Ying; Leyang Wang; Ziyue Jiang; Song Liu; | code |
| 423 | CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD Via Reinforcement Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: SFT shows promise but often devolves into pattern memorization, resulting in poor out-of-distribution (OOD) performance on complex reasoning tasks. To tackle these limitations, we introduce CReFT-CAD, a two-stage fine-tuning paradigm: first, a curriculum-driven reinforcement learning stage with difficulty-aware rewards to steadily build reasoning abilities; second, supervised post-tuning to refine instruction following and semantic extraction. |
Ke Niu; zhuofan chen; Haiyang Yu; Yuwen Chen; Teng Fu; Mengyang Zhao; Bin Li; Xiangyang Xue; | code |
| 424 | DINGO: Constrained Inference for Diffusion LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This parallelism makes traditional constrained decoding algorithms, designed to enforce constraints with sequential token prediction, ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. |
Tarun Suresh; Debangshu Banerjee; Shubham Ugare; Sasa Misailovic; Gagandeep Singh; | code |
| 425 | Event-Driven Dynamic Scene Depth Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose EventDC, the first event-driven depth completion framework. |
Zhiqiang Yan; Jianhao Jiao; Zhengxue Wang; Gim Hee Lee; | code |
| 426 | HPSERec: A Hierarchical Partitioning and Stepwise Enhancement Framework for Long-tailed Sequential Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, due to a substantial representational gap exists between head and tail data, head-to-tail enhancement strategies are susceptible to negative transfer, often leading to a decline in overall model performance. To address these issues, we propose a hierarchical partitioning and stepwise enhancement framework, called HPSERec, for long-tailed sequential recommendation. |
Xiaolong Xu; Xudong Zhao; Haolong Xiang; Xuyun Zhang; Wei Shen; Hongsheng Hu; Lianyong Qi; | code |
| 427 | Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, the inner optimization simulates the KD process by optimizing a surrogate student model, while the outer optimization leverages outputs from this surrogate to optimize the teacher model for implanting the conditional backdoor. Our SCAR addresses this complex optimization utilizing an implicit differentiation algorithm with a pre-optimized trigger injection function. |
Yukun Chen; Boheng Li; Yu Yuan; Leyi Qi; Yiming Li; Tianwei Zhang; Zhan Qin; Kui Ren; | code |
| 428 | GRIFFIN: Effective Token Alignment for Faster Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods often struggle with token misalignment between the training and decoding phases, limiting their performance. To address this, we propose GRIFFIN, a novel framework that incorporates a token-alignable training strategy and a token-alignable draft model to mitigate misalignment. |
Shijing Hu; Jingyang Li; Xingyu Xie; Zhihui Lu; Kim-chuan Toh; Pan Zhou; | code |
| 429 | SALoM: Structure Aware Temporal Graph Networks with Long-Short Memory Updater Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While memory-based methods are commonly used and excel at capturing short-range temporal correlations, they struggle with modeling long-range dependencies, harmonizing long-range and short-range correlations, and integrating structural information effectively. To address these challenges, we present SALoM: Structure Aware Temporal Graph Networks with Long-Short Memory Updater. |
Hanwen Liu; Longjiao Zhang; Rui Wang; Tongya Zheng; Sai Wu; Chang Yao; Mingli Song; | code |
| 430 | OmniTry: Virtual Try-On Anything Without Masks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents OmniTry, a unified framework that extends VTON beyond garment to encompass any wearable objects, e.g., jewelries and accessories, with mask-free setting for more practical application. |
Yutong Feng; Linlin Zhang; Hengyuan Cao; Yiming Chen; Xiaoduan Feng; Jian Cao; Yuxiong Wu; Bin Wang; | code |
| 431 | Mixture-of-Experts Meets In-Context Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To enhance task-wise routing, we introduce a contrastive learning method that maximizes the mutual information between the task and its router representation, enabling more precise capture of task-relevant information. |
Wenhao Wu; Fuhong Liu; Haoru Li; Zican Hu; Daoyi Dong; Chunlin Chen; Zhi Wang; | code |
| 432 | NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we attempt to analyze the training dynamics in MTL by leveraging Neural Tangent Kernel (NTK) theory and propose a new MTL method, NTKMTL. |
Xiaohan Qin; Xiaoxing Wang; Ning Liao; Junchi Yan; | code |
| 433 | Non-stationary Equivariant Graph Neural Networks for Physical Dynamics Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To model the non-stationary physical dynamics while preserving the symmetric inductive bias, we introduce a Non-Stationary Equivariant Graph Neural Network (NS-EGNN) to capture the non-stationarity in physical dynamics while preserving the symmetric property of the model. |
Chaohao Yuan; Maoji Wen; Ercan Engin KURUOGLU; Yang Liu; Jia Li; Tingyang Xu; Deli Zhao; Hong Cheng; Yu Rong; | code |
| 434 | Advancing Machine-Generated Text Detection from An Easy to Hard Supervision Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an easy-to-hard enhancement framework to provide reliable supervision under such inexact conditions. |
Chenwang Wu; Yiu-ming Cheung; Bo Han; Defu Lian; | code |
| 435 | Test-Time Adaptive Object Detection with Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first foundation model-powered test-time adaptive object detection method that eliminates the need for source data entirely and overcomes traditional closed-set limitations. |
Yingjie Gao; Yanan Zhang; Zhi Cai; Di Huang; | code |
| 436 | HetSyn: Versatile Timescale Integration in Spiking Neural Networks Via Heterogeneous Synapses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing studies overlook a fundamental property widely observed in biological neurons—synaptic heterogeneity, which plays a crucial role in temporal processing and cognitive capabilities. To bridge this gap, we introduce HetSyn, a generalized framework that models synaptic heterogeneity with synapse-specific time constants. |
Zhichao Deng; Zhikun Liu; Junxue Wang; Shengqian Chen; Xiang Wei; Qiang Yu; | code |
| 437 | Learning to Control Free-Form Soft Swimmers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional approaches often rely on morphology-dependent heuristics and simplified fluid models, which constrain exploration and preclude advanced strategies like vortex exploitation. To address this, we propose an automated framework that combines a unified, reduced-mode control space with a high-fidelity GPU-accelerated simulator. |
Changyu Hu; Yanke Qu; Qiuan Yang; Xiaoyu Xiong; Kui Wu; Wei Li; Tao Du; | code |
| 438 | Tortoise and Hare Guidance: Accelerating Diffusion Model Inference with Multirate Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Tortoise and Hare Guidance (THG), a training-free strategy that accelerates diffusion sampling while maintaining high-fidelity generation. |
Yunghee Lee; Byeonghyun Pak; Junwha Hong; Hoseong Kim; | code |
| 439 | RoomEditor: High-Fidelity Furniture Synthesis with Parameter-Sharing U-Net Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its significant potential in home design applications, this field remains underexplored due to two major challenges: the absence of publicly available and ready-to-use benchmarks hinders reproducible research, and existing image composition methods fail to meet the stringent fidelity requirements for realistic furniture placement. To address these issues, we introduce RoomBench, a ready-to-use benchmark dataset for virtual furniture synthesis, comprising 7,298 training pairs and 895 testing samples across 27 furniture categories. |
Zhenyi Lin; Xiaofan Ming; Qilong Wang; Dongwei Ren; Wangmeng Zuo; Qinghua Hu; | code |
| 440 | ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ChA-MAEViT, an MAE-based method that enhances feature learning across MCI channels via four key strategies: (1) dynamic channel-patch masking, which compels the model to reconstruct missing channels in addition to masked patches, thereby enhancing cross-channel dependencies and improving robustness to varying channel configurations; (2) memory tokens, which serve as long-term memory aids to promote information sharing across channels, addressing the challenges of reconstructing structurally diverse channels; (3) hybrid token fusion module, which merges fine-grained patch tokens with a global class token to capture richer representations; and (4) Channel-Aware Decoder, a lightweight decoder utilizes channel tokens to effectively reconstruct image patches. |
Chau Pham; Juan C. Caicedo; Bryan A. Plummer; | code |
| 441 | Spiking Neural Networks Need High-Frequency Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, their performance still lags behind that of artificial neural networks, often assumed to result from information loss caused by sparse and binary activations. In this work, we challenge this long-standing assumption and reveal a previously overlooked frequency bias: **spiking neurons inherently suppress high-frequency components and preferentially propagate low-frequency information. |
Yuetong Fang; Deming Zhou; Ziqing Wang; Hongwei Ren; ZeCui Zeng; Lusong Li; shibo zhou; Renjing Xu; | code |
| 442 | GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose GTR-Loc, a novel text-assisted LiDAR localization framework that effectively generates and integrates geospatial text regularization to enhance localization accuracy. |
Shangshu Yu; Wen Li; Xiaotian Sun; Zhimin Yuan; Xin Wang; Sijie Wang; Rui She; Cheng Wang; | code |
| 443 | Recognition Through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the modeling side, current approaches predominantly rely on supervised fine-tuning, which yields only marginal improvements in reasoning capabilities. To address these challenges, we propose a novel pipeline that constructs a reasoning-oriented geo-localization dataset, $\textit{MP16-Reason}$, using diverse social media images. |
Ling Li; Yao Zhou; Yuxuan Liang; Fugee Tsung; Jiaheng Wei; | code |
| 444 | Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches still fail to handle the *visual complexity of interaction*—including (1) *intra-class visual diversity*, where instances of the same verb appear in diverse poses and contexts, and (2) *inter-class visual entanglement*, where distinct verbs yield visually similar patterns. To address these challenges, we propose **VDRP**, a framework for *Visual Diversity and Region-aware Prompt learning*. |
Chanhyeong Yang; Taehoon song; Jihwan Park; Hyunwoo J. Kim; | code |
| 445 | FedQS: Optimizing Gradient and Model Aggregation for Semi-Asynchronous Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While gradient aggregation achieves faster convergence and higher accuracy, it suffers from pronounced fluctuations, whereas model aggregation offers greater stability but slower convergence and suboptimal accuracy. This paper presents FedQS, the first framework to theoretically analyze and address these disparities in SAFL. |
Yunbo Li; Jiaping Gui; Zhihang Deng; Fanchao Meng; Yue Wu; | code |
| 446 | OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Online Multimodal Conversational Response Generation (OMCRG), a novel task designed to produce synchronized verbal and non-verbal listener feedback online, based on the speaker’s multimodal inputs. |
Cheng Luo; Jianghui Wang; Bing Li; Siyang Song; Bernard Ghanem; | code |
| 447 | A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the framework, we analyze two dominant paradigms: self-consistency and perplexity, and reveal key limitations: self-consistency suffers from high estimation error while perplexity exhibits substantial modeling error and possible degradation of the estimation error convergence. To address these limitations, we introduce RPC, a hybrid method that leverages our theoretical insights through two key components: *Perplexity Consistency* and *Reasoning Pruning*. |
Zhi Zhou; Yuhao Tan; Zenan Li; Yuan Yao; Lan-Zhe Guo; Yu-Feng Li; Xiaoxing Ma; | code |
| 448 | KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that the core challenge lies in the offset variance of KV-caches across agents. To address this, we propose **KVCOMM**, a training-free framework that enables efficient prefilling in multi-agent inference by reusing KV-caches and aligning cache offsets of overlapping contexts under diverse prefix contexts. |
Hancheng Ye; Zhengqi Gao; Mingyuan Ma; Qinsi Wang; Yuzhe Fu; Ming-Yu Chung; Yueqian Lin; Zhijian Liu; Jianyi Zhang; Danyang Zhuo; Yiran Chen; | code |
| 449 | Compositional Neural Network Verification Via Assume-Guarantee Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, scaling verification to large networks is challenging, at least in part due to the significant memory requirements of verification algorithms. In this paper, we propose an assume-guarantee compositional framework, CoVeNN, that is parameterized by an underlying verifier to generate a sequence of verification sub-problems to address this challenge. |
Hai Duong; David Shriver; ThanhVu Nguyen; Matthew B. Dwyer; | code |
| 450 | Generating and Checking DNN Verification Proofs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify commonalities in algorithmic approaches taken by NNV tools to define a verifier independent proof format—activation pattern tree proofs (APTP)—and design an algorithm for checking those proofs that is proven correct and optimized to enable scalable checking. |
Hai Duong; ThanhVu Nguyen; Matthew B. Dwyer; | code |
| 451 | AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce AlphaDecay, a simple yet effective method that adaptively assigns different weight decay strengths to each module of an LLM. |
Di He; Songjun Tu; Ajay Jaiswal; Li Shen; Ganzhao Yuan; Shiwei Liu; Lu Yin; | code |
| 452 | PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel quantization approach PolarQuant, which provides a new perspective for key cache quantization and efficiently addresses the outlier dilemma. |
Songhao Wu; Ang Lv; xiao feng; Yufei zhang; Xun Zhang; Guojun Yin; Wei Lin; Rui Yan; | code |
| 453 | TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches for identifying such systematic failure modes in trained models (i) are typically designed for non-temporal settings and (ii) are challenging to evaluate in temporal settings due to the lack of quantitative evaluation frameworks. In this work, we address these challenges by introducing TRoVe, an automated approach for discovering error-inducing static feature biases learned by temporal VLMs. |
Maya Varma; Jean-Benoit Delbrouck; Sophie Ostmeier; Akshay S Chaudhari; Curtis Langlotz; | code |
| 454 | Making Classic GNNs Strong Baselines Across Varying Homophily: A Smoothness–Generalization Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This dilemma hinders learning in high-order homophilic neighborhoods and all heterophilic ones, where generalization is critical due to complex neighborhood class distributions that are sensitive to shifts induced by noise or sparsity. To address this, we introduce the Inceptive Graph Neural Network (IGNN) built on three simple yet effective design principles, which alleviate the dilemma by enabling distinct hop-wise generalization alongside improved overall generalization with adaptive smoothness. |
Ming Gu; Zhuonan Zheng; Sheng Zhou; Meihan Liu; Jiawei Chen; Qiaoyu Tan; Liangcheng Li; Jiajun Bu; | code |
| 455 | Anchor-based Maximum Discrepancy for Relative Similarity Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This challenge makes relative similarity testing ill-defined when we want to select a good kernel after the hypothesis is specified. In this paper, we cope with this challenge via learning a proper hypothesis and a kernel simultaneously, instead of learning a kernel after manually specifying the hypothesis. |
Zhijian Zhou; Liuhua Peng; Xunye Tian; Feng Liu; | code |
| 456 | Self-Evolving Pseudo-Rehearsal for Catastrophic Forgetting with Task Similarity in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present $\textbf{\textit{Self-Evolving Pseudo-Rehearsal for Catastrophic Forgetting with Task Similarity}}(\textbf{SERS})$, a lightweight framework that 1) decouples pseudo-input synthesis from label creation, using semantic masking and template guidance to produce diverse, task-relevant prompts without extra modules; 2) applies label self-evolution, blending base-model priors with fine-tuned outputs to prevent over-specialization; and 3) introduces a dynamic regularizer driven by the Wasserstein distance between task distributions, automatically relaxing or strengthening constraints in proportion to task similarity. |
Jun Wang; Liang Ding; Shuai Wang; Hongyu Li; Yong Luo; Huangxuan Zhao; Han Hu; Bo Du; | code |
| 457 | Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework to facilitate stable consensus via selecting optimal collaborators and calibrating the consensus judgment by system-internal beliefs. |
Wentao Deng; Jiahuan Pei; Zhiwei Xu; Zhaochun Ren; Zhumin Chen; Pengjie Ren; | code |
| 458 | SpecReason: Fast and Accurate Inference-Time Compute Via Speculative Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this improved accuracy comes at the cost of high inference latency due to the length of generated reasoning sequences and the autoregressive nature of decoding. Our key insight in tackling these overheads is that LRM inference, and the reasoning that it embeds, is highly tolerant of approximations: complex tasks are typically broken down into simpler steps, each of which brings utility based on the semantic insight it provides for downstream steps rather than the exact tokens it generates. |
Rui Pan; Yinwei Dai; Zhihao Zhang; Gabriele Oliaro; Zhihao Jia; Ravi Netravali; | code |
| 459 | Open-Vocabulary Part Segmentation Via Progressive and Boundary-Aware Strategy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Open-vocabulary part segmentation (OVPS) struggles with structurally connected boundaries due to the inherent conflict between continuous image features and discrete classification mechanism. To address this, we propose PBAPS, a novel training-free framework specifically designed for OVPS. |
Xinlong Li; Di Lin; Shaoyiyi Gao; Jiaxin Li; Ruonan Liu; Qing Guo; | code |
| 460 | AANet: Virtual Screening Under Structural Uncertainty Via Alignment and Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty. |
Wenyu Zhu; Jianhui Wang; Bowen Gao; Yinjun Jia; Haichuan Tan; Ya-Qin Zhang; Wei-Ying Ma; Yanyan Lan; | code |
| 461 | A Single-Loop First-Order Algorithm for Linearly Constrained Bilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Especially, we establish a strong theoretical connection between the reformulated function and the original hyper-objective by characterizing the closeness of their values and derivatives. Based on this reformulation, we propose a single-loop, first-order algorithm for linearly constrained bilevel optimization (SFLCB). |
Wei Shen; Jiawei Zhang; Minhui Huang; Cong Shen; | code |
| 462 | CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent model merging strategies based on linear mode connectivity (LMC) offer improved stability by interpolating between fine-tuned checkpoints, they are computationally expensive, requiring repeated checkpoint access and multiple forward passes. In this paper, we introduce CodeMerge, a lightweight and scalable model merging framework that bypasses these limitations by operating in a compact latent space. |
Huitong Yang; Zhuoxiao Chen; Fengyi Zhang; Zi Huang; Yadan Luo; | code |
| 463 | Program Synthesis Via Test-Time Transduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While prior approaches to program synthesis–whether based on natural language descriptions or input-output examples–typically aim to generalize from training examples, they often struggle with robustness, especially in real-world settings where training examples are limited and test inputs involve various edge cases. To address this, we propose a novel framework that improves robustness by treating synthesis as an active learning over a finite hypothesis class defined by programs’ outputs. |
Kang-il Lee; Jahyun Koo; Seunghyun Yoon; Minbeom Kim; Hyukhun Koh; Dongryeol Lee; Kyomin Jung; | code |
| 464 | Accelerating Parallel Diffusion Model Serving with Residual Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CompactFusion, a compression framework that significantly reduces communication while preserving generation quality. |
Jiajun Luo; Yicheng Xiao; Jianru Xu; Yangxiu You; Rongwei Lu; Chen Tang; Jingyan Jiang; Zhi Wang; | code |
| 465 | Continuous Thought Machines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By incorporating neuron-level processing and synchronization, we reintroduce neural timing as a foundational element. We present the Continuous Thought Machine (CTM), a model designed to leverage neural dynamics as its core representation. |
Luke Nicholas Darlow; Ciaran Regan; Sebastian Risi; Jeffrey Seely; Llion Jones; | code |
| 466 | Training-Free Constrained Generation With Stable Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While there is increasing effort to incorporate physics-based constraints into generative models, existing techniques are either limited in their applicability to latent diffusion frameworks or lack the capability to strictly enforce domain-specific constraints. To address this limitation this paper proposes a novel integration of stable diffusion models with constrained optimization frameworks, enabling the generation of outputs satisfying stringent physical and functional requirements. |
Stefano Zampini; Jacob K Christopher; Luca Oneto; Davide Anguita; Ferdinando Fioretto; | code |
| 467 | Compressed and Smooth Latent Space for Text Diffusion Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Cosmos, a novel approach to text generation that operates entirely in a compressed, smooth latent space tailored specifically for diffusion. |
Viacheslav Meshchaninov; Egor Chimbulatov; Alexander Shabalin; Aleksandr Abramov; Dmitry Vetrov; | code |
| 468 | Luminance-Aware Statistical Quantization: Unsupervised Hierarchical Learning for Illumination Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by empirical analysis of natural luminance dynamics revealing power-law distributed intensity transitions, this paper introduces Luminance-Aware Statistical Quantification (LASQ), a novel framework that reformulates LLIE as a statistical sampling process over hierarchical luminance distributions. |
Derong Kong; Zhixiong Yang; Shengxi Li; Shuaifeng Zhi; Li Liu; Zhen Liu; Jingyuan Xia; | code |
| 469 | Partition-Then-Adapt: Combating Prediction Bias for Reliable Multi-Modal Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often prove ineffective when multiple modalities simultaneously undergo domain shifts, as they struggle to identify and utilize reliable samples within testing batches amid severe prediction bias. To address this problem, we propose Partition-Then-Adapt (PTA), a novel approach combating prediction bias for TTA with multi-modal domain shifts. |
Guowei Wang; Fan Lyu; Changxing Ding; | code |
| 470 | Audio Super-Resolution with Latent Bridge Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards high-quality audio super-resolution, we present a new system with latent bridge models (LBMs), where we compress the audio waveform into a continuous latent space and design an LBM to enable a latent-to-latent generation process that naturally matches the LR-to-HR upsampling process, thereby fully exploiting the instructive prior information contained in the LR waveform. |
Chang Li; Zehua Chen; Liyuan Wang; Jun Zhu; | code |
| 471 | Omnidirectional 3D Scene Reconstruction from Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing diffusion-based methods often struggle with reconstructing omnidirectional scenes due to geometric distortions and inconsistencies across the generated novel views, hindering accurate 3D recovery. To overcome this challenge, we propose Omni3D, an approach designed to enhance the geometric fidelity of diffusion-generated views for robust omnidirectional reconstruction. |
Ren Yang; Jiahao Li; Yan Lu; | code |
| 472 | Hierarchical Demonstration Order Optimization for Many-shot In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the second challenge, we propose a hierarchical demonstration order optimization method named \texttt{HIDO} that enables a more refined exploration of the order space, achieving high ICL performance without the need to evaluate all possible orders. |
Yinhan He; Wendy Zheng; Song Wang; Zaiyi Zheng; Yushun Dong; Yaochen Zhu; Jundong Li; | code |
| 473 | SemCoT: Accelerating Chain-of-Thought Reasoning Through Semantically-Aligned Implicit Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing implicit CoT methods face two significant challenges: (1) they fail to preserve the semantic alignment between the implicit reasoning (when transformed to natural language) and the ground-truth reasoning, resulting in a significant CoT performance degradation, and (2) they focus on reducing the length of the implicit reasoning; however, they neglect the considerable time cost for an LLM to generate one individual implicit reasoning token. To tackle these challenges, we propose a novel semantically-aligned implicit CoT framework termed **SemCoT**. |
Yinhan He; Wendy Zheng; Yaochen Zhu; Zaiyi Zheng; Lin Su; Sriram Vasudevan; Qi Guo; Liangjie Hong; Jundong Li; | code |
| 474 | Embodied Cognition Augmented End2End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel paradigm termed $E^{3}AD$, which advocates for comparative learning between visual feature extraction networks and the general EEG large model, in order to learn latent human driving cognition for enhancing end-to-end planning.In this work, we collected a cognitive dataset for the mentioned contrastive learning process. |
Ling Niu; Xiaoji Zheng; han wang; Ziyuan Yang; Chen Zheng; Bokui Chen; Jiangtao Gong; | code |
| 475 | Temporal Logic-Based Multi-Vehicle Backdoor Attacks Against Offline RL Agents in End-to-end Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works mainly focus on pixel-level triggers which are impractical to deploy in the real world. We address this gap by introducing a novel backdoor attack against the end-to-end AD systems that leverage one or more other vehicles’ trajectories as triggers. |
Xuan Chen; Shiwei Feng; Zikang Xiong; Shengwei An; Yunshu Mao; Lu Yan; Guanhong Tao; Wenbo Guo; Xiangyu Zhang; | code |
| 476 | CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the foreground-background feature coupling caused by the overlapping phenomenon specific to X-ray images makes general detectors designed for natural images perform poorly. To address this issue, we propose a Category Semantic Prior Contrastive Learning (CSPCL) mechanism, which aligns the class prototypes perceived by the classifier with the content queries to correct and supplement the missing semantic information responsible for classification, thereby enhancing the model sensitivity to foreground features. |
Mingyuan Li; Tong Jia; Hao Wang; Bowen Ma; Luhui; Shiyi Guo; Da Cai; Dongyue Chen; | code |
| 477 | RoMa: A Robust Model Watermarking Scheme for Protecting IP in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our results show that existing watermarked models tend to converge to sharp minima in the loss landscape, thus making them vulnerable to fine-tuning. To tackle this challenge, we propose **RoMa**, a **Ro**bust **M**odel w**a**termarking scheme that improves the robustness of watermarks against fine-tuning. |
Yingsha Xie; Rui Min; Zeyu Qin; Fei Ma; Li Shen; Fei Yu; Xiaochun Cao; | code |
| 478 | Entropic Time Schedulers for Generative Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a time scheduler that selects sampling points based on entropy rather than uniform time spacing, ensuring that each point contributes an equal amount of information to the final generation. |
Dejan Stancevic; Florian Handke; Luca Ambrogioni; | code |
| 479 | Joint Relational Database Generation Via Graph-Conditional Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any table order. |
Mohamed Amine Ketata; David Lüdke; Leo Schwinn; Stephan Günnemann; | code |
| 480 | Co-Reinforcement Learning for Unified Multimodal Understanding and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a pioneering exploration of reinforcement learning (RL) via group relative policy optimization for unified multimodal large language models (ULMs), aimed at simultaneously reinforcing generation and understanding capabilities. |
Jingjing Jiang; Chongjie Si; Jun Luo; Hanwang Zhang; Chao Ma; | code |
| 481 | Towards Realistic Earth-Observation Constellation Scheduling: Benchmark and Methodology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To our knowledge, AEOS-Bench is the first large-scale benchmark suite tailored for realistic constellation scheduling. Building upon this benchmark, we introduce AEOS-Former, a Transformer-based scheduling model that incorporates a constraint-aware attention mechanism. |
Luting Wang; Yinghao Xiang; Hongliang Huang; Dongjun Li; Chen Gao; Si Liu; | code |
| 482 | GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this study presents GeoLink, a multimodal framework that leverages OSM data to enhance RS FM during both the pretraining and downstream task stages. |
Lubin Bai; Xiuyuan Zhang; Siqi Zhang; Zepeng Zhang; Haoyu Wang; Wei Qin; Shihong Du; | code |
| 483 | Hybrid Re-matching for Continual Learning with Parameter-Efficient Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, reliance solely on pre-trained features for parameter matching exacerbates the inconsistency between the training and inference phases, thereby constraining the overall performance. To address this issue, we propose HRM-PET, which makes full use of the richer downstream knowledge inherently contained in the trained parameters. |
Weicheng Wang; Guoli Jia; Xialei Liu; Liang Lin; Jufeng Yang; | code |
| 484 | Inference-time Alignment in Continuous Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods struggle to explore informative candidates when the base policy is weak or the candidate set is small, resulting in limited effectiveness. In this paper, to address this problem, we propose Simple Energy Adaptation ($\textbf{SEA}$), a simple yet effective algorithm for inference-time alignment. |
Yige Yuan; Teng Xiao; Li Yunfan; Bingbing Xu; Shuchang Tao; Yunqi Qiu; Huawei Shen; Xueqi Cheng; | code |
| 485 | From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most prior methods either prioritize age transformation at the expense of identity consistency or vice versa. In this work, we address this issue by proposing a $two\text{-}pass$ face aging framework, named $Cradle2Cane$, based on few-step text-to-image (T2I) diffusion models. |
Tao Liu; Dafeng Zhang; Gengchen Li; Shizhuo Liu; yongqi song; Senmao Li; Shiqi Yang; Boqian Li; Kai Wang; Yaxing Wang; | code |
| 486 | Unveiling The Spatial-temporal Effective Receptive Fields of Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In artificial neural networks, the effective receptive field (ERF) serves as a valuable tool for analyzing feature extraction capabilities in visual long-sequence modeling. Inspired by this, we introduce the Spatio-Temporal Effective Receptive Field (ST-ERF) to analyze the ERF distributions across various Transformer-based SNNs. |
Jieyuan Zhang; Xiaolong Zhou; Shuai Wang; Wenjie Wei; Hanwen Liu; Qian Sun; Malu Zhang; Yang Yang; Haizhou Li; | code |
| 487 | Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the Latent Denoising Diffusion Bridge Model (LDDBM), a general-purpose framework for modality translation based on a latent-variable extension of Denoising Diffusion Bridge Models. |
Nimrod Berman; Omkar Joglekar; Eitan Kosman; Dotan Di Castro; Omri Azencot; | code |
| 488 | GRE Suite: Geo-localization Inference Via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these limitations, we propose the Geo Reason Enhancement (GRE) Suite, a novel framework that augments VLMs with structured reasoning chains for accurate and interpretable location inference.Finally, we construct the Geo Reason Evaluation Benchmark (GREval-Bench), a comprehensive evaluation framework that assesses VLMs across diverse urban, natural, and landmark scenes to measure both coarse-grained (e.g., country, continent) and fine-grained (e.g., city, street) localization performance. |
Chun Wang; Xiaojun Ye; Xiaoran Pan; Zihao Pan; Haofan Wang; Yiren Song; | code |
| 489 | FRBNet: Revisiting Low-Light Vision Through Frequency-Domain Radial Basis Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent state-of-the-art methods have improved performance through invariant feature learning modules, they still fall short due to incomplete modeling of low-light conditions. Therefore, we revisit low-light image formation and extend the classical Lambertian model to better characterize low-light conditions. |
Fangtong Sun; Congyu Li; Ke Yang; Yuchen Pan; Hanwen Yu; Xichuan Zhang; Yiying Li; | code |
| 490 | Can Dependencies Induced By LLM-Agent Workflows Be Trusted? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this assumption frequently breaks during execution, as ground-truth responses are inaccessible, leading to inter-agent misalignment—failures caused by inconsistencies and coordination breakdowns among agents. To address this, we propose SeqCV, a dynamic framework for reliable execution under violated conditional independence. |
Yu Yao; Yiliao Song; Yian Xie; Mengdan Fan; Mingyu Guo; Tongliang Liu; | code |
| 491 | GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ultra-high-resolution (UHR) remote sensing (RS) imagery offers valuable data for Earth observation but pose challenges for existing multimodal foundation models due to two key bottlenecks: (1) limited availability of UHR training data, and (2) token explosion caused by the large image size. To address data scarcity, we introduce **SuperRS-VQA** (avg. 8,376$\times$8,376) and **HighRS-VQA** (avg. 2,000$\times$1,912), the highest-resolution vision-language datasets in RS to date, covering 22 real-world dialogue tasks. |
Fengxiang Wang; Mingshuo Chen; Yueying Li; Di Wang; Haotian Wang; Zonghao Guo; Zefan Wang; Shan Boqi; Long Lan; Yulin Wang; Hongzhen Wang; Wenjing Yang; Bo Du; Jing Zhang; | code |
| 492 | RoMA: Scaling Up Mamba-based Foundation Models for Remote Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the linear-complexity Mamba architecture offers a promising alternative, existing RS applications of Mamba remain limited to supervised tasks on small, domain-specific datasets. To address these challenges, we propose RoMA, a framework that enables scalable self-supervised pretraining of Mamba-based RS foundation models using large-scale, diverse, unlabeled data. |
Fengxiang Wang; Yulin Wang; Mingshuo Chen; Haotian Wang; Hongzhen Wang; Haiyan Zhao; Yangang Sun; Shuo Wang; Di Wang; Long Lan; Wenjing Yang; Jing Zhang; | code |
| 493 | Data-Free Model Extraction for Black-box Recommender Systems Via Graph Convolutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle these challenges, in this paper, we first thoroughly analyze how the architecture of surrogate models influences extraction attack performance, highlighting the superior effectiveness of the graph convolution architecture. Based on this, we propose a novel Data-free Black-box Graph convolution-based Recommender Model Extraction method, dubbed DBGRME. |
Zeyu Wang; Yidan Song; Shihao Qin; Yu Shanqing; Yujin Huang; Qi Xuan; Xin Zheng; | code |
| 494 | Optimized Minimal 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Optimized Minimal Gaussians representation (OMG), which significantly reduces storage while using a minimal number of primitives. |
Joo Chan Lee; Jong Hwan Ko; Eunbyung Park; | code |
| 495 | A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against The Strong Black-box Models of GPT-4.5/4o/o1 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This critical absence of semantic information leads commercial black-box LVLMs to either ignore the perturbation entirely or misinterpret its embedded semantics, thereby causing the attack to fail. To overcome these issues, we propose to refine semantic clarity by encoding explicit semantic details within local regions, thus ensuring the capture of finer-grained features and inter-model transferability, and by concentrating modifications on semantically rich areas rather than applying them uniformly. |
Zhaoyi Li; Xiaohan Zhao; Dong-Dong Wu; Jiacheng Cui; Zhiqiang Shen; | code |
| 496 | FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce FedMGP, a new paradigm for personalized federated prompt learning in vision-language models (VLMs). |
Weihao Bo; Yanpeng Sun; Yu Wang; Xinyu Zhang; Zechao Li; | code |
| 497 | UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we first introduce UniSite-DS, the first UniProt (Unique Protein)-centric ligand binding site dataset, which contains 4.81 times more multi-site data and 2.08 times more overall data compared to the previously most widely used datasets. We then propose UniSite, the first end-to-end ligand binding site detection framework supervised by set prediction loss with bijective matching. |
Jigang Fan; QuanLin Wu; Shengjie Luo; Liwei Wang; | code |
| 498 | HyperMARL: Adaptive Hypernetworks for Multi-Agent RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: * We propose a solution built on a key insight: an agent-conditioned hypernetwork can generate agent-specific parameters and *decouple* observation- and agent-conditioned gradients, directly countering the interference from coupling agent IDs with observations. |
Kale-ab Tessera; Arrasy Rahman; Amos Storkey; Stefano V. Albrecht; | code |
| 499 | What Can RL Bring to VLA Generalization? An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. |
Jijia Liu; Feng Gao; Bingwen Wei; Xinlei Chen; Qingmin Liao; Yi Wu; Chao Yu; Yu Wang; | code |
| 500 | MaintainCoder: Maintainable Code Generation Under Dynamic Requirements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To handle dynamic requirements with minimal rework, we propose \textbf{MaintainCoder} as a pioneering solution. |
Zhengren Wang; Rui ling; Chufan Wang; Yongan Yu; Sizhe Wang; Zhiyu li; Feiyu Xiong; Wentao Zhang; | code |
| 501 | GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: PTQ often incurs substantial accuracy drop, while QAT achieves high accuracy but suffers from prohibitive computational costs, limited generalization to downstream tasks, training instability, and lacking of open-source codebase. To address these challenges, this paper introduces General, Practical, and Lightning Quantization (GPLQ), a novel framework designed for efficient and effective ViT quantization. |
Guang Liang; Xinyao Liu; Jianxin Wu; | code |
| 502 | Uncertainty-Informed Meta Pseudo Labeling for Surrogate Modeling with Limited Labeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Semi-supervised learning reduces label reliance by leveraging unlabeled data yet remains vulnerable to noisy pseudo-labels that mislead training and undermine robustness. To address these challenges, we propose a novel framework, Uncertainty-Informed Meta Pseudo Labeling (UMPL). |
Xingyu Ren; Pengwei Liu; Pengkai Wang; Guanyu Chen; Qinxin Wu; Dong Ni; | code |
| 503 | Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches, which typically train a single model to capture all these diverse patterns, often struggle with the pattern drifts between patches and may lead to poor generalization. To address these challenges, we propose TFPS, a novel architecture that leverages pattern-specific experts for more accurate and adaptable time series forecasting. |
Yanru Sun; Zongxia Xie; Emadeldeen Eldele; Dongyue Chen; Qinghua Hu; Min Wu; | code |
| 504 | VLA-Cache: Efficient Vision-Language-Action Manipulation Via Adaptive Token Caching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces VLA-Cache, a training-free inference acceleration method that reduces computational overhead by adaptively caching and reusing static visual tokens across frames. |
Siyu Xu; Yunke Wang; Chenghao Xia; Dihao Zhu; Tao Huang; Chang Xu; | code |
| 505 | FACE: A General Framework for Mapping Collaborative Filtering Embeddings Into LLM Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a key challenge is that LLMs struggle to interpret the latent, non-semantic embeddings produced by CF approaches, limiting recommendation effectiveness and further applications. To address this, we propose FACE, a general interpretable framework that maps CF embeddings into pre-trained LLM tokens. |
Chao Wang; Yixin Song; Jinhui Ye; Chuan Qin; Dazhong Shen; Lingfeng Liu; Xiang Wang; Yanyong Zhang; | code |
| 506 | Repo2Run: Automated Building Executable Environment for Code Repository at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the gap, we introduce Repo2Run, the first LLM-based agent aiming at automating the building of executable test environments for any repositories at scale.We created a benchmark containing 420 Python repositories with unit tests for evaluation. |
Ruida Hu; Chao Peng; XinchenWang; Junjielong Xu; Cuiyun Gao; | code |
| 507 | GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. |
Xiang Lan; Feng Wu; Kai He; Qinghao Zhao; Shenda Hong; Mengling Feng; | code |
| 508 | Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the paper, we propose **T**ext-to-**D**ecision **A**gent (**T2DA**), a simple and scalable framework that supervises offline meta-RL with natural language. |
Shilin Zhang; Zican Hu; Wenhao Wu; Xinyi Xie; Jianxiang Tang; Chunlin Chen; Daoyi Dong; Yu Cheng; Zhenhong Sun; Zhi Wang; | code |
| 509 | ROOT: Rethinking Offline Optimization As Distributional Translation Via Probabilistic Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Both approaches are constrained by the limited amount of offline data. To mitigate this limitation, we introduce a new perspective that casts offline optimization as a distributional translation task. |
Manh Cuong Dao; The Hung Tran; Phi Le Nguyen; Thao Nguyen Truong; Trong Nghia Hoang; | code |
| 510 | UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a general-purpose approach that jointly estimates albedo and synthesizes relit outputs in a single pass, harnessing the generative capabilities of video diffusion models. |
Kai He; Ruofan Liang; Jacob Munkberg; Jon Hasselgren; Nandita Vijaykumar; Alexander Keller; Sanja Fidler; Igor Gilitschenski; Zan Gojcic; Zian Wang; | code |
| 511 | Seeing The Wind from A Falling Leaf Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study how to recover the invisible forces from visual observations, e.g., estimating the wind field by observing a leaf falling to the ground. |
Zhiyuan Gao; Jiageng Mao; Hong-Xing Yu; Haozhe Lou; Emily Yue-ting Jia; Jernej Barbic; Jiajun Wu; Yue Wang; | code |
| 512 | Towards Visualization-of-Thought Jailbreak Attack Against Large Visual Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Visualization-of-Thought Attack (\textbf{VoTA}), a novel and automated attack framework that strategically constructs chains of images with risky visual thoughts to challenge victim models. |
Hongqiong Zhong; Qingyang Teng; Baolin Zheng; Guanlin Chen; Yingshui Tan; Zhendong Liu; Jiaheng Liu; Wenbo Su; Xiaoyong Zhu; Bo Zheng; Kaifu Zhang; | code |
| 513 | PseuZO: Pseudo-Zeroth-Order Algorithm for Training Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing ZO gradient estimators exhibit dimension-dependent variance scaling as $\Theta(d)$, leading to dimension-dependent convergence rates without further assumptions on the objective function, which is prohibitive for large-scale LLM parameters. To address this problem, we present a Pseudo-Zeroth-Order (PseuZO) framework for optimizing composite objective functions, especially large-scale models: $\min_{\mathbf{x} \in \mathcal{X}} \mathcal{F}(\mathbf{x}) = \mathbb{E}_{\mathbf{z}}\, g \circ h(\mathbf{x}; \mathbf{z})$, where $h$ represents complex, high-dimensional representations and $g$ is a task-specific loss. |
Pengyun Yue; Xuanlin Yang; Mingqing Xiao; Zhouchen Lin; | code |
| 514 | VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing multi-modal prompt learning methods typically rely on fixed, shared prompts and deterministic parameters, which limits their ability to capture instance-level variation or model uncertainty across diverse tasks and domains. To tackle this issue, we propose a novel Variational Multi-Modal Prompt Learning (VaMP) framework that enables sample-specific, uncertainty-aware prompt tuning in multi-modal representation learning. |
Silin Cheng; Kai Han; | code |
| 515 | Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap, we introduce a novel layer-wise divergence analysis that uncovers the distinct update pattern of FO and ZO optimization. |
Qitao Tan; Jun Liu; Zheng Zhan; Caiwen Ding; Yanzhi Wang; Xiaolong Ma; Jaewoo Lee; Jin Lu; Geng Yuan; | code |
| 516 | Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we propose Adversarial Locomotion and Motion Imitation (ALMI), a novel framework that enables adversarial policy learning between upper and lower body. Additionally, we release a large-scale whole-body motion control dataset featuring high-quality episodic trajectories from MuJoCo simulations. |
Jiyuan Shi; Xinzhe Liu; Dewei Wang; Ouyang Lu; Sören Schwertfeger; Chi Zhang; Fuchun Sun; Chenjia Bai; Xuelong Li; | code |
| 517 | Learning to Better Search with Language Models Via Guided Reinforced Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, relying on such imperfect traces can result in inefficient use of test-time compute. To address this, we propose guided reinforced self-training (Guided-ReST), a fine-tuning algorithm designed to improve the model’s capability for effective search during inference. |
Seungyong Moon; Bumsoo Park; Hyun Oh Song; | code |
| 518 | Stochastic Forward-Forward Learning Through Representational Dimensionality Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel goodness function termed dimensionality compression that uses the effective dimensionality (ED) of fluctuating neural responses to incorporate second-order statistical structure. |
Zhichao Zhu; YANG QI; Hengyuan Ma; Wenlian Lu; Jianfeng Feng; | code |
| 519 | SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we theoretically investigate DPO under both online and offline settings, and reveal their respective limitations: overfitting in offline DPO and biased sampling in online DPO. |
Xiaofeng Tan; Hongsong Wang; Xin Geng; Pan Zhou; | code |
| 520 | Conditional Representation Learning for Customized Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Conditional Representation Learning (CRL), aiming to extract representations tailored to arbitrary user-specified criteria. |
Honglin Liu; Chao Sun; Peng Hu; Yunfan Li; Xi Peng; | code |
| 521 | Tensor Decomposition Networks for Accelerating Machine Learning Force Field Computations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further reduce the number of parameters, we propose path-weight sharing that ties all multiplicity-space weights across the $O(L^3)$ CG paths into a single path without compromising equivariance, where $L$ is the maximum angular degree. |
Yuchao Lin; Cong Fu; Zachary Krueger; Haiyang Yu; Maho Nakata; Jianwen Xie; Emine Kucukbenli; Xiaofeng Qian; Shuiwang Ji; | code |
| 522 | Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods encounter two challenges when aligning with such fine-grained value objectives: 1) they often treat multiple values as independent and equally important, ignoring their interdependence and relative priorities (value complexity); 2) they struggle to precisely control nuanced value priorities, especially those underrepresented ones (value steerability). To handle these challenges, we propose COUPLE, a COUnterfactual reasoning framework for PLuralistic valuE alignment. |
Hanze Guo; Jing Yao; Xiao Zhou; Xiaoyuan Yi; Xing Xie; | code |
| 523 | Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike conventional FM modules, which use the coarse representations from the weak generator as conditions, SFM constructs intermediate states along the FM paths from these representations. During training, we introduce an orthogonal projection method to adaptively determine the temporal position of these states, and apply a principled construction strategy based on a single-segment piecewise flow. |
Dong Yang; YIYI CAI; Yuki Saito; Lixu Wang; Hiroshi Saruwatari; | code |
| 524 | DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories using only a single expert demonstration, significantly reducing manual intervention. |
Yuran Wang; Ruihai Wu; Yue Chen; Jiarui Wang; Jiaqi Liang; Ziyu Zhu; Haoran Geng; Jitendra Malik; Pieter Abbeel; Hao Dong; | code |
| 525 | Target Speaker Extraction Through Comparing Noisy Positive and Negative Audio Enrollments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore a novel enrollment strategy that encodes target speaker information from the noisy enrollment by comparing segments where the target speaker is talking (Positive Enrollments) with segments where the target speaker is silent (Negative Enrollments). |
Shitong Xu; Yiyuan Yang; Niki Trigoni; Andrew Markham; | code |
| 526 | PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents PipeFusion, an innovative parallel methodology to tackle the high latency issues associated with generating high-resolution images using diffusion transformers (DiTs) models. |
Jiarui Fang; Jinzhe Pan; Aoyu Li; Xibo Sun; WANG Jiannan; | code |
| 527 | ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce ZEBRA, the first zero-shot brain visual decoding framework that eliminates the need for subject-specific adaptation. |
Haonan Wang; Jingyu Lu; Hongrui Li; Xiaomeng Li; | code |
| 528 | SynBrain: Enhancing Visual-to-fMRI Synthesis Via Probabilistic Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing deterministic methods struggle to simultaneously model this biological variability while capturing the underlying functional consistency that encodes stimulus information. To address these limitations, we propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. |
Weijian Mai; Jiamin Wu; Yu Zhu; Zhouheng Yao; Dongzhan Zhou; Andrew Luo; Qihao Zheng; Wanli Ouyang; Chunfeng Song; | code |
| 529 | Robust Explanations of Graph Neural Networks Via Graph Curvatures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are the first to prove that these geometric notions can be used to bound explanation robustness. We design a general optimization algorithm to incorporate these geometric properties into a wide spectrum of base GNN explanation methods to enhance the robustness of base explanations. |
Yazheng Liu; Xi Zhang; Sihong Xie; Hui Xiong; | code |
| 530 | Jury-and-Judge Chain-of-Thought for Uncovering Toxic Data in 3D Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 3D Visual Grounding (3DVG) faces persistent challenges due to coarse scene-level observations and logically inconsistent annotations, which introduce ambiguities that compromise data quality and hinder effective model supervision. To address these challenges, we introduce Refer-Judge, a novel framework that harnesses the reasoning capabilities of Multimodal Large Language Models (MLLMs) to identify and mitigate toxic data. |
Kaixiang Huang; Qifeng Zhang; Jin Wang; Jingru Yang; Yang Zhou; Huan Yu; Guodong Lu; Shengfeng He; | code |
| 531 | Auditing Meta-Cognitive Hallucinations in Reasoning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate hallucination causality under constrained knowledge domains by auditing the Chain-of-Thought (CoT) trajectory and assessing the model’s cognitive confidence in potentially erroneous or biased claims. |
Haolang Lu; Yilian Liu; Jingxin Xu; Guoshun Nan; Yuanlong Yu; Zhican Chen; Kun Wang; | code |
| 532 | T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Token-Selective HIeRarchical Data Selection for Instruction Tuning (T-SHIRT), a novel data selection framework that introduces a new scoring method that includes only informative tokens in quality evaluation and promotes robust and reliable samples whose neighbors also show high quality with fewer local inconsistencies. |
Yanjun Fu; Faisal Hamman; Sanghamitra Dutta; | code |
| 533 | SMARTraj$^2$: A Stable Multi-City Adaptive Method for Multi-View Spatio-Temporal Trajectory Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose SMARTraj$^2$, a novel stable multi-city adaptive method for multi-view spatio-temporal trajectory representation learning. |
Tangwen Qian; Junhe Li; Yile Chen; Gao Cong; Zezhi Shao; Jun Zhang; Tao Sun; Fei Wang; Yongjun Xu; | code |
| 534 | InstanceAssemble: Layout-Aware Image Generation Via Instance Assembling Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, we propose InstanceAssemble, a novel architecture that incorporates layout conditions via instance-assembling attention, enabling position control with bounding boxes (bbox) and multimodal content control including texts and additional visual content. Additionally, we propose Denselayout, a comprehensive benchmark for layout-to-image generation containing 5k images with 90k instances in total. |
Qiang Xiang; Shuang Sun; Binglei Li; Dejia Song; Huaxia Li; Nemo Chen; Xu Tang; Yao Hu; Junping Zhang; | code |
| 535 | Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on adversarial suffix jailbreak attacks and unveils that to defend against a jailbreak attack with an adversarial suffix of length $\Theta(M)$, it is enough to align LLMs on prompts with adversarial suffixes of length $\Theta(\sqrt{M})$. |
Shaopeng Fu; Liang Ding; Jingfeng Zhang; Di Wang; | code |
| 536 | ComRank: Ranking Loss for Multi-Label Complementary Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, existing methods underutilize label correlations inherent in MLL. To address these limitations, we propose ComRank, a ranking loss framework for MLCLL, which encourages complementary labels to be ranked lower than non-complementary ones, thereby modeling pairwise label relationships. |
Jing-Yi Zhu; Yi Gao; Miao Xu; Min-Ling Zhang; | code |
| 537 | Detecting High-Stakes Interactions with Activation Probes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines activation probes for detecting “high-stakes” interactions—where the text indicates that the interaction might lead to significant harm—as a critical, yet underexplored, target for such monitoring. We release our novel synthetic dataset and the codebase at \url{https://github.com/arrrlex/models-under-pressure}. |
Alex McKenzie; Urja Pawar; Phil Blandfort; William Bankes; David Krueger; Ekdeep Singh Lubana; Dmitrii Krasheninnikov; | code |
| 538 | Don’t Be Lazy: CompleteP Enables Compute-efficient Deep Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study compute efficiency of LLM training when using different parameterizations, i.e., rules for adjusting model and optimizer hyperparameters (HPs) as model size changes. |
Nolan Simran Dey; Bin Claire Zhang; Lorenzo Noci; Mufan Li; Blake Bordelon; Shane Bergsma; Cengiz Pehlevan; Boris Hanin; Joel Hestness; | code |
| 539 | EvolvedGRPO: Unlocking Reasoning in LVLMs Via Progressive Instruction Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the inherent entanglement between visual and textual modalities, applying GRPO to LVLMs often leads to reward convergence across different responses to the same sample as training progresses, hindering effective gradient updates and causing the enhancement of chain-of-thought reasoning to stagnate or even collapse. To address this issue, we propose a progressive instruction evolution framework, EvolvedGRPO, to gradually generate more complex questions via editing instructions in an adversarial way, progressively aligned with the model’s evolving capabilities. |
Zhebei Shen; Qifan Yu; Juncheng Li; Wei Ji; Qizhi Chen; Siliang Tang; Yueting Zhuang; | code |
| 540 | SALMONN-omni: A Standalone Speech LLM Without Codec Injection for Full-duplex Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such methods still incur significant performance degradation when operating on the speech rather than text modality. In this paper, we introduce SALMONN-omni, the first single, standalone full-duplex speech LLM that operates without audio codecs in its token space. |
Wenyi Yu; Siyin Wang; Xiaoyu Yang; Xianzhao Chen; Xiaohai Tian; Jun Zhang; Guangzhi Sun; Lu Lu; Yuxuan Wang; Chao Zhang; | code |
| 541 | Train with Perturbation, Infer After Merging: A Two-Stage Framework for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent success of model merging techniques, we propose Perturb-and-Merge (P\&M), a novel continual learning framework that integrates model merging into the CL paradigm to mitigate forgetting. |
Haomiao Qiu; Miao Zhang; Ziyue Qiao; Liqiang Nie; | code |
| 542 | Embodied Crowd Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods incorporate active camera settings, holding promise in addressing the fundamental issues in crowd counting. We then build an interactive simulator, the Embodied Crowd Counting Dataset (ECCD), which enables large-scale scenes and large object quantities. |
Runling Long; Yunlong Wang; Jia Wan; Xiang Deng; Xinting Zhu; Weili Guan; Antoni B. Chan; Liqiang Nie; | code |
| 543 | R$^2$ec: Towards Large Recommender Models with Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose R$^2$ec, a unified large recommender model with intrinsic reasoning capability. |
Runyang You; Yongqi Li; Xinyu Lin; Xin Zhang; Wenjie Wang; Wenjie Li; Liqiang Nie; | code |
| 544 | Dense Metric Depth Estimation Via Event-based Differential Focus Volume Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel approach to enhance dense metric depth estimation by fusing events with image foundation models via a prompting scheme. We further construct synthetic and real-captured datasets to facilitate the training and evaluation of both frame-based and event-based methods. |
Boyu Li; Peiqi Duan; Zhaojun Huang; Xinyu Zhou; Yifei Xia; Boxin Shi; | code |
| 545 | Diffusion Transformers As Open-World Spatiotemporal Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Effectively modeling these dynamics is essential for understanding and optimizing urban systems. In this work, we introduce UrbanDiT, a foundation model for open-world urban spatio-temporal learning that successfully scales up diffusion transformers in this field. |
Yuan Yuan; Chonghua Han; Jingtao Ding; Guozhen Zhang; Depeng Jin; Yong Li; | code |
| 546 | Rethinking Residual Distribution in Locate-then-Edit Model Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through theoretical and empirical analysis, we show that such errors increase with the distribution distance, batch size, and edit sequence length, ultimately leading to inaccurate or suboptimal edits. To address this, we propose the $\textbf{B}$oundary $\textbf{L}$ayer $\textbf{U}$pdat$\textbf{E}$ (BLUE) strategy to enhance locate-then-edit methods. |
Xiaopeng Li; Shangwen Wang; Shasha Li; Shezheng Song; Bin Ji; Ma Jun; Jie Yu; | code |
| 547 | Dimension-Reduction Attack! Video Generative Models Are Experts on Controllable Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a paradigm for video-to-image knowledge compression and task adaptation, termed \textit{Dimension-Reduction Attack} (\texttt{DRA-Ctrl}), which utilizes the strengths of video models, including long-range context modeling and flatten full-attention, to perform various generation tasks. |
Hengyuan Cao; Yutong Feng; Biao Gong; Yijing Tian; Yunhong Lu; Chuang Liu; Bin Wang; | code |
| 548 | The Indra Representation Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose The Indra Representation Hypothesis, inspired by the philosophical metaphor of Indra’s Net. |
Jianglin Lu; Hailing Wang; Kuo Yang; Yitian Zhang; Simon Jenni; Yun Fu; | code |
| 549 | Efficiently Maintaining The Multilingual Capacity of MCLIP in Downstream Cross-Modal Retrieval Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge this gap, we systematically investigate the role of token similarity in cross-lingual transferability for image-text retrieval, establishing it as a key factor governing fine-tuning efficacy. Building on this insight, we propose two novel strategies to enhance efficiency while preserving multilinguality: 1) TaPCL dynamically optimizes training by prioritizing linguistically distant language pairs during corpus sampling, reducing redundant computation, and 2) CiPCL enriches the source corpus with multilingual key terms, enabling targeted knowledge transfer without reliance on exhaustive parallel data. |
Fengmao Lv; Jitong Lei; Guosheng Lin; Desheng ZHENG; Jianyang Zhang; Tianrui Li; | code |
| 550 | NeuroPath: Neurobiology-Inspired Path Tracking and Reflection for Semantically Coherent Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches often result in a loss of semantic coherence and introduce irrelevant noise during node matching and subgraph construction. To address these limitations, we propose NeuroPath, an LLM-driven semantic path tracking RAG framework inspired by the path navigational planning of place cells in neurobiology. |
Junchen Li; Rongzheng Wang; Yihong Huang; Qizhi Chen; Jiasheng Zhang; Shuang Liang; | code |
| 551 | GeoRemover: Removing Objects and Their Causal Visual Artifacts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that these limitations stem from ignoring the causal relationship between an object’s geometry presence and its visual effects. To address this limitation, we propose a geometry-aware two-stage framework that decouples object removal into (1) geometry removal and (2) appearance rendering. |
Zixin Zhu; Haoxiang Li; Xuelu Feng; He Wu; Chunming Qiao; Junsong Yuan; | code |
| 552 | Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **VIPGuard**, a unified multimodal framework designed to capture fine-grained and comprehensive facial representations of a given identity, compare them against potentially fake or similar-looking faces, and reason over these comparisons to make accurate and explainable predictions. To facilitate the evaluation of our method, we build a comprehensive identity-aware benchmark called **VIPBench** for personalized deepfake detection, involving the latest 7 face-swapping and 7 entire face synthesis techniques for generation. |
Kaiqing Lin; Zhiyuan Yan; Ke-Yue Zhang; Li Hao; Yue Zhou; Yuzhen Lin; Weixiang Li; Taiping Yao; Shouhong Ding; Bin Li; | code |
| 553 | Integrating Drug Substructures and Longitudinal Electronic Health Records for Personalized Drug Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose \textbf{SubRec}, a unified framework that integrates representation learning across both patient and drug spaces. |
Wenjie Du; Xuqiang Li; Jinke Feng; Shuai Zhang; Wen Zhang; Yang Wang; | code |
| 554 | BlurDM: A Blur Diffusion Model for Image Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address it, we present a Blur Diffusion Model (BlurDM), which seamlessly integrates the blur formation process into diffusion for image deblurring. |
Jin-Ting He; Fu-Jen Tsai; Yan-Tsung Peng; Min-Hung Chen; Chia-Wen Lin; Yen-Yu Lin; | code |
| 555 | DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, achieving one step in VSR remains challenging, due to the high training overhead on video data and stringent fidelity demands. To tackle the above issues, we propose DOVE, an efficient one-step diffusion model for real-world VSR. |
Zheng Chen; Zichen Zou; Kewei Zhang; Xiongfei Su; Xin Yuan; Yong Guo; Yulun Zhang; | code |
| 556 | Q-Insight: Understanding Image Quality Via Visual Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO), which demonstrates strong visual reasoning capability for image quality understanding while requiring only a limited amount of rating scores and degradation labels. |
Weiqi Li; Xuanyu Zhang; Shijie Zhao; Yabin ZHANG; Junlin Li; Li Zhang; Jian Zhang; | code |
| 557 | Composition and Alignment of Diffusion Models Using Constrained Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods cannot guarantee that the resulting model faithfully generates samples with all the desired properties. To address this gap, we propose a constrained optimization framework that unifies alignment and composition of diffusion models by enforcing that the aligned model satisfies reward constraints and/or remains close to each pretrained model. |
Shervin Khalafi; Ignacio Hounie; Dongsheng Ding; Alejandro Ribeiro; | code |
| 558 | FlowCut: Rethinking Redundancy Via Information Flow for Efficient Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find (1) the CLS token acts as an information relay, which can simplify the complicated flow analysis; (2) the redundancy emerges progressively and dynamically via layer-wise attention concentration; and (3) relying solely on attention scores from single layers can lead to contradictory redundancy identification. Based on this, we propose FlowCut, an information-flow-aware pruning framework, mitigating the insufficiency of the current criterion for identifying redundant tokens and better aligning with the model’s inherent behaviors. |
Jintao Tong; Wenwei Jin; Pengda Qin; Anqi Li; Yixiong Zou; Yuhong Li; Yuhua Li; Ruixuan Li; | code |
| 559 | Optimize Any Topology: A Foundation Model for Shape- and Resolution-Free Structural Topology Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Optimize Any Topology (OAT), a foundation-model framework that directly predicts minimum-compliance layouts for arbitrary aspect ratios, resolutions, volume fractions, loads, and fixtures. |
Amin Heyrani Nobari; Lyle Regenwetter; Cyril Picard; Ligong Han; Faez Ahmed; | code |
| 560 | Solving Inverse Problems with FLAIR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present FLAIR, a novel, training-free variational framework that leverages flow-based generative models as prior for inverse problems. |
Julius Erbach; Dominik Narnhofer; Andreas Robert Dombos; Bernt Schiele; Jan Eric Lenssen; Konrad Schindler; | code |
| 561 | Learning Shared Representations from Unpaired Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that shared representations can be learned almost exclusively from unpaired data. |
Amitai Yacobi; Nir Ben-Ari; Ronen Talmon; Uri Shaham; | code |
| 562 | Synthetic Series-Symbol Data Generation for Time Series Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage series-symbol data pairs with strong correlations, we develop SymTime, a pre-trained foundation model for enhancing time series representation using symbolic information. |
Wenxuan Wang; Kai Wu; Yujian Betterest Li; Dan Wang; Xiaoyu Zhang; | code |
| 563 | Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as we identified in this work, the state-of-the-art (SOTA) LVM-based forecaster exhibits an inductive bias towards forecasting periods. To harness this bias, we propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual based adaptive decomposition to integrate MMVs for LTSF. |
ChengAo Shen; Wenchao Yu; Ziming Zhao; Dongjin Song; Wei Cheng; Haifeng Chen; Jingchao Ni; | code |
| 564 | Dynamic and Chemical Constraints to Enhance The Molecular Masked Graph Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in the molecular domain, two main issues arise: the predetermined mask ratio and reconstruction objectives can lead to suboptimal performance or negative transfer due to overly simplified or complex tasks, and these tasks may deviate from chemical priors. To tackle these challenges, we propose Dynamic and Chemical Constraints (DyCC) for MGAEs. |
Jiahui Zhang; Wenjie Du; Yang Wang; | code |
| 565 | Enhancing The Maximum Effective Window for Long-Term Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Maximum Effective Window (MEW) metric to assess a model’s ability to effectively utilize the lookback window. |
Jiahui Zhang; Zhengyang Zhou; Wenjie Du; Yang Wang; | code |
| 566 | Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we theoretically analyze the adversarial robustness of NPC and demonstrate that it only depends on the robustness of the attribute recognition model and is independent of the robustness of the probabilistic circuit. |
Weixin Chen; Han Zhao; | code |
| 567 | From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a new driving paradigm named Policy World Model (PWM), which not only integrates world modeling and trajectory planning within a unified architecture, but is also able to benefit planning using the learned world knowledge through the proposed action-free future state forecasting scheme. |
Zhida Zhao; Talas Fu; Yifan Wang; Lijun Wang; Huchuan Lu; | code |
| 568 | RoboScape: Physics-informed Embodied World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present RoboScape, a unified physics-informed world model that jointly learns RGB video generation and physics knowledge within an integrated framework. |
Yu Shang; Xin Zhang; Yinzhou Tang; Lei Jin; Chen Gao; Wei Wu; Yong Li; | code |
| 569 | VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods still suffer from hallucinating semantics that contradict the visual evidence due to the lack of grounding in actual instances, resulting in noisy guidance and costly corrections. To address these issues, we propose a novel framework, bridging Vision and Text with LLMs for Few-Shot Learning (VT-FSL), which constructs precise cross-modal prompts conditioned on Large Language Models (LLMs) and support images, seamlessly integrating them through a geometry-aware alignment mechanism. |
Wenhao Li; Qiangchang Wang; Xianjing Meng; Zhibin Wu; Yilong Yin; | code |
| 570 | SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SpecEdge, an edge-assisted inference framework that splits LLM workloads between edge and server GPUs using a speculative decoding scheme, exchanging only token outputs over the network. |
Jinwoo Park; Seunggeun Cho; Dongsu Han; | code |
| 571 | On Efficiency-Effectiveness Trade-off of Diffusion-based Recommenders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the multi-step process relies on discrete approximations, introducing discretization error that creates a trade-off between computational efficiency and recommendation effectiveness. To address this trade-off, we propose TA-Rec, a two-stage framework that achieves one-step generation by smoothing the denoising function during pretraining while alleviating trajectory deviation by aligning with user preferences during fine-tuning. |
Wenyu Mao; Jiancan Wu; Guoqing Hu; Zhengyi Yang; Wei Ji; Xiang Wang; | code |
| 572 | Multivariate Time Series Anomaly Detection with Idempotent Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, balancing robustness and sensitivity is also important for final performance, as robustness ensures accurate detection in potentially noisy data, while sensitivity enables early detection of subtle anomalies. To address these problems, inspired by idempotent generative network, we take the view from the manifold and propose a novel module named **I**dempotent **G**eneration for **A**nomaly **D**etection (IGAD) which can be flexibly combined with a reconstruction-based method without introducing additional trainable parameters. |
Xin Sun; Heng Zhou; Chao Li; | code |
| 573 | Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the stronger potential of Stable Diffusion 3 (SD3), a rectified flow-based model with a multimodal transformer backbone (MM-DiT). |
Jing Zuo; Jiaqi Wang; Yonggang Qi; Yi-Zhe Song; | code |
| 574 | CCS: Controllable and Constrained Sampling with Diffusion Models Via Initial Noise Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then provide both theoretical and empirical analyses to justify this linearity property of the input–output (noise → generation data) relationship. Inspired by these insights, we propose a novel **C**ontrollable and **C**onstrained **S**ampling (CCS) method, along with a new controller algorithm for diffusion models, that enables precise control over both (1) the proximity of individual samples to a target image and (2) the alignment of the sample mean with the target, while preserving high sample quality. |
Bowen Song; Zecheng Zhang; Zhaoxu Luo; Jason Hu; Wei Yuan; Jing Jia; Zhengxu Tang; Guanyang Wang; Liyue Shen; | code |
| 575 | HybridNorm: Towards Stable and Efficient Transformer Training Via Hybrid Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **HybridNorm**, a simple yet effective hybrid normalization strategy that integrates the advantages of both Pre-Norm and Post-Norm. |
Zhijian Zhuo; Yutao Zeng; Ya Wang; Sijun Zhang; Xiaoqing Li; Jian Yang; zhou Xun; Jinwen Ma; | code |
| 576 | HAODiff: Human-Aware One-Step Diffusion Via Dual-Prompt Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a triple-branch dual-prompt guidance (DPG), which leverages high-quality images, residual noise (LQ minus HQ), and HMB segmentation masks as training targets. |
Jue Gong; Tingyu Yang; Jingkai Wang; Zheng Chen; Xing Liu; Hong Gu; Yulun Zhang; Xiaokang Yang; | code |
| 577 | Angular Constraint Embedding Via SpherePair Loss for Constrained Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To avoid their respective pitfalls, we propose a novel angular constraint embedding approach for DCC, termed SpherePair. |
Shaojie Zhang; Ke Chen; | code |
| 578 | Exploiting Task Relationships in Continual Learning Via Transferability-Aware Task Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing CL strategies primarily focus on task models — either by regularizing model updates or by separating task-specific and shared components — while often overlooking the potential of leveraging inter-task relationships to enhance transfer. To address this gap, we propose a transferability-aware task embedding, termed H-embedding, and construct a hypernet framework under its guidance to learn task-conditioned model weights for CL tasks. |
Yanru Wu; Jianning Wang; Xiangyu Chen; Enming Zhang; Yang Tan; Hanbing Liu; Yang Li; | code |
| 579 | PID-controlled Langevin Dynamics for Faster Sampling on Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PID-controlled Langevin Dynamics (PIDLD), a novel sampling acceleration algorithm that reinterprets the sampling process using control-theoretic principles. |
Hongyi Chen; Jianhai Shu; Jingtao Ding; Yong Li; Xiao-Ping Zhang; | code |
| 580 | One-Step Diffusion-Based Image Compression with Semantic Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the design of a diffusion-based codec and argue that multi-step sampling is not necessary for generative compression. |
Naifu Xue; Zhaoyang Jia; Jiahao Li; Bin Li; Yuan Zhang; Yan Lu; | code |
| 581 | Chiron-o1: Igniting Multimodal Large Language Models Towards Generalizable Medical Reasoning Via Mentor-Intern Collaborative Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches exhibit a deficiency in offering a comprehensive framework for searching and evaluating effective reasoning paths towards critical diagnosis. To address this challenge, we propose Mentor-Intern Collaborative Search (MICS), a novel reasoning-path searching scheme to generate rigorous and effective medical CoT data. |
Haoran Sun; Yankai Jiang; Wenjie Lou; Yujie Zhang; Wenjie Li; Lilong Wang; Mianxin Liu; Lei Liu; Xiaosong Wang; | code |
| 582 | MetaDefense: Defending Fine-tuning Based Jailbreak Attack Before and During Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces MetaDefense, a novel framework for defending against finetuning-based jailbreak attacks in large language models (LLMs). |
Weisen Jiang; Sinno Jialin Pan; | code |
| 583 | Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Direct3D-S2, a scalable 3D generation framework based on sparse volumes that achieves superior output quality with dramatically reduced training costs. |
Shuang Wu; Youtian Lin; Feihu Zhang; Yifei Zeng; Yikang Yang; Yajie Bao; Jiachen Qian; Siyu Zhu; Xun Cao; Philip Torr; Yao Yao; | code |
| 584 | Dual Prototype-Enhanced Contrastive Framework for Class-Imbalanced Graph Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, they face challenges arising from biased knowledge in the source graph and substantial domain distribution shifts. To remedy the above challenges, we propose a dual-branch prototype-enhanced contrastive framework for class-imbalanced graph domain adaptation in this paper. |
Xin Ma; Yifan Wang; Siyu Yi; Wei Ju; Junyu Luo; Yusheng Zhao; Xiao Luo; Jiancheng Lv; | code |
| 585 | SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional test-time adaptation (TTA) methods designed for ANNs often fail to address the unique computational dynamics of SNNs, such as sparsity and temporal spiking behavior. To address these challenges, we propose SPike-Aware Consistency Enhancement (SPACE), the first source-free and single-instance TTA method specifically designed for SNNs. |
Xinyu Luo; Kecheng Chen; Pao-Sheng Vincent Sun; Chris XING TIAN; Arindam Basu; Haoliang Li; | code |
| 586 | Process Vs. Outcome Reward: Which Is Better for Agentic RAG Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent methods like Search-R1, which rely on outcome-based reinforcement learning, face challenges such as low exploration efficiency, gradient conflict, and sparse reward signals. To tackle these limitations, we introduce ReasonRAG, a novel method that leverages RAG-ProGUIDE—a high-quality dataset providing fine-grained, process-level rewards for query generation, evidence extraction, and answer generation. |
Wenlin Zhang; Xiangyang Li; Kuicai Dong; Yichao Wang; Pengyue Jia; Xiaopeng Li; Yingyi Zhang; Derong Xu; Zhaocheng Du; Huifeng Guo; Ruiming Tang; Xiangyu Zhao; | code |
| 587 | Jacobian-Based Interpretation of Nonlinear Neural Encoding Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches remain limited in characterizing the brain’s inherently nonlinear response properties. To address this, we propose the Jacobian-based Nonlinearity Evaluation (JNE), an interpretability metric for nonlinear neural encoding models. |
Xiaohui Gao; Haoran Yang; Yue Cheng; Mengfei Zuo; Yiheng Liu; Peiyang Li; Xintao Hu; | code |
| 588 | Human-assisted Robotic Policy Refinement Via Action Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While Vision-Language-Action (VLA) models are widely recognized as the foundation model for such robotic deployment, their reliance on offline expert demonstrations critically limits their capacity for post-deployment refinement. To mitigate this limitation, we introduce Action Preference Optimization (APO), a method designed to refine VLA models by human-assisted preference alignment gathered through interaction with environments. |
Wenke Xia; Yichu Yang; Hongtao Wu; Xiao Ma; Tao Kong; Di Hu; | code |
| 589 | How Different from The Past? Spatio-Temporal Time Series Forecasting with Self-Supervised Deviation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These deviations contain critical signals that can significantly affect model performance. To fill this gap, we propose $\textbf{ST-SSDL}$, a $\underline{S}$patio-$\underline{T}$emporal time series forecasting framework that incorporates a $\underline{S}$elf-$\underline{S}$upervised $\underline{D}$eviation $\underline{L}$earning scheme to capture and utilize such deviations. |
Haotian Gao; Zheng Dong; Jiawei Yong; Shintaro Fukushima; Kenjiro Taura; Renhe Jiang; | code |
| 590 | S’MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. |
Hanqing Zeng; Yinglong Xia; Zhuokai Zhao; Chuan Jiang; Qiang Zhang; Jiayi Liu; Qunshu Zhang; Lizhu Zhang; Xiangjun Fan; Benyu Zhang; | code |
| 591 | ScSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method called scSplit that is cognizant of the severity of the above-mentioned mixing ratio. |
Ashesh; Florian Jug; | code |
| 592 | KaRF: Weakly-Supervised Kolmogorov-Arnold Networks-based Radiance Fields for Local Color Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing NeRF-based editing methods continue to face significant challenges in local region editing, which usually lead to imprecise local object boundaries, difficulties in maintaining multi-view consistency, and over-reliance on annotated data. To address these limitations, in this paper, we propose a novel weakly-supervised method called KaRF for local color editing, which facilitates high-fidelity and realistic appearance edits in arbitrary regions of 3D scenes. |
Wudi Chen; Zhiyuan Zha; Shigang Wang; Bihan Wen; Xin Yuan; Jiantao Zhou; Zipei Fan; Gang Yan; Ce Zhu; | code |
| 593 | Handling Label Noise Via Instance-Level Difficulty Modeling and Dynamic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods focus on isolating clean subsets or correcting noisy labels, facing limitations such as high computational costs, heavy hyperparameter tuning process, and coarse-grained optimization. To address these challenges, we propose a novel two-stage noisy learning framework that enables instance-level optimization through a dynamically weighted loss function, avoiding hyperparameter tuning. |
Kuan Zhang; Chengliang Chai; Jingzhe Xu; Chi Zhang; Han Han; Ye Yuan; Guoren Wang; Lei Cao; | code |
| 594 | When One Moment Isn’t Enough: Multi-Moment Retrieval with Cross-Moment Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on existing efforts in MMR, we propose a framework called FlashMMR. By revisiting the gap between current MR tasks and real-world applications, we introduce a high-quality dataset called QVHighlights Multi-Moment Dataset (QV-M$^2$), along with new evaluation metrics tailored for multi-moment retrieval (MMR). |
Zhuo Cao; Heming Du; Bingqing Zhang; Xin Yu; Xue Li; Sen Wang; | code |
| 595 | Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limits LoRA’s ability to capture token-specific information due to the inherent semantic differences among tokens. To address this limitation, we propose **Token-wise Projected Low-Rank Adaptation (TopLoRA)**, which dynamically adjusts LoRA weights according to the input token, thereby learning token-wise input-output projections in an end-to-end manner. |
Shiwei Li; Xiandi Luo; Haozhao Wang; Xing Tang; Ziqiang Cui; Dugang Liu; Yuhua Li; xiuqiang He; Ruixuan Li; | code |
| 596 | Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods primarily rely on a single coarse condition (e.g., skeleton sequences) as the intermediary to bridge the translation model and the video generation model, which limits both the naturalness and expressiveness of the generated videos. To overcome these limitations, we propose SignViP, a novel SLVG framework that incorporates multiple fine-grained conditions for improved generation fidelity. |
Cong Wang; Zexuan Deng; Zhiwei Jiang; Yafeng Yin; Fei Shen; Zifeng Cheng; Shiping Ge; Shiwei Gan; Qing Gu; | code |
| 597 | Mint: A Simple Test-Time Adaptation of Vision-Language Models Against Common Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how corruptions affect CLIP’s image embeddings and uncover a consistent phenomenon we term as embedding variance collapse, where both intra-class and inter-class variances shrink as corruption severity increases. |
Wenxuan Bao; Ruxi Deng; Jingrui He; | code |
| 598 | Dependency Matters: Enhancing LLM Reasoning with Explicit Knowledge Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) often produce reasoning steps that are superficially coherent yet internally inconsistent, leading to unreliable outputs. Since such failures typically arise from implicit or poorly-grounded knowledge, we introduce \emph{Grounded Reasoning in Dependency (GRiD)}, a novel dependency-aware reasoning framework that explicitly grounds reasoning steps in structured knowledge. |
Xiangyu Wen; Min Li; Junhua Huang; Jianyuan Zhong; Zhijian Xu; Zeju Li; Yongxiang Huang; Mingxuan Yuan; Qiang Xu; | code |
| 599 | OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents $\mathbf{OLinear}$, a $\mathbf{linear}$-based multivariate time series forecasting model that operates in an $\mathbf{o}$rthogonally transformed domain. |
Wenzhen Yue; Yong Liu; Hao Wang; Haoxuan Li; Xianghua Ying; Ruohao Guo; Bowei Xing; Ji Shi; | code |
| 600 | 3D Interaction Geometric Pre-training for Molecular Relational Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel 3D geometric pre-training strategy for MRL (3DMRL) that incorporates a 3D virtual interaction environment, overcoming the limitations of costly traditional quantum mechanical calculation methods. |
Namkyeong Lee; Yunhak Oh; Heewoong Noh; Gyoung S. Na; Minkai Xu; Hanchen; Tianfan Fu; Chanyoung Park; | code |
| 601 | Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Compiler-R1, the first reinforcement learning (RL)-driven framework specifically augmenting LLM capabilities for compiler auto-tuning. |
Haolin Pan; Hongyu Lin; Haoran Luo; Yang Liu; Kaichun Yao; Libo Zhang; Mingjie Xing; Yanjun Wu; | code |
| 602 | SPARKE: Scalable Prompt-Aware Diversity and Novelty Guidance in Diffusion Models Via RKE Score Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extend the diversity measure-based approaches by proposing the *S*calable *P*rompt-*A*ware *R*ényi *K*ernel *E*ntropy Diversity Guidance (*SPARKE*) method for prompt-aware diversity guidance. |
Mohammad Jalali; Haoyu LEI; Amin Gohari; Farzan Farnia; | code |
| 603 | Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce CLIC (Compositionally-aware Learning in CLIP), a fine-tuning method based on a novel training technique combining multiple images and their associated captions. |
Amit Peleg; Naman Deep Singh; Matthias Hein; | code |
| 604 | Predictive Preference Learning from Human Interventions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although most interactive imitation learning methods focus on correcting the agent’s action at the current state, they do not adjust its actions in future states, which may be potentially more hazardous. To address this, we introduce Predictive Preference Learning from Human Interventions (PPL), which leverages the implicit preference signals contained in human interventions to inform predictions of future rollouts. |
Haoyuan Cai; Zhenghao Peng; Bolei Zhou; | code |
| 605 | Breaking The Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods fail to maintain both high compression and performance, and often rely on data. To address these challenges, we propose UltraDelta, the first data-free delta compression pipeline that achieves both ultra-high compression and strong performance. |
Xiaohui Wang; Peng Ye; Chenyu Huang; Shenghe Zheng; Bo Zhang; LEI BAI; Wanli Ouyang; Tao Chen; | code |
| 606 | Data Selection Matters: Towards Robust Instruction Tuning of Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet we observe that both full-data training and existing state-of-the-art data selection methods tend to inherit underlying dataset biases such as position bias and spurious correlations, leading to biased model behaviors. To address this issue, we introduce ARDS, a robustness-aware targeted visual instruction-selection framework that explicitly mitigates these weaknesses, sidestepping the need for access to downstream data or time-consuming gradient computation. |
Xu Yang; Chen Liu; Ying Wei; | code |
| 607 | Buffer Layers for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel paradigm based on the concept of a \textit{Buffer} layer, which addresses the fundamental limitations of normalization layer updates. |
Hyeongyu Kim; GeonHui Han; Dosik Hwang; | code |
| 608 | Learning Multi-Source and Robust Representations for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate over-regularization issues, we propose a novel Adaptive Regularization Optimization (ARO) method to manage and optimize a switch vector that selectively governs the updating process of each representation layer, thereby promoting new-task learning. |
Fei Ye; Yongcheng Zhong; Qihe Liu; Adrian G. Bors; Jingling Sun; Rongyao Hu; Shijie Zhou; | code |
| 609 | OSTAR: Optimized Statistical Text-classifier with Adversarial Resistance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we propose OSTAR, a novel MGT detection framework designed for adversarial environments, which is composed of a statistically enhanced classifier and Multi-Faceted Contrastive Learning (MFCL). |
Yuhan Yao; Feifei Kou; Lei Shi; Xiao Yang; Zhongbao Zhang; Suguo Zhu; Jiwei Zhang; Lirong Qiu; LI Haisheng; | code |
| 610 | MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Adapting large-scale foundation models in multi-task scenarios often suffers from task conflict and oblivion. To mitigate such issues, we propose a novel model MoE-ization strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. |
Shen Yuan; Yin Zheng; Taifeng Wang; Binbin Liu; Hongteng Xu; | code |
| 611 | One Stone with Two Birds: A Null-Text-Null Frequency-Aware Diffusion Models for Text-Guided Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a null-text-null frequency-aware diffusion model, dubbed NTN-Diff, for text-guided image inpainting, which decomposes semantic consistency across masked and unmasked regions into per-frequency-band consistencies while preserving the unmasked regions, thereby circumventing both challenges. |
Haipeng Liu; Yang Wang; Meng Wang; | code |
| 612 | Learnable Burst-Encodable Time-of-Flight Imaging for High-Fidelity Long-Distance Depth Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel ToF imaging paradigm, termed Burst-Encodable Time-of-Flight (BE-ToF), which facilitates high-fidelity, long-distance depth imaging. |
Manchao Bao; Shengjiang Fang; Tao Yue; Xuemei Hu; | code |
| 613 | Anti-Aliased 2D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that these artifacts stem from two key limitations: the lack of frequency constraints in the representation and an ineffective screen-space clamping approach. To address these issues, we present AA-2DGS, an anti-aliased formulation of 2D Gaussian Splatting that maintains its geometric benefits while significantly enhancing rendering quality across different scales. |
Mae Younes; Adnane Boukhayma; | code |
| 614 | ADPretrain: Advancing Industrial Anomaly Detection Via Anomaly Representation Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel AD representation learning framework specially designed for learning robust and discriminative pretrained representations for industrial anomaly detection. |
Xincheng Yao; Yan Luo; Zefeng Qian; Chongyang Zhang; | code |
| 615 | GPAS: Accelerating Convergence of LLM Pretraining Via Gradient-Preserving Activation Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While being stable during pretraining and scalable to large model sizes, Pre-LN suffers from an exponential growth in activation variance across layers, causing the shortcut to dominate over sub-layer outputs in the residual connection and limiting the learning capacity of deeper layers. To mitigate this issue, we propose Gradient-Preserving Activation Scaling (GPAS), a simple technique that can be used in combination with existing approaches. |
Tianhao Chen; Xin Xu; Zijing Liu; Pengxiang Li; Xinyuan Song; AJAY KUMAR JAISWAL; Fan Zhang; Jishan Hu; Yang Wang; Hao Chen; Shizhe Diao; Shiwei Liu; Yu Li; Lu Yin; Can Yang; | code |
| 616 | Data Efficient Adaptation in Large Language Models Via Continuous Low-Rank Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, conventional FT approaches often suffer from catastrophic forgetting and suboptimal data efficiency, limiting their real-world applicability. To address these challenges, this paper proposes DEAL, a novel framework that integrates Low-Rank Adaptation (LoRA) with a continuous fine-tuning strategy. |
Xiao Han; ZIMO ZHAO; Wanyu Wang; Maolin Wang; Zitao Liu; Yi Chang; Xiangyu Zhao; | code |
| 617 | Retrieval Is Not Enough: Enhancing RAG Through Test-Time Critique and Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reinterpret RAG as \textit{Retrieval-Augmented Reasoning} and identify a central but underexplored problem: \textit{Reasoning Misalignment}—the divergence between an LLM’s internal reasoning trajectory and the evidential constraints provided by retrieval. |
Jiaqi Wei; Hao Zhou; Xiang Zhang; Di Zhang; Zijie Qiu; Noah Wei; Jinzhe Li; Wanli Ouyang; Siqi Sun; | code |
| 618 | FRN: Fractal-Based Recursive Spectral Reconstruction Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Fractal-Based Recursive Spectral Reconstruction Network (FRN), which differs from existing paradigms that attempt to directly integrate the full-spectrum information from the R, G, and B channels in a one-shot manner. |
Ge Meng; Zhongnan Cai; Ruizhe Chen; Jingyan Tu; Yingying Wang; Yue Huang; Xinghao Ding; | code |
| 619 | The Emergence of Abstract Thought in Large Language Models Beyond Any Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we find that LLMs progressively develop a core language-agnostic parameter space—a remarkably small subset of parameters whose deactivation results in significant performance degradation across all languages. |
Yuxin Chen; Yiran Zhao; Yang Zhang; An Zhang; Kenji Kawaguchi; Shafiq Joty; Junnan Li; Tat-Seng Chua; Michael Qizhe Shieh; Wenxuan Zhang; | code |
| 620 | Graphs Help Graphs: Multi-Agent Graph Socialized Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is crucial to determine with whom, what, and when to share and accumulate information for effective GSL. Thus, we propose the "Graphs Help Graphs" (GHG) method to solve these issues. |
Jialu Li; Yu Wang; Pengfei Zhu; Wanyu Lin; Xinjie Yao; Qinghua Hu; | code |
| 621 | HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HBLLM, a wavelet-enhanced high-fidelity $1$-bit post-training quantization method for Large Language Models (LLMs). |
Ningning CHEN; Weicai Ye; Ying Jiang; | code |
| 622 | SEAL: Semantic-Aware Hierarchical Learning for Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches typically depend on either single-level semantics or manually designed abstract hierarchies, which limit their generalizability and scalability. To address these limitations, we introduce a SEmantic-aware hierArchical Learning framework (SEAL), guided by naturally occurring and easily accessible hierarchical structures. |
Zhenqi He; Yuanpei Liu; Kai Han; | code |
| 623 | Concept-Guided Interpretability Via Neural Chunking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We refer to this as the \textit{Reflection Hypothesis} and provide evidence for this phenomenon in both simple recurrent neural networks (RNNs) and complex large language models (LLMs). Building on this insight, we propose to leverage cognitively-inspired methods of \textit{chunking} to segment high-dimensional neural population dynamics into interpretable units that reflect underlying concepts. |
Shuchen Wu; Stephan Alaniz; Shyamgopal Karthik; Peter Dayan; Eric Schulz; Zeynep Akata; | code |
| 624 | For Better or for Worse, Transformers Seek Patterns for Memorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate memorization in transformer-based language models by analyzing their memorization dynamics during training over multiple epochs. |
Madhur Panwar; Gail Weiss; Navin Goyal; Antoine Bosselut; | code |
| 625 | Bidirectional Representations Augmented Autoregressive Biological Sequence Generation: Application in De Novo Peptide Sequencing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Non-Autoregressive (NAR) models offer holistic, bidirectional representations but face challenges with generative coherence and scalability. To transcend this, we propose a hybrid framework enhancing AR generation by dynamically integrating rich contextual information from non-autoregressive mechanisms. |
Xiang Zhang; Jiaqi Wei; Zijie Qiu; Sheng Xu; Zhi Jin; ZhiQiang Gao; Nanqing Dong; Siqi Sun; | code |
| 626 | Angular Steering: Behavior Control Via Rotation in Activation Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Angular Steering, a novel and flexible method for behavior modulation that operates by rotating activations within a fixed two-dimensional subspace. |
Hieu M. Vu; Tan Minh Nguyen; | code |
| 627 | SRA-CL: Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, existing approaches typically require predefined selection rules that impose strong assumptions, limiting the model’s ability to autonomously learn optimal contrastive pairs. To address these limitations, we propose a novel approach named Semantic Retrieval Augmented Contrastive Learning (SRA-CL). |
Ziqiang Cui; Yunpeng Weng; Xing Tang; Xiaokun Zhang; Shiwei Li; Peiyang Liu; Bowei He; Dugang Liu; Weihong Luo; xiuqiang He; Chen Ma; | code |
| 628 | Fine-grained List-wise Alignment for Generative Medication Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose FLAME, a fine-grained list-wise alignment framework for large language models (LLMs), enabling drug-by-drug generation of drug lists. |
Chenxiao Fan; Chongming Gao; Wentao Shi; Yaxin Gong; Zhao Zihao; Fuli Feng; | code |
| 629 | DOVTrack: Data-Efficient Open-Vocabulary Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the detection task, we introduce a dynamic group contrastive learning approach that generates diverse sample groups through affinity, dispersion, and adversarial grouping strategies, tripling the effective training samples for classification while maintaining sample quality. |
Zekun Qian; Ruize Han; Zhixiang Wang; Junhui Hou; Wei Feng; | code |
| 630 | X-Mahalanobis: Transformer Feature Mixing for Reliable OOD Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a straightforward feature mixing approach for pre-trained Transformers, which combines multi-layer representations via calculated importance weights, and identifies OOD samples using Mahalanobis distance in the blended feature space. |
Tong Wei; Bo-Lin Wang; Jiang-Xin Shi; Yu-Feng Li; Min-Ling Zhang; | code |
| 631 | Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, directly applying decentralized approaches to FCL suffers from ineffective group formation caused by task changes. To address these challenges, we propose a decentralized dynamic cooperation framework for FCL, where clients establish dynamic cooperative learning coalitions to balance the acquisition of new knowledge and the retention of prior learning, thereby obtaining personalized models. |
Danni Yang; Zhikang Chen; Sen Cui; Mengyue Yang; Ding Li; Abudukelimu Wuerkaixi; Haoxuan Li; Jinke Ren; Mingming Gong; | code |
| 632 | Learning Crossmodal Interaction Patterns Via Attributed Bipartite Graphs for Single-Cell Omics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present a novel framework which reformulates crossmodal matching as a graph classification task on Attributed Bipartite Graphs (ABGs). |
Xiaotang Wang; Xuanwei Lin; Yun Zhu; Hao Li; Yongqi Zhang; | code |
| 633 | Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify two main sources of anomaly in existing graph classification backdoor methods: structural deviation from rare subgraph triggers and semantic deviation caused by label flipping, both of which make poisoned graphs easily detectable by anomaly detection models. To address this, we propose DPSBA, a clean-label backdoor framework that learns in-distribution triggers via adversarial training guided by anomaly-aware discriminators. |
Xiaobao Wang; Ruoxiao Sun; Yujun Zhang; Bingdao Feng; Dongxiao He; Luzhi Wang; Di Jin; | code |
| 634 | C-NAV: Towards Self-Evolving Continual Object Navigation in Open World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle this challenge, we propose C-Nav, a continual visual navigation framework that integrates two key innovations: (1) A dual-path anti-forgetting mechanism, which comprises feature distillation that aligns multi-modal inputs into a consistent representation space to ensure representation consistency, and feature replay that retains temporal features within the action decoder to ensure policy consistency. To facilitate related studies, we introduce the continual object navigation benchmark, which requires agents to acquire navigation skills for new object categories while avoiding catastrophic forgetting of previously learned knowledge. |
MingMing Yu; Fei Zhu; wenzhuo liu; Yirong Yang; Qunbo Wang; wenjun wu; Jing Liu; | code |
| 635 | Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel conformal prediction framework for constructing conformal prediction bands with high probability around biomarker trajectories observed at subject-specific, randomly-timed follow-up visits. |
Vasiliki Tassopoulou; Charis Stamouli; Haochang Shou; George J. Pappas; Christos Davatzikos; | code |
| 636 | COME: Adding Scene-Centric Forecasting Control to Occupancy World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. |
Yining Shi; Kun Jiang; Qiang Meng; Ke Wang; Jiabao Wang; Wenchao Sun; Tuopu Wen; mengmeng yang; Diange Yang; | code |
| 637 | TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, existing methods often conflate the time-varying and time-invariant components, and jointly learn the combined long-term patterns and short-term fluctuations, leading to suboptimal performance under distribution shifts. To address this issue, we propose a lightweight static-dynamic decomposition framework, TimeEmb, for time series forecasting. |
Mingyuan Xia; Chunxu Zhang; Zijian Zhang; Hao Miao; Qidong Liu; Yuanshao Zhu; Bo Yang; | code |
| 638 | AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AReaL, a fully asynchronous RL system that completely decouples generation from training. |
Wei Fu; Jiaxuan Gao; Xujie Shen; Chen Zhu; Zhiyu Mei; Chuyi He; Shusheng Xu; Guo Wei; Jun Mei; WANG JIASHU; Tongkai Yang; Binhang Yuan; Yi Wu; | code |
| 639 | SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, to convert sparse delayed rewards into denser intermediate signals that improve multi-step reasoning, we propose Tree-based Group Relative Policy Optimization (**Tree-GRPO**), which integrates Monte Carlo Tree Search into GRPO. |
Wanxin Tian; Shijie Zhang; Kevin Zhang; Xiaowei Chi; Chun-Kai Fan; Junyu Lu; Yulin Luo; Qiang Zhou; Yiming Zhao; Ning Liu; Siyu Lin; Zhiyuan Qin; Xiaozhu Ju; Shanghang Zhang; Jian Tang; | code |
| 640 | StruDiCO: Structured Denoising Diffusion with Gradient-free Inference-stage Boosting for Memory and Time Efficient Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches fail to reveal how variables are progressively determined during inference, making the final solution opaque until the last step. To address this limitation, we propose a structured denoising diffusion model, StruDiCO, which incrementally constructs solutions through step-wise variable selection. |
Yu Wang; Yang Li; Junchi Yan; Yi Chang; | code |
| 641 | FedRTS: Federated Robust Pruning Via Combinatorial Thompson Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing methods use dynamic pruning to improve efficiency by periodically adjusting sparse model topologies while maintaining sparsity, these approaches suffer from issues such as **greedy adjustments**, **unstable topologies**, and **communication inefficiency**, resulting in less robust models and suboptimal performance under data heterogeneity and partial client availability. To address these challenges, we propose **Fed**erated **R**obust pruning via combinatorial **T**hompson **S**ampling (FedRTS), a novel framework designed to develop robust sparse models. |
Hong Huang; Jinhai Yang; Yuan Chen; Jiaxun Ye; Dapeng Wu; | code |
| 642 | Counterfactual Implicit Feedback Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate the implicit feedback problem as a counterfactual estimation problem with missing treatment variables. |
Chuan Zhou; Lina Yao; Haoxuan Li; Mingming Gong; | code |
| 643 | Venus-MAXWELL: Efficient Learning of Protein-Mutation Stability Landscapes Using Protein Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes the Venus-MAXWELL framework, which reformulates mutation $\Delta \Delta G$ prediction as a sequence-to-landscape task. |
Yuanxi Yu; Fan Jiang; Xinzhu Ma; Liang Zhang; Bozitao Zhong; Wanli Ouyang; Guisheng Fan; Huiqun Yu; Liang Hong; Mingchen Li; | code |
| 644 | Self-Supervised Selective-Guided Diffusion Model for Old-Photo Face Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Self-Supervised Selective-Guided Diffusion (SSDiff), which leverages pseudo-reference faces generated by a pre-trained diffusion model under weak guidance. |
Wenjie Li; Xiangyi Wang; Heng Guo; Guangwei Gao; Zhanyu Ma; | code |
| 645 | Breaking The Discretization Barrier of Continuous Physics Simulation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we propose CoPS, a purely data-driven method, to effectively model continuous physics simulation from partial observations. |
Fan Xu; Hao Wu; Nan Wang; Lilan Peng; Kun Wang; Wei Gong; Xibin Zhao; | code |
| 646 | Glocal Information Bottleneck for Time Series Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This reveals a critical optimization dilemma: current objectives lack global guidance, leading models to overfit local noise and fail to capture global information of the data. To address this issue, we propose a new training paradigm, **Glocal** **I**nformation **B**ottleneck (**Glocal-IB**). |
Jie Yang; Kexin Zhang; Guibin Zhang; Philip S. Yu; Kaize Ding; | code |
| 647 | MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we propose Logos, a method that combines curriculum reinforcement fine-tuning to encourage models to generate logic-consistent reasoning chains by stepwise reducing learning difficulty, and collaborative hint inference to reduce reasoning complexity. To address this, we propose the MIRAGE benchmark, which isolates reasoning hallucinations by constructing questions where input images are correctly perceived by MLLMs yet reasoning errors persist. |
Bowen Dong; Minheng Ni; Zitong Huang; Guanglei Yang; Wangmeng Zuo; Lei Zhang; | code |
| 648 | DAAC: Discrepancy-Aware Adaptive Contrastive Learning for Medical Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Medical time-series data play a vital role in disease diagnosis but suffer from limited labeled samples and single-center bias, which hinder model generalization and lead to overfitting. To address these challenges, we propose DAAC (Discrepancy-Aware Adaptive Contrastive learning), a learnable multi-view contrastive framework that integrates external normal samples and enhances feature learning through adaptive contrastive strategies. |
Yifan WANG; Hongfeng Ai; liruiqi; Maowei Jiang; Quangao Liu; Jiahua Dong; Ruiyuan Kang; Alan Liang; Zihang Wang; Ruikai Liu; Cheng Jiang; Chenzhong Li; | code |
| 649 | Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our theoretical investigation of ODE- and SDE-based solvers reveals complementary weaknesses: ODE solvers accumulate irreducible gradient error along deterministic trajectories, while SDE methods suffer from amplified discretization errors when the step budget is limited. Building upon this insight, we introduce AdaSDE, a novel single-step SDE solver that aims to unify the efficiency of ODEs with the error resilience of SDEs. |
Ruoyu Wang; Beier Zhu; Junzhi Li; Liangyu Yuan; Chi Zhang; | code |
| 650 | Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and The Human Brain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key question is whether LLM behavioral capabilities stem from mechanisms akin to those in the human brain. To address these questions, we introduce the Hierarchical Frequency Tagging Probe (HFTP), a tool that utilizes frequency-domain analysis to identify neuron-wise components of LLMs (e.g., individual Multilayer Perceptron (MLP) neurons) and cortical regions (via intracranial recordings) encoding syntactic structures. |
Jingmin An; Yilong Song; Ruolin Yang; Nai Ding; Lingxi Lu; Yuxuan Wang; Wei Wang; Chu Zhuang; Qian Wang; Fang Fang; | code |
| 651 | Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the substantial empirical gains demonstrated by RL-based training methods like GRPO, a granular understanding of why and how RL enhances performance is still lacking. To bridge this gap, we introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions: (1) plan following and execution, (2) knowledge integration, and (3) chain of subproblems. |
Jiayu Wang; Yifei Ming; Zixuan Ke; Caiming Xiong; Shafiq Joty; Aws Albarghouthi; Frederic Sala; | code |
| 652 | Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a physics-driven AI-generated video detection paradigm based on probability flow conservation principles. |
Shuhai Zhang; ZiHao Lian; Jiahao Yang; Daiyuan Li; Guoxuan Pang; Feng Liu; Bo Han; Shutao Li; Mingkui Tan; | code |
| 653 | Deno-IF: Unsupervised Noisy Visible and Infrared Image Fusion Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel unsupervised noisy visible and infrared image fusion method, comprising two key modules. |
Han Xu; Yuyang Li; Yunfei Deng; Jiayi Ma; Guangcan Liu; | code |
| 654 | Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce \textbf{Value-guided Inference with Margin-based Reward (ViMaR)}, a two-stage inference framework that improves both efficiency and output fidelity by combining a temporal-difference value model with a margin-aware reward adjustment. |
Ankan Deria; Adinath Madhavrao Dukre; Feilong Tang; Sara Atito; Sudipta Roy; Muhammad Awais; Muhammad Haris Khan; Imran Razzak; | code |
| 655 | Keep It on A Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. |
Yaxin Hou; Bo Han; Yuheng Jia; Hui LIU; Junhui Hou; | code |
| 656 | Improving Diffusion-based Inverse Algorithms Under Few-Step Constraint Via Linear Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we begin with an analysis of ODE solvers for inverse problems that reveals a linear combination structure of approximations for the inverse trajectory. |
Jiawei Zhang; Ziyuan Liu; Leon Yan; Gen Li; Yuantao Gu; | code |
| 657 | Flatten Graphs As Sequences: Transformers Are Scalable Graph Generators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce AutoGraph, a scalable autoregressive model for attributed graph generation using decoder-only transformers. |
Dexiong Chen; Markus Krimmel; Karsten Borgwardt; | code |
| 658 | OSKAR: Omnimodal Self-supervised Knowledge Abstraction and Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present OSKAR, the first multimodal foundation model based on bootstrapped latent feature prediction. |
Mohamed O Abdelfattah; Kaouther Messaoud; Alexandre Alahi; | code |
| 659 | Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning framework that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. |
Jing Wang; Weiting Peng; Jing Tang; Zeyu Gong; Xihua Wang; Bo Tao; Li cheng; | code |
| 660 | ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates the need for a generative model capable of designing diverse sequences while preserving structural consistency. To address this trade-off, we introduce ProtInvTree, the first reward-guided tree-search framework for protein inverse folding. |
Mengdi Liu; Xiaoxue Cheng; Zhangyang Gao; Hong Chang; Cheng Tan; Shiguang Shan; Xilin Chen; | code |
| 661 | PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, its effectiveness is often limited by two critical challenges: the reliance on extensive human input and the inherent difficulties in resolving query ambiguity and credit assignment during reward learning. In this paper, we introduce PRIMT, a PbRL framework designed to overcome these challenges by leveraging foundation models (FMs) for multimodal synthetic feedback and trajectory synthesis. |
Ruiqi Wang; Dezhong Zhao; Ziqin Yuan; Tianyu Shao; Guohua Chen; Dominic Kao; Sungeun Hong; Byung-Cheol Min; | code |
| 662 | Spectral Compressive Imaging Via Chromaticity-Intensity Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the captured radiance inherently depends on scene illumination, making it difficult to recover the intrinsic spectral reflectance that remains invariant to lighting conditions. To address these challenges, we propose a chromaticity-intensity decomposition framework, which disentangles an HSI into a spatially smooth intensity map and a spectrally variant chromaticity cube. |
Xiaodong Wang; Zijun He; Ping Wang; Lishun Wang; Yanan Hu; Xin Yuan; | code |
| 663 | OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, they typically require training separate models for different compression bit-rates, leading to significant training and storage costs. To address these challenges, we propose a one-step diffusion codec across multiple bit-rates. |
Jinpei Guo; Yifei Ji; Zheng Chen; Kai Liu; Min Liu; Wang Rao; Wenbo Li; Yong Guo; Yulun Zhang; | code |
| 664 | MixSignGraph: A Sign Sequence Is Worth Mixed Graphs of Nodes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we emphasize that although capturing cross-region dependencies can improve sign language performance, it may degrade the representation quality of local regions. To mitigate this, we introduce MixSignGraph, which represents sign sequences as a group of mixed graphs for feature extraction. |
Shiwei Gan; Yafeng Yin; Zhiwei Jiang; Lei Xie; Sanglu Lu; Hongkai Wen; | code |
| 665 | Correlated Low-Rank Adaptation for ConvNets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, directly applying existing LoRA techniques to convolutional networks (ConvNets) yields unsatisfactory results due to the high correlation between the stacked sequential layers of ConvNets. To overcome this challenge, we introduce a novel framework called Correlated Low-Rank Adaptation (CoLoRA), which explicitly utilizes correlated low-rank matrices to model the inter-layer dependencies among convolutional layers. |
Wu Ran; Weijia Zhang; ShuYang Pang; Qi Zhu; Jinfan Liu; JingSheng Liu; Xin Cao; Qiang Li; Yichao Yan; Chao Ma; | code |
| 666 | BrainOmni: A Brain Foundation Model for Unified EEG and MEG Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes BrainOmni, the first brain foundation model that generalises across heterogeneous EEG and MEG recordings. |
Qinfan Xiao; Ziyun Cui; Chi Zhang; SiQi Chen; Wen Wu; Andrew Thwaites; Alexandra Woolgar; Bowen Zhou; Chao Zhang; | code |
| 667 | Private Training Large-scale Models with Efficient DP-SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces FlashDP, an innovative cache-friendly per-layer DP-SGD that consolidates necessary operations into a single task, calculating gradients only once in a fused manner. |
Liangyu Wang; Junxiao Wang; Jie Ren; Zihang Xiang; David E. Keyes; Di Wang; | code |
| 668 | Enhancing Temporal Understanding in Video-LLMs Through Stacked Temporal Attention in Vision Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Video-LLM architecture that introduces stacked temporal attention modules directly within the vision encoder. |
Ali Rasekh; Erfan Bagheri Soula; Omid Daliran; Simon Gottschalk; Mohsen Fayyaz; | code |
| 669 | Shape It Up! Restoring LLM Safety During Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we propose dynamic safety shaping (DSS), a dynamic shaping framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content. |
ShengYun Peng; Pin-Yu Chen; Jianfeng Chi; Seongmin Lee; Duen Horng Chau; | code |
| 670 | PIVNO: Particle Image Velocimetry Neural Operator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional cross-correlation methods and deep learning-based feature matching approaches often struggle with ambiguity, limited resolution, and generalization to real-world conditions. To address these challenges, we propose a PIV Neural Operator (PIVNO) framework that directly approximates the inverse mapping from paired particle images to flow fields within a function space. |
Jie Xu; Xuesong Zhang; Jing Jiang; Qinghua Cui; | code |
| 671 | Analogy-based Multi-Turn Jailbreak Against Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, even when harmful content is generated, the response often fails to align with the malicious prompt due to semantic drift, where the conversation slowly moves away from its intended goal. To address these challenges, we propose an analogy-based black-box multi-turn jailbreak framework that constructs fully benign contexts to improve attack success rate while ensuring semantic alignment with the malicious intent. |
Mengjie Wu; Yihao Huang; Zhenjun Lin; Kangjie Chen; Yuyang zhang; Yuhan Huang; Run Wang; Lina Wang; | code |
| 672 | Rethinking Nighttime Image Deraining Via Learnable Color Space Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we rethink the task of nighttime image deraining and contribute a new high-quality benchmark, HQ-NightRain, which offers higher harmony and realism compared to existing datasets. |
Qiyuan Guan; Xiang Chen; Guiyue Jin; Jiyu Jin; Shumin Fan; Tianyu Song; Jinshan Pan; | code |
| 673 | WMCopier: Forging Invisible Watermarks on Arbitrary Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose WMCopier, an effective watermark forgery attack that operates without requiring any prior knowledge of or access to the target watermarking algorithm. |
Ziping Dong; Chao Shuai; Zhongjie Ba; Peng Cheng; Zhan Qin; Qinglong Wang; Kui Ren; | code |
| 674 | DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **Dual-Conditional Inversion (DCI)**, a novel framework that jointly conditions on the source prompt and reference image to guide the inversion process. |
Zixiang Li; Haoyu Wang; Wei Wang; Chuangchuang Tan; Yunchao Wei; Yao Zhao; | code |
| 675 | QiMeng-MuPa: Mutual-Supervised Learning for Sequential-to-Parallel Code Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose \textbf{QiMeng-MuPa}, a novel \textbf{Mu}tual-Supervised Learning framework for Sequential-to-\textbf{Pa}rallel code translation, to address the functional equivalence issue. |
Changxin Ke; Rui Zhang; Shuo Wang; Li Ding; Guangli Li; Yuanbo Wen; Shuoming Zhang; Ruiyuan Xu; Jin Qin; Jiaming Guo; Chenxi Wang; Ling Li; Qi Guo; Yunji Chen; | code |
| 676 | Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce PDE (Procedural Depth Evaluation), a new benchmark which enables systematic evaluation of robustness to changes in 3D scene content. |
John Nugent; Siyang Wu; Zeyu Ma; Beining Han; Meenal Parakh; Abhishek Joshi; Lingjie Mei; Alexander Raistrick; Xinyuan Li; Jia Deng; | code |
| 677 | AttentionPredictor: Temporal Patterns Matter for KV Cache Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods often struggle to accurately determine critical tokens as they neglect the *temporal patterns* in attention scores, resulting in a noticeable degradation in LLM performance. To address this challenge, we propose **AttentionPredictor**, which is the **first learning-based method to directly predict attention patterns for KV cache compression and critical token identification**. |
Qingyue Yang; Jie Wang; Xing Li; Zhihai Wang; Chen Chen; Lei Chen; Xianzhi Yu; Wulong Liu; Jianye HAO; Mingxuan Yuan; Bin Li; | code |
| 678 | FAST: Foreground-aware Diffusion with Accelerated Sampling Trajectory for Segmentation-oriented Anomaly Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FAST, a foreground-aware diffusion framework featuring two novel modules: the Anomaly-Informed Accelerated Sampling (AIAS) and the Foreground-Aware Reconstruction Module (FARM). |
Xichen Xu; Yanshu Wang; Jinbao Wang; XiaoNing Lei; Guoyang Xie; GUANNAN JIANG; Zhichao Lu; | code |
| 679 | WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present WeatherPrompt, a multi-modality learning paradigm that establishes weather-invariant representations through fusing the image embedding with the text context. |
Jiahao Wen; Hang Yu; Zhedong Zheng; | code |
| 680 | MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MINGLE, a novel framework for test-time continual model merging (TTCMM). |
Zihuan Qiu; Yi Xu; Chiyuan He; Fanman Meng; Linfeng Xu; Qingbo Wu; Hongliang Li; | code |
| 681 | MATCH: Multi-faceted Adaptive Topo-Consistency for Semi-Supervised Histopathology Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is particularly challenging in histopathology image analysis, where objects are densely distributed. To address this issue, we propose a semi-supervised segmentation framework designed to robustly identify and preserve relevant topological features. |
Meilong Xu; Xiaoling Hu; Shahira Abousamra; Chen Li; Chao Chen; | code |
| 682 | Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LENS, a novel framework for synthesizing preference data directly in the LLM’s latent embedding space. |
Leitian Tao; Xuefeng Du; Sharon Li; | code |
| 683 | NaDRO: Leveraging Dual-Reward Strategies for LLMs Training on Noisy Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it often relies on large-scale, high-quality labeled data, which is typically difficult to obtain. To address this challenge, we introduce Noise-Aware Dual-Reward Optimization (NaDRO), which effectively enhances LLM training in environments where data is noisy or imperfect. |
Haolong Qian; Xianliang Yang; Ling Zhang; Lei Song; Jiang Bian; Chun Yuan; | code |
| 684 | Towards Accurate Time Series Forecasting Via Implicit Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given that real-world time series typically consist of various long short-term dynamics, independent predictions over individual time points may fail to express complex underlying patterns and can lead to a lack of global views. To address these issues, this work explores new perspectives from the forecasting phase and proposes a novel Implicit Forecaster (IF) as an additional decoding module. |
Xinyu Li; Yuchen Luo; Hao Wang; Haoxuan Li; Liuhua Peng; Feng Liu; Yandong Guo; Kun Zhang; Mingming Gong; | code |
| 685 | Learning Without Augmenting: Unsupervised Time Series Representation Learning Via Frame Projections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an unsupervised representation learning method that replaces augmentations by generating views using orthonormal bases and overcomplete frames. |
Berken Utku Demirel; Christian Holz; | code |
| 686 | Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce TIRE (Track, Inpaint, REsplat), a novel method for subject-driven 3D/4D generation. |
Shuhong Zheng; Ashkan Mirzaei; Igor Gilitschenski; | code |
| 687 | From Pixels to Views: Learning Angular-Aware and Physics-Consistent Representations for Light Field Microscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address these challenges by introducing three key contributions. First, we construct the XLFM-Zebrafish benchmark, a large-scale dataset and evaluation suite for XLFM reconstruction. |
Feng He; Guodong Tan; Qiankun Li; Jun Yu; Quan Wen; | code |
| 688 | EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that such inefficiency stems from the sequential execution of layers, which is seemingly natural but actually unnecessary. Therefore, we propose EasySpec, a layer-parallel speculation strategy that optimizes the efficiency of multi-GPU utilization. |
Yize Wu; KE GAO; Ling Li; Yanjun Wu; | code |
| 689 | Preserving LLM Capabilities Through Calibration Data Curation: From Analysis to Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More systematic research is still needed to examine the impacts on different LLM capabilities in terms of compositional properties and domain correspondence of calibration data. In this work, we aim to bridge this gap and further analyze the underlying influencing mechanisms from the activation pattern perspective. |
Bowei He; Lihao Yin; Huiling Zhen; Shuqi LIU; Han Wu; Xiaokun Zhang; Mingxuan Yuan; Chen Ma; | code |
| 690 | FlyLoRA: Boosting Task Decoupling and Parameter Efficiency Via Implicit Rank-Wise Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the fly olfactory circuit, we propose FlyLoRA, an implicit MoE-based LoRA variant that introduces: (1) rank-wise expert activation in the up-projection matrix, and (2) an implicit router that unifies expert routing and down-projection, where a frozen sparse random projection matrix replaces the traditional dense trainable version. |
Heming Zou; Yunliang Zang; Wutong Xu; Yao Zhu; Xiangyang Ji; | code |
| 691 | Not All Data Are Good Labels: On The Self-supervised Labeling for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: During the optimization of a simple reconstruction network, intermediates are used as pseudo labels in a self-supervised paradigm, improving generalization for any predictor. We introduce the Self-Correction with Adaptive Mask (SCAM), which discards overfitted components and selectively replaces them with pseudo labels generated from reconstructions. |
Yuxuan Yang; Dalin Zhang; Yuxuan Liang; Hua Lu; Gang Chen; Huan Li; | code |
| 692 | ReDi: Rectified Discrete Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This reliance on a multi-step process originates from the factorization approximation of DFMs, which is necessary for handling high-dimensional data. In this paper, we analyze the factorization approximation error using Conditional Total Correlation (TC), and reveal its dependence on the coupling. |
Jaehoon Yoo; Wonjung Kim; Seunghoon Hong; | code |
| 693 | Efficient Multimodal Dataset Distillation Via Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce EDGE, a generative distillation method for efficient multimodal dataset distillation. |
Zhenghao Zhao; Haoxuan Wang; Junyi Wu; Yuzhang Shang; Gaowen Liu; Yan Yan; | code |
| 694 | StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks By Style Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, present methods show limited transferability across models, making them less effective against unknown text-to-image models. To address these issues, we propose a novel anti-mimicry method, StyleGuard. |
Yanjie Li; Wenxuan Zhang; Xinqi LYU; Yihao LIU; Bin Xiao; | code |
| 695 | LightFair: Towards An Efficient Alternative for Fair T2I Diffusion Via Debiasing Pre-trained Text Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To finetune the text embedding, we propose a collaborative distance-constrained debiasing strategy that balances embedding distances to improve fairness without auxiliary references. |
Boyu Han; Qianqian Xu; Shilong Bao; Zhiyong Yang; Kangli Zi; Qingming Huang; | code |
| 696 | GoRA: Gradient-driven Adaptive Low Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze and identify the core limitations of existing approaches and propose a novel framework—**GoRA** (**G**radient-driven Adaptive L**o**w **R**ank **A**daptation)—that simultaneously adapts both the rank and initialization strategy within a unified framework. |
haonan he; Peng Ye; Yuchen Ren; yuan yuan; LuyangZhou; ShucunJu; lei chen; | code |
| 697 | Association-Focused Path Aggregation for Graph Fraud Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing graph fraud detectors are limited by their narrow receptive fields, as they focus only on the relations between an entity and its neighbors while neglecting longer-range structural associations hidden between entities. To address this issue, we propose a novel fraud detector based on Graph Path Aggregation (GPA). |
Tian Qiu; Wenda Li; Zunlei Feng; Jie Lei; Tao Wang; Yi Gao; Mingli Song; Yang Gao; | code |
| 698 | Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill this research gap, in this work, we propose the concept of hyper-meta-path for heterogeneous hypergraphs, which is defined as the composition of a sequence of hyper-relations. To facilitate the reproducibility of this work, we provide our dataset as well as anonymized source code at: https://github.com/zhengziyu77/HHNN. |
Yaming Yang; Ziyu Zheng; Weigang Lu; Zhe Wang; Xinyan Huang; Wei Zhao; Ziyu Guan; | code |
| 699 | OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the open-world mobile manipulation (OWMM) task remains a challenge due to the need for generalization to open-ended instructions and environments, as well as the systematic complexity to integrate high-level decision making with low-level robot control based on both global scene understanding and current agent state. To address this complexity, we propose a novel multi-modal agent architecture that maintains multi-view scene frames and agent states for decision-making and controls the robot by function calling. |
Junting Chen; Haotian Liang; Lingxiao Du; Weiyun Wang; Mengkang Hu; Yao Mu; Wenhai Wang; Jifeng Dai; Ping Luo; Wenqi Shao; Lin Shao; | code |
| 700 | ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce ToxicTextCLIP, a framework for generating high-quality adversarial texts that target CLIP during the pre-training phase. |
Xin Yao; Haiyang Zhao; Yimin Chen; Jiawei Guo; Kecheng Huang; Ming Zhao; | code |
| 701 | Towards General Continuous Memory for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We empirically show that this design improves performance on complex multimodal reasoning tasks. Building on this, we introduce a data-efficient and parameter-efficient method to fine-tune the VLM into a memory encoder, requiring only 1.2\% of the model’s parameters and a small corpus of 15.6K self-synthesized samples. |
Wenyi WU; Zixuan Song; Kun Zhou; Yifei Shao; Zhiting Hu; Biwei Huang; | code |
| 702 | Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce \textbf{S}pectrum-Aware \textbf{T}est-Time \textbf{S}teering (\textbf{STS}), a \textit{lightweight adaptation framework} that extracts a spectral subspace from the textual embeddings to define principal semantic directions, and learns to steer latent representations in a spectrum-aware manner by adapting a small number of per-sample shift parameters to minimize entropy across augmented views. |
Konstantinos M. Dafnis; Dimitris N. Metaxas; | code |
| 703 | Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Time-Conditioned Contraction Matching (TCCM), a novel method for semi-supervised anomaly detection in tabular data. |
Zhong Li; Qi Huang; Yuxuan Zhu; Lincen Yang; Mohammad Mohammadi Amiri; Niki van Stein; Matthijs van Leeuwen; | code |
| 704 | Neural MJD: Neural Non-Stationary Merton Jump Diffusion for Time Series Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Neural MJD, a neural network based non-stationary Merton jump diffusion (MJD) model. |
Yuanpei Gao; Qi Yan; Yan Leng; Renjie Liao; | code |
| 705 | EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose EfficientNav to enable on-device efficient LLM-based zero-shot ObjNav. |
Zebin Yang; Sunjian Zheng; Tong Xie; Tianshi Xu; Bo Yu; Fan Wang; Jie Tang; Shaoshan Liu; Meng Li; | code |
| 706 | Grasp2Grasp: Vision-Based Dexterous Grasp Translation Via Schrödinger Bridges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach to vision-based dexterous grasp translation, which aims to transfer grasp intent across robotic hands with differing morphologies. |
Tao Zhong; Jonah Buchanan; Christine Allen-Blanchette; | code |
| 707 | Versatile Transferable Unlearnable Example Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that for broad applicability, UEs should maintain their effectiveness across diverse application scenarios. |
Zhihao Li; Jiale Cai; Gezheng Xu; Hao Zheng; Qiuyue Li; Fan Zhou; Shichun Yang; Charles Ling; Boyu Wang; | code |
| 708 | Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike previous EHR synthesis methods—which typically generate medical records consisting of expert-chosen features (e.g., a few vital signs, structured codes only)—we introduce RawMed, the first framework to synthesize multi-table, time-series EHR data that closely resembles raw EHRs. |
Eunbyeol Cho; Jiyoun Kim; Minjae Lee; Sungjin Park; Edward Choi; | code |
| 709 | TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an efficient solver for transposable N:M masks that scales to billion-parameter models. |
Xiang Meng; Mehdi Makni; Rahul Mazumder; | code |
| 710 | EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we introduce **EnzyBind**, a dataset with 11,100 experimentally validated enzyme-substrate pairs specifically curated from PDBbind. Building on this, we propose **EnzyControl**, a method that enables functional and substrate-specific control in enzyme backbone generation. |
Chao Song; Zhiyuan Liu; Han Huang; Liang Wang; Qiong Wang; Jian-Yu Shi; Hui Yu; Yihang Zhou; Yang Zhang; | code |
| 711 | PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous methods attempt to address this by aggregating spatio-temporal information but face a fundamental trade-off: limited temporal modeling provides only modest gains, whereas capturing long-range dependencies significantly increases computational cost. To address this limitation, we introduce a memory buffer for modeling long-range spatio-temporal consistency while achieving efficient dynamic stereo matching. |
Yun Wang; Qiaole Dong; Yongjian Zhang; Tin Lun Lam; Yanwei Fu; Dapeng Wu; Junjie Hu; | code |
| 712 | HyGen: Efficient LLM Serving Via Elastic Online-Offline Request Co-location Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces HyGen, an interference-aware LLM serving system that enables efficient co-location of online and offline workloads while preserving SLOs. |
Ting Sun; Penghan Wang; Fan Lai; | code |
| 713 | ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose ThermalGen, an adaptive flow-based generative model for RGB-T image translation, incorporating an RGB image conditioning architecture and a style-disentangled mechanism. To support large-scale training, we curated eight public satellite-aerial, aerial, and ground RGB-T paired datasets, and introduced three new large-scale satellite-aerial RGB-T datasets (DJI-day, Bosonplus-day, and Bosonplus-night) captured across diverse times, sensor types, and geographic regions. |
Jiuhong Xiao; Roshan Nayak; Ning Zhang; Daniel Toertei; Giuseppe Loianno; | code |
| 714 | GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **GPSToken**, a novel **G**aussian **P**arameterized **S**patially-adaptive **Token**ization framework, to achieve non-uniform image tokenization by leveraging parametric 2D Gaussians to dynamically model the shape, position, and textures of different image regions. |
Zhengqiang ZHANG; Rongyuan Wu; Lingchen Sun; Lei Zhang; | code |
| 715 | Neural B-frame Video Compression with Bi-directional Reference Harmonization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To optimize reference information utilization, we propose a novel NBVC method, termed Bi-directional Reference Harmonization Video Compression (BRHVC), with the proposed Bi-directional Motion Converge (BMC) and Bi-directional Contextual Fusion (BCF). |
Yuxi Liu; Dengchao Jin; Shuai Huo; Jiawen Gu; Chao Zhou; Huihui Bai; Ming Lu; Zhan Ma; | code |
| 716 | MobileODE: An Extra Lightweight Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method for making CNNs lightweight through the discretization of Ordinary Differential Equations (ODEs). |
Le Yu; Jun Wu; Bo Gou; Xiangde Min; Lei Zhang; Zhang Yi; Tao He; | code |
| 717 | HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we pioneer textual reference-guided human action segmentation in multi-person settings, where a textual description specifies the target person for segmentation. We introduce the first dataset for Referring Human Action Segmentation, i.e., RHAS133, built from 133 movies and annotated with 137 fine-grained actions across 33 hours of video, together with textual descriptions for this new task. |
Kunyu Peng; Junchao Huang; Xiangsheng Huang; Di Wen; Junwei Zheng; Yufan Chen; Kailun Yang; Jiamin Wu; Chongqing Hao; Rainer Stiefelhagen; | code |
| 718 | Preference-driven Knowledge Distillation for Few-shot Node Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) perform well in zero-/few-shot learning on TAGs but suffer from a scalability challenge. Therefore, we propose a preference-driven knowledge distillation (PKD) framework to synergize the complementary strengths of LLMs and various GNNs for few-shot node classification. |
Xing Wei; Chunchun Chen; Rui Fan; Xiaofeng Cao; Sourav Medya; Wei Ye; | code |
| 719 | Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods generally adopt the Invariant Risk Minimization (IRM) framework, requiring costly environment annotations or heuristically generated synthetic splits. To circumvent these limitations, in this work, we aim to develop an IRM-free method for capturing causal subgraphs. |
Yang Qiu; Yixiong Zou; Jun Wang; Wei Liu; Xiangyu Fu; Ruixuan Li; | code |
| 720 | L2RSI: Cross-view LiDAR-based Place Recognition for Large-scale Urban Scenes Via Remote Sensing Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We tackle the challenge of LiDAR-based place recognition, which traditionally depends on costly and time-consuming prior 3D maps. To overcome this, we first construct the LiRSI-XA dataset, which encompasses approximately $110,000$ remote sensing submaps and $13,000$ LiDAR point cloud submaps captured in urban scenes, and propose a novel method, L2RSI, for cross-view LiDAR place recognition using high-resolution Remote Sensing Imagery. |
Ziwei Shi; Xiaoran Zhang; Wenjing Xu; Yan Xia; Yu Zang; Siqi Shen; Cheng Wang; | code |
| 721 | Quantum Visual Fields with Neural Amplitude Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper advances the field by introducing a novel QINR architecture for 2D image and 3D geometric field learning, which we collectively refer to as Quantum Visual Field (QVF). |
Shuteng Wang; Christian Theobalt; Vladislav Golyanik; | code |
| 722 | Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key challenge is the lack of supervision signals from unknown data, leading to overconfident predictions on OOD samples. To address this challenge, we propose Feature Mixing, an extremely simple and fast method for synthesizing multimodal outliers with theoretical support, which can be further optimized to help the model better distinguish between in-distribution (ID) and OOD data. |
Moru Liu; Hao Dong; Jessica Ivy Kelly; Olga Fink; Mario Trapp; | code |
| 723 | Dependency Parsing Is More Parameter-Efficient with Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide theoretical evidence and empirical results revealing that a lack of normalization necessarily results in overparameterized parser models, where the extra parameters compensate for the sharp softmax outputs produced by high variance inputs to the biaffine scoring function. |
Paolo Gajo; Domenic Rosati; Hassan Sajjad; Alberto Barrón-Cedeño; | code |
| 724 | Steering When Necessary: Flexible Steering Large Language Models with Backtracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose the **F**lexible **A**ctivation **S**teering with **B**acktracking (**FASB**) framework, which dynamically determines both the necessity and strength of intervention by tracking the internal states of the LLMs during generation, considering both the question and the generated content. |
Zifeng Cheng; Jinwei Gan; Zhiwei Jiang; Cong Wang; Yafeng Yin; Xiang Luo; Yuchen Fu; Qing Gu; | code |
| 725 | TabDPT: Scaling Tabular Foundation Models on Real Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an approach to combine ICL-based retrieval with self supervised learning to train tabular foundation models. |
Junwei Ma; Valentin Thomas; Rasa Hosseinzadeh; Alex Labach; Jesse C. Cresswell; Keyvan Golestan; Guangwei Yu; Anthony L. Caterini; Maksims Volkovs; | code |
| 726 | Investigating and Mitigating Catastrophic Forgetting in Medical Knowledge Injection Through Internal Knowledge Augmentation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While domain-specific fine-tuning effectively injects medical knowledge into LLMs, it often causes catastrophic forgetting of previously acquired knowledge and instruction-following capabilities. In this paper, we investigate this issue and reveal a pattern of proximity-dependent forgetting: knowledge that is semantically or topically close to the injected content is more likely to be forgotten, while unrelated knowledge shows minimal degradation. |
Yuxuan Zhou; Xien Liu; Xiao Zhang; Chen Ning; Shijin Wang; Guoping Hu; Ji Wu; | code |
| 727 | When Kernels Multiply, Clusters Unify: Fusing Embeddings with The Kronecker Product Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: State-of-the-art embeddings often capture distinct yet complementary discriminative features: For instance, one image embedding model may excel at distinguishing fine-grained textures, while another focuses on object-level structure. Motivated by this observation, we propose a principled approach to fuse such complementary representations through *kernel multiplication*. |
Youqi WU; Jingwei Zhang; Farzan Farnia; | code |
| 728 | EditInfinity: Image Editing with Binary-Quantized Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose \emph{EditInfinity}, which adapts \emph{Infinity}, a binary-quantized generative model, for image editing. |
Jiahuan Wang; Yuxin Chen; Jun Yu; Guangming Lu; Wenjie Pei; | code |
| 729 | Knowledge Distillation Detection for Open-weights Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem is motivated by growing concerns about model provenance and unauthorized replication through distillation. To address this task, we introduce a model-agnostic framework that combines data-free input synthesis and statistical score computation for detecting distillation. |
Qin Shi; Amber Yijia Zheng; Qifan Song; Raymond A. Yeh; | code |
| 730 | Rethinking Scale-Aware Temporal Encoding for Event-based Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a CNN-RNN hybrid framework that rethinks temporal modeling for event-based object detection. |
Lin Zhu; LongTengyu; Xiao Wang; Lizhi Wang; Hua Huang; | code |
| 731 | TrajAgent: An LLM-Agent Framework for Trajectory Modeling Via Large-and-Small Model Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TrajAgent, a agent framework powered by large language models (LLMs), designed to facilitate robust and efficient trajectory modeling through automation modeling. |
Yuwei Du; Jie Feng; Jie Zhao; Yong Li; | code |
| 732 | Measure-Theoretic Anti-Causal Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Anti-Causal Invariant Abstractions (ACIA), a novel measure-theoretic framework for anti-causal representation learning. |
Arman Behnam; Binghui Wang; | code |
| 733 | ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper proposes REMINDRAG, which employs an LLM-guided graph traversal featuring node exploration, node exploitation, and, most notably, memory replay, to improve both system effectiveness and cost efficiency. |
Yikuan Hu; Jifeng Zhu; Lanrui Tang; Chen Huang; | code |
| 734 | Perturb A Model, Not An Image: Towards Robust Privacy Protection Via Anti-Personalized Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first provide a theoretical analysis demonstrating that a naive approach of existing loss functions to diffusion models is inherently incapable of ensuring convergence for robust anti-personalization. Motivated by this finding, we introduce Direct Protective Optimization (DPO), a novel loss function that effectively disrupts subject personalization in the target model without compromising generative quality. |
Tae-Young Lee; Juwon Seo; Jong Hwan Ko; Gyeong-Moon Park; | code |
| 735 | Watermarking Autoregressive Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite significant interest in autoregressive image generation models and their potential for misuse, no prior work has attempted to watermark their outputs at the token level. In this work, we present the first such approach by adapting language model watermarking techniques to this setting. |
Nikola Jovanović; Ismail Labiad; Tomas Soucek; Martin Vechev; Pierre Fernandez; | code |
| 736 | CATransformers: Carbon Aware Transformers Through Joint Model-Hardware Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CATransformers, the first carbon-aware co-optimization framework for Transformer-based models and hardware accelerators. |
Irene Wang; Mostafa Elhoushi; H Ekin Sumbul; Samuel Hsia; Daniel Jiang; Newsha Ardalani; Divya Mahajan; Carole-Jean Wu; Bilge Acun; | code |
| 737 | RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Predicting human mobility is inherently challenging due to complex long-range dependencies and multi-scale periodic behaviors. To address this, we introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework that leverages large language models (LLMs) as general-purpose spatio-temporal predictors and trajectory reasoners. |
Haoyu He; Haozheng Luo; Yan Chen; Qi Wang; | code |
| 738 | Bridging Scales: Spectral Theory Reveals How Local Connectivity Rules Sculpt Global Neural Dynamics in Spatially Extended Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By mechanistically linking neural structure to dynamics, this work advances a principled framework for dissecting how large-scale activity patterns—central to cognition and open questions in consciousness research—arise from, and constrain, local circuitry. |
Yuhan Huang; Keren Gao; Dongping Yang; Sen Song; Guozhang Chen; | code |
| 739 | Flash Invariant Point Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FlashIPA, a factorized reformulation of IPA that leverages hardware-efficient FlashAttention to achieve linear scaling in GPU memory and wall-clock time with sequence length. |
Andrew Liu; Axel Elaldi; Nicholas T Franklin; Nathan Russell; Gurinder S. Atwal; Yih-En Andrew Ban; Olivia Viessmann; | code |
| 740 | Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We notice that humans tend to filter information at the object level prior to decision-making, facilitating efficient skill transfer across different contexts. Inspired by this, we introduce Focus-Then-Reuse (FTR), a method utilizing a novel object selection mechanism to focus on task-relevant objects, and directly reuse the simulation-trained policy on them. |
Jiahui Wang; Chao Chen; Jiacheng Xu; Zongzhang Zhang; Yang Yu; | code |
| 741 | FedRAM: Federated Reweighting and Aggregation for Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FedRAM, a three-step framework that progressively updates two scalar hyperparameters: the task importance weight and the client aggregation coefficient. |
Fan Wu; Xinyu Yan; Jiabei Liu; Wei Yang Bryan Lim; | code |
| 742 | Knowledge Starts with Practice: Knowledge-Aware Exercise Generative Recommendation with Adaptive Multi-Agent Cooperation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve KEGR, we propose an adaptive multi-agent cooperation framework, called ExeGen, inspired by the excellent reasoning and generative capabilities of LLM-based AI agents. |
Yangtao Zhou; Hua Chu; Yongxiang Chen; Ziwen Wang; Jiacheng Liu; Jianan Li; Yueying Feng; Xiangming Li; Zihan Han; Qingshan Li; | code |
| 743 | Two‑Stage Learning of Stabilizing Neural Controllers Via Zubov Sampling and Iterative Domain Expansion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel two-stage training framework to jointly synthesize a controller a Lyapunov function for continuous-time systems. |
Haoyu Li; Xiangru Zhong; Bin Hu; Huan Zhang; | code |
| 744 | ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. |
Tonghe Zhang; Chao Yu; Sichang Su; Yu Wang; | code |
| 745 | SpiderSolver: A Geometry-Aware Transformer for Solving PDEs on Complex Geometries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose SpiderSolver, a geometry-aware transformer that introduces spiderweb tokenization for handling complex domain geometry and irregularly discretized points. |
Kai Qi; Fan Wang; Zhewen Dong; Jian Sun; | code |
| 746 | Unleashing Foundation Vision Models: Adaptive Transfer for Diverse Data-Limited Scientific Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Cluster Attention Adapter (CLAdapter), which refines and adapts the rich representations learned from large-scale data to various data-limited downstream tasks. |
Qiankun Li; Feng He; Huabao Chen; Xin Ning; Kun Wang; Zengfu Wang; | code |
| 747 | TopoPoint: Enhance Topology Reasoning Via Endpoint Detection in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods often suffer from lane endpoints deviation, leading to incorrect topology construction. To address this issue, we propose TopoPoint, a novel framework that explicitly detects lane endpoints and jointly reasons over endpoints and lanes for robust topology reasoning. |
Yanping Fu; Xinyuan Liu; Tianyu Li; Yike Ma; Yucheng Zhang; Feng Dai; | code |
| 748 | BMW: Bidirectionally Memory Bank ReWriting for Unsupervised Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Rewriting memory banks with partial constraint limits their discrimination capacities, and hence hinders learning discriminative features based on those memory banks. In this paper, we claim that memory banks should be rewritten with both intra-class and inter-class constraints, and therefore propose a unified memory bank rewriting mechanism, Bidirectionally Memory bank reWriting (BMW), to chase enhanced discrimination capacity. |
Xiaobin Liu; Jianing Li; Baiwei Guo; WenbinZhu; Jing Yuan; | code |
| 749 | Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems Via Test-Time Projection Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This degradation arises from the distributional shift between training and testing data, rendering policies learned on small instances ineffective for larger problems. To overcome this limitation, we introduce a novel learning framework driven by Large Language Models (LLMs). |
Yuanyao Chen; Rongsheng Chen; Fu Luo; Zhenkun Wang; | code |
| 750 | SNEAKDOOR: Stealthy Backdoor Attacks Against Distribution Matching-based Dataset Condensation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While prior approaches have made progress in balancing attack success rate and clean test accuracy, they often fall short in preserving stealthiness, especially in concealing the visual artifacts of condensed data or the perturbations introduced during inference. To address this challenge, we introduce \textsc{Sneakdoor}, which enhances stealthiness without compromising attack effectiveness. |
He Yang; Dongyi Lv; Song Ma; Wei Xi; Jizhong Zhao; | code |
| 751 | Long-Tailed Recognition Via Information-Preservable Two-Stage Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The imbalance (or long-tail) is the nature of many real-world data distributions, which often induces the undesirable bias of deep classification models toward frequent classes, resulting in poor performance for tail classes. In this paper, we propose a novel two-stage learning approach to mitigate such a majority-biased tendency while preserving valuable information within datasets. |
Fudong Lin; Xu Yuan; | code |
| 752 | Fading to Grow: Growing Preference Ratios Via Preference Fading Discrete Diffusion for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, building upon recent advances in discrete diffusion, we propose \textbf{PreferGrow}, a discrete diffusion-based recommender modeling preference ratios by fading and growing user preferences over the discrete item corpus. |
Guoqing Hu; An Zhang; Shuchang Liu; Wenyu Mao; Jiancan Wu; Xun Yang; Xiang Li; Lantao Hu; Han Li; Kun Gai; Xiang Wang; | code |
| 753 | Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a transfer-based framework for dexterous grasp generation, leveraging a conditional diffusion model to transfer high-quality grasps from shape templates to novel objects within the same category. |
Yiyao Ma; Kai Chen; Kexin ZHENG; Qi Dou; | code |
| 754 | UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose **LumosBench**, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. |
Pengwei Liu; Hangjie Yuan; Bo Dong; Jiazheng Xing; Jinwang Wang; Rui Zhao; Weihua Chen; Fan Wang; | code |
| 755 | Text to Sketch Generation with Multi-Styles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a training-free framework based on diffusion models that enables explicit style guidance via textual prompts and referenced style sketches. |
Tengjie Li; Shikui Tu; Lei Xu; | code |
| 756 | LayerNavigator: Finding Promising Intervention Layers for Efficient Activation Steering in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Multi-layer steering promises stronger control but faces a combinatorial explosion of possible layer subsets, making exhaustive search impractical. To address these challenges, we propose LayerNavigator, which provides a principled and promising layer selection strategy. |
Hao Sun; Huailiang Peng; Qiong Dai; Xu Bai; Yanan Cao; | code |
| 757 | Precise Diffusion Inversion: Towards Novel Samples and Few-Step Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ***PreciseInv***, a general-purpose test-time optimization framework that enables fast and faithful inversion in as few as two inference steps. |
Jing Zuo; Luoping Cui; Chuang Zhu; Yonggang Qi; | code |
| 758 | MVSMamba: Multi-View Stereo with State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fully exploit Mamba’s potential in MVS, we propose a Dynamic Mamba module (DM-module) based on a novel reference-centered dynamic scanning strategy, which enables: (1) Efficient intra- and inter-view feature interaction from the reference to source views, (2) Omnidirectional multi-view feature representations, and (3) Multi-scale global feature aggregation. |
Jianfei Jiang; Qiankun Liu; Hongyuan Liu; Haochen Yu; Liyong Wang; Jiansheng Chen; Huimin Ma; | code |
| 759 | Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present **Loquetier**, a virtualized multi-LoRA framework that seamlessly integrates LoRA fine-tuning and serving within a single runtime. |
Yuchen Zhang; Hanyue Du; Chun Cao; Jingwei Xu; | code |
| 760 | Overcoming Long Context Limitations of State Space Models Via Context Dependent Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on analyzing and improving the long-context modeling capabilities of SSMs. |
Zhihao Zhan; Jianan Zhao; Zhaocheng Zhu; Jian Tang; | code |
| 761 | Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce \textbf{Drag-and-Drop LLMs (\textit{DnD})}, a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. |
Zhiyuan Liang; Dongwen Tang; Yuhao Zhou; Xuanlei Zhao; Mingjia Shi; Wangbo Zhao; Zekai Li; Peihao Wang; Konstantin Schürholt; Damian Borth; Michael M. Bronstein; Yang You; Zhangyang Wang; Kai Wang; | code |
| 762 | Learning Dense Hand Contact Estimation from Imbalanced Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, hand contact datasets contain spatial imbalance issue with most of hand contact exhibited in finger tips, resulting in challenges for generalization towards contacts in other hand regions. To tackle these issues, we present a framework that learns dense HAnd COntact estimation (HACO) from imbalanced data. |
Daniel Sungho Jung; Kyoung Mu Lee; | code |
| 763 | Efficient Last-Iterate Convergence in Solving Extensive-Form Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper show that CFR$^+$, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. |
Linjian Meng; Tianpei Yang; Youzhi Zhang; Zhenxing Ge; Shangdong Yang; Tianyu Ding; Wenbin Li; Bo An; Yang Gao; | code |
| 764 | Last-Iterate Convergence of Smooth Regret Matching$^+$ Variants in Learning Nash Equilibria Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To provide last-iterate convergence for RM$^+$ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM$^+$ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. |
Linjian Meng; Youzhi Zhang; Zhenxing Ge; Tianyu Ding; Shangdong Yang; Zheng Xu; Wenbin Li; Yang Gao; | code |
| 765 | Consistent Story Generation: Unlocking The Potential of Zigzag Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel training-free sampling strategy called Zigzag Sampling with Asymmetric Prompts and Visual Sharing to enhance subject consistency in visual story generation. |
Mingxiao Li; Mang Ning; Marie-Francine Moens; | code |
| 766 | Extracting Task-relevant Preserved Dynamics from Contrastive Aligned Neural Recordings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce $\underline{\text{C}}$ontrastive $\underline{\text{A}}$ligned $\underline{\text{N}}$eural $\underline{\text{D}}$$\underline{\text{Y}}$namics (CANDY), an end‑to‑end framework that aligns neural and behavioral data using rank-based contrastive learning, adapted for continuous behavioral variables, to project neural activity from different sessions onto a shared low-dimensional embedding space. |
Yiqi Jiang; Kaiwen Sheng; Yujia Gao; E. Kelly Buchanan; Yu Shikano; Seung Je Woo; Yixiu Zhao; Tony Hyun Kim; Fatih Dinc; Scott Linderman; Mark Schnitzer; | code |
| 767 | Beyond Oracle: Verifier-Supervision for Instruction Hierarchy in Reasoning and Instruction-Tuned LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a unified supervision framework that embeds programmatically verifiable checkers into synthesized instruction-conflict instances. |
Sian-Yao Huang; Li-Hsien Chang; Che-Yu Lin; Cheng-Lin Yang; | code |
| 768 | Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this regime, diffusion models fail to satisfy the Fokker–Planck equation, which governs the evolution of the score. We interpret this deviation as the source of the observed inconsistencies and propose an energy-based diffusion model with a Fokker–Planck-derived regularization term to enforce consistency. |
Michael Plainer; Hao Wu; Leon Klein; Stephan Günnemann; Frank Noe; | code |
| 769 | ReservoirTTA: Prolonged Test-time Adaptation for Evolving and Recurring Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces **ReservoirTTA**, a novel plug–in framework designed for prolonged test–time adaptation (TTA) in scenarios where the test domain continuously shifts over time, including cases where domains recur or evolve gradually. |
Guillaume Vray; Devavrat Tomar; Xufeng Gao; Jean-Philippe Thiran; Evan Shelhamer; Behzad Bozorgtabar; | code |
| 770 | REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Region Encoder Network (REN), a fast and effective model for generating region-based image representations using point prompts. |
Savya Khosla; Sethuraman T V; Barnett Lee; Alex Schwing; Derek Hoiem; | code |
| 771 | Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we advocate for moving to layer-level sparsity to overcome the accuracy trade-off in sparse layer approximation. |
James Oldfield; Shawn Im; Sharon Li; Mihalis Nicolaou; Ioannis Patras; Grigorios Chrysos; | code |
| 772 | Dual Alignment Framework for Few-shot Learning with Inter-Set and Intra-Set Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Dual Support Query Shift (DSQS), a novel challenge in FSL that integrates two key issues: inter-set shifts (between support and query sets) and intra-set shifts (within each set), which significantly hinder model performance. |
Siyang Jiang; Rui Fang; Hsi-Wen Chen; Wei Ding; Guoliang Xing; Ming-syan Chen; | code |
| 773 | MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing Reinforcement Learning (RL)-based multi-task methods can only train light decoder models on small-scale problems, exhibiting limited generalization ability when solving large-scale problems. To overcome this limitation, this work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD), which enables efficient training of heavy decoder models with strong generalization ability. |
Yuepeng Zheng; Fu Luo; Zhenkun Wang; Yaoxin Wu; Yu Zhou; | code |
| 774 | An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing solutions require access to the training pipelines, data or prior knowledge of the proportions of anomalies in the data, limiting their real-world applicability. To address this challenge, we propose EPHAD, a simple yet effective test-time adaptation framework that updates the outputs of AD models trained on contaminated datasets using evidence gathered at test time. |
Sukanya Patra; Souhaib Ben Taieb; | code |
| 775 | RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose RespoDiff, a novel framework for responsible text-to-image generation that incorporates a dual-module transformation on the intermediate bottleneck representations of diffusion models. |
Silpa Vadakkeeveetil Sreelatha; Sauradip Nag; Muhammad Awais; Serge Belongie; Anjan Dutta; | code |
| 776 | ATLAS: Autoformalizing Theorems Through Lifting, Augmentation, and Synthesis of Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements.Running the proposed ATLAS framework for 10 iterations, we construct an undergraduate-level dataset of 117k theorem statements and develop the ATLAS Translator by fine-tuning Llama3.1-8B-Instruct with LoRA. |
Xiaoyang Liu; Kangjie Bao; Jiashuo Zhang; Yunqi Liu; Yu Chen; Yuntian Liu; Yang Jiao; Tao Luo; | code |
| 777 | Uncertainty-aware Preference Alignment for Diffusion Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, since preference data is practically collected from populations with different backgrounds, a key challenge lies in handling the inherent uncertainties in people’s preferences during policy updates. To address this challenge, we propose the Diff-UAPA algorithm, designed for uncertainty-aware preference alignment in diffusion policies. |
Runqing Miao; Sheng Xu; Runyi Zhao; Wai Kin Victor Chan; Guiliang Liu; | code |
| 778 | Novel View Synthesis from A Few Glimpses Via Test-Time Natural Video Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: That’s the lens we take on sparse-input novel view synthesis, not only as filling spatial gaps between widely spaced views, but also as completing a natural video unfolding through space. |
Yan Xu; Yixing Wang; Stella X. Yu; | code |
| 779 | Neural Atlas Graphs for Dynamic Scene Decomposition and Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Neural Atlas Graphs (NAGs), a hybrid high-resolution scene representation, where every graph node is a view-dependent neural atlas, facilitating both 2D appearance editing and 3D ordering and positioning of scene elements. |
Jan Philipp Schneider; Pratik Singh Bisht; Ilya Chugunov; Andreas Kolb; Michael Moeller; Felix Heide; | code |
| 780 | Learning Differential Pyramid Representation for Tone Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the Differential Pyramid Representation Network (DPRNet), an end-to-end framework for high-fidelity tone mapping. |
Qirui Yang; Yinbo Li; Yihao Liu; Peng-Tao Jiang; zhangfangpu; cheng qihua; Huanjing Yue; Jingyu Yang; | code |
| 781 | RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RAGRouter, a RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts and enable informed routing decisions. |
Jiarui Zhang; Xiangyu Liu; Yong Hu; Chaoyue Niu; Fan Wu; Guihai Chen; | code |
| 782 | Revealing Multimodal Causality with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even with the advent of multimodal LLMs (MLLMs), their efficacy in multimodal CD is hindered by two primary limitations: (1) difficulty in exploring intra- and inter-modal interactions for comprehensive causal variable identification; and (2) insufficiency to handle structural ambiguities with purely observational data. To address these challenges, we propose MLLM-CD, a novel framework for multimodal causal discovery from unstructured data. |
Jin Li; Shoujin Wang; Qi Zhang; Feng Liu; Tongliang Liu; Longbing Cao; Shui Yu; Fang Chen; | code |
| 783 | 3D-GSRD: 3D Molecular Graph Auto-Encoder with Selective Re-mask Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, extending the success of re-mask decoding from 2D to 3D MGM is non-trivial, primarily due to two conflicting challenges: avoiding 2D structure leakage to the decoder, while still providing sufficient 2D context for reconstructing re-masked atoms. To address these challenges, we propose 3D-GSRD: a 3D Molecular Graph Auto-Encoder with Selective Re-mask Decoding. |
Chang Wu; Zhiyuan Liu; Wen Shu; Liang Wang; Yanchen Luo; Wenqiang Lei; Yatao Bian; Junfeng Fang; Xiang Wang; | code |
| 784 | DNA-DetectLLM: Unveiling AI-Generated Text Via A DNA-Inspired Mutation-Repair Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent advances in generative language modeling have resulted in significant overlap between the feature distributions of human-written and AI-generated text, blurring classification boundaries and making accurate detection increasingly challenging. To address the above challenges, we propose a DNA-inspired perspective, leveraging a repair-based process to directly and interpretably capture the intrinsic differences between human-written and AI-generated text. |
Xiaowei Zhu; Yubing Ren; Fang Fang; Qingfeng Tan; Shi Wang; Yanan Cao; | code |
| 785 | Beyond Pairwise Connections: Extracting High-Order Functional Brain Network Structures Under Global Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Functional brain network (FBN) modeling often relies on local pairwise interactions, whose limitation in capturing high-order dependencies is theoretically analyzed in this paper. |
Ling Zhan; Junjie Huang; Xiaoyao Yu; Wenyu Chen; Tao Jia; | code |
| 786 | A Latent Multilayer Graphical Model For Complex, Interdependent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new estimation method, called multilayer sparse + low-rank inverse covariance estimation (multiSLICE), which estimates the interlayer edges. |
Martin Ondrus; Ivor Cribben; Yang Feng; | code |
| 787 | Flexible MOF Generation with Torsion-Aware Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this limits their ability to (1) design novel MOFs and (2) generate the structure using novel building blocks. We propose a two-stage MOF generation framework that overcomes these limitations by modeling both chemical and geometric degrees of freedom. |
Nayoung Kim; Seongsu Kim; Sungsoo Ahn; | code |
| 788 | Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in a form of bias affecting training data, which typically leads to unrecoverable weak generalization in prediction. This paper addresses this problem by leveraging bias amplification with generated synthetic data only: we introduce Diffusing DeBias (DDB), a novel approach acting as a plug-in for common methods of unsupervised model debiasing, exploiting the inherent bias-learning tendency of diffusion models in data generation. |
Massimiliano Ciranni; Vito Paolo Pastore; Roberto Di Via; Enzo Tartaglione; Francesca Odone; Vittorio Murino; | code |
| 789 | CAGE: Continuity-Aware EdGE Network Unlocks Robust Floorplan Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CAGE (Continuity-Aware edGE) network, a robust framework for reconstructing vector floorplans directly from point-cloud density maps. |
Yiyi Liu; Chunyang Liu; Bohan Wang; Weiqin Jiao; Bojian Wu; Lubin Fan; Yuwei Chen; Fashuai Li; Biao Xiong; | code |
| 790 | LBMKGC: Large Model-Driven Balanced Multimodal Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel **L**arge model-driven **B**alanced **M**ultimodal **K**nowledge **G**raph **C**ompletion framework, termed LBMKGC. |
Yuan Guo; Qian Ma; Hui Li; Qiao Ning; Furui Zhan; Yu Gu; Ge Yu; Shikai Guo; | code |
| 791 | G-Net: A Provably Easy Construction of High-Accuracy Random Binary Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel randomized algorithm for constructing binary neural networks with tunable accuracy. |
Alireza Aghasi; Nicholas F. Marshall; Saeid Pourmand; Wyatt D. Whiting; | code |
| 792 | Continuous Simplicial Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce continuous simplicial neural network (COSIMO), a novel SNN architecture derived from PDEs on simplicial complexes. |
Aref Einizade; Dorina Thanou; Fragkiskos D. Malliaros; Jhony H. Giraldo; | code |
| 793 | RAG4GFM: Bridging Knowledge Gaps in Graph Foundation Models Through Graph Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RAG4GFM, an end-to-end framework that seamlessly integrates multi-level graph indexing, task-aware retrieval, and graph fusion enhancement. |
Xingliang Wang; Zemin Liu; Junxiao Han; Shuiguang Deng; | code |
| 794 | REDOUBT: Duo Safety Validation for Autonomous Vehicle Motion Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce REDOUBT, the first systematic safety validation framework for autonomous vehicle motion planning that employs a duo mechanism, simultaneously inspecting input distributions and output uncertainty. |
Shuguang Wang; Qian Zhou; Kui Wu; Dapeng Wu; Wei-Bin Lee; Jianping Wang; | code |
| 795 | Learning to Rank for In-Context Example Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, in this paper, we propose a novel algorithm that trains a retrieval model by ranking formulation, where the preference rankings between ICEs are given by comparing the likelihood of the LLM generating the correct answer conditioned on each exemplar. |
Yuwen Ji; Luodan Zhang; Ambyerhan; Haoran Que; Lei Shi; Wang Chao; Yue Zhang; | code |
| 796 | Learning to Watermark: A Selective Watermarking Framework for Large Language Models Via Multi-Objective Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing watermarking techniques often face trade-off between watermark detectability and generated text quality. In this paper, we introduce Learning to Watermark (LTW), a novel selective watermarking framework that leverages multi-objective optimization to effectively balance these competing goals. |
Chenrui Wang; Junyi Shu; Billy Chiu; YU LI; Saleh Alharbi; Min Zhang; Jing Li; | code |
| 797 | Color Conditional Generation with Sliced Wasserstein Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SW-Guidance, a training-free approach for image generation conditioned on the color distribution of a reference image. |
Alexander Lobashev; Maria Larchenko; Dmitry Guskov; | code |
| 798 | PolyPose: Deformable 2D/3D Registration Via Polyrigid Transformations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To integrate volumetric guidance into intraoperative procedures, we present PolyPose, a simple and robust method for deformable 2D/3D registration. |
Vivek Gopalakrishnan; Neel Dey; Polina Golland; | code |
| 799 | Solving Partial Differential Equations Via Radon Neural Operator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Within the sinogram domain, we further evidence that different angles contribute unequally to the overall space, thus engineering a reweighting technique to enable more effective PDE solutions. |
Wenbin Lu; Yihan Chen; Junnan Xu; Wei Li; Junwei Zhu; Jianwei Zheng; | code |
| 800 | Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while Large Language Models (LLMs) are a promising tool for this, they often produce flawed or infeasible results due to errors and hallucinations. To address this issue, we propose Solver-Informed Reinforcement Learning (SIRL), a framework that uses Reinforcement Learning with Verifiable Reward to improve LLMs’ ability to generate accurate and executable optimization models. |
Yitian Chen; Jingfan Xia; Siyu Shao; Dongdong Ge; Yinyu Ye; | code |
| 801 | GraSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose **GraSS**, a novel gradient compression algorithm and its variants **FactGraSS** for linear layers specifically, that explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. |
Pingbang Hu; Joseph Melkonian; Weijing Tang; Han Zhao; Jiaqi W. Ma; | code |
| 802 | Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present WEDGE, a framework for generating performance-stressing input given the program under test. |
Jun Yang; Cheng-Chi Wang; Bogdan Alexandru Stoica; Kexin Pei; | code |
| 803 | SparseDiT: Token Sparsification for Efficient Diffusion Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SparseDiT, a novel framework that implements token sparsification across spatial and temporal dimensions to enhance computational efficiency while preserving generative quality. |
Shuning Chang; Pichao WANG; Jiasheng Tang; Fan Wang; Yi Yang; | code |
| 804 | DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DeepASA, a multi-purpose model for auditory scene analysis that performs multi-input multi-output (MIMO) source separation, dereverberation, sound event detection (SED), audio classification, and direction-of-arrival estimation (DoAE) within a unified framework. |
Dongheon Lee; Younghoo Kwon; Jung-Woo Choi; | code |
| 805 | Gains: Fine-grained Federated Domain Adaptation in Open Set Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a fine-grained federated domain adaptation approach in open set (Gains). |
Zhengyi Zhong; Wenzheng Jiang; Weidong Bao; Ji Wang; Cheems Wang; Guanbo Wang; Yongheng Deng; Ju Ren; | code |
| 806 | HairFree: Compositional 2D Head Prior for Text-Driven 360° Bald Texture Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HairFree, an unsupervised texturing framework guided by textual descriptions and 2D diffusion priors, producing high-consistency 360° bald head textures—including non-human skin with fine details—without any texture, back-view, bald, non-human, or synthetic training data. |
Mirela Ostrek; Michael J. Black; Justus Thies; | code |
| 807 | Universal Few-shot Spatial Control for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing control adapters exhibit limited adaptability and incur high training costs when encountering novel spatial control conditions that differ substantially from the training tasks. To address this limitation, we propose Universal Few-Shot Control (UFC), a versatile few-shot control adapter capable of generalizing to novel spatial conditions. |
Kiet T Nguyen; Chanhyuk Lee; Donggyun Kim; Dong Hoon Lee; Seunghoon Hong; | code |
| 808 | Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other extreme, trajectory-level methods (e.g., GRPO) solely rely on a coarse-grained advantage signal from the final reward, leading to imprecise credit assignment. To address these limitations, we propose Segment Policy Optimization (SPO), a novel RL framework that leverages segment-level advantage estimation at an intermediate granularity, achieving a better balance by offering more precise credit assignment than trajectory-level methods and requiring fewer estimation points than token-level methods, enabling accurate advantage estimation based on Monte Carlo (MC) without a critic model. |
Yiran Guo; Lijie Xu; Jie Liu; Ye Dan; Shuang Qiu; | code |
| 809 | $\texttt{STRCMP}$: Integrating Graph Structural Priors with Language Models for Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by human experts’ success in leveraging CO structures for algorithm design, we propose $\texttt{STRCMP}$, a novel structure-aware LLM-based algorithm discovery framework that systematically integrates structure priors to enhance solution quality and solving efficiency. |
Xijun Li; Jiexiang Yang; Jinghao Wang; Bo Peng; Jianguo Yao; Haibing Guan; | code |
| 810 | Exploring Structural Degradation in Dense Representations for Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we observe a counterintuitive phenomenon in self-supervised learning (SSL): longer training may impair the performance of dense prediction tasks (e.g., semantic segmentation). |
Siran Dai; Qianqian Xu; Peisong Wen; Yang Liu; Qingming Huang; | code |
| 811 | Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the problem of generating executable STL plans for systems with unknown dynamics. |
Ruijia Liu; Ancheng Hou; Xiao Yu; Xiang Yin; | code |
| 812 | MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary, enabling cross-modal learning through large-scale pretraining on reliably computed fingerprint–molecule datasets.We provide the data and code at [https://github.com/OpenDFM/MS-BART](https://github.com/OpenDFM/MS-BART). |
Yang Han; Pengyu Wang; Kai Yu; xin chen; Lu Chen; | code |
| 813 | Defending Multimodal Backdoored Models By Repulsive Visual Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, while they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns. In this paper, we reveal that CLIP’s vulnerabilities primarily stem from its tendency to encode features beyond in-dataset predictive patterns, compromising its visual feature resistivity to input perturbations. |
Zhifang Zhang; Shuo He; Haobo Wang; Bingquan Shen; Lei Feng; | code |
| 814 | Role-aware Multi-agent Reinforcement Learning for Coordinated Emergency Traffic Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing models primarily only focus on traffic light control, leaving emergency and regular vehicles prone to delay due to the lack of navigation strategies. To address this issue, we propose the ***R*ole-aware *M*ulti-agent *T*raffic *C*ontrol (RMTC)** framework, which dynamically assigns appropriate roles to traffic components for better cooperation by considering their relations with emergency vehicles and adaptively adjusting their policies. |
Ming Cheng; Hao Chen; Zhiqing Li; Jia Wang; Senzhang Wang; | code |
| 815 | MIND: Material Interface Generation from UDFs for Non-Manifold Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, local SDFs are inherently incapable of representing non-manifold geometry, leading to complete failure in such cases. To address this gap, we propose MIND ($\underline{M}aterial$ $\underline{I}nterface$ $from$ $\underline{N}on$-$manifold$ $\underline{D}istance$ $fields$), a novel algorithm for generating material interfaces directly from UDFs, enabling non-manifold mesh extraction from a global perspective. |
Xuhui Chen; Fei Hou; Wencheng Wang; Hong Qin; Ying He; | code |
| 816 | Adaptive Discretization for Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing CMs rely on manually designed discretization schemes, which can cause repeated adjustments for different noise schedules and datasets. To address this, we propose a unified framework for the automatic and adaptive discretization of CMs, formulating it as an optimization problem with respect to the discretization step. |
Jiayu Bai; Zhanbo Feng; Zhijie Deng; TianQi Hou; Robert C Qiu; Zenan Ling; | code |
| 817 | OCTDiff: Bridged Diffusion Model for Portable OCT Super-Resolution and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose OCTDiff, a bridged diffusion model designed to enhance image resolution and quality from portable OCT devices. |
Ye Tian; Angela McCarthy; Gabriel Gomide; Nancy Liddle; Jedrzej Golebka; Royce Chen; Jeff Liebmann; Kaveri A. Thakoor; | code |
| 818 | FHGS: Feature-Homogenized Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the limitation, we proposes FHGS (Feature-Homogenized Gaussian Splatting), a novel 3D feature distillation framework inspired by physical models, which freezes and distills 2D pre-trained features into 3D representations while preserving the real-time rendering efficiency of 3DGS. |
Q.G. Duan; Benyun ZHAO; Mingqiao Han; Yijun Huang; Ben M. Chen; | code |
| 819 | Learning to Cluster Neuronal Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce DECEMber — Deep Embedding Clustering via Expectation Maximization-based refinement — an explicit inductive bias into predictive models that enhances clustering by adding an auxiliary $t$-distribution-inspired loss function that enforces structured organization among per-neuron embeddings. |
Nina S. Nellen; Polina Turishcheva; Michaela Vystrčilová; Shashwat Sridhar; Tim Gollisch; Andreas S. Tolias; Alexander S. Ecker; | code |
| 820 | Ditch The Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a fully self-supervised framework that enables noise-robust representation learning without requiring a denoiser at inference or downstream fine-tuning. |
Wenquan Lu; Jiaqi Zhang; Hugues Van Assel; Randall Balestriero; | code |
| 821 | Activation-Informed Merging of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Activation-Informed Merging (AIM), a technique that integrates the information from the activation space of LLMs into the merging process to improve performance and robustness. |
Amin Heyrani Nobari; Kaveh Alim; Ali ArjomandBigdeli; Akash Srivastava; Faez Ahmed; Navid Azizan; | code |
| 822 | ALMGuard: Safety Shortcuts and Where to Find Them As Guardrails for Audio–Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous studies have proposed jailbreak attacks that specifically target ALMs, revealing that defenses directly transferred from traditional audio adversarial attacks or text-based Large Language Model (LLM) jailbreaks are largely ineffective against these ALM-specific threats. To address this issue, we propose ALMGuard, the first defense framework tailored to ALMs. |
Weifei Jin; Yuxin Cao; Junjie Su; Jason Xue; Jie Hao; Ke Xu; Jin Song Dong; Derui Wang; | code |
| 823 | Attack Via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate that LLMs can be jailbroken by fine-tuning with only 10 benign QA pairs; our attack exploits the increased sensitivity of LLMs to fine-tuning data after being overfitted. |
Zhixin Xie; Xurui Song; Jun Luo; | code |
| 824 | No Object Is An Island: Enhancing 3D Semantic Segmentation Generalization with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel cross-modal learning framework based on diffusion models to enhance the generalization of 3D semantic segmentation, named XDiff3D. |
Fan Li; Xuan Wang; Xuanbin Wang; Zhaoxiang Zhang; Yuelei Xu; | code |
| 825 | Pragmatic Heterogeneous Collaborative Perception Via Generative Communication Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches based on adaptation and reconstruction fail to support *pragmatic heterogeneous collaboration* due to two key limitations: (1) Intrusive retraining of the encoder or core modules disrupts the established semantic consistency among agents; and (2) accommodating new agents incurs high computational costs, limiting scalability. To address these challenges, we present a novel **Gen**erative **Comm**unication mechanism (GenComm) that facilitates seamless perception across heterogeneous multi-agent systems through feature generation, without altering the original network, and employs lightweight numerical alignment of spatial information to efficiently integrate new agents at minimal cost. |
Junfei Zhou; Penglin Dai; Quanmin Wei; Bingyi Liu; Xiao Wu; Jianping Wang; | code |
| 826 | LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce $\textbf{LUNA}$ ($\textbf{L}$atent $\textbf{U}$nified $\textbf{N}$etwork $\textbf{A}$rchitecture), a self-supervised foundation model that reconciles disparate electrode geometries while scaling linearly—not quadratically—with channel count. |
Berkay Döner; Thorir Mar Ingolfsson; Luca Benini; Yawei Li; | code |
| 827 | SpaceServe: Spatial Multiplexing of Complementary Encoders and Decoders for Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the encoder is compute- intensive but memory-light, the decoder is the opposite, yet state-of-the-art serving stacks still time-multiplex these complementary kernels, idling SMs or HBM in turn. We introduce SpaceServe, a serving system that space-multiplexes MLLMs: it decouples all modality encoders from the decoder, and co-locates them on the same GPU using fine-grained SM partitioning available in modern runtimes. |
zhicheng li; Shuoming Zhang; Jiacheng Zhao; Siqi Li; Xiyu Shi; Yangyu Zhang; Shuaijiang Li; Donglin Yu; Zheming Yang; YUAN WEN; Huimin Cui; | code |
| 828 | SteerConf: Steering LLMs for Confidence Elicitation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SteerConf, a novel framework that systematically steers LLMs’ confidence scores to improve their calibration and reliability. |
Ziang Zhou; Tianyuan Jin; Jieming Shi; Li Qing; | code |
| 829 | Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we observe that performance degradation closely correlates with feature drift, i.e., differences in feature representations of the same sample caused by model merging. Motivated by this observation, we propose Layer-wise Optimal Task Vector Merging (LOT Merging), a technique that explicitly minimizes feature drift between task-specific experts and the unified model in a layer-by-layer manner. |
Wenju Sun; Qingyong Li; Wen Wang; Yang Liu; Yangliao Geng; Boyang Li; | code |
| 830 | Balancing Multimodal Training Through Game-Theoretic Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the Multimodal Competition Regularizer (MCR), inspired by a mutual information (MI) decomposition designed to prevent the adverse effects of competition in multimodal training. |
Konstantinos Kontras; Thomas Strypsteen; Christos Chatzichristos; Paul Pu Liang; Matthew B. Blaschko; Maarten De Vos; | code |
| 831 | Text-to-Code Generation for Modular Building Layouts in Building Information Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Text2MBL, a text-to-code generation framework that generates executable Building Information Modeling (BIM) code directly from textual descriptions of modular building layout (MBL) design. |
Yinyi WEI; Xiao LI; | code |
| 832 | Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Fourier Token Merging, a new method for understanding and capitalizing frequency domain for efficient image generation. |
Jiesong Liu; Xipeng Shen; | code |
| 833 | IA-GGAD: Zero-shot Generalist Graph Anomaly Detection Via Invariant and Affinity Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle FSS, we develop an anomaly-driven graph invariant learning module that learns domain-invariant node representations. |
Xiong Zhang; Zhenli He; Changlong Fu; Cheng Xie; | code |
| 834 | 🎧MOSPA: Human Motion Generation Driven By Spatial Audio Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As of yet, these models typically overlook the impact of spatial features encoded in spatial audio signals on human motion. To bridge this gap and enable high-quality modeling of human movements in response to spatial audio, we introduce the first comprehensive Spatial Audio-Driven Human Motion (SAM) dataset, which contains diverse and high-quality spatial audio and motion data. |
Shuyang Xu; Zhiyang Dou; Mingyi Shi; Liang Pan; Leo Ho; Jingbo Wang; Yuan Liu; Cheng Lin; Yuexin Ma; Wenping Wang; Taku Komura; | code |
| 835 | Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct the first large-scale study evaluating leading generative models in data-scarce settings, revealing a substantial performance gap between full-data and data-scarce regimes. |
Tal Gonen; Itai Pemper; Ilan Naiman; Nimrod Berman; Omri Azencot; | code |
| 836 | AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce AdaDetectGPT — a novel classifier that adaptively learns a witness function from training data to enhance the performance of logits-based detectors. |
Hongyi Zhou; Jin Zhu; Pingfan Su; Kai Ye; Ying Yang; Shakeel A O B Gavioli-Akilagun; Chengchun Shi; | code |
| 837 | Robust Ego-Exo Correspondence with Long-Term Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, when simply applied to the ego-exo correspondence (EEC) task, SAM 2 encounters severe difficulties due to ineffective ego-exo feature fusion and limited long-term memory capacity, especially for long videos. Addressing these problems, we propose a novel EEC framework based on SAM 2 with long-term memories by presenting a dual-memory architecture and an adaptive feature routing module inspired by Mixture-of-Experts (MoE). |
Yijun Hu; Bing Fan; Xin Gu; Haiqing Ren; Dongfang Liu; Heng Fan; Libo Zhang; | code |
| 838 | Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a probabilistic extrapolation framework for data mixture optimization that avoids rigid assumptions and explicitly models the uncertainty in performance across decision variables.To accelerate methodological progress, we build a simulator based on 472 language model pre-training runs with varying data compositions from the SlimPajama dataset. |
Thomson Yen; Andrew Wei Tung Siah; Haozhe Chen; C. Daniel Guetta; Tianyi Peng; Hongseok Namkoong; | code |
| 839 | Covering Multiple Objectives with A Small Set of Solutions Using Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional approaches often seek a single Pareto-optimal set that balances trade-offs among all objectives. In contrast, we consider a problem setting that departs from this paradigm: finding a small set of $K < T$ solutions, that collectively cover the $T$ objectives. |
Natalie Maus; Kyurae Kim; Yimeng Zeng; Haydn Thomas Jones; Fangping Wan; Marcelo Der Torossian Torres; Cesar de la Fuente-Nunez; Jacob R. Gardner; | code |
| 840 | Counterfactual Reasoning: An Analysis of In-context Emergence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on a well-defined, synthetic linear regression task that requires noise abduction. |
Moritz Miller; Bernhard Schölkopf; Siyuan Guo; | code |
| 841 | Mamba Only Glances Once (MOGO): A Lightweight Framework for Efficient Video Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MOGO (Mamba Only Glances Once), an end-to-end framework for efficient video action detection built entirely on the Mamba architecture. |
Yunqing Liu; Nan Zhang; Fangjun Wang; Kengo Murata; Takuma Yamamoto; Osafumi Nakayama; Genta Suzuki; Zhiming Tan; | code |
| 842 | Efficient Low Rank Attention for Long-Context Inference in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches, such as KV quantization and pruning, reduce memory usage but suffer from numerical precision loss or suboptimal retention of key-value pairs. We introduce Low Rank Query and Key attention (LRQK), a two‐stage framework that jointly decomposes the full‐precision query and key matrices into compact rank-\(r\) factors during the prefill stage, and then uses these low-dimensional projections to compute proxy attention scores in \(\mathcal{O}(lr)\) time at each decode step. |
Li Tenghui; Guoxu Zhou; Xuyang ZHAO; Yuning Qiu; Qibin Zhao; | code |
| 843 | Sim-LLM: Optimizing LLM Inference at The Edge Through Inter-Task KV Reuse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods are computationally expensive for resource-constrained edge computing nodes. To tackle this challenge, this paper presents Sim-LLM, a novel inference optimization mechanism that leverages task similarity to reduce KV cache memory consumption for LLMs. |
Ruikun Luo; Changwei Gu; Qiang He; Feifei Chen; Song Wu; Hai Jin; Yun Yang; | code |
| 844 | Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective solution: _**Option-aware Temporally Abstracted**_ value learning, dubbed **OTA**, which incorporates temporal abstraction into the temporal-difference learning process. |
Hongjoon Ahn; Heewoong Choi; Jisu Han; Taesup Moon; | code |
| 845 | Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we equip MLLM with a comprehensive and extensible Video Toolkit, to enhance MLLM’s spatiotemporal reasoning capabilities as well as guarantee the harmony between the quantity and diversity of tools. |
Sunqi Fan; Jiashuo Cui; Meng-Hao Guo; Shuojin Yang; | code |
| 846 | AlignedGen: Aligning Style Across Generated Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although several style-aligned image generation methods have been proposed to address this issue, they exhibit suboptimal performance and are primarily built upon the U-Net architecture, limiting their compatibility with DiT diffusion models like Flux that has emerged as a predominant model in the field of image generation. To address these limitations, we propose AlignedGen, a novel training-free style-aligned image generation method for DiT models to significantly enhance style consistency across generated images. |
Jiexuan Zhang; Yiheng Du; Qian Wang; Weiqi Li; Yu Gu; Jian Zhang; | code |
| 847 | ReCon-GS: Continuum-Preserved Guassian Streaming for Fast and Compact Reconstruction of Dynamic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we propose the Reconfigurable Continuum Gaussian Stream, dubbed ReCon-GS, a novel storage-aware framework that enables high-fidelity online dynamic scene reconstruction and real-time rendering. |
Jiaye Fu; Qiankun Gao; Chengxiang Wen; Yanmin Wu; Siwei Ma; Jiaqi Zhang; Jian Zhang; | code |
| 848 | Curvature Tuning: Provable Training-free Model Steering From A Single Parameter Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, dominant finetuning methods typically rely on weight adaptation, often lack interpretability, and depend on heuristically chosen hyperparameters. In this paper, we take a different perspective and shift the focus from weights to activation functions, viewing them through the lens of spline operators. |
Leyang Hu; Matteo Gamba; Randall Balestriero; | code |
| 849 | NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, this paper proposes NestedFP, an LLM serving technique that supports both FP16 and FP8 models in a memoryefficient manner by overlaying FP8 parameters onto FP16 parameters, allowing both models to share the same FP16 memory footprint. |
Haeun Lee; Omin Kwon; Yeonhong Park; Jae W. Lee; | code |
| 850 | Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose two techniques to improve data efficiency in LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay. |
Yifan Sun; Jingyan Shen; Yibin Wang; Tianyu Chen; Zhendong Wang; Mingyuan Zhou; Huan Zhang; | code |
| 851 | Object Concepts Emerge from Motion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by findings in developmental psychology—where infants are shown to acquire object understanding through observation of motion—we propose a biologically inspired framework for learning object-centric visual representations in an unsupervised manner. |
Haoqian Liang; Xiaohui Wang; Zhichao Li; Ya Yang; Naiyan Wang; | code |
| 852 | Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In reinforcement learning with sparse rewards, demonstrations can accelerate learning, but determining when to imitate them remains challenging. We propose Smooth Policy Regularisation from Demonstrations (SPReD), a framework that addresses the fundamental question: when should an agent imitate a demonstration versus follow its own policy? |
Yujie Zhu; Charles Alexander Hepburn; Matthew Thorpe; Giovanni Montana; | code |
| 853 | Multi-Agent Imitation By Learning and Sampling from Factorized Soft Q-Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing MAIL methods including Behavior Cloning (BC) and Adversarial Imitation Learning (AIL) face significant challenges: BC suffers from the compounding error issue, while the very nature of adversarial optimization makes AIL prone to instability. In this work, we propose \textbf{M}ulti-\textbf{A}gent imitation by learning and sampling from \textbf{F}actor\textbf{I}zed \textbf{S}oft Q-function (MAFIS), a novel method that addresses these limitations for both online and offline MAIL settings. |
Yi-Chen Li; Zhongxiang Ling; Tao Jiang; Fuxiang Zhang; Pengyuan Wang; Lei Yuan; Zongzhang Zhang; Yang Yu; | code |
| 854 | Multimodal Negative Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We proceed to reveal the multimodal learning from a robustness perspective and theoretically derive the Multimodal Negative Learning (MNL) framework, which introduces a dynamic guidance mechanism tailored for negative learning. |
Baoquan Gong; Xiyuan Gao; Pengfei Zhu; Qinghua Hu; Bing Cao; | code |
| 855 | You Can Trust Your Clustering Model: A Parameter-free Self-Boosting Plug-in for Deep Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By harnessing reliable local structural cues, our method aims to elevate clustering performance effectively. |
Hanyang Li; Yuheng Jia; Hui LIU; Junhui Hou; | code |
| 856 | TRAP: Targeted Redirecting of Agentic Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce TRAP, a novel generative adversarial framework that manipulates the agent’s decision-making using diffusion-based semantic injections into the vision-language embedding space. |
Hangoo Kang; Jehyeok Yeon; Gagandeep Singh; | code |
| 857 | Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, recent advances in pretrained vision-language models (VLMs) have demonstrated strong cross-task generalization, offering a promising foundation for developing unified solutions. In this paper, we introduce Uni-MuMER, which fully fine-tunes a VLM for the HMER task without modifying its architecture, effectively injecting domain-specific knowledge into a generalist framework. |
Yu Li; Jin Jiang; Jianhua Zhu; Shuai Peng; Baole Wei; Yuxuan Zhou; Liangcai Gao; | code |
| 858 | SGN: Shifted Window-Based Hierarchical Variable Grouping for Multivariate Time Series Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These limitations become particularly pronounced when dealing with complex and heterogeneous variable types. To address these challenges, we propose SwinGroupNet (SGN), which explores a novel perspective for constructing variable interaction and temporal dependency. |
Zenan Ying; Zhi Zheng; huijun hou; Tong Xu; Qi Liu; Jinke wang; Wei Chen; | code |
| 859 | ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing PointMamba-based methods rely on complex token ordering and random masking, disrupting spatial continuity and local semantic correlations. We propose \textbf{ZigzagPointMamba} to address these challenges. |
LinshuangDiao; Sensen Song; Yurong Qian; Dayong Ren; | code |
| 860 | Transstratal Adversarial Attack: Compromising Multi-Layered Defenses in Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing adversarial attacks have demonstrated vulnerabilities in isolated defense layers, they prove largely ineffective against multi-layered defenses deployed in real-world T2I systems. In this paper, we demonstrate that exploiting overlapping vulnerabilities across these distinct defense layers enables adversaries to systematically bypass the entire safeguard of T2I systems. |
Chunlong Xie; Kangjie Chen; Shangwei Guo; Shudong Zhang; Tianwei Zhang; Tao Xiang; | code |
| 861 | Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Several researches exploit the high-speed and high-dynamic event modality as a complement, but event and RGB are naturally heterogeneous, which leads to feature-level mismatch and inferior optimization of existing multi-modality methods. Different from these researches, we delve into the edge secret of both modalities for resilient fusion and propose a novel Edge-awareness Semantic Concordance framework to unify the multi-modality heterogeneous features with latent edge cues. |
Nan Bao; Yifan Zhao; Lin Zhu; Jia Li; | code |
| 862 | Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first derive the information-theoretically optimal bit allocation for Gaussianized weights under given bit budgets, revealing that fine-grained fractional-bit quantizers approaching the Gaussian distortion-rate bound are essential to achieve near-optimal quantization performance. To bridge this theoretical insight and practical implementation, we introduce Q-Palette, a versatile collection of fractional-bit quantizers that range from trellis-coded quantizers offering near-optimal distortion to simpler vector and scalar quantizers optimized for faster inference, all efficiently implemented with optimized CUDA kernels across various bitwidths. |
Deokjae Lee; Hyun Oh Song; | code |
| 863 | E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, this E2E speech synthesis also requires new security mechanisms. To tackle these challenges, we propose E2E-VGuard, a proactive defense framework for two emerging threats: (1) production LLM-based speech synthesis, and (2) the novel attack arising from ASR-driven E2E scenarios. |
Zhisheng Zhang; Derui Wang; Yifan Mi; Zhiyong Wu; JieGao; Yuxin Cao; Kai Ye; Jason Xue; Jie Hao; | code |
| 864 | Self Iterative Label Refinement Via Robust Unlabeled Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As an initial step toward enhancing self-refinement for broader applications, we introduce an iterative refinement pipeline that employs the Unlabeled-Unlabeled learning framework to improve LLM-generated pseudo-labels for classification tasks. |
Hikaru Asano; Tadashi Kozuno; Yukino Baba; | code |
| 865 | Lookahead Routing for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These limitations can result in suboptimal routing decisions, particularly for complex or ambiguous queries that require deeper semantic understanding. To address this challenge, we propose Lookahead, a routing framework that foresees potential model outputs by predicting their latent representations and uses these predictions to guide model selection, thus enabling more informed routing without full inference. |
Canbin Huang; Tianyuan Shi; Yuhua Zhu; Ruijun Chen; Xiaojun Quan; | code |
| 866 | Smooth Regularization for Efficient Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a smooth regularization technique that instills a strong temporal inductive bias in video recognition models, particularly benefiting lightweight architectures. |
Gil Goldman; Raja Giryes; Mahadev Satyanarayanan; | code |
| 867 | Few-Shot Learning from Gigapixel Images Via Hierarchical Vision-Language Alignment and Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods often face two key limitations: (1) insufficient modeling of interactions within the same modalities across scales (e.g., 5x and 20x) and (2) inadequate alignment between visual and textual modalities on the same scale. To address these gaps, we propose HiVE-MIL, a hierarchical vision-language framework that constructs a unified graph consisting of (1) parent–child links between coarse (5x) and fine (20x) visual/textual nodes to capture hierarchical relationships, and (2) heterogeneous intra-scale edges linking visual and textual nodes on the same scale. |
Bryan Wong; Jong woo kim; Huazhu Fu; Mun Yong Yi; | code |
| 868 | FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we analyze that FP8 quantization offers speedup primarily for large-dimensional matrix multiplications, while inherent quantization overheads diminish speedup when applied to low-rank adaptation (LoRA), which uses small-dimensional matrices for efficient fine-tuning of large language models (LLMs). To address this limitation, we propose FALQON, a novel framework that eliminates the quantization overhead from separate LoRA computational paths by directly merging LoRA adapters into an FP8-quantized backbone during fine-tuning. |
Kanghyun Choi; Hyeyoon Lee; SunJong Park; Dain Kwon; Jinho Lee; | code |
| 869 | Latent Space Factorization in LoRA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Factorized Variational Autoencoder LoRA (FVAE-LoRA), which leverages a VAE to learn two distinct latent spaces. |
Shashi Kumar; Yacouba Kaloga; John Mtr.; Petr Motlicek; Ina Kodrasi; | code |
| 870 | PairEdit: Learning Semantic Variations for Exemplar-based Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce PairEdit, a novel visual editing method designed to effectively learn complex editing semantics from a limited number of image pairs or even a single image pair, without using any textual guidance. |
Haoguang Lu; Jiacheng Chen; Zhenguo Yang; Aurele Tohokantche Gnanha; Fu Lee Wang; Li Qing; Xudong Mao; | code |
| 871 | TF-MAS: Training-free Mamba2 Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing NAS methods tailored for Mamba are training-based, leading to substantial time and computational resource expenditure. To address this issue, and considering that Mamba2 is an improved version of the original Mamba, we propose a training-free NAS method specifically designed for Mamba2. |
Yi Fan; Yu-Bin Yang; | code |
| 872 | ConTextTab: A Semantics-Aware Tabular In-Context Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the other end of the spectrum, tabular ICL models based on pretrained large language models such as TabuLa-8B integrate deep semantic understanding and world knowledge but are only able to make use of a small amount of context due to inherent architectural limitations. With the aim to combine the best of both these worlds, we introduce ConTextTab, integrating semantic understanding and alignment into a table-native ICL framework. |
Marco Spinaci; Marek Polewczyk; Maximilian Schambach; Sam Thelin; | code |
| 873 | Navigating The MIL Trade-Off: Flexible Pooling for Whole Slide Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For slides partitioned into $N$ patches, we theoretically show that LSE has a smooth transition at a critical $\beta_{\mathrm{crit}}=\mathcal{O}(\log N)$ threshold, interpolating between mean-like aggregation (stable, better generalization but less sensitive) and max-like aggregation (more sensitive but looser generalization bounds). Grounded in this analysis, we introduce Maxsoft—a novel MIL pooling function that enables flexible control over this trade-off, allowing adaptation to specific tasks and datasets. |
Hossein Jafarinia; Danial Hamdi; Amirhossein Alamdar; Elahe Zahiri; Soroush Vafaie Tabar; Alireza Alipanah; Nahal Mirzaie; Saeed Razavi; Amir Najafi; Mohammad Hossein Rohban; | code |
| 874 | FLOWING: Implicit Neural Flows for Structure-Preserving Morphing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, multilayer perceptrons (MLPs) have been explored as implicit neural representations (INRs) for modeling such deformations, due to their meshlessness and differentiability; however, extracting coherent and accurate morphings from standard MLPs typically relies on costly regularizations, which often lead to unstable training and prevent effective feature alignment. To overcome these limitations, we propose FLOWING (FLOW morphING), a framework that recasts warping as the construction of a differential vector flow, naturally ensuring continuity, invertibility, and temporal coherence by encoding structural flow prop- erties directly into the network architectures. |
Arthur Bizzi; Matias Grynberg Portnoy; Vitor Pereira Matias; Daniel Perazzo; João Paulo Silva do Monte Lima; Luiz Velho; Nuno Gonçalves; João M. Pereira; Guilherme Schardong; Tiago Novello; | code |
| 875 | MUSTAFAR: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that unstructured sparsity significantly improves KV cache compression for LLMs, enabling sparsity levels up to 70\% without compromising accuracy or requiring fine-tuning. |
Donghyeon Joo; Helya Hosseini; Ramyad Hadidi; Bahar Asgari; | code |
| 876 | Deep Edge Filter: Return of The Human-Crafted Layer in Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Deep Edge Filter, a novel approach that applies high-pass filtering to deep neural network features to improve model generalizability. |
Dongkwan Lee; Junhoo Lee; Nojun Kwak; | code |
| 877 | Accurate and Efficient Low-Rank Model Merging in Core Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the challenges associated with merging low-rank adaptations of large neural networks. |
Aniello Panariello; Daniel Marczak; Simone Magistri; Angelo Porrello; Bartłomiej Twardowski; Andrew D. Bagdanov; Simone Calderara; Joost van de Weijer; | code |
| 878 | MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel **M**emory-based online scoring queue scheme for **T**raining-free VAD (MoniTor), to address the inherent complexities in online VAD. |
Shengtian Yang; Yue Feng; Yingshi Liu; Jingrou Zhang; Jie Qin; | code |
| 879 | Go With The Flow: Fast Diffusion for Gaussian Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an analytic parametrization of a set of feasible policies for steering the distribution of a dynamical system from one Gaussian Mixture Model (GMM) to another. |
George Rapakoulias; Ali Reza Pedram; Fengjiao Liu; Lingjiong Zhu; Panagiotis Tsiotras; | code |
| 880 | How Does Topology Bias Distort Message Passing in Graph Recommender? A Dirichlet Energy Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We bridge this gap by providing an empirical and theoretical analysis from a Dirichlet energy perspective, revealing that graph message passing inherently amplifies topology bias and consistently benefits highly connected nodes. To address these limitations, we propose Test-time Simplicial Propagation (TSP), which extends message passing to higher-order simplicial complexes. |
Yanbiao Ji; Yue Ding; Dan Luo; Chang Liu; Yuxiang Lu; Xin Xin; Hongtao Lu; | code |
| 881 | Fast Constrained Sampling in Pre-trained Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an algorithm that enables fast, high-quality generation under arbitrary constraints. |
Alexandros Graikos; Nebojsa Jojic; Dimitris Samaras; | code |
| 882 | OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present OpenWorldSAM, a framework that extends the prompt-driven Segment Anything Model v2 (SAM2) to open-vocabulary scenarios by integrating multi-modal embeddings extracted from a lightweight vision-language model (VLM). |
Shiting Xiao; Rishabh Kabra; Yuhang Li; Donghyun Lee; Joao Carreira; Priyadarshini Panda; | code |
| 883 | Bootstrap Off-policy with World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, using planning for environment interaction inevitably introduces a divergence between the collected data and the policy’s actual behaviors, degrading both model learning and policy improvement. To address this, we propose BOOM (Bootstrap Off-policy with WOrld Model), a framework that tightly integrates planning and off-policy learning through a bootstrap loop: the policy initializes the planner, and the planner refines actions to bootstrap the policy through behavior alignment. |
Guojian Zhan; Likun Wang; Xiangteng Zhang; Jiaxin Gao; Masayoshi Tomizuka; Shengbo Eben Li; | code |
| 884 | Calibrating Translation Decoding with Quality Estimation on LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes to directly calibrate hypothesis likelihood with translation quality from a distributional view by directly optimizing their Pearson correlation, thereby enhancing decoding effectiveness. |
Di Wu; Yibin Lei; Christof Monz; | code |
| 885 | RODS: Robust Optimization Inspired Diffusion Sampling for Detecting and Reducing Hallucination in Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reinterpret diffusion sampling through the lens of optimization and introduce RODS (Robust Optimization–inspired Diffusion Sampler), a novel method that detects and corrects high-risk sampling steps using geometric cues from the loss landscape. |
Yiqi Tian; Pengfei Jin; Mingze Yuan; Na Li; Bo Zeng; Quanzheng Li; | code |
| 886 | UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At present, multi-modal object ReID faces two core challenges: (1) learning robust features under fine-grained local noise caused by occlusion, frame loss, and other disruptions; and (2) effectively integrating heterogeneous modalities to enhance multi-modal representation. To address the above challenges, we propose a robust approach named Uncertainty-Guided Graph model for multi-modal object ReID (UGG-ReID). |
Xixi Wan; AIHUA ZHENG; Bo Jiang; Beibei Wang; Chenglong Li; Jin Tang; | code |
| 887 | Unlocking Dataset Distillation with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This trend arises because naive backpropagation through the long denoising chain leads to vanishing gradients, which prevents effective synthetic sample optimization. To address this limitation, we introduce Latent Dataset Distillation with Diffusion Models (LD3M), the first method to learn gradient-based distilled latents and class embeddings end-to-end through a pre-trained latent diffusion model. |
Brian Bernhard Moser; Federico Raue; Sebastian Palacio; Stanislav Frolov; Andreas Dengel; | code |
| 888 | $\text{G}^2\text{M}$: A Generalized Gaussian Mirror Method to Boost Feature Selection Power Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we find that the unit variance assumption on mirror statistics could potentially limit the feature selection power. To address this, we generalize the mirror statistics in the Gaussian mirror framework and introduce a new approach called generalized Gaussian mirror ($\text{G}^2\text{M}$), which adaptively learns the variance and forms new test statistics. |
Hongyu Shen; Zhizhen Zhao; | code |
| 889 | InstructRestore: Region-Customized Image Restoration with Human Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new framework, namely InstructRestore, to perform region-adjustable image restoration following human instructions.With this engine and careful data screening, we construct a comprehensive dataset comprising 536,945 triplets to support the training and evaluation of this task. |
Shuaizheng Liu; Jianqi Ma; Lingchen Sun; Xiangtao Kong; Lei Zhang; | code |
| 890 | Addressing Mark Imbalance in Integration-free Marked Temporal Point Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The imbalance poses a significant challenge to the performance of the next event prediction, especially for events of rare marks. To address this issue, we propose a thresholding method, which learns thresholds to tune the mark probability normalized by the mark’s prior probability to optimize mark prediction, rather than predicting the mark directly based on the mark probability as in existing studies. |
Sishun Liu; KE DENG; Yongli Ren; Yan Wang; Xiuzhen Zhang; | code |
| 891 | Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its success, this paradigm can generate erroneous pseudo-labels, which are further amplified during training due to utilization of one-hot encoding. To address this issue, we propose ECOCSeg, a novel perspective for segmentation models that utilizes error-correcting output codes (ECOC) to create a fine-grained encoding for each class. |
Wangkai Li; Rui Sun; Zhaoyang Li; Tianzhu Zhang; | code |
| 892 | Towards Unsupervised Domain Bridging Via Image Degradation in Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DiDA, an unsupervised domain bridging approach for semantic segmentation. |
Wangkai Li; Rui Sun; Huayu Mai; Tianzhu Zhang; | code |
| 893 | MI-TRQR: Mutual Information-Based Temporal Redundancy Quantification and Reduction for Energy-Efficient Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This issue is further aggravated when processing static images due to the duplicated input. To mitigate this problem, we propose a parameter-free and plug-and-play module named Mutual Information-based Temporal Redundancy Quantification and Reduction (MI-TRQR), constructing energy-efficient SNNs. |
Dengfeng Xue; Wenjuan Li; Yifan Lu; Chunfeng Yuan; Yufan Liu; Wei Liu; Man Yao; Li Yang; Guoqi Li; Bing Li; Stephen Maybank; Weiming Hu; Zhetao Li; | code |
| 894 | Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: *Can algebraic geometry enhance the sharpness, robustness, and interpretability of modern neural reasoning models by equipping them with a mathematically grounded inductive bias? * To answer this, we introduce Tropical Attention, an attention mechanism grounded in tropical geometry that lifts the attention kernel into tropical projective space, where reasoning is piecewise-linear and 1-Lipschitz, thus preserving the polyhedral decision structure inherent to combinatorial reasoning. |
Baran Hashemi; Kurt Pasque; Chris Teska; Ruriko Yoshida; | code |
| 895 | FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FlowMoE, a scalable framework for scheduling multi-type task pipelines. |
Yunqi Gao; Bing Hu; Mahdi Boloursaz Mashhadi; A-Long Jin; Yanfeng Zhang; Pei Xiao; Rahim Tafazolli; Merouane Abdelkader DEBBAH; | code |
| 896 | Automatic Synthetic Data and Fine-grained Adaptive Feature Alignment for Composed Person Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they are unable to make full use of the available information and are difficult to meet diverse application requirements. To address the above limitations, we propose a new Composed Person Retrieval (CPR) task, which combines visual and textual queries to identify individuals of interest from large-scale person image databases. |
Delong Liu; Haiwen Li; Zhaohui Hou; Zhicheng Zhao; Fei Su; Yuan Dong; | code |
| 897 | CoIDO: Efficient Data Selection for Visual Instruction Tuning Via Coupled Importance-Diversity Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing data selection methods aim to mitigate this by selecting important and diverse subsets, but they often suffer from two critical drawbacks: high computational overhead from processing the entire dataset and suboptimal data selection due to separate treatment of importance and diversity. We introduce CoIDO, a novel dual-objective framework that jointly optimizes data importance and diversity to overcome these challenges. |
Yichen Yan; Ming Zhong; Qi Zhu; Xiaoling Gu; Jinpeng Chen; Huan Li; | code |
| 898 | Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by these, we propose a trustworthy reasoning framework, termed Deliberation over Priors (\texttt{DP}), which sufficiently utilizes the priors contained in KGs. |
Jie Ma; Ning Qu; Zhitao Gao; Xing Rui; Jun Liu; Hongbin Pei; Jiang Xie; Lingyun Song; Pinghui Wang; Jing Tao; su zhou; | code |
| 899 | Instant4D: 4D Gaussian Splatting in Minutes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present **Instant4D**, a monocular reconstruction system that leverages native 4D representation to efficiently process casual video sequences within minutes, without calibrated cameras or depth sensors. |
Zhanpeng Luo; Haoxi Ran; Li Lu; | code |
| 900 | Ada-KV: Optimizing KV Cache Eviction By Adaptive Budget Allocation for Efficient LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we establish a theoretical loss upper bound between pre- and post-eviction attention output, explaining the optimization target of prior cache eviction methods, while guiding the optimization of adaptive budget allocation. |
Yuan Feng; Junlin Lv; Yukun Cao; Xike Xie; S Kevin Zhou; | code |
| 901 | MaxSup: Overcoming Representation Collapse in Label Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analytically decompose the LS-induced loss, exposing two key terms: (i) a regularization term that dampens overconfidence only when the prediction is correct, and (ii) an error-amplification term that arises under misclassifications. |
Yuxuan Zhou; Heng Li; Zhi-Qi Cheng; Xudong Yan; Yifei Dong; Mario Fritz; Margret Keuper; | code |
| 902 | GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. |
Jialong Zhou; Lichao Wang; Xiao Yang; | code |
| 903 | Enhanced Self-Distillation Framework for Efficient Spiking Neural Network Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable high-performance SNN training under limited computational resources, we propose an enhanced self-distillation framework, jointly optimized with rate-based backpropagation. |
Xiaochen Zhao; Chengting Yu; Kairong Yu; Lei Liu; Aili Wang; | code |
| 904 | Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a causal framework that characterizes CoT reasoning through the dual lenses of sufficiency and necessity. |
Xiangning Yu; Zhuohan Wang; Linyi Yang; Haoxuan Li; Anjie Liu; Xiao Xue; Jun Wang; Mengyue Yang; | code |
| 905 | Real-World Adverse Weather Image Restoration Via Dual-Level Reinforcement Learning with High-Quality Cold Start Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This framework enables continuous adaptation to real-world conditions and achieves state-of-the-art performance across a wide range of adverse weather scenarios. |
Fuyang Liu; Jiaqi Xu; Xiaowei Hu; | code |
| 906 | Scaling Epidemic Inference on Contact Networks: Theory and Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a unified theoretical framework for analyzing disease spread dynamics on both directed and undirected contact networks, and propose an algorithm, **RAPID**, that significantly improves computational efficiency. |
Guanghui Min; Yinhan He; Chen Chen; | code |
| 907 | RiverMamba: A State Space Model for Global River Discharge and Flood Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, there is a strong need for new deep learning methodologies that are capable of modeling spatio-temporal relations to improve river discharge and flood forecasting for scientific and operational applications. To address this, we present RiverMamba, a novel deep learning model that is pretrained with long-term reanalysis data and that can forecast global river discharge and floods on a $0.05^\circ$ grid up to 7 days lead time, which is of high relevance in early warning. |
Mohamad Hakam Shams Eddin; Yikui Zhang; Stefan Kollet; Juergen Gall; | code |
| 908 | Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **U**nified Variational **A**uto-**E**ncoder for **3D** Molecular Latent Diffusion Modeling (**UAE-3D**), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space, while maintaining near-zero reconstruction error. |
Yanchen Luo; Zhiyuan Liu; Yi Zhao; Sihang Li; Hengxing Cai; Kenji Kawaguchi; Tat-Seng Chua; Yang Zhang; Xiang Wang; | code |
| 909 | Connecting Neural Models Latent Geometries with Relative Geodesic Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we assume that distinct neural models parametrize approximately the same underlying manifold, and introduce a representation based on the *pullback metric* that captures the intrinsic structure of the latent space, while scaling efficiently to large models. |
Hanlin Yu; Berfin Inal; Georgios Arvanitidis; Søren Hauberg; Francesco Locatello; Marco Fumero; | code |
| 910 | Rethinking Approximate Gaussian Inference in Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To capture epistemic uncertainty, approximate Gaussian inference methods have been proposed. We develop a common formalism to describe such methods, which we view as outputting Gaussian distributions over the logit space. |
Bálint Mucsányi; Nathaël Da Costa; Philipp Hennig; | code |
| 911 | Learning 3D Anisotropic Noise Distributions Improves Molecular Force Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing denoising methods rely on oversimplied molecular dynamics that assume atomic motions to be isotropic and homoscedastic. To address these limitations, we propose a novel denoising framework AniDS: Anisotropic Variational Autoencoder for 3D Molecular Denoising. |
Xixian Liu; Rui Jiao; Zhiyuan Liu; Yurou Liu; Yang Liu; Ziheng Lu; Wenbing Huang; Yang Zhang; Yixin Cao; | code |
| 912 | See Through The Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing vision-based methods perform well on daytime benchmarks but struggle in nighttime scenarios due to limited visibility and challenging lighting conditions. To address these challenges, we propose LIAR, a novel framework that learns illumination-affined representations. |
Wuyuan; Zhiqiang Yan; Yigong Zhang; Xiang Li; Jian Yang; | code |
| 913 | TAMI: Taming Heterogeneity in Temporal Interactions for Temporal Graph Link Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cope with the heterogeneity, we propose a novel framework called TAMI, which contains two effective components, namely log time encoding function (LTE) and link history aggregation (LHA). |
Zhongyi Yu; Jianqiu Wu; Zhenghao Wu; Shuhan Zhong; Weifeng Su; Chul-Ho Lee; Weipeng Zhuo; | code |
| 914 | Unlocker: Disentangle The Deadlock of Learning Between Label-noisy and Long-tailed Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In real world, the observed label distribution of a dataset often mismatches its true distribution due to noisy labels. In this situation, noisy labels learning (NLL) methods directly integrated with long-tail learning (LTL) methods tend to fail due to a dilemma: NLL methods normally rely on unbiased model predictions to recover true distribution by selecting and correcting noisy labels; while LTL methods like logit adjustment depends on true distributions to adjust biased predictions, leading to a deadlock of mutual dependency defined in this paper. |
Chen Shu; HongJun Xu; Ruichi Zhang; Mengke Li; Yonggang Zhang; Yang Lu; Bo Han; Yiu-ming Cheung; Hanzi Wang; | code |
| 915 | Enhancing Interpretability in Deep Reinforcement Learning Through Semantic Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore semantic clustering properties of deep reinforcement learning (DRL) to improve its interpretability and deepen our understanding of its internal semantic organization. |
Liang Zhang; Justin Lieffers; Adarsh Pyarelal; | code |
| 916 | Towards Self-Refinement of Vision-Language Models with Triangular Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, to stimulate the self-refinement ability of VLMs, we propose a self-refinement framework based on a Triangular Consistency principle: within the image-query-answer triangle, any masked elements should be consistently and accurately reconstructed. |
Yunlong Deng; Guangyi Chen; Tianpei Gu; Lingjing Kong; Yan Li; Zeyu Tang; Kun Zhang; | code |
| 917 | DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage temporal information more efficiently, we propose DeltaFlow ($\Delta$Flow), a lightweight 3D framework that captures motion cues via a $\Delta$ scheme, extracting temporal features with minimal computational cost, regardless of the number of frames. |
Qingwen Zhang; Xiaomeng Zhu; Yushan Zhang; Yixi Cai; Olov Andersson; Patric Jensfelt; | code |
| 918 | EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these limitations, we introduce EVOREFUSE, a prompt optimization approach that generates diverse pseudo-malicious instructions consistently eliciting confident refusals across LLMs.Using EVOREFUSE, we create two novel datasets: EVOREFUSE-TEST, a benchmark of 582 pseudo-malicious instructions that outperforms the next-best benchmark with 85.34% higher average refusal triggering rate across 9 LLMs, 34.86% greater lexical diversity, and 40.03% improved LLM response confidence scores; and EVOREFUSE-ALIGN, which provides 3,000 pseudo-malicious instructions with responses for supervised and preference-based alignment training. |
Xiaorui Wu; Fei Li; Xiaofeng Mao; Xin Zhang; Li Zheng; Yuxiang Peng; Chong Teng; Donghong Ji; Zhuang Li; | code |
| 919 | 3BASiL: An Algorithmic Framework for Sparse Plus Low-Rank Compression of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent progress, existing methods often suffer from substantial performance degradation compared to dense models. In this work, we introduce $\texttt{3BASiL-TM}$, an efficient one-shot post-training method for $(\mathbf{S} + \mathbf{L}\mathbf{R})$ decomposition of LLMs that addresses this gap. |
Mehdi Makni; Xiang Meng; Rahul Mazumder; | code |
| 920 | Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Layer-Wise Modality Decomposition (LMD), a post-hoc, model-agnostic interpretability method that disentangles modality-specific information across all layers of a pretrained fusion model. |
Park Jae Hyun; Konyul Park; Daehun Kim; Junseo Park; Jun Won Choi; | code |
| 921 | Bridging Expressivity and Scalability with Adaptive Unitary SSMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we draw inspiration from adaptive and structured dynamics observed in biological neural systems and introduce the Adaptive Unitary State Space Model (AUSSM): a novel class of SSMs that leverages skew-symmetric, input-dependent recurrence to achieve unitary evolution and high expressive power. |
Arjun Karuvally; Franz Nowak; T. Anderson Keller; Carmen Amo Alonso; Terrence Sejnowski; Hava T Siegelmann; | code |
| 922 | Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the limitation, we propose a training-free compression framework (AsymKV) that combines homogeneity-based key merging with a mathematically proven lossless value compression. |
Wanyun Cui; Mingwei Xu; | code |
| 923 | Effects of Dropout on Performance in Long-range Graph Learning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the aforementioned algorithms, and closely related edge-dropping algorithms – DropNode, DropAgg and DropGNN – in the context of over-squashing. |
Jasraj Singh; Keyue Jiang; Brooks Paige; Laura Toni; | code |
| 924 | DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They generally predict a single scanpath of fixed, pre-defined length, which conflicts with the inherent diversity and stochastic nature of real-world visual attention. To address these challenges, we propose DiffEye, a diffusion-based training framework designed to model continuous and diverse eye movement trajectories during free viewing of natural images. |
Ozgur Kara; Harris Nisar; James Matthew Rehg; | code |
| 925 | Rethinking Multimodal Learning from The Perspective of Mitigating Classification Ability Disproportion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multimodal learning approach to dynamically balance the classification ability of weak and strong modalities by incorporating the principle of boosting. |
Qing-Yuan Jiang; Longfei Huang; Yang Yang; | code |
| 926 | Online Segment Any 3D Thing As Instance Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our method establishes a new state-of-the-art, surpassing ESAM by 2.8 AP on ScanNet200 and delivering consistent gains on ScanNet, SceneNN, and 3RScan datasets, corroborating that identity-aware temporal reasoning is a crucial, previously underemphasized component for robust 3D segmentation in real-time embodied intelligence. |
Hanshi Wang; caizijian; Jin Gao; Yiwei Zhang; Weiming Hu; Ke Wang; Zhipeng Zhang; | code |
| 927 | Each Complexity Deserves A Pruning Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This observation strongly suggests that neither a fixed pruning schedule nor a heuristics layer-wise strategy can optimally accommodate the diverse complexities inherent in different inputs. To overcome this limitation, we introduce Complexity-Adaptive Pruning (AutoPrune), which is a training-free, plug-and-play framework that tailors pruning policies to varying sample and task complexities. |
Hanshi Wang; yuhao xu; Zekun Xu; Jin Gao; Yufan Liu; Weiming Hu; Ke Wang; Zhipeng Zhang; | code |
| 928 | InstructSAM: A Training-free Framework for Instruction-Oriented Remote Sensing Object Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the scarcity of semantically rich labeled data in remote sensing, we propose InstructSAM, a training-free framework for instruction-driven object recognition. |
Yijie Zheng; Weijie Wu; Qingyun Li; Xuehui Wang; Xu Zhou; Aiai Ren; Jun Shen; Long Zhao; Guoqing Li; Xue Yang; | code |
| 929 | Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, here we introduce the Structural Projection Hebbian Representation (SPHeRe), a novel unsupervised learning method that integrates orthogonality and structural information preservation through a local auxiliary nonlinear block. |
Shikuang Deng; Jiayuan Zhang; Yuhang Wu; Ting Chen; Shi Gu; | code |
| 930 | Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. |
Yi-Lun Wu; Bo-Kai Ruan; Chiang Tseng; Hong-Han Shuai; | code |
| 931 | BNMusic: Blending Environmental Noises Into Personalized Music Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, misalignment between the dominant sound and the noise—such as mismatched downbeats—often requires an excessive volume increase to achieve effective masking. Motivated by recent advances in cross-modal generation, in this work, we introduce an alternative method to acoustic masking, aiming to reduce the noticeability of environmental noises by blending them into personalized music generated based on user-provided text prompts. |
Chi Zuo; Martin B. Møller; Pablo Martínez-Nuevo; Huayang Huang; Yu Wu; Ye Zhu; | code |
| 932 | Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with provably optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs. |
Aleksandar Terzic; Nicolas Menet; Michael Hersche; Thomas Hofmann; Abbas Rahimi; | code |
| 933 | Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new method, Distilled Decoding 2 (DD2), to further advance the feasibility of one-step sampling for image AR models. |
Enshu Liu; Qian Chen; Xuefei Ning; Shengen Yan; Guohao Dai; Zinan Lin; Yu Wang; | code |
| 934 | Influence Guided Context Selection for Effective Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent advances in data selection, we reconceptualize context quality assessment as an inference-time data valuation problem and introduce the Contextual Influence Value (CI value). |
Jiale Deng; Yanyan Shen; Ziyuan Pei; Youmin Chen; Linpeng Huang; | code |
| 935 | Higher-Order Learning with Graph Neural Networks Via Hypergraph Encodings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to instead use hypergraph-level encodings based on characteristics such as hypergraph Laplacians and discrete curvature notions. |
Raphaël Pellegrin; Lukas Fesser; Melanie Weber; | code |
| 936 | DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Isolation Network, an OOD detection framework that quantifies the difficulty of separating a target test sample from the training data by solving a binary classification task. |
Felix Wagner; Pramit Saha; Harry Anthony; Alison Noble; Konstantinos Kamnitsas; | code |
| 937 | UniZyme: A Unified Protein Cleavage Site Predictor Enhanced with Enzyme Active-Site Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we introduce a unified protein cleavage site predictor named UniZyme, which can generalize across diverse enzymes. |
Chenao Li; Shuo Yan; Enyan Dai; | code |
| 938 | Black-Box Membership Inference Attack for LVLMs Via Prior Knowledge-Calibrated Memory Probing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first black-box MIA framework for LVLMs, based on a prior knowledge-calibrated memory probing mechanism. |
Jinhua Yin; Peiru Yang; Chen Yang; Huili Wang; Zhiyang Hu; Shangguang Wang; Yongfeng Huang; Tao Qi; | code |
| 939 | Analog Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a general and scalable method to robustly adapt LLMs for execution on noisy, low-precision analog hardware. |
Julian Büchel; Iason Chalas; Giovanni Acampa; An Chen; Omobayode Fagbohungbe; Hsinyu Tsai; Kaoutar El Maghraoui; Manuel Le Gallo; Abbas Rahimi; Abu Sebastian; | code |
| 940 | GAMMA: Gated Multi-hop Message Passing for Homophily-Agnostic Node Representation in GNNs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Gated Multi-hop Message Passing (GAMMA), where each node assesses the relevance of the information aggregated from its k-hop neighbors. |
Amir Ghazizadeh Ahsaei; Rickard Ewetz; Hao Zheng; | code |
| 941 | CausalPFN: Amortized Causal Effect Estimation Via In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CausalPFN, a single transformer that *amortizes* this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, it infers causal effects for new observational datasets out of the box. |
Vahid Balazadeh; Hamidreza Kamkari; Valentin Thomas; Junwei Ma; Bingru Li; Jesse C. Cresswell; Rahul Krishnan; | code |
| 942 | ReplaceMe: Network Simplification Via Depth Pruning and Transformer Block Linearization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ReplaceMe, a generalized training-free depth pruning method that effectively replaces transformer blocks with a linear operation, while maintaining high performance for low compression ratios. |
Dmitriy Shopkhoev; Ammar Ali; Magauiya Zhussip; Valentin Malykh; Stamatios Lefkimmiatis; Nikos Komodakis; Sergey Zagoruyko; | code |
| 943 | DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the flexibility of linguistic instructions induces substantial ambiguity across language-conditioned tasks, severely degrading algorithmic performance. To address these limitations, we present a novel method named DAIL (Distributional Aligned Learning), featuring two key components: distributional policy and semantic alignment. |
Runpeng Xie; Quanwei Wang; Hao Hu; Zherui Zhou; Ni Mu; Xiyun Li; Yiqin Yang; Shuang Xu; Qianchuan Zhao; Bo XU; | code |
| 944 | Uncertainty-Sensitive Privileged Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we quantify the DP’s information-gathering progress by estimating the prediction uncertainty of privileged observations reconstructed from partial observations, and accordingly propose the framework of Uncertainty-Sensitive Privileged Learning (USPL). |
Fan-Ming Luo; Lei Yuan; Yang Yu; | code |
| 945 | POCO: Scalable Neural Forecasting Through Population Conditioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce POCO, a unified forecasting model that combines a lightweight univariate forecaster with a population-level encoder to capture both neuron-specific and brain-wide dynamics. |
Yu Duan; Hamza Tahir Chaudhry; Misha B. Ahrens; Christopher D Harvey; Matthew G Perich; Karl Deisseroth; Kanaka Rajan; | code |
| 946 | Less Greedy Equivalence Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Still, it faces two challenges in practice: computational cost and finite-sample accuracy. In this paper, we develop Less Greedy Equivalence Search (LGES), a variant of GES that retains its theoretical guarantees while partially addressing these limitations. |
Adiba Ejaz; Elias Bareinboim; | code |
| 947 | LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches typically rely on either large language models (LLMs) or equivariant denoising models, each with complementary strengths: LLMs excel at handling discrete atomic types but often struggle with continuous features such as atomic positions and lattice parameters, while denoising models are effective at modeling continuous variables but encounter difficulties in generating accurate atomic compositions. To bridge this gap, we propose CrysLLMGen, a hybrid framework that integrates an LLM with a diffusion model to leverage their complementary strengths for crystal material generation. |
Subhojyoti Khastagir; KISHALAY DAS; Pawan Goyal; Seung-Cheol Lee; Satadeep Bhattacharjee; Niloy Ganguly; | code |
| 948 | EDELINE: Enhancing Memory in Diffusion-based World Models Via Linear-Time Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce EDELINE, a unified world model architecture that integrates state space models with diffusion models. |
Jia-Hua Lee; Bor-Jiun Lin; Wei-Fang Sun; Chun-Yi Lee; | code |
| 949 | ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present ChromFound, a foundation model tailored for scATAC-seq. |
Yifeng Jiao; Yuchen Liu; Yu Zhang; Xin Guo; Yushuai Wu; Chen Jiang; Jiyang Li; Hongwei Zhang; LIMEI HAN; Xin Gao; Yuan Qi; Yuan Cheng; | code |
| 950 | Variational Polya Tree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional techniques like Markov chain Monte Carlo (MCMC) face high computational complexity and scalability limitations, hindering the use of Bayesian nonparametric methods in deep learning. To tackle this, we introduce the variational Pólya tree (VPT) model, which employs stochastic variational inference to compute posterior distributions. |
Lu Xu; Tsai Hor Chan; Lequan Yu; Kwok Fai Lam; Guosheng Yin; | code |
| 951 | Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time in comparison to existing state-of-the-art methods. |
Yixiao Wang; Zishan Shao; Ting Jiang; Aditya Devarakonda; | code |
| 952 | CamSAM2: Segment Anything Accurately in Camouflaged Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem, we propose Camouflaged SAM2 (CamSAM2), which enhances SAM2’s ability to handle camouflaged scenes without modifying SAM2’s parameters. |
Yuli Zhou; Yawei Li; Yuqian Fu; Luca Benini; Ender Konukoglu; Guolei Sun; | code |
| 953 | CORE: Reducing UI Exposure in Mobile Agents Via Collaboration Between Cloud and Local LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose **CORE**, a **CO**llaborative framework that combines the strengths of cloud and local LLMs to **R**educe UI **E**xposure, while maintaining task accuracy for mobile agents. |
Gucongcong Fan; Chaoyue Niu; chengfei lv; Fan Wu; Guihai Chen; | code |
| 954 | FlowRefiner: A Robust Traffic Classification Framework Against Label Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FlowRefiner, a robust and general traffic classification framework against label noise. |
Mingwei Zhan; Ruijie Zhao; Xianwen Deng; Zhi Xue; Qi Li; Zhuotao Liu; Guang Cheng; Ke Xu; | code |
| 955 | Doodle to Detect: A Goofy But Powerful Approach to Skeleton-based Hand Gesture Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The proposed framework incorporates a novel learnable Dynamic Range Embedding (DRE) to preserve axis-wise motion magnitudes lost during normalization and visual graph representations, enabling richer and more discriminative feature learning. |
Sang Hoon Han; Seonho Lee; Hyeok Nam; Jae Hyeon Park; Min Hee Cha; Min Geol Kim; Hyunse Lee; Sangyeon Ahn; Chae moon ju; Sung In Cho; | code |
| 956 | Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Vireo, a novel single-stage framework for OV-DGSS that unifies the strengths of OVSS and DGSS for the first time. |
Siyu Chen; Ting Han; Chengzheng Fu; Changshe Zhang; Chaolei Wang; Jinhe Su; Guorong Cai; Meiliu Wu; | code |
| 957 | No Loss, No Gain: Gated Refinement and Adaptive Compression for Prompt Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods often struggle to stably generate improved prompts, leading to low efficiency, and overlook that prompt optimization easily gets trapped in local optima. Addressing this, we propose GRACE, a framework that integrates two synergistic strategies: Gated Refinement and Adaptive Compression, achieving Efficient prompt optimization. |
Wenhang Shi; Yiren Chen; Shuqing Bian; Xinyi Zhang; Kai Tang; Pengfei Hu; Zhe Zhao; WEI LU; Xiaoyong Du; | code |
| 958 | Vicinity-Guided Discriminative Latent Diffusion for Privacy-Preserving Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Discriminative Vicinity Diffusion (DVD), a novel LDM-based framework for a more practical variant of source-free domain adaptation (SFDA): the source provider may share not only a pre-trained classifier but also an auxiliary latent diffusion module, trained once on the source data and never exposing raw source samples. |
Jing Wang; Wonho Bae; Jiahong Chen; Wenxu Wang; Junhyug Noh; | code |
| 959 | Learning Provably Improves The Convergence of Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a deterministic initialization strategy to support our theoretical results and promote stable training over extended optimization horizons by mitigating gradient explosion. |
Qingyu Song; Wei Lin; Hong Xu; | code |
| 960 | ErrorTrace: A Black-Box Traceability Mechanism Based on Model Family Error Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing IP protection methods either require access to model parameters or are vulnerable to fine-tuning attacks. To fill this gap, we propose ErrorTrace, a robust and black-box traceability mechanism for protecting LLM IP. |
Chuanchao Zang; Xiangtao Meng; Wenyu Chen; Tianshuo Cong; Zha Yaxing; Dong Qi; Zheng Li; Shanqing Guo; | code |
| 961 | DSRF: A Dynamic and Scalable Reasoning Framework for Solving RPMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenges, we propose a Dynamic and Scalable Reasoning Framework (DSRF) that greatly enhances reasoning ability by widening the network instead of deepening it, and by dynamically adjusting the reasoning network to better fit novel samples rather than relying on a static network. |
Chengtai Li; Yuting He; Jianfeng Ren; Ruibin Bai; Yitian Zhao; Xudong Jiang; | code |
| 962 | From Softmax to Score: Transformers Can Effectively Implement In-Context Denoising Steps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transformers have emerged as powerful meta-learners, with growing evidence that they implement learning algorithms within their forward pass. We study this phenomenon in the context of denoising, presenting a unified framework that shows Transformers can implement (a) manifold denoising via Laplacian flows, (b) score-based denoising from diffusion models, and (c) a generalized form of anisotropic diffusion denoising. |
Paul Rosu; Lawrence Carin; Xiang Cheng; | code |
| 963 | CURE: Concept Unlearning Via Orthogonal Representation Editing in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce CURE, a training-free concept unlearning framework that operates directly in the weight space of pre-trained diffusion models, enabling fast, interpretable, and highly specific suppression of undesired concepts. |
Shristi Das Biswas; Arani Roy; Kaushik Roy; | code |
| 964 | MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While large language models (LLMs) excel in semantic understanding tasks, they struggle with the ambiguity and contextual nuance inherent in human communication. To bridge this gap, we introduce **MetaMind**, a multi-agent framework inspired by psychological theories of metacognition, designed to emulate human-like social reasoning. |
Xuanming Zhang; Yuxuan Chen; Samuel Yeh; Sharon Li; | code |
| 965 | Adaptive Cannistraci-Hebb Network Automata Modelling of Complex Networks for Path-based Link Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Cannistraci-Hebb (CH) theory provides a topological formulation of Hebbian learning, grounded on two pillars: (1) the **minimization of external links** within local communities, and (2) the **path-based definition of local communities** that capture homophilic (similarity-driven) interactions via paths of length 2 and synergetic (diversity-driven) interactions via paths of length 3. Building on this, we introduce the Cannistraci-Hebb Adaptive (CHA) network automata, an adaptive learning machine that automatically selects the optimal CH rule and path length to model each network. |
Jialin Zhao; Alessandro Muscoloni; Umberto Michieli; Yingtao Zhang; Carlo Vittorio Cannistraci; | code |
| 966 | Discretization-free Multicalibration Through Loss Minimization Over Tree Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a discretization-free multicalibration method that directly optimizes an empirical risk objective over an ensemble of depth-two decision trees. |
Hongyi Henry Jin; Zijun Ding; Dung Daniel Ngo; Steven Wu; | code |
| 967 | Statistical Analysis of An Adversarial Bayesian Weak Supervision Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: That label model constructs a polytope of plausible labelings using the LF predictions and outputs the center of that polytope as its proposed labeling. In this paper, we attempt to theoretically study that strategy by proposing Bayesian Balsubramani-Freund (BBF), a label model that implicitly constructs a polytope of plausible labelings and selects a labeling in its interior. |
Steven An; | code |
| 968 | Physics-informed Reduced Order Modeling of Time-dependent PDEs Via Differentiable Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **Ph**ysics-**i**nformed **ROM** ($\Phi$-ROM) by incorporating differentiable PDE solvers into the training procedure. |
Nima Hosseini Dashtbayaz; Hesam Salehipour; Adrian Butscher; Nigel J. W. Morris; | code |
| 969 | Enhancing Tactile-based Reinforcement Learning for Robotic Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its potential, the efficacy of tactile sensing in reinforcement learning (RL) remains inconsistent. We address this by developing self-supervised learning (SSL) methodologies to more effectively harness tactile observations, focusing on a scalable setup of proprioception and sparse binary contacts. |
Elle Miller; Trevor McInroe; David Abel; Oisin Mac Aodha; Sethu Vijayakumar; | code |
| 970 | SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose SubTrack++ that leverages Grassmannian gradient subspace tracking combined with projection-aware optimizers, enabling Adam’s internal statistics to adapt to subspace changes. |
Sahar Rajabi; Nayeema Nonta; Sirisha Rambhatla; | code |
| 971 | On Inductive Biases That Enable Generalization in Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Investigating a DiT’s pivotal attention modules, we find that the locality of attention maps in a DiT’s early layers is closely associated with generalization. |
Jie An; De Wang; Pengsheng Guo; Jiebo Luo; Alex Schwing; | code |
| 972 | Neuro-Spectral Architectures for Causal Physics-Informed Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, standard MLP-based PINNs often fail to converge when dealing with complex initial value problems, leading to solutions that violate causality and suffer from a spectral bias towards low-frequency components. To address these issues, we introduce NeuSA (Neuro-Spectral Architectures), a novel class of PINNs inspired by classical spectral methods, designed to solve linear and nonlinear PDEs with variable coefficients. |
Arthur Bizzi; Leonardo M. Moreira; Márcio Marques; Leonardo Mendonça; Christian Júnior de Oliveira; Vitor Balestro; Lucas dos Santos Fernandez; Daniel Yukimura; Pavel Petrov; João M. Pereira; Tiago Novello; Lucas Nissenbaum; | code |
| 973 | MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we introduce Mixture of Experts for MEMS Gyroscopes (MoE-Gyro), a novel self-supervised framework specifically designed for simultaneous over-range signal reconstruction and noise suppression. To bridge this gap, we introduce the IMU Signal Enhancement Benchmark (ISEBench), an open-source benchmarking platform comprising the GyroPeak-100 dataset and a unified evaluation of IMU signal enhancement methods. |
Feiyang Pan; Shenghe Zheng; Chunyan Yin; Guangbin Dou; | code |
| 974 | Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel end-to-end RD-optimized compression framework tailored for 4DGS, aiming to enable flexible, high-fidelity rendering across varied computational platforms. |
Hyeongmin Lee; Kyungjune Baek; | code |
| 975 | Incomplete Multi-view Clustering Via Hierarchical Semantic Alignment and Cooperative Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing deep incomplete multi-view clustering approaches often rely on static fusion strategies or two-stage pipelines, leading to suboptimal fusion results and error propagation issues. To address these limitations, this paper proposes a novel incomplete multi-view clustering framework based on Hierarchical Semantic Alignment and Cooperative Completion (HSACC). |
Xiaojian Ding; Lin Zhao; Xian Li; Xiaoying Zhu; | code |
| 976 | Text-Aware Real-World Image Super-Resolution Via Diffusion Model with Joint Segmentation Decoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel diffusion-based SR framework, namely TADiSR, which integrates text-aware attention and joint segmentation decoders to recover not only natural details but also the structural fidelity of text regions in degraded real-world images. |
Qiming Hu; Linlong Fan; luoyiyan; Yuhang Yu; Xiaojie Guo; Qingnan Fan; | code |
| 977 | Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Developing upon Utility Theory and leveraging the textual-reasoning capabilities of Large Language Models (LLMs), this paper proposes an Adaptive Textual-symbolic Human-centric Reasoning framework (ATHENA) to address the optimal information integration. |
Yibo Zhao; Yang Zhao; Hongru Du; Hao Frank Yang; | code |
| 978 | Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This can lead to updates that predominantly reinforce or modulate the existing stream direction, potentially underutilizing the module’s capacity for learning entirely novel features. In this work, we introduce _Orthogonal Residual Update_: we decompose the module’s output relative to the input stream and add only the component orthogonal to this stream. |
Giyeong Oh; Woohyun Cho; Siyeol Kim; Suhwan Choi; Youngjae Yu; | code |
| 979 | Frequency-Aware Token Reduction for Efficient Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a frequency-aware token reduction strategy that improves computational efficiency while preserving performance by mitigating rank collapsing. |
Dong-Jae Lee; Jiwan Hur; Jaehyun Choi; Jaemyung Yu; Junmo Kim; | code |
| 980 | Doubly Robust Alignment for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address model misspecification, we propose a doubly robust preference optimization algorithm that remains consistent when either the preference model or the reference policy is correctly specified (without requiring both). |
Erhan Xu; Kai Ye; Hongyi Zhou; Luhan Zhu; Francesco Quinzan; Chengchun Shi; | code |
| 981 | Synergistic Tensor and Pipeline Parallelism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new synergistic tensor and pipeline parallelism schedule that simultaneously reduces both types of bubbles. |
Mengshi Qi; Jiaxuan Peng; Jie Zhang; Juan Zhu; Yong Li; Huadong Ma; | code |
| 982 | Unlearned But Not Forgotten: Data Extraction After Exact Unlearning in LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Of these, exact unlearning—which retrains the model from scratch without the target data—is widely regarded as the gold standard for mitigating privacy risks in deployment. In this paper, we revisit this assumption in a practical deployment setting where both the pre- and post-unlearning logits API are exposed, such as in open-weight scenarios. |
Xiaoyu Wu; Yifei Pang; Terrance Liu; Steven Wu; | code |
| 983 | MemEIC: A Step Toward Continual and Compositional Knowledge Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This prevalent practice neglects the inherent multimodality of LVLMs and the continuous nature of knowledge updates, potentially leading to suboptimal editing outcomes when considering the interplay between modalities and the need for ongoing knowledge refinement. To address these limitations, we propose MemEIC, a novel method for Continual and Compositional Knowledge Editing (CCKE) in LVLMs. |
Jin Seong; Jiyun Park; Wencke Liermann; Hongseok Choi; Yoonji Nam; Hyun Kim; Soojong Lim; Namhoon Lee; | code |
| 984 | Gaze-VLM: Bridging Gaze and VLMs Through Attention Regularization for Egocentric Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a gaze-regularized framework that enhances VLMs for two key egocentric understanding tasks: fine-grained future event prediction and current activity understanding. |
Anupam Pani; Yanchao Yang; | code |
| 985 | PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided By Vision-Language Planar Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in scenes dominated by large and low-texture regions, common in indoor environments, the photometric loss used to optimize 3DGS yields ambiguous geometry and fails to recover high-fidelity 3D surfaces. To overcome this limitation, we introduce PlanarGS, a 3DGS-based framework tailored for indoor scene reconstruction. |
Xirui Jin; Renbiao Jin; Boying Li; Danping Zou; Wenxian Yu; | code |
| 986 | Bridging Equivariant GNNs and Spherical CNNs for Structured Physical Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces G2Sphere, a general method for mapping object geometries to spherical signals. |
Colin Kohler; Purvik Patel; Nathan Vaska; Justin Goodwin; Matthew C. Jones; Robert Platt; Rajmonda S. Caceres; Robin Walters; | code |
| 987 | Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the use of small neural networks — which we refer to as the InfluenceNetwork — to estimate influence values, achieving up to 99% cost reduction. |
Ishika Agarwal; Dilek Hakkani-Tür; | code |
| 988 | Prompt-guided Disentangled Representation for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Prompt-guided Disentangled Representation for Action Recognition (ProDA), a novel framework that disentangles any specified actions from a multi-action scene. |
wu tianci; Guangming Zhu; Lu jiang; Siyuan Wang; Ning Wang; Nuoye Xiong; zhang liang; | code |
| 989 | Localist Topographic Expert Routing: A Barrel Cortex-Inspired Modular Network for Sensorimotor Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a brain-inspired modular architecture that treats the barrel cortex as a biologically constrained instantiation of an expert system. |
Tianfang Zhu; Dongli Hu; Jiandong Zhou; Kai Du; Anan LI; | code |
| 990 | Towards Syn-to-Real IQA: A Novel Perspective on Reshaping Synthetic Data Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make a key observation that representations learned from synthetic datasets often exhibit a discrete and clustered pattern that hinders regression performance: features of high-quality images cluster around reference images, while those of low-quality images cluster based on distortion types. |
Aobo Li; Jinjian Wu; Yongxu Liu; Leida Li; Weisheng Dong; | code |
| 991 | StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as the sequence length scales up, the memory cost for storing activation values becomes huge during the Backpropagation (BP) process, even with the application of gradient checkpointing technique. To tackle this challenge, we propose a *memory-efficient* and *exact* BP method called **StreamBP**, which performs a linear decomposition of the chain rule along the sequence dimension in a layer-wise manner, significantly reducing the memory cost of activation values and logits. |
Qijun Luo; Mengqi Li; Lei Zhao; Xiao Li; | code |
| 992 | Soft Task-Aware Routing of Experts for Equivariant Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this design overlooks information shared between invariant and equivariant learning, which leads to redundant feature learning and inefficient use of model capacity. To address this, we introduce **S**oft **T**ask-**A**ware **R**outing (STAR), a routing strategy for projection heads that models them as experts. |
Jaebyeong Jeon; Hyeonseo Jang; Jy-yong Sohn; Kibok Lee; | code |
| 993 | SNAP: Low-Latency Test-Time Adaptation with Sparse Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods rely on frequent adaptation and high computational cost, making them unsuitable for resource-constrained edge environments. To address this, we propose SNAP, a sparse TTA framework that reduces adaptation frequency and data usage while preserving accuracy. |
Hyeongheon Cha; Dong Min Kim; Hye Won Chung; Taesik Gong; Sung-Ju Lee; | code |
| 994 | Wavelet Canonical Coherence for Nonstationary Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the growing interest in multivariate time series analysis, existing methods for between-cluster dependence typically rely on the assumption of stationarity and lack the temporal resolution to capture transient, frequency-specific interactions. To overcome this limitation, we propose scale-specific wavelet canonical coherence (WaveCanCoh), a novel framework that extends canonical coherence analysis to the nonstationary setting by leveraging the multivariate locally stationary wavelet model. |
Haibo Wu; Marina I. Knight; Keiland W. Cooper; Norbert J. Fortin; Hernando Ombao; | code |
| 995 | Deep Compositional Phase Diffusion for Long Motion Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when employing these models to create composite sequences containing multiple semantically generated motion clips, they often struggle to preserve the continuity of motion dynamics at the transition boundaries between clips, resulting in awkward transitions and abrupt artifacts. To address these challenges, we present Compositional Phase Diffusion, which leverages the Semantic Phase Diffusion Module (SPDM) and Transitional Phase Diffusion Module (TPDM) to progressively incorporate semantic guidance and phase details from adjacent motion clips into the diffusion process. |
Ho Yin Au; Jie Chen; Junkun Jiang; Jingyu Xiang; | code |
| 996 | ImageNet-trained CNNs Are Not Biased Towards Texture: Revisiting Feature Reliance Through Controlled Suppression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit this hypothesis by examining limitations in the cue-conflict experiment by Geirhos et al. To address these limitations, we propose a domain-agnostic framework that quantifies feature reliance through systematic suppression of shape, texture, and color cues, avoiding the confounds of forced-choice conflicts. |
Tom Burgert; Oliver Stoll; Paolo Rota; Begüm Demir; | code |
| 997 | FocalCodec: Low-Bitrate Speech Coding Via Focal Modulation Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches face limitations, including high bitrates, the loss of either semantic or acoustic information, and the reliance on multi-codebook designs when trying to capture both, which increases architectural complexity for downstream tasks. To address these challenges, we introduce FocalCodec, an efficient low-bitrate codec based on focal modulation that utilizes a single binary codebook to compress speech between 0.16 and 0.65 kbps. |
Luca Della Libera; Francesco Paissan; Cem Subakan; Mirco Ravanelli; | code |
| 998 | Self-supervised Learning of Echocardiographic Video Representations Via Online Cluster Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DISCOVR (Distilled Image Supervision for Cross Modal Video Representation), a self-supervised dual-branch framework for cardiac ultrasound video representation learning. |
Divyanshu Mishra; Mohammadreza Salehi; Pramit Saha; Olga Patey; Aris T Papageorghiou; Yuki M Asano; Alison Noble; | code |
| 999 | FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their one-shot generation lacks stepwise physical consistency and struggles with complex stochastic processes. To address these issues, we propose FlowDAS, a generative DA framework that employs stochastic interpolants to learn state transition dynamics through step-by-step stochastic updates. |
Siyi Chen; Yixuan Jia; Qing Qu; He Sun; Jeffrey A Fessler; | code |
| 1000 | FlashMoE: Fast Distributed MoE in A Single Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing implementations suffer from low GPU utilization, significant latency overhead, and a fundamental inability to leverage task locality, primarily due to CPU-managed scheduling, host-initiated communication, and frequent kernel launches. To overcome these limitations, we develop FlashMoE, a fully GPU-resident MoE operator that fuses expert computation and inter-GPU communication into a single persistent GPU kernel. |
Osayamen Jonathan Aimuyo; Byungsoo Oh; Rachee Singh; | code |
| 1001 | Spike4DGS: Towards High-Speed Dynamic Scene Rendering with 4D Gaussian Splatting Via A Spike Camera Array Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods struggle with dynamic motion, while a single camera suffers from limited spatial coverage, making it challenging to reconstruct fine details in high-speed scenes. To address these problems, we propose Spike4DGS, the first high-speed dynamic scene rendering framework with 4D Gaussian Splatting using spike camera arrays. |
Qinghong Ye; Yiqian Chang; Jianing Li; Haoran Xu; Xuan Wang; Wei Zhang; Yonghong Tian; Peixi Peng; | code |
| 1002 | Bandit Guided Submodular Curriculum for Adaptive Subset Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce OnlineSubmod, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. |
Prateek Chanda; Prayas Agrawal; Saral Sureka; Lokesh Reddy Polu; Atharv Kshirsagar; Ganesh Ramakrishnan; | code |
| 1003 | Context-Aware Hierarchical Learning: A Two-Step Paradigm Towards Safer LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA), which exploits function-calling mechanisms to subvert model behavior. To evaluate LLM robustness against such threats, we introduce the Tool-Completion benchmark, a comprehensive security assessment framework, which reveals that even state-of-the-art models remain susceptible to TCA, with surprisingly high attack success rates. |
Tengyun Ma; Jiaqi Yao; Daojing He; Shihao Peng; YU LI; Shaohui Liu; Zhuotao Tian; | code |
| 1004 | Per-Architecture Training-Free Metric Optimization for Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, these methods typically optimize global metric combinations over the entire search space, overlooking the varying sensitivities of different architectures to specific metrics, which may limit the final architectures’ performance. To address these challenges, we propose the Per-Architecture Training-Free Metric Optimization NAS (PO-NAS) algorithm. |
Mingzhuo Lin; Jianping Luo; | code |
| 1005 | Tree-Guided Diffusion Planner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Tree-guided Diffusion Planner (TDP), a zero-shot test-time planning framework that balances exploration and exploitation through structured trajectory generation. |
Hyeonseong Jeon; Cheolhong Min; Jaesik Park; | code |
| 1006 | Holistic Order Prediction in Natural Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At its core, InstaFormer relies on interactions between object queries and latent mask descriptors that semantically represent the same objects while carrying complementary information. We comprehensively benchmark and ablate our approach to highlight its effectiveness. |
Pierre Musacchio; Hyunmin Lee; Jaesik Park; | code |
| 1007 | FerretNet: Efficient Synthetic Image Detection Via Local Pixel Dependencies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building upon LPD, we propose FerretNet, a lightweight neural network with only 1.1M parameters that delivers efficient and robust synthetic image detection. |
Shuqiao Liang; Jian Liu; Chen Renzhang; Quanlong Guan; | code |
| 1008 | Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing online methods face the challenge of prohibitive storage requirements, primarily due to point-wise modeling that fails to exploit motion properties. To address this limitation, we propose a novel Compact Gaussian Streaming (ComGS) framework that leverages the locality and consistency of motion in dynamic scenes to model object-consistent Gaussian point motion through a keypoint-driven motion representation. |
Jiacong Chen; Qingyu Mao; Youneng Bao; Xiandong MENG; Fanyang Meng; Ronggang Wang; Yongsheng Liang; | code |
| 1009 | Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: SDForger is a flexible and efficient framework for generating high-quality multivariate time series using LLMs. Leveraging a compact data representation, SDForger provides … |
Cécile Rousseau; Tobia Boschi; Giandomenico Cornacchia; Dhaval Salwala; Alessandra Pascale; Juan Bernabe Moreno; | code |
| 1010 | Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces a hybrid non-Euclidean optimization method which generalizes gradient norm clipping by combining steepest descent and conditional gradient approaches. |
Thomas Pethick; Wanyun Xie; Mete Erdogan; Kimon Antonakopoulos; Tony Silveti-Falls; Volkan Cevher; | code |
| 1011 | 3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, many of these tactics are semantically similar or lead to an execution error, wasting valuable resources in both cases. We address the problem of effectively pruning this search, using only synthetic data generated from previous proof attempts. |
Sean Lamont; Christian Walder; Amir Dezfouli; Paul Montague; Michael Norrish; | code |
| 1012 | JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensemble Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose JAMUN which performs MD in a smoothed, noised space of all-atom 3D conformations of molecules by utilizing the framework of walk-jump sampling. |
Ameya Daigavane; Bodhi P. Vani; Darcy Davidson; Saeed Saremi; Joshua A Rackers; Joseph Kleinhenz; | code |
| 1013 | RAPTR: Radar-based 3D Pose Estimation Using Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose \textbf{RAPTR} (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels which are considerably easier and more scalable to collect. |
Sorachi Kato; Ryoma Yataka; Pu Perry Wang; Pedro Miraldo; Takuya Fujihashi; Petros Boufounos; | code |
| 1014 | LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify two key limitations in these approaches: (i) they often suffer from instability, especially during the early stages of fine-tuning, resulting in suboptimal convergence and degraded performance on clean data, and (ii) they exhibit a suboptimal trade-off between robustness and clean data accuracy, hindering the simultaneous optimization of both objectives. To overcome these challenges, we propose **L**agrangian-**O**ptimized **R**obust **E**mbeddings (LORE), a novel unsupervised adversarial fine-tuning framework. |
Borna khodabandeh; Amirabbas Afzali; Amirhossein Afsharrad; Seyed Shahabeddin Mousavi; Sanjay Lall; Sajjad Amini; Seyed-Mohsen Moosavi-Dezfooli; | code |
| 1015 | One Head to Rule Them All: Amplifying LVLM Safety Through A Single Critical Attention Head Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although these approaches provide a certain level of protection, they tend to be resource-intensive and struggle to effectively counter sophisticated attack techniques. To tackle such issues, we propose One-head Defense (Oh Defense), a novel yet simple approach utilizing LVLMs’ internal safety capabilities. |
Junhao Xia; Haotian Zhu; Shuchao Pang; Zhigang Lu; Bing Li; Yongbin Zhou; Jason Xue; | code |
| 1016 | Improving The Straight-Through Estimator with Zeroth-Order Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of training neural networks with quantized parameters. |
Ningfeng Yang; Tor M. Aamodt; | code |
| 1017 | TokenSwap: A Lightweight Method to Disrupt Memorized Sequences in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce TokenSwap, a lightweight, post-hoc defense designed for realistic settings where the user can only access token-level outputs. |
Parjanya Prajakta Prashant; Kaustubh Ponkshe; Babak Salimi; | code |
| 1018 | Towards Prospective Medical Image Reconstruction Via Knowledge-Informed Dynamic Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, training on simulated pairs commonly leads to performance degradation on real prospective data due to the retrospective-to-prospective gap caused by incomplete imaging knowledge in simulation. To address this challenge, this paper introduces imaging Knowledge-Informed Dynamic Optimal Transport (KIDOT), a novel dynamic optimal transport framework that conceptualizes reconstruction as finding a dynamic transport path while preserving consistency with imaging physics along the transport. |
Taoran Zheng; Yan Yang; Xing Li; Xiang Gu; Jian Sun; Zongben Xu; | code |
| 1019 | Unveiling Extraneous Sampling Bias with Data Missing-Not-At-Random Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a more practical scenario: the joint distribution of the feature and rating is the same in the training and test sets. |
Chunyuan Zheng; Haocheng Yang; Haoxuan Li; Mengyue Yang; | code |
| 1020 | SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, removing the self-attention mechanism not only minimally impacts the tracker's regression predictions but also tends to generate more latent candidate boxes. Based on these analyses, we present SynCL, a novel plug-and-play synergistic training strategy designed to co-facilitate multi-task learning for detection and tracking. |
Shubo Lin; Yutong Kou; Zirui Wu; Shaoru Wang; Bing Li; Weiming Hu; Jin Gao; | code |
| 1021 | SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recognizing that visual appearance and motion patterns share fundamental physical laws in the real world, we propose a novel framework that combines visual priors and dynamic constraints within a synchronized diffusion process to generate the HOI video and motion simultaneously. |
Lingwei Dang; Ruizhi Shao; Hongwen Zhang; Wei MIN; Yebin Liu; Qingyao Wu; | code |
| 1022 | CCL: Causal-aware In-context Learning for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by causal representation learning, we propose causal-aware in-context learning (CCL). |
Hoyoon Byun; Gyeongdeok Seo; Joonseong Kang; Taero Kim; Jihee Kim; Kyungwoo Song; | code |
| 1023 | Mitigating Spurious Features in Contrastive Learning with Spectral Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify a key spectral signature of this failure: early reliance on dominant singular modes of the learned feature matrix. To mitigate this, we propose a novel framework that promotes a uniform eigenspectrum of the feature covariance matrix, encouraging diverse and semantically rich representations. |
Naghmeh Ghanooni; Waleed Mustafa; Dennis Wagner; Sophie Fellenz; Anthony Widjaja Lin; Marius Kloft; | code |
| 1024 | Information-Driven Design of Imaging Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We developed the theoretical foundations and a practical method to directly quantify mutual information between noisy measurements and unknown objects. |
Henry Pinkard; Leyla A Kabuli; Eric Markley; Tiffany Chien; Jiantao Jiao; Laura Waller; | code |
| 1025 | From Indicators to Insights: Diversity-Optimized for Medical Series-Text Decoding Via LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose InDiGO, a knowledge-aware evolutionary learning framework that integrates clinical signals and decision-making indicators through iterative optimization. |
Xiyuan Jin; Jing Wang; Ziwei Lin; Qianru Jia; Yuqing Huang; Xiaojun Ning; Zhonghua Shi; Youfang Lin; | code |
| 1026 | Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR), aimed at improving model generalization to unseen actions during inference. |
Jingmin Zhu; Anqi Zhu; Hossein Rahmani; Jun Liu; Mohammed Bennamoun; Qiuhong Ke; | code |
| 1027 | Tracking and Understanding Object Transformations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods often lose track of the target object after transformation, due to significant changes in object appearance. To address this limitation, we introduce the task of Track Any State: tracking objects through transformations while detecting and describing state changes, accompanied by a new benchmark dataset, VOST-TAS. |
Yihong Sun; Xinyu Yang; Jennifer J. Sun; Bharath Hariharan; | code |
| 1028 | Differentiable Constraint-Based Causal Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Constraint-based methods offer rigorous causal discovery but are often hindered by small sample sizes, while score-based methods provide flexible optimization but typically forgo explicit conditional independence testing. |
Jincheng Zhou; Mengbo Wang; Anqi He; Yumeng Zhou; Hessam Olya; Murat Kocaoglu; Bruno Ribeiro; | code |
| 1029 | Reasoning Planning for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit this assumption through a rigorous theoretical analysis, deriving accuracy bounds for standard aggregation methods under fixed generation distributions and candidate sizes. Building on these insights, we introduce EPIC, an Ensemble Planning with Contrastive learning framework to learn a shared representation space that captures both model reasoning abilities and query-method compatibility. |
Bao Nguyen; Hieu Trung Nguyen; Ruifeng She; Xiaojin Fu; Viet Anh Nguyen; | code |
| 1030 | BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the elegance of this approach, obtaining these likelihoods requires computing costly Jacobians during integration, which is impractical for large molecular systems. To overcome this difficulty, we train an energy-based model (EBM) to approximate likelihoods using both noise contrastive estimation (NCE) and score matching, which we show outperforms the use of either objective in isolation. |
Rishal Aggarwal; Jacky Chen; Nicholas Matthew Boffi; David Koes; | code |
| 1031 | Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MaxUCB, a max $k$-armed bandit method to trade off exploring different model classes and conducting hyperparameter optimization. |
Amir Rezaei Balef; Claire Vernade; Katharina Eggensperger; | code |
| 1032 | FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent studies reveal that these models often replicate and amplify societal biases, particularly along demographic attributes like gender and race. In this paper, we introduce FairImagen (https://github.com/fuzihaofzh/FairImagen), a post-hoc debiasing framework that operates on prompt embeddings to mitigate such biases without retraining or modifying the underlying diffusion model. |
Zihao Fu; Ryan Brown; Shun Shao; Kai Rawal; Eoin D. Delaney; Chris Russell; | code |
| 1033 | A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this field, existing works typically use all available samples from sources in training, which constrains their training efficiency and may lead to suboptimal results. To address this, we propose a theoretical framework that answers the question: what is the optimal quantity of source samples needed from each source task to jointly train the target model? |
Qingyue Zhang; Haohao Fu; Guanbo Huang; Yaoyuan Liang; Chang Chu; Tianren Peng; Yanru Wu; Qi Li; Yang Li; Shao-Lun Huang; | code |
| 1034 | C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches neglect how input characteristics affect the predictive uncertainty estimates. To address this limitation, we propose Contextual Low-Rank Adaptation (**C-LoRA**) as a novel uncertainty-aware and parameter efficient fine-tuning approach, by developing new lightweight LoRA modules contextualized to each input data sample to dynamically adapt uncertainty estimates. |
Amir Hossein Rahmati; Sanket Jantre; Weifeng Zhang; Yucheng Wang; Byung-Jun Yoon; Nathan Urban; Xiaoning Qian; | code |
| 1035 | MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although intra-layer selective updates have been explored, a general mechanism that enables fine-grained control while ensuring convergence guarantees is still lacking. To bridge this gap, we propose **MGUP**, a novel mechanism for selective updates. |
Da Chang; Ganzhao Yuan; | code |
| 1036 | Scalable Valuation of Human Feedback Through Provably Robust Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that no existing alignment methods satisfy this property. To address this, we propose Hölder-DPO, the first principled alignment loss with a provable redescending property, enabling estimation of the clean data distribution from noisy feedback. |
Masahiro Fujisawa; Masaki Adachi; Michael A Osborne; | code |
| 1037 | Accelerating Visual-Policy Learning Through Parallel Differentiable Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. |
Haoxiang You; Yilang Liu; Ian Abraham; | code |
| 1038 | Adaptive 3D Reconstruction Via Diffusion Priors and Forward Curvature-Matching Likelihood Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent diffusion-based methods have attempted to address this by combining prior models with likelihood updates, but they rely on heuristic fixed step sizes for the likelihood update that lead to slow convergence and suboptimal reconstruction quality. We advance this line of approach by integrating our novel Forward Curvature-Matching (FCM) update method with diffusion sampling. |
Seunghyeok Shin; Dabin Kim; Hongki Lim; | code |
| 1039 | JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they are inefficient since only masked tokens contribute to loss calculations at each training step. To address these limitations, we introduce JanusDNA, the first bidirectional DNA foundation model built upon a novel pretraining paradigm, integrating the optimization efficiency of autoregressive modeling with the bidirectional comprehension capability of masked modeling. |
Qihao Duan; Bingding Huang; Zhenqiao Song; Irina Lehmann; Lei Gu; Roland Eils; Benjamin Wild; | code |
| 1040 | Elastic ViTs from Pretrained Models Without Retraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contributions include an efficient pruning strategy for pretrained Vision Transformers, a novel evolutionary approximation of Hessian off-diagonal structures, and a self-supervised importance scoring mechanism that maintains strong performance without requiring retraining or labels. |
Walter Simoncini; Michael Dorkenwald; Tijmen Blankevoort; Cees G. M. Snoek; Yuki M Asano; | code |
| 1041 | Improving Regret Approximation for Unsupervised Dynamic Environment Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Dynamic Environment Generation for UED (DEGen) to enable a denser level generator reward signal, reducing the difficulty of credit assignment and allowing for UED to scale to larger environment sizes. |
Harry Mead; Bruno Lacerda; Jakob Nicolaus Foerster; Nick Hawes; | code |
| 1042 | FrameShield: Adversarially Robust Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this limitation, pseudo-labels generated directly from the model can enable frame-level adversarial training; however, these pseudo-labels are inherently noisy, significantly degrading performance. We therefore introduce a novel Pseudo-Anomaly Generation method called Spatiotemporal Region Distortion (SRD), which creates synthetic anomalies by applying severe augmentations to localized regions in normal videos while preserving temporal consistency. |
Mojtaba Nafez; Mobina Poulaei; Nikan Vasei; Bardia soltani moakhar; Mohammad Sabokrou; Mohammad Hossein Rohban; | code |
| 1043 | How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the finding that tokens are remarkably redundant, leading to substantial inefficiency. |
Tuan Anh Tran; Duy Minh Ho Nguyen; Hoai-Chau Tran; Michael Barz; Khoa D Doan; Roger Wattenhofer; Vien Anh Ngo; Mathias Niepert; Daniel Sonntag; Paul Swoboda; | code |
| 1044 | Differentiation Through Black-Box Quadratic Programming Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This integration limits their applicability, including their use in neural network architectures and bi-level optimization tasks, restricting users to a narrow selection of solver choices. To address this limitation, we introduce **dQP**, a modular and solver-agnostic framework for plug-and-play differentiation of virtually any QP solver. |
Connor W. Magoon; Fengyu Yang; Noam Aigerman; Shahar Z. Kovalsky; | code |
| 1045 | Prior-Guided Diffusion Planning for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing guided sampling strategies such as Classifier Guidance, Classifier-Free Guidance, and Monte Carlo Sample Selection either produce suboptimal multi-modal actions, struggle with distributional drift, or incur prohibitive inference-time costs. To address these challenges, we propose ***Prior Guidance*** (PG), a novel guided sampling framework that replaces the standard Gaussian prior of a behavior-cloned diffusion model with a learnable distribution, optimized via a behavior-regularized objective. |
Donghyeon Ki; JunHyeok Oh; Seong-Woong Shim; Byung-Jun Lee; | code |
| 1046 | BlockDecoder: Boosting ASR Decoders with Context and Merger Modules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe a systematic pattern across the attention distributions of decoder layers in prior architectures: the initial layers direct most attention towards building textual context, while the later layers largely focus on merging acoustic and textual information for the final predictions. Leveraging this key insight, we propose **BlockDecoder**, a novel decoder architecture comprising two distinct components: a text encoder that is purely text-based, and a **Merger** that combines information from the audio encoder and text encoder to generate output tokens. |
Darshan Prabhu; Preethi Jyothi; | code |
| 1047 | Understanding and Enhancing Mask-Based Pretraining Towards Universal Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that the behavior of mask-based pretraining can be directly characterized by test risk in high-dimensional minimum-norm (ridge-less) linear regression, without relying on further model specifications. |
Mingze Dong; Leda Wang; Yuval Kluger; | code |
| 1048 | Availability-aware Sensor Fusion Via Unified Canonical Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents availability-aware sensor fusion (ASF), a novel method that employs unified canonical projection (UCP) to enable consistency in all sensor features for fusion and cross-attention across sensors along patches (CASAP) to enhance robustness of sensor fusion against sensor degradation and failure. |
Dong-Hee Paek; Seung-Hyun Kong; | code |
| 1049 | Space Group Equivariant Crystal Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SGEquiDiff, a crystal generative model which naturally handles space group constraints with space group invariant likelihoods. |
Rees Chang; Angela Pak; Alex Guerra; Ni Zhan; Nick Richardson; Elif Ertekin; Ryan P Adams; | code |
| 1050 | Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it remains challenging in practice due to the need to learn from datasets with limited coverage of the state-action space and to generalize across long-horizon tasks. To improve on these challenges, we propose a Physics-informed (Pi) regularized loss for value learning, derived from the Eikonal Partial Differential Equation (PDE) and which induces a geometric inductive bias in the learned value function. |
Vittorio Giammarino; Ruiqi Ni; Ahmed H Qureshi; | code |
| 1051 | Top-H Decoding: Adapting The Creativity and Coherence with Bounded Entropy in Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To effectively incorporate the model confidence, this paper presents **_top-H_ decoding**. |
Erfan Baghaei Potraghloo; Seyedarmin Azizi; Souvik Kundu; Massoud Pedram; | code |
| 1052 | What Moves The Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At an opposite end of the modeling spectrum, cognitively inspired mechanistic models aim to explain scanpath behavior through interpretable cognitive mechanisms but lag far behind in predictive accuracy. In this work, we bridge this gap by using a high-performing deep model—DeepGaze III—to discover and test mechanisms that improve a leading mechanistic model, SceneWalk. |
Federico D’Agostino; Lisa Schwetlick; Matthias Bethge; Matthias Kuemmerer; | code |
| 1053 | Scalable and Adaptive Prediction Bands with Kernel Sum-of-squares Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we build upon recent ideas that rely on recasting the CP problem as a statistical learning problem, directly targeting coverage and adaptivity. |
Louis Allain; Sébastien Da Veiga; Brian Staber; | code |
| 1054 | UGoDIT: Unsupervised Group Deep Image Prior Via Transferable Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **UGoDIT**—an **U**nsupervised **G**r**o**up **DI**P with **T**ransferable weights—designed for the low-data regime where only a very small number, $M$, of sub-sampled measurement vectors are available during training. |
Shijun Liang; Ismail Alkhouri; Siddhant Gautam; Qing Qu; Saiprasad Ravishankar; | code |
| 1055 | EVODiff: Entropy-aware Variance Optimized Diffusion Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. |
Shigui Li; Wei Chen; Delu Zeng; | code |
| 1056 | Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers, and Gradient Clipping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this gap, we establish **the first benchmark for FL with DP** in end-to-end ASR. Our approach centers on per-layer clipping and layer-wise gradient normalization: theoretical analysis reveals that these techniques together mitigate clipping bias and gradient heterogeneity across layers in deeper models. |
Martin Pelikan; Sheikh Shams Azam; Vitaly Feldman; Jan Silovsky; Kunal Talwar; Christopher Brinton; Tatiana Likhomanenko; | code |
| 1057 | Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Observation Delays Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work provides a novel perspective on multi-agent delayed observation problems and offers an effective solution framework. |
Songchen Fu; Siang Chen; Shaojing Zhao; Letian Bai; Hong Liang; Ta Li; Yonghong Yan; | code |
| 1058 | Learning Relative Gene Expression Trends from Pathology Images in Spatial Transcriptomics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the complexity of the sequencing techniques and intrinsic variability across cells, the observed gene expression contains stochastic noise and batch effects, and estimating the absolute expression values accurately remains a significant challenge. To mitigate this, we propose a novel objective of learning relative expression patterns rather than absolute levels. |
Kazuya Nishimura; Haruka Hirose; Ryoma Bise; Kaito Shiku; Yasuhiro Kojima; | code |
| 1059 | Object-Centric Representation Learning for Enhanced 3D Semantic Scene Graph Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate through extensive analysis that the quality of object features plays a critical role in determining overall scene graph accuracy. |
KunHo Heo; GiHyeon Kim; SuYeon Kim; MyeongAh Cho; | code |
| 1060 | Riemannian Flow Matching for Brain Connectivity Matrices Via Pullback Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose DiffeoCFM, an approach that enables conditional flow matching (CFM) on matrix manifolds by exploiting pullback metrics induced by global diffeomorphisms on Euclidean spaces. |
Antoine Collas; Ce Ju; Nicolas Salvy; Bertrand Thirion; | code |
| 1061 | Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts the classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro actions via Vector-Quantized VAE. |
Jian-Ting Guo; Yu-Cheng Chen; Ping-Chun Hsieh; Kuo-Hao Ho; Po-Wei Huang; Ti-Rong Wu; I-Chen Wu; | code |
| 1062 | FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We begin by investigating the internal mechanisms underlying associative behavior in MLLMs and find that: (1) middle layers play a pivotal role in shaping the model’s associative tendencies, (2) modifying representations in these layers effectively regulates associative reasoning strength, and (3) hallucinations can be exploited to derive steering vectors that guide this modulation. Building on these findings, we introduce Flexible Association Control (FlexAC), a lightweight and training-free framework for modulating associative behavior in MLLMs. |
Shengming Yuan; Xinyu Lyu; Shuailong Wang; Beitao Chen; Jingkuan Song; Lianli Gao; | code |
| 1063 | SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our contributions are threefold: (i) we formulate finding realistic attacks for hallucination elicitation as a constrained optimization problem over the input prompt space under semantic equivalence and coherence constraints; (ii) we introduce a constraint-preserving zeroth-order method to effectively search for adversarial yet feasible prompts; and (iii) we demonstrate through experiments on open-ended multiple-choice question answering tasks that SECA achieves higher attack success rates while incurring almost no constraint violations compared to existing methods. |
Buyun Liang; Liangzu Peng; Jinqi Luo; Darshan Thaker; Kwan Ho Ryan Chan; Rene Vidal; | code |
| 1064 | A Clean Slate for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To resolve opaque algorithmic design, we provide clean, minimalistic, single-file implementations of various model-free and model-based offline RL methods, significantly enhancing clarity and achieving substantial speed-ups. Leveraging these streamlined implementations, we propose Unifloral, a unified algorithm that encapsulates diverse prior approaches and enables development within a single, comprehensive hyperparameter space. |
Matthew Thomas Jackson; Uljad Berdica; Jarek Luca Liesen; Shimon Whiteson; Jakob Nicolaus Foerster; | code |
| 1065 | Fast MRI for All: Bridging Access Gaps By Training Without Raw Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is especially an issue for rural and under-resourced areas, where commercial MRI scanners only provide access to a final reconstructed image. To tackle these challenges, we propose Compressibility-inspired Unsupervised Learning via Parallel Imaging Fidelity (CUPID) for high-quality PD-DL training using only routine clinical reconstructed images exported from an MRI scanner. |
Yasar Utku Alcalar; Merve Gulle; Mehmet Akcakaya; | code |
| 1066 | Proxy-SPEX: Sample-Efficient Interpretability Via Sparse Feature Interactions in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe that LLM feature interactions are often *hierarchical*—higher-order interactions are accompanied by their lower-order subsets—which enables more efficient discovery. |
Landon Butler; Abhineet Agarwal; Justin Singh Kang; Yigit Efe Erginbas; Bin Yu; Kannan Ramchandran; | code |
| 1067 | Towards Interpretable and Efficient Attention: Compressing All By Contracting A Few Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified optimization objective that derives inherently interpretable and efficient attention mechanisms through algorithm unrolling. |
Qishuai Wen; Zhiyuan Huang; Chun-Guang Li; | code |
| 1068 | HyPINO: Multi-Physics Neural Operators Via HyperPINNs and The Method of Manufactured Solutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HyPINO, a multi-physics neural operator designed for zero-shot generalization across a broad class of parametric PDEs without requiring task-specific fine-tuning. |
Rafael Bischof; Michal Piovarci; Michael Anton Kraus; Siddhartha Mishra; Bernd Bickel; | code |
| 1069 | Zero-shot Denoising Via Neural Compression: Theoretical and Algorithmic Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the *Zero-Shot Neural Compression Denoiser* (ZS-NCD), a novel denoising framework based on neural compression. |
Ali Zafari; Xi Chen; Shirin Jalali; | code |
| 1070 | Transformer Brain Encoders Explain Human High-level Visual Responses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we employ the attention mechanism used in the transformer architecture to study how retinotopic visual features can be dynamically routed to category-selective areas in high-level visual processing. |
Hossein Adeli; Minni Sun; Nikolaus Kriegeskorte; | code |
| 1071 | DisMo: Disentangled Motion Representations for Open-World Motion Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models often fail to provide an explicit representation of motion separate from content, limiting their applicability for content creators. To address this gap, we propose DisMo, a novel paradigm for learning abstract motion representations directly from raw video data via an image-space reconstruction objective. |
Thomas Ressler-Antal; Frank Fundel; Malek Ben Alaya; Stefan Andreas Baumann; Felix Krause; Ming Gui; Björn Ommer; | code |
| 1072 | Some Optimizers Are More Equal: Understanding The Role of Optimizers in Group Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through stochastic differential equation analysis of optimization dynamics in an analytically tractable setup, we demonstrate that the choice of optimization algorithm indeed influences fairness outcomes, particularly under severe imbalance. |
Mojtaba Kolahdouzi; Hatice Gunes; Ali Etemad; | code |
| 1073 | DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DEXTER, a data-free framework that employs diffusion models and large language models to generate global, textual explanations of visual classifiers. |
Simone Carnemolla; Matteo Pennisi; Sarinda Samarasinghe; Giovanni Bellitto; Simone Palazzo; Daniela Giordano; Mubarak Shah; Concetto Spampinato; | code |
| 1074 | AgentBreeder: Mitigating The AI Safety Risks of Multi-Agent Scaffolds Via Self-Improvement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce AgentBreeder, a framework for multi-objective self-improving evolutionary search over scaffolds. |
J Rosser; Jakob Nicolaus Foerster; | code |
| 1075 | Boundary-Value PDEs Meet Higher-Order Differential Topology-aware GNNs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a higher-order GNN framework that incorporates higher-order interactions based on discrete and finite element exterior calculus. |
Yunfeng Liao; Yangxin Wu; Xiucheng Li; | code |
| 1076 | Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While this view offers fine-grained cues about user attention and hand-object interactions, its narrow field of view and lack of global context often lead to failures on spatially or contextually demanding queries. To address this, we introduce a framework that augments egocentric inputs with third-person (exocentric) views, providing complementary information such as global scene layout and object visibility to LVLMs. |
Insu Lee; Wooje Park; Jaeyun Jang; Minyoung Noh; Kyuhong Shim; Byonghyo Shim; | code |
| 1077 | Flattening Hierarchies with Policy Bootstrapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. |
John Luoyu Zhou; Jonathan Kao; | code |
| 1078 | Approximate Domain Unlearning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *Approximate Domain Unlearning (ADU)*, a novel problem setting that requires reducing recognition accuracy for images from specified domains (e.g., *illustration*) while preserving accuracy for other domains (e.g., *real*). |
Kodai Kawamura; Yuta Goto; Rintaro Yanagi; Hirokatsu Kataoka; Go Irie; | code |
| 1079 | Information-Theoretic Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. |
Moongyu Jeon; Sangwoo Shin; Dongjae Jeon; Albert No; | code |
| 1080 | Self-Supervised Learning of Graph Representations for Network Intrusion Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent works have leveraged graph neural networks for network intrusion detection, they often decouple representation learning from anomaly detection, limiting the utility of the embeddings for identifying attacks. We propose GraphIDS, a self-supervised intrusion detection model that unifies these two stages by learning local graph representations of normal communication patterns through a masked autoencoder. |
Lorenzo Guerra; Thomas Chapuis; Guillaume Duc; Pavlo Mozharovskyi; Van-Tam Nguyen; | code |
| 1081 | MetaKoopman: Bayesian Meta-Learning of Koopman Operators for Modeling Structured Dynamics Under Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **MetaKoopman**, a Bayesian meta-learning framework for modeling nonlinear dynamics through linear latent representations. |
Mahmoud Selim; Sriharsha Bhat; Karl Henrik Johansson; | code |
| 1082 | Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we introduce SEPAL: a Scalable Embedding Propagation ALgorithm for large knowledge graphs designed to produce high-quality embeddings for downstream tasks at scale. |
Félix Lefebvre; Gaël Varoquaux; | code |
| 1083 | CLAWS: Creativity Detection for LLM-generated Solutions Using Attention Window of Sections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The lack of research on creativity assessment in reasoning primarily stems from two challenges: (1) the difficulty of defining the range of creativity, and (2) the necessity of human evaluation in the assessment process. To address these challenges, we propose CLAWS, a novel method that defines and classifies mathematical solutions into Typical, Creative, and Hallucinated categories without human evaluation, by leveraging attention weights across prompt sections and output. |
Keuntae Kim; Eunhye Jeong; Sehyeon Lee; Seohee Yoon; Yong Suk Choi; | code |
| 1084 | Periodic Skill Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that many robotic tasks—particularly those involving locomotion—require periodic behaviors across varying timescales, the ability to discover diverse periodic skills is essential. Motivated by this, we propose Periodic Skill Discovery (PSD), a framework that discovers periodic behaviors in an unsupervised manner. |
Jonghae Park; Daesol Cho; Jusuk Lee; Dongseok Shim; Inkyu Jang; H. Jin Kim; | code |
| 1085 | Sequential Attention-based Sampling for Histopathological Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SASHA as an intelligent sampling model for medical imaging challenges that involve automated diagnosis with exceptionally large images containing sparsely informative features. |
Tarun G; Naman Malpani; Gugan Thoppe; Devarajan Sridharan; | code |
| 1086 | Time-Embedded Algorithm Unrolling for Computational MRI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Heuristically, practitioners have shown that using distinct networks may be beneficial, but this significantly increases the number of learnable parameters, making it challenging to prevent overfitting. To address these shortcomings, by taking inspiration from proximal operators with varying thresholds in approximate message passing (AMP) and the success of time-embedding in diffusion models, we propose a time-embedded algorithm unrolling scheme for inverse problems. |
Junno Yun; Yasar Utku Alcalar; Mehmet Akcakaya; | code |
| 1087 | Parameter-Free Hypergraph Neural Network for Few-Shot Node Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose ZEN (Zero-Parameter Hypergraph Neural Network), a fully linear and parameter-free model that achieves both expressiveness and efficiency. |
Chaewoon Bae; Doyun Choi; Jaehyun Lee; Jaemin Yoo; | code |
| 1088 | Many LLMs Are More Utilitarian Than One Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study whether a similar dynamic emerges in multi-agent LLM systems. |
Anita Keshmirian; Razan Baltaji; Babak Hemmatian; Hadi Asghari; Lav R. Varshney; | code |
| 1089 | Robust and Diverse Multi-Agent Learning Via Rational Policy Gradient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the success of adversarial optimization has been largely limited to zero-sum settings because its naive application in cooperative settings leads to a critical failure mode: agents are irrationally incentivized to *self-sabotage*, blocking the completion of tasks and halting further learning. To address this, we introduce *Rationality-preserving Policy Optimization (RPO)*, a formalism for adversarial optimization that avoids self-sabotage by ensuring agents remain *rational*—that is, their policies are optimal with respect to some possible partner policy. |
Niklas Lauffer; Ameesh Shah; Micah Carroll; Sanjit A. Seshia; Stuart Russell; Michael D Dennis; | code |
| 1090 | On Topological Descriptors for Graph Products Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider various filtrations on the (box) product of graphs and their effect on the outputs of the topological descriptors – the Euler characteristic (EC) and persistent homology (PH). |
Mattie Ji; Amauri H Souza; Vikas K Garg; | code |
| 1091 | Adversary Aware Optimization for Robust Defense Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep neural networks remain highly susceptible to adversarial attacks, where small, subtle perturbations to input images may induce misclassification. We propose a novel optimization-based purification framework that directly removes these perturbations by maximizing a Bayesian-inspired objective combining a pretrained diffusion prior with a likelihood term tailored to the adversarial perturbation space. |
Daniel Wesego; Pedram Rooshenas; | code |
| 1092 | GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images Via Rectified Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using spatial transcriptomic gene expression and corresponding histology data, we construct a novel framework, GeneFlow, to map single- and multi-cell gene expression onto paired cellular images. |
Mengbo Wang; Shourya Verma; Aditya Malusare; Luopin Wang; Yiyang Lu; Vaneet Aggarwal; Mario Sola; Ananth Grama; Nadia Atallah Lanman; | code |
| 1093 | Learning Reconfigurable Representations for Multimodal Federated Learning with Missing Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the resulting misalignment in learned representations, we propose a new federated learning framework featuring locally adaptive representations based on learnable client-side embedding controls that encode each client’s data-missing patterns. |
Manh Duong Nguyen; Trong Nghia Hoang; Thanh Trung Huynh; Quoc Viet Hung Nguyen; Phi Le Nguyen; | code |
| 1094 | Know Thyself By Knowing Others: Learning Neuron Identity from Population Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce NuCLR, a self-supervised framework that learns context-aware representations of neuron identity by modeling each neuron’s role within the broader population. |
Vinam Arora; Divyansha Lachi; Ian Jarratt Knight; Mehdi Azabou; Blake Aaron Richards; Cole Lincoln Hurwitz; Josh Siegle; Eva L Dyer; | code |
| 1095 | Adaptive Kernel Design for Bayesian Optimization Is A Piece of CAKE with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional BO methods often rely on fixed or heuristic kernel selection strategies, which can result in slow convergence or suboptimal solutions when the chosen kernel is poorly suited to the underlying objective function. To address this limitation, we propose a freshly-baked Context-Aware Kernel Evolution (CAKE) to enhance BO with large language models (LLMs). |
Richard Cornelius Suwandi; Feng Yin; Juntao Wang; Renjie Li; Tsung-Hui Chang; Sergios Theodoridis; | code |
| 1096 | The Flood Complex: Large-Scale Persistent Homology on Millions of Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This poses challenges in the complex construction and in the subsequent PH computation, prohibiting their use on large-scale point clouds. To mitigate these issues, we introduce the Flood complex, inspired by the advantages of the Alpha and Witness complex constructions. |
Florian Graf; Paolo Pellizzoni; Martin Uray; Stefan Huber; Roland Kwitt; | code |
| 1097 | SDPGO: Efficient Self-Distillation Training Meets Proximal Gradient Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current SKD methods focus mainly on replicating common features in the student model, neglecting the extraction of key features that significantly enhance student learning. Inspired by this, we devise a self-knowledge distillation framework entitled Self-Distillation training via Proximal Gradient Optimization or SDPGO, which utilizes gradient information to identify and assign greater weight to features that significantly impact classification performance, enabling the network to learn the most relevant features during training. |
Tongtong Su; Yun Liao; Fengbo Zheng; | code |
| 1098 | Scaling Image Geo-Localization to Continent Level Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a hybrid approach that achieves fine-grained geo-localization across a large geographic expanse the size of a continent. |
Philipp Lindenberger; Paul-Edouard Sarlin; Jan Hosang; Marc Pollefeys; Simon Lynen; Eduard Trulls; | code |
| 1099 | Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Hierarchical Soft Mixture-of-Experts (HoME), a two-level token-routing layer for efficient long-context modeling, specifically designed for 3D medical image segmentation. |
Szymon Plotka; Gizem Mert; Maciej Chrabaszcz; Ewa Szczurek; Arkadiusz Sitek; | code |
| 1100 | APML: Adaptive Probabilistic Matching Loss for Robust 3D Point Cloud Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Adaptive Probabilistic Matching Loss (APML), a fully differentiable approximation of one-to-one matching that leverages Sinkhorn iterations on a temperature-scaled similarity matrix derived from pairwise distances. |
Sasan Sharifipour; Constantino Alvarez Casado; Mohammad Sabokrou; Miguel Bordallo Lopez; | code |
| 1101 | Towards A Translative Model of Sperm Whale Vocalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present WhAM (Whale Acoustics Model), the first transformer-based model capable of generating synthetic sperm whale codas from any audio prompt. |
Orr Paradise; Liangyuan Chen; Pranav Muralikrishnan; Hugo Flores García; Bryan Pardo; Roee Diamant; David Gruber; Shane Gero; Shafi Goldwasser; | code |
| 1102 | DISCO: Disentangled Communication Steering for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. |
Max Torop; Aria Masoomi; Masih Eskandar; Jennifer Dy; | code |
| 1103 | Self Supervised Learning for in Vivo Localization of Microelectrode Arrays Using Raw Local Field Potential Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a self-supervised learning framework, Lfp2vec, to infer anatomical regions directly from the neural signal in vivo. |
Tianxiao He; Malhar Patel; Chenyi Li; Anna Maslarova; Mihály Vöröslakos; Nalini Ramanathan; Wei-Lun Hung; Gyorgy Buzsaki; Erdem Varol; | code |
| 1104 | In Silico Mapping of Visual Categorical Selectivity Across The Whole Brain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an in silico approach for data-driven discovery of novel category-selectivity hypotheses based on an encoder–decoder transformer model. |
Ethan Hwang; Hossein Adeli; Wenxuan Guo; Andrew Luo; Nikolaus Kriegeskorte; | code |
| 1105 | Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CARMANIA (Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis), a self-supervised pretraining framework that augments next-token (NT) prediction with a transition-matrix (TM) loss. |
Mohammadsaleh Refahi; Mahdi Abavisani; Bahrad A. Sokhansanj; James R. Brown; Gail Rosen; | code |
| 1106 | Graph Persistence Goes Spectral Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SpectRe — a new topological descriptor for graphs that integrates spectral information into PH diagrams. |
Mattie Ji; Amauri H Souza; Vikas K Garg; | code |
| 1107 | ForceFM: Enhancing Protein-Ligand Predictions Through Force-Guided Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent deep learning (DL) approaches have substantially accelerated docking and improved prediction accuracy; however, they frequently generate conformations that lack physical plausibility due to insufficient integration of physical priors. To deal with these challenges, we propose ForceFM, a novel force-guided model that integrates a force-guided network into the generation process, steering ligand poses toward low-energy, physically realistic conformations. |
Huanlei Guo; Song Liu; Bingyi Jing; | code |
| 1108 | Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of low-budget active learning for semantic segmentation by proposing a novel two-stage selection pipeline. |
Jeongin Kim; Wonho Bae; YouLee Han; Giyeong Oh; Youngjae Yu; Danica J. Sutherland; Junhyug Noh; | code |
| 1109 | Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce **Semantic Surgery**, a novel training-free framework for zero-shot concept erasure. |
Lexiang Xiong; Liu Chengyu; Jingwen Ye; Yan Liu; Yuecong Xu; | code |
| 1110 | MixAT: Combining Continuous and Discrete Adversarial Training for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the same time, despite their effectiveness and generalization capabilities, training with continuous perturbations does not always capture the full spectrum of vulnerabilities exploited by discrete attacks. In this work, we aim to bridge this gap by introducing MIXAT, a novel method that combines stronger discrete and faster continuous attacks during training. |
Csaba Dékány; Stefan Balauca; Dimitar Iliev Dimitrov; Robin Staab; Martin Vechev; | code |
| 1111 | VESSA: Video-based ObjEct-centric Self-Supervised Adaptation for Visual Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While continued self-supervised learning for model adaptation is common for generative language models, this strategy has not proven effective for vision-centric encoder models. To address this challenge, we introduce a novel formulation of self-supervised fine-tuning for vision foundation models, where the model is adapted to a new domain without requiring annotations, leveraging only short multi-view object-centric videos. |
Jesimon Barreto; Carlos Caetano; Andre Araujo; William Robson Schwartz; | code |
| 1112 | Attention! Your Vision Language Model Could Be Maliciously Manipulated Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we empirically and theoretically demonstrate that VLMs are particularly susceptible to image-based adversarial examples, where imperceptible perturbations can precisely manipulate each output token. |
Xiaosen Wang; Shaokang Wang; Zhijin Ge; Yuyang Luo; Shudong Zhang; | code |
| 1113 | Probing Equivariance and Symmetry Breaking in Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the trade-offs of explicit structural priors, particularly group-equivariance. |
Sharvaree Vadgama; Mohammad Mohaiminul Islam; Domas Buracas; Christian A Shewmake; Artem Moskalev; Erik J Bekkers; | code |
| 1114 | VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its potential, PTQ faces significant challenges when applied to DiTs, often resulting in severe degradation of generative quality. To address these issues, we propose VETA-DiT (**V**ariance-**E**qualized and **T**emporal **A**daptation for **Di**ffusion **T**ransformers), a dedicated quantization framework for DiTs. |
Qinkai Xu; Yijin Liu; Yang Chen; Lin Yang; Li Li; Yuxiang Fu; | code |
| 1115 | Hybrid-Collaborative Augmentation and Contrastive Sample Adaptive-Differential Awareness for Robust Attributed Graph Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Due to its powerful capability of self-supervised representation learning and clustering, contrastive attributed graph clustering (CAGC) has achieved great success, which mainly … |
Tianxiang Zhao; Youqing Wang; Jinlu Wang; Jiapu Wang; Mingliang Cui; Junbin Gao; Jipeng Guo; | code |
| 1116 | Multiresolution Analysis and Statistical Thresholding on Dynamic Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Choosing an appropriate resolution parameter is typically difficult, and can be especially problematic in domains like cybersecurity, where anomalous behavior may emerge at multiple time scales. We address this challenge by proposing ANIE (**A**daptive **N**etwork **I**ntensity **E**stimation), a multi-resolution framework designed to automatically identify the time scales at which network structure evolves, enabling the joint detection of both rapid and gradual changes. |
Raphael Romero; Tijl De Bie; Nick Heard; Alexander Modell; | code |
| 1117 | Covariate-moderated Empirical Bayes Matrix Factorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods that leverage side information are limited in the types of data they can incorporate, and they assume specific parametric models. Here, we introduce a novel method for this problem, *covariate-moderated empirical Bayes matrix factorization* (cEBMF). |
William R.P. Denault; Karl Tayeb; Peter Carbonetto; Jason Willwerscheid; Matthew Stephens; | code |
| 1118 | Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a series of alternative training paradigms that leverage insights from hard-data mining and dropout, and which are simple enough to implement and use that they could become the new training standard. |
Shriram M S; Xinyue Hao; Shihao Hou; Yang Lu; Laura Sevilla-Lara; Anurag Arnab; Shreyank N Gowda; | code |
| 1119 | Crucible: Quantifying The Potential of Control Algorithms Through LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing research predominantly focuses on algorithmic performance under ideal or default configurations, overlooking the critical aspect of Tuning Potential. To bridge this gap, we introduce `Crucible`, an agent that employs an LLM-driven, multi-level expert simulation to tune algorithms and defines a formalized metric to quantitatively evaluate their Tuning Potential. |
Lianchen Jia; Chaoyang Li; Qian Houde; Tianchi Huang; Jiangchuan Liu; Lifeng Sun; | code |
| 1120 | λ-Orthogonality Regularization for Compatible Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we impose a relaxed orthogonality constraint, namely λ-Orthogonality regularization, while learning an affine transformation, to obtain distribution-specific adaptation while retaining the original learned representations. |
Simone Ricci; Niccolò Biondi; Federico Pernici; Ioannis Patras; Alberto Del Bimbo; | code |
| 1121 | MetaSlot: Break Through The Fixed Number of Slots in Object-Centric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. |
Hongjia Liu; Rongzhen Zhao; Haohan Chen; Joni Pajarinen; | code |
| 1122 | FastJAM: A Fast Joint Alignment Model for Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FastJAM, a rapid, graph-based method that drastically reduces the computational complexity of joint alignment tasks. |
Omri Hirsch; Ron Shapira Weber; Shira Ifergane; Oren Freifeld; | code |
| 1123 | Rectified CFG++ for Flow Based Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Rectified-CFG++, an adaptive predictor–corrector guidance that couples the deterministic efficiency of rectified flows with a geometry‑aware conditioning rule. |
Shreshth Saini; Shashank Gupta; Alan Bovik; | code |
| 1124 | Tree Ensemble Explainability Through The Hoeffding Functional Decomposition and TreeHFD Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the practical estimation of this decomposition from a data sample is still an open problem. Therefore, we introduce the TreeHFD algorithm to estimate the Hoeffding decomposition of a tree ensemble from a data sample. |
Clement Benard; | code |