ICML 2025 Papers with Code & Data
To help the community engage quickly with the presented research, we have compiled an index of accepted papers that have associated public code or data repositories; all of them are listed in the table below. The index was generated by an automated extraction process, so although we strive for completeness, some papers with public resources may have been missed. Please let us know if you discover additional papers that should be included. Note also that some code repositories may not be made fully public until the conference officially begins.
In addition to this index, we encourage readers to explore our related resources:
- ICML-2025 Papers & Highlights: curated summaries and key takeaways from this year’s conference.
- “Best Paper” Digest (ICML): a historical overview of the most influential ICML papers published since 2004.
This curated list is created by the Paper Digest Team. Paper Digest is an AI-powered research platform that delivers personalized, comprehensive daily digests of the latest research in your field, and also helps you read and write articles, get answers, conduct literature reviews, and generate research reports. Experience the full potential of our services today!
TABLE 1: ICML 2025 Papers with Code & Data
| # | Paper | Author(s) | Code |
|---|---|---|---|
| 1 | Synthesizing Software Engineering Data in A Test-Driven Manner. Highlight: We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). To facilitate further research, we release all code, datasets, models, and Docker images at [Github](https://github.com/Hambaobao/SWE-Flow). | Lei Zhang; Jiaxi Yang; Min Yang; Jian Yang; Mouxiang Chen; Jiajun Zhang; Zeyu Cui; Binyuan Hui; Junyang Lin; | code |
| 2 | PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models By Watching Stuff Drop. Highlight: This work studies the process of post-training these models for accurate world modeling through the lens of the simple, yet fundamental, physics task of modeling object freefall. | Chenyu Li; Oscar Michel; Xichen Pan; Sainan Liu; Mike Roberts; Saining Xie; | code |
| 3 | Prompt-to-Leaderboard: Prompt-Adaptive LLM Evaluations. Highlight: This averaging obscures user- and prompt-specific variations in model performance. To address this, we propose Prompt-to-Leaderboard (P2L), a method that produces leaderboards specific to a prompt or set of prompts. | Evan Frick; Connor Chen; Joseph Tennyson; Tianle Li; Wei-Lin Chiang; Anastasios Nikolas Angelopoulos; Ion Stoica; | code |
| 4 | PaperBench: Evaluating AI’s Ability to Replicate AI Research. Highlight: We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research. | Giulio Starace; Oliver Jaffe; Dane Sherburn; James Aung; Jun Shern Chan; Leon Maksin; Rachel Dias; Evan Mays; Benjamin Kinsella; Wyatt Thompson; Johannes Heidecke; Amelia Glaese; Tejal Patwardhan; | code |
| 5 | Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models. Highlight: In this work, we describe a system that uses vision-language models in a hierarchical structure, first reasoning over complex prompts and user feedback to deduce the most appropriate next step to fulfill the task, and then performing that step with low-level actions. | Lucy Xiaoyang Shi; brian ichter; Michael Robert Equi; Liyiming Ke; Karl Pertsch; Quan Vuong; James Tanner; Anna Walling; Haohuan Wang; Niccolo Fusai; Adrian Li-Bell; Danny Driess; Lachy Groom; Sergey Levine; Chelsea Finn; | code |
| 6 | Roll The Dice & Look Before You Leap: Going Beyond The Creative Limits of Next-token Prediction. Highlight: We design a suite of minimal algorithmic tasks that are a loose abstraction of _open-ended_ real-world tasks. This allows us to cleanly and controllably quantify the creative limits of present-day language models. | Vaishnavh Nagarajan; Chen Henry Wu; Charles Ding; Aditi Raghunathan; | code |
| 7 | xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference. Highlight: In this work, we introduce xLSTM 7B, a 7-billion-parameter LLM that combines xLSTM’s architectural benefits with targeted optimizations for fast and efficient inference. | Maximilian Beck; Korbinian Pöppel; Phillip Lippe; Richard Kurle; Patrick M Blies; Günter Klambauer; Sebastian Böck; Sepp Hochreiter; | code |
| 8 | SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation. Highlight: While significant progress has been made in robotic manipulation, existing approaches often fall short in generalizing to complex environmental variations and addressing memory-dependent tasks. To bridge this gap, we introduce **SAM2Act**, a multi-view robotic transformer-based policy that leverages multi-resolution upsampling with visual representations from a large-scale foundation model. | Haoquan Fang; Markus Grotz; Wilbert Pumacay; Yi Ru Wang; Dieter Fox; Ranjay Krishna; Jiafei Duan; | code |
| 9 | Any4: Learned 4-bit Numeric Representation for LLMs. Highlight: We present any4, a learned 4-bit weight quantization solution for large language models (LLMs) providing arbitrary numeric representations without requiring pre-processing of weights or activations. | Mostafa Elhoushi; Jeff Johnson; | code |
| 10 | Agent-as-a-Judge: Evaluate Agents with Agents. Highlight: These approaches either focus exclusively on final outcomes—ignoring the step-by-step nature of the thinking done by agentic systems—or require excessive manual labour. To address this, we introduce the **Agent-as-a-Judge** framework, wherein agentic systems are used to evaluate agentic systems. | Mingchen Zhuge; Changsheng Zhao; Dylan R. Ashley; Wenyi Wang; Dmitrii Khizbullin; Yunyang Xiong; Zechun Liu; Ernie Chang; Raghuraman Krishnamoorthi; Yuandong Tian; Yangyang Shi; Vikas Chandra; Jürgen Schmidhuber; | code |
| 11 | Context Is Key: A Benchmark for Forecasting with Essential Textual Information. Highlight: We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. | Andrew Robert Williams; Arjun Ashok; Étienne Marcotte; Valentina Zantedeschi; Jithendaraa Subramanian; Roland Riachi; James Requeima; Alexandre Lacoste; Irina Rish; Nicolas Chapados; Alexandre Drouin; | code |
| 12 | Taming Rectified Flow for Inversion and Editing. Highlight: Despite their robust generative capabilities, these models often struggle with inversion inaccuracies, which could further limit their effectiveness in downstream tasks such as image and video editing. To address this issue, we propose RF-Solver, a novel training-free sampler that effectively enhances inversion precision by mitigating the errors in the ODE-solving process of rectified flow. | Jiangshan Wang; Junfu Pu; Zhongang Qi; Jiayi Guo; Yue Ma; Nisha Huang; Yuxin Chen; Xiu Li; Ying Shan; | code |
| 13 | Detecting Strategic Deception with Linear Probes. Highlight: Monitoring outputs alone is insufficient, since the AI might produce seemingly benign outputs while its internal reasoning is misaligned. We thus evaluate if linear probes can robustly detect deception by monitoring model activations. | Nicholas Goldowsky-Dill; Bilal Chughtai; Stefan Heimersheim; Marius Hobbhahn; | code |
| 14 | MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance. Highlight: We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. | Yuang Zhang; Jiaxi Gu; Li-Wen Wang; Han Wang; JunqiCheng; Yuefeng Zhu; FangYuan Zou; | code |
| 15 | SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models. Highlight: We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. | Yung-Sung Chuang; Benjamin Cohen-Wang; Zejiang Shen; Zhaofeng Wu; Hu Xu; Xi Victoria Lin; James R. Glass; Shang-Wen Li; Wen-tau Yih; | code |
| 16 | ShieldAgent: Shielding Agents Via Verifiable Safety Policy Reasoning. Highlight: More critically, existing guardrails for LLMs are not applicable due to the complex and dynamic nature of agents. To tackle these challenges, we propose ShieldAgent, the first guardrail agent designed to enforce explicit safety policy compliance for the action trajectory of other protected agents through logical reasoning. | Zhaorun Chen; Mintong Kang; Bo Li; | code |
| 17 | The Diffusion Duality. Highlight: However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. | Subham Sekhar Sahoo; Justin Deschenaux; Aaron Gokaslan; Guanghan Wang; Justin T Chiu; Volodymyr Kuleshov; | code |
| 18 | History-Guided Video Diffusion. Highlight: However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. | Kiwhan Song; Boyuan Chen; Max Simchowitz; Yilun Du; Russ Tedrake; Vincent Sitzmann; | code |
| 19 | Spatial Reasoning with Denoising Models. Highlight: We introduce Spatial Reasoning Models (SRMs), a framework to perform reasoning over sets of continuous variables via denoising generative models. To measure this, we introduce a set of benchmark tasks that test the quality of complex reasoning in generative models and can quantify hallucination. | Christopher Wewer; Bartlomiej Pogodzinski; Bernt Schiele; Jan Eric Lenssen; | code |
| 20 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction. Highlight: We introduce Aguvis, a unified vision-based framework for autonomous GUI agents that directly operates on screen images, standardizes cross-platform interactions and incorporates structured reasoning via inner monologue. | Yiheng Xu; Zekun Wang; Junli Wang; Dunjie Lu; Tianbao Xie; Amrita Saha; Doyen Sahoo; Tao Yu; Caiming Xiong; | code |
| 21 | Weak-to-Strong Jailbreaking on Large Language Models. Highlight: In this paper, we propose the **weak-to-strong** jailbreaking attack, an efficient inference-time attack for aligned LLMs to produce harmful text. | Xuandong Zhao; Xianjun Yang; Tianyu Pang; Chao Du; Lei Li; Yu-Xiang Wang; William Yang Wang; | code |
| 22 | Effective and Efficient Masked Image Generation Models. Highlight: Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as **eMIGM**. | Zebin You; Jingyang Ou; Xiaolu Zhang; Jun Hu; JUN ZHOU; Chongxuan Li; | code |
| 23 | ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers Under Domain Shifts. Highlight: In this work, we introduce ExPLoRA, a highly effective technique to improve transfer learning of pre-trained vision transformers (ViTs) under domain shifts. | Samar Khanna; Medhanie Irgau; David B. Lobell; Stefano Ermon; | code |
| 24 | AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses. Highlight: We introduce AutoAdvExBench, a benchmark to evaluate if large language models (LLMs) can autonomously exploit defenses to adversarial examples. | Nicholas Carlini; Edoardo Debenedetti; Javier Rando; Milad Nasr; Florian Tramèr; | code |
| 25 | Massive Values in Self-Attention Modules Are The Key to Contextual Knowledge Understanding. Highlight: Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. In this paper, we show for the first time that these concentrated massive values consistently emerge in specific regions of attention queries (Q) and keys (K) while not having such patterns in values (V) in various modern transformer-based LLMs. | Mingyu Jin; Kai Mei; Wujiang Xu; Mingjie Sun; Ruixiang Tang; Mengnan Du; Zirui Liu; Yongfeng Zhang; | code |
| 26 | Mitigating Object Hallucination in Large Vision-Language Models Via Image-Grounded Guidance. Highlight: However, these approaches require either costly training or fine-tuning, or API access to proprietary LLMs for post-generation correction. In response to these limitations, we propose Mitigating hallucinAtion via image-gRounded guIdaNcE (MARINE), a framework that is both training-free and API-free. | Linxi Zhao; Yihe Deng; Weitong Zhang; Quanquan Gu; | code |
| 27 | FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching. Highlight: However, VAR encounters two primary challenges: (1) its complex and rigid scale design limits generalization in next-scale prediction, and (2) the generator’s dependence on a discrete tokenizer with the same complex scale structure restricts modularity and flexibility in updating the tokenizer. To address these limitations, we introduce FlowAR, a general next-scale prediction method featuring a streamlined scale design, where each subsequent scale is simply double the previous one. | Sucheng Ren; Qihang Yu; Ju He; Xiaohui Shen; Alan Yuille; Liang-Chieh Chen; | code |
| 28 | ProofAug: Efficient Neural Theorem Proving Via Fine-grained Proof Structure Analysis. Highlight: However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. | Haoxiong Liu; Jiacheng Sun; Zhenguo Li; Andrew C Yao; | code |
| 29 | Improving The Diffusability of Autoencoders. Highlight: In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in autoencoders with a large bottleneck channel size. | Ivan Skorokhodov; Sharath Girish; Benran Hu; Willi Menapace; Yanyu Li; Rameen Abdal; Sergey Tulyakov; Aliaksandr Siarohin; | code |
| 30 | TabICL: A Tabular Foundation Model for In-Context Learning on Large Data. Highlight: We introduce TabICL, a tabular foundation model for classification, pretrained on synthetic datasets with up to 60K samples and capable of handling 500K samples on affordable resources. | Jingang QU; David Holzmüller; Gaël Varoquaux; Marine Le Morvan; | code |
| 31 | CoMemo: LVLMs Need Image Context with Image Memory. Highlight: However, inherited LLM architectural designs introduce suboptimal characteristics for multimodal processing. First, LVLMs exhibit a bimodal distribution in attention allocation, leading to the progressive neglect of middle visual content as context expands. Second, conventional positional encoding schemes fail to preserve vital 2D structural relationships when processing dynamic high-resolution images. To address these limitations, we propose **CoMemo**, a dual-path architecture that combines a **Co**ntext image path with an image **Memo**ry path for visual processing, effectively alleviating visual information neglect. | Shi Liu; Weijie Su; Xizhou Zhu; Wenhai Wang; Jifeng Dai; | code |
| 32 | Temporal Query Network for Efficient Multivariate Time Series Forecasting. Highlight: In this paper, we propose a novel technique called Temporal Query (TQ) to more effectively capture multivariate correlations, thereby improving model performance in MTSF tasks. | Shengsheng Lin; Haojun Chen; Haijie Wu; Chunyun Qiu; Weiwei Lin; | code |
| 33 | MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design. Highlight: We explore quantization for MoE models and highlight two key insights: 1) linear blocks exhibit varying quantization sensitivity, and 2) divergent expert activation frequencies create heterogeneous computational characteristics. Based on these observations, we introduce MxMoE, a mixed-precision optimization framework for MoE models that considers both algorithmic and system perspectives. | Haojie Duanmu; Xiuhong Li; Zhihang Yuan; Size Zheng; Jiangfei Duan; Xingcheng Zhang; Dahua Lin; | code |
| 34 | An All-Atom Generative Model for Designing Protein Complexes. Highlight: Despite these developments, the study and modeling of multi-chain proteins remain largely uncharted, though they are vital for understanding biological functions. Recognizing the importance of these interactions, we introduce APM (all-Atom Protein generative Model), a model specifically designed for modeling multi-chain proteins. | Ruizhe Chen; Dongyu Xue; Xiangxin Zhou; Zaixiang Zheng; xiangxiang Zeng; Quanquan Gu; | code |
| 35 | Predictive Data Selection: The Data That Predicts Is The Data That Teaches. Highlight: In this work, we aim to directly estimate the contribution of data during pretraining and select pretraining data in an efficient manner. | KaShun SHUM; Yuzhen Huang; Hongjian Zou; dingqi; YiXuan Liao; Xiaoxin Chen; Qian Liu; Junxian He; | code |
| 36 | Sparsing Law: Towards Large Language Models with Greater Activation Sparsity. Highlight: In this paper, we address three underexplored research questions: (1) How can activation sparsity be measured more accurately? | Yuqi Luo; Chenyang Song; Xu Han; Yingfa Chen; Chaojun Xiao; Xiaojun Meng; Liqun Deng; Jiansheng Wei; Zhiyuan Liu; Maosong Sun; | code |
| 37 | From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models. Highlight: Unlike sequential recommendation, which naturally fits a generative next-item prediction paradigm, it is hard to formulate CTR models into this paradigm without an explicit feature order. Therefore, we propose a novel Supervised Feature Generation framework for CTR models, shifting from the discriminative feature interaction paradigm to the generative feature generation paradigm. | Mingjia Yin; Junwei Pan; Hao Wang; Ximei Wang; Shangyu Zhang; Jie Jiang; Defu Lian; Enhong Chen; | code |
| 38 | What If We Recaption Billions of Web Images with LLaMA-3? Highlight: However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community effort, leveraging the powerful and *open-sourced* LLaMA-3, a GPT-4 level LLM. | Xianhang Li; Haoqin Tu; Mude Hui; Zeyu Wang; Bingchen Zhao; Junfei Xiao; Sucheng Ren; Jieru Mei; Qing Liu; Huangjie Zheng; Yuyin Zhou; Cihang Xie; | code |
| 39 | TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting. Highlight: In this paper, we propose TimeBridge, a novel framework designed to bridge the gap between non-stationarity and dependency modeling in long-term time series forecasting. | Peiyuan Liu; Beiliang Wu; Yifan Hu; Naiqi Li; Tao Dai; Jigang Bao; Shu-Tao Xia; | code |
| 40 | On Path to Multimodal Generalist: General-Level and General-Bench. Highlight: In this project, we introduce an evaluation framework to delineate the capabilities and behaviors of current multimodal generalists. To evaluate the comprehensive abilities of various generalists, we present a massive multimodal benchmark, **General-Bench**, which encompasses a broader spectrum of skills, modalities, formats, and capabilities, including over 700 tasks and 325,800 instances. | Hao Fei; Yuan Zhou; Juncheng Li; Xiangtai Li; Qingshan Xu; Bobo Li; Shengqiong Wu; Yaoting Wang; Junbao Zhou; Jiahao Meng; Qingyu Shi; Zhiyuan Zhou; Liangtao Shi; Minghe Gao; Daoan Zhang; Zhiqi Ge; Siliang Tang; Kaihang Pan; Yaobo Ye; Haobo Yuan; Tao Zhang; Weiming Wu; Tianjie Ju; Zixiang Meng; Shilin Xu; Liyu Jia; Wentao Hu; Meng Luo; Jiebo Luo; Tat-Seng Chua; Shuicheng YAN; Hanwang Zhang; | code |
| 41 | Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment. Highlight: In this paper, we introduce *preference embedding*, an approach that embeds responses into a latent space to capture intricate preference structures efficiently, achieving linear query complexity. | Yifan Zhang; Ge Zhang; Yue Wu; Kangping Xu; Quanquan Gu; | code |
| 42 | On The Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents. Highlight: To simulate faulty agents, we propose two approaches—AutoTransform and AutoInject—which introduce mistakes into the agents’ responses. | Jen-tse Huang; Jiaxu Zhou; Tailin Jin; Xuhui Zhou; Zixi Chen; Wenxuan Wang; Youliang Yuan; Michael Lyu; Maarten Sap; | code |
| 43 | Contrastive Private Data Synthesis Via Weighted Multi-PLM Fusion. Highlight: However, existing methods relying on pre-trained models for data synthesis often struggle in data-deficient scenarios, suffering from limited sample size, inevitable generation noise, and existing pre-trained model bias. To address these challenges, we propose a novel contr**A**stive private data **S**ynthesis via **W**eighted multiple **P**re-trained generative models framework, named **WASP**. | Tianyuan Zou; Yang Liu; Peng Li; Yufei Xiong; Jianqing Zhang; Jingjing Liu; Xiaozhou Ye; Ye Ouyang; Ya-Qin Zhang; | code |
| 44 | OR-Bench: An Over-Refusal Benchmark for Large Language Models. Highlight: This study proposes a novel method for automatically generating large-scale over-refusal datasets. | Justin Cui; Wei-Lin Chiang; Ion Stoica; Cho-Jui Hsieh; | code |
| 45 | TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization. Highlight: Using the obtained reward and Bradley-Terry model, this work establishes a framework of computable loss functions with token-level reward guidance for DPO, and proposes a practical reward guidance based on the induced DPO reward. | Mingkang Zhu; Xi Chen; Zhongdao Wang; Bei Yu; Hengshuang Zhao; Jiaya Jia; | code |
| 46 | Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond. Highlight: However, state-of-the-art unlearning methods face a critical vulnerability: they are susceptible to “relearning” the removed information from a small number of forget data points, known as relearning attacks. In this paper, we systematically investigate how to make unlearned models robust against such attacks. | Chongyu Fan; Jinghan Jia; Yihua Zhang; Anil Ramakrishna; Mingyi Hong; Sijia Liu; | code |
| 47 | MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents. Highlight: We present MELON (Masked re-Execution and TooL comparisON), a novel IPI defense. | Kaijie Zhu; Xianjun Yang; Jindong Wang; Wenbo Guo; William Yang Wang; | code |
| 48 | Elucidating The Design Space of Multimodal Protein Language Models. Highlight: In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. | Cheng-Yen Hsieh; Xinyou Wang; Daiheng Zhang; Dongyu Xue; Fei Ye; Shujian Huang; Zaixiang Zheng; Quanquan Gu; | code |
| 49 | ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation. Highlight: This lack of context-awareness can lead to suboptimal performance, as the same action may hold different meanings depending on its surrounding context. To address this issue, we propose ActionPiece to explicitly incorporate context when tokenizing action sequences. | Yupeng Hou; Jianmo Ni; Zhankui He; Noveen Sachdeva; Wang-Cheng Kang; Ed H. Chi; Julian McAuley; Derek Zhiyuan Cheng; | code |
| 50 | H-Tuning: Toward Low-Cost and Efficient ECG-based Cardiovascular Disease Detection with Pre-Trained Models. Highlight: Here, we propose a holistic method (H-Tuning) for low-cost and efficient fine-tuning of pre-trained models on downstream datasets. | Rushuang Zhou; Yuanting Zhang; Yining Dong; | code |
| 51 | T1: Advancing Language Model Reasoning Through Reinforcement Learning and Inference Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present T1 to scale RL by encouraging exploration and understand inference scaling. |
Zhenyu Hou; Xin Lv; Rui Lu; Jiajie Zhang; Yujiang Li; Zijun Yao; Juanzi Li; Jie Tang; Yuxiao Dong; | code |
| 52 | Unifying 2D and 3D Vision-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel language-conditioned mask decoder shared across 2D and 3D modalities to ground objects effectively in both RGB and RGB-D images, outperforming box-based approaches. |
Ayush Jain; Alexander Swerdlow; Yuzhou Wang; Sergio Arnaud; Ada Martin; Alexander Sax; Franziska Meier; Katerina Fragkiadaki; | code |
| 53 | RStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. |
Xinyu Guan; Li Lyna Zhang; Yifei Liu; Ning Shang; Youran Sun; Yi Zhu; Fan Yang; Mao Yang; | code |
| 54 | Improving LLM Safety Alignment with Dual-Objective Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Direct preference optimization (DPO), a widely deployed alignment method, exhibits limitations in both experimental and theoretical contexts as its loss function proves suboptimal for refusal learning. Through gradient-based analysis, we identify these shortcomings and propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge. |
Xuandong Zhao; Will Cai; Tianneng Shi; David Huang; Licong Lin; Song Mei; Dawn Song; | code |
| 55 | MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose MMedPO, a novel multimodal medical preference optimization approach that considers the clinical relevance of preference samples to enhance Med-LVLM alignment. |
Kangyu Zhu; Peng Xia; Yun Li; Hongtu Zhu; Sheng Wang; Huaxiu Yao; | code |
| 56 | Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Alongside PhyGenBench, we propose a novel evaluation framework called PhyGenEval.We will release the data and codes at https://github.com/OpenGVLab/PhyGenBench |
Fanqing Meng; Jiaqi Liao; Xinyu Tan; Quanfeng Lu; Wenqi Shao; Kaipeng Zhang; Yu Cheng; Dianqi Li; Ping Luo; | code |
| 57 | Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, vision captures intricate temporal patterns but lacks semantic context, limiting the complementary potential of these modalities. To address this, we propose Time-VLM, a novel multimodal framework that leverages pre-trained Vision-Language Models (VLMs) to bridge temporal, visual, and textual modalities for enhanced forecasting. |
Siru Zhong; Weilin Ruan; Ming Jin; Huan Li; Qingsong Wen; Yuxuan Liang; | code |
| 58 | Watch Out Your Album! On The Inadvertent Privacy Memorization in Multi-Modal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how randomly generated task-irrelevant private content can become spuriously correlated with downstream objectives due to partial mini-batch training dynamics, thus causing inadvertent memorization. |
Tianjie Ju; Yi Hua; Hao Fei; Zhenyu Shao; Yubin Zheng; Haodong Zhao; Mong-Li Lee; Wynne Hsu; Zhuosheng Zhang; Gongshen Liu; | code |
| 59 | Heads Up! Large Language Models Can Perform Tasks Without Your Instruction Via Selective Attention Head Masking Highlight: In this paper, we investigate the modules inside LLMs and demonstrate that, by simply masking or retaining specific attention heads during inference, LLMs can exhibit specific task functionalities without requiring explicit instructions or modifications to the model parameters. |
Senyu Han; Hongchuan Zeng; Kai Yu; Lu Chen; | code |
| 60 | A General Framework for Inference-time Scaling and Steering of Diffusion Models Highlight: In this work, we propose FK steering, a framework for inference-time steering of diffusion models with reward functions. |
Raghav Singhal; Zachary Horvitz; Ryan Teehan; Mengye Ren; Zhou Yu; Kathleen McKeown; Rajesh Ranganath; | code |
| 61 | Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems Highlight: In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. |
Shaokun Zhang; Ming Yin; Jieyu Zhang; Jiale Liu; Zhiguang Han; Jingyang Zhang; Beibin Li; Chi Wang; Huazheng Wang; Yiran Chen; Qingyun Wu; | code |
| 62 | Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection Highlight: In this paper, we start from a new perspective to excavate the reason behind the generalization failure in AIGI detection, named the asymmetry phenomenon, where a naively trained detector tends to overfit to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked, which is shown to seriously limit expressivity and generalization. |
Zhiyuan Yan; Jiangming Wang; Peng Jin; Ke-Yue Zhang; Chengchun Liu; Shen Chen; Taiping Yao; Shouhong Ding; Baoyuan Wu; Li Yuan; | code |
| 63 | AssistanceZero: Scalably Solving Assistance Games Highlight: We present the first scalable approach to solving assistance games and apply it to a new, challenging Minecraft-based assistance game with over $10^{400}$ possible goals. |
Cassidy Laidlaw; Eli Bronstein; Timothy Guo; Dylan Feng; Lukas Berglund; Justin Svegliato; Stuart Russell; Anca Dragan; | code |
| 64 | NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits Highlight: However, contemporary code language models (LMs) lack the ability to handle diverse types of code-edit requirements. In this work, we attempt to overcome this shortcoming through (1) a novel synthetic data generation pipeline and (2) a robust model adaptation algorithm. |
Tushar Aggarwal; Swayam Singh; Abhijeet Awasthi; Aditya Kanade; Nagarajan Natarajan; | code |
| 65 | OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction Highlight: We propose OTTER, a novel VLA architecture that leverages these existing alignments through explicit, text-aware visual feature extraction. |
Huang Huang; Fangchen Liu; Letian Fu; Tingfan Wu; Mustafa Mukadam; Jitendra Malik; Ken Goldberg; Pieter Abbeel; | code |
| 66 | TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting Highlight: In this paper, we introduce TimeBase, an ultra-lightweight network to harness the power of minimalism in LTSF. |
Qihe Huang; Zhengyang Zhou; Kuo Yang; Zhongchao Yi; Xu Wang; Yang Wang; | code |
| 67 | LieRE: Lie Rotational Positional Encodings Highlight: However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which generalizes RoPE to high-dimensional rotation matrices by leveraging their Lie group structure. |
Sophie Ostmeier; Brian Axelrod; Maya Varma; Michael Moseley; Akshay S Chaudhari; Curtis Langlotz; | code |
| 68 | Parrot: Multilingual Visual Instruction Tuning Highlight: To address this, we propose Parrot, a novel approach that leverages textual guidance for visual token alignment at the language level. Additionally, we introduce the Massive Multilingual Multimodal Benchmark (MMMB), a new benchmark comprising 6 languages, 15 categories, and 12,000 questions, to assess multilingual capabilities. |
Hai-Long Sun; Da-Wei Zhou; Yang Li; Shiyin Lu; Chao Yi; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; De-Chuan Zhan; Han-Jia Ye; | code |
| 69 | The Jailbreak Tax: How Useful Are Your Jailbreak Outputs? Highlight: In this paper, we ask whether the model outputs produced by existing jailbreaks are actually *useful*. Overall, our work proposes jailbreak utility as a new important metric in AI safety, and introduces benchmarks to evaluate existing and future jailbreaks. |
Kristina Nikolić; Luze Sun; Jie Zhang; Florian Tramèr; | code |
| 70 | AdvAgent: Controllable Blackbox Red-teaming on Web Agents Highlight: However, their access to sensitive resources and autonomous decision-making also introduce significant security risks, where successful attacks could lead to severe consequences. To systematically uncover these vulnerabilities, we propose AdvAgent, a black-box red-teaming framework for attacking web agents. |
Chejian Xu; Mintong Kang; Jiawei Zhang; Zeyi Liao; Lingbo Mo; Mengqi Yuan; Huan Sun; Bo Li; | code |
| 71 | (How) Do Language Models Track State? Highlight: We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). |
Belinda Z. Li; Zifan Carl Guo; Jacob Andreas; | code |
| 72 | NoLiMa: Long-Context Evaluation Beyond Literal Matching Highlight: However, in these benchmarks, models can exploit existing literal matches between the needle and haystack to simplify the task. To address this, we introduce NoLiMa, a benchmark extending NIAH with a carefully designed needle set, where questions and needles have minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack. |
Ali Modarressi; Hanieh Deilamsalehy; Franck Dernoncourt; Trung Bui; Ryan A. Rossi; Seunghyun Yoon; Hinrich Schuetze; | code |
| 73 | Dendritic Localized Learning: Toward Biologically Plausible Algorithm Highlight: Although various alternative learning approaches have been proposed to address these issues, most either fail to satisfy all three criteria simultaneously or yield suboptimal results. Inspired by the dynamics and plasticity of pyramidal neurons, we propose Dendritic Localized Learning (DLL), a novel learning algorithm designed to overcome these challenges. |
Changze Lv; Jingwen Xu; Yiyang Lu; Xiaohua Wang; Zhenghua Wang; Zhibo Xu; Di Yu; Xin Du; Xiaoqing Zheng; Xuanjing Huang; | code |
| 74 | All-atom Diffusion Transformers: Unified Generative Modelling of Molecules and Materials Highlight: We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems using the same model: (1) An autoencoder maps a unified, all-atom representation of molecules and materials to a shared latent embedding space; and (2) A diffusion model is trained to generate new latent embeddings that the autoencoder can decode to sample new molecules or materials. |
Chaitanya K. Joshi; Xiang Fu; Yi-Lun Liao; Vahe Gharakhanyan; Benjamin Kurt Miller; Anuroop Sriram; Zachary Ward Ulissi; | code |
| 75 | AnyEdit: Edit Any Knowledge Encoded in Language Models Highlight: These limitations arise from their reliance on editing a single token’s hidden state, a limitation we term the “efficacy barrier”. To solve this, we propose \textbf{AnyEdit}, a new autoregressive editing paradigm. |
Houcheng Jiang; Junfeng Fang; Ningyu Zhang; Mingyang Wan; Guojun Ma; Xiang Wang; Xiangnan He; Tat-Seng Chua; | code |
| 76 | VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Highlight: Our key insight is that a visual masked autoencoder, pre-trained on the ImageNet dataset, can naturally be a numeric series forecaster. |
Mouxiang Chen; Lefei Shen; Zhuo Li; Xiaoyun Joy Wang; Jianling Sun; Chenghao Liu; | code |
| 77 | Sundial: A Family of Highly Capable Time Series Foundation Models Highlight: We introduce Sundial, a family of native, flexible, and scalable time series foundation models. |
Yong Liu; Guo Qin; Zhiyuan Shi; Zhi Chen; Caiyin Yang; Xiangdong Huang; Jianmin Wang; Mingsheng Long; | code |
| 78 | Efficient Federated Incomplete Multi-View Clustering Highlight: Federated multi-view clustering (FMVC) has emerged as a potential solution, but existing approaches suffer from substantial limitations, including excessive communication overhead, insufficient privacy protection, and inadequate handling of missing views. To address these issues, we propose Efficient Federated Incomplete Multi-View Clustering (EFIMVC), a novel framework that introduces a localized optimization strategy to significantly reduce communication costs while ensuring theoretical convergence. |
Suyuan Liu; Hao Yu; Hao Tan; KE LIANG; Siwei Wang; Shengju Yu; En Zhu; Xinwang Liu; | code |
| 79 | ReferSplat: Referring Segmentation in 3D Gaussian Splatting Highlight: To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. |
Shuting He; Guangquan Jie; Changshuo Wang; Yun Zhou; Shuming Hu; Guanbin Li; Henghui Ding; | code |
| 80 | Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers Highlight: In this paper, we present Unisolver, a novel Transformer model trained on diverse data and conditioned on diverse PDEs, aiming towards a universal neural PDE solver capable of solving a wide scope of PDEs. |
Hang Zhou; Yuezhou Ma; Haixu Wu; Haowen Wang; Mingsheng Long; | code |
| 81 | CommVQ: Commutative Vector Quantization for KV Cache Compression Highlight: Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context grows. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. |
Junyan Li; Yang Zhang; Muhammad Yusuf Hassan; Talha Chafekar; Tianle Cai; Zhile Ren; Pengsheng Guo; Foroozan Karimzadeh; Colorado Reed; Chong Wang; Chuang Gan; | code |
| 82 | Improving Your Model Ranking on Chatbot Arena By Vote Rigging Highlight: However, this strategy is practically inefficient because there are over $190$ models on Chatbot Arena and on average only about 1% of new battles will involve $m\_{t}$. To overcome this, we propose an **omnipresent rigging** strategy, which exploits the fact that, under the Elo rating mechanism of Chatbot Arena, any new vote on a battle can influence the ranking of the target model $m\_{t}$, even if $m\_{t}$ is not directly involved in the battle. |
Rui Min; Tianyu Pang; Chao Du; Qian Liu; Minhao Cheng; Min Lin; | code |
| 83 | ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling Highlight: In this study, we introduce ALMTokenizer, a novel low-bitrate and semantically rich audio codec tokenizer for audio language models. |
Dongchao Yang; Songxiang Liu; Haohan Guo; Jiankun Zhao; Yuanyuan Wang; Helin Wang; Zeqian Ju; Xubo Liu; Xueyuan Chen; Xu Tan; Xixin Wu; Helen M. Meng; | code |
| 84 | Geometry Informed Tokenization of Molecules for Language Model Generation Highlight: Although tokenization of molecular graphs exists, that for 3D geometries is largely unexplored. Here, we attempt to bridge this gap by proposing a novel method which converts molecular geometries into SE(3)-invariant 1D discrete sequences. |
Xiner Li; Limei Wang; Youzhi Luo; Carl Edwards; Shurui Gui; Yuchao Lin; Heng Ji; Shuiwang Ji; | code |
| 85 | Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images Highlight: In this work, we propose **Querent**, *i.e.*, the **quer**y-awar**e** long co**nt**extual dynamic modeling framework, which achieves a theoretically bounded approximation of full self-attention while delivering practical efficiency. |
Zhengrui Guo; Qichen Sun; Jiabo MA; Lishuang Feng; Jinzhuo Wang; Hao Chen; | code |
| 86 | CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models Highlight: Code and data used in the paper are available at https://anonymous.4open.science/r/CASEBench-D5DB. |
Guangzhi Sun; Xiao Zhan; Shutong Feng; Phil Woodland; Jose Such; | code |
| 87 | Do NOT Think That Much for 2+3=? On The Overthinking of Long Reasoning Models Highlight: Using a self-training paradigm, we propose strategies to mitigate overthinking, simplifying reasoning processes without compromising accuracy. |
Xingyu Chen; Jiahao Xu; Tian Liang; Zhiwei He; Jianhui Pang; Dian Yu; Linfeng Song; Qiuzhi Liu; Mengfei Zhou; Zhuosheng Zhang; Rui Wang; Zhaopeng Tu; Haitao Mi; Dong Yu; | code |
| 88 | DPO Meets PPO: Reinforced Token Optimization for RLHF Highlight: Despite the great successes of PPO in the alignment of state-of-the-art closed-source large language models (LLMs), its open-source implementation is still largely sub-optimal, as widely reported by numerous research studies. To address these issues, we introduce a framework that models RLHF problems as a Markov decision process (MDP), enabling the capture of fine-grained token-wise information. |
Han Zhong; Zikang Shan; Guhao Feng; Wei Xiong; Xinle Cheng; Li Zhao; Di He; Jiang Bian; Liwei Wang; | code |
| 89 | MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning Highlight: In this work, we present MENTOR, a method that improves both the *architecture* and *optimization* of RL agents. |
Suning Huang; Zheyu Aqa Zhang; Tianhai Liang; Yihan Xu; Zhehao Kou; Chenhao Lu; Guowei Xu; Zhengrong Xue; Huazhe Xu; | code |
| 90 | Visual Autoregressive Modeling for Image Super-Resolution Highlight: Building upon the tremendous success of autoregressive models in the language domain, we propose \textbf{VARSR}, a novel visual autoregressive modeling framework for ISR in the form of next-scale prediction. Furthermore, we collect large-scale data and design a training process to obtain robust generative priors. |
Yunpeng Qu; Kun Yuan; Jinhua Hao; Kai Zhao; Qizhi Xie; Ming Sun; Chao Zhou; | code |
| 91 | Probing Visual Language Priors in VLMs Highlight: Vision-Language Models (VLMs) may over-rely on visual language priors from their training data rather than true visual reasoning. To investigate this, we introduce ViLP, a benchmark featuring deliberately out-of-distribution images synthesized via image generation models and out-of-distribution Q\&A pairs. |
Tiange Luo; Ang Cao; Gunhee Lee; Justin Johnson; Honglak Lee; | code |
| 92 | Reward-Augmented Data Enhances Direct Preference Alignment of LLMs Highlight: We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset. |
Shenao Zhang; Zhihan Liu; Boyi Liu; Yufeng Zhang; Yingxiang Yang; Yongfei Liu; Liyu Chen; Tao Sun; Zhaoran Wang; | code |
| 93 | SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models Highlight: However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise with high accuracy. |
Wei Huang; Haotong Qin; Yangdong Liu; Yawei Li; Qinshuo Liu; Xianglong Liu; Luca Benini; Michele Magno; Shiming Zhang; XIAOJUAN QI; | code |
| 94 | Trajectory World Models for Heterogeneous Environments Highlight: In this work, we explore pre-training world models for heterogeneous environments by addressing key transfer barriers in both data diversity and model flexibility. |
Shaofeng Yin; Jialong Wu; Siqiao Huang; Xingjian Su; Xu He; Jianye HAO; Mingsheng Long; | code |
| 95 | Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models Highlight: To address these challenges, this work proposes **S**imultaneous **M**RMP **D**iffusion (SMD), a novel approach integrating constrained optimization into the diffusion sampling process to produce collision-free, kinematically feasible trajectories. Additionally, the paper introduces a comprehensive MRMP benchmark to evaluate trajectory planning algorithms across scenarios with varying robot densities, obstacle complexities, and motion constraints. |
Jinhao Liang; Jacob K Christopher; Sven Koenig; Ferdinando Fioretto; | code |
| 96 | Inductive Gradient Adjustment for Spectral Bias in Implicit Neural Representations Highlight: In this paper, we delve into the linear dynamics model of MLPs and theoretically identify the empirical Neural Tangent Kernel (eNTK) matrix as a reliable link between spectral bias and training dynamics. |
Kexuan Shi; Hai Chen; Leheng Zhang; Shuhang Gu; | code |
| 97 | Private Federated Learning Using Preference-Optimized Synthetic Data Highlight: Our key insight is that the private client feedback collected by prior DP synthetic data methods (Hou et al., 2024; Xie et al., 2024) can be viewed as a preference ranking. |
Charlie Hou; Mei-Yu Wang; Yige Zhu; Daniel Lazar; Giulia Fanti; | code |
| 98 | Memorization Sinks: Isolating Memorization During LLM Training Highlight: In this work, we put forward MemSinks, a new paradigm that promotes isolation of memorization by design. |
Gaurav Rohit Ghosal; Pratyush Maini; Aditi Raghunathan; | code |
| 99 | MATH-Perturb: Benchmarking LLMs’ Math Reasoning Abilities Against Hard Perturbations Highlight: Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks when questions undergo simple perturbations, that is, modifications that still preserve the underlying reasoning patterns of the solutions. |
Kaixuan Huang; Jiacheng Guo; Zihao Li; Xiang Ji; Jiawei Ge; Wenzhe Li; Yingqing Guo; Tianle Cai; Hui Yuan; Runzhe Wang; Yue Wu; Ming Yin; Shange Tang; Yangsibo Huang; Chi Jin; Xinyun Chen; Chiyuan Zhang; Mengdi Wang; | code |
| 100 | Putnam-AXIOM: A Functional & Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs Highlight: We introduce Putnam-AXIOM, a benchmark of 522 university-level competition problems drawn from the prestigious William Lowell Putnam Mathematical Competition, and Putnam-AXIOM Variation, an unseen companion set of 100 functional variants generated by programmatically perturbing variables and constants. |
Aryan Gulati; Brando Miranda; Eric Chen; Emily Xia; Kai Fronsdal; Bruno de Moraes Dumont; Sanmi Koyejo; | code |
| 101 | MMInference: Accelerating Pre-filling for Long-Context Visual Language Models Via Modality-Aware Permutation Sparse Attention Highlight: However, the quadratic attention complexity during the pre-filling phase remains a significant obstacle to real-world deployment. To overcome this limitation, we introduce MMInference (Multimodality Million tokens Inference), a dynamic sparse attention method that accelerates the prefilling stage for long-context multi-modal inputs. |
Yucheng Li; Huiqiang Jiang; Chengruidong Zhang; Qianhui Wu; Xufang Luo; Surin Ahn; Amir H. Abdi; Dongsheng Li; Jianfeng Gao; Yuqing Yang; Lili Qiu; | code |
| 102 | Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence Highlight: In this study, we explore the untapped potential of GNNs through an enhanced framework, GNN+, which integrates six widely used techniques: edge feature integration, normalization, dropout, residual connections, feed-forward networks, and positional encoding, to effectively tackle graph-level tasks. |
Yuankai Luo; Lei Shi; Xiao-Ming Wu; | code |
| 103 | Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples Highlight: For example, supervised fine-tuning improves reasoning quality but requires vast labeled data, while reward-maximizing reinforcement learning finds top-reward solutions but neglects solution diversity. To fill this gap, we propose Flow of Reasoning (FoR), an efficient diversity-seeking LLM finetuning method aimed at improving reasoning quality and diversity with minimal data. |
Fangxu Yu; Lai Jiang; Haoqiang Kang; Shibo Hao; Lianhui Qin; | code |
| 104 | FG-CLIP: Fine-Grained Visual and Textual Alignment Highlight: To address this, we propose Fine-Grained CLIP (FG-CLIP), which enhances fine-grained understanding through three key innovations. We construct a comprehensive dataset, termed FineHARD, by integrating high-quality region-specific annotations with challenging fine-grained negative samples. |
Chunyu Xie; Bin Wang; Fanjing Kong; Jincheng Li; Dawei Liang; Gengshen Zhang; Dawei Leng; Yuhui Yin; | code |
| 105 | EARTH: Epidemiology-Aware Neural ODE with Continuous Disease Transmission Graph Highlight: However, existing deep-learning methods often overlook the dynamic nature of epidemics and fail to account for the specific mechanisms of disease transmission. In response to these challenges, we introduce an innovative end-to-end framework called Epidemiology-Aware Neural ODE with Continuous Disease Transmission Graph (EARTH) in this paper. |
Guancheng Wan; Zewen Liu; Xiaojun Shan; Max SY Lau; B. Aditya Prakash; Wei Jin; | code |
| 106 | How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence Highlight: Measuring dataset contamination thus becomes essential to ensure that performance evaluations genuinely reflect a model’s ability to generalize to unseen data, rather than relying on memorized examples. To address this problem, we propose Kernel Divergence Score (KDS), a novel method that evaluates dataset contamination by computing the divergence between the kernel similarity matrix of sample embeddings, before and after fine-tuning on the benchmark dataset. |
Hyeong Kyu Choi; Maxim Khanov; Hongxin Wei; Yixuan Li; | code |
| 107 | Componential Prompt-Knowledge Alignment for Domain Incremental Learning Highlight: This arises from the random positioning of knowledge components within prompts, where irrelevant component fusion introduces interference. To address this, we propose Componential Prompt-Knowledge Alignment (KA-Prompt), a novel prompt-based DIL method that introduces component-aware prompt-knowledge alignment during training, significantly improving both the learning and inference capacity of the model. |
Kunlun Xu; Xu Zou; Gang Hua; Jiahuan Zhou; | code |
| 108 | GradPS: Resolving Futile Neurons in Parameter Sharing Network for Multi-Agent Reinforcement Learning Highlight: We find that futile neurons, whose updates are canceled out by gradient conflicts among agents, lead to poor learning efficiency and diversity. To address this deficiency, we propose GradPS, a gradient-based PS method. |
Haoyuan Qin; Zhengzhu Liu; Chenxing Lin; Chennan Ma; Songzhu Mei; Siqi Shen; Cheng Wang; | code |
| 109 | Ultra-Resolution Adaptation with Ease Highlight: However, training models for high-resolution image generation remains challenging, particularly when training data and computational resources are limited. In this paper, we explore this practical problem from two key perspectives: data and parameter efficiency, and propose a set of key guidelines for ultra-resolution adaptation termed URAE. |
Ruonan Yu; Songhua Liu; Zhenxiong Tan; Xinchao Wang; | code |
| 110 | PARM: Multi-Objective Test-Time Alignment Via Preference-Aware Autoregressive Reward Model Highlight: Recently, GenARM (Xu et al., 2025) first independently trains Autoregressive Reward Models (ARMs) for each preference dimension without awareness of each other, then combines their outputs based on user-specific preference vectors during inference to achieve multi-objective test-time alignment, leading to two key limitations: the need for *multiple* ARMs increases the inference cost, and the *separate* training of ARMs causes the misalignment between the guided generation and the user preferences. To address these issues, we propose Preference-aware ARM (PARM), a *single* unified ARM trained across *all* preference dimensions. |
Baijiong Lin; Weisen Jiang; Yuancheng Xu; Hao Chen; Ying-Cong Chen; | code |
| 111 | FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making Highlight: In this work, we propose FOUNDER, a framework that integrates the generalizable knowledge embedded in FMs with the dynamic modeling capabilities of WMs to enable open-ended task solving in embodied environments in a reward-free manner. |
Yucen Wang; Rui Yu; Shenghua Wan; Le Gan; De-Chuan Zhan; | code |
| 112 | A Mixture-Based Framework for Guiding Diffusion Models Highlight: This work proposes a novel mixture approximation of these intermediate distributions. Since direct gradient-based sampling of these mixtures is infeasible due to intractable terms, we propose a practical method based on Gibbs sampling. |
Yazid Janati; Badr MOUFAD; Mehdi Abou El Qassime; Alain Oliviero Durmus; Eric Moulines; Jimmy Olsson; | code |
| 113 | CPCF: A Cross-Prompt Contrastive Framework for Referring Multimodal Large Language Models Highlight: However, these models often suffer from suboptimal performance due to incorrect responses tailored to misleading areas adjacent to or similar to the target region. This work introduces CPCF, a novel framework to address this issue and achieve superior results. |
Lanyun Zhu; Deyi Ji; Tianrun Chen; Haiyang Wu; De Wen Soh; Jun Liu; | code |
| 114 | Towards Graph Foundation Models: Learning Generalities Across Graphs Via Task-Trees Highlight: In contrast, discovering such generalities in graph-structured data, especially across heterogeneous graph tasks, remains an open challenge. To address this, we propose a novel approach to cross-task generalization in graphs via task-trees, which serve as unified learning instances aligning node-, edge-, and graph-level tasks. |
Zehong Wang; Zheyuan Zhang; Tianyi Ma; Nitesh V Chawla; Chuxu Zhang; Yanfang Ye; | code |
| 115 | Beyond Message Passing: Neural Graph Pattern Machine Highlight: In this paper, we introduce the Neural Graph Pattern Machine (GPM), a novel framework that bypasses message passing by learning directly from graph substructures. |
Zehong Wang; Zheyuan Zhang; Tianyi Ma; Nitesh V Chawla; Chuxu Zhang; Yanfang Ye; | code |
| 116 | MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems Highlight: In this paper, we simplify the process of building an MAS by reframing it as a generative language task, where the input is a user query and the output is a corresponding MAS. |
Rui Ye; Shuo Tang; Rui Ge; Yaxin Du; Zhenfei Yin; Siheng Chen; Jing Shao; | code |
| 117 | Probabilistic Group Mask Guided Discrete Optimization for Incremental Learning Highlight: However, existing approaches often disregard parameter dependencies, resulting in an over-reliance on newly allocated parameters. To address this issue, we propose Probabilistic Group Mask selection (PGM), a group-wise approach that captures parameter dependencies by exploring candidate masks within each group. |
Fengqiang Wan; Yang Yang; | code |
| 118 | CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: CurvGAD introduces two parallel pipelines for enhanced anomaly interpretability: (1) Curvature-equivariant geometry reconstruction, which focuses exclusively on reconstructing the edge curvatures using a mixed-curvature, Riemannian encoder and Gaussian kernel-based decoder; and (2) Curvature-invariant structure and attribute reconstruction, which decouples structural and attribute anomalies from geometric irregularities by regularizing graph curvature under discrete Ollivier-Ricci flow, thereby isolating the non-geometric anomalies. |
Karish Grover; Geoffrey J. Gordon; Christos Faloutsos; | code |
| 119 | SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **SyncMind**, a framework that systematically defines the *out-of-sync* problem faced by large language model (LLM) agents in collaborative software engineering (CSE). |
Xuehang Guo; Xingyao Wang; Yangyi Chen; Sha Li; Chi Han; Manling Li; Heng Ji; | code |
| 120 | Compositional Condition Question Answering in Tabular Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we introduce a new Compositional Condition Tabular Understanding method, called **CoCoTab**. |
Jun-Peng Jiang; Tao Zhou; De-Chuan Zhan; Han-Jia Ye; | code |
| 121 | Mixture of Lookup Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Their large parameter size still limits deployment, and offloading, which loads experts into VRAM only when needed, significantly increases inference latency. To address this, we propose Mixture of Lookup Experts (MoLE), a new MoE architecture that is efficient in both communication and VRAM usage. |
Shibo Jie; Yehui Tang; Kai Han; Yitong Li; Duyu Tang; Zhi-Hong Deng; Yunhe Wang; | code |
| 122 | Identifying and Understanding Cross-Class Features in Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel perspective on studying AT through the lens of class-wise feature attribution. |
Zeming Wei; Steven Y. Guo; Yisen Wang; | code |
| 123 | QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While post-training compression methods are very popular, the question of obtaining even more accurate compressed models by *directly training* over such representations, i.e., *Quantization-Aware Training (QAT)*, is still open: for example, a recent study put the optimal bit-width at which models can be trained using QAT, while staying accuracy-competitive with standard FP16/BF16 precision, at 8-bit weights and activations. We advance this state-of-the-art via a new method called QuEST, for which we demonstrate optimality at 4 bits and stable convergence with weights and activations as low as 1 bit. |
Andrei Panferov; Jiale Chen; Soroush Tabesh; Mahdi Nikdan; Dan Alistarh; | code |
| 124 | Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that VDMs inherently produce visual representations that encompass both current static information and predicted future dynamics, thereby providing valuable guidance for robot action learning. Based on this hypothesis, we propose the Video Prediction Policy (VPP), which learns an implicit inverse dynamics model conditioned on predicted future representations inside VDMs. |
Yucheng Hu; Yanjiang Guo; Pengchao Wang; Xiaoyu Chen; Yen-Jen Wang; Jianke Zhang; Koushil Sreenath; Chaochao Lu; Jianyu Chen; | code |
| 125 | PertEval-scFM: Benchmarking Single-Cell Foundation Models for Perturbation Effect Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present PertEval-scFM, a standardized framework designed to evaluate models for perturbation effect prediction. |
Aaron Wenteler; Martina Occhetta; Nikhil Branson; Victor Curean; Magdalena Huebner; William Dee; William Connell; Siu Pui Chung; Alex Hawkins-Hooker; Yasha Ektefaie; César Miguel Valdez Córdova; Amaya Gallagher-Syed; | code |
| 126 | Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by relative representation similarity measures, we introduce Inference-Time Decomposition of Activation models (ITDAs). |
Patrick Leask; Neel Nanda; Noura Al Moubayed; | code |
| 127 | OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest speech model, to the best of our knowledge. |
William Chen; Jinchuan Tian; Yifan Peng; Brian Yan; Chao-Han Huck Yang; Shinji Watanabe; | code |
| 128 | DOLPHIN: A Programmable Framework for Scalable Neurosymbolic Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Neurosymbolic learning enables the integration of symbolic reasoning with deep learning but faces significant challenges in scaling to complex symbolic programs, large datasets, or both. We introduce DOLPHIN, a framework that tackles these challenges by supporting neurosymbolic programs in Python, executing complex symbolic reasoning on the CPU while vectorizing probabilistic computations and gradient propagation on the GPU. |
Aaditya Naik; Jason Liu; Claire Wang; Amish Sethi; Saikat Dutta; Mayur Naik; Eric Wong; | code |
| 129 | Adjoint Sampling: Highly Scalable Diffusion Samplers Via Adjoint Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Adjoint Sampling, a highly scalable and efficient algorithm for learning diffusion processes that sample from unnormalized densities, or energy functions. |
Aaron J Havens; Benjamin Kurt Miller; Bing Yan; Carles Domingo-Enrich; Anuroop Sriram; Daniel S. Levine; Brandon M Wood; Bin Hu; Brandon Amos; Brian Karrer; Xiang Fu; Guan-Horng Liu; Ricky T. Q. Chen; | code |
| 130 | ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Quantizing all weight, activation and key-value (KV) cache tensors to 4-bit without significantly degrading generalizability is challenging, due to the high quantization error caused by extreme outliers in activations. To tackle this problem, we propose ResQ, a PTQ method that pushes the state-of-the-art further. |
Utkarsh Saxena; Sayeh Sharify; Kaushik Roy; Xin Wang; | code |
| 131 | Decomposition of Graphic Design with Unified Multimodal Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This task presents two core challenges: (1) predicting the attribute information (metadata) of each layer, and (2) recovering the occluded regions within overlapping layers to enable high-fidelity image reconstruction. To address this, we present the Decompose Layer Model (DeaM), a large unified multimodal model that integrates a conjoined visual encoder, a language model, and a condition-aware RGB-A decoder. |
Hui Nie; Zhao Zhang; Yutao Cheng; Maoke Yang; Gonglei Shi; Qingsong Xie; Jie Shao; Xinglong Wu; | code |
| 132 | DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One formulation of the structure elucidation task is the conditional *de novo* generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. |
Montgomery Bohde; Mrunali Manjrekar; Runzhong Wang; Shuiwang Ji; Connor W. Coley; | code |
| 133 | Beyond The Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. |
Binchi Zhang; Zaiyi Zheng; Zhengzhang Chen; Jundong Li; | code |
| 134 | SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To maximize sparsity while retaining essential information, we introduce a rank-based strategy to adaptively determine the sparsification ratio for each layer, alongside a token recycling method that compresses pruned tokens into more compact representations. |
Yuan Zhang; Chun-Kai Fan; Junpeng Ma; Wenzhao Zheng; Tao Huang; Kuan Cheng; Denis A Gudovskiy; Tomoyuki Okuno; Yohei Nakata; Kurt Keutzer; Shanghang Zhang; | code |
| 135 | Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a novel approach to generate large-scale, high-resolution canopy height maps over time. |
Jan Pauls; Max Zimmer; Berkant Turan; Sassan Saatchi; Philippe CIAIS; Sebastian Pokutta; Fabian Gieseke; | code |
| 136 | RUN: Reversible Unfolding Network for Concealed Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods often employ reversible strategies to concentrate on uncertain regions but only focus on the mask level, overlooking the value of the RGB domain. To address this, we propose a Reversible Unfolding Network (RUN) in this paper. |
Chunming He; Rihan Zhang; Fengyang Xiao; Chengyu Fang; Longxiang Tang; Yulun Zhang; Linghe Kong; Deng-Ping Fan; Kai Li; Sina Farsiu; | code |
| 137 | Measuring Diversity in Synthetic Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce DCScore, a novel method for measuring synthetic dataset diversity from a classification perspective. |
Yuchang Zhu; Huizhe Zhang; Bingzhe Wu; Jintang Li; Zibin Zheng; Peilin Zhao; Liang Chen; Yatao Bian; | code |
| 138 | CodeSync: Synchronizing Large Language Models with Dynamic Code Evolution at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce CodeSync, a data engine to identify outdated code patterns and collect real-time code knowledge updates from Python third-party libraries. |
Chenlong Wang; Zhaoyang Chu; Zhengxiang Cheng; Xuyi Yang; Kaiyue Qiu; Yao Wan; Zhou Zhao; Xuanhua Shi; Hai Jin; Dongping Chen; | code |
| 139 | GuardAgent: Safeguard LLM Agents Via Knowledge-Enabled Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests.In addition, we propose two novel benchmarks: EICU-AC benchmark to assess the access control for healthcare agents and Mind2Web-SC benchmark to evaluate the safety policies for web agents. |
Zhen Xiang; Linzhi Zheng; Yanjie Li; Junyuan Hong; Qinbin Li; Han Xie; Jiawei Zhang; Zidi Xiong; Chulin Xie; Carl Yang; Dawn Song; Bo Li; | code |
| 140 | A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the mixture of LoRAs (MoE-LoRA) still exhibits low robustness during tuning and inference. Inspired by the Riemannian Preconditioners which train LoRA as a sub-space projector, we propose a new training strategy for MoE-LoRA, to stabilize and boost its feature learning by gate-rescaled multi-space projections. |
Mengyang Sun; Yihao Wang; Tao Feng; Dan Zhang; Yifan Zhu; Jie Tang; | code |
| 141 | Meta-Black-Box-Optimization Through Offline Q-function Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the online learning paradigms in existing works make the efficiency of MetaBBO problematic. To address this, we propose an offline learning-based MetaBBO framework in this paper, termed Q-Mamba, to attain both effectiveness and efficiency in MetaBBO. |
Zeyuan Ma; Zhiguang Cao; Zhou Jiang; Hongshu Guo; Yue-Jiao Gong; | code |
| 142 | Causality-Aware Contrastive Learning for Robust Multivariate Time-Series Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Causality-Aware contrastive learning for RObust multivariate Time-Series (CAROTS), a novel MTSAD pipeline that incorporates the notion of causality into contrastive learning. |
HyunGi Kim; Jisoo Mok; Dongjun Lee; Jaihyun Lew; Sungjae Kim; Sungroh Yoon; | code |
| 143 | CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. |
Xintao Wang; Heng Wang; Yifei Zhang; Xinfeng Yuan; Rui Xu; Jen-tse Huang; Siyu Yuan; Haoran Guo; Jiangjie Chen; Shuchang Zhou; Wei Wang; Yanghua Xiao; | code |
| 144 | CodeSteer: Symbolic-Augmented Language Models Via Code/Text Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CodeSteer, an effective method for guiding LLM code/text generation. |
Yongchao Chen; Yilun Hao; Yueying Liu; Yang Zhang; Chuchu Fan; | code |
| 145 | Closed-Loop Long-Horizon Robotic Planning Via Equilibrium Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent advances in language model agents, they remain prone to planning errors and limited in their ability to plan ahead. To address these limitations in robotic planning, we advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached. |
Jinghan Li; Zhicheng Sun; Yadong MU; | code |
| 146 | Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **Ca2-VDM**, an efficient autoregressive VDM with **Ca**usal generation and **Ca**che sharing. |
Kaifeng Gao; Jiaxin Shi; Hanwang Zhang; Chunping Wang; Jun Xiao; Long Chen; | code |
| 147 | MedRAX: Medical Reasoning Agent for Chest X-ray Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. |
Adibvafa Fallahpour; Jun Ma; Alif Munim; Hongwei Lyu; BO WANG; | code |
| 148 | Federated Incomplete Multi-view Clustering with Globally Fused Graph Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, the missing data problem in the federated multi-view clustering task is less explored. To address these problems, we propose a novel Federated Incomplete Multi-view Clustering method with globally Fused Graph guidance (FIMCFG). |
Guoqing Chao; Zhenghao Zhang; Lei Meng; Jie Wen; Dianhui Chu; | code |
| 149 | Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, noise is pervasive in real-world scenarios, leading to a significant degradation in performance. To tackle this problem, we propose a novel multi-view clustering framework for the automatic identification and rectification of noisy data, termed AIRMVC. |
Xihong Yang; Siwei Wang; Fangdi Wang; Jiaqi Jin; Suyuan Liu; Yue Liu; En Zhu; Xinwang Liu; Yueming Jin; | code |
| 150 | MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the weight-wise trust ratio in LAMB is error-prone as it overlooks relationships of weight values within rows or columns. Building on these observations, we propose a novel optimizer, MERIT, which leverages the max-norm to calculate the trust ratio to constrain the max attention logit more effectively. |
Yang Luo; Zangwei Zheng; Ziheng Qin; Zirui Zhu; Yong Liu; Yang You; | code |
| 151 | From Passive to Active Reasoning: Can Large Language Models Ask The Right Questions Under Incomplete Information? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By contrast, active reasoning—where an LLM must interact with external systems to acquire missing evidence or data—has received little systematic attention. To address this shortfall, we present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM’s active reasoning skills. |
Zhanke Zhou; Xiao Feng; Zhaocheng Zhu; Jiangchao Yao; Sanmi Koyejo; Bo Han; | code |
| 152 | Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study this alignment problem in text-to-image (T2I) generation and propose a prototype for proactive T2I agents equipped with an interface to (1) actively ask clarification questions when uncertain, and (2) present their uncertainty about user intent as an understandable and editable belief graph. We build simple prototypes for such agents and propose a new scalable and automated evaluation approach using two agents, one with a ground truth intent (an image) while the other tries to ask as few questions as possible to align with the ground truth. |
Meera Hahn; Wenjun Zeng; Nithish Kannen; Rich Galt; Kartikeya Badola; Been Kim; Zi Wang; | code |
| 153 | Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in SEMG Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the problem from a short-term enhancement perspective to improve precision and robustness against various common noisy scenarios, with learnable denoising that uses intrinsic sEMG pattern information and sliding-window attention. |
Weiyu Guo; Ziyue Qiao; Ying Sun; Yijie Xu; Hui Xiong; | code |
| 154 | OmniAudio: Generating Spatial Audio from 360-Degree Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To generate spatial audio from 360-degree video, we propose a novel framework **OmniAudio**, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data. |
Huadai Liu; Tianyi Luo; Kaicheng Luo; Qikai Jiang; Peiwen Sun; Jialei Wang; Rongjie Huang; Qian Chen; Wen Wang; Xiangtai Li; ShiLiang Zhang; Zhijie Yan; Zhou Zhao; Wei Xue; | code |
| 155 | Compositional Scene Understanding Through Inverse Generative Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate scene understanding as an inverse generative modeling problem, where we seek to find conditional parameters of a visual generative model to best fit a given natural image. |
Yanbo Wang; Justin Dauwels; Yilun Du; | code |
| 156 | Scaling Video-Language Models to 10K Frames Via Hierarchical Differential Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce differential distillation, a principled approach that systematically preserves task-relevant information while suppressing redundancy. |
Chuanqi Cheng; Jian Guan; Wei Wu; Rui Yan; | code |
| 157 | $S^2$FGL: Spatial Spectral Federated Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenges, we propose a global knowledge repository to mitigate label signal disruption and a frequency alignment strategy to address spectral client drift. |
Zihan Tan; Suyuan Huang; Guancheng Wan; Wenke Huang; He Li; Mang Ye; | code |
| 158 | Normalizing Flows Are Capable Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that NFs are more powerful than previously believed. |
Shuangfei Zhai; Ruixiang ZHANG; Preetum Nakkiran; David Berthelot; Jiatao Gu; Huangjie Zheng; Tianrong Chen; Miguel Ángel Bautista; Navdeep Jaitly; Joshua M. Susskind; | code |
| 159 | The Devil Is in The Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we find that MM-RMs trained on existing datasets often struggle to generalize to out-of-distribution data due to their reliance on unimodal spurious correlations, primarily text-only shortcuts within the training distribution, which prevents them from leveraging true multimodal reward functions. To address this, we introduce a Shortcut-aware MM-RM learning algorithm that mitigates this issue by dynamically reweighting training samples, shifting the distribution toward better multimodal understanding, and reducing dependence on unimodal spurious correlations. |
Zichao Li; Xueru Wen; Jie Lou; Yuqiu Ji; Yaojie Lu; Xianpei Han; Debing Zhang; Le Sun; | code |
| 160 | Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 2) Visual Feature Diversity: The diversity of visual features makes it challenging to leverage naive image features directly for image-text alignment in downstream tasks. In this work, we propose Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation (FedDDA) to overcome the above limitations. |
Yihao Yang; Wenke Huang; Guancheng Wan; Bin Yang; Mang Ye; | code |
| 161 | Flexible, Efficient, and Stable Adversarial Attacks on Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing adversarial MU attacks suffer from three key limitations: inflexibility due to pre-defined attack targets, inefficiency in handling multiple attack requests, and instability caused by non-convex loss functions. To address these challenges, we propose a Flexible, Efficient, and Stable Attack (DDPA). |
Zihan Zhou; Yang Zhou; Zijie Zhang; Lingjuan Lyu; Da Yan; Ruoming Jin; Dejing Dou; | code |
| 162 | Efficient Robotic Policy Learning Via Latent Space Backward Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a **B**ackward **P**lanning scheme in **L**atent space (**LBP**), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. |
Dongxiu Liu; Haoyi Niu; Zhihao Wang; Jinliang Zheng; Yinan Zheng; Zhonghong Ou; Jianming HU; Jianxiong Li; Xianyuan Zhan; | code |
| 163 | DMOSpeech: Direct Metric Optimization Via Distilled Diffusion Model in Zero-Shot Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing TTS approaches are limited by non-differentiable components or iterative sampling that prevent true end-to-end optimization with perceptual metrics. We introduce DMOSpeech, a distilled diffusion-based TTS model that uniquely achieves both faster inference and superior performance compared to its teacher model. |
Yinghao Aaron Li; Rithesh Kumar; Zeyu Jin; | code |
| 164 | LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LangDAug, a novel **Lang**evin **D**ata **Aug**mentation for multi-source domain generalization in 2D medical image segmentation. |
Piyush Tiwary; Kinjawl Bhattacharyya; Prathosh AP; | code |
| 165 | Info-Coevolution: An Efficient Framework for Data Model Coevolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation with no bias. |
Ziheng Qin; Hailun Xu; Wei Chee Yew; Qi Jia; Yang Luo; Kanchan Sarkar; Danhui Guan; Kai Wang; Yang You; | code |
| 166 | Concept-Centric Token Interpretation for Vector-Quantized Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Concept-Oriented Token Explanation (CORTEX), a novel approach for interpreting VQGMs by identifying concept-specific token combinations. |
Tianze Yang; Yucheng Shi; Mengnan Du; Xuansheng Wu; Qiaoyu Tan; Jin Sun; Ninghao Liu; | code |
| 167 | AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current community suffers from a lack of large-scale datasets with intensive, descriptive emotion annotations, as well as a multimodal-centric framework to maximize the potential of MLLMs for emotion understanding. To address this, we establish a new benchmark for MLLM-based emotion understanding with a novel dataset (MER-Caption) and a new model (AffectGPT). |
Zheng Lian; Haoyu Chen; Lan Chen; Haiyang Sun; Licai Sun; Yong Ren; Zebang Cheng; Bin Liu; Rui Liu; Xiaojiang Peng; Jiangyan Yi; Jianhua Tao; | code |
| 168 | OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paradigm shift aims to enable models to predict emotions beyond a fixed label space, accommodating a flexible set of categories to better reflect the nuanced spectrum of human emotions. To achieve this, we propose a novel paradigm: *Open-Vocabulary MER (OV-MER)*, which enables emotion prediction without being confined to predefined spaces. |
Zheng Lian; Haiyang Sun; Licai Sun; Haoyu Chen; Lan Chen; Hao Gu; Zhuofan Wen; Shun Chen; Zhang Siyuan; Hailiang Yao; Bin Liu; Rui Liu; Shan Liang; Ya Li; Jiangyan Yi; Jianhua Tao; | code |
| 169 | Towards Efficient Online Tuning of VLM Agents Via Counterfactual Soft Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel online fine-tuning method, Counterfactual Soft Reinforcement Learning (CoSo), better suited to the textual output space of VLM agents. |
Lang Feng; Weihao Tan; Zhiyi Lyu; Longtao Zheng; Haiyang Xu; Ming Yan; Fei Huang; Bo An; | code |
| 170 | Long-Short Alignment for Effective Long-Context Modeling in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a fresh perspective on length generalization, shifting the focus from the conventional emphasis on input features such as positional encodings or data structures to the output distribution of the model. |
Tianqi Du; Haotian Huang; Yifei Wang; Yisen Wang; | code |
| 171 | Simplifying DINO Via Coding Rate Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we posit that we can remove most such-motivated idiosyncrasies in the pre-training pipelines, and only need to add an explicit coding rate term in the loss function to avoid collapse of the representations. |
Ziyang Wu; Jingyuan Zhang; Druv Pai; XuDong Wang; Chandan Singh; Jianwei Yang; Jianfeng Gao; Yi Ma; | code |
| 172 | Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. |
Mohit Pandey; Gopeshh Subbaraj; Artem Cherkasov; Martin Ester; Emmanuel Bengio; | code |
| 173 | Super Deep Contrastive Information Bottleneck for Multi-modal Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although there is a wealth of research on MMC, due to the complexity of datasets, a major challenge remains in how to deeply explore the complex latent information and interdependencies between modalities. To address this issue, this paper proposes a method called super deep contrastive information bottleneck (SDCIB) for MMC, which aims to explore and utilize all types of latent information to the fullest extent. |
Zhengzheng Lou; Ke Zhang; Yucong Wu; Shizhe Hu; | code |
| 174 | Large Continual Instruction Assistant Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a general continual instruction tuning framework to address the challenge. |
Jingyang Qiao; zhizhong zhang; Xin Tan; Yanyun Qu; Shouhong Ding; Yuan Xie; | code |
| 175 | Discriminative Policy Optimization for Token-Level Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the conflict between generative language modeling and reward modeling may introduce instability and lead to inaccurate credit assignments. To address this challenge, we revisit token-level reward assignment by decoupling reward modeling from language generation and derive a token-level reward model through the optimization of a discriminative policy, termed the Q-function Reward Model (Q-RM). |
Hongzhan Chen; Tao Yang; Shiping Gao; Ruijun Chen; Xiaojun Quan; Hongtao Tian; Ting Yao; | code |
| 176 | BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce BAME, a method that maintains consistent sparsity throughout the N:M sparse training process. |
Chenyi yang; Wenjie Nie; Yuxin Zhang; Yuhang Wu; Xiawu Zheng; GUANNAN JIANG; Rongrong Ji; | code |
| 177 | Griffin: Towards A Graph-Centric Relational Database Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Griffin, the first attempt at a foundation model designed specifically for Relational Databases (RDBs). |
Yanbo Wang; Xiyuan Wang; Quan Gan; Minjie Wang; Qibin Yang; David Wipf; Muhan Zhang; | code |
| 178 | LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs – No Silver Bullet for LC or RAG Routing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LaRA, a novel benchmark with 2326 test cases across four QA tasks and three long context types, for rigorous evaluation. |
Kuan Li; Liwen Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Shuai Wang; Minhao Cheng; | code |
| 179 | Revolve: Optimizing AI Systems By Tracking Response Evolution in Textual Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce **REVOLVE**, an optimization method that tracks how **R**esponses **EVOLVE** across iterations in LLM systems. |
Peiyan Zhang; Haibo Jin; Leyang Hu; Xinnuo Li; Liying Kang; Man Luo; Yangqiu Song; Haohan Wang; | code |
| 180 | Textural or Textual: How Vision-Language Models Read Text in Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To disentangle orthographic form from meaning, we introduce the ToT dataset, which includes controlled word pairs that either share semantics with distinct appearances (synonyms) or share appearance with differing semantics (paronyms). |
Hanzhang Wang; Qingyuan Ma; | code |
| 181 | Do We Really Need Message Passing in Brain Network Modeling? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, this paper observes significant performance and efficiency gains from the Hadamard product over the matrix product (the matrix form of message passing) when processing brain networks. |
Liang Yang; Yuwei Liu; Jiaming Zhuo; Di Jin; Chuan Wang; Zhen Wang; Xiaochun Cao; | code |
| 182 | Wasserstein Flow Matching: Generative Modeling Over Families of Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Wasserstein flow matching (WFM), which lifts flow matching onto families of distributions using the Wasserstein geometry. |
Doron Haviv; Aram-Alexandre Pooladian; Dana Pe’er; Brandon Amos; | code |
| 183 | Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose Retrieval-Augmented Perception (RAP), a training-free framework that retrieves and fuses relevant image crops while preserving spatial context using the proposed Spatial-Awareness Layout. |
Wenbin Wang; Yongcheng Jing; Liang Ding; Yingjie Wang; Li Shen; Yong Luo; Bo Du; Dacheng Tao; | code |
| 184 | BSLoRA: Enhancing The Parameter Efficiency of LoRA with Intra-Layer and Inter-Layer Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing methods reduce stored parameters via parameter sharing, they fail to capture both local and global information simultaneously. To address this issue, we propose the Bi-Share LoRA (BSLoRA), which extends local LoRA with intra-LoRA and inter-LoRA parameter sharing to better capture local and global information. |
Yuhua Zhou; Ruifeng Li; Changhai Zhou; Fei Yang; Aimin Pan; | code |
| 185 | Preserving AUC Fairness in Learning with Noisy Protected Groups Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these studies often overlook the impact of noisy protected groups, leading to fairness violations in practice. To address this, we propose the first robust AUC fairness approach under noisy protected groups with fairness theoretical guarantees using distributionally robust optimization. |
Mingyang Wu; Li Lin; Wenbin Zhang; Xin Wang; Zhenhuan Yang; Shu Hu; | code |
| 186 | Prices, Bids, Values: One ML-Powered Combinatorial Auction to Rule Them All Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel ML algorithm that provably makes use of the full information from both value and demand queries, and we show via experiments that combining both query types results in significantly better learning performance in practice. |
Ermis Soumalias; Jakob Heiss; Jakob Weissteiner; Sven Seuken; | code |
| 187 | MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More accurate methods for activity prediction exist, such as molecular dynamics based binding free energy calculations, but they are too computationally expensive to use in a generative model. To address this challenge, we propose Multi-Fidelity Latent space Active Learning (MF-LAL), a generative modeling framework that integrates a set of oracles with varying cost-accuracy tradeoffs. |
Peter Eckmann; Dongxia Wu; Germano Heinzelmann; Michael K Gilson; Rose Yu; | code |
| 188 | Graph World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While multiple graph foundation models have been proposed, they focus on graph learning tasks and cannot extend to diverse multi-modal data and interdisciplinary tasks. To address these challenges, we propose the Graph World Model (GWM), a world model that supports both unstructured and graph-structured states with multi-modal information and represents diverse tasks as actions. |
Tao Feng; Yexin Wu; Guanyu Lin; Jiaxuan You; | code |
| 189 | Piloting Structure-Based Drug Design Via Modality-Specific Optimal Schedule Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A major bottleneck lies in the twisted probability path of multi-modalities—continuous 3D positions and discrete 2D topologies—which jointly determine molecular geometries. By establishing the fact that noise schedules decide the Variational Lower Bound (VLB) for the twisted probability path, we propose VLB-Optimal Scheduling (VOS) strategy in this under-explored area, which optimizes VLB as a path integral for SBDD. |
Keyue Qiu; Yuxuan Song; Zhehuan Fan; Peidong Liu; Zhe Zhang; Mingyue Zheng; Hao Zhou; Wei-Ying Ma; | code |
| 190 | Empower Structure-Based Molecule Optimization with Gradient Guided Bayesian Flow Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel backward correction strategy that optimizes within a sliding window of the past histories, allowing for a seamless trade-off between explore-and-exploit during optimization. |
Keyue Qiu; Yuxuan Song; Jie Yu; Hongbo Ma; Ziyao Cao; Zhilong Zhang; Yushuai Wu; Mingyue Zheng; Hao Zhou; Wei-Ying Ma; | code |
| 191 | Faster and Stronger: When ANN-SNN Conversion Meets Parallel Spiking Calculation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel parallel conversion learning framework, which establishes a mathematical mapping relationship between each time-step of the parallel spiking neurons and the cumulative spike firing rate. |
Zecheng Hao; Qichao Ma; Kang Chen; Yi Zhang; Zhaofei Yu; Tiejun Huang; | code |
| 192 | Differentiable Quadratic Optimization For The Maximum Independent Set Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the non-convexity of the objective, we propose optimizing several initializations in parallel using momentum-based gradient descent, complemented by an efficient MIS checking criterion derived from our theory. |
Ismail Alkhouri; Cedric Le Denmat; Yingjie Li; Cunxi Yu; Jia Liu; Rongrong Wang; Alvaro Velasquez; | code |
| 193 | FrameBridge: Improving Image-to-Video Generation with Bridge Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diffusion models have achieved remarkable progress on image-to-video (I2V) generation, while their noise-to-data generation process is inherently mismatched with this task, which may lead to suboptimal synthesis quality. |
Yuji Wang; Zehua Chen; Chen Xiaoyu; Yixiang Wei; Jun Zhu; Jianfei Chen; | code |
| 194 | Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models. |
Zhining Liu; Ze Yang; Xiao Lin; Ruizhong Qiu; Tianxin Wei; Yada Zhu; Hendrik Hamann; Jingrui He; Hanghang Tong; | code |
| 195 | CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CUPS, a novel method for learning sequence-to-sequence 3D human shapes and poses from RGB videos with uncertainty quantification. |
Harry Zhang; Luca Carlone; | code |
| 196 | Learning Distribution-wise Control in Representation Space for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extend this approach to the distribution level, enabling the model to learn not only pointwise transformations but also the surrounding regions of the concept subspace. |
Chunyuan Deng; Ruidi Chang; Hanjie Chen; | code |
| 197 | Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, approaches that aim to tackle task diversity, such as using task embedding as policy context and task clustering, typically lack performance guarantees and require a large number of training tasks. To address these challenges, we propose a novel approach for learning a policy committee that includes at least one near-optimal policy with high probability for tasks encountered during execution. |
Luise Ge; Michael Lanier; Anindya Sarkar; Bengisu Guresti; Chongjie Zhang; Yevgeniy Vorobeychik; | code |
| 198 | Discovering Latent Causal Graphs from Spatiotemporal Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SPACY (SPAtiotemporal Causal discoverY), a novel framework based on variational inference, designed to model latent time series and their causal relationships from spatiotemporal data. |
Kun Wang; Sumanth Varambally; Duncan Watson-Parris; Yian Ma; Rose Yu; | code |
| 199 | Rethinking Point Cloud Data Augmentation: Topologically Consistent Deformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SinPoint, a novel method designed to preserve the topological structure of the original point cloud through a homeomorphism. |
Jian Bi; Qianliang Wu; Xiang Li; Shuo Chen; Jianjun Qian; Lei Luo; Jian Yang; | code |
| 200 | Scaling Trends in Language Model Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As both attackers and defenders gain access to more compute, and as models become larger, what will be the effect on robustness? We argue that to answer this question requires a *scaling lens*, which we adopt in an extensive study of language model robustness across several classification tasks, model families, and adversarial attacks. |
Nikolaus H. R. Howe; Ian R. McKenzie; Oskar John Hollinsworth; Michał Zając; Tom Tseng; Aaron David Tucker; Pierre-Luc Bacon; Adam Gleave; | code |
| 201 | MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MuseControlLite, a lightweight mechanism designed to fine-tune text-to-music generation models for precise conditioning using various time-varying musical attributes and reference audio signals. |
Fang-Duo Tsai; Shih-Lun Wu; Weijaw Lee; Sheng-Ping Yang; Bo-Rui Chen; Hao-Chung Cheng; Yi-Hsuan Yang; | code |
| 202 | TAROT: Targeted Data Selection Via Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose TAROT, a targeted data selection framework grounded in Optimal Transport theory. |
Lan Feng; Fan Nie; Yuejiang Liu; Alexandre Alahi; | code |
| 203 | Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify a fundamental limitation in existing evaluation metrics, such as the uplift and Qini curves, which fail to rank individuals with binary negative outcomes accurately. |
Minqin Zhu; Zexu Sun; Ruoxuan Xiong; Anpeng Wu; Baohong Li; Caizhi Tang; Jun Zhou; Fei Wu; Kun Kuang; | code |
| 204 | AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces AKRMap, a new DR technique designed to visualize cross-modal embedding metrics with enhanced accuracy by learning kernel regression of the metric landscape in the projection space. |
Yilin Ye; Junchao Huang; Xingchen Zeng; Jiazhi Xia; Wei Zeng; | code |
| 205 | Patch-wise Structural Loss for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing forecasting models rely heavily on point-wise loss functions like Mean Squared Error, which treat each time step independently and neglect the structural dependencies inherent in time series data, making it challenging to capture complex temporal patterns accurately. To address these challenges, we propose a novel **P**atch-wise **S**tructural (**PS**) loss, designed to enhance structural alignment by comparing time series at the patch level. |
Dilfira Kudrat; Zongxia Xie; Yanru Sun; Tianyu Jia; Qinghua Hu; | code |
| 206 | PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation Via Few-Shot Private Data and Generative APIs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, the few-shot private data challenge is particularly prevalent in specialized domains like healthcare and industry. To address this challenge, we propose a novel API-assisted algorithm, Private Contrastive Evolution (PCEvolve), which iteratively mines inherent inter-class contrastive relationships in few-shot private data beyond individual data points and seamlessly integrates them into an adapted Exponential Mechanism (EM) to optimize DP’s utility in an evolution loop. |
Jianqing Zhang; Yang Liu; Jie Fu; Yang Hua; Tianyuan Zou; Jian Cao; Qiang Yang; | code |
| 207 | RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a graduate-level, multi-disciplinary, English-Chinese benchmark, dubbed Reasoning Bench (RBench), for assessing the reasoning capability of both language and multimodal models. |
Meng-Hao Guo; Jiajun Xu; Yi Zhang; Jiaxi Song; Haoyang Peng; Yi-Xuan Deng; Xinzhi Dong; Kiyohiro Nakayama; Zhengyang Geng; Chen Wang; Bolin Ni; Guo-Wei Yang; Yongming Rao; Houwen Peng; Han Hu; Gordon Wetzstein; Shi-min Hu; | code |
| 208 | RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Code auditing is the process of reviewing code with the aim of identifying bugs. |
Jinyao Guo; Chengpeng Wang; Xiangzhe Xu; Zian Su; Xiangyu Zhang; | code |
| 209 | Determining Layer-wise Sparsity for Large Language Models Through A Theoretical Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the challenge of determining the layer-wise sparsity rates of large language models (LLMs) through a theoretical perspective. |
Weizhong Huang; Yuxin Zhang; Xiawu Zheng; Fei Chao; Rongrong Ji; | code |
| 210 | What Makes In-context Learning Effective for Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we aim to theoretically analyze the impact of in-context demonstrations on LLMs’ reasoning performance. |
Jiayu Liu; Zhenya Huang; Chaokun Wang; Xunpeng Huang; ChengXiang Zhai; Enhong Chen; | code |
| 211 | ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we introduce the Time-Series Question Answering (Time-Series QA) task and release EngineMT-QA, the first large-scale, multi-task, temporal-textual QA dataset designed to capture complex interactions between time-series signals and natural language. Building on this resource, we propose the Instruct Time Transformer (ITFormer), a novel framework that bridges time-series encoders with frozen large language models (LLMs). |
Yilin Wang; Peixuan Lei; Jie Song; Yuzhe Hao; Tao Chen; Yuxuan Zhang; Lei Jia; Yuanxiang Li; Zhongyu Wei; | code |
| 212 | Reinforcement Learning with Adaptive Reward Modeling for Expensive-to-Evaluate Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Training reinforcement learning (RL) agents requires extensive trial and error, which becomes prohibitively time-consuming in systems with costly reward evaluations. To address this challenge, we propose adaptive reward modeling (AdaReMo), which accelerates RL training by decomposing the complicated reward function into multiple localized fast reward models that approximate direct reward evaluation with neural networks. |
Hongyuan Su; Yu Zheng; Yuan Yuan; Yuming Lin; Depeng Jin; Yong Li; | code |
| 213 | One Leaf Reveals The Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a scalable and straightforward pre-training paradigm for efficient visual conceptual representation called occluded image contrastive learning (OCL). |
Xiaoyu Yang; Lijian Xu; Hongsheng Li; Shaoting Zhang; | code |
| 214 | Unbiased Recommender Learning from Implicit Feedback Via Weakly Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This assumption risks misclassifying potential positive samples within the unlabeled data, thereby undermining model performance. To address this issue, we introduce PURL, a model-agnostic framework that reframes implicit feedback recommendation as a weakly supervised learning task, eliminating the need for negative samples. |
Hao Wang; Zhichao Chen; Haotian Wang; Yanchao Tan; Licheng Pan; Tianqiao Liu; Xu Chen; Haoxuan Li; Zhouchen Lin; | code |
| 215 | SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **SongGen**, a fully open-source, single-stage auto-regressive transformer designed for controllable song generation. To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline. |
Zihan Liu; Shuangrui Ding; Zhixiong Zhang; Xiaoyi Dong; Pan Zhang; Yuhang Zang; Yuhang Cao; Dahua Lin; Jiaqi Wang; | code |
| 216 | In-Context Learning and Occam’s Razor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. |
Eric Elmoznino; Tom Marty; Tejas Kasetty; Leo Gagnon; Sarthak Mittal; Mahan Fathi; Dhanya Sridhar; Guillaume Lajoie; | code |
| 217 | Towards A Formal Theory of Representational Compositionality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while we have strong intuitions about what compositionality is, we lack satisfying formal definitions for it. Here, we propose such a definition called representational compositionality that is conceptually simple, quantitative, and grounded in algorithmic information theory. |
Eric Elmoznino; Thomas Jiralerspong; Yoshua Bengio; Guillaume Lajoie; | code |
| 218 | Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the mechanisms of external slow-thinking from a theoretical standpoint. |
Zeyu Gan; Yun Liao; Yong Liu; | code |
| 219 | Synthetic Face Datasets Generation Via Latent Space Exploration from Brownian Identity Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a new method, inspired by the physical motion of soft particles subjected to stochastic Brownian forces, allowing us to sample identity distributions in a latent space under various constraints. With this in hand, we generate several face datasets and benchmark them by training face recognition models, showing that data generated with our method exceeds the performance of previous GAN-based datasets and achieves competitive performance with state-of-the-art diffusion-based synthetic datasets. |
David Geissbühler; Hatef Otroshi Shahreza; Sébastien Marcel; | code |
| 220 | STAIR: Improving Safety Alignment with Introspective Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **STAIR**, a novel framework that integrates **S**afe**T**y **A**lignment with **I**ntrospective **R**easoning. |
Yichi Zhang; Siyuan Zhang; Yao Huang; Zeyu Xia; Zhengwei Fang; Xiao Yang; Ranjie Duan; Dong Yan; Yinpeng Dong; Jun Zhu; | code |
| 221 | Adapting While Learning: Grounding LLMs for Scientific Problems with Tool Usage Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by how human experts assess problem complexity before selecting solutions, we propose a novel two-component fine-tuning method, *Adapting while Learning* (AWL). |
Bohan Lyu; Yadi Cao; Duncan Watson-Parris; Leon Bergen; Taylor Berg-Kirkpatrick; Rose Yu; | code |
| 222 | CSTrack: Enhancing RGB-X Tracking Via Compact Spatiotemporal Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More critically, intra-modality spatial modeling within each dispersed space incurs substantial computational overhead, limiting resources for inter-modality spatial modeling and temporal modeling. To address this, we propose a novel tracker, CSTrack, which focuses on modeling Compact Spatiotemporal features to achieve simple yet effective tracking. |
Xiaokun Feng; Dailing Zhang; Shiyu Hu; Xuchen Li; Meiqi Wu; Jing Zhang; Xiaotang Chen; Kaiqi Huang; | code |
| 223 | Towards Universal Offline Black-Box Optimization Via Learning Language Model Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss multiple potential approaches, including an end-to-end learning framework in the form of next-token prediction, as well as prioritizing the learning of latent spaces with strong representational capabilities. To validate the effectiveness of these methods, we collect offline BBO tasks and data from open-source academic works for training. |
Rong-Xi Tan; Ming Chen; Ke Xue; Yao Wang; Yaoyuan Wang; Fu Sheng; Chao Qian; | code |
| 224 | Boost-and-Skip: A Simple Guidance-Free Diffusion for Minority Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing diffusion-based minority generators often rely on computationally expensive guidance dedicated to minority generation. To address this, we present a simple yet powerful guidance-free approach called *Boost-and-Skip* for generating minority samples using diffusion models. |
Soobin Um; Beomsu Kim; Jong Chul Ye; | code |
| 225 | SPMC: Self-Purifying Federated Backdoor Defense Via Margin Contribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These attacks exploit FL's decentralized nature, while existing defenses, based on isolated behaviors and fixed rules, can be bypassed by adaptive attackers. To address these limitations, we propose **SPMC**, a marginal collaboration defense mechanism that leverages intrinsic consistency across clients to estimate inter-client marginal contributions. This allows the system to dynamically reduce the influence of clients whose behavior deviates from the collaborative norm, thus maintaining robustness even as the number of attackers changes. |
Wenwen He; Wenke Huang; Bin Yang; ShuKan Liu; Mang Ye; | code |
| 226 | Splitting with Importance-aware Updating for Heterogeneous Federated Learning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is decomposing client updates into consensus and divergence components, enabling the model to maintain core capabilities while adapting to domain-specific knowledge. We propose a novel federated learning framework called **FedICU** (Splitting with **I**mportan**C**e-aware **U**pdating for Heterogeneous **Fed**erated Learning with Large Language Models), which introduces an aggregation mechanism that dynamically balances these components based on their contribution to global model performance, while implementing an importance-aware parameter updating strategy to prevent catastrophic forgetting and domain overfitting. |
Yangxu Liao; Wenke Huang; Guancheng Wan; Jian Liang; Bin Yang; Mang Ye; | code |
| 227 | GHOST: Generalizable One-Shot Federated Graph Learning with Proxy-Based Topology Knowledge Retention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we introduce **GHOST**, an innovative one-shot FGL framework. In GHOST, we establish a proxy model for each client to leverage diverse local knowledge and integrate it to train the global model. |
Jiaru Qian; Guancheng Wan; Wenke Huang; Guibin Zhang; Yuxin Wu; Bo Du; Mang Ye; | code |
| 228 | Privacy Attacks on Image AutoRegressive Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the privacy risks associated with IARs remain unexplored, raising concerns regarding their responsible deployment. To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to the ones of DMs as reference points. |
Antoni Kowalczuk; Jan Dubiński; Franziska Boenisch; Adam Dziedzic; | code |
| 229 | Update Your Transformer to The Latest Release: Re-Basin of Task Vectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how to transfer fine-tuning to a new checkpoint without having to re-train, in a data-free manner. |
Filippo Rinaldi; Giacomo Capitani; Lorenzo Bonicelli; Donato Crisostomi; Federico Bolelli; Elisa Ficarra; Emanuele Rodolà; Simone Calderara; Angelo Porrello; | code |
| 230 | Scaling Large Motion Models with Million-Level Human Motions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better integrate the motion modality, we propose Motionbook, an innovative motion encoding approach including (1) a compact yet lossless feature to represent motions; (2) a novel 2D lookup-free motion tokenizer that preserves fine-grained motion details while expanding codebook capacity, significantly enhancing the representational power of motion tokens. |
Ye Wang; Sipeng Zheng; Bin Cao; Qianshan Wei; Weishuai Zeng; Qin Jin; Zongqing Lu; | code |
| 231 | Regularized Langevin Dynamics for Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a simple yet effective sampling framework for combinatorial optimization (CO). |
Shengyu Feng; Yiming Yang; | code |
| 232 | EasyInv: Toward Fast and Better DDIM Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces EasyInv, an easy yet novel approach that significantly advances the field of DDIM Inversion by addressing the inherent inefficiencies and performance limitations of traditional iterative optimization methods. |
Ziyue Zhang; Mingbao Lin; Shuicheng Yan; Rongrong Ji; | code |
| 233 | Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, weight misalignment and complex gradient dynamics make it challenging to adopt SVD prior to the LoRA MoE architecture. To mitigate these issues, we propose **G**reat L**o**R**A** Mixture-of-Exper**t** (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with full fine-tuned MoE by deriving a theoretical scaling factor. |
Chenghao Fan; Zhenyi Lu; Sichen Liu; Chengfeng Gu; Xiaoye Qu; Wei Wei; Yu Cheng; | code |
| 234 | Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The alignment of large language models (LLMs) often assumes that using more clean data yields better outcomes, overlooking the match between model capacity and example difficulty. Challenging this, we propose a new principle: *Preference data vary in difficulty, and overly difficult examples hinder alignment, by exceeding the model’s capacity*. |
Chengqian Gao; Haonan Li; Liu Liu; Zeke Xie; Peilin Zhao; Zhiqiang Xu; | code |
| 235 | CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By introducing and analyzing the matching mechanism between Core Neurons and Core Tokens, we found that key neurons and tokens for inference mutually influence and reinforce each other. Building on this insight, we propose CoreMatching, a co-adaptive sparse inference framework, which leverages the synergy between token and neuron sparsity to enhance inference efficiency. |
Qinsi Wang; Hancheng Ye; Ming-Yu Chung; Yudong Liu; Yueqian Lin; Martin Kuo; Mingyuan Ma; Jianyi Zhang; Yiran Chen; | code |
| 236 | From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their performance on more basic factual memory tasks drops considerably below standard RAG. We address this unintended deterioration and propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks. |
Bernal Jiménez Gutiérrez; Yiheng Shu; Weijian Qi; Sizhe Zhou; Yu Su; | code |
| 237 | Emotional Face-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a new task, termed *emotional face-to-speech*, aiming to synthesize emotional speech directly from expressive facial cues. |
Jiaxin Ye; Boyuan Cao; Hongming Shan; | code |
| 238 | Perception in Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a perception in reflection paradigm designed to transcend the limitations of current large vision-language models (LVLMs), which are expected yet often fail to achieve perfect perception initially. |
Yana Wei; Liang Zhao; Kangheng Lin; En Yu; Yuang Peng; Runpei Dong; Jianjian Sun; Haoran Wei; Zheng Ge; Xiangyu Zhang; Vishal M. Patel; | code |
| 239 | DCTdiff: Intriguing Properties of Image Generative Modeling in The DCT Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. |
Mang Ning; Mingxiao Li; Jianlin Su; Jia Haozhe; Lanmiao Liu; Martin Benes; Wenshuo Chen; Albert Ali Salah; Itir Onal Ertugrul; | code |
| 240 | Synthetic Text Generation for Training Large Language Models Via Gradient Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first theoretically rigorous approach for generating synthetic human-readable text that provides convergence, performance, and privacy guarantees for fine-tuning LLMs on a target task. |
Dang Nguyen; Zeman Li; Mohammadhossein Bateni; Vahab Mirrokni; Meisam Razaviyayn; Baharan Mirzasoleiman; | code |
| 241 | FlashTP: Fused, Sparsity-Aware Tensor Product for Machine Learning Interatomic Potentials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While equivariant MLIPs achieve state-of-the-art accuracy, they face significant computational bottlenecks centered around their Tensor-Product layer, which accounts for up to 75% of training time and causes substantial memory overhead. We present FlashTP, a highly optimized tensor-product library that addresses these inefficiencies through kernel fusion, sparse computation, and path-aggregated execution. |
Seung Yul Lee; Hojoon Kim; Yutack Park; Dawoon Jeong; Seungwu Han; Yeonhong Park; Jae W. Lee; | code |
| 242 | Cross-Modal Alignment Via Variational Copula Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Copula is a powerful statistical structure in modelling the interactions between variables, as it bridges the joint distribution and marginal distributions of multiple variables. In this paper, we propose a novel copula modelling-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities to capture the complex interaction among them. |
Feng Wu; Tsai Hor Chan; Fuying Wang; Guosheng Yin; Lequan Yu; | code |
| 243 | Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate strategies for Vim and propose Stochastic Layer-Wise Shuffle (SLWS), a novel regularization method that can effectively improve the Vim training. |
Zizheng Huang; Haoxing Chen; Jiaqi Li; Jun Lan; Huijia Zhu; Weiqiang Wang; Limin Wang; | code |
| 244 | AdaptiveStep: Automatically Dividing Reasoning Step Through Model Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These approaches overlook the fact that certain words don’t usually indicate true decision points. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model’s confidence in predicting the next word, offering more information on decision-making at each step, improving downstream tasks like reward model training. |
Yuliang Liu; Junjie Lu; Chaofeng Qu; Zhaoling Chen; Zefan Cai; Jason Klein Liu; Chonghan Liu; Yunhui Xia; Li Zhao; Jiang Bian; Chuheng Zhang; Wei Shen; Zhouhan Lin; | code |
| 245 | Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that *sparse coding* offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. |
Tiansheng Wen; Yifei Wang; Zequn Zeng; Zhong Peng; Yudi Su; Xinyang Liu; Bo Chen; Hongwei Liu; Stefanie Jegelka; Chenyu You; | code |
| 246 | LEAPS: A Discrete Neural Sampler Via Locally Equivariant Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose *LEAPS*, an algorithm to sample from discrete distributions known up to normalization by learning a rate matrix of a continuous-time Markov chain (CTMC). To derive these importance weights, we introduce a set of Radon-Nikodym derivatives of CTMCs over their path measures. |
Peter Holderrieth; Michael Samuel Albergo; Tommi Jaakkola; | code |
| 247 | Task-Gated Multi-Expert Collaboration Network for Degraded Multi-Modal Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, real-world imaging often suffers from degradation issues, such as noise, blur, and haze in visible imaging, as well as stripe noise in infrared imaging, which significantly degrades model performance. To address these challenges, we propose a task-gated multi-expert collaboration network (TG-ECNet) for degraded multi-modal image fusion. |
Yiming Sun; Xin Li; Pengfei Zhu; Qinghua Hu; Dongwei Ren; Huiying Xu; Xinzhong Zhu; | code |
| 248 | SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SMART-PC, a skeleton-based framework that enhances resilience to corruptions by leveraging the geometric structure of 3D point clouds. |
Ali Bahri; Moslem Yazdanpanah; Sahar Dastani; Mehrdad Noori; Gustavo Adolfo Vargas Hakim; David OSOWIECHI; Farzad Beizaee; Ismail Ben Ayed; Christian Desrosiers; | code |
| 249 | SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SpikeVideoFormer, an efficient spike-driven video Transformer, featuring linear temporal complexity $\mathcal{O}(T)$. |
Shihao Zou; Qingfeng Li; Wei Ji; Jingjing Li; Yongkui Yang; Guoqi Li; Chao Dong; | code |
| 250 | Compressed Image Generation with Denoising Diffusion Codebook Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel generative approach based on Denoising Diffusion Models (DDMs), which produces high-quality image samples *along* with their losslessly compressed bit-stream representations. |
Guy Ohayon; Hila Manor; Tomer Michaeli; Michael Elad; | code |
| 251 | Unlocking Post-hoc Dataset Inference with Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such in-distribution, held-out data is rarely available in practice, severely limiting the applicability of DI. In this work, we address this challenge by synthetically generating the required held-out set. |
Bihe Zhao; Pratyush Maini; Franziska Boenisch; Adam Dziedzic; | code |
| 252 | ADHMR: Aligning Diffusion-based Human Mesh Recovery Via Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Probabilistic methods have tried to solve this by generating numerous plausible 3D human mesh predictions, but they often exhibit misalignment with 2D image observations and weak robustness to in-the-wild images. To address these issues, we propose ADHMR, a framework that **A**ligns a **D**iffusion-based **HMR** model in a preference optimization manner. |
Wenhao Shen; Wanqi Yin; Xiaofeng Yang; Cheng Chen; Chaoyue Song; Zhongang Cai; Lei Yang; Hao Wang; Guosheng Lin; | code |
| 253 | Hybrid Batch Normalisation: Resolving The Dilemma of Batch Normalisation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we resolve the dilemma of the BN layer in federated learning by developing a customised normalisation approach, Hybrid Batch Normalisation (HBN). |
Hongyao Chen; Tianyang Xu; Xiaojun Wu; Josef Kittler; | code |
| 254 | Curvature Enhanced Data Augmentation for Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, a novel manifold learning approach for generating synthetic data was proposed, utilizing a first-order approximation of the data manifold. Building on this foundation, we present a theoretical framework and practical tools for approximating and sampling general data manifolds. |
Ilya Kaufman; Omri Azencot; | code |
| 255 | Neural Graph Matching Improves Retrieval Augmented Generation in Molecular Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We apply this approach to mass spectrum simulation and introduce MARASON, a novel model that incorporates neural graph matching to enhance a fragmentation-based neural network. |
Runzhong Wang; Rui-Xi Wang; Mrunali Manjrekar; Connor W. Coley; | code |
| 256 | Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present *Morse*, a simple dual-sampling framework for accelerating diffusion models losslessly. |
Chao Li; Jiawei Fan; Anbang Yao; | code |
| 257 | Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we analyze and identify samples within benign datasets that contribute most to safety degradation, then fine-tune LLMs exclusively on these samples. We approach this problem from an outlier detection perspective and propose Self-Inf-N to detect and extract outliers for fine-tuning. |
Zihan Guan; Mengxuan Hu; Ronghang Zhu; Sheng Li; Anil Vullikanti; | code |
| 258 | ExpProof : Operationalizing Explanations for Confidential Models with ZKPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step towards operationalizing explanations in adversarial scenarios with Zero-Knowledge Proofs (ZKPs), a cryptographic primitive. |
Chhavi Yadav; Evan Laufer; Dan Boneh; Kamalika Chaudhuri; | code |
| 259 | Learning Fused State Representations for Control from Multi-View Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **M**ulti-view **F**usion **S**tate for **C**ontrol (**MFSC**), firstly incorporating bisimulation metric learning into MVRL to learn task-relevant representations. |
Zeyu Wang; Yao-Hui Li; Xin Li; Hongyu Zang; Romain Laroche; Riashat Islam; | code |
| 260 | Taming Knowledge Conflicts in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works attribute this conflict to the interplay between memory heads and context heads, attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the *superposition of contextual information and parametric memory*, where highly influential attention heads simultaneously contribute to both memory and context. |
Gaotang Li; Yuzhong Chen; Hanghang Tong; | code |
| 261 | TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional models typically process all variables or time points uniformly, which limits their ability to capture complex variable relationships and obtain non-trivial time representations. To address this issue, we propose TimePro, an innovative Mamba-based model that constructs variate- and time-aware hyper-states. |
Xiaowen Ma; Zhen-Liang Ni; Shuai Xiao; Xinghao Chen; | code |
| 262 | The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models Via Visual Information Steering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the internal dynamics of hallucination by examining the tokens' logit rankings throughout the generation process, revealing three key patterns in how LVLMs process information: (1) *gradual visual information loss* — visually grounded tokens gradually become less favored throughout generation, and (2) *early excitation* — semantically meaningful tokens achieve peak activation in the layers earlier than the final layer. |
Zhuowei Li; Haizhou Shi; Yunhe Gao; Di Liu; Zhenting Wang; Yuxiao Chen; Ting Liu; Long Zhao; Hao Wang; Dimitris N. Metaxas; | code |
| 263 | On Explaining Equivariant Graph Networks Via Improved Relevance Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current XAI techniques either struggle to adapt to equivariant GNNs or fail to effectively handle positional data and evaluate the significance of geometric features adequately. To address these challenges, we introduce a novel method, known as EquiGX, which uses the Deep Taylor decomposition framework to extend the layer-wise relevance propagation rules tailored for spherical equivariant GNNs. |
Hongyi Ling; Haiyang Yu; Zhimeng Jiang; Na Zou; Shuiwang Ji; | code |
| 264 | One Diffusion Step to Real-World Super-Resolution Via Flow Trajectory Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing one-step diffusion methods are constrained by the performance of the teacher model, where poor teacher performance results in image artifacts. To address this limitation, we propose FluxSR, a novel one-step diffusion Real-ISR technique based on flow matching models. |
Jianze Li; Jiezhang Cao; Yong Guo; Wenbo Li; Yulun Zhang; | code |
| 265 | Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance—the former indicates that those insufficiently optimized data should be emphasized, while the latter stresses some critical data that are most influential for loss minimization. |
Puning Yang; Qizhou Wang; Zhuo Huang; Tongliang Liu; Chengqi Zhang; Bo Han; | code |
| 266 | NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue Via Next-Token-Pair Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically explore the use of dual-channel speech data in the context of modern large language models, and introduce a novel generative modeling paradigm—Next-Token-Pair Prediction (NTPP)—to enable speaker-independent dual-channel spoken dialogue learning using decoder-only architectures for the first time. |
Qichao Wang; Ziqiao Meng; Wenqian Cui; Yifei Zhang; Pengcheng Wu; Bingzhe Wu; Irwin King; Liang Chen; Peilin Zhao; | code |
| 267 | Stabilizing Sample Similarity in Representation Via Mitigating Random Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify random consistency—an inherent bias in Euclidean distance metrics—as a key obstacle to reliable evaluation, affecting both fairness and discrimination. To address this, we derive the expected Euclidean distance under uniformly distributed label permutations and introduce its closed-form solution, the Pure Square Euclidean Distance (PSED), which provably eliminates random consistency. |
Jieting Wang; ZhangZelong; Feijiang Li; Yuhua Qian; Xinyan Liang; | code |
| 268 | Clients Collaborate: Flexible Differentially Private Federated Learning with Guaranteed Improvement of Utility-Privacy Trade-off Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel federated learning framework with rigorous privacy guarantees, named **FedCEO**, designed to strike a trade-off between model utility and user privacy by letting clients ***C**ollaborate with **E**ach **O**ther*. |
Yuecheng Li; Lele Fu; Tong Wang; Jian Lou; Bin Chen; Lei Yang; Jian Shen; Zibin Zheng; Chuan Chen; | code |
| 269 | Earley-Driven Dynamic Pruning for Efficient Structured Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, creating this mask requires checking the validity of all tokens in the LLM vocabulary at every decoding step, which often incurs significant overheads in existing constrained decoding engines. To address this challenge, we propose **ZapFormat**, a novel **dynamic pruning** strategy based on the Earley algorithm that identifies and eliminates invalid or redundant Earley states in real-time, significantly reducing memory occupation of the Earley algorithm’s states. |
Xintong Sun; Chi Wei; Minghao Tian; Shiwen Ni; | code |
| 270 | From Uncertain to Safe: Conformal Adaptation of Diffusion Models for Safe PDE Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods rarely consider safety requirements crucial in real-world applications. To address this limitation, we propose Safe Diffusion Models for PDE Control (SafeDiffCon), which introduce the uncertainty quantile as model uncertainty quantification to achieve optimal control under safety constraints through both post-training and inference phases. |
Peiyan Hu; Xiaowei Qian; Wenhao Deng; Rui Wang; Haodong Feng; Ruiqi Feng; Tao Zhang; Long Wei; Yue Wang; Zhi-Ming Ma; Tailin Wu; | code |
| 271 | ParallelComp: Parallel Long-Context Compressor for Length Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose ParallelComp, a parallel long-context compression method that effectively overcomes the memory bottleneck, enabling 8B-parameter LLMs to extrapolate from 8K to 128K tokens on a single A100 80GB GPU in a training-free setting. |
Jing Xiong; Jianghan Shen; Chuanyang Zheng; Zhongwei Wan; Chenyang Zhao; Chiwun Yang; Fanghua Ye; Hongxia Yang; Lingpeng Kong; Ngai Wong; | code |
| 272 | Incorporating Arbitrary Matrix Group Equivariance Into KANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Equivariant Kolmogorov-Arnold Networks (EKAN), a method for incorporating arbitrary matrix group equivariance into KANs, aiming to broaden their applicability to more fields. |
Lexiang Hu; Yisen Wang; Zhouchen Lin; | code |
| 273 | Explicit Discovery of Nonlinear Symmetries from Dynamic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose LieNLSD, which is, to our knowledge, the first method capable of determining the number of infinitesimal generators with nonlinear terms and their explicit expressions. |
Lexiang Hu; Yikang Li; Zhouchen Lin; | code |
| 274 | Flat-LoRA: Low-Rank Adaptation Over A Flat Loss Landscape Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space. |
Tao Li; Zhengbao He; Yujun Li; Yasheng Wang; Lifeng Shang; Xiaolin Huang; | code |
| 275 | TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, coarse-grained clustering struggles to capture complex, time-varying interactions effectively. To address these challenges, we propose TimeFilter, a GNN-based framework for adaptive and fine-grained dependency modeling. |
Yifan Hu; Guibin Zhang; Peiyuan Liu; Disen Lan; Naiqi Li; Dawei Cheng; Tao Dai; Shu-Tao Xia; Shirui Pan; | code |
| 276 | Non-stationary Diffusion For Probabilistic Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we innovatively utilize the Location-Scale Noise Model (LSNM) to relax the fixed uncertainty assumption of ANM. |
Weiwei Ye; Zhuopeng Xu; Ning Gui; | code |
| 277 | Hgformer: Hyperbolic Graph Transformer for Collaborative Filtering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite this remarkable progress, local structure modeling and embedding distortion still remain two notable limitations in the majority of GNN-based CF methods. Therefore, in this paper, we propose a novel Hyperbolic Graph Transformer architecture to tackle the long-tail problems in CF tasks. |
Xin Yang; Xingrun Li; Heng Chang; Yang jinze; Xihong Yang; Shengyu Tao; Maiko Shigeno; Ningkang Chang; Junfeng Wang; Dawei Yin; Erxue Min; | code |
| 278 | TeLoGraF: Temporal Logic Planning Via Graph-encoded Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TeLoGraF, Temporal Logic Graph-encoded Flow, which utilizes Graph Neural Networks (GNN) encoder and flow-matching to learn solutions for general STL specifications. |
Yue Meng; Chuchu Fan; | code |
| 279 | EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While language-centric embodied agents have garnered substantial attention, MLLM-based embodied agents remain underexplored due to the lack of comprehensive evaluation frameworks. To bridge this gap, we introduce EmbodiedBench, an extensive benchmark designed to evaluate vision-driven embodied agents. |
Rui Yang; Hanyang Chen; Junyu Zhang; Mark Zhao; Cheng Qian; Kangrui Wang; Qineng Wang; Teja Venkat Koripella; Marziyeh Movahedi; Manling Li; Heng Ji; Huan Zhang; Tong Zhang; | code |
| 280 | Efficient Multi-modal Long Context Learning for Training-free Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Efficient Multi-Modal Long Context Learning (EMLoC), a novel training-free alternative that embeds demonstration examples directly into the model input. |
Zehong Ma; Shiliang Zhang; Longhui Wei; Qi Tian; | code |
| 281 | Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we theoretically demonstrate that initial tokens in the draft sequence are more important than later ones. Building on this insight, we propose Gumiho, a hybrid model combining serial and parallel heads. |
Jinze Li; Yixing Xu; Haiduo Huang; Xuanwu Yin; Dong Li; Edith C. H. Ngai; Emad Barsoum; | code |
| 282 | UniDB: A Unified Diffusion Bridge Framework Via Stochastic Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches frequently produce blurred or excessively smoothed image details and lack a comprehensive theoretical foundation to explain these shortcomings. To address these limitations, we propose UniDB, a unified framework for diffusion bridges based on Stochastic Optimal Control (SOC). |
Kaizhen Zhu; Mokai Pan; Yuexin Ma; Yanwei Fu; Jingyi Yu; Jingya Wang; Ye Shi; | code |
| 283 | Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a neuro-symbolic approach that enhances LLM-based planners with Knowledge Graph-based RAG for hierarchical plan generation. |
Flavio Petruzzellis; Cristina Cornelio; Pietro Lio; | code |
| 284 | Overcoming Non-monotonicity in Transducer-based Streaming Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, its input-synchronous decoding mechanism presents challenges in tasks requiring non-monotonic alignments, such as simultaneous translation. In this research, we address this issue by integrating Transducer’s decoding with the history of input stream via a learnable monotonic attention. |
Zhengrui Ma; Yang Feng; Min Zhang; | code |
| 285 | OneForecast: A Universal Framework for Global and Regional Weather Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In recent years, deep learning models have made significant progress in weather forecasting, but challenges remain, such as balancing global and regional high-resolution forecasts, excessive smoothing in extreme event predictions, and insufficient dynamic system modeling. To address these issues, this paper proposes a global-regional nested weather forecasting framework (OneForecast) based on graph neural networks. |
Yuan Gao; Hao Wu; Ruiqi Shu; Huanshuo Dong; Fan Xu; Rui Ray Chen; Yibo Yan; Qingsong Wen; Xuming Hu; Kun Wang; Jiahao Wu; Li Qing; Hui Xiong; Xiaomeng Huang; | code |
| 286 | Efficient Motion Prompt Learning for Robust Visual Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a lightweight and plug-and-play motion prompt tracking method. |
Jie Zhao; Xin Chen; Yongsheng Yuan; Michael Felsberg; Dong Wang; Huchuan Lu; | code |
| 287 | Imitation Learning from A Single Temporally Misaligned Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is that matching should instead be defined at the level of sequences. |
William Huey; Huaxiaoyue Wang; Anne Wu; Yoav Artzi; Sanjiban Choudhury; | code |
| 288 | AtlasD: Automatic Local Symmetry Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formalize the notion of local symmetry as atlas equivariance. |
Manu Bhat; Jonghyun Park; Jianke Yang; Nima Dehmamy; Robin Walters; Rose Yu; | code |
| 289 | MCU: An Evaluation Framework for Open-Ended Game Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, evaluating such open-ended agents remains difficult, with current benchmarks facing scalability limitations. To address this, we introduce *Minecraft Universe* (MCU), a comprehensive evaluation framework set within the open-world video game Minecraft. |
Xinyue Zheng; Haowei Lin; Kaichen He; Zihao Wang; Qiang Fu; Haobo Fu; Zilong Zheng; Yitao Liang; | code |
| 290 | POQD: Performance-Oriented Query Decomposer for Multi-vector Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even worse, jointly solving this problem and training the downstream retrieval-based systems, say, RAG systems, could be highly inefficient. To overcome these challenges, we propose Performance-Oriented Query Decomposer (POQD), a novel query decomposition framework for MVR. |
Yaoyang Liu; Junlin Li; Yinjun Wu; Zhen Chen; | code |
| 291 | FSTLLM: Spatio-Temporal LLM for Few Shot Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models typically require large volumes of training data and often struggle in data-scarce scenarios. To address this limitation, we propose a framework named Few-shot Spatio-Temporal Large Language Models (FSTLLM), aimed at enhancing model robustness and predictive performance in few-shot settings. |
Yue Jiang; Yile Chen; Xiucheng Li; Qin Chao; Shuai Liu; Gao Cong; | code |
| 292 | Interpreting CLIP with Hierarchical Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce Matryoshka SAE (MSAE), a new architecture that learns hierarchical representations at multiple granularities simultaneously, enabling a direct optimization of both metrics without compromise. |
Vladimir Zaigrajew; Hubert Baniecki; Przemyslaw Biecek; | code |
| 293 | Whoever Started The Interference Should End It: Guiding Data-Free Model Merging Via Task Vectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although prior work has explored many merging strategies, resolving interference without additional data for retraining or test-time computation remains challenging. In this paper, we theoretically demonstrate that the task vectors of the linear layer constitute an approximate linear subspace for its corresponding input. |
Runxi Cheng; Feng Xiong; Yongxian Wei; Wanyun Zhu; Chun Yuan; | code |
| 294 | Directly Forecasting Belief for Reinforcement Learning with Delays Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states. |
Qingyuan Wu; Yuhui Wang; Simon Sinong Zhan; Yixuan Wang; Chung-Wei Lin; Chen Lv; Qi Zhu; Jürgen Schmidhuber; Chao Huang; | code |
| 295 | Action Dubber: Timing Audible Actions Via Inflectional Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the task of Audible Action Temporal Localization, which aims to identify the spatio-temporal coordinates of audible movements. To support this task, we introduce a new benchmark dataset, *Audible623*, derived from Kinetics and UCF101 by removing non-essential vocalization subsets. |
Wenlong Wan; Weiying Zheng; Tianyi Xiang; Guiqing Li; Shengfeng He; | code |
| 296 | VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods cannot scale up to extremely large scenes, due to the inefficient tracking and mapping strategies that need to optimize all 3D Gaussians in the limited GPU memories throughout the training to maintain the geometry and color consistency to previous RGBD observations. To resolve this issue, we propose novel tracking and mapping strategies to work with a novel 3D representation, dubbed view-tied 3D Gaussians, for RGBD SLAM systems. |
Pengchong Hu; Zhizhong Han; | code |
| 297 | What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing benchmarks face significant limitations, including uncontrollable task complexity, extensive manual annotation, and a lack of multidimensional evaluation. In response to these challenges, we introduce OmniBench, a self-generating, graph-based benchmark with an automated pipeline for synthesizing tasks of controllable complexity through subtask composition. |
Wendong Bu; Yang Wu; Qifan Yu; Minghe Gao; Bingchen Miao; Zhenkui Zhang; Kaihang Pan; liyunfei; Mengze Li; Wei Ji; Juncheng Li; Siliang Tang; Yueting Zhuang; | code |
| 298 | Feature Shift Localization Network Highlight: In this work, we introduce the Feature Shift Localization Network (FSL-Net), a neural network that can localize feature shifts in large and high-dimensional datasets in a fast and accurate manner. |
Míriam Barrabés; Daniel Mas Montserrat; Kapal Dev; Alexander G. Ioannidis; | code |
| 299 | LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models Highlight: This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable tools in the era of large-scale data and models. To teach a VLM safeguard about safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. |
Lukas Helff; Felix Friedrich; Manuel Brack; Kristian Kersting; Patrick Schramowski; | code |
| 300 | Enhancing Treatment Effect Estimation Via Active Learning: A Counterfactual Covering Perspective Highlight: To reduce the bound, we propose a greedy radius reduction algorithm, which excels under an idealized, balanced data distribution. |
Hechuan Wen; Tong Chen; Mingming Gong; Li Kheng Chai; Shazia Sadiq; Hongzhi Yin; | code |
| 301 | Latent Imputation Before Prediction: A New Computational Paradigm for De Novo Peptide Sequencing Highlight: However, the issue of missing fragmentation, attributable to factors such as suboptimal fragmentation efficiency and instrumental constraints, presents a formidable challenge in practical applications. To tackle this obstacle, we propose a novel computational paradigm called $\underline{\textbf{L}}$atent $\underline{\textbf{I}}$mputation before $\underline{\textbf{P}}$rediction (LIPNovo). |
Ye Du; Chen Yang; Nanxi Yu; Wanyu Lin; Qian Zhao; Shujun Wang; | code |
| 302 | TLLC: Transfer Learning-based Label Completion for Crowdsourcing Highlight: However, in real-world scenarios, workers typically annotate only a few instances, leading to insufficient worker modeling and thus limiting the improvement of label completion. To address this issue, we propose a novel transfer learning-based label completion (TLLC) method. |
Wenjun Zhang; Liangxiao Jiang; Chaoqun Li; | code |
| 303 | FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing Highlight: Though Rectified Flows (ReFlows) with distillation offer a promising way for fast sampling, fast inversion that transforms images back to structured noise for recovery and subsequent editing remains unsolved. This paper introduces FireFlow, an embarrassingly simple yet effective zero-shot approach that inherits the startling capacity of ReFlow-based models (such as FLUX) in generation while extending its capabilities to accurate inversion and editing in **8** steps. |
Yingying Deng; Xiangyu He; Changwang Mei; Peisong Wang; Fan Tang; | code |
| 304 | ROME Is Forged in Adversity: Robust Distilled Datasets Via Information Bottleneck Highlight: While adversarial robustness has been extensively studied in related fields, research on improving DD robustness is still limited. To address this, we propose ROME, a novel method that enhances the adversarial RObustness of DD by leveraging the InforMation BottlenEck (IB) principle. |
Zheng Zhou; Wenquan Feng; Qiaosheng Zhang; Shuchang Lyu; Qi Zhao; Guangliang Cheng; | code |
| 305 | Complex Wavelet Mutual Information Loss: A Multi-Scale Loss Function for Semantic Segmentation Highlight: Despite these advancements, most loss functions are still primarily pixel-wise, while regional and boundary-focused loss functions often incur high computational costs or are restricted to small-scale regions. To address this limitation, we propose the complex wavelet mutual information (CWMI) loss, a novel loss function that leverages mutual information from subband images decomposed by a complex steerable pyramid. |
Renhao Lu; | code |
| 306 | One Wave To Explain Them All: A Unifying Perspective On Feature Attribution Highlight: Feature attribution methods aim to improve the transparency of deep neural networks by identifying the input features that influence a model’s decision. |
Gabriel Kasmi; Amandine Brunetto; Thomas Fel; Jayneel Parekh; | code |
| 307 | Modeling All-Atom Glycan Structures Via Hierarchical Message Passing and Multi-Scale Pre-training Highlight: However, previous methods mainly focused on modeling the backbone structure of glycans as graphs of monosaccharides (i.e., sugar units), while neglecting the atomic structures underlying each monosaccharide, which are actually important indicators of glycan properties. We fill this gap by introducing the GlycanAA model for All-Atom-wise Glycan modeling. |
Minghao Xu; Jiaze Song; Keming Wu; Xiangxin Zhou; Bin CUI; Wentao Zhang; | code |
| 308 | Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing Highlight: To tackle this, we introduce a new metric, Balancing Preservation Modification (BPM), tailored for instruction-based image editing; it explicitly disentangles the image into editing-relevant and irrelevant regions so each can be assessed separately. |
Zhuoying Li; Zhu Xu; Yuxin Peng; Yang Liu; | code |
| 309 | RollingQ: Reviving The Cooperation Dynamics in Multimodal Transformer Highlight: To revive adaptability, we propose a simple yet effective method, Rolling Query (RollingQ), which balances attention allocation by rotating the query to break the self-reinforcing cycle and mitigate the key distribution gap. |
HaoTian Ni; Yake Wei; Hang Liu; Gong Chen; Chong Peng; Hao Lin; Di Hu; | code |
| 310 | Token Cleaning: Fine-Grained Data Selection for LLM Supervised Fine-Tuning Highlight: In this paper, we investigate token quality from a noisy-label perspective and propose a generic token cleaning pipeline for SFT tasks. |
Jinlong Pang; Na Di; Zhaowei Zhu; Jiaheng Wei; Hao Cheng; Chen Qian; Yang Liu; | code |
| 311 | KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference Highlight: Experimental results show that we can achieve nearly lossless 3.25-bit mixed precision KV cache quantization for LLMs like Llama-3.1-8B-Instruct and 4.0-bit for sensitive models like Qwen2.5-7B-Instruct on mathematical reasoning tasks. |
Xing Li; Zeyu XING; Yiming Li; Linping Qu; Hui-Ling Zhen; Yiwu Yao; Wulong Liu; Sinno Jialin Pan; Mingxuan Yuan; | code |
| 312 | CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models Highlight: However, existing benchmarks and models are primarily limited to a small set of modalities and tasks, which hinders the development of large-scale multimodal methods that can make holistic assessments of patient health and well-being. To bridge this gap, we introduce Clinical Large-scale Integrative Multimodal Benchmark (CLIMB), a comprehensive clinical benchmark unifying diverse clinical data across imaging, language, temporal, and graph modalities. |
Wei Dai; Peilin Chen; Malinda Lu; Daniel A Li; Haowen Wei; Hejie Cui; Paul Pu Liang; | code |
| 313 | Posterior Inference with Diffusion Models for High-dimensional Black-box Optimization Highlight: However, those methods often underperform compared to BO methods due to limited expressivity and the difficulty of uncertainty estimation in high-dimensional spaces. To overcome these issues, we introduce \textbf{DiBO}, a novel framework for solving high-dimensional black-box optimization problems. |
Taeyoung Yun; Kiyoung Om; Jaewoo Lee; Sujin Yun; Jinkyoo Park; | code |
| 314 | Automatically Interpreting Millions of Features in Large Language Models Highlight: In this work, we build an open-source automated pipeline to generate and evaluate natural language interpretations for SAE latents using LLMs. |
Gonçalo Santos Paulo; Alex Troy Mallen; Caden Juang; Nora Belrose; | code |
| 315 | Gradient Aligned Regression Via Pairwise Losses Highlight: In this work, we propose GAR (Gradient Aligned Regression) as a competitive alternative method in label space, which combines a conventional regression loss with two pairwise label-difference losses that align gradients in both magnitude and direction. |
Dixian Zhu; Tianbao Yang; Livnat Jerby; | code |
| 316 | Diffusion on Language Model Encodings for Protein Sequence Generation Highlight: Here, we present *DiMA*, a latent diffusion framework that operates on protein language model representations. |
Viacheslav Meshchaninov; Pavel Strashnov; Andrey Shevtsov; Fedor Nikolaev; Nikita Ivanisenko; Olga Kardymon; Dmitry Vetrov; | code |
| 317 | GTR: A General, Multi-View, and Dynamic Framework for Trajectory Representation Learning Highlight: To this end, we propose GTR, a general, multi-view, and dynamic Trajectory Representation framework built on a pre-train and fine-tune architecture. |
Xiangheng Wang; Ziquan Fang; Chenglong Huang; Danlei Hu; Lu Chen; Yunjun Gao; | code |
| 318 | UDora: A Unified Red Teaming Framework Against LLM Agents By Dynamically Hijacking Their Own Reasoning Highlight: In this work, we present UDora, a unified red teaming framework designed for LLM agents that dynamically hijacks the agent’s reasoning processes to compel malicious behavior. |
Jiawei Zhang; Shuang Yang; Bo Li; | code |
| 319 | SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models Highlight: However, effectively embedding precise safety knowledge into MLLMs for autonomous driving remains a significant challenge. To address this, we propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. |
Jiawei Zhang; Xuan Yang; Taiqi Wang; Yu Yao; Aleksandr Petiushko; Bo Li; | code |
| 320 | Improving LLM Video Understanding with 16 Frames Per Second Highlight: In this paper, we introduce F-16, the first multimodal LLM designed for high-frame-rate video understanding. We will release the source code, model checkpoints, and data at [https://github.com/bytedance/F-16](https://github.com/bytedance/F-16). |
Yixuan Li; Changli Tang; Jimin Zhuang; Yudong Yang; Guangzhi Sun; Wei Li; Zejun MA; Chao Zhang; | code |
| 321 | FlatQuant: Flatness Matters for LLM Quantization Highlight: In this paper, we propose FlatQuant (Fast and Learnable Affine Transformation), a new post-training quantization approach that enhances the flatness of weights and activations. |
Yuxuan Sun; Ruikang Liu; Haoli Bai; Han Bao; Kang Zhao; Yuening Li; JiaxinHu; Xianzhi Yu; Lu Hou; Chun Yuan; Xin Jiang; Wulong Liu; Jun Yao; | code |
| 322 | CAT: Contrastive Adversarial Training for Evaluating The Robustness of Protective Perturbations in Latent Diffusion Models Highlight: Extensive experiments demonstrate that our CAT method significantly reduces the effectiveness of protective perturbations in customization, urging the community to reconsider and improve the robustness of existing protective perturbations. |
Sen Peng; Mingyue Wang; Jianfei He; Jijia Yang; Xiaohua Jia; | code |
| 323 | Variational Control for Guidance in Diffusion Models Highlight: We introduce a new method within this framework that achieves state-of-the-art results on several linear, non-linear, and blind inverse problems without requiring additional model training or specificity to pixel or latent space diffusion models. |
Kushagra Pandey; Farrin Marouf Sofian; Felix Draxler; Theofanis Karaletsos; Stephan Mandt; | code |
| 324 | Can We Predict Performance of Large Models Across Vision-Language Tasks? Highlight: In this study, we propose a new framework for predicting unknown performance scores based on observed ones from other LVLMs or tasks. |
Qinyu Zhao; Ming Xu; Kartik Gupta; Akshay Asthana; Liang Zheng; Stephen Gould; | code |
| 325 | RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models Highlight: To effectively realize low-bit quantization of weights, activations and KV caches in LLMs, we propose an algorithm named Rotated Straight-Through-Estimator (RoSTE), which combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy that identifies an effective rotation configuration to reduce activation outliers. |
Quan Wei; Chung-Yiu Yau; Hoi To Wai; Yang Zhao; Dongyeop Kang; Youngsuk Park; Mingyi Hong; | code |
| 326 | Latent Thought Models with Variational Bayes Inference-Time Computation Highlight: We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. |
Deqian Kong; Minglu Zhao; Dehong Xu; Bo Pang; Shu Wang; Edouardo Honig; Zhangzhang Si; Chuan Li; Jianwen Xie; Sirui Xie; Ying Nian Wu; | code |
| 327 | From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining Highlight: As a result, these methods struggle to learn generalized representations due to their inability to model the hierarchical structure of ECG data. To address this gap, we introduce MELP, a novel Multi-scale ECG-Language Pretraining model that fully leverages hierarchical supervision from ECG-text pairs. |
Fuying Wang; Jiacheng Xu; Lequan Yu; | code |
| 328 | NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric Highlight: This work aims to select training data for instruction tuning to improve the LLM performance on specific tasks. |
Jingtan Wang; Xiaoqiang Lin; Rui Qiao; Pang Wei Koh; Chuan-Sheng Foo; Bryan Kian Hsiang Low; | code |
| 329 | EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification Highlight: In this article, we propose a novel ensemble method, namely *EnsLoss*, which extends the ensemble learning concept to combine loss functions within the ERM framework. |
Ben Dai; | code |
| 330 | The Lock-in Hypothesis: Stagnation By Algorithm Highlight: The training and deployment of large language models (LLMs) create a feedback loop with human users: models learn human beliefs from data, reinforce these beliefs with generated content, reabsorb the reinforced beliefs, and feed them back to users again and again. |
Tianyi Qiu; Zhonghao He; Tejasveer Chugh; Max Kleiman-Weiner; | code |
| 331 | Empowering World Models with Reflection for Embodied Video Prediction Highlight: However, existing models often lack robust understanding, limiting their ability to perform multi-step predictions or handle Out-of-Distribution (OOD) scenarios. To address this challenge, we propose the Reflection of Generation (RoG), a set of intermediate reasoning strategies designed to enhance video prediction. |
Xiaowei Chi; Chun-Kai Fan; Hengyuan Zhang; Xingqun Qi; Rongyu Zhang; Anthony Chen; Chi-Min Chan; Wei Xue; Qifeng Liu; Shanghang Zhang; Yike Guo; | code |
| 332 | Label Distribution Propagation-based Label Completion for Crowdsourcing Highlight: However, WSLC considers only the correlation of the labels annotated by different workers on each individual instance, entirely ignoring the correlation of the labels annotated by different workers among similar instances. To fill this gap, we propose a novel label distribution propagation-based label completion (LDPLC) algorithm. |
Tong Wu; Liangxiao Jiang; Wenjun Zhang; Chaoqun Li; | code |
| 333 | Origin Identification for Text-Guided Image-to-Image Diffusion Models Highlight: However, due to *visual discrepancy* across generations produced by different diffusion models, this similarity-based approach fails when training on images from one model and testing on those from another, limiting its effectiveness in real-world applications. To solve this challenge of the proposed ID$^2$ task, we contribute the first dataset and a theoretically guaranteed method, both emphasizing generalizability. |
Wenhao Wang; Yifan Sun; Zongxin Yang; Zhentao Tan; Zhengdong Hu; Yi Yang; | code |
| 334 | Zero-Shot Offline Imitation Learning Via Optimal Transport Highlight: However, this framework can suffer from myopic behavior: the agent’s immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. |
Thomas Rupf; Marco Bagatella; Nico Gürtler; Jonas Frey; Georg Martius; | code |
| 335 | Can Transformers Learn Full Bayesian Inference in Context? Highlight: More specifically, we introduce a general framework that builds on ideas from prior fitted networks and continuous normalizing flows and enables us to infer complex posterior distributions for models such as generalized linear models and latent factor models. |
Arik Reuter; Tim G. J. Rudner; Vincent Fortuin; David Rügamer; | code |
| 336 | HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation Via Heterogeneous Knowledge Adaptation Highlight: We present **HealthGPT**, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. |
Tianwei Lin; Wenqiao Zhang; SIJING LI; Yuqian Yuan; Binhe Yu; Haoyuan Li; Wanggui He; Hao Jiang; Mengze Li; Song xiaohui; Siliang Tang; Jun Xiao; Hui Lin; Yueting Zhuang; Beng Chin Ooi; | code |
| 337 | Stable Fair Graph Representation Learning with Lipschitz Constraint Highlight: In this work, we propose a stable fair Graph Neural Network (SFG) to maintain training stability while preserving accuracy and fairness performance. |
Qiang Chen; Zhongze Wu; Xiu Su; Xi Lin; Zhe Qu; Shan You; Shuo Yang; Chang Xu; | code |
| 338 | One Stone, Two Birds: Enhancing Adversarial Defense Through The Lens of Distributional Discrepancy Highlight: In this paper, we explore the strength of SADD-based methods by theoretically showing that minimizing distributional discrepancy can help reduce the expected loss on AEs. |
Jiacheng Zhang; Benjamin I. P. Rubinstein; Jingfeng Zhang; Feng Liu; | code |
| 339 | Bridging Layout and RTL: Knowledge Distillation Based Timing Prediction Highlight: Conversely, existing RTL-level approaches sacrifice accuracy due to the limited physical information available. We propose RTLDistil, a novel cross-stage knowledge distillation framework that bridges this gap by transferring precise physical characteristics from a layout-aware teacher model (Teacher GNN) to an efficient RTL-level student model (Student GNN), both implemented as graph neural networks (GNNs). |
Mingjun Wang; Yihan Wen; Bin Sun; Jianan Mu; Juan Li; Xiaoyi Wang; Jing Justin Ye; Bei Yu; Huawei Li; | code |
| 340 | L3A: Label-Augmented Analytic Adaptation for Multi-Label Class Incremental Learning Highlight: Multi-label CIL (MLCIL) extends CIL to a real-world scenario where each sample may belong to multiple classes, introducing several challenges: label absence, which leads to incomplete historical information due to missing labels, and class imbalance, which biases the model toward majority classes. To address these challenges, we propose Label-Augmented Analytic Adaptation (L3A), an exemplar-free approach that does not store past samples. |
Xiang Zhang; Run He; Chen Jiao; Di Fang; Ming Li; Ziqian Zeng; Cen Chen; Huiping Zhuang; | code |
| 341 | Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models Highlight: In this work, we propose a novel kernel-based method to align CLIP’s visual representation with that of DINOv2, ensuring that the resulting embeddings maintain compatibility with text embeddings while enhancing perceptual capabilities. |
Shizhan Gong; Yankai Jiang; Qi Dou; Farzan Farnia; | code |
| 342 | TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction Highlight: Armed with the DZP ideas, we propose TMetaNet, a new meta-learning parameter update model based on dynamic topological features. |
Hao Li; Hao Wan; Yuzhou Chen; Dongsheng Ye; Yulia Gel; Hao Jiang; | code |
| 343 | Efficient Quantification of Multimodal Interaction at Sample Level Highlight: We first develop a redundancy estimation framework, employing an appropriate pointwise information measure to quantify this most decomposable and measurable interaction. Building upon this, we propose a general interaction estimation method that employs efficient entropy estimation, specifically tailored for sample-wise estimation in continuous distributions. |
Zequn Yang; Hongfa Wang; Di Hu; | code |
| 344 | Adapting Precomputed Features for Efficient Graph Condensation Highlight: To address the efficiency issue, we completely bypass trajectory matching and propose a novel two-stage framework. |
Yuan Li; Jun Hu; Zemin Liu; Bryan Hooi; Jia Chen; Bingsheng He; | code |
| 345 | Sample-specific Noise Injection for Diffusion-based Adversarial Purification Highlight: In this paper, we discover that the optimal $t^*$ can indeed differ from sample to sample. |
Yuhao Sun; Jiacheng Zhang; Zesheng Ye; Chaowei Xiao; Feng Liu; | code |
| 346 | WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving Highlight: However, such interaction analysis remains underexplored due to the lack of dedicated language datasets that address it. Therefore, we propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a comprehensive large-scale Q&A dataset built on WOMD that focuses on describing and reasoning about traffic-rule-induced interactions in driving scenarios. |
Yiheng Li; Cunxin Fan; Chongjian GE; Seth Z. Zhao; Chenran Li; Chenfeng Xu; Huaxiu Yao; Masayoshi Tomizuka; Bolei Zhou; Chen Tang; Mingyu Ding; Wei Zhan; | code |
| 347 | Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability Highlight: Correspondingly, we introduce OPSA-AT (Adversarial Training), a defense strategy that integrates OPSA within a novel conformal training paradigm. |
Jie Bao; Chuangyin Dang; Rui Luo; Hanwei Zhang; Zhixin Zhou; | code |
| 348 | Three-Dimensional Trajectory Prediction with 3DMoTraj Dataset Highlight: Mathematically, trajectory prediction becomes significantly more complex when transitioning from 2D to 3D. To tackle this challenge, we analyze the prediction complexity of 3D trajectories and propose a new method consisting of two key components: decoupled trajectory prediction and correlated trajectory refinement. |
Hao Zhou; Xu Yang; Mingyu Fan; Lu Qi; Xiangtai Li; Ming-Hsuan Yang; Fei Luo; | code |
| 349 | GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation Highlight: We present Graph Inspired Veracity Extrapolation (GIVE), a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. |
Jiashu He; Mingyu Derek Ma; Jinxuan Fan; Dan Roth; Wei Wang; Alejandro Ribeiro; | code |
| 350 | Regress, Don’t Guess: A Regression-like Loss on Number Tokens for Language Models Highlight: In response, we here present a regression-like loss that operates purely at the token level. |
Jonas Zausinger; Lars Pennig; Anamarija Kozina; Sean Sdahl; Julian Sikora; Adrian Dendorfer; Timofey Kuznetsov; Mohamad Hagog; Nina Wiedemann; Kacper Chlodny; Vincent Limbach; Anna Ketteler; Thorben Prein; Vishwa Mohan Singh; Michael Danziger; Jannis Born; | code |
| 351 | EFDTR: Learnable Elliptical Fourier Descriptor Transformer for Instance Segmentation Highlight: In this paper, we introduce a novel vertex regression loss grounded in Fourier elliptic descriptors, which removes the need for rasterization or heuristic approximations and resolves ambiguities in boundary point assignment through frequency-domain matching. |
Jiawei Cao; Chaochen Gu; Hao Cheng; Xiaofeng Zhang; Kaijie Wu; Changsheng Lu; | code |
| 352 | Test-Time Canonicalization By Foundation Models for Robust Perception Highlight: We propose FoCal, a test-time, data-driven framework that achieves robust perception by leveraging internet-scale visual priors from foundation models. |
Utkarsh Singhal; Ryan Feng; Stella X. Yu; Atul Prakash; | code |
| 353 | Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation Highlight: This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability. |
He Li; Haoang Chi; Mingyu Liu; Wanrong Huang; Liyang Xu; Wenjing Yang; | code |
| 354 | Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning Highlight: These scenarios face significant challenges due to high variance and poor performance with low-quality propensity scores and heavy-tailed reward distributions. We address these issues by introducing a novel estimator based on the log-sum-exponential (LSE) operator, which outperforms traditional inverse propensity score estimators. |
Armin Behnamnia; Gholamali Aminian; Alireza Aghaei; Chengchun Shi; Vincent Y. F. Tan; Hamid R. Rabiee; | code |
| 355 | Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models Highlight: To this end, we propose *federated full-parameter tuning at scale for LLMs* (Ferret), **the first first-order method with shared randomness** to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. |
Yao Shu; Wenyang Hu; See-Kiong Ng; Bryan Kian Hsiang Low; Fei Yu; | code |
| 356 | De-AntiFake: Rethinking The Protective Perturbations Against Voice Cloning Attacks Highlight: From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. |
Wei Fan; Kejiang Chen; Chang Liu; Weiming Zhang; Nenghai Yu; | code |
| 357 | Learning from Sample Stability for Deep Clustering Highlight: Unstable representations across epochs often lead to mispredictions, indicating difficulty of memorization and atypicality. Leveraging these findings, we introduce, for the first time, supervision signals based on sample stability at the representation level. |
Zhixin Li; Yuheng Jia; Hui LIU; Junhui Hou; | code |
| 358 | LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models Highlight: In this paper, we propose a new KV cache optimization paradigm called LaCache, a training-free method for efficient and accurate generative inference of LLMs. |
Dachuan Shi; Yonggan Fu; Xiangchi Yuan; Zhongzhi Yu; Haoran You; Sixu Li; Xin Dong; Jan Kautz; Pavlo Molchanov; Yingyan Celine Lin; | code |
| 359 | Controlling Large Language Model with Latent Action Highlight: In this work, we apply **CoLA** to the Llama-3.1-8B model. |
Chengxing Jia; Ziniu Li; Pengyuan Wang; Yi-Chen Li; Zhenyu Hou; Yuxiao Dong; Yang Yu; | code |
| 360 | Demystifying Singular Defects in Large Language Models Highlight: In this paper, we provide both theoretical insights and empirical validation across a range of recent models, leading to the following observations: i) The layer-wise singular direction predicts the abrupt explosion of token norms in LLMs. |
Haoqi Wang; Tong Zhang; Mathieu Salzmann; | code |
| 361 | VCT: Training Consistency Models with Variational Noise Coupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Variational Consistency Training (VCT), a flexible and effective framework compatible with various forward kernels, including those in flow matching. |
Gianluigi Silvestri; Luca Ambrogioni; Chieh-Hsin Lai; Yuhta Takida; Yuki Mitsufuji; | code |
| 362 | CaDA: Cross-Problem Routing Solver with Constraint-Aware Dual-Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, they rely solely on global connectivity, which fails to focus on key nodes and leads to inefficient representation learning. This paper introduces a \underline{C}onstraint-\underline{A}ware \underline{D}ual-\underline{A}ttention Model (CaDA), designed to address these limitations. |
Han Li; Fei Liu; Zhi Zheng; Yu Zhang; Zhenkun Wang; | code |
| 363 | Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, our study reveals that the gap in feature distribution between novel and existing tasks is primarily driven by differences in mean and covariance moments. Building on this insight, we propose a novel semantic drift calibration method that incorporates mean shift compensation and covariance calibration. |
Fangwen Wu; Lechao Cheng; Shengeng Tang; Xiaofeng Zhu; Chaowei Fang; Dingwen Zhang; Meng Wang; | code |
| 364 | PTTA: Purifying Malicious Samples for Test-Time Model Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although malicious samples that would undermine the model’s optimization should be filtered out, it also leads to a waste of test data. To alleviate this issue, we focus on how to make full use of the malicious test samples for TTA by transforming them into benign ones, and propose a plug-and-play method, PTTA. |
Jing Ma; Hanlin Li; Xiang Xiang; | code |
| 365 | Differential Coding for Training-Free ANN-to-SNN Conversion Highlight: However, many conversion methods are based on rate coding, which requires numerous spikes and longer time-steps compared to directly trained SNNs, leading to increased energy consumption and latency. This article introduces differential coding for ANN-to-SNN conversion, a novel coding scheme that reduces spike counts and energy consumption by transmitting changes in rate information rather than rates directly, and explores its application across various layers. |
Zihan Huang; Wei Fang; Tong Bu; Peng Xue; Zecheng Hao; Wenxuan Liu; Yuanhong Tang; Zhaofei Yu; Tiejun Huang; | code |
| 366 | Sable: A Performant, Efficient and Scalable Sequence Model for MARL Highlight: In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modelling approach to MARL. |
Omayma Mahjoub; Sasha Abramowitz; Ruan John de Kock; Wiem Khlifi; Simon Verster Du Toit; Jemma Daniel; Louay Ben Nessir; Louise Beyers; Juan Claude Formanek; Liam Clark; Arnu Pretorius; | code |
| 367 | Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin Highlight: To fill this gap, we delve into imbalanced pseudolabels and identify two primary contributing factors: concept mismatch and concept confusion. To mitigate these two issues, we propose a novel framework incorporating concept alignment and confusion-aware calibrated margin mechanisms. |
Yuchen Wang; Xuefeng Bai; Xiucheng Li; Weili Guan; Liqiang Nie; Xinyang Chen; | code |
| 368 | BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution Highlight: Nonetheless, it remains impossible to deploy DM to resource-limited edge devices. To address this problem, we propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. |
Kai Liu; Kaicheng Yang; Zheng Chen; Zhiteng Li; Yong Guo; Wenbo Li; Linghe Kong; Yulun Zhang; | code |
| 369 | Speculate, Then Collaborate: Fusing Knowledge of Language Models During Decoding Highlight: Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to improve their performance across domains. To realize this potential, we introduce a novel Collaborative Speculative Decoding (CoSD) algorithm that enables efficient LLM knowledge fusion at test time without requiring additional model training. |
Ziyao Wang; Muneeza Azmat; Ang Li; Raya Horesh; Mikhail Yurochkin; | code |
| 370 | Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery Highlight: In this work, we introduce Neural Interpretable PDEs (NIPS), a novel neural operator architecture that builds upon and enhances Nonlocal Attention Operators (NAO) in both predictive accuracy and computational efficiency. |
Ning Liu; Yue Yu; | code |
| 371 | SkipGPT: Each Token Is One of A Kind Highlight: We introduce **SkipGPT**, a dynamic layer pruning framework designed to optimize computational resource allocation through two core innovations: (1) global token-aware routing to prioritize critical tokens and (2) decoupled pruning policies for MLP and self-attention components. |
Anhao Zhao; Fanghua Ye; Yingqi Fan; Junlong Tong; Jing Xiong; Zhiwei Fei; Hui Su; Xiaoyu Shen; | code |
| 372 | Human Body Restoration with One-Step Diffusion Model and A New Benchmark Highlight: In this study, we propose a high-quality dataset automated cropping and filtering (HQ-ACF) pipeline. |
Jue Gong; Jingkai Wang; Zheng Chen; Xing Liu; Hong Gu; Yulun Zhang; Xiaokang Yang; | code |
| 373 | NeuroTree: Hierarchical Functional Brain Pathway Decoding for Mental Health Disorders Highlight: Although existing fMRI-based graph neural networks (GNNs) have demonstrated significant potential in brain network feature extraction, they often fail to characterize complex relationships between brain regions and demographic information in mental disorders. To overcome these limitations, we propose a learnable NeuroTree framework that integrates a $k$-hop AGE-GCN with neural ordinary differential equations (ODEs) and contrastive masked functional connectivity (CMFC) to enhance similarities and dissimilarities of brain region distance. |
Jun-En Ding; Dongsheng Luo; Chenwei Wu; Feng Liu; | code |
| 374 | Understanding and Mitigating Memorization in Diffusion Models for Tabular Data Highlight: Additionally, we provide a theoretical explanation for why memorization occurs in tabular diffusion models. To address this issue, we propose TabCutMix, a simple yet effective data augmentation technique that exchanges randomly selected feature segments between random same-class training sample pairs. |
Zhengyu Fang; Zhimeng Jiang; Huiyuan Chen; Xiao Li; Jing Li; | code |
| 375 | Semantic Shift Estimation Via Dual-Projection and Classifier Reconstruction for Exemplar-Free Class-Incremental Learning Highlight: Specifically, the embeddings of old tasks shift in the embedding space after learning new tasks, and the classifier becomes biased towards new tasks due to training solely with new data, hindering the balance between old and new knowledge. To address these issues, we propose the Dual-Projection Shift Estimation and Classifier Reconstruction (DPCR) approach for EFCIL. |
Run He; Di Fang; Yicheng Xu; Yawen Cui; Ming Li; Cen Chen; Ziqian Zeng; Huiping Zhuang; | code |
| 376 | MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning Highlight: However, this approach is often hindered by the high cost of obtaining large amounts of labeled data. To address this challenge, we propose **M**any-Shot **A**daptive **P**seudo-**L**ab**E**ling, namely **MAPLE**, a novel influence-based many-shot ICL framework that utilizes pseudo-labeled samples to compensate for the lack of label information. |
Zihan Chen; Song Wang; Zhen Tan; Jundong Li; Cong Shen; | code |
| 377 | Does One-shot Give The Best Shot? Mitigating Model Inconsistency in One-shot Federated Learning Highlight: This work presents a novel OFL framework FAFI that enhances the one-shot training on the client side to essentially overcome inferior local uploading. |
Hui Zeng; Wenke Huang; Tongqing Zhou; Xinyi Wu; Guancheng Wan; Yingwen Chen; Zhiping Cai; | code |
| 378 | Generalized Category Discovery Via Reciprocal Learning and Class-Wise Distribution Regularization Highlight: However, recent parametric-based methods suffer from inferior base discrimination due to unreliable self-supervision. To address this issue, we propose a Reciprocal Learning Framework (RLF) that introduces an auxiliary branch devoted to base classification. |
Duo Liu; Zhiquan Tan; Linglan Zhao; Zhongqiang Zhang; Xiangzhong Fang; Weiran Huang; | code |
| 379 | Beyond One-Hot Labels: Semantic Mixing for Model Calibration Highlight: In this paper, we introduce calibration-aware data augmentation to create synthetic datasets of diverse samples and their ground-truth uncertainty. |
Haoyang Luo; Linwei Tao; Minjing Dong; Chang Xu; | code |
| 380 | Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning Highlight: Here, we introduce a novel PFL framework, called PHN-HVVS, which decomposes the design space into Voronoi grids and deploys a genetic algorithm (GA) for Voronoi grid partitioning within high-dimensional space. |
Mengmeng Chen; Xiaohu Wu; QIQI LIU; Tiantian He; Yew-Soon Ong; Yaochu Jin; Qicheng Lao; Han Yu; | code |
| 381 | EgoPrivacy: What Your First-Person Camera Says About You? Highlight: To further emphasize the privacy threats inherent to egocentric vision, we propose Retrieval-Augmented Attack, a novel attack strategy that leverages ego-to-exo retrieval from an external pool of exocentric videos to boost the effectiveness of demographic privacy attacks. |
Yijiang Li; Genpei Zhang; Jiacheng Cheng; Yi Li; Xiaojun Shan; Dashan Gao; Jiancheng Lyu; Yuan Li; Ning Bi; Nuno Vasconcelos; | code |
| 382 | Weakly-Supervised Contrastive Learning for Imprecise Class Labels Highlight: Instead of directly relying on imprecise class labels, we measure the semantic similarity between example pairs, which quantifies how closely they belong to the same category by iteratively refining weak supervisory signals. Based on this concept, we propose a graph-theoretic framework for weakly-supervised contrastive learning, where semantic similarity serves as the graph weights. |
Zi-Hao Zhou; Jun-Jie Wang; Tong Wei; Min-Ling Zhang; | code |
| 383 | TtBA: Two-third Bridge Approach for Decision-Based Adversarial Attack Highlight: In this paper, we propose a novel normal-vector-based method called Two-third Bridge Attack (TtBA). |
Feiyang Wang; Xingquan Zuo; Hai Huang; Gang Chen; | code |
| 384 | CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition Highlight: In this work, we evaluate the vulnerability of GNNs to MEAs and explore their potential for cost-effective model acquisition in non-adversarial research settings. |
Zebin Wang; Menghan Lin; Bolin Shen; Ken Anderson; Molei Liu; Tianxi Cai; Yushun Dong; | code |
| 385 | Beyond Zero Initialization: Investigating The Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics Highlight: In this paper, we investigate the impact of non-zero initialization on LoRA’s fine-tuning dynamics from an infinite-width perspective. |
Shiwei Li; Xiandi Luo; Xing Tang; Haozhao Wang; Hao Chen; weihongluo; Yuhua Li; xiuqiang He; Ruixuan Li; | code |
| 386 | The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning Highlight: To improve the training efficiency of federated learning (FL), previous research has employed low-rank decomposition techniques to reduce communication overhead. In this paper, we seek to enhance the performance of these low-rank decomposition methods. |
Shiwei Li; Xiandi Luo; Haozhao Wang; Xing Tang; Shijie Xu; weihongluo; Yuhua Li; xiuqiang He; Ruixuan Li; | code |
| 387 | MATS: An Audio Language Model Under Text-only Supervision Highlight: In this paper, we propose **MATS**, an audio-language multimodal LLM designed to handle **M**ultiple **A**udio tasks using solely **T**ext-only **S**upervision. |
Wen Wang; RuiBing Hou; Hong Chang; Shiguang Shan; Xilin Chen; | code |
| 388 | Splitting & Integrating: Out-of-Distribution Detection Via Adversarial Gradient Attribution Highlight: In this paper, we propose a novel OOD detection method called \textbf{S \& I} based on layer \textbf{S}plitting and gradient \textbf{I}ntegration via Adversarial Gradient Attribution. |
Jiayu Zhang; Xinyi Wang; Zhibo Jin; Zhiyu Zhu; Jianlong Zhou; Fang Chen; Huaming Chen; | code |
| 389 | Towards An Explainable Comparison and Alignment of Feature Embeddings Highlight: In this work, we propose the Spectral Pairwise Embedding Comparison (SPEC) framework to compare embeddings and identify their differences in clustering a reference dataset. Furthermore, we introduce an optimization problem using this framework to align two embeddings, ensuring that clusters identified in one embedding are also captured in the other model. |
Mohammad Jalali; Bahar Dibaei Nia; Farzan Farnia; | code |
| 390 | Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism Highlight: We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations. |
Haoyuan Cai; Zhenghao Peng; Bolei Zhou; | code |
| 391 | PatchPilot: A Cost-Efficient Software Engineering Agent with Early Attempts on Formal Verification Highlight: In this paper, we propose PatchPilot, an agentic patcher that strikes a balance between patching efficacy, stability, and cost-efficiency. |
Hongwei Li; Yuheng Tang; Shiqi Wang; Wenbo Guo; | code |
| 392 | ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans Highlight: In this paper, we introduce a novel, fully parallelizable doubly-stochastic attention mechanism based on sliced optimal transport, leveraging Expected Sliced Transport Plans (ESP). |
Ashkan Shahbazi; Elaheh Akbari; Darian Salehi; Xinran Liu; Navid NaderiAlizadeh; Soheil Kolouri; | code |
| 393 | On The Guidance of Flow Matching Highlight: In this paper, we propose the first framework of general guidance for flow matching. |
Ruiqi Feng; Chenglei Yu; Wenhao Deng; Peiyan Hu; Tailin Wu; | code |
| 394 | TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation Highlight: In this work, we propose \textbf{TimeDART}, a novel self-supervised time series pre-training framework that unifies two powerful generative paradigms to learn more transferable representations. |
Daoyu Wang; Mingyue Cheng; Zhiding Liu; Qi Liu; | code |
| 395 | Protein Structure Tokenization: Benchmarking and New Recipe Highlight: Compared to the leading model ESM3, our method achieves an average of 6.31\% performance improvement across 24 supervised tasks, with sensitivity and utilization rates increased by 12.83\% and 124.03\%, respectively. |
Xinyu Yuan; Zichen Wang; Marcus D. Collins; Huzefa Rangwala; | code |
| 396 | HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration Highlight: Remarkably, our *image-free* approach reduces training time by $25\%$ compared with the previous method. |
Yushi Huang; Zining Wang; Ruihao Gong; Jing Liu; Xinjie Zhang; Jinyang Guo; Xianglong Liu; Jun Zhang; | code |
| 397 | LETS Forecast: Learning Embedology for Time Series Forecasting Highlight: While deep learning has achieved major success in time series forecasting, many existing approaches do not explicitly model the dynamics. To bridge this gap, we introduce DeepEDM, a framework that integrates nonlinear dynamical systems modeling with deep neural networks. |
Abrar Majeedi; Viswanatha Reddy Gajjala; Satya Sai Srinath Namburi GNVV; Nada Magdi Elkordi; Yin Li; | code |
| 398 | When Do LLMs Help With Node Classification? A Comprehensive Analysis Highlight: Although many studies demonstrate the impressive performance of LLM-based methods, the lack of clear design guidelines may hinder their practical application. In this work, we aim to establish such guidelines through a fair and systematic comparison of these algorithms. |
Xixi Wu; Yifei Shen; Fangzhou Ge; Caihua Shan; Yizhu Jiao; Xiangguo Sun; Hong Cheng; | code |
| 399 | AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment Highlight: Many video-to-audio (VTA) methods have been proposed for dubbing silent AI-generated videos. |
Yuqin Cao; Xiongkuo Min; Yixuan Gao; Wei Sun; Guangtao Zhai; | code |
| 400 | Stacey: Promoting Stochastic Steepest Descent Via Accelerated $\ell_p$-Smooth Nonconvex Optimization Highlight: While popular optimization methods such as SGD, AdamW, and Lion depend on steepest descent updates in either $\ell_2$ or $\ell_\infty$ norms, there remains a critical gap in handling the non-Euclidean structure observed in modern deep networks training. In this work, we address this need by introducing a new accelerated $\ell_p$ steepest descent algorithm, called Stacey, which uses interpolated primal-dual iterate sequences to effectively navigate non-Euclidean smooth optimization tasks. |
Xinyu Luo; Site Bai; Bolian Li; Petros Drineas; Ruqi Zhang; Brian Bullins; | code |
| 401 | ConText: Driving In-context Learning for Text Removal and Segmentation Highlight: This paper presents the first study on adapting the visual in-context learning (V-ICL) paradigm to optical character recognition tasks, specifically focusing on text removal and segmentation. |
Fei Zhang; Pei Zhang; Baosong Yang; Fei Huang; Yanfeng Wang; Ya Zhang; | code |
| 402 | LIFT The Veil for The Truth: Principal Weights Emerge After Rank Reduction for Reasoning-Focused Supervised Fine-Tuning Highlight: In this work, we state that weights with the largest magnitude after low-rank approximation are critical weights for fine-tuning, which we call *Principal Weights*. |
Zihang Liu; Tianyu Pang; Oleg Balabanov; Chaoqun Yang; Tianjin Huang; Lu Yin; Yaoqing Yang; Shiwei Liu; | code |
| 403 | Curriculum Learning for Biological Sequence Prediction: The Case of De Novo Peptide Sequencing Highlight: However, existing NAT approaches often rely on Connectionist Temporal Classification (CTC) loss, which presents significant optimization challenges due to CTC’s complexity and increases the risk of training failures. To address these issues, we propose an improved non-autoregressive peptide sequencing model that incorporates a structured protein sequence curriculum learning strategy. |
Xiang Zhang; Jiaqi Wei; Zijie Qiu; Sheng Xu; Nanqing Dong; ZhiQiang Gao; Siqi Sun; | code |
| 404 | Distillation of Discrete Diffusion Through Dimensional Correlations Highlight: The code used in the paper is available at https://github.com/sony/di4c. In this paper, (i) we propose mixture models for discrete diffusion that are capable of treating dimensional correlations while remaining scalable, and (ii) we provide a set of loss functions for distilling the iterations of existing models. |
Satoshi Hayakawa; Yuhta Takida; Masaaki Imaizumi; Hiromi Wakaki; Yuki Mitsufuji; | code |
| 405 | Open-Det: An Efficient Learning Framework for Open-Ended Detection Highlight: However, the existing OED models, such as GenerateU, require large-scale datasets for training, suffer from slow convergence, and exhibit limited performance. To address these issues, we present a novel and efficient Open-Det framework, consisting of four collaborative parts. |
Guiping Cao; Tao Wang; Wenjian Huang; Xiangyuan Lan; Jianguo Zhang; Dongmei Jiang; | code |
| 406 | FuseUNet: A Multi-Scale Feature Fusion Method for U-like Networks Highlight: While recent improvements to UNet have focused on enhancing encoder and decoder capabilities, these limitations remain overlooked. To overcome these challenges, we propose a novel multi-scale feature fusion method that reimagines the UNet decoding process as solving an initial value problem (IVP), treating skip connections as discrete nodes. |
Quansong He; Xiangde Min; Kaishen Wang; Tao He; | code |
| 407 | EmoGrowth: Incremental Multi-label Emotion Decoding with Augmented Emotional Relation Graph Highlight: We propose an Augmented Emotional Semantics Learning (AESL) framework to address two critical challenges: past- and future-missing partial label problems. |
Kaicheng Fu; Changde Du; Jie Peng; Kunpeng Wang; Shuangchen Zhao; Xiaoyu Chen; Huiguang He; | code |
| 408 | Concentration Distribution Learning from Label Distributions Highlight: Specifically, it’s impossible to obtain the total description degree of hidden labels that are not in the label space, which leads to the loss of information and confusion in instances. To solve the above problem, we come up with a new concept named background concentration to serve as the absolute description degree term of the label distribution and introduce it into the LDL process, forming the improved paradigm of concentration distribution learning. |
Jiawei Tang; Yuheng Jia; | code |
| 409 | Distributed Conformal Prediction Via Message Passing Highlight: Conformal Prediction (CP) offers a robust post-hoc calibration framework, providing distribution-free statistical coverage guarantees for prediction sets by leveraging held-out datasets. In this work, we address a decentralized setting where each device has limited calibration data and can communicate only with its neighbors over an arbitrary graph topology. |
Haifeng Wen; Hong Xing; Osvaldo Simeone; | code |
| 410 | Selective Prompt Anchoring for Code Generation Highlight: We hypothesize that this attention dilution issue is an important reason for code generation errors. To mitigate this issue, we propose ***S**elective **P**rompt **A**nchoring* (SPA) to guide code LLMs to pay more attention to user intent when generating code. |
Yuan Tian; Tianyi Zhang; | code |
| 411 | Learning The RoPEs: Better 2D and 3D Position Encodings with STRING Highlight: We introduce $\textbf{STRING}$: Separable Translationally Invariant Position Encodings. |
Connor Schenck; Isaac Reid; Mithun George Jacob; Alex Bewley; Joshua Ainslie; David Rendleman; Deepali Jain; Mohit Sharma; Kumar Avinava Dubey; Ayzaan Wahid; Sumeet Singh; René Wagner; Tianli Ding; Chuyuan Fu; Arunkumar Byravan; Jake Varley; Alexey A. Gritsenko; Matthias Minderer; Dmitry Kalashnikov; Jonathan Tompson; Vikas Sindhwani; Krzysztof Marcin Choromanski; | code |
| 412 | Active Learning for Efficient Discovery of Optimal Combinatorial Perturbations Highlight: We introduce NAIAD, an active learning framework that efficiently discovers optimal gene pairs by leveraging single-gene perturbation effects and adaptive gene embeddings that scale with the training data size, mitigating overfitting in small-sample learning while capturing complex gene interactions as more data is collected. |
Jason Qin; Hans-Hermann Wessels; Carlos Fernandez-Granda; Yuhan Hao; | code |
| 413 | Aligning LLMs By Predicting Preferences from User Writing Samples Highlight: This paper introduces PROSE, a method designed to enhance the precision of preference descriptions inferred from user writing samples. |
Stéphane Aroca-Ouellette; Natalie Mackraz; Barry-John Theobald; Katherine Metcalf; | code |
| 414 | Categorical Schrödinger Bridge Matching Highlight: In this paper, we provide a theoretical and algorithmic foundation for solving SB in discrete spaces using the recently introduced Iterative Markovian Fitting (IMF) procedure. |
Grigoriy Ksenofontov; Alexander Korotin; | code |
| 415 | From Thousands to Billions: 3D Visual Language Grounding Via Render-Supervised Distillation from 2D VLMs Highlight: 3D vision-language grounding faces a fundamental data bottleneck: while 2D models train on billions of images, 3D models have access to only thousands of labeled scenes–a six-order-of-magnitude gap that severely limits performance. We introduce \textbf{\emph{LIFT-GS}}, a practical distillation technique that overcomes this limitation by using differentiable rendering to bridge 3D and 2D supervision. |
Ang Cao; Sergio Arnaud; Oleksandr Maksymets; Jianing Yang; Ayush Jain; Ada Martin; Vincent-Pierre Berges; Paul McVay; Ruslan Partsey; Aravind Rajeswaran; Franziska Meier; Justin Johnson; Jeong Joon Park; Alexander Sax; | code |
| 416 | SADA: Stability-guided Adaptive Diffusion Acceleration Highlight: In this paper, we propose **Stability-guided Adaptive Diffusion Acceleration (SADA)**, a novel paradigm that unifies step-wise and token-wise sparsity decisions via a single stability criterion to accelerate sampling of ODE-based generative models (Diffusion and Flow-matching). |
Ting Jiang; Yixiao Wang; Hancheng Ye; Zishan Shao; Jingwei Sun; Jingyang Zhang; Zekai Chen; Jianyi Zhang; Yiran Chen; Hai Li; | code |
| 417 | Haste Makes Waste: A Simple Approach for Scaling Graph Neural Networks Highlight: In this paper, we provide a comprehensive analysis of their staleness and inferior performance on large-scale problems. |
Rui Xue; Tong Zhao; Neil Shah; Xiaorui Liu; | code |
| 418 | MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reframe masked reconstruction as a sparse signal reconstruction problem and theoretically prove that the dual form of complementary masks possesses superior capabilities in extracting domain-agnostic image features. |
Jiawen Wang; Yinda Chen; Xiaoyu Liu; Che Liu; Dong Liu; Jianqing Gao; Zhiwei Xiong; | code |
| 419 | Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a surrogate function for the mean squared error (MSE) of the estimator, which facilitates the use of classical graph cut algorithms to learn the optimal design. |
Jin Zhu; Jingyi Li; Hongyi Zhou; Yinan Lin; Zhenhua Lin; Chengchun Shi; | code |
| 420 | GuidedQuant: Large Language Model Quantization Via Exploiting End Loss Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods either (1) fail to account for the varying importance of hidden features to the end loss or, when incorporating end loss, (2) neglect the critical interactions between model weights. To address these limitations, we propose GuidedQuant, a novel quantization approach that integrates gradient information from the end loss into the quantization objective while preserving cross-weight dependencies within output channels. |
Jinuk Kim; Marwa El Halabi; Wonpyo Park; Clemens JS Schaefer; Deokjae Lee; Yeonhong Park; Jae W. Lee; Hyun Oh Song; | code |
| 421 | UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing benchmarks often fall short in evaluating LLMs’ abilities on the breadth and depth of undergraduate-level physics, underscoring the need for a comprehensive evaluation. To fill this gap, we introduce UGPhysics, a large-scale and diverse benchmark specifically designed to evaluate **U**nder**G**raduate-level **Physics** (**UGPhysics**) reasoning with LLMs. |
Xin Xu; Qiyun Xu; Tong Xiao; Tianhao Chen; Yuchen Yan; Jiaxin ZHANG; Shizhe Diao; Can Yang; Yang Wang; | code |
| 422 | MoE-SVD: Structured Mixture-of-Experts LLMs Compression Via Singular Value Decomposition Highlight: In this paper, we present MoE-SVD, a new decomposition-based compression framework tailored for MoE LLMs without any extra training. |
Wei Li; Lujun Li; Hao Gu; You-Liang Huang; Mark G. Lee; Shengjie Sun; Wei Xue; Yike Guo; | code |
| 423 | From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning Via Bayesian Nash Equilibrium Highlight: We introduce Efficient Coordination via Nash Equilibrium (ECON), a hierarchical reinforcement-learning paradigm that marries distributed reasoning with centralized final output. |
Xie Yi; Zhanke Zhou; Chentao Cao; Qiyu Niu; Tongliang Liu; Bo Han; | code |
| 424 | KIND: Knowledge Integration and Diversion for Training Decomposable Models Highlight: However, traditional pre-trained models often face deployment challenges due to their fixed sizes, and are prone to negative transfer when discrepancies arise between training tasks and target tasks. To address this, we propose **KIND**, a novel pre-training method designed to construct decomposable models. |
Yucheng Xie; Fu Feng; Ruixiao Shi; Jing Wang; Yong Rui; Xin Geng; | code |
| 425 | CostFilter-AD: Enhancing Anomaly Detection Through Matching Cost Filtering Highlight: Often, such a matching process is inaccurate yet overlooked, leading to sub-optimal detection. To address this issue, we introduce the concept of cost filtering, borrowed from classical matching tasks, such as depth and flow estimation, into the UAD problem. |
Zhe Zhang; Mingxiu Cai; Hanxiao Wang; Gaochang Wu; Tianyou Chai; Xiatian Zhu; | code |
| 426 | AnalogGenie-Lite: Enhancing Scalability and Precision in Circuit Topology Discovery Through Lightweight Graph Modeling Highlight: This work proposes AnalogGenie-Lite, a decoder-only transformer that discovers novel analog IC topologies with significantly enhanced scalability and precision via lightweight graph modeling. |
Jian Gao; Weidong Cao; Xuan Zhang; | code |
| 427 | Balancing Model Efficiency and Performance: Adaptive Pruner for Long-tailed Data Highlight: This paper proposes a novel adaptive pruning strategy, LTAP (Long-Tailed Adaptive Pruner), aimed at balancing model efficiency and performance to better address the challenges posed by long-tailed data distributions. |
Zhe Zhao; HaiBin Wen; Pengkun Wang; Shuang Wang; Zhenkun Wang; Qingfu Zhang; Yang Wang; | code |
| 428 | TIMING: Temporality-Aware Integrated Gradients for Time Series Explanation Highlight: However, current evaluation metrics fail to assess this capability, as they inadvertently cancel out opposing feature contributions. To address this limitation, we propose novel evaluation metrics—Cumulative Prediction Difference (CPD) and Cumulative Prediction Preservation (CPP)—to systematically assess whether attribution methods accurately identify significant positive and negative points in time series XAI. |
Hyeongwon Jang; Changhun Kim; Eunho Yang; | code |
| 429 | Continuous Visual Autoregressive Generation Via Score Maximization Highlight: When applied to continuous modalities such as visual data, Visual AutoRegressive modeling (VAR) typically resorts to quantization-based approaches to cast the data into a discrete space, which can introduce significant information loss. To tackle this issue, we introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization. |
Chenze Shao; Fandong Meng; Jie Zhou; | code |
| 430 | How Effective Can Dropout Be in Multiple Instance Learning? Highlight: In this paper, we empirically explore how effective dropout can be in MIL. |
Wenhui Zhu; Peijie Qiu; Xiwen Chen; Zhangsihao Yang; Aristeidis Sotiras; Abolfazl Razi; Yalin Wang; | code |
| 431 | HyperNear: Unnoticeable Node Injection Attacks on Hypergraph Neural Networks Highlight: Through empirical analysis, we develop a relatively unnoticeable attack approach by monitoring changes in homophily and leveraging this self-regulating property to enhance stealth. Building on these insights, we introduce HyperNear, i.e., **N**ode inj**E**ction **A**ttacks on hype**R**graph neural networks, the first node injection attack framework specifically tailored for HNNs. |
Tingyi Cai; Yunliang Jiang; Ming Li; Lu Bai; Changqin Huang; Yi Wang; | code |
| 432 | Subspace Optimization for Large Language Models with Convergence Guarantees Highlight: However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we reveal that GaLore does not always converge to the optimal solution and provide an explicit counterexample to support this finding. |
Yutong He; Pengrui Li; Yipeng Hu; Chuyan Chen; Kun Yuan; | code |
| 433 | Physics-informed Temporal Alignment for Auto-regressive PDE Foundation Models Highlight: The challenge becomes particularly evident for out-of-distribution data, as the pretraining performance may approach random model initialization for downstream tasks with long-term dynamics. To deal with this problem, we propose physics-informed temporal alignment (PITA), a self-supervised learning framework inspired by inverse problem solving. |
Congcong Zhu; Xiaoyan Xu; Jiayue Han; Jingrun Chen; | code |
| 434 | Fast Large Language Model Collaborative Decoding Via Speculation Highlight: In this paper, we introduce **Collaborative decoding via Speculation (CoS)**, a novel framework that accelerates collaborative decoding without compromising performance. |
Jiale Fu; Yuchu Jiang; Junkai Chen; Jiaming Fan; Xin Geng; Xu Yang; | code |
| 435 | Training Diffusion-based Generative Models with Limited Data Highlight: In this paper, we present a novel theoretical insight for diffusion models that two factors, i.e., the denoiser function hypothesis space and the number of training samples, can affect the denoising score matching error of all training samples. |
Zhaoyu Zhang; Yang Hua; Guanxiong Sun; Hui Wang; Seán McLoone; | code |
| 436 | The Four Color Theorem for Cell Instance Segmentation Highlight: In this paper, we propose a novel cell instance segmentation method inspired by the four-color theorem. |
Ye Zhang; Yu Zhou; Yifeng Wang; Jun Xiao; Ziyue Wang; Yongbing Zhang; Jianxu Chen; | code |
| 437 | OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling Highlight: This data scarcity also contributes to the generalization difficulties experienced by learning-based methods. To address these challenges, we propose a scalable framework for synthesizing a high-quality dataset, named OptMATH. |
Hongliang Lu; Zhonglin Xie; Yaoyu Wu; Can Ren; Yuxuan Chen; Zaiwen Wen; | code |
| 438 | You Always Recognize Me (YARM): Robust Texture Synthesis Against Multi-View Corruption Highlight: Inspired by the use of warning colors and camouflage in the real world, we propose designing a robust appearance that can enhance model recognition of low-quality image data. |
Weihang Ran; Wei Yuan; Yinqiang Zheng; | code |
| 439 | SpikF: Spiking Fourier Network for Efficient Long-term Prediction Highlight: However, their application in long-term prediction tasks remains underexplored, which is primarily due to two critical challenges: (1) current SNN encoding methods are unable to effectively encode long temporal information, leading to increased computational complexity and energy consumption; (2) though Transformer-based models have achieved state-of-the-art accuracy in temporal prediction tasks, the absence of proper positional encoding for spiking self-attention restricts Spiking Transformer from effectively utilizing positional information, resulting in performance degradation. To address these challenges, we introduce an attention-free framework, **Spik**ing **F**ourier Network (**SpikF**), that encodes input sequences in patches and employs an innovative frequency domain selection mechanism to effectively utilize the sequential properties of time-series data. |
Wenjie Wu; Dexuan Huo; Hong Chen; | code |
| 440 | Stray Intrusive Outliers-Based Feature Selection on Intra-Class Asymmetric Instance Distribution or Multiple High-Density Clusters Highlight: In this paper, we propose a supervised FS method, Stray Intrusive Outliers-based FS (SIOFS), for data classification with intra-class ADMHC. |
Lixin Yuan; Yirui Wu; Wenxiao Zhang; Minglei Yuan; Jun Liu; | code |
| 441 | Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages Highlight: We propose Foundation Molecular Grammar (FMG), which leverages multi-modal foundation models (MMFMs) to induce an interpretable molecular language. |
Michael Sun; Weize Yuan; Gang Liu; Wojciech Matusik; Jie Chen; | code |
| 442 | LensLLM: Unveiling Fine-Tuning Dynamics for LLM Selection Highlight: In this work, we propose a novel theoretical framework that provides a proper lens to assess the generalization capabilities of LLMs, thereby enabling accurate and efficient LLM selection for downstream applications. |
Xinyue Zeng; Haohui Wang; Junhong Lin; Jun Wu; Tyler Cody; Dawei Zhou; | code |
| 443 | AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting Highlight: However, several practical challenges persist, including managing intricate dependencies among features and quantifying uncertainty in predictions. This study aims to tackle these critical limitations by introducing **adapters**—feature-space transformations that facilitate the effective use of pre-trained univariate time series FMs for multivariate tasks. |
Abdelhakim Benechehab; Vasilii Feofanov; Giuseppe Paolo; Albert Thomas; Maurizio Filippone; Balázs Kégl; | code |
| 444 | Positional Encoding Meets Persistent Homology on Graphs Highlight: Our insights inform the design of a novel learnable method, PiPE (Persistence-informed Positional Encoding), which is provably more expressive than both PH and PE. |
Yogesh Verma; Amauri H Souza; Vikas K Garg; | code |
| 445 | Gamma Distribution PCA-Enhanced Feature Learning for Angle-Robust SAR Target Recognition Highlight: We validate the $\Gamma$PCA model based on two commonly used backbones, ResNet and ViT, and conduct multiple robustness experiments on the MSTAR benchmark dataset. |
Chong Zhang; Peng Zhang; Mengke Li; | code |
| 446 | Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models Highlight: In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. |
Kejia Chen; Jiawen Zhang; Jiacong Hu; Yu Wang; Jian Lou; Zunlei Feng; Mingli Song; | code |
| 447 | Permutation-based Rank Test in The Presence of Discretization and Application in Causal Discovery with Mixed Data Highlight: For example, in psychometric studies, the continuous level of certain personality dimensions of a person can only be measured after being discretized into order-preserving options such as disagree, neutral, and agree. Motivated by this, we propose Mixed data Permutation-based Rank Test (MPRT), which properly controls the statistical errors even when some or all variables are discretized. |
Xinshuai Dong; Ignavier Ng; Boyang Sun; Haoyue Dai; Guang-Yuan Hao; Shunxing Fan; Peter Spirtes; Yumou Qiu; Kun Zhang; | code |
| 448 | Instance Correlation Graph-based Naive Bayes Highlight: At the same time, none of them takes into account the correlations among instances. To fill this gap, we propose a novel algorithm called instance correlation graph-based naive Bayes (ICGNB). |
Chengyuan Li; Liangxiao Jiang; Wenjun Zhang; Liangjun Yu; Huan Zhang; | code |
| 449 | Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos Highlight: Consequently, we propose an innovative yet straightforward iterative framework, termed *Uncertainty-Based Extensible-Codebook Federated Learning (UEFL)*. |
Tianyi Zhang; Yu Cao; Dianbo Liu; | code |
| 450 | CoastalBench: A Decade-Long High-Resolution Dataset to Emulate Complex Coastal Processes Highlight: Existing studies often focus on relatively small datasets and simple processes. To fill this gap, we introduce a decade-long, high-resolution (<100m) coastal circulation modeling dataset on a real-world 3D mesh in southwest Florida with around 6 million cells. |
Zelin Xu; Yupu Zhang; Tingsong Xiao; Maitane Olabarrieta Lizaso; Jose M. Gonzalez-Ondina; Zibo Liu; Shigang Chen; Zhe Jiang; | code |
| 451 | Maximum Entropy Reinforcement Learning with Diffusion Policy Highlight: In this paper, we employ the diffusion model, a powerful generative model capable of capturing complex multimodal distributions, as the policy representation to fulfill the MaxEnt RL objective, developing a method named MaxEnt RL with Diffusion Policy (MaxEntDP). |
Xiaoyi Dong; Jian Cheng; Xi Sheryl Zhang; | code |
| 452 | DLP: Dynamic Layerwise Pruning in Large Language Models Highlight: However, these approaches often rely on pre-defined values, which can result in suboptimal performance. To overcome these limitations, we propose a novel method called Dynamic Layerwise Pruning (DLP). |
Yuli Chen; Bo Cheng; Jiale Han; Yingying Zhang; Yingting Li; Shuhao Zhang; | code |
| 453 | GraphGPT: Generative Pre-trained Graph Eulerian Transformer Highlight: We introduce *GraphGPT*, a novel self-supervised *generative pre-trained* model for graph learning based on the *Graph Eulerian Transformer* (**GET**). |
Qifang Zhao; Weidong Ren; Tianyu Li; Hong Liu; Xingsheng He; Xiaoxiao Xu; | code |
| 454 | Robust Spatio-Temporal Centralized Interaction for OOD Learning Highlight: However, most models relying on node-to-node messaging interaction exhibit sensitivity to spatiotemporal shifts, encountering out-of-distribution (OOD) challenges. To address these issues, we introduce the **S**patio-**T**emporal **O**OD **P**rocessor (STOP), which employs a centralized messaging mechanism along with a message perturbation mechanism to facilitate robust spatiotemporal interactions. |
Jiaming Ma; Binwu Wang; Pengkun Wang; Zhengyang Zhou; Xu Wang; Yang Wang; | code |
| 455 | Physics-Informed Weakly Supervised Learning For Interatomic Potentials Highlight: However, machine-learned interatomic potentials (MLIPs) often struggle with generalization and robustness, leading to unphysical energy and force predictions in atomistic simulations. To address this, we propose a physics-informed, weakly supervised training framework for MLIPs. |
Makoto Takamoto; Viktor Zaverkin; Mathias Niepert; | code |
| 456 | FeatSharp: Your Vision Model Features, Sharper Highlight: We introduce a novel method to coherently and cheaply upsample the feature maps of low-resolution vision encoders while picking up on fine-grained details that would otherwise be lost due to resolution. |
Mike Ranzinger; Greg Heinrich; Pavlo Molchanov; Bryan Catanzaro; Andrew Tao; | code |
| 457 | Socialized Coevolution: Advancing A Better World Through Cross-Task Collaboration Highlight: Motivated by Social Learning (SL), this paper introduces a practical paradigm of Socialized Coevolution (SC). |
Xinjie Yao; Yu Wang; Pengfei Zhu; Wanyu Lin; Ruipu Zhao; Zhoupeng Guo; Weihao Li; Qinghua Hu; | code |
| 458 | SynEVO: A Neuro-inspired Spatiotemporal Evolutional Framework for Cross-domain Adaptation Highlight: In this paper, inspired by neuroscience theories, we theoretically derive the increased information boundary via learning cross-domain collective intelligence and propose a Synaptic EVOlutional spatiotemporal network, SynEVO, which breaks the model independence and enables cross-domain knowledge to be shared and aggregated. |
Jiayue Liu; Zhongchao Yi; Zhengyang Zhou; Qihe Huang; Kuo Yang; Xu Wang; Yang Wang; | code |
| 459 | Reflection-Bench: Evaluating Epistemic Agency in Large Language Models Highlight: Correspondingly, we propose Reflection-Bench, a cognitive-psychology-inspired benchmark consisting of seven tasks with long-term relevance and minimization of data leakage. |
Lingyu Li; Yixu Wang; Haiquan Zhao; Shuqi Kong; Yan Teng; Chunbo Li; Yingchun Wang; | code |
| 460 | QT-DoG: Quantization-Aware Training for Domain Generalization Highlight: In this work, we propose Quantization-aware Training for Domain Generalization (QT-DoG) and demonstrate that weight quantization effectively leads to flatter minima in the loss landscape, thereby enhancing domain generalization. |
Saqib Javed; Hieu Le; Mathieu Salzmann; | code |
| 461 | Identifying Neural Dynamics Using Interventional State Space Models Highlight: Here, we propose interventional state-space models (iSSM), a class of causal models that can predict neural responses to novel perturbations. |
Amin Nejatbakhsh; Yixin Wang; | code |
| 462 | HyperIMTS: Hypergraph Neural Network for Irregular Multivariate Time Series Forecasting Highlight: To represent and learn both dependencies from original observations in a unified form, we propose HyperIMTS, a **Hyper**graph neural network for **I**rregular **M**ultivariate **T**ime **S**eries forecasting. |
Boyuan Li; Yicheng Luo; Zhen Liu; Junhao Zheng; Jianming Lv; Qianli Ma; | code |
| 463 | L-Diffusion: Laplace Diffusion for Efficient Pathology Image Segmentation Highlight: In this work, we introduce the Laplace Diffusion Model, referred to as L-Diffusion, an innovative framework tailored for efficient pathology image segmentation. |
Weihan Li; Linyun Zhou; Yang Jian; Shengxuming Zhang; Xiangtong Du; Xiuming Zhang; Jing Zhang; Chaoqing Xu; Mingli Song; Zunlei Feng; | code |
| 464 | Taming Diffusion for Dataset Distillation with High Representativeness Highlight: In this paper, we systematically investigate issues present in current diffusion-based dataset distillation methods, including inaccurate distribution matching, distribution deviation with random noise, and separate sampling. Building on this, we propose D$^3$HR, a novel diffusion-based framework to generate distilled datasets with high representativeness. |
Lin Zhao; Yushu Wu; Xinru Jiang; Jianyang Gu; Yanzhi Wang; Xiaolin Xu; Pu Zhao; Xue Lin; | code |
| 465 | Are LLMs Prescient? A Continuous Evaluation Using Daily News As The Oracle Highlight: These benchmarks also fall short in assessing how LLM performance changes over time, as they consist of a static set of questions without a temporal dimension. To address these limitations, we propose using future event prediction as a continuous evaluation method to assess LLMs’ temporal generalization and forecasting abilities. |
Hui Dai; Ryan Teehan; Mengye Ren; | code |
| 466 | DiLQR: Differentiable Iterative Linear Quadratic Regulator Via Implicit Differentiation Highlight: This paper introduces DiLQR, a framework that facilitates differentiation through iLQR, allowing it to serve as a trainable and differentiable module, either as or within a neural network. |
Shuyuan Wang; Philip D Loewen; Michael Forbes; Bhushan Gopaluni; Wei Pan; | code |
| 467 | Rethinking Chain-of-Thought from The Perspective of Self-Training Highlight: Interestingly, we observe that both CoT reasoning and self-training share the core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. |
Zongqian Wu; Baoduo Xu; Ruochen Cui; Mengmeng Zhan; Xiaofeng Zhu; Lei Feng; | code |
| 468 | Efficient Logit-based Knowledge Distillation of Deep Spiking Neural Networks for Full-Range Timestep Deployment Highlight: Despite this, SNNs often suffer from accuracy degradation compared to ANNs and face deployment challenges due to fixed inference timesteps, which require retraining for adjustments, limiting operational flexibility. To address these issues, our work considers the spatio-temporal property inherent in SNNs, and proposes a novel distillation framework for deep SNNs that optimizes performance across full-range timesteps without specific retraining, enhancing both efficacy and deployment adaptability. |
Chengting Yu; Xiaochen Zhao; Lei Liu; Shu Yang; Gaoang Wang; Erping Li; Aili Wang; | code |
| 469 | One Arrow, Two Hawks: Sharpness-aware Minimization for Federated Learning Via Global Model Trajectory Highlight: However, most SAM-based methods do not directly consider the global objective and require two backward passes per iteration, resulting in diminished effectiveness. To overcome these two bottlenecks, we leverage the global model trajectory to directly measure sharpness for the global objective, requiring only a single backward pass. |
Yuhang Li; Tong Liu; Yangguang Cui; Ming Hu; Xiaoqiang Li; | code |
| 470 | UltraTWD: Optimizing Ultrametric Trees for Tree-Wasserstein Distance Highlight: To address it, we introduce UltraTWD, a novel unsupervised framework that simultaneously optimizes both ultrametric tree structures and edge weights to more faithfully approximate the cost matrix. |
Fangchen Yu; Yanzhen Chen; Jiaxing Wei; Jianfeng Mao; Wenye Li; Qiang Sun; | code |
| 471 | LLM-Assisted Semantically Diverse Teammate Generation for Efficient Multi-agent Coordination Highlight: However, such teammates lack semantic information, resulting in inefficient teammate generation and poor adaptability of the agents. To tackle these challenges, we propose Semantically Diverse Teammate Generation (SemDiv), a novel framework leveraging the capabilities of large language models (LLMs) to discover and learn diverse coordination behaviors at the semantic level. |
Lihe Li; Lei Yuan; Pengsen Liu; Tao Jiang; Yang Yu; | code |
| 472 | IMTS Is Worth Time $\times$ Channel Patches: Visual Masked Autoencoders for Irregular Multivariate Time Series Prediction Highlight: While pre-trained foundation models show potential for addressing these challenges, they are typically designed for Regularly Sampled Time Series (RTS). Motivated by the visual Masked AutoEncoder’s (MAE) powerful capability for modeling sparse multi-channel information and its success in RTS forecasting, we propose **VIMTS**, a framework adapting **V**isual MAE for **IMTS** forecasting. |
Zhangyi Hu; Jiemin Wu; Hua Xu; Mingqian Liao; Ninghui Feng; Bo Gao; Songning Lai; Yutao Yue; | code |
| 473 | High Dynamic Range Novel View Synthesis with Single Exposure Highlight: While effective, this multiple-exposure HDR-NVS approach has significant limitations, including susceptibility to motion artifacts (e.g., ghosting and blurring) and high capture and storage costs. To overcome these challenges, we introduce, for the first time, the single-exposure HDR-NVS problem, where only single-exposure LDR images are available during training. |
Kaixuan Zhang; Hu Wang; Minxian Li; Mingwu Ren; Mao Ye; Xiatian Zhu; | code |
| 474 | Sample Efficient Demonstration Selection for In-Context Learning Highlight: In this paper, we formulate the exemplar selection task as a top-m best arms identification problem. We release our code and data (https://github.com/kiranpurohit/CASE). |
Kiran Purohit; Venktesh V; Sourangshu Bhattacharya; Avishek Anand; | code |
| 475 | Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech Highlight: In this paper, we address the new challenge of speaker identity unlearning for ZS-TTS systems. |
Taesoo Kim; Jinju Kim; Dong Chan Kim; Jong Hwan Ko; Gyeong-Moon Park; | code |
| 476 | BSO: Binary Spiking Online Optimization Algorithm Highlight: However, their training algorithms often require substantial memory overhead due to latent weights storage and temporal processing requirements. To address this issue, we propose the Binary Spiking Online (BSO) optimization algorithm, a novel online training algorithm that significantly reduces training memory. |
Yu Liang; Yu Yang; Wenjie Wei; Ammar Belatreche; Shuai Wang; Malu Zhang; Yang Yang; | code |
| 477 | BECAME: Bayesian Continual Learning with Adaptive Model Merging Highlight: To validate our approach, we introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. |
Mei Li; Yuxiang Lu; Qinyan Dai; Suizhi Huang; Yue Ding; Hongtao Lu; | code |
| 478 | PolyConf: Unlocking Polymer Conformation Generation Through Hierarchical Generative Models Highlight: In this work, we propose PolyConf, a pioneering tailored polymer conformation generation method that leverages hierarchical generative models to unlock new possibilities. Moreover, we develop the first benchmark with a high-quality polymer conformation dataset derived from molecular dynamics simulations to boost related research in this area. |
Fanmeng Wang; Wentao Guo; Qi Ou; Hongshuai Wang; Haitao Lin; Hongteng Xu; Zhifeng Gao; | code |
| 479 | LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently Highlight: Building on our theory, we propose a theory-driven algorithm, LoRA-One, where linear convergence (as well as generalization) is established, and incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. |
Yuanhe Zhang; Fanghui Liu; Yudong Chen; | code |
| 480 | ELoRA: Low-Rank Adaptation for Equivariant GNNs Highlight: In this paper, we introduce ELoRA (Equivariant Low-Rank Adaptation), a novel fine-tuning method designed specifically for SO(3) equivariant Graph Neural Networks (GNNs), the backbones in multiple pre-trained interatomic potentials. |
Chen Wang; Siyu Hu; Guangming Tan; Weile Jia; | code |
| 481 | HybridGS: High-Efficiency Gaussian Splatting Data Compression Using Dual-Channel Sparse Representation and Point Cloud Encoder Highlight: This paper presents a new 3DGS compression framework called HybridGS, which takes advantage of both compact generation and standardized point cloud data encoding. |
Qi Yang; Le Yang; Geert Van der Auwera; Zhu Li; | code |
| 482 | DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model Highlight: However, these approaches often suffer from limited accuracy due to the low representation ability of the feature in motion supervision, as well as inefficiencies caused by the large search space required for point tracking. To address these limitations, we present DragLoRA, a novel framework that integrates LoRA (Low-Rank Adaptation) adapters into the drag-based editing pipeline. |
Siwei Xia; Li Sun; Tiantian Sun; Qingli Li; | code |
| 483 | MUDDFormer: Breaking Residual Bottlenecks in Transformers Via Multiway Dynamic Dense Connections Highlight: We propose MUltiway Dynamic Dense (MUDD) connections, a simple yet effective method to address the limitations of residual connections and enhance cross-layer information flow in Transformers. |
Da Xiao; Qingye Meng; Shengping Li; Xingyuan Yuan; | code |
| 484 | Knowledge Swapping Via Learning and Unlearning Highlight: We introduce Knowledge Swapping, a novel task designed to selectively regulate knowledge of a pretrained model by enabling the forgetting of user-specified information, retaining essential knowledge, and acquiring new knowledge simultaneously. |
Mingyu Xing; Lechao Cheng; Shengeng Tang; Yaxiong Wang; Zhun Zhong; Meng Wang; | code |
| 485 | MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce three key strategies for PEFT in agent tasks: 1) Inspired by the increasingly dominant *Reason+Action* paradigm, we first decompose the capabilities necessary for the agent tasks into three distinct roles: reasoner, executor, and summarizer. |
Jing Han; Binwei Yan; Tianyu Guo; Zheyuan Bai; Mengyu Zheng; Hanting Chen; Ying Nie; | code |
| 486 | Instruct2See: Learning to Remove Any Obstructions Across Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing methods address occlusions from specific elements like fences or raindrops, but are constrained by the wide range of real-world obstructions, making comprehensive data collection impractical. To overcome these challenges, we propose Instruct2See, a novel zero-shot framework capable of handling both seen and unseen obstacles. |
Junhang Li; Yu Guo; Chuhua XIAN; Shengfeng He; | code |
| 487 | Geometric Feature Embedding for Effective 3D Few-Shot Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing 3D FSCIL approaches primarily utilize multimodal pre-trained models to extract the semantic features, heavily dependent on meticulously designed high-quality prompts and fine-tuning strategies. To reduce this dependence, this paper proposes a novel method for **3D** **F**SCI**L** with **E**mbedded **G**eometric features (**3D-FLEG**). |
Xiangqi Li; Libo Huang; Zhulin An; Weilun Feng; Chuanguang Yang; Boyu Diao; Fei Wang; Yongjun Xu; | code |
| 488 | Channel Normalization for Time Series Channel Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we highlight the importance of CID and propose Channel Normalization (CN), a simple yet effective normalization strategy that enhances CID by assigning distinct affine transformation parameters to each channel. |
Seunghan Lee; Taeyoung Park; Kibok Lee; | code |
| 489 | Improving Consistency Models with Generator-Augmented Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The related estimation error induces a discrepancy between consistency distillation and training that, we show, still holds in the continuous-time limit. To alleviate this issue, we propose a novel flow that transports noisy data towards their corresponding outputs derived from a consistency model. |
Thibaut Issenhuth; Sangchul Lee; Ludovic Dos Santos; Jean-Yves Franceschi; Chansoo Kim; Alain Rakotomamonjy; | code |
| 490 | Improving Out-of-Distribution Detection Via Dynamic Covariance Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that the influence of ill-distributed samples can be corrected by dynamically adjusting the prior geometry in response to new data. |
Kaiyu Guo; Zijian Wang; Tan Pan; Brian C. Lovell; Mahsa Baktashmotlagh; | code |
| 491 | UnMORE: Unsupervised Multi-Object Segmentation Via Center-Boundary Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce unMORE, a novel two-stage pipeline designed to identify many complex objects in real-world images. |
Yafei YANG; Zihui Zhang; Bo Yang; | code |
| 492 | Policy Design for Two-sided Platforms with Participation Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper thus studies the dynamics and recommender policy design on two-sided platforms under the population effects for the first time. |
Haruka Kiyohara; Fan Yao; Sarah Dean; | code |
| 493 | Generalization Performance of Ensemble Clustering: From Theory to Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines the generalization performance of ensemble clustering, focusing on generalization error, excess risk and consistency. |
Xu Zhang; Haoye Qiu; Weixuan Liang; Hui LIU; Junhui Hou; Yuheng Jia; | code |
| 494 | Learning Input Encodings for Kernel-Optimal Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first formulate the optimal kernel that minimizes pointwise expected squared error, then demonstrate that the Neural Tangent Kernel of the composed function (INR with input encoding) can approximate any positive semidefinite dot-product kernels through input feature mapping adjustments. Building upon these insights, we propose a Kernel Alignment Regularizer (KAR) that naturally integrates with existing INR systems to enhance kernel alignment. |
Zhemin Li; Liyuan Ma; Hongxia Wang; Yaoyun Zeng; Xiaolong Han; | code |
| 495 | Learning Adaptive Lighting Via Channel-Aware Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we identify shared fundamental properties across these tasks: i) different color channels have different light properties, and ii) these channel differences manifest differently in the spatial and frequency domains. Leveraging these insights, we introduce the channel-aware Learning Adaptive Lighting Network (LALNet), a multi-task framework designed to handle multiple light-related tasks efficiently. |
Qirui Yang; Peng-Tao Jiang; Hao Zhang; Jinwei Chen; Bo Li; Huanjing Yue; Jingyu Yang; | code |
| 496 | PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PDE-Transformer, an improved transformer-based architecture for surrogate modeling of physics simulations on regular grids. |
Benjamin Holzschuh; Qiang Liu; Georg Kohl; Nils Thuerey; | code |
| 497 | Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Video-Enhanced Offline RL (VeoRL), a model-based method that constructs an interactive world model from diverse, unlabeled video data readily available online. |
Minting Pan; Yitao Zheng; Jiajian Li; Yunbo Wang; Xiaokang Yang; | code |
| 498 | I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ThinkDiff, a novel alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities by integrating the strengths of vision-language models (VLMs). |
Zhenxing Mi; Kuan-Chieh Wang; Guocheng Qian; Hanrong Ye; Runtao Liu; Sergey Tulyakov; Kfir Aberman; Dan Xu; | code |
| 499 | Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This raises a question: “Does attention fail for graphs in natural language settings?” Motivated by these observations, we embarked on an empirical study from the perspective of attention mechanisms to explore how LLMs process graph-structured data. |
Zhong Guan; Likang Wu; Hongke Zhao; Ming He; Jianping Fan; | code |
| 500 | $\texttt{I$^2$MoE}$: Interpretable Multimodal Interaction-aware Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose $\texttt{I$^2$MoE}$ ($\underline{I}$nterpretable Multimodal $\underline{I}$nteraction-aware $\underline{M}$ixture-$\underline{o}$f-$\underline{E}$xperts), an end-to-end MoE framework designed to enhance modality fusion by explicitly modeling diverse multimodal interactions, as well as providing interpretation on a local and global level. |
Jiayi Xin; Sukwon Yun; Jie Peng; Inyoung Choi; Jenna L. Ballard; Tianlong Chen; Qi Long; | code |
| 501 | FlexControl: Computation-Aware Conditional Control with Differentiable Router for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, most implementations (e.g., ControlNet) rely on ad-hoc heuristics to choose which network blocks to control — an approach that varies unpredictably with different tasks. To address this gap, we propose FlexControl, a novel framework that equips all diffusion blocks with control signals during training and employs a trainable gating mechanism to dynamically select which control signal to activate at each denoising step. |
Zheng Fang; Lichuan Xiang; Xu Cai; Kaicheng Zhou; Hongkai Wen; | code |
| 502 | Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, key operations like softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more neuromorphic hardware-compatible. |
Kaiwen Tang; Zhanglu Yan; Weng-Fai Wong; | code |
| 503 | WMarkGPT: Watermarked Image Understanding Via Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a meticulously designed three-stage learning pipeline to progressively equip WMarkGPT with the necessary abilities. |
Songbai Tan; Xuerui Qiu; Yao Shu; Gang Xu; Linrui Xu; Xiangyu Xu; Huiping Zhuang; Ming Li; Fei Yu; | code |
| 504 | Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Reaction Graph (RG), a unified graph representation that encapsulates the 3D molecular structures within chemical reactions. |
Yingzhao Jian; Yue Zhang; Ying Wei; Hehe Fan; Yi Yang; | code |
| 505 | Few-Shot Learner Generalizes Across AI-Generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, collecting adequate training data from online generative models is often expensive or infeasible. To overcome these issues, we propose Few-Shot Detector (FSD), a novel AI-generated image detector which learns a specialized metric space for effectively distinguishing unseen fake images using very few samples. |
Shiyu Wu; Jing Liu; Jing Li; Yequan Wang; | code |
| 506 | How Do Images Align and Complement LiDAR? Towards A Harmonized Multi-modal 3D Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these approaches have shown promising results, they still face challenges, such as misalignment during data augmentation and the reliance on post-processing steps. To address these issues, we propose **I**mage-**A**ssists-**L**iDAR (**IAL**), a novel multi-modal 3D panoptic segmentation framework. |
Yining Pan; Qiongjie Cui; Xulei Yang; Na Zhao; | code |
| 507 | MixBridge: Heterogeneous Image-to-Image Backdoor Attack Through Mixture of Schrödinger Bridges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing backdoor formulations mainly address single-attack scenarios and are limited to Gaussian noise input models. To fill this gap, we propose MixBridge, a novel diffusion Schrödinger bridge (DSB) framework to cater to arbitrary input distributions (taking I2I tasks as special cases). |
Shixi Qin; Zhiyong Yang; Shilong Bao; Shi Wang; Qianqian Xu; Qingming Huang; | code |
| 508 | Right Time to Learn: Promoting Generalization Via Bio-inspired Spacing Effect in Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose an easy-to-use and compatible strategy named Spaced KD to improve the effectiveness of both online KD and self KD, in which the student model distills knowledge from a teacher model trained with a space interval ahead. |
Guanglong Sun; Hongwei Yan; Liyuan Wang; Qian Li; Bo Lei; Yi Zhong; | code |
| 509 | Cut Out and Replay: A Simple Yet Versatile Strategy for Multi-Label Online Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It not only enables models to simultaneously address catastrophic forgetting, missing labels, and class imbalance challenges, but also serves as an orthogonal solution that seamlessly integrates with existing approaches. |
Xinrui Wang; Shao-Yuan Li; Jiaqiang Zhang; Songcan Chen; | code |
| 510 | UniMate: A Unified Model for Mechanical Metamaterial Generation, Property Prediction, and Condition Confirmation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose a unified model named UniMate, which consists of a modality alignment module and a synergetic diffusion generation module. |
Wangzhi Zhan; Jianpeng Chen; Dongqi Fu; Dawei Zhou; | code |
| 511 | CFPT: Empowering Time Series Forecasting Through Cross-Frequency Interaction and Periodic-Aware Timestamp Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Long-term time series forecasting has been widely studied, yet two aspects remain insufficiently explored: the interaction learning between different frequency components and the exploitation of periodic characteristics inherent in timestamps. To address the above issues, we propose **CFPT**, a novel method that empowers time series forecasting through **C**ross-**F**requency Interaction (CFI) and **P**eriodic-Aware **T**imestamp Modeling (PTM). |
Feifei Kou; Jiahao Wang; Lei Shi; Yuhan Yao; Yawen Li; Suguo Zhu; Zhongbao Zhang; Junping Du; | code |
| 512 | Information Bottleneck-guided MLPs for Robust Spatial-temporal Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the problem: *can simple neural networks such as Multi-Layer Perceptrons (MLPs) achieve robust spatial-temporal forecasting while remaining efficient?* |
Min Chen; Guansong Pang; Wenjun Wang; Cheng Yan; | code |
| 513 | Better to Teach Than to Give: Domain Generalized Semantic Segmentation Via Agent Queries with Diffusion Model Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel agent **Query**-driven learning framework based on **Diff**usion model guidance for DGSS, named QueryDiff. |
Fan Li; Xuan Wang; Min Qi; Zhaoxiang Zhang; yuelei xu; | code |
| 514 | Retraining-free Merging of Sparse MoE Via Hierarchical Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the deployment of SMoE models faces constraints from extensive memory requirements of expert components in resource-limited environments. To address these limitations, this paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework for parameter reduction without retraining. |
I-Chun Chen; Hsu-Shen Liu; Wei-Fang Sun; Chen-Hao Chao; Yen-Chang Hsu; Chun-Yi Lee; | code |
| 515 | Weight Matrices Compression Based on PDB Model in Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, a novel **Population Double Bulk (PDB) model** is proposed to characterize the eigenvalue behavior of the weight matrix, which is more general than the existing Population Unit Bulk (PUB) model. |
Xiaoling Wu; Junpeng Zhu; Zeng Li; | code |
| 516 | SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce SAeUron, a novel method leveraging features learned by sparse autoencoders (SAEs) to remove unwanted concepts in text-to-image diffusion models. |
Bartosz Cywiński; Kamil Deja; | code |
| 517 | Structure-informed Risk Minimization for Robust Ensemble Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by Distributionally Robust Optimization (DRO), we propose Structure-informed Risk Minimization (SRM), a principled framework that learns robust ensemble weights without access to test data. |
Fengchun Qiao; Yanlin Chen; Xi Peng; | code |
| 518 | GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing generative ZSL methods merely generate (imagine) the visual features from scratch guided by the strong class semantic vectors annotated by experts, resulting in suboptimal generative performance and limited scene generalization. To address these and advance ZSL, we propose an inductive variational autoencoder for generative zero-shot learning, dubbed GenZSL. |
Shiming Chen; Dingjie Fu; Salman Khan; Fahad Shahbaz Khan; | code |
| 519 | WGFormer: An SE(3)-Transformer Driven By Wasserstein Gradient Flows for Molecular Ground-State Conformation Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel and effective method to bridge the energy-based simulation and the learning-based strategy, which designs and learns a Wasserstein gradient flow-driven SE(3)-Transformer, called WGFormer, for ground-state conformation prediction. |
Fanmeng Wang; Minjie Cheng; Hongteng Xu; | code |
| 520 | The Emperor’s New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design a systematic and controlled pipeline along with two novel metrics—*fidelity* and *contamination resistance*—to provide a fine-grained and comprehensive assessment of existing BDC mitigation strategies. |
Yifan Sun; Han Wang; Dongbai Li; Gang Wang; Huan Zhang; | code |
| 521 | ArrayDPS: Unsupervised Blind Speech Separation with A Diffusion Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ArrayDPS to solve the BSS problem in an unsupervised, array-agnostic, and generative manner. |
Zhongweiyang Xu; Xulin Fan; Zhong-Qiu Wang; Xilin Jiang; Romit Roy Choudhury; | code |
| 522 | BoA: Attention-aware Post-training Quantization Without Backpropagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel backpropagation-free PTQ algorithm that optimizes quantized weights by considering inter-layer dependencies. |
Junhan Kim; Ho-young Kim; Eulrang Cho; Chungman Lee; Joonyoung Kim; Yongkweon Jeon; | code |
| 523 | ML$^2$-GCL: Manifold Learning Inspired Lightweight Graph Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing works follow the basic principle of pulling positive pairs closer and pushing negative pairs far away, they still suffer from several critical problems, such as the underlying semantic disturbance introduced by augmentation strategies, the failure of GCNs to capture long-range dependence, and the rigidity and inefficiency of node sampling techniques. To address these issues, we propose Manifold Learning Inspired Lightweight Graph Contrastive Learning (ML$^2$-GCL), which inherits the merits of both manifold learning and GCN. |
Jianqing Liang; Zhiqiang Li; Xinkai Wei; Yuan Liu; Zhiqiang Wang; | code |
| 524 | Spherical-Nested Diffusion Model for Panoramic Image Outpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given that the majority of generative outpainting solutions operate on planar images, existing methods for panoramic images address the spherical nature through soft regularisation during end-to-end learning, which still fails to fully exploit the spherical content. In this paper, we make the first attempt to impose the spherical nature in the design of the diffusion model, so that the panoramic format is intrinsically ensured during the learning procedure; we name this the spherical-nested diffusion (SpND) model. |
Xiancheng Sun; Senmao Ma; Shengxi Li; Mai Xu; Jingyuan Xia; Lai Jiang; Xin Deng; Jiali Wang; | code |
| 525 | TINED: GNNs-to-MLPs By Teacher Injection and Dirichlet Energy Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present TINED, a novel approach that distills GNNs to MLPs on a layer-by-layer basis using Teacher Injection and Dirichlet Energy Distillation techniques. |
Ziang Zhou; Zhihao Ding; Jieming Shi; Li Qing; Shiqi Shen; | code |
| 526 | Model Steering: Learning with A Reference Model Improves Generalization Bounds and Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a theory-driven framework for model steering called **DRRho risk minimization**, which is rooted in Distributionally Robust Optimization (DRO). |
Xiyuan Wei; Ming Lin; Fanjiang Ye; Fengguang Song; Liangliang Cao; My T. Thai; Tianbao Yang; | code |
| 527 | Discovering Global False Negatives On The Fly for Self-supervised Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this approach can result in the creation of negative pairs with similar semantics, referred to as false negatives, leading to their embeddings being falsely pushed apart. To address this issue, we introduce *GloFND*, an optimization-based approach that automatically learns on the fly the threshold for each anchor data to *identify* its false negatives during training. |
Vicente Balmaseda; Bokun Wang; Ching-Long Lin; Tianbao Yang; | code |
| 528 | R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents’ roles, observed trajectories, and expected future behaviors. |
Harsh Goel; Mohammad Omama; Behdad Chalaki; Vaishnav Tadiparthi; Ehsan Moradi Pari; Sandeep P. Chinchali; | code |
| 529 | Hessian Geometry of Latent Space in Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel method for analyzing the latent space geometry of generative models, including statistical physics models and diffusion models, by reconstructing the Fisher information metric. |
Alexander Lobashev; Dmitry Guskov; Maria Larchenko; Mikhail Tamm; | code |
| 530 | InfoSAM: Fine-Tuning The Segment Anything Model from An Information-Theoretic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing PEFT methods for SAM neglect the domain-invariant relations encoded in the pre-trained model. To bridge this gap, we propose InfoSAM, an information-theoretic approach that enhances SAM fine-tuning by distilling and preserving its pre-trained segmentation knowledge. |
yuanhong zhang; Muyao Yuan; Weizhan Zhang; Tieliang Gong; Wen Wen; Jiangyong Ying; Weijie Shi; | code |
| 531 | GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce GPTAQ, a novel finetuning-free quantization method for compressing large-scale transformer architectures. |
Yuhang Li; Ruokai Yin; Donghyun Lee; Shiting Xiao; Priyadarshini Panda; | code |
| 532 | Discriminative Finetuning of Generative Large Language Models Without Reward Models and Human Preference Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the limitations of SFT by exploring one of the most successful techniques in conventional supervised learning: discriminative learning. |
Siqi Guo; Ilgee Hong; Vicente Balmaseda; Changlong Yu; Liang Qiu; Xin Liu; Haoming Jiang; Tuo Zhao; Tianbao Yang; | code |
| 533 | Devil Is in The Details: Density Guidance for Detail-Aware Generation with Flow Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze an existing technique, Prior Guidance, which scales the latent code to influence image detail. |
Rafal Karczewski; Markus Heinonen; Vikas K Garg; | code |
| 534 | Circumventing Backdoor Space Via Weight Symmetry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Notably, recent studies have shown successful backdoor attacks across various learning paradigms, highlighting a critical security concern. To address this gap, we propose Two-stage Symmetry Connectivity (TSC), a novel backdoor purification defense that operates independently of data format and requires only a small fraction of clean samples. |
Jie Peng; Hongwei Yang; Jing Zhao; Hengji Dong; Hui He; Weizhe Zhang; Haoyu He; | code |
| 535 | Model Immunization from A Condition Number Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a framework, based on the condition number of a Hessian matrix, to analyze model immunization for linear models. |
Amber Yijia Zheng; Site Bai; Brian Bullins; Raymond A. Yeh; | code |
| 536 | Are High-Quality AI-Generated Images More Difficult for Models to Detect? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, our systematic study on cutting-edge text-to-image generators reveals a counterintuitive finding: AIGIs with higher quality scores, as assessed by human preference models, tend to be more easily detected by existing models. To investigate this, we examine how the text prompts for generation and image characteristics influence both quality scores and detector accuracy. |
Yao Xiao; Binbin Yang; Weiyan Chen; Jiahao Chen; Zijie Cao; ZiYi Dong; Xiangyang Ji; Liang Lin; Wei Ke; Pengxu Wei; | code |
| 537 | Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we propose **Similar**, a **s**tep-w**i**se **m**ult**i**-dimensiona**l** gener**a**list **r**eward model, which offers fine-grained signals for agent training and can choose better actions for inference-time scaling. Furthermore, we introduce the first benchmark in the virtual agent domain for step-wise, multi-dimensional reward model training and evaluation, named ***SRM***. |
Bingchen Miao; Yang Wu; Minghe Gao; Qifan Yu; Wendong Bu; Wenqiao Zhang; liyunfei; Siliang Tang; Tat-Seng Chua; Juncheng Li; | code |
| 538 | Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose CoTo, a progressive training strategy that gradually increases adapters’ activation probability over the course of fine-tuning. |
Zhan Zhuang; Xiequn Wang; Wei Li; Yulong Zhang; Qiushi Huang; Shuhao Chen; Xuehao Wang; Yanbin Wei; Yuhe Nie; Kede Ma; Yu Zhang; Ying Wei; | code |
| 539 | CellFlux: Simulating Cellular Morphology Changes Via Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CellFlux, an image-generative model that simulates cellular morphology changes induced by chemical and genetic perturbations using flow matching. |
Yuhui Zhang; Yuchang Su; Chenyu Wang; Tianhong Li; Zoe Wefers; Jeffrey J Nirschl; James Burgess; Daisy Ding; Alejandro Lozano; Emma Lundberg; Serena Yeung-Levy; | code |
| 540 | Optimizing Adaptive Attacks Against Watermarks for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate watermark robustness as an objective function and use preference-based optimization to tune *adaptive* attacks against the specific watermarking method. |
Abdulrahman Diaa; Toluwani Aremu; Nils Lukas; | code |
| 541 | Unnatural Languages Are Not Bugs But Features for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Language Models (LLMs) have been observed to process non-human-readable text sequences, such as jailbreak prompts, often viewed as a bug for aligned LLMs. In this work, we present a systematic investigation challenging this perception, demonstrating that unnatural languages – strings that appear incomprehensible to humans but maintain semantic meanings for LLMs – contain latent features usable by models. |
Keyu Duan; Yiran Zhao; Zhili Feng; Jinjie Ni; Tianyu Pang; Qian Liu; Tianle Cai; Longxu Dou; Kenji Kawaguchi; Anirudh Goyal; J Zico Kolter; Michael Qizhe Shieh; | code |
| 542 | A Square Peg in A Square Hole: Meta-Expert for Long-Tailed Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that different experts are good at predicting different intervals of samples; e.g., the long-tailed expert is skilled at samples located in the head interval, while the uniform expert excels at samples located in the medium interval. Therefore, we propose a dynamic expert assignment module that estimates the class membership (i.e., head, medium, or tail class) of samples and dynamically assigns a suitable expert to each sample based on the estimated membership, producing high-quality pseudo-labels in the training phase and predictions in the testing phase. |
Yaxin Hou; Yuheng Jia; | code |
| 543 | Scalable Non-Equivariant 3D Molecule Generation Via Rotational Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, specialized equivariant architectures limit the scalability and efficiency of diffusion models. In this paper, we propose an approach that relaxes such equivariance constraints. |
Yuhui Ding; Thomas Hofmann; | code |
| 544 | Generalized Interpolating Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging a novel diffusion ELBO, we achieve compute-matched state-of-the-art performance in diffusion language modeling. Exploiting GIDD’s flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality and unlocking the ability for the model to correct its own mistakes, an area where autoregressive models have notoriously struggled. |
Dimitri von Rütte; Janis Fluri; Yuhui Ding; Antonio Orvieto; Bernhard Schölkopf; Thomas Hofmann; | code |
| 545 | Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ensuring fairness in medical image segmentation is critical due to biases in imbalanced clinical data acquisition caused by demographic attributes (e.g., age, sex, race) and clinical factors (e.g., disease severity). To address these challenges, we introduce Distribution-aware Mixture of Experts (dMoE), inspired by optimal control theory. |
Yujin Oh; Pengfei Jin; Sangjoon Park; Sekeun Kim; Siyeop Yoon; Jin Sung Kim; Kyungsang Kim; Xiang Li; Quanzheng Li; | code |
| 546 | TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce `TypyBench`, a benchmark designed to evaluate LLMs’ type inference across entire Python repositories. |
Honghua Dong; Jiacheng Yang; Xun Deng; Yuhe Jiang; Gennady Pekhimenko; Fan Long; Xujie Si; | code |
| 547 | EvoMesh: Adaptive Physical Simulation with Hierarchical Graph Evolutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose EvoMesh, a fully differentiable framework that jointly learns graph hierarchies and physical dynamics, adaptively guided by physical inputs. |
Huayu Deng; Xiangming Zhu; Yunbo Wang; Xiaokang Yang; | code |
| 548 | Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods struggle to adaptively optimize the distribution of Gaussian primitives based on scene characteristics, making it challenging to balance reconstruction quality and efficiency. Inspired by human perception, we propose scene-adaptive perceptual densification for Gaussian Splatting (Perceptual-GS), a novel framework that integrates perceptual sensitivity into the 3DGS training process to address this challenge. |
Hongbi Zhou; Zhangkai Ni; | code |
| 549 | LADA: Scalable Label-Specific CLIP Adapter for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This requires selecting the expected parameters for input images during inference, which is prone to error that degrades performance. To address this problem, we introduce LADA (**L**abel-specific **ADA**pter). |
Mao-Lin Luo; Zi-Hao Zhou; Tong Wei; Min-Ling Zhang; | code |
| 550 | Efficiently Serving Large Multimodal Models Using EPD Disaggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Encode-Prefill-Decode (EPD) Disaggregation, a novel framework that separates the encoding, prefill, and decode stages onto dedicated resources. |
Gursimran Singh; Xinglu Wang; Yifan Hu; Timothy Tin Long Yu; Linzi Xing; Wei Jiang; Zhefeng Wang; Xiaolong Bai; Yi Li; Ying Xiong; Yong Zhang; Zhenan Fan; | code |
| 551 | SKOLR: Structured Koopman Operator Linear RNN for Time-Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we establish a connection between Koopman operator approximation and linear Recurrent Neural Networks (RNNs), which have recently demonstrated remarkable success in sequence modeling. |
Yitian Zhang; Liheng Ma; Antonios Valkanas; Boris N. Oreshkin; Mark Coates; | code |
| 552 | Learning Efficient Robotic Garment Manipulation with Standardization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents APS-Net, a novel approach to garment manipulation that combines unfolding and standardization in a unified framework. |
Changshi Zhou; Feng Luan; Jiarui Hu; Shaoqiang Meng; Zhipeng Wang; Yanchao Dong; Yanmin Zhou; Bin He; | code |
| 553 | Diffusion Sampling Correction Via Approximately 10 Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **P**CA-based **A**daptive **S**earch (PAS), which optimizes existing solvers for DPMs with minimal additional costs. |
Guangyi Wang; Wei Peng; Lijiang Li; Wenyu Chen; Yuren Cai; Song-Zhi Su; | code |
| 554 | Long-Form Speech Generation with Spoken Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the generative modeling of speech over multiple minutes, a requirement for long-form multimedia generation and audio-native voice assistants. |
Se Jin Park; Julian Salazar; Aren Jansen; Keisuke Kinoshita; Yong Man Ro; RJ Skerry-Ryan; | code |
| 555 | Automated Hypothesis Validation with Agentic Sequential Falsifications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose POPPER, an agentic framework for rigorous automated validation of free-form hypotheses. |
Kexin Huang; Ying Jin; Ryan Li; Michael Y. Li; Emmanuel Candes; Jure Leskovec; | code |
| 556 | Can Large Language Models Understand Intermediate Representations in Compilers? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs—GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama—in understanding IRs. |
Hailong Jiang; Jianfeng Zhu; Yao Wan; Bo Fang; Hongyu Zhang; Ruoming Jin; Qiang Guan; | code |
| 557 | IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce IMPACT, a text-to-audio generation framework that achieves high performance in audio quality and fidelity while ensuring fast inference. |
Kuan-Po Huang; Shu-wen Yang; HUY PHAN; Bo-Ru Lu; Byeonggeun Kim; Sashank Macha; Qingming Tang; Shalini Ghosh; Hung-yi Lee; Chieh-Chi Kao; Chao Wang; | code |
| 558 | QuRe: Query-Relevant Retrieval Through Hard Negative Sampling in Composed Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This may result in retrieving irrelevant images, reducing user satisfaction even when the target image is retrieved. To address this issue, we propose Query-Relevant Retrieval through Hard Negative Sampling (QuRe), which optimizes a reward model objective to reduce false negatives. |
Jaehyun Kwak; Ramahdani Muhammad Izaaz Inhar; Se-Young Yun; Sung-Ju Lee; | code |
| 559 | KoopSTD: Reliable Similarity Analysis Between Dynamical Systems Via Approximating Koopman Spectrum with Timescale Decoupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **KoopSTD**, a dynamical similarity measurement framework that precisely characterizes the underlying dynamics by approximating the Koopman spectrum with explicit timescale decoupling and spectral residual control. |
Shimin Zhang; Ziyuan Ye; Yinsong Yan; Zeyang Song; Yujie Wu; Jibin Wu; | code |
| 560 | Revisiting Continuity of Image Tokens for Cross-domain Few-shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This questions the role of image tokens’ continuity in ViT’s generalization under large domain gaps. In this paper, we delve into this phenomenon and offer an interpretation. |
Shuai Yi; Yixiong Zou; Yuhua Li; Ruixuan Li; | code |
| 561 | Random Registers for Cross-Domain Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although Vision Transformer (ViT) has shown superior capability in many vision tasks, its transferability against huge domain gaps in CDFSL is still under-explored. In this paper, we find an intriguing phenomenon: during the source-domain training, prompt tuning, as a common way to train ViT, could be harmful for the generalization of ViT in target domains, but setting them to random noises (i.e., random registers) could consistently improve target-domain performance. |
Shuai Yi; Yixiong Zou; Yuhua Li; Ruixuan Li; | code |
| 562 | Diffusion Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce *Lavender*, a simple supervised fine-tuning (SFT) method that boosts the performance of advanced vision-language models (VLMs) by leveraging state-of-the-art image generation models such as Stable Diffusion. |
Chen Jin; Ryutaro Tanno; Amrutha Saseendran; Tom Diethe; Philip Alexander Teare; | code |
| 563 | Towards Lifelong Model Editing Via Simulating Ideal Editor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, this paper proposes a general framework, ***Sim**ulating **I**deal **E**ditor* (SimIE), which restores the strong performance of parameter-modifying methods from standard model editing in a lifelong context. |
Yaming Guo; Siyang Guo; Hengshu Zhu; Ying Sun; | code |
| 564 | Modified K-means Algorithm with Local Optimality Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first present conditions under which the K-means algorithm converges to a locally optimal solution. Based on this, we propose simple modifications to the K-means algorithm which ensure local optimality in both the continuous and discrete sense, with the same computational complexity as the original K-means algorithm. |
Mingyi Li; Michael R. Metel; Akiko Takeda; | code |
| 565 | A Variational Framework for Improving Naturalness in Generative Spoken Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, pitch alone cannot fully represent the range of paralinguistic attributes, and selecting the right features requires careful hand-engineering. To overcome this, we propose an end-to-end variational approach that automatically learns to encode these continuous speech attributes to enhance the semantic tokens. |
Li-Wei Chen; Takuya Higuchi; Zakaria Aldeneh; Ahmed Hussen Abdelaziz; Alexander Rudnicky; | code |
| 566 | Efficient and Separate Authentication Image Steganography Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is elegant to design an authentication mechanism for isolated reception. We explore such a mechanism through extensive experiments, and uncover that additional authentication information will affect the distribution of hidden information and occupy more hiding space of the cover image. |
Junchao Zhou; Yao Lu; Jie Wen; Guangming Lu; | code |
| 567 | Learning Monotonic Probabilities with A Generative Cost Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This perspective enables us to reformulate the monotonicity challenge into modeling the latent cost variable. To tackle this, we introduce a generative network for the latent cost variable, termed the Generative Cost Model (**GCM**), which inherently addresses the strict monotonic problem, and propose the Implicit Generative Cost Model (**IGCM**) to address the implicit monotonic problem. |
Yongxiang Tang; Yanhua Cheng; Xiaocheng Liu; Jiaochenchen; Yanxiang Zeng; Ning Luo; Pengjia Yuan; Xialong Liu; Peng Jiang; | code |
| 568 | Active Reward Modeling: Adaptive Preference Labeling for Large Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we highlight the insight that an ideal comparison dataset for reward modeling should balance *exploration of the representation space* and make *informative comparisons* between pairs with moderate reward differences. |
Yunyi Shen; Hao Sun; Jean-Francois Ton; | code |
| 569 | RePaViT: Scalable Vision Transformer Acceleration Via Structural Reparameterization on Feedforward Network Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel channel idle mechanism that facilitates post-training structural reparameterization for efficient FFN layers during testing. |
Xuwei Xu; Yang Li; Yudong Chen; Jiajun Liu; Sen Wang; | code |
| 570 | Optimal Information Retention for Time-Series Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a practical framework, we introduce an explanation framework ORTE, learning a binary mask to eliminate redundant information while mining temporal patterns of explanations. |
Jinghang Yue; Jing Wang; Lu Zhang; Shuo Zhang; Da Li; Zhaoyang Ma; Youfang Lin; | code |
| 571 | BalancEdit: Dynamically Balancing The Generality-Locality Trade-off in Multi-modal Model Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, direct knowledge editing within the models presents a more viable solution. We develop a new model editing dataset named OKEDIT, specifically designed to effectively evaluate this trade-off. |
Dongliang Guo; Mengxuan Hu; Zihan Guan; Thomas Hartvigsen; Sheng Li; | code |
| 572 | Redundancy Undermines The Trustworthiness of Self-Interpretable GNNs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a systematic investigation into the trustworthiness of explanations generated by self-interpretable graph neural networks (GNNs), revealing why models trained with different random seeds yield inconsistent explanations. |
Wenxin Tai; Ting Zhong; Goce Trajcevski; Fan Zhou; | code |
| 573 | MTL-UE: Learning to Learn Nothing for Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents MTL-UE, the first unified framework for generating unlearnable examples for multi-task data and MTL models. |
Yi Yu; Song Xia; Siyuan Yang; Chenqi Kong; Wenhan Yang; Shijian Lu; Yap-Peng Tan; Alex Kot; | code |
| 574 | CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, frozen encoders often produce misaligned features, leading to confusion between classes and limiting specialization. To overcome this issue, we propose a confusion-aware loss (CoA-loss) that improves specialization by refining the decision boundaries between confusing classes. |
Dasol Hong; Wooju Lee; Hyun Myung; | code |
| 575 | ITBench: Evaluating AI Agents Across Diverse Real-World IT Automation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. |
Saurabh Jha; Rohan R. Arora; Yuji Watanabe; Takumi Yanagawa; Yinfang Chen; Jackson Clark; Bhavya Bhavya; Mudit Verma; Harshit Kumar; Hirokuni Kitahara; Noah Zheutlin; Saki Takano; Divya Pathak; Felix George; Xinbo Wu; Bekir O Turkkan; Gerard Vanloo; Michael Nidd; Ting Dai; Oishik Chatterjee; Pranjal Gupta; Suranjana Samanta; Pooja Aggarwal; Rong Lee; Jae-wook Ahn; Debanjana Kar; Amit Paradkar; Yu Deng; Pratibha Moogi; Prateeti Mohapatra; Naoki Abe; Chandrasekhar Narayanaswami; Tianyin Xu; Lav R. Varshney; Ruchi Mahindru; Anca Sailer; Laura Shwartz; Daby Sow; Nicholas C. M. Fuller; Ruchir Puri; | code |
| 576 | GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Some recent GT models help alleviate this issue, but their flexibility and expressiveness are still limited since the filters they learn are fixed on predefined graph spectrum or spectral order. To tackle this challenge, we propose a Graph Fourier Kolmogorov-Arnold Transformer (GrokFormer), a novel GT model that learns highly expressive spectral filters with adaptive graph spectrum and spectral order through a Fourier series modeling over learnable activation functions. |
Guoguo Ai; Guansong Pang; Hezhe Qiao; Yuan Gao; Hui Yan; | code |
| 577 | Online Conformal Prediction Via Online Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a family of algorithms for online conformal prediction with coverage guarantees for both adversarial and stochastic data. |
Felipe Areces; Christopher Mohri; Tatsunori Hashimoto; John Duchi; | code |
| 578 | Propagate and Inject: Revisiting Propagation-Based Feature Imputation for Graphs with Partially Observed Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address learning tasks on graphs with missing features, enhancing the applicability of graph neural networks to real-world graph-structured data. |
Daeho Um; Sunoh Kim; Jiwoong Park; Jongin Lim; Seong Jin Ahn; Seulki Park; | code |
| 579 | PEAKS: Selecting Key Training Examples Incrementally Via Prediction Error Anchored By Kernel Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that in IDS, the impact of a new sample on the model state depends fundamentally on both its geometric relationship in the feature space and its prediction error. Leveraging this insight, we propose PEAKS (Prediction Error Anchored by Kernel Similarity), an efficient data selection method tailored for IDS. |
Mustafa Burak Gurbuz; Xingyu Zheng; Constantine Dovrolis; | code |
| 580 | Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a flexible and efficient data mixing framework, Chameleon, that employs leverage scores to quantify domain importance within a learned embedding space. |
Wanyun Xie; Francesco Tonin; Volkan Cevher; | code |
| 581 | Mixed-curvature Decision Trees and Random Forests Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our novel angular reformulation respects manifold geometry while preserving the algorithmic properties that make decision trees effective. In the special cases of single-component manifolds, our method simplifies to its Euclidean or hyperbolic counterparts, or introduces hyperspherical DT algorithms, depending on the curvature. |
Philippe Chlenski; Quentin Chu; Raiyan R. Khan; Kaizhu Du; Antonio Khalil Moretti; Itsik Pe’er; | code |
| 582 | Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **Pi**voting **Fa**ctorization (**PIFA**), a novel **lossless** meta low-rank representation that unsupervisedly learns a **compact** form of any low-rank representation, effectively eliminating redundant information. |
Jialin Zhao; Yingtao Zhang; Carlo Vittorio Cannistraci; | code |
| 583 | Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces MUFFIN, a fully convolutional NAC framework that leverages psychoacoustically guided multi-band frequency reconstruction. |
Dianwen Ng; Kun Zhou; Yi-Wen Chao; Zhiwei Xiong; Bin Ma; Eng Siong Chng; | code |
| 584 | Playmate: Flexible Control of Portrait Animation Via 3D-Implicit Space Guided Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the first stage, we introduce a decoupled implicit 3D representation along with a meticulously designed motion-decoupled module to facilitate more accurate attribute disentanglement and generate expressive talking videos directly from audio cues. |
Xingpei Ma; Jiaran Cai; Yuansheng Guan; Shenneng Huang; Qiang Zhang; Shunsi Zhang; | code |
| 585 | On Exact Bit-level Reversible Transformers Without Changing Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we present the BDIA-transformer, which is an exact bit-level reversible transformer that uses an unchanged standard architecture for inference. |
Guoqiang Zhang; JP Lewis; W. Bastiaan Kleijn; | code |
| 586 | Open Materials Generation with Stochastic Interpolants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Open Materials Generation (OMatG), a unifying framework for the generative design and discovery of inorganic crystalline materials. |
Philipp Höllmer; Thomas Egg; Maya Martirossyan; Eric Fuemmeler; Zeren Shui; Amit Gupta; Pawan Prakash; Adrian Roitberg; Mingjie Liu; George Karypis; Mark Transtrum; Richard Hennig; Ellad B. Tadmor; Stefano Martiniani; | code |
| 587 | Tackling Dimensional Collapse Toward Comprehensive Universal Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify that the failure of PDM for extreme UniDA stems from dimensional collapse (DC) in target representations. |
Hung-Chieh Fang; Po-Yi Lu; Hsuan-Tien Lin; | code |
| 588 | Continual Generalized Category Discovery: Learning and Forgetting from A Bayesian Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on this insight, we propose Variational Bayes C-GCD (VB-CGCD), a novel framework that integrates variational inference with covariance-aware nearest-class-mean classification. We also introduce a new challenging benchmark with only 10% labeled data and extended online phases; VB-CGCD achieves a 67.86% final accuracy, significantly higher than state-of-the-art (38.55%), demonstrating its robust applicability across diverse scenarios. |
Hao Dai; Jagmohan Chauhan; | code |
| 589 | Navigating Conflicting Views: Harnessing Trust for Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While prior work focuses on learning consistent and informative representations across views, it often assumes perfect alignment and equal importance of all views, an assumption rarely met in real-world scenarios, as some views may express distinct information. To address this, we develop a computational trust-based discounting method that enhances the Evidential Multi-view framework by accounting for the instance-wise reliability of each view through a probability-sensitive trust mechanism. |
Jueqing Lu; Wray Buntine; YUANYUAN QI; Joanna Dipnall; Belinda Gabbe; Lan Du; | code |
| 590 | Self-supervised Adversarial Purification for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Defending Graph Neural Networks (GNNs) against adversarial attacks requires balancing accuracy and robustness, a trade-off often mishandled by traditional methods like adversarial training that intertwine these conflicting objectives within a single classifier. To overcome this limitation, we propose a self-supervised adversarial purification framework. |
Woohyun Lee; Hogun Park; | code |
| 591 | HyperIV: Real-time Implied Volatility Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose HyperIV, a novel approach for real-time implied volatility smoothing that eliminates the need for traditional calibration procedures. |
Yongxin Yang; Wenqi Chen; Chao Shu; Timothy Hospedales; | code |
| 592 | Pareto-Optimal Fronts for Benchmarking Symbolic Regression Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore absolute Pareto-optimal (APO) solutions instead, which have the optimal tradeoff between the multiple SR objectives, for 34 datasets in the widely-used SR benchmark, SRBench, by performing exhaustive search. |
Kei Sen Fong; Mehul Motani; | code |
| 593 | Fast Inference with Kronecker-Sparse Matrices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing GPU kernels for KS matrix multiplication suffer from high data movement costs, with up to 50% of time spent on memory-bound tensor permutations. We propose a fused, output-stationary GPU kernel that eliminates these overheads, reducing global memory traffic threefold. |
Antoine Gonon; Léon Zheng; Pascal Carrivain; Tung Quoc Le; | code |
| 594 | Reconstructing Cell Lineage Trees from Phenotypic Features with Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce *CellTreeQM*, a novel deep learning method based on transformer architectures that learns an embedding space with geometric properties optimized for tree-graph inference. By formulating the lineage reconstruction problem as tree-metric learning, we systematically explore weakly supervised training settings at different levels of information and present the *Cell Lineage Reconstruction Benchmark* to facilitate comprehensive evaluation. |
Da Kuang; GuanWen Qiu; Junhyong Kim; | code |
| 595 | NegMerge: Sign-Consensual Weight Merging for Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method that utilizes all fine-tuned models trained with varying hyperparameters instead of a single selection. |
Hyo Seo Kim; Dongyoon Han; Junsuk Choe; | code |
| 596 | AMPO: Active Multi Preference Optimization for Self-play Preference Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Active Multi-Preference Optimization (AMPO), which combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. We release our datasets [here](https://huggingface.co/Multi-preference-Optimization). |
Taneesh Gupta; Rahul Madhavan; Xuchao Zhang; Chetan Bansal; Saravan Rajmohan; | code |
| 597 | HYGMA: Hypergraph Coordination Networks with Dynamic Grouping for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel framework that integrates dynamic spectral clustering with hypergraph neural networks to enable adaptive group formation and efficient information processing in multi-agent systems. |
Chiqiang Liu; Dazi Li; | code |
| 598 | Pixel-level Certified Explanations Via Randomized Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Post-hoc attribution methods aim to explain deep learning predictions by highlighting influential input pixels. |
Alaa Anani; Tobias Lorenz; Mario Fritz; Bernt Schiele; | code |
| 599 | ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent work has shown that caching intermediate features to be reused in subsequent inferences is an effective method to reduce latency in diffusion models. We extend this idea to real-time rendering and present ReFrame, which explores different caching policies to optimize trade-offs between quality and performance in rendering workloads. |
Lufei Liu; Tor M. Aamodt; | code |
| 600 | Linear Mode Connectivity Between Multiple Models Modulo Permutation Symmetries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct a more detailed empirical analysis. |
Akira Ito; Masanori Yamada; Atsutoshi Kumagai; | code |
| 601 | Test-Time Selective Adaptation for Uni-Modal Distribution Shift in Multi-Modal Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this research, we investigate the under-explored practical scenario of *uni-modal distribution shift*, where the distribution shift influences only one modality, leaving the others unchanged. |
MingCai Chen; Baoming Zhang; Zongbo Han; Wenyu Jiang; Yanmeng Wang; Shuai Feng; Yuntao Du; Bingkun Bao; | code |
| 602 | Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Modern neural-network-based Image Quality Assessment (IQA) metrics are vulnerable to adversarial attacks, which can be exploited to manipulate search engine rankings, benchmark results, and content quality assessments, raising concerns about the reliability of IQA metrics in critical applications. This paper presents the first comprehensive study of IQA defense mechanisms in response to adversarial attacks on these metrics to pave the way for safer use of IQA metrics. |
Aleksandr Gushchin; Khaled Abud; Georgii Bychkov; Ekaterina Shumitskaya; Anna Chistyakova; Sergey Lavrushkin; Bader Rasheed; Kirill Malyshev; Dmitriy S. Vatolin; Anastasia Antsiferova; | code |
| 603 | How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motion synthesis for diverse object categories holds great potential for 3D content creation but remains underexplored due to two key challenges: (1) the lack of comprehensive motion datasets that include a wide range of high-quality motions and annotations, and (2) the absence of methods capable of handling heterogeneous skeletal templates from diverse objects. To address these challenges, we contribute the following: First, we augment the Truebones Zoo dataset—a high-quality animal motion dataset covering over 70 species—by annotating it with detailed text descriptions, making it suitable for text-based motion synthesis. |
Wonkwang Lee; Jongwon Jeong; Taehong Moon; Hyeon-Jong Kim; Jaehyeon Kim; Gunhee Kim; Byeong-Uk Lee; | code |
| 604 | Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to substantial variations in data quality, the fixed regularization strength often leads to a dilemma: Weak regularization strength fails to address extrapolation errors and value overestimation, while strong regularization strength shifts policy learning toward behavior cloning, impeding potential performance enabled by Bellman updates. To address this issue, we propose the selective state-adaptive regularization method for offline RL. |
Qin-Wen Luo; Ming-Kun Xie; Ye-Wen Wang; Sheng-Jun Huang; | code |
| 605 | Improving Memory Efficiency for Training KANs Via Meta Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The proposed method provides an alternative technique for training KANs that allows for greater scalability and extensibility, and narrows the training cost gap with MLPs reported in the original KAN paper. |
Zhangchi Zhao; Jun Shu; Deyu Meng; Zongben Xu; | code |
| 606 | Beyond Entropy: Region Confidence Proxy for Wild Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel region-integrated method **ReCAP** that bypasses the lengthy process. |
Zixuan Hu; Yichun Hu; Xiaotong Li; SHIXIANG TANG; LINGYU DUAN; | code |
| 607 | MOGIC: Metadata-infused Oracle Guidance for Improved Extreme Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance accuracy while maintaining low latency, we propose MOGIC, a novel approach to metadata-infused oracle guidance for XC. |
Suchith Chidananda Prabhu; Bhavyajeet Singh; Anshul Mittal; Siddarth Asokan; Shikhar Mohan; Deepak Saini; Yashoteja Prabhu; Lakshya Kumar; Jian Jiao; Amit S; Niket Tandon; Manish Gupta; Sumeet Agarwal; Manik Varma; | code |
| 608 | A Recipe for Causal Graph Regression: Confounding Effects Revisited Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we reflect on the predictive power of confounders in graph-level regression, and generalize classification-specific causal intervention techniques to regression through a lens of contrastive learning. |
Yujia Yin; Tianyi Qu; Zihao Wang; Yifan Chen; | code |
| 609 | TabFSBench: Tabular Benchmark for Feature Shifts in Open Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper conducts the first comprehensive study on feature shifts in tabular data and introduces the first **tab**ular **f**eature-**s**hift **bench**mark (TabFSBench). |
Zi-Jian Cheng; Ziyi Jia; Zhi Zhou; Yu-Feng Li; Lan-Zhe Guo; | code |
| 610 | Text-to-LoRA: Instant Transformer Adaption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fine-tuning techniques enable practitioners to adapt foundation models for many new applications but require expensive and lengthy training while being notably sensitive to hyperparameter choices. To overcome these limitations, we introduce Text-to-LoRA (T2L), a model capable of adapting large language models (LLMs) on the fly solely based on a natural language description of the target task. |
Rujikorn Charakorn; Edoardo Cetin; Yujin Tang; Robert Tjarko Lange; | code |
| 611 | SE(3)-Equivariant Diffusion Policy in Spherical Fourier Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diffusion Policies are effective at learning closed-loop manipulation policies from human demonstrations but generalize poorly to novel arrangements of objects in 3D space, hurting real-world performance. To address this issue, we propose Spherical Diffusion Policy (SDP), an SE(3) equivariant diffusion policy that adapts trajectories according to 3D transformations of the scene. |
Xupeng Zhu; Fan Wang; Robin Walters; Jane Shi; | code |
| 612 | Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel explicit modeling paradigm that incorporates factors influencing solid deformation through structured modules. |
Shilong Tao; Zhe Feng; Haonan Sun; Zhanxing Zhu; Yunhuai Liu; | code |
| 613 | TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current mode-seeking methods identify modes by breaking some dependency connections, but they rely heavily on local data characteristics and require case-by-case threshold settings or human intervention to be effective on different datasets. To address this issue, we introduce a novel concept called typicality, which explores locally defined dependencies from a global perspective to quantify how confidently a point can be regarded as a mode. |
Haowen Ma; Zhiguo Long; Hua Meng; | code |
| 614 | Targeted Unlearning with Single Layer Unlearning Gradient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Single Layer Unlearning Gradient (SLUG) as an efficient method to unlearn targeted information by updating a single critical layer using a one-time gradient computation. |
Zikui Cai; Yaoteng Tan; M. Salman Asif; | code |
| 615 | Complete-Tree Space Favors Data-Efficient Link Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenge of limited link samples, we propose leveraging hierarchical modularity as a prior structure. We introduce complete-tree (CT) space, a discrete metric space with latent complete-tree structures, to formalize hierarchical modularity with an emphasis on its hierarchical permutation symmetry. |
Chi Gao; Lukai Li; Yancheng Zhou; Shangqi Guo; | code |
| 616 | All-atom Inverse Protein Folding Through Discrete Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Many inverse folding methods struggle to predict sequences for complexes that contain non-protein components, and perform poorly with complexes that adopt multiple structural states. To address these challenges, we present ADFLIP (All-atom Discrete FLow matching Inverse Protein folding), a generative model based on discrete flow-matching for designing protein sequences conditioned on all-atom structural contexts. |
Kai Yi; Kiarash Jamali; Sjors HW Scheres; | code |
| 617 | Analytical Lyapunov Function Discovery: An RL-based Generative Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we propose an end-to-end framework using transformers to construct analytical Lyapunov functions (local), which simplifies formal verification, enhances interpretability, and provides valuable insights for control engineers. |
Haohan Zou; Jie Feng; Hao Zhao; Yuanyuan Shi; | code |
| 618 | Unveiling AI’s Blind Spots: An Oracle for In-Domain, Out-of-Domain, and Adversarial Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, transformer-based mentor models excel at predicting errors across various mentee architectures. Subsequently, we draw insights from these observations and develop an oracle mentor model, dubbed SuperMentor, that can outperform baseline mentors in predicting errors across different error types from the ImageNet-1K dataset. |
Shuangpeng Han; Mengmi Zhang; | code |
| 619 | Expressive Score-Based Priors for Distribution Matching with Geometry-Preserving Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While likelihood-based methods are a promising alternative, they often impose unnecessary biases through fixed priors or require explicit density models (e.g., flows) that can be challenging to train. We address this limitation by introducing a novel approach to training likelihood-based DM using expressive score-based prior distributions. |
Ziyu Gong; Jim Lim; David I. Inouye; | code |
| 620 | An End-to-End Model for Logits-Based Large Language Models Watermarking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel end-to-end logits perturbation method for watermarking LLM-generated text. |
Ka Him Wong; Jicheng Zhou; Jiantao Zhou; Yain-Whar Si; | code |
| 621 | LipsNet++: Unifying Filter and Controller Into A Policy Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LipsNet++, a novel policy network with Fourier filter layer and Lipschitz controller layer to separately address both causes. |
Xujie Song; Liangfa Chen; Tong Liu; Wenxuan Wang; Yinuo Wang; Shentao Qin; Yinsong Ma; Jingliang Duan; Shengbo Eben Li; | code |
| 622 | Enhancing Certified Robustness Via Block Reflector Orthogonal Layers and Logit Annealing Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel, efficient Block Reflector Orthogonal (BRO) layer that enhances the capability of orthogonal layers on constructing more expressive Lipschitz neural architectures. |
Bo-Han Lai; Pin-Han Huang; Bo-Han Kung; Shang-Tse Chen; | code |
| 623 | Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. |
Seungho Baek; Taegeon Park; Jongchan Park; Seungjun Oh; Yusung Kim; | code |
| 624 | DCBM: Data-Efficient Visual Concept Bottleneck Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Data-efficient CBMs (DCBMs), which reduce the need for large sample sizes during concept generation while preserving interpretability. |
Katharina Prasse; Patrick Knab; Sascha Marton; Christian Bartelt; Margret Keuper; | code |
| 625 | The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a careful, systematic analysis of a number of tensor product operations. |
YuQing Xie; Ameya Daigavane; Mit Kotak; Tess Smidt; | code |
| 626 | Return Capping: Sample Efficient CVaR Policy Gradient Optimisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When optimising for conditional value at risk (CVaR) using policy gradients (PG), current methods rely on discarding a large proportion of trajectories, resulting in poor sample efficiency. We propose a reformulation of the CVaR optimisation problem by capping the total return of trajectories used in training, rather than simply discarding them, and show that this is equivalent to the original problem if the cap is set appropriately. |
Harry Mead; Clarissa Costen; Bruno Lacerda; Nick Hawes; | code |
| 627 | Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize this machine learning problem and introduce alpha-covariance, a metric for evaluating robustness to such transformations. To tackle this task, we propose a dual-part token embedding strategy: a shared component ensures semantic consistency, while a randomized component maintains token distinguishability. |
İlker Işık; Ramazan Gokberk Cinbis; Ebru Aydin Gol; | code |
| 628 | CoDy: Counterfactual Explainers for Dynamic Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Counterfactual explanation methods provide a promising solution by illustrating how modifications to input graphs can influence model predictions. To address this challenge, we present CoDy—Counterfactual Explainer for Dynamic Graphs—a model-agnostic, instance-level explanation approach that identifies counterfactual subgraphs to interpret TGNN predictions. |
Zhan Qu; Daniel Gomm; Michael Färber; | code |
| 629 | A Physics-Augmented Deep Learning Framework for Classifying Single Molecule Force Spectroscopy Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we both apply state-of-the-art machine learning models and present a novel deep learning model tailored to SMFS data. |
Cailong Hua; Sivaraman Rajaganapathy; Rebecca A Slick; Joseph Vavra; Joseph M. Muretta; James M. Ervasti; Murti Salapaka; | code |
| 630 | Ad-Hoc Human-AI Coordination Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. |
Tin Dizdarević; Ravi Hammond; Tobias Gessler; Anisoara Calinescu; Jonathan Cook; Matteo Gallici; Andrei Lupu; Jakob Nicolaus Foerster; | code |
| 631 | A Closer Look at Multimodal Representation Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We further prove that cross-modal knowledge distillation implicitly disentangles such representations by freeing up rank bottlenecks in the student encoder, denoising the fusion-head outputs without negatively impacting the predictive features from either modality. Based on the above findings, we propose an algorithm that prevents modality collapse through explicit basis reallocation, with applications in dealing with missing modalities. |
Abhra Chaudhuri; Anjan Dutta; Tu Bui; Serban Georgescu; | code |
| 632 | CombiMOTS: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose CombiMOTS, a Pareto Monte Carlo Tree Search (PMCTS) framework that generates dual-target molecules. |
Thibaud Southiratn; Bonil Koo; Yijingxiu Lu; Sun Kim; | code |
| 633 | BounDr.E: Predicting Drug-likeness Via Biomedical Knowledge Alignment and EM-like One-Class Boundary Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce BounDr.E: a novel modeling of drug-likeness as a compact space surrounding approved drugs through a dynamic one-class boundary approach. |
Dongmin Bang; Inyoung Sung; Yinhua Piao; Sangseon Lee; Sun Kim; | code |
| 634 | Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose using an ensemble of diverse classifiers to adaptively capture risk associated with subpopulations. |
Nguyen Nhat Minh To; Paul F R Wilson; Viet Nguyen; Mohamed Harmanani; Michael Cooper; Fahimeh Fooladgar; Purang Abolmaesumi; Parvin Mousavi; Rahul Krishnan; | code |
| 635 | What Limits Bidirectional Model’s Generative Capabilities? A Uni-Bi-Directional Mixture-of-Expert Method For Bidirectional Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through systematic Transformer module evaluations, we discover the FFN layer is least affected by such dependence. Leveraging this discovery, we propose UBMoE-LLM, a novel Uni-Bi-directional Mixture-of-Experts LLM, which integrates the original unidirectional FFN with a bidirectionally fine-tuned FFN via unsupervised contrastive learning. |
Zuchao Li; Yonghua Hei; Qiwei Li; Lefei Zhang; Ping Wang; Hai Zhao; Baoyuan Qi; Liu Guoming; | code |
| 636 | Test-time Correlation Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we provide a theoretical analysis to investigate the feasibility of **T**est-time **C**orrelation **A**lignment (**TCA**), demonstrating that correlation alignment between high-certainty instances and test instances can enhance test performances with a theoretical guarantee. Based on this, we propose two simple yet effective algorithms: LinearTCA and LinearTCA+. |
Linjing You; Jiabao Lu; Xiayuan Huang; | code |
| 637 | Tensor Product Neural Networks for Functional ANOVA Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel neural network which guarantees a unique functional ANOVA decomposition and thus is able to estimate each component stably. |
Seokhun Park; Insung Kong; Yongchan Choi; Chanmoo Park; Yongdai Kim; | code |
| 638 | Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, a critical gap persists: ‘conceptualization’—the ability to recognize and reason about the same concept despite variations in visual form, a basic ability of human reasoning. To address this challenge, we introduce the Visual Graph Arena (VGA), a dataset featuring six graph-based tasks designed to evaluate and improve AI systems’ capacity for visual abstraction. |
Zahra Babaiee; Peyman Kiasari; Daniela Rus; Radu Grosu; | code |
| 639 | Attributes Shape The Embedding Space of Face Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we observe a multiscale geometric structure emerging in the embedding space, influenced by interpretable facial (e.g., hair color) and image attributes (e.g., contrast). We propose a geometric approach to describe the dependence or invariance of FR models to these attributes and introduce a physics-inspired alignment metric. |
Pierrick Leroy; Antonio Mastropietro; Marco Nurisso; Francesco Vaccarino; | code |
| 640 | ADIOS: Antibody Development Via Opponent Shaping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To demonstrate the value of ADIOS, we build a viral evolution simulator using the Absolut! |
Sebastian Rene Towers; Aleksandra Kalisz; Philippe A. Robert; Alicia Higueruelo; Francesca Vianello; Chloe Ming-Han Tsai; Harrison Steel; Jakob Nicolaus Foerster; | code |
| 641 | Flow-of-Options: Diversified and Improved LLM Reasoning By Thinking Through Options Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel reasoning approach called Flow-of-Options (FoO), designed to address intrinsic biases in Large Language Models (LLMs). |
Lakshmi Nair; Ian Trase; J. Mark Kim; | code |
| 642 | SCENIR: Visual Semantic Clarity Through Unsupervised Scene Graph Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recognizing the lack of semantic understanding as a key limitation, we propose a novel scene graph-based retrieval framework that emphasizes semantic content over superficial image characteristics. |
Nikolaos Chaidos; Angeliki Dimitriou; Maria Lymperaiou; Giorgos Stamou; | code |
| 643 | IN2V: Bringing Transductive Node Embeddings to Inductive Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Embedding methods like N2V are limited in their application on new nodes, which restricts them to the transductive setting where the entire graph, including the test nodes, is available during training. We propose inductive node2vec (iN2V), which combines a post-hoc procedure to compute embeddings for nodes unseen during training and modifications to the original N2V training procedure to prepare the embeddings for this post-hoc procedure. |
Nicolas Lell; Ansgar Scherp; | code |
| 644 | Aggregation Buffer: Revisiting DropEdge with A New Parameter Block Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this analysis, we propose **Aggregation Buffer**, a parameter block specifically designed to improve the robustness of GNNs by addressing the limitation of DropEdge. |
Dooho Lee; Myeong Kong; Sagad Hamid; Cheonwoo Lee; Jaemin Yoo; | code |
| 645 | Enhancing Visual Localization with Cross-Domain Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel cross-domain data generation method to enhance visual localization methods. |
Yuanze Wang; Yichao Yan; Shiming Song; Songchang Jin; Yilan Huang; Xingdong Sheng; Dianxi Shi; | code |
| 646 | Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Imitation learning (IL) has shown promise in robot locomotion but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. |
Zhenglin Wan; Xingrui Yu; David Mark Bossens; Yueming Lyu; Qing Guo; Flint Xiaofeng Fan; Yew-Soon Ong; Ivor Tsang; | code |
| 647 | NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings Via Neural Tangent Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach leveraging the NTK to train client models in the decentralized setting, while introducing a synergy between NTK-based evolution and model averaging. |
Gabriel Thompson; Kai Yue; Chau-Wai Wong; Huaiyu Dai; | code |
| 648 | Reinforcement Learning for Quantum Control Under Physical Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We devise a physics-constrained Reinforcement Learning (RL) algorithm that restricts the space of possible solutions. |
Jan Ole Ernst; Aniket Chatterjee; Tim Franzmeyer; Axel Kuhn; | code |
| 649 | Improved Expressivity of Hypergraph Neural Networks Through High-Dimensional Generalized Weisfeiler-Leman Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current algorithms for hypergraphs, like the 1-dimensional generalized Weisfeiler-Lehman test (1-GWL), lag behind advancements in graph isomorphism tests, limiting most hypergraph neural networks to 1-GWL’s expressive power. To address this, we propose the high-dimensional GWL (k-GWL), generalizing k-WL from graphs to hypergraphs. |
Detian Zhang; Chengqiang Zhang; Yanghui Rao; Li Qing; Chunjiang Zhu; | code |
| 650 | Reasoning Limitations of Multimodal Large Language Models. A Case Study of Bongard Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite some successes on real-world datasets, MLLMs struggle with synthetic BPs. To explore this gap, we introduce Bongard-RWR, a dataset representing synthetic BP concepts using real-world images. |
Mikołaj Małkiński; Szymon Pawlonka; Jacek Mańdziuk; | code |
| 651 | Global Curvature for Second-order Optimization of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a theory that predicts the *exact* structure of the global curvature by leveraging the intrinsic symmetries of neural networks, such as invariance under parameter permutations. |
Alberto Bernacchia; | code |
| 652 | MTSTRec: Multimodal Time-Aligned Shared Token Recommender Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing item ID-based methods and multimodal models often overlook the temporal alignment of modalities like textual descriptions, visual content, and prices in user browsing sequences. To address this limitation, this paper proposes the Multimodal Time-aligned Shared Token Recommender (MTSTRec), a transformer-based framework with a single time-aligned shared token per product for efficient cross-modality fusion. |
Ming-Yi Hong; Yen-Jung Hsu; Miao-Chen Chiang; Che Lin; | code |
| 653 | Curvature-aware Graph Attention for PDEs on Manifolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Curvature-aware Graph Attention for PDEs on manifolds by exploring the important intrinsic geometric quantities such as curvature and discrete gradient operator. |
Yunfeng Liao; Jiawen Guan; Xiucheng Li; | code |
| 654 | Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. |
Sunwoo Lee; Jaebak Hwang; Yonghyeon Jo; Seungyul Han; | code |
| 655 | STD-FD: Spatio-Temporal Distribution Fitting Deviation for AIGC Forgery Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Spatio-Temporal Distribution Fitting Deviation (STD-FD) for AIGC forgery detection, which explores the generative process in detail. |
Hengrui Lou; Zunlei Feng; Jinsong Geng; Erteng Liu; Jie Lei; Lechao Cheng; Jie Song; Mingli Song; Yijun Bei; | code |
| 656 | Fleet of Agents: Coordinated Problem Solving with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Fleet of Agents (FoA), a novel and intuitive yet principled framework utilizing LLMs as agents to navigate through dynamic tree searches, employing a genetic-type particle filtering approach. |
Lars Henning Klein; Nearchos Potamitis; Roland Aydin; Robert West; Caglar Gulcehre; Akhil Arora; | code |
| 657 | Learning from True-False Labels Via Multi-modal Prompt Retrieving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel weakly supervised labeling setting, namely **T**rue-**F**alse **L**abels (TFLs), which can achieve high accuracy when generated by VLMs. |
Zhongnian Li; Jinghao Xu; Peng Ying; Meng Wei; Xinzheng Xu; | code |
| 658 | Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training Via Symmetric Policy Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel off-policy method that eliminates the need for additional environmental interactions by reformulating adversarial learning as a soft-constrained optimization problem. |
Kosuke Nakanishi; Akihiro Kubo; Yuji Yasui; Shin Ishii; | code |
| 659 | Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. |
Jeongmo Kim; Yisak Park; Minung Kim; Seungyul Han; | code |
| 660 | Hypo3D: Exploring Hypothetical Reasoning in 3D Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce *Hypothetical 3D Reasoning*, namely Hypo3D, a benchmark designed to evaluate models’ ability to reason without access to real-time scene data. |
Ye Mao; Weixun Luo; Junpeng Jing; Anlan Qiu; Krystian Mikolajczyk; | code |
| 661 | FlexiClip: Locality-Preserving Free-Form Character Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Similarly, text-to-video (T2V) and image-to-video (I2V) models struggle to handle clipart due to the mismatch in statistical properties between natural video and clipart styles. This paper introduces FlexiClip, a novel approach designed to overcome these limitations by addressing the intertwined challenges of temporal consistency and geometric integrity. |
Anant Khandelwal; | code |