ICLR 2025 Papers with Code & Data
To help the community engage quickly with the presented research, we have compiled an index of accepted papers that have associated public code or data repositories, listed in the table below. The index was generated by an automated extraction process; while we strive for completeness, some papers with public resources may have been missed, and we welcome pointers to any that should be added. Note that some code repositories may not become fully public until the conference officially begins.
In addition to this index, we encourage readers to explore our related resources:

- ICLR-2025 Papers & Highlights: curated summaries and key takeaways from this year's conference.
- "Best Paper" Digest (ICLR): a historical overview of the most influential ICLR papers published since 2018.
This curated list is created by the Paper Digest Team. Paper Digest is an AI-powered research platform that delivers personalized daily digests of the latest research in your field, and supports reading and writing articles, answering questions, conducting literature reviews, and generating research reports.
TABLE 1: ICLR 2025 Papers with Code & Data
| # | Paper | Highlight | Author(s) | Code |
|---|---|---|---|---|
| 1 | Generative Representational Instruction Tuning | Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. | Niklas Muennighoff; Hongjin SU; Liang Wang; Nan Yang; Furu Wei; Tao Yu; Amanpreet Singh; Douwe Kiela | code |
| 2 | AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | We present AndroidWorld, a fully functional Android environment that provides reward signals for 116 programmatic tasks across 20 real-world Android apps. | Christopher Rawles; Sarah Clinckemaillie; Yifan Chang; Jonathan Waltz; Gabrielle Lau; Marybeth Fair; Alice Li; William E Bishop; Wei Li; Folawiyo Campbell-Ajala; Daniel Kenji Toyama; Robert James Berry; Divya Tyamagundlu; Timothy P Lillicrap; Oriana Riva | code |
| 3 | Scaling Up Masked Diffusion Models on Text | Fully leveraging the probabilistic formulation of MDMs, we propose a simple yet effective *unsupervised classifier-free guidance* that effectively exploits large-scale unpaired data, boosting performance for conditional inference. | Shen Nie; Fengqi Zhu; Chao Du; Tianyu Pang; Qian Liu; Guangtao Zeng; Min Lin; Chongxuan Li | code |
| 4 | Your Absorbing Discrete Diffusion Secretly Models The Conditional Distributions of Clean Data | In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time-dependent scalar in an analytic form. | Jingyang Ou; Shen Nie; Kaiwen Xue; Fengqi Zhu; Jiacheng Sun; Zhenguo Li; Chongxuan Li | code |
| 5 | LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias | We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. | Haian Jin; Hanwen Jiang; Hao Tan; Kai Zhang; Sai Bi; Tianyuan Zhang; Fujun Luan; Noah Snavely; Zexiang Xu | code |
| 6 | SpinQuant: LLM Quantization with Learned Rotations | In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures while enhancing quantization accuracy. | Zechun Liu; Changsheng Zhao; Igor Fedorov; Bilge Soran; Dhruv Choudhary; Raghuraman Krishnamoorthi; Vikas Chandra; Yuandong Tian; Tijmen Blankevoort | code |
| 7 | On Scaling Up 3D Gaussian Splatting Training | We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. | Hexu Zhao; Haoyang Weng; Daohan Lu; Ang Li; Jinyang Li; Aurojit Panda; Saining Xie | code |
| 8 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | In this paper, we identify that only a fraction of attention heads, a.k.a. Retrieval Heads, are critical for processing long contexts and require full attention across all tokens. | Guangxuan Xiao; Jiaming Tang; Jingwei Zuo; junxian guo; Shang Yang; Haotian Tang; Yao Fu; Song Han | code |
| 9 | AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. | Maksym Andriushchenko; Alexandra Souly; Mateusz Dziemian; Derek Duenas; Maxwell Lin; Justin Wang; Dan Hendrycks; Andy Zou; J Zico Kolter; Matt Fredrikson; Yarin Gal; Xander Davies | code |
| 10 | Does Refusal Training in LLMs Generalize to The Past Tense? | We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., *How to make a Molotov cocktail?* to *How did people make a Molotov cocktail?*) is often sufficient to jailbreak many state-of-the-art LLMs. | Maksym Andriushchenko; Nicolas Flammarion | code |
| 11 | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | In this way, we achieve 100% attack success rate (according to GPT-4 as a judge) on Vicuna-13B, Mistral-7B, Phi-3-Mini, Nemotron-4-340B, Llama-2-Chat-7B/13B/70B, Llama-3-Instruct-8B, Gemma-7B, GPT-3.5, GPT-4o, and R2D2 from HarmBench that was adversarially trained against the GCG attack. | Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion | code |
| 12 | JudgeBench: A Benchmark for Evaluating LLM-Based Judges | Existing benchmarks primarily focus on a judge's alignment with human preferences, but often fail to account for more challenging tasks where crowdsourced human preference is a poor indicator of factual and logical correctness. To address this, we propose a novel evaluation framework to objectively evaluate LLM-based judges. | Sijun Tan; Siyuan Zhuang; Kyle Montgomery; William Yuan Tang; Alejandro Cuadron; Chenguang Wang; Raluca Popa; Ion Stoica | code |
| 13 | Depth Pro: Sharp Monocular Metric Depth in Less Than A Second | We present a foundation model for zero-shot metric monocular depth estimation. | Alexey Bochkovskiy; Amaël Delaunoy; Hugo Germain; Marcel Santos; Yichao Zhou; Stephan Richter; Vladlen Koltun | code |
| 14 | Improving Pretraining Data Using Perplexity Correlations | However, progress in understanding pretraining data has been slow due to the costly pretraining runs required for data selection experiments. We present a framework that avoids these costs and selects high-quality pretraining data without any LLM training of our own. | Tristan Thrush; Christopher Potts; Tatsunori Hashimoto | code |
| 15 | KBLaM: Knowledge Base Augmented Language Model | In this paper, we propose Knowledge Base augmented Language Model (KBLAM), a new method for augmenting Large Language Models (LLMs) with external knowledge. | Xi Wang; Taketomo Isazawa; Liana Mikaelyan; James Hensman | code |
| 16 | NV-Embed: Improved Techniques for Training LLMs As Generalist Embedding Models | In this work, we introduce the NV-Embed model, incorporating architectural designs, training procedures, and curated datasets to significantly enhance the performance of LLM as a versatile embedding model, while maintaining its simplicity and reproducibility. | Chankyu Lee; Rajarshi Roy; Mengyao Xu; Jonathan Raiman; Mohammad Shoeybi; Bryan Catanzaro; Wei Ping | code |
| 17 | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos that align seamlessly with text prompts, with a frame rate of 16 fps and resolution of 768 x 1360 pixels. | Zhuoyi Yang; Jiayan Teng; Wendi Zheng; Ming Ding; Shiyu Huang; Jiazheng Xu; Yuanming Yang; Wenyi Hong; Xiaohan Zhang; Guanyu Feng; Da Yin; Yuxuan Zhang; Weihan Wang; Yean Cheng; Bin Xu; Xiaotao Gu; Yuxiao Dong; Jie Tang | code |
| 18 | Bidirectional Decoding: Improving Action Chunking Via Guided Test-Time Sampling | We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity to unexpected states. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. | Yuejiang Liu; Jubayer Ibn Hamid; Annie Xie; Yoonho Lee; Max Du; Chelsea Finn | code |
| 19 | Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. | Marianne Arriola; Aaron Gokaslan; Justin T Chiu; Zhihan Yang; Zhixuan Qi; Jiaqi Han; Subham Sekhar Sahoo; Volodymyr Kuleshov | code |
| 20 | Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents | In this paper, we develop an approach, called Digi-Q, to train VLM-based action-value Q-functions which are then used to extract the agent policy. | Hao Bai; Yifei Zhou; Li Erran Li; Sergey Levine; Aviral Kumar | code |
| 21 | How to Evaluate Reward Models for RLHF | We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). | Evan Frick; Tianle Li; Connor Chen; Wei-Lin Chiang; Anastasios Nikolas Angelopoulos; Jiantao Jiao; Banghua Zhu; Joseph E. Gonzalez; Ion Stoica | code |
| 22 | MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. | Jun Shern Chan; Neil Chowdhury; Oliver Jaffe; James Aung; Dane Sherburn; Evan Mays; Giulio Starace; Kevin Liu; Leon Maksin; Tejal Patwardhan; Aleksander Madry; Lilian Weng | code |
| 23 | EqNIO: Subequivariant Neural Inertial Odometry | These priors learn to produce denoised displacement measurements but need to ignore data variations due to specific IMU mount orientation and motion directions, hindering generalization. This work introduces EqNIO, which addresses this challenge with *canonical displacement priors*, i.e., priors that are invariant to the orientation of the gravity-aligned frame in which the IMU data is expressed. | Royina Karegoudra Jayanth; Yinshuang Xu; Ziyun Wang; Evangelos Chatzipantazis; Kostas Daniilidis; Daniel Gehrig | code |
| 24 | Improving Instruction-Following in Language Models Through Activation Steering | The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. | Alessandro Stolfo; Vidhisha Balachandran; Safoora Yousefi; Eric Horvitz; Besmira Nushi | code |
| 25 | SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal | We investigate design choices for creating a fast, accurate automated safety evaluator. | Tinghao Xie; Xiangyu Qi; Yi Zeng; Yangsibo Huang; Udari Madhushani Sehwag; Kaixuan Huang; Luxi He; Boyi Wei; Dacheng Li; Ying Sheng; Ruoxi Jia; Bo Li; Kai Li; Danqi Chen; Peter Henderson; Prateek Mittal | code |
| 26 | Agent S: An Open Agentic Framework That Uses Computers Like A Human | We present Agent S, an open agentic framework that enables autonomous interaction with computers through Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks. | Saaket Agashe; Jiuzhou Han; Shuyu Gan; Jiachen Yang; Ang Li; Xin Eric Wang | code |
| 27 | RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation | In this paper, we present the Robotics Diffusion Transformer (RDT), a pioneering diffusion foundation model for bimanual manipulation. | Songming Liu; Lingxuan Wu; Bangguo Li; Hengkai Tan; Huayu Chen; Zhengyi Wang; Ke Xu; Hang Su; Jun Zhu | code |
| 28 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | We introduce NoPoSplat, a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from unposed sparse multi-view images. | Botao Ye; Sifei Liu; Haofei Xu; Xueting Li; Marc Pollefeys; Ming-Hsuan Yang; Songyou Peng | code |
| 29 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs), enabling a broader search scenario, termed universal multimodal retrieval, where multiple modalities and diverse retrieval tasks are accommodated. | Sheng-Chieh Lin; Chankyu Lee; Mohammad Shoeybi; Jimmy Lin; Bryan Catanzaro; Wei Ping | code |
| 30 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | Our analysis reveals that LLMs exhibit great potential for self-acceleration through layer sparsity and the task-specific nature of this sparsity. Building on these insights, we introduce SWIFT, an on-the-fly self-speculative decoding algorithm that adaptively selects intermediate layers of LLMs to skip during inference. | Heming Xia; Yongqi Li; Jun Zhang; Cunxiao Du; Wenjie Li | code |
| 31 | OGBench: Benchmarking Offline Goal-Conditioned RL | In this work, we propose OGBench, a new, high-quality benchmark for algorithms research in offline goal-conditioned RL. | Seohong Park; Kevin Frans; Benjamin Eysenbach; Sergey Levine | code |
| 32 | RB-Modulation: Training-Free Stylization Using Reference-Based Modulation | We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. | Litu Rout; Yujia Chen; Nataniel Ruiz; Abhishek Kumar; Constantine Caramanis; Sanjay Shakkottai; Wen-Sheng Chu | code |
| 33 | Semantic Image Inversion and Editing Using Rectified Stochastic Differential Equations | We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator, and prove that the resulting vector field is equivalent to a rectified stochastic differential equation. | Litu Rout; Yujia Chen; Nataniel Ruiz; Constantine Caramanis; Sanjay Shakkottai; Wen-Sheng Chu | code |
| 34 | ColPali: Efficient Document Retrieval with Vision Language Models | To benchmark current systems on visually rich document retrieval, we introduce the Visual Document Retrieval Benchmark *ViDoRe*, composed of various page-level retrieval tasks spanning multiple domains, languages, and practical settings. We release models, data, code and benchmarks under open licenses at https://hf.co/vidore. | Manuel Faysse; Hugues Sibille; Tony Wu; Bilel Omrani; Gautier Viaud; CELINE HUDELOT; Pierre Colombo | code |
| 35 | AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation | We introduce AHA, an open-source VLM designed to detect and reason about failures in robotic manipulation using natural language. | Jiafei Duan; Wilbert Pumacay; Nishanth Kumar; Yi Ru Wang; Shulin Tian; Wentao Yuan; Ranjay Krishna; Dieter Fox; Ajay Mandlekar; Yijie Guo | code |
| 36 | Sparse Autoencoders Do Not Find Canonical Units of Analysis | To train meta-SAEs we introduce BatchTopK SAEs, an improved variant of the popular TopK SAE method, that only enforces a fixed average sparsity. | Patrick Leask; Bart Bussmann; Michael T Pearce; Joseph Isaac Bloom; Curt Tigges; Noura Al Moubayed; Lee Sharkey; Neel Nanda | code |
| 37 | Watermark Anything With Localized Messages | We introduce a deep-learning model for localized image watermarking, dubbed the Watermark Anything Model (WAM). | Tom Sander; Pierre Fernandez; Alain Oliviero Durmus; Teddy Furon; Matthijs Douze | code |
| 38 | Dualformer: Controllable Fast and Slow Thinking By Learning with Randomized Reasoning Traces | We present Dualformer, a single Transformer model that seamlessly integrates both the fast and slow reasoning modes by training on randomized reasoning traces, where different parts of the traces are strategically dropped during training. | DiJia Su; Sainbayar Sukhbaatar; Michael Rabbat; Yuandong Tian; Qinqing Zheng | code |
| 39 | TorchTitan: One-stop PyTorch Native Solution for Production Ready LLM Pretraining | This paper introduces TorchTitan, a PyTorch-native distributed training system that unifies and advances state-of-the-art techniques, streamlining integration and reducing engineering overhead. | Wanchao Liang; Tianyu Liu; Less Wright; Will Constable; Andrew Gu; Chien-Chin Huang; Iris Zhang; Wei Feng; Howard Huang; Junjie Wang; Sanket Purandare; Gokul Nadathur; Stratos Idreos | code |
| 40 | SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations | Current video guardrails, however, are either overly simplistic, relying on pure classification models trained on simple policies with limited unsafe categories, which lack detailed explanations, or prompting multimodal large language models (MLLMs) with long safety guidelines, which are inefficient and impractical for guardrailing real-world content. To bridge this gap, we propose SafeWatch, an efficient MLLM-based video guardrail model designed to follow customized safety policies and provide multi-label video guardrail outputs with content-specific explanations in a zero-shot manner. | Zhaorun Chen; Francesco Pinto; Minzhou Pan; Bo Li | code |
| 41 | When Attention Sink Emerges in Language Models: An Empirical View | We highlight that attention sink emerges after effective optimization on sufficient training data. | Xiangming Gu; Tianyu Pang; Chao Du; Qian Liu; Fengzhuo Zhang; Cunxiao Du; Ye Wang; Min Lin | code |
| 42 | Dissecting Adversarial Robustness of Multimodal LM Agents | To systematically examine the robustness of agents, we propose the Agent Robustness Evaluation (ARE) framework. | Chen Henry Wu; Rishi Rajesh Shah; Jing Yu Koh; Russ Salakhutdinov; Daniel Fried; Aditi Raghunathan | code |
| 43 | Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench, have become popular for evaluating language models due to their cost-effectiveness and scalability compared to human evaluation. | Xiaosen Zheng; Tianyu Pang; Chao Du; Qian Liu; Jing Jiang; Min Lin | code |
| 44 | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Therefore, continuing with the current architectures will present a computational roadblock. To address this gap, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. | Moritz Reuss; Jyothish Pari; Pulkit Agrawal; Rudolf Lioutikov | code |
| 45 | TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval | Specifically, we introduce a progressive multi-granularity framework. | Leqi Shen; Tianxiang Hao; Tao He; Sicheng Zhao; Yifeng Zhang; pengzhang liu; Yongjun Bao; Guiguang Ding | code |
| 46 | ThinK: Thinner Key Cache By Query-Driven Pruning | In response, we propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels. | Yuhui Xu; Zhanming Jie; Hanze Dong; Lei Wang; Xudong Lu; Aojun Zhou; Amrita Saha; Caiming Xiong; Doyen Sahoo | code |
| 47 | What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? | In this work, we conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. | Guangkai Xu; Yongtao Ge; Mingyu Liu; Chengxiang Fan; Kangyang Xie; Zhiyue Zhao; Hao Chen; Chunhua Shen | code |
| 48 | MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models | Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluation suite designed to assess LVLMs across a wide range of multi-image tasks. | Fanqing Meng; Jin Wang; Chuanhao Li; Quanfeng Lu; Hao Tian; Tianshuo Yang; Jiaqi Liao; Xizhou Zhu; Jifeng Dai; Yu Qiao; Ping Luo; Kaipeng Zhang; Wenqi Shao | code |
| 49 | Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver | This paper introduces rStar, a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. | Zhenting Qi; Mingyuan MA; Jiahang Xu; Li Lyna Zhang; Fan Yang; Mao Yang | code |
| 50 | Improved Diffusion-based Generative Model with Better Adversarial Robustness | Fortunately, this issue can be mitigated by AT as well. Based on these insights, we propose to conduct efficient AT on both DPM and CM. | Zekun Wang; Mingyang Yi; Shuchen Xue; Zhenguo Li; Ming Liu; Bing Qin; Zhi-Ming Ma | code |
| 51 | Energy-Based Diffusion Language Models for Text Generation | In this work, we propose Energy-based Diffusion Language Model (EDLM), an energy-based model operating at the full sequence level for each diffusion step, introduced to improve the underlying approximation used by diffusion models. | Minkai Xu; Tomas Geffner; Karsten Kreis; Weili Nie; Yilun Xu; Jure Leskovec; Stefano Ermon; Arash Vahdat | code |
| 52 | Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures | This paper introduces Vision-RWKV (VRWKV), a model that builds upon the RWKV architecture from the NLP field with key modifications tailored specifically for vision tasks. | Yuchen Duan; Weiyun Wang; Zhe Chen; Xizhou Zhu; Lewei Lu; Tong Lu; Yu Qiao; Hongsheng Li; Jifeng Dai; Wenhai Wang | code |
| 53 | SOAP: Improving and Stabilizing Shampoo Using Adam for Language Modeling | This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor, a memory-efficient approximation of Adam, showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. | Nikhil Vyas; Depen Morwani; Rosie Zhao; Itai Shapira; David Brandfonbrener; Lucas Janson; Sham M. Kakade | code |
| 54 | A Transfer Attack to Image Watermarks | In this work, we propose a new transfer evasion attack to image watermark in the no-box setting. | Yuepeng Hu; Zhengyuan Jiang; Moyang Guo; Neil Zhenqiang Gong | code |
| 55 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided, offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model without the need for fine-tuning or external knowledge. | Koichi Namekata; Sherwin Bahmani; Ziyi Wu; Yash Kant; Igor Gilitschenski; David B. Lindell | code |
| 56 | MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine, covering over 25 million images across 10 modalities with multigranular annotations for more than 65 diseases. | Yunfei Xie; Ce Zhou; Lang Gao; Juncheng Wu; Xianhang Li; Hong-Yu Zhou; Sheng Liu; Lei Xing; James Zou; Cihang Xie; Yuyin Zhou | code |
| 57 | ThinkBot: Embodied Instruction Following with Thought Chain Reasoning | On the contrary, we propose ThinkBot that reasons the thought chain in human instruction to recover the missing action descriptions, so that the agent can successfully complete human goals by following the coherent instruction. | Guanxing Lu; Ziwei Wang; Changliu Liu; Jiwen Lu; Yansong Tang | code |
| 58 | Consistency Models Made Easy | For example, as of 2024, training a state-of-the-art CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an effective scheme for training CMs that largely improves the efficiency of building such models. | Zhengyang Geng; Ashwini Pokle; Weijian Luo; Justin Lin; J Zico Kolter | code |
| 59 | Scaling FP8 Training to Trillion-token LLMs | Interestingly, we show, both analytically and empirically, that this amplification happens only over prolonged training periods, and link it to a SwiGLU weight alignment process. To address this newly identified issue, we introduce Smooth-SwiGLU, a novel modification that ensures stable FP8 training without altering function behavior. | Maxim Fishman; Brian Chmiel; Ron Banner; Daniel Soudry | code |
| 60 | VideoPhy: Evaluating Physical Commonsense for Video Generation | To this end, we present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities (e.g. marbles will roll down when placed on a slanted surface). | Hritik Bansal; Zongyu Lin; Tianyi Xie; Zeshun Zong; Michal Yarom; Yonatan Bitton; Chenfanfu Jiang; Yizhou Sun; Kai-Wei Chang; Aditya Grover | code |
| 61 | TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data | In this work, we develop a new vision and language assistant called TEOChat that can engage in conversations about temporal sequences of earth observation data. We publicly release our data, models, and code at https://github.com/ermongroup/TEOChat. | Jeremy Andrew Irvin; Emily Ruoyu Liu; Joyce C. Chen; Ines Dormoy; Jinyoung Kim; Samar Khanna; Zhuo Zheng; Stefano Ermon | code |
| 62 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. | Yiming Xie; Chun-Han Yao; Vikram Voleti; Huaizu Jiang; Varun Jampani | code |
| 63 | Benchmarking Agentic Workflow Generation | To this end, we introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. | Shuofei Qiao; Runnan Fang; Zhisong Qiu; Xiaobin Wang; Ningyu Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen | code |
| 64 | AI Sandbagging: Language Models Can Strategically Underperform on Evaluations | In this paper we assess sandbagging capabilities in contemporary language models (LMs). | Teun van der Weij; Felix Hofstätter; Oliver Jaffe; Samuel F. Brown; Francis Rhys Ward | code |
| 65 | DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech Without Domain-Specific Factors | In this work, we introduce DiTTo-TTS, a Diffusion Transformer (DiT)-based TTS model, to investigate whether LDM-based TTS can achieve state-of-the-art performance without domain-specific factors. | Keon Lee; Dong Won Kim; Jaehyeon Kim; Seungjun Chung; Jaewoong Cho | code |
| 66 | A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation | This work tackles the information loss bottleneck of vector-quantization (VQ) autoregressive image generation by introducing a novel model architecture called the 2-Dimensional Autoregression (DnD) Transformer. | Liang Chen; Sinan Tan; Zefan Cai; Weichu Xie; Haozhe Zhao; Yichi Zhang; Junyang Lin; Jinze Bai; Tianyu Liu; Baobao Chang | code |
| 67 | StringLLM: Understanding The String Processing Capability of Large Language Models | Our evaluations indicate that LLMs struggle with accurately processing strings compared to humans. To uncover the underlying reasons for this limitation, we conduct an in-depth analysis and subsequently propose an effective approach that significantly enhances LLMs' string processing capability via fine-tuning. |
Xilong Wang; Hao Fu; Jindong Wang; Neil Zhenqiang Gong; | code |
| 68 | What’s The Move? Hybrid Imitation Learning Via Salient Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **SPHINX**: **S**alient **P**oint-based **H**ybrid **I**mitatio**N** and e**X**ecution, a flexible IL policy that leverages multimodal observations (point clouds and wrist images), along with a hybrid action space of low-frequency, sparse waypoints and high-frequency, dense end effector movements. |
Priya Sundaresan; Hengyuan Hu; Quan Vuong; Jeannette Bohg; Dorsa Sadigh; | code |
| 69 | Implicit Search Via Discrete Diffusion: A Study on Chess Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DiffuSearch, a model that does *implicit search* by looking into the future world via discrete diffusion modeling. |
Jiacheng Ye; Zhenyu Wu; Jiahui Gao; Zhiyong Wu; Xin Jiang; Zhenguo Li; Lingpeng Kong; | code |
| 70 | Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. |
Jiacheng Ye; Jiahui Gao; Shansan Gong; Lin Zheng; Xin Jiang; Zhenguo Li; Lingpeng Kong; | code |
| 71 | Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Its key components include: 1) using the linear interpolating diffusion form of flow-matching, 2) employing $\boldsymbol v$-prediction, and 3) performing rectification (a.k.a. reflow). In this paper, we argue that the success of rectification primarily lies in using a pretrained diffusion model to obtain matched pairs of noise and samples, followed by retraining with these matched noise-sample pairs. |
Fu-Yun Wang; Ling Yang; Zhaoyang Huang; Mengdi Wang; Hongsheng Li; | code |
| 72 | Autoregressive Video Generation Without Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel approach that enables autoregressive video generation with high efficiency. |
Haoge Deng; Ting Pan; Haiwen Diao; Zhengxiong Luo; Yufeng Cui; Huchuan Lu; Shiguang Shan; Yonggang Qi; Xinlong Wang; | code |
| 73 | Booster: Tackling Harmful Fine-tuning for Large Language Models Via Attenuating Harmful Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to attenuate the negative impact of harmful perturbation, we propose an alignment-stage solution, dubbed Booster. |
Tiansheng Huang; Sihao Hu; Fatih Ilhan; Selim Furkan Tekin; Ling Liu; | code |
| 74 | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. |
Guanting Dong; Keming Lu; Chengpeng Li; Tingyu Xia; Bowen Yu; Chang Zhou; Jingren Zhou; | code |
| 75 | LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified learning-based framework called LLMOPT to boost optimization generalization. |
Caigao JIANG; Xiang Shu; Hong Qian; Xingyu Lu; JUN ZHOU; Aimin Zhou; Yang Yu; | code |
| 76 | Interpreting Emergent Planning in Model-Free Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first mechanistic evidence that model-free reinforcement learning agents can learn to plan. |
Thomas Bush; Stephen Chung; Usman Anwar; Adrià Garriga-Alonso; David Krueger; | code |
| 77 | VL-ICL Bench: The Devil in The Details of Multimodal In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce a comprehensive benchmark VL-ICL Bench for multimodal in-context learning, encompassing a broad spectrum of tasks that involve both images and text as inputs and outputs, and different types of challenges, from {perception to reasoning and long context length}. |
Yongshuo Zong; Ondrej Bohdal; Timothy Hospedales; | code |
| 78 | ChatQA 2: Bridging The Gap to Proprietary LLMs in Long Context and RAG Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce ChatQA 2, a Llama 3.0-based model with a 128K context window, designed to bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo-2024-04-09) in long context understanding and retrieval-augmented generation (RAG) capabilities. |
Peng Xu; Wei Ping; Xianchao Wu; Chejian Xu; Zihan Liu; Mohammad Shoeybi; Bryan Catanzaro; | code |
| 79 | VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current RAG systems are solely based on text, rendering it impossible to utilize vision information like layout and images that play crucial roles in real-world multi-modality documents. In this paper, we introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline. |
Shi Yu; Chaoyue Tang; Bokai Xu; Junbo Cui; Junhao Ran; Yukun Yan; Zhenghao Liu; Shuo Wang; Xu Han; Zhiyuan Liu; Maosong Sun; | code |
| 80 | APE: Faster and Longer Context-Augmented Generation Via Adaptive Parallel Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable effective and efficient CAG, we propose Adaptive Parallel Encoding (**APE**), which brings shared prefix, attention temperature, and scaling factor to align the distribution of parallel encoding with sequential encoding. |
Xinyu Yang; Tianqi Chen; Beidi Chen; | code |
| 81 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this limitation, we present WorkflowLLM, a data-centric framework elaborately designed to enhance the capability of LLMs in workflow orchestration. Specifically, the construction process can be divided into three phases: (1) Data Collection: we collect real-world workflow data from Apple Shortcuts and RoutineHub, transcribing them into Python-style code. |
Shengda Fan; Xin Cong; Yuepeng Fu; Zhong Zhang; Shuyan Zhang; Yuanwei Liu; Yesai Wu; Yankai Lin; Zhiyuan Liu; Maosong Sun; | code |
| 82 | Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Group Position Embedding (GPE), a novel and efficient technique to enhance the layout understanding capabilities of LLMs without architectural changes or additional pre-training. We also introduce a challenging benchmark called BLADE, specifically designed to assess layout comprehension. |
Yuke Zhu; Yue Zhang; Dongdong Liu; Chi Xie; Zihua Xiong; Bo Zheng; Sheng Guo; | code |
| 83 | GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper introduces a new dataset, termed GUI-World, which features meticulously crafted Human-MLLM annotations, extensively covering six GUI scenarios and eight types of GUI-oriented questions in three formats. We believe our work provides valuable insights for future research in dynamic GUI content understanding. |
Dongping Chen; Yue Huang; Siyuan Wu; Jingyu Tang; Huichi Zhou; Qihui Zhang; Zhigang He; Yilin Bai; Chujie Gao; Liuyi Chen; Yiqiang Li; Chenlong Wang; Yue Yu; Tianshuo Zhou; Zhen Li; Yi Gui; Yao Wan; Pan Zhou; Jianfeng Gao; Lichao Sun; | code |
| 84 | Fugatto 1: Foundational Generative Audio Transformer Opus 1 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is because audio data does not inherently contain the instructions that were used to generate it. To overcome this challenge, we introduce a specialized dataset generation approach optimized for producing a wide range of audio generation and transformation tasks, ensuring the data reveals meaningful relationships between audio and language. |
Rafael Valle; Rohan Badlani; Zhifeng Kong; Sang-gil Lee; Arushi Goel; Sungwon Kim; Joao Felipe Santos; Shuqi Dai; Siddharth Gururani; Aya Aljafari; Alexander H. Liu; Kevin J. Shih; Ryan Prenger; Wei Ping; Chao-Han Huck Yang; Bryan Catanzaro; | code |
| 85 | The Superposition of Diffusion Models Using The Itô Density Estimator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. |
Marta Skreta; Lazar Atanackovic; Joey Bose; Alexander Tong; Kirill Neklyudov; | code |
| 86 | Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models Via Energy Hessians Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a method for transferring general-purpose representations from MLFF foundation models to smaller, faster MLFFs specialized to specific regions of chemical space. |
Ishan Amin; Sanjeev Raja; Aditi S. Krishnapriyan; | code |
| 87 | McEval: Massively Multilingual Code Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further facilitate the research of code LLMs, we propose a massively multilingual code benchmark covering 40 programming languages (McEval) with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. |
Linzheng Chai; Shukai Liu; Jian Yang; Yuwei Yin; JinKe; Jiaheng Liu; Tao Sun; Ge Zhang; Changyu Ren; Hongcheng Guo; Noah Wang; Boyang Wang; Xianjie Wu; Bing Wang; Tongliang Li; Liqun Yang; Sufeng Duan; Zhaoxiang Zhang; Zhoujun Li; | code |
| 88 | Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation. |
Yao Teng; Han Shi; Xian Liu; Xuefei Ning; Guohao Dai; Yu Wang; Zhenguo Li; Xihui Liu; | code |
| 89 | Dynamic-LLaVA: Efficient Multimodal Large Language Models Via Dynamic Vision-language Context Sparsification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the efficiency benefits of the vision context reduction in the prefill stage gradually diminish during the decoding stage. To address this problem, we propose a dynamic vision-language context sparsification framework, Dynamic-LLaVA, which dynamically reduces the redundancy of vision context in the prefill stage and decreases the memory and computation overhead of the generated language context during decoding. |
Wenxuan Huang; Zijie Zhai; Yunhang Shen; Shaosheng Cao; Fei Zhao; Xiangfeng Xu; Zheyu Ye; Shaohui Lin; | code |
| 90 | Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the limited control of audio signals in driving human motion, existing methods often add auxiliary spatial signals such as movement regions to stabilize movements, which compromise the naturalness and freedom of motion. To address this issue, we propose an end-to-end audio-only conditioned video diffusion model named Loopy. |
Jianwen Jiang; Chao Liang; Jiaqi Yang; Gaojie Lin; Tianyun Zhong; Yanbo Zheng; | code |
| 91 | Scaling Large Language Model-based Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the neural scaling law, where increasing neurons enhances performance, this study explores whether the continuous addition of collaborative agents can yield similar benefits. |
Chen Qian; Zihao Xie; YiFei Wang; Wei Liu; Kunlun Zhu; Hanchen Xia; Yufan Dang; Zhuoyun Du; Weize Chen; Cheng Yang; Zhiyuan Liu; Maosong Sun; | code |
| 92 | SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing methods focused on multi-view generation of single objects for 4D reconstruction, our interest lies in generating open-world videos from arbitrary viewpoints, incorporating six degrees of freedom (6 DoF) camera poses. To achieve this, we propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation, ensuring consistent content across different viewpoints. |
Jianhong Bai; Menghan Xia; Xintao Wang; Ziyang Yuan; Zuozhu Liu; Haoji Hu; Pengfei Wan; Di ZHANG; | code |
| 93 | Perturbation-Restrained Sequential Model Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, a framework termed Perturbation Restraint on Upper bouNd for Editing (PRUNE) is proposed, which applies the condition number restraints in sequential editing. |
Jun-Yu Ma; Hong Wang; Hao-Xiang Xu; Zhen-Hua Ling; Jia-Chen Gu; | code |
| 94 | LongPO: Long Context Self-Evolution of Large Language Models Through Short-to-Long Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This alignment process remains challenging due to the impracticality of human annotation for extended contexts and the difficulty in balancing short- and long-context performance. To address these challenges, we introduce LongPO, which enables short-context LLMs to self-evolve to excel on long-context tasks by internally transferring short-context capabilities. |
Guanzheng Chen; Xin Li; Michael Shieh; Lidong Bing; | code |
| 95 | API Pack: A Massive Multi-Programming Language Dataset for API Call Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce API Pack, a massive multi-programming language dataset containing over one million instruction-API calls for improving the API call generation capabilities of large language models. |
Zhen Guo; Adriana Meza Soria; Wei Sun; Yikang Shen; Rameswar Panda; | code |
| 96 | Autoregressive Pretraining with Mamba in Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper shows that Mamba’s visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. |
Sucheng Ren; Xianhang Li; Haoqin Tu; Feng Wang; Fangxun Shu; Lei Zhang; Jieru Mei; Linjie Yang; Peng Wang; Heng Wang; Alan Yuille; Cihang Xie; | code |
| 97 | Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Fiddler, a resource-efficient inference system for MoE models with limited GPU resources. |
Keisuke Kamahori; Tian Tang; Yile Gu; Kan Zhu; Baris Kasikci; | code |
| 98 | AdaIR: Adaptive All-in-One Image Restoration Via Frequency Mining and Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most methods purely operate in the spatial domain and do not delve into the distinct frequency variations inherent to different degradation types. To address this gap, we propose an adaptive all-in-one image restoration network based on frequency mining and modulation. |
Yuning Cui; Syed Waqas Zamir; Salman Khan; Alois Knoll; Mubarak Shah; Fahad Shahbaz Khan; | code |
| 99 | WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domain: 1) extreme compression. |
Shengpeng Ji; Ziyue Jiang; Wen Wang; Yifu Chen; Minghui Fang; Jialong Zuo; Qian Yang; Xize Cheng; Zehan Wang; Ruiqi Li; Ziang Zhang; Xiaoda Yang; Rongjie Huang; Yidi Jiang; Qian Chen; Siqi Zheng; Zhou Zhao; | code |
| 100 | Permute-and-Flip: An Optimally Stable and Watermarkable Decoder for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder. |
Xuandong Zhao; Lei Li; Yu-Xiang Wang; | code |
| 101 | HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, HiSplat, which introduces a hierarchical manner in generalizable 3D Gaussian Splatting to construct hierarchical 3D Gaussians via a coarse-to-fine strategy. |
Shengji Tang; Weicai Ye; Peng Ye; Weihao Lin; Yang Zhou; Tao Chen; Wanli Ouyang; | code |
| 102 | Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without mitigating such gaps, diffusion for perception still struggles on tasks represented by multi-modal understanding (e.g., referring image segmentation). Motivated by these challenges, we analyze and improve the alignment between the generative diffusion process and perception objectives centering around the key observation: how perception quality evolves with the denoising process. |
Ziqi Pang; Xin Xu; Yu-Xiong Wang; | code |
| 103 | FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a comprehensive benchmark for fairness of LLMs in multi-turn scenarios, **FairMT-Bench**. Based on these findings, we develop a more challenging dataset, FairMT-1K, and test 15 current state-of-the-art (SOTA) LLMs on this dataset. |
Zhiting Fan; Ruizhe Chen; Tianxiang Hu; Zuozhu Liu; | code |
| 104 | Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing benchmarks often rely on extensive human annotation or handcrafted templates, making it difficult to achieve the necessary complexity, scalability, and diversity for robust evaluation. To address these limitations, we propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models (LLMs) with the rigor and precision of symbolic provers, enabling the creation of a scalable, diverse, and high-quality FOL reasoning dataset, ProverQA. |
Chengwen Qi; Ren Ma; Bowen Li; He Du; Binyuan Hui; Jinwang Wu; Yuanjun Laili; Conghui He; | code |
| 105 | CREAM: Consistency Regularized Self-Rewarding Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then introduce the regularization to this generalized framework to mitigate the overconfident preference labeling in the self-rewarding process. Based on this theoretical insight, we propose a Consistency Regularized sElf-rewarding lAnguage Model (CREAM) that leverages the consistency of rewards across different iterations to regularize the self-rewarding training, helping the model to learn from more reliable preference data. |
Zhaoyang Wang; Weilei He; Zhiyuan Liang; Xuchao Zhang; Chetan Bansal; Ying Wei; Weitong Zhang; Huaxiu Yao; | code |
| 106 | FaithEval: Can Your Language Model Stay Faithful to Context, Even If The Moon Is Made of Marshmallows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce FaithEval, a novel and comprehensive benchmark tailored to evaluate the faithfulness of LLMs in contextual scenarios across three diverse tasks: unanswerable, inconsistent, and counterfactual contexts. |
Yifei Ming; Senthil Purushwalkam; Shrey Pandit; Zixuan Ke; Xuan-Phi Nguyen; Caiming Xiong; Shafiq Joty; | code |
| 107 | Dense Video Object Captioning from Disjoint Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new task and model for dense video object captioning — detecting, tracking and captioning trajectories of objects in a video. |
Xingyi Zhou; Anurag Arnab; Chen Sun; Cordelia Schmid; | code |
| 108 | Intelligent Go-Explore: Standing on The Shoulders of Giant Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration (i.e., determine which states to save and explore from, and what actions to consider next), which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these handcrafted heuristics with the intelligence and internalized human notions of interestingness captured by giant pretrained foundation models (FMs). |
Cong Lu; Shengran Hu; Jeff Clune; | code |
| 109 | CycleResearcher: Improving Automated Research Via Automated Review Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our iterative preference training framework consists of CycleResearcher, which conducts research tasks, and CycleReviewer, which simulates the peer review process, providing iterative feedback via reinforcement learning. To train these models, we develop two new datasets, Review-5k and Research-14k, reflecting real-world machine learning research and peer review dynamics. |
Yixuan Weng; Minjun Zhu; Guangsheng Bao; Hongbo Zhang; Jindong Wang; Yue Zhang; Linyi Yang; | code |
| 110 | Controlling Space and Time with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 4DiM, a cascaded diffusion model for 4D novel view synthesis (NVS), supporting generation with arbitrary camera trajectories and timestamps, in natural scenes, conditioned on one or more images. |
Daniel Watson; Saurabh Saxena; Lala Li; Andrea Tagliasacchi; David J. Fleet; | code |
| 111 | RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods often overlook the need for repository-level code understanding, which is crucial for accurately grasping the broader context and developing effective solutions. On this basis, we present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. |
Siru Ouyang; Wenhao Yu; Kaixin Ma; Zilin Xiao; Zhihan Zhang; Mengzhao Jia; Jiawei Han; Hongming Zhang; Dong Yu; | code |
| 112 | ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop ProtComposer to generate protein structures conditioned on spatial protein layouts that are specified via a set of 3D ellipsoids capturing substructure shapes and semantics. |
Hannes Stark; Bowen Jing; Tomas Geffner; Jason Yim; Tommi Jaakkola; Arash Vahdat; Karsten Kreis; | code |
| 113 | IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose IRIS, a neuro-symbolic approach that systematically combines LLMs with static analysis to perform whole-repository reasoning for security vulnerability detection. |
Ziyang Li; Saikat Dutta; Mayur Naik; | code |
| 114 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we evaluate and enhance the 3D awareness of ViT-based models. |
Yang You; Yixin Li; Congyue Deng; Yue Wang; Leonidas Guibas; | code |
| 115 | Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Principled solutions to reward hacking have been impeded by the lack of a good definition for the problem. To address this gap, we introduce a definition of reward hacking based on the correlation between proxy and true rewards for states and actions seen by a “reference policy” that breaks down under optimization. |
Cassidy Laidlaw; Shivam Singhal; Anca Dragan; | code |
| 116 | InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while existing methods can add watermarks or steganographic information to individual 3D assets, they often require time-consuming per-scene training and optimization, leading to watermarking overheads that can far exceed the time required for asset generation itself, making deployment impractical for generating large collections of 3D objects. To address this, we propose InstantSplamp, a framework that seamlessly integrates the 3D steganography pipeline into large 3D generative models without introducing explicit additional time costs. |
Chenxin Li; Hengyu Liu; Zhiwen Fan; Wuyang Li; Yifan Liu; Panwang Pan; Yixuan Yuan; | code |
| 117 | SyllableLM: Learning Coarse Semantic Units for Speech Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a controllable self-supervised technique to merge speech representations into coarser syllable-like units while still preserving semantic information. |
Alan Baade; Puyuan Peng; David Harwath; | code |
| 118 | STBLLM: Breaking The 1-Bit Barrier with Structured Binary LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first structural binarization method for LLM compression to less than 1-bit precision. |
Peijie Dong; Lujun Li; Yuedong Zhong; DaYou Du; Ruibo FAN; Yuhan Chen; Zhenheng Tang; Qiang Wang; Wei Xue; Yike Guo; Xiaowen Chu; | code |
| 119 | Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Lumina-T2X family — a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a simple and scalable generative framework that can be adapted to various modalities, e.g., transforming noise into images, videos, multi-view 3D objects, or audio clips conditioned on text instructions. |
Peng Gao; Le Zhuo; Dongyang Liu; Ruoyi Du; Xu Luo; Longtian Qiu; Yuhang Zhang; Rongjie Huang; Shijie Geng; Renrui Zhang; Junlin Xie; Wenqi Shao; Zhengkai Jiang; Tianshuo Yang; Weicai Ye; Tong He; Jingwen He; Junjun He; Yu Qiao; Hongsheng Li; | code |
| 120 | Rethinking Invariance in In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify two crucial elements in the design of an invariant ICL algorithm: information non-leakage and context interdependence, which are not simultaneously achieved by any of the existing methods. |
Lizhe Fang; Yifei Wang; Khashayar Gatmiry; Lei Fang; Yisen Wang; | code |
| 121 | What Is Wrong with Perplexity for Long-context Language Modeling? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. |
Lizhe Fang; Yifei Wang; Zhaoyang Liu; Chenheng Zhang; Stefanie Jegelka; Jinyang Gao; Bolin Ding; Yisen Wang; | code |
| 122 | Long-Sequence Recommendation Models Need Decoupled Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Initial attempts to address this issue with some common methods (e.g., linear projections—a technique borrowed from language processing) proved ineffective, shedding light on the unique challenges of recommendation models. To overcome this, we propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are initialized and learned separately to fully decouple attention and representation. |
Ningya Feng; Junwei Pan; Jialong Wu; Baixu Chen; Ximei Wang; QianLi; Xian Hu; Jie Jiang; Mingsheng Long; | code |
| 123 | FlashRNN: I/O-Aware Optimization of Traditional RNNs on Modern Hardware Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable flexibility on different GPU variants, we introduce a new optimization framework for hardware-internal cache sizes, memory and compute handling. |
Korbinian Pöppel; Maximilian Beck; Sepp Hochreiter; | code |
| 124 | Stealthy Shield Defense: A Conditional Mutual Information-Based Post-Processing Against Black-Box Model Inversion Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The latest black-box attacks have outperformed the state-of-the-art white-box attacks, and existing defenses cannot resist them effectively. To fill this gap, we propose Stealthy Shield Defense (SSD), a post-processing algorithm against black-box MIAs. |
Tianqu Zhuang; Hongyao Yu; Yixiang Qiu; Hao Fang; Bin Chen; Shu-Tao Xia; | code |
| 125 | SVDQuant: Absorbing Outliers By Low-Rank Component for 4-Bit Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to accelerate diffusion models by quantizing their weights and activations to 4 bits. |
Muyang Li; Yujun Lin; Zhekai Zhang; Tianle Cai; Xiuyu Li; Junxian Guo; Enze Xie; Chenlin Meng; Jun-Yan Zhu; Song Han; | code |
| 126 | Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on two research questions: (1) Can LLMs generate reliable preferences among wrong options? And if so, (2) Would alignment with such wrong-over-wrong preferences be helpful? We employ methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences, and fine-tune language models with preference optimization approaches using these synthesized preferences. |
Jihan Yao; Wenxuan Ding; Shangbin Feng; Lucy Lu Wang; Yulia Tsvetkov; | code |
| 127 | Hierarchical World Models As Visual Whole-Body Humanoid Controllers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. |
Nicklas Hansen; Jyothir S V; Vlad Sobal; Yann LeCun; Xiaolong Wang; Hao Su; | code |
| 128 | Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches treat predictive learning as a conditional generation problem, but often fail to fully exploit the temporal dynamics inherent in the data, leading to challenges in generating temporally coherent sequences. To address this, we introduce Dynamical Diffusion (DyDiff), a theoretically sound framework that incorporates temporally aware forward and reverse processes. |
Xingzhuo Guo; Yu Zhang; Baixu Chen; Haoran Xu; Jianmin Wang; Mingsheng Long; | code |
| 129 | Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 27 different types of attack/defense methods, and 7 evaluation metrics. |
Hanrong Zhang; Jingyuan Huang; Kai Mei; Yifei Yao; Zhenting Wang; Chenlu Zhan; Hongwei Wang; Yongfeng Zhang; | code |
| 130 | Unveiling The Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The rise of large language models (LLMs) has created a significant disparity: industrial research labs, with their computational resources, expert teams, and advanced infrastructures, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources to effectively explore the experiment space. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. |
Aldo Pareja; Nikhil Shivakumar Nayak; Hao Wang; Krishnateja Killamsetty; Shivchander Sudalairaj; Wenlong Zhao; Seungwook Han; Abhishek Bhandwaldar; Guangxuan Xu; Kai Xu; Ligong Han; Luke Inglis; Akash Srivastava; | code |
| 131 | 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to manipulate multi-entity 3D motions in video generation. To address the lack of suitable training data, we construct a 360-Motion Dataset, which first correlates collected 3D human and animal assets with GPT-generated trajectory and then captures their motion with 12 evenly-surround cameras on diverse 3D UE platforms. |
Xiao FU; Xian Liu; Xintao Wang; Sida Peng; Menghan Xia; Xiaoyu Shi; Ziyang Yuan; Pengfei Wan; Di ZHANG; Dahua Lin; | code |
| 132 | Is Large-scale Pretraining The Secret to Good Domain Generalization? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior studies have shown that perceptual similarity to pre-training data correlates with zero-shot performance, but we find the effect limited in the DG setting. |
Piotr Teterwak; Kuniaki Saito; Theodoros Tsiligkaridis; Bryan A. Plummer; Kate Saenko; | code |
| 133 | HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Hybrid Autoregressive Transformer (HART), the first autoregressive (AR) visual generation model capable of directly generating 1024×1024 images, rivaling diffusion models in image generation quality. |
Haotian Tang; Yecheng Wu; Shang Yang; Enze Xie; Junsong Chen; Junyu Chen; Zhuoyang Zhang; Han Cai; Yao Lu; Song Han; | code |
| 134 | Sufficient Context: A New Lens on Retrieval Augmented Generation Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient context information for guided abstention. |
Hailey Joren; Jianyi Zhang; Chun-Sung Ferng; Da-Cheng Juan; Ankur Taly; Cyrus Rashtchian; | code |
| 135 | GameGen-X: Interactive Open-world Game Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce GameGen-$\mathbb{X}$, the first diffusion transformer model specifically designed for both generating and interactively controlling open-world game videos. To realize this vision, we first collected and built an Open-World Video Game Dataset (OGameData) from scratch. |
Haoxuan Che; Xuanhua He; Quande Liu; Cheng Jin; Hao Chen; | code |
| 136 | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the shortfall, we introduce PhysAgent, a novel framework that combines the generalization strengths of VLMs with the specialized expertise of vision models, significantly enhancing VLMs’ physical understanding across a variety of tasks, including an 18.4% improvement on GPT-4o. |
Wei Chow; Jiageng Mao; Boyi Li; Daniel Seita; Vitor Campagnolo Guizilini; Yue Wang; | code |
| 137 | HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap between small and large models, we propose **HarmAug**, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. |
Seanie Lee; Haebin Seong; Dong Bok Lee; Minki Kang; Xiaoyin Chen; Dominik Wagner; Yoshua Bengio; Juho Lee; Sung Ju Hwang; | code |
| 138 | Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The focus on frequent word combinations often obscures poor performance on complex or uncommon linguistic patterns. To address these deficiencies, we propose a novel metric called SemVarEffect and a benchmark named SemVarBench, designed to evaluate the causality between semantic variations in inputs and outputs in T2I synthesis. |
Xiangru Zhu; Penglei Sun; Yaoxian Song; Yanghua Xiao; Zhixu Li; Chengyu Wang; Jun Huang; Bei Yang; Xiaoxiao Xu; | code |
| 139 | Spatial-Mamba: Effective Visual State Space Models Via Structure-Aware State Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods struggle to effectively capture complex image spatial structures and incur increased computational cost due to lengthened scanning paths. To address these limitations, we propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space. |
Chaodong Xiao; Minghan Li; Zhengqiang ZHANG; Deyu Meng; Lei Zhang; | code |
| 140 | Iterative Label Refinement Matters More Than Preference Optimization Under Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that in the presence of unreliable supervision, SFT still retains some effectiveness, but DPO (a common RLHF algorithm) fails to improve the model beyond SFT. To address this, we propose *iterative label refinement* (ILR) as an alternative to RLHF. |
Yaowen Ye; Cassidy Laidlaw; Jacob Steinhardt; | code |
| 141 | PWM: Policy Learning with Multi-Task World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Policy learning with multi-task World Models (PWM), a novel model-based RL algorithm for continuous control. |
Ignat Georgiev; Varun Giridhar; Nicklas Hansen; Animesh Garg; | code |
| 142 | D-FINE: Redefine Regression Task of DETRs As Fine-grained Distribution Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. |
Yansong Peng; Hebei Li; Peixi Wu; Yueyi Zhang; Xiaoyan Sun; Feng Wu; | code |
| 143 | COMBO: Compositional World Models for Embodied Multi-Agent Cooperation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only egocentric views of the world. |
Hongxin Zhang; Zeyuan Wang; Qiushi Lyu; Zheyuan Zhang; Sunli Chen; Tianmin Shu; Behzad Dariush; Kwonjoon Lee; Yilun Du; Chuang Gan; | code |
| 144 | From Attention to Activation: Unraveling The Enigmas of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that popular large language models, such as Llama, attend maximally to the first token in 98% of attention heads, a behaviour we attribute to the softmax function. To mitigate this issue, we propose a reformulation of softmax to softmax-1. |
Prannay Kaul; Chengcheng Ma; Ismail Elezi; Jiankang Deng; | code |
| 145 | Timer-XL: Long-Context Transformers for Unified Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Timer-XL, a causal Transformer for unified time series forecasting. |
Yong Liu; Guo Qin; Xiangdong Huang; Jianmin Wang; Mingsheng Long; | code |
| 146 | An Exploration with Entropy Constrained 3D Gaussians for 2D Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on TSW, we introduce an end-to-end trainable video compression method, GSVC, which employs deformable Gaussian representation and optical flow guidance to capture dynamic content in videos. |
Xiang Liu; Bin Chen; Zimo Liu; Yaowei Wang; Shu-Tao Xia; | code |
| 147 | Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To traverse the divide, we propose **Glimpse**, a probability distribution estimation approach, predicting the full distributions from partial observations. We release our code and data at https://github.com/baoguangsheng/glimpse. |
Guangsheng Bao; Yanbin Zhao; Juncai He; Yue Zhang; | code |
| 148 | MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce **Mask**ed **G**enerative **C**odec **T**ransformer (MaskGCT), a fully non-autoregressive TTS model that eliminates the need for explicit alignment information between text and speech supervision, as well as phone-level duration prediction. |
Yuancheng Wang; Haoyue Zhan; Liwei Liu; Ruihong Zeng; Haotian Guo; Jiachen Zheng; Qiang Zhang; Xueyao Zhang; Shunsi Zhang; Zhizheng Wu; | code |
| 149 | Halton Scheduler for Masked Generative Image Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, MaskGIT’s token unmasking scheduler, an essential component of the framework, has not received the attention it deserves. We analyze the sampling objective in MaskGIT, based on the mutual information between tokens, and elucidate its shortcomings. |
Victor Besnier; Mickael Chen; David Hurych; Eduardo Valle; Matthieu Cord; | code |
| 150 | 4K4DGen: Panoramic 4D Generation at 4K Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the challenging task of elevating a single panorama to an immersive 4D experience. |
Renjie Li; Panwang Pan; Bangbang Yang; Dejia Xu; Shijie Zhou; Xuanyang Zhang; Zeming Li; Achuta Kadambi; Zhangyang Wang; Zhengzhong Tu; Zhiwen Fan; | code |
| 151 | IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. |
Zhibing Li; Tong Wu; Jing Tan; Mengchen Zhang; Jiaqi Wang; Dahua Lin; | code |
| 152 | MMDT: Decoding The Trustworthiness and Safety of Multimodal Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first unified platform, MMDT (Multimodal DecodingTrust), designed to provide a comprehensive safety and trustworthiness evaluation for MMFMs. |
Chejian Xu; Jiawei Zhang; Zhaorun Chen; Chulin Xie; Mintong Kang; Yujin Potter; Zhun Wang; Zhuowen Yuan; Alexander Xiong; Zidi Xiong; Chenhui Zhang; Lingzhi Yuan; Yi Zeng; Peiyang Xu; Chengquan Guo; Andy Zhou; Jeffrey Ziwei Tan; Xuandong Zhao; Francesco Pinto; Zhen Xiang; Yu Gai; Zinan Lin; Dan Hendrycks; Bo Li; Dawn Song; | code |
| 153 | Mini-Monkey: Alleviating The Semantic Sawtooth Effect for Lightweight MLLMs Via Complementary Image Pyramid Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This effect is particularly evident in lightweight MLLMs. To address this issue, we introduce a Complementary Image Pyramid (CIP), a simple, effective, and plug-and-play solution designed to mitigate semantic discontinuity during high-resolution image processing. |
Mingxin Huang; Yuliang Liu; Dingkang Liang; Lianwen Jin; Xiang Bai; | code |
| 154 | Regressing The Relative Future: Efficient Policy Optimization for Multi-turn RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce REgressing the RELative FUture (REFUEL), an efficient policy optimization approach designed to address multi-turn RLHF in LLMs. |
Zhaolin Gao; Wenhao Zhan; Jonathan Daniel Chang; Gokul Swamy; Kianté Brantley; Jason D. Lee; Wen Sun; | code |
| 155 | MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Human instance matting aims to estimate an alpha matte for each human instance in an image, which is challenging as it easily fails in complex cases requiring disentangling mingled pixels belonging to multiple instances along hairy and thin boundary structures. In this work, we address this by introducing MP-Mat, a novel 3D-and-instance-aware matting framework with multiplane representation, where the multiplane concept is designed from two different perspectives: scene geometry level and instance level. |
Siyi Jiao; Wenzheng Zeng; Yerong Li; Huayu Zhang; Changxin Gao; Nong Sang; Mike Zheng Shou; | code |
| 156 | ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing code generation benchmarks fail to capture the diverse feedback encountered in multi-turn interactions, limiting our ability to evaluate LLMs in these contexts. To address this gap, we present a set of novel benchmarks that explicitly model the quality of feedback provided to code generation LLMs. |
Hojae Han; seung-won hwang; Rajhans Samdani; Yuxiong He; | code |
| 157 | Measuring and Enhancing Trustworthiness of LLMs in RAG Through Grounded Attributions and Learning to Refuse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. |
Maojia Song; Shang Hong Sim; Rishabh Bhardwaj; Hai Leong Chieu; Navonil Majumder; Soujanya Poria; | code |
| 158 | Personality Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Aligning large language models (LLMs) typically aims to reflect general human values and behaviors, but often fails to capture the unique characteristics and preferences of individual users. To address this gap, we introduce the concept of Personality Alignment. |
Minjun Zhu; Yixuan Weng; Linyi Yang; Yue Zhang; | code |
| 159 | BLEND: Behavior-guided Neural Population Dynamics Modeling Via Privileged Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose **BLEND**, the **B**ehavior-guided neura**L** population dynamics mod**E**lling framework via privileged k**N**owledge **D**istillation. |
Zhengrui Guo; Fangxu Zhou; Wei Wu; Qichen Sun; Lishuang Feng; Jinzhuo Wang; Hao Chen; | code |
| 160 | Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: State-of-the-art text-to-image (T2I) diffusion models often struggle to generate rare compositions of concepts, e.g., objects with unusual attributes. In this paper, we show that the compositional generation power of diffusion models on such rare concepts can be significantly enhanced by the Large Language Model (LLM) guidance. |
Dongmin Park; Sebin Kim; Taehong Moon; Minkyu Kim; Kangwook Lee; Jaewoong Cho; | code |
| 161 | Palu: KV-Cache Compression with Low-Rank Projection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a hidden dimension compression approach called Palu, a KV-Cache compression framework that utilizes low-rank projection to reduce inference-time LLM memory usage. |
Chi-Chih Chang; Wei-Cheng Lin; Chien-Yu Lin; Chong-Yan Chen; Yu-Fang Hu; Pei-Shuo Wang; Ning-Chi Huang; Luis Ceze; Mohamed S. Abdelfattah; Kai-Chiang Wu; | code |
| 162 | IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current libraries for structured LLM generation rely on left-to-right decoding without support for backtracking, limiting the ability to correct or refine outputs mid-generation. To address this, we introduce IterGen, a user-friendly library for iterative, grammar-guided LLM generation that enables users to move both forward and backward within the generated output based on grammar symbols. |
Shubham Ugare; Rohan Gumaste; Tarun Suresh; Gagandeep Singh; Sasa Misailovic; | code |
| 163 | NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. |
Jaden Fried Fiotto-Kaufman; Alexander Russell Loftus; Eric Todd; Jannik Brinkmann; Koyena Pal; Dmitrii Troitskii; Michael Ripa; Adam Belfki; Can Rager; Caden Juang; Aaron Mueller; Samuel Marks; Arnab Sen Sharma; Francesca Lucchetti; Nikhil Prakash; Carla E. Brodley; Arjun Guha; Jonathan Bell; Byron C Wallace; David Bau; | code |
| 164 | HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we posit that *hierarchical* vision-language-action (VLA) models can be more effective in utilizing off-domain data than standard monolithic VLA models that directly finetune vision-language models (VLMs) to predict actions. |
Yi Li; Yuquan Deng; Jesse Zhang; Joel Jang; Marius Memmel; Caelan Reed Garrett; Fabio Ramos; Dieter Fox; Anqi Li; Abhishek Gupta; Ankit Goyal; | code |
| 165 | Beyond Correlation: The Impact of Human Uncertainty in Measuring The Effectiveness of Automatic Evaluation and LLM-as-a-judge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show how *relying on a single aggregate correlation score* can obscure fundamental differences between human labels and those from automatic evaluation, including LLM-as-a-Judge. Based on these findings, we first propose *stratifying data by human label uncertainty* to provide a more robust analysis of automatic evaluation performance. |
Aparna Elangovan; Lei Xu; Jongwoo Ko; Mahsa Elyasi; Ling Liu; Sravan Babu Bodapati; Dan Roth; | code |
| 166 | Probabilistic Language-Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Probabilistic Language-Image Pre-training (ProLIP), the first probabilistic VLM pre-trained on a billion-scale image-text dataset using only probabilistic objectives, achieving a strong zero-shot capability (e.g., 74.6% ImageNet zero-shot accuracy with ViT-B/16). |
Sanghyuk Chun; Wonjae Kim; Song Park; Sangdoo Yun; | code |
| 167 | Multiple Heads Are Better Than One: Mixture of Modality Knowledge Experts for Entity Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel framework with Mixture of Modality Knowledge experts (MOMOK for short) to learn adaptive multi-modal entity representations for better MMKGC. |
Yichi Zhang; Zhuo Chen; Lingbing Guo; yajing Xu; Binbin Hu; Ziqi Liu; Wen Zhang; Huajun Chen; | code |
| 168 | LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LongMemEval, a comprehensive benchmark designed to evaluate five core long-term memory abilities of chat assistants: information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention. |
Di Wu; Hongwei Wang; Wenhao Yu; Yuwei Zhang; Kai-Wei Chang; Dong Yu; | code |
| 169 | Data Selection Via Optimal Control for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate data selection as a generalized Optimal Control problem, which can be solved theoretically by Pontryagin’s Maximum Principle (PMP), yielding a set of necessary conditions that characterize the relationship between optimal data selection and LM training dynamics. Based on these theoretical results, we introduce **P**MP-based **D**ata **S**election (**PDS**), a framework that approximates optimal data selection by solving the PMP conditions. |
Yuxian Gu; Li Dong; Hongning Wang; Yaru Hao; Qingxiu Dong; Furu Wei; Minlie Huang; | code |
| 170 | MiniPLM: Knowledge Distillation for Pre-training Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **MiniPLM**, a KD framework for pre-training LMs by refining the training data distribution with the teacher LM’s knowledge. |
Yuxian Gu; Hao Zhou; Fandong Meng; Jie Zhou; Minlie Huang; | code |
| 171 | CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CURIE, a scientific long-Context Understanding, Reasoning, and Information Extraction benchmark to measure the potential of Large Language Models (LLMs) in scientific problem-solving and assisting scientists in realistic workflows. |
Hao Cui; Zahra Shamsi; Gowoon Cheon; Xuejian Ma; Shutong Li; Maria Tikhanovskaya; Peter Christian Norgaard; Nayantara Mudur; Martyna Beata Plomecka; Paul Raccuglia; Yasaman Bahri; Victor V. Albert; Pranesh Srinivasan; Haining Pan; Philippe Faist; Brian A Rohr; Michael J. Statt; Dan Morris; Drew Purves; Elise Kleeman; Ruth Alcantara; Matthew Abraham; Muqthar Mohammad; Ean Phing VanLee; Chenfei Jiang; Elizabeth Dorfman; Eun-Ah Kim; Michael Brenner; Sameera S Ponda; Subhashini Venugopalan; | code |
| 172 | Fine-Grained Verifiers: Preference Modeling As Next-token Prediction in Vision-Language Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FiSAO (Fine-Grained Self-Alignment Optimization), a novel self-alignment method that utilizes the model’s own visual encoder as a fine-grained verifier to improve vision-language alignment without the need for additional data. |
Chenhang Cui; An Zhang; Yiyang Zhou; Zhaorun Chen; Gelei Deng; Huaxiu Yao; Tat-Seng Chua; | code |
| 173 | Mixture Compressor for Mixture-of-Experts LLMs Gains More Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by these issues, we investigate the MoE-LLMs and make two key observations: a) different experts exhibit varying behaviors on activation reconstruction error, routing scores, and activated frequencies, highlighting their differing importance, and b) not all tokens are equally important; only a small subset is critical. Building on these insights, we propose MC, a training-free Mixture-Compressor for MoE-LLMs, which leverages the significance of both experts and tokens to achieve an extreme compression. |
Wei Huang; Yue Liao; Jianhui Liu; Ruifei He; Haoru Tan; Shiming Zhang; Hongsheng Li; Si Liu; XIAOJUAN QI; | code |
| 174 | An Undetectable Watermark for Generative Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first undetectable watermarking scheme for generative image models. |
Sam Gunn; Xuandong Zhao; Dawn Song; | code |
| 175 | SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent reflection-based methods aim to address these issues by enabling self-reflection and self-correction, but they still face challenges in independently detecting errors in their reasoning steps. To overcome these limitations, we propose SuperCorrect, a novel two-stage framework that uses a large teacher model to supervise and correct both the reasoning and reflection processes of a smaller student model. |
Ling Yang; Zhaochen Yu; Tianjun Zhang; Minkai Xu; Joseph E. Gonzalez; Bin CUI; Shuicheng YAN; | code |
| 176 | Improved Techniques for Optimization-Based Jailbreaking on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present several improved (empirical) techniques for optimization-based jailbreaks like GCG. |
Xiaojun Jia; Tianyu Pang; Chao Du; Yihao Huang; Jindong Gu; Yang Liu; Xiaochun Cao; Min Lin; | code |
| 177 | Fine-Tuning Discrete Diffusion Models Via Reward Optimization with Applications to DNA and Protein Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then formulate the reward maximization problem within discrete diffusion models, analogous to reinforcement learning (RL), while minimizing the KL divergence against pre-trained diffusion models to preserve naturalness. To solve this RL problem, we propose a novel algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models, by making the originally non-differentiable trajectories differentiable using the Gumbel-Softmax trick. |
Chenyu Wang; Masatoshi Uehara; Yichun He; Amy Wang; Avantika Lal; Tommi Jaakkola; Sergey Levine; Aviv Regev; Hanchen; Tommaso Biancalani; | code |
| 178 | Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we for the first time reveal the vulnerability of safety alignment in FedIT by proposing a simple, stealthy, yet effective safety attack method. |
Rui Ye; Jingyi Chai; Xiangrui Liu; Yaodong Yang; Yanfeng Wang; Siheng Chen; | code |
| 179 | Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. |
Andy K Zhang; Neil Perry; Riya Dulepet; Joey Ji; Celeste Menders; Justin W Lin; Eliot Jones; Gashon Hussein; Samantha Liu; Donovan Julian Jasper; Pura Peetathawatchai; Ari Glenn; Vikram Sivashankar; Daniel Zamoshchin; Leo Glikbarg; Derek Askaryar; Haoxiang Yang; Aolin Zhang; Rishi Alluri; Nathan Tran; Rinnara Sangpisit; Kenny O Oseleononmen; Dan Boneh; Daniel E. Ho; Percy Liang; | code |
| 180 | Competing Large Language Models in Multi-Agent Gaming Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce GAMA($\gamma$)-Bench, a new framework for evaluating LLMs’ Gaming Ability in Multi-Agent environments. |
Jen-tse Huang; Eric John Li; Man Ho LAM; Tian Liang; Wenxuan Wang; Youliang Yuan; Wenxiang Jiao; Xing Wang; Zhaopeng Tu; Michael Lyu; | code |
| 181 | Poison-splat: Computation Cost Attack on 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in this work, we reveal a significant security vulnerability that has been largely overlooked in 3DGS: the computation cost of training 3DGS could be maliciously tampered with by poisoning the input data. By developing an attack named Poison-splat, we reveal a novel attack surface where the adversary can poison the input images to drastically increase the computation memory and time needed for 3DGS training, pushing the algorithm towards its worst computation complexity. |
Jiahao Lu; Yifan Zhang; Qiuhong Shen; Xinchao Wang; Shuicheng YAN; | code |
| 182 | 3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation Highlight: In this paper, we introduce Depth-Driven Decoupled Image Synthesis (3DIS), a novel framework that decouples the MIG process into two stages: (i) generating a coarse scene depth map for accurate instance positioning and scene composition, and (ii) rendering fine-grained attributes using pre-trained ControlNet on any foundational model, without additional training. |
dewei Zhou; Ji Xie; Zongxin Yang; Yi Yang; | code |
| 183 | AutoG: Towards Automatic Graph Construction from Tabular Data Highlight: Our research aims to address this gap by formalizing the graph construction problem and proposing an effective solution. First, we introduce a set of datasets to formalize and evaluate graph construction methods. |
Zhikai Chen; Han Xie; Jian Zhang; Xiang song; Jiliang Tang; Huzefa Rangwala; George Karypis; | code |
| 184 | Generative Flows on Synthetic Pathway for Drug Design Highlight: In this paper, we propose RxnFlow, which sequentially assembles molecules using predefined molecular building blocks and chemical reaction templates to constrain the synthetic chemical pathway. |
Seonghwan Seo; Minsu Kim; Tony Shen; Martin Ester; Jinkyoo Park; Sungsoo Ahn; Woo Youn Kim; | code |
| 185 | Small Models Are LLM Knowledge Triggers for Medical Tabular Prediction Highlight: In this paper, we propose SERSAL, a general self-prompting method by synergy learning with small models to enhance LLM tabular prediction in an unsupervised manner. |
Jiahuan Yan; Jintai Chen; Chaowen Hu; Bo Zheng; Yaojun Hu; Jimeng Sun; Jian Wu; | code |
| 186 | Diffusion Feedback Helps CLIP See Better Highlight: The main reason could be that the image-text pairs used to train CLIP are inherently biased, due to a lack of distinctiveness in the text and diversity in the images. In this work, we present a simple post-training approach for CLIP models, which largely overcomes its visual shortcomings via a self-supervised diffusion process. |
Wenxuan Wang; Quan Sun; Fan Zhang; Yepeng Tang; Jing Liu; Xinlong Wang; | code |
| 187 | CR-CTC: Consistency Regularization on CTC for Improved Speech Recognition Highlight: In this work, we propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram. |
Zengwei Yao; Wei Kang; Xiaoyu Yang; Fangjun Kuang; Liyong Guo; Han Zhu; Zengrui Jin; Zhaoqing Li; Long Lin; Daniel Povey; | code |
| 188 | Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning Highlight: We present a novel end-to-end framework that generates highly compact (typically 6-15 dimensions), discrete (int4 type), and interpretable node representations—termed node identifiers (node IDs)—to tackle inference challenges on large-scale graphs. |
Yuankai Luo; Hongkang Li; Qijiong Liu; Lei Shi; Xiao-Ming Wu; | code |
| 189 | SpikeLLM: Scaling Up Spiking Neural Network to Large Language Models Via Saliency-based Spiking Highlight: Coupled with the proposed model, two essential approaches are proposed to improve spike training efficiency: Generalized Integrate-and-Fire (GIF) neurons to compress spike length from $T$ to $\frac{T}{L} \log_2 L$ bits, and an Optimal Brain Spiking framework to divide outlier channels and allocate different $T$ for GIF neurons, which further compresses spike length to approximately $\log_2 T$ bits. |
Xingrun Xing; Boyan Gao; Zheng Liu; David A. Clifton; Shitao Xiao; Wanpeng Zhang; Li Du; Zheng Zhang; Guoqi Li; Jiajun Zhang; | code |
| 190 | Syntactic and Semantic Control of Large Language Models Via Sequential Monte Carlo Highlight: In this work, we develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC). |
João Loula; Benjamin LeBrun; Li Du; Ben Lipkin; Clemente Pasti; Gabriel Grand; Tianyu Liu; Yahya Emara; Marjorie Freedman; Jason Eisner; Ryan Cotterell; Vikash Mansinghka; Alexander K. Lew; Tim Vieira; Timothy J. O’Donnell; | code |
| 191 | Diversity-Rewarded CFG Distillation Highlight: However, CFG doubles inference cost while limiting originality and diversity across generated content. In this paper, we introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations. |
Geoffrey Cideron; Andrea Agostinelli; Johan Ferret; Sertan Girgin; Romuald Elie; Olivier Bachem; Sarah Perrin; Alexandre Rame; | code |
| 192 | ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning Highlight: Furthermore, large language models (LLMs) encounter difficulties handling domain-specific formulas, executing reasoning steps accurately, and integrating code effectively when tackling chemical reasoning tasks. To address these challenges, we present ChemAgent, a novel framework designed to improve the performance of LLMs through a dynamic, self-updating library. |
Xiangru Tang; Tianyu Hu; Muyang Ye; Yanjun Shao; Xunjian Yin; Siru Ouyang; Wangchunshu Zhou; Pan Lu; Zhuosheng Zhang; Yilun Zhao; Arman Cohan; Mark Gerstein; | code |
| 193 | Regulatory DNA Sequence Design with Reinforcement Learning Highlight: Current CRE design methods are limited by two major drawbacks: (1) they typically rely on iterative optimization strategies that modify existing sequences and are prone to local optima, and (2) they lack the guidance of biological prior knowledge in sequence optimization. In this paper, we address these limitations by proposing a generative approach that leverages reinforcement learning (RL) to fine-tune a pre-trained autoregressive (AR) model. |
Zhao Yang; Bing Su; Chuan Cao; Ji-Rong Wen; | code |
| 194 | MLLM Can See? Dynamic Correction Decoding for Hallucination Mitigation Highlight: Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena, but the underlying reasons remain poorly understood. In this paper, we present an empirical analysis and find that, although MLLMs incorrectly generate the objects in the final output, they are actually able to recognize visual objects in the preceding layers. |
Chenxi Wang; Xiang Chen; Ningyu Zhang; Bozhong Tian; Haoming Xu; Shumin Deng; Huajun Chen; | code |
| 195 | Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later Highlight: The widespread enthusiasm for deep learning has recently expanded into the domain of tabular data. Recognizing that the advancement in deep tabular methods is often inspired by classical methods, e.g., integration of nearest neighbors into neural networks, we investigate whether these classical methods can be revitalized with modern techniques. |
Han-Jia Ye; Huai-Hong Yin; De-Chuan Zhan; Wei-Lun Chao; | code |
| 196 | Physics-Informed Diffusion Models Highlight: We present a framework that unifies generative modeling and partial differential equation fulfillment by introducing a first-principle-based loss term that enforces generated samples to fulfill the underlying physical constraints. |
Jan-Hendrik Bastek; WaiChing Sun; Dennis Kochmann; | code |
| 197 | FakeShield: Explainable Image Forgery Detection and Localization Via Multi-modal Large Language Models Highlight: Although current image forgery detection and localization (IFDL) methods are generally effective, they tend to face two challenges: \textbf{1)} black-box nature with unknown detection principle, \textbf{2)} limited generalization across diverse tampering methods (e.g., Photoshop, DeepFake, AIGC-Editing). To address these issues, we propose the explainable IFDL task and design FakeShield, a multi-modal framework capable of evaluating image authenticity, generating tampered region masks, and providing a judgment basis based on pixel-level and image-level tampering clues. |
Zhipei Xu; Xuanyu Zhang; Runyi Li; Zecheng Tang; Qing Huang; Jian Zhang; | code |
| 198 | Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards Highlight: In this paper, we reveal the susceptibility and vulnerability of Vision-Language (VL) models to significant biases arising from gradual drift and sudden drift, particularly during pre-training. Furthermore, we create a set of multi-modal datasets called OpenMMlo, specifically tailored for the long-tailed open-world setting, to validate our findings. |
Xiaoyu Yang; Jie Lu; En Yu; | code |
| 199 | Efficient Inference for Large Language Model-based Generative Recommendation Highlight: To this end, we propose an alignment framework named AtSpeed, which presents the AtSpeed-S optimization objective for top-K alignment under the strict top-K verification. |
Xinyu Lin; Chaoqun Yang; Wenjie Wang; Yongqi Li; Cunxiao Du; Fuli Feng; See-Kiong Ng; Tat-Seng Chua; | code |
| 200 | MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations Highlight: In this work, we reveal that **up to 33\% or 76\% of MQuAKE’s questions and ground truth labels are, in fact, corrupted in various fashions due to some unintentional clerical or procedural oversights**. |
Shaochen Zhong; Yifan Lu; Lize Shao; Bhargav Bhushanam; Xiaocong Du; Yixin Wan; Yucheng Shi; Daochen Zha; Yiwei Wang; Ninghao Liu; Kaixiong Zhou; Shuai Xu; Kai-Wei Chang; Louis Feng; Vipin Chaudhary; Xia Hu; | code |
| 201 | DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from A Single Demo Highlight: To this end, we present DenseMatcher, a method capable of computing 3D correspondences between in-the-wild objects that share similar structures. |
Junzhe Zhu; Yuanchen Ju; Junyi Zhang; Muhan Wang; Zhecheng Yuan; Kaizhe Hu; Huazhe Xu; | code |
| 202 | I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength Highlight: In this work, we propose I2VControl-Camera, a novel camera control method that significantly enhances controllability while providing adjustability over the strength of subject motion. |
Wanquan Feng; Jiawei Liu; Pengqi Tu; Tianhao Qi; Mingzhen Sun; Tianxiang Ma; Songtao Zhao; SiYu Zhou; Qian HE; | code |
| 203 | MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards Highlight: Targeting more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. |
Sheng Wang; Liheng Chen; Pengan CHEN; Jingwei Dong; Boyang XUE; Jiyue Jiang; Lingpeng Kong; Chuan Wu; | code |
| 204 | VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing Highlight: The major difficulties in multi-grained editing include semantic misalignment of text-to-region control and feature coupling within the diffusion model. To address these difficulties, we present VideoGrain, a zero-shot approach that modulates space-time (cross- and self-) attention mechanisms to achieve fine-grained control over video content. |
Xiangpeng Yang; Linchao Zhu; Hehe Fan; Yi Yang; | code |
| 205 | ReAttention: Training-Free Infinite Context with Finite Attention Scope Highlight: In this work, we propose \textbf{ReAttention}, a training-free approach enabling LLMs based on the self-attention mechanism to support an infinite context with a finite attention scope under sufficient memory resources. |
Xiaoran Liu; Ruixiao Li; Zhigeng Liu; Qipeng Guo; Yuerong Song; Kai Lv; Hang Yan; Linlin Li; Qun Liu; Xipeng Qiu; | code |
| 206 | Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation Through Diffusion Inversion Highlight: This limitation hampers their practical application in real-world settings. To address this, we propose ***Stem-OB*** that leverages the inversion process of pretrained image diffusion models to suppress low-level visual differences while maintaining high-level scene structures. |
Kaizhe Hu; Zihang Rui; Yao He; Yuyao Liu; Pu Hua; Huazhe Xu; | code |
| 207 | CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models Highlight: More specifically, inspired by the recent advancements of diffusion model-based inverse problem solvers (DIS), we reformulate text-guidance as an inverse problem with a text-conditioned score matching loss and develop CFG++, a novel approach that tackles the off-manifold challenges inherent in traditional CFG. |
Hyungjin Chung; Jeongsol Kim; Geon Yeong Park; Hyelin Nam; Jong Chul Ye; | code |
| 208 | GOAL: A Generalist Combinatorial Optimization Agent Learner Highlight: In this paper, we propose GOAL (for Generalist combinatorial Optimization Agent Learner), a generalist model capable of efficiently solving multiple COPs and which can be fine-tuned to solve new COPs. |
Darko Drakulic; Sofia Michel; Jean-Marc Andreoli; | code |
| 209 | Collapsed Language Models Promote Fairness Highlight: In this work, by rigorous evaluations of Neural Collapse — a learning phenomenon that occurs in the last-layer representations and classifiers of deep networks — on fairness-related words, we find that debiased language models exhibit collapsed alignment between token representations and word embeddings. |
Jingxuan Xu; Wuyang Chen; Linyi Li; Yao Zhao; Yunchao Wei; | code |
| 210 | Exploring The Design Space of Visual Context Representation in Video MLLMs Highlight: In this paper, we explore the design space for visual context representation, and aim to improve the performance of video MLLMs by finding more effective representation schemes. |
Yifan Du; Yuqi Huo; Kun Zhou; Zijia Zhao; Haoyu Lu; Han Huang; Xin Zhao; Bingning Wang; weipeng chen; Ji-Rong Wen; | code |
| 211 | GOFA: A Generative One-For-All Model for Joint Graph Language Modeling Highlight: In this paper, we first identify three key desirable properties of a GFM: self-supervised pretraining, fluidity in tasks, and graph awareness. To account for these properties, we extend the conventional language modeling to the graph domain and propose a novel generative graph language model GOFA. |
Lecheng Kong; Jiarui Feng; Hao Liu; Chengsong Huang; Jiaxin Huang; Yixin Chen; Muhan Zhang; | code |
| 212 | Learning System Dynamics Without Forgetting Highlight: In response, we propose the Mode-switching Graph ODE (MS-GODE) model, which integrates the strengths of LG-ODE and sub-network learning with a mode-switching module, enabling efficient learning over varying dynamics. Moreover, we construct a novel benchmark of biological dynamic systems for CDL, Bio-CDL, featuring diverse systems with disparate dynamics and significantly enriching the research field of machine learning for dynamic systems. |
Xikun ZHANG; Dongjin Song; Yushan Jiang; Yixin Chen; Dacheng Tao; | code |
| 213 | Model Merging with SVD to Tie The Knots Highlight: We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. |
George Stoica; Pratik Ramesh; Boglarka Ecsedi; Leshem Choshen; Judy Hoffman; | code |
| 214 | An Empirical Analysis of Uncertainty in Large Language Model Evaluations Highlight: In this paper, we conduct extensive experiments involving 9 widely used LLM evaluators across 2 different evaluation settings to investigate the uncertainty in model-based LLM evaluations. |
Qiujie Xie; Qingqiu Li; Zhuohao Yu; Yuejie Zhang; Yue Zhang; Linyi Yang; | code |
| 215 | 3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling Highlight: To address the limitations, we propose \textbf{3D-MolT5}, a unified framework designed to model molecules in both sequence and 3D structure spaces. |
Qizhi Pei; Rui Yan; Kaiyuan Gao; Jinhua Zhu; Lijun Wu; | code |
| 216 | CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding Highlight: Secondly, existing EEG foundation models have limited generalizability on a wide range of downstream BCI tasks due to varying formats of EEG data, making them challenging to adapt to. To address these challenges, we propose a novel foundation model called CBraMod. |
Jiquan Wang; Sha Zhao; Zhiling Luo; Yangxuan Zhou; Haiteng Jiang; Shijian Li; Tao Li; Gang Pan; | code |
| 217 | Diffusion State-Guided Projected Gradient for Inverse Problems Highlight: To enhance the performance and robustness of diffusion models in solving inverse problems, we propose Diffusion State-Guided Projected Gradient (DiffStateGrad), which projects the measurement gradient onto a subspace that is a low-rank approximation of an intermediate state of the diffusion process. |
Rayhan Zirvi; Bahareh Tolooshams; Anima Anandkumar; | code |
| 218 | EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning Highlight: Recent advancements in video diffusion models have established a strong foundation for developing world models with practical applications. The next challenge lies in exploring how an agent can leverage these foundation models to understand, interact with, and plan within observed environments. This requires adding more controllability to the model, transforming it into a versatile game engine capable of dynamic manipulation and control. To address this, we investigated three key conditioning factors: camera, context frame, and text, identifying limitations in current model designs. |
Wei Yu; Songheng Yin; Steve Easterbrook; Animesh Garg; | code |
| 219 | Bootstrapping Language Models with DPO Implicit Rewards Highlight: DPO, after training, provides an implicit reward model. In this work, we make a novel observation that this implicit reward model can by itself be used in a bootstrapping fashion to further align the LLM. |
Changyu Chen; Zichen Liu; Chao Du; Tianyu Pang; Qian Liu; Arunesh Sinha; Pradeep Varakantham; Min Lin; | code |
| 220 | Intermediate Layer Classifiers for OOD Generalization Highlight: In this work, we question the use of last-layer representations for out-of-distribution (OOD) generalisation and explore the utility of intermediate layers. |
Arnas Uselis; Seong Joon Oh; | code |
| 221 | BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments Highlight: In this paper, we introduce $\textbf{BitStack}$, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance. |
Xinghao Wang; Pengyu Wang; Bo Wang; Dong Zhang; Yunhua Zhou; Xipeng Qiu; | code |
| 222 | Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents Highlight: In this paper, we explain the process of information retrieval with a causal graph and discover that PLM-based retrievers learn perplexity features for relevance estimation, causing source bias by ranking the documents with low perplexity higher. |
Haoyu Wang; Sunhao Dai; Haiyuan Zhao; Liang Pang; Xiao Zhang; Gang Wang; Zhenhua Dong; Jun Xu; Ji-Rong Wen; | code |
| 223 | TIPS: Text-Image Pretraining with Spatial Awareness Highlight: For this reason, self-supervised image-only pretraining is still the go-to method for many dense vision applications (e.g. depth estimation, semantic segmentation), despite the lack of explicit supervisory signals. In this paper, we close this gap between image-text and self-supervised learning, by proposing a novel general-purpose image-text model, which can be effectively used off the shelf for dense and global vision tasks. |
Kevis-kokitsi Maninis; Kaifeng Chen; Soham Ghosh; Arjun Karpur; Koert Chen; Ye Xia; Bingyi Cao; Daniel Salz; Guangxing Han; Jan Dlabal; Dan Gnanapragasam; Mojtaba Seyedhosseini; Howard Zhou; Andre Araujo; | code |
| 224 | Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation Highlight: In this work, we introduce Think-on-Graph 2.0 (ToG-2), a hybrid RAG framework that iteratively retrieves information from both unstructured and structured knowledge sources in a tight-coupling manner. |
Shengjie Ma; Chengjin Xu; Xuhui Jiang; Muzhi Li; Huaren Qu; Cehao Yang; Jiaxin Mao; Jian Guo; | code |
| 225 | Learning LLM-as-a-Judge for Preference Alignment Highlight: This proposal of learning the LLM-as-a-Judge using self-generated Contrastive judgments (Con-J) ensures natural interpretability through the generated rationales supporting the judgments, and demonstrates higher robustness against bias compared to scalar models. |
Ziyi Ye; Xiangsheng Li; Qiuchi Li; Qingyao Ai; Yujia Zhou; Wei Shen; Dong Yan; Yiqun LIU; | code |
| 226 | STAMP: Scalable Task- And Model-agnostic Collaborative Perception Highlight: Yet, the heterogeneity among agents—in terms of sensors, models, and tasks—significantly hinders effective and efficient cross-agent collaboration. To address these challenges, we propose STAMP, a scalable task- and model-agnostic collaborative perception framework tailored for heterogeneous agents. |
Xiangbo Gao; Runsheng Xu; Jiachen Li; Ziran Wang; Zhiwen Fan; Zhengzhong Tu; | code |
| 227 | Radar: Fast Long-Context Decoding for Any Transformer Highlight: In this work, we propose Radar, a training-free approach that accelerates inference by dynamically searching for the most important context tokens. |
Yongchang Hao; Mengyao Zhai; Hossein Hajimirsadeghi; Sepidehsadat Hosseini; Frederick Tung; | code |
| 228 | Unveiling The Magic of Code Reasoning Through Hypothesis Decomposition and Amendment Highlight: However, tasks that embody both reasoning and recall characteristics are often overlooked. In this paper, we introduce such a novel task, code reasoning, to provide a new perspective for the reasoning abilities of LLMs. |
Yuze Zhao; Tianyun Ji; Wenjun Feng; Zhenya Huang; Qi Liu; Zhiding Liu; Yixiao Ma; Kai Zhang; Enhong Chen; | code |
| 229 | NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation Highlight: While prior efforts focus on 3D diffusion models for their benefits in modeling continuous 3D conformers, they overlook the advantages of 1D SELFIES-based Language Models (LMs), which can generate 100\% valid molecules and leverage the billion-scale 1D molecule datasets. To combine these advantages for 3D molecule generation, we propose a foundation model — NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation. |
Zhiyuan Liu; Yanchen Luo; Han Huang; Enzhi Zhang; Sihang Li; Junfeng Fang; Yaorui Shi; Xiang Wang; Kenji Kawaguchi; Tat-Seng Chua; | code |
| 230 | ToolACE: Winning The Points of LLM Function Calling Highlight: In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data, specifically tailored to the capabilities of LLMs. |
Weiwen Liu; Xu Huang; Xingshan Zeng; xinlong hao; Shuai Yu; Dexun Li; Shuai Wang; Weinan Gan; Zhengying Liu; Yuanqing Yu; Zezhong WANG; Yuxian Wang; Wu Ning; Yutai Hou; Bin Wang; Chuhan Wu; Wang Xinzhi; Yong Liu; Yasheng Wang; Duyu Tang; Dandan Tu; Lifeng Shang; Xin Jiang; Ruiming Tang; Defu Lian; Qun Liu; Enhong Chen; | code |
| 231 | Diffusion On Syntax Trees For Program Synthesis Highlight: Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. |
Shreyas Kapur; Erik Jenner; Stuart Russell; | code |
| 232 | CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning Highlight: In this paper, we study the comprehensive methodology that includes: (1) a flexible design of manipulations based on extensive analysis, (2) an efficient automated data generation pipeline, (3) a compatible VLM architecture capable of multi-turn, multi-image, and (4) a model training process for versatile capabilities. |
Ji Qi; Ming Ding; Weihan Wang; Yushi Bai; Qingsong Lv; Wenyi Hong; Bin Xu; Lei Hou; Juanzi Li; Yuxiao Dong; Jie Tang; | code |
| 233 | PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance Highlight: In this work, we construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing. |
Qijun Gan; Song Wang; Shengtao Wu; Jianke Zhu; | code |
| 234 | DECO: Unleashing The Potential of ConvNets for Query-based Detection and Segmentation Highlight: To this end, in this paper we explore whether we can build a query-based detection and segmentation framework with ConvNets instead of a sophisticated transformer architecture. |
Xinghao Chen; Siwei Li; Yijing Yang; Yunhe Wang; | code |
| 235 | RegMix: Data Mixture As Regression for Language Model Pre-training Highlight: We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task. |
Qian Liu; Xiaosen Zheng; Niklas Muennighoff; Guangtao Zeng; Longxu Dou; Tianyu Pang; Jing Jiang; Min Lin; | code |
| 236 | SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Highlight: However, a key challenge remains in downstream task applications: how to effectively and efficiently adapt pre-trained diffusion models to new tasks. Inspired by model pruning which lightens large pre-trained models by removing unimportant parameters, we propose a novel model fine-tuning method to make full use of these ineffective parameters and enable the pre-trained model with new task-specified capabilities. |
Teng Hu; Jiangning Zhang; Ran Yi; Hongrui Huang; Yabiao Wang; Lizhuang Ma; | code |
| 237 | Symbolic Regression Via MDLformer-guided Search: from Minimizing Prediction Error to Minimizing Description Length Highlight: However, since formulas with similar function shapes may have completely different symbolic forms, the prediction error does not decrease monotonously as the search approaches the target formula, causing the low recovery rate of existing methods. To solve this problem, we propose a novel search objective based on the minimum description length, which reflects the distance from the target and decreases monotonically as the search approaches the correct form of the target formula. |
Zihan Yu; Jingtao Ding; Yong Li; Depeng Jin; | code |
| 238 | Better Than Your Teacher: LLM Agents That Learn from Privileged AI Feedback Highlight: We propose LEAP, an iterative fine-tuning framework that continually improves LLM agents using feedback from AI expert teachers. |
Sanjiban Choudhury; Paloma Sodhi; | code |
| 239 | FreeVS: Generative View Synthesis on Free Driving Trajectory Highlight: We propose FreeVS, a novel fully generative approach that can synthesize camera views on free new trajectories in real driving scenes. Moreover, we propose two new challenging benchmarks tailored to driving scenes, which are novel camera synthesis and novel trajectory synthesis, emphasizing the freedom of viewpoints. |
Qitai Wang; Lue Fan; Yuqi Wang; Yuntao Chen; Zhaoxiang Zhang; | code |
| 240 | ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler Highlight: Unfortunately, existing approaches that fuse temporally forward and backward paths in parallel often suffer from off-manifold issues, leading to artifacts or requiring multiple iterative re-noising steps. In this work, we introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning. |
Serin Yang; Taesung Kwon; Jong Chul Ye; | code |
| 241 | IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce IterComp, a novel framework that aggregates composition-aware model preferences from multiple models and employs an iterative feedback learning approach to enhance compositional generation.Based on these metrics, we develop a composition-aware model preference dataset comprising numerous image-rank pairs to train composition-aware reward models. |
Xinchen Zhang; Ling Yang; Guohao Li; YaQi Cai; xie jiake; Yong Tang; Yujiu Yang; Mengdi Wang; Bin CUI; | code |
| 242 | Combatting Dimensional Collapse in LLM Pre-Training Data Via Submodular File Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Empirically, we establish a benchmark and conduct extensive experiments on the TinyLlama architecture with models from 120M to 1.1B parameters. |
Ziqing Fan; Siyuan Du; Shengchao Hu; Pingjie Wang; Li Shen; Ya Zhang; Dacheng Tao; Yanfeng Wang; | code |
| 243 | Preference Diffusion for Recommendation. Highlight: These approaches are either suboptimal for personalized ranking tasks or fail to exploit the full generative potential of DMs. To address these limitations, we propose \textbf{PreferDiff}, an optimization objective tailored for DM-based recommenders. | Shuo Liu; An Zhang; Guoqing Hu; Hong Qian; Tat-Seng Chua | code |
| 244 | InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences. Highlight: However, current studies primarily focus on natural image restoration, leaving the performance of these algorithms in scientific inverse problems largely unexplored. To address this gap, we introduce \textsc{InverseBench}, a framework that evaluates diffusion models across five distinct scientific inverse problems. | Hongkai Zheng; Wenda Chu; Bingliang Zhang; Zihui Wu; Austin Wang; Berthy Feng; Caifeng Zou; Yu Sun; Nikola Borislavov Kovachki; Zachary E Ross; Katherine Bouman; Yisong Yue | code |
| 245 | VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? Highlight: Relying solely on Automatic Speech Recognition (ASR) can lead to the loss of valuable auditory cues, thereby weakening the system’s ability to generate contextually appropriate responses. To address this limitation, we propose \textbf{VoxDialogue}, a comprehensive benchmark for evaluating the ability of spoken dialogue systems to understand multi-modal information beyond text. | Xize Cheng; Ruofan Hu; Xiaoda Yang; Jingyu Lu; Dongjie Fu; Zehan Wang; Shengpeng Ji; Rongjie Huang; Boyang Zhang; Tao Jin; Zhou Zhao | code |
| 246 | OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup. Highlight: However, most existing QSS methods rely on a single modality for separation, lacking the ability to fully leverage homologous but heterogeneous information across multiple modalities for the same sound signal. To address this limitation, we introduce Omni-modal Sound Separation (**OmniSep**), a novel framework capable of isolating clean soundtracks based on omni-modal queries, encompassing both single-modal and multi-modal composed queries. | Xize Cheng; Siqi Zheng; Zehan Wang; Minghui Fang; Ziang Zhang; Rongjie Huang; Shengpeng Ji; Jialong Zuo; Tao Jin; Zhou Zhao | code |
| 247 | Test-time Adaptation for Cross-modal Retrieval with Query Shift. Highlight: Based on the observations, we propose a novel method dubbed Test-time adaptation for Cross-modal Retrieval (TCR). | Haobin Li; Peng Hu; Qianjun Zhang; Xi Peng; Xiting Liu; Mouxing Yang | code |
| 248 | GenSE: Generative Speech Enhancement Via Language Models Using Hierarchical Modeling. Highlight: To enrich the SE model with semantic information, we employ language models as an efficient semantic learner and propose a comprehensive framework tailored for language model-based speech enhancement, called GenSE. | Jixun Yao; Hexin Liu; Chen Chen; Yuchen Hu; EngSiong Chng; Lei Xie | code |
| 249 | MonST3R: A Simple Approach for Estimating Geometry in The Presence of Motion. Highlight: In this paper, we present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. | Junyi Zhang; Charles Herrmann; Junhwa Hur; Varun Jampani; Trevor Darrell; Forrester Cole; Deqing Sun; Ming-Hsuan Yang | code |
| 250 | VisualAgentBench: Towards Large Multimodal Models As Visual Foundation Agents. Highlight: However, existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs as visual foundation agents in complex, real-world environments. To address this gap, we introduce VisualAgentBench (VAB), a comprehensive and unified benchmark specifically designed to train and evaluate LMMs as visual foundation agents across diverse scenarios in one standard setting, including Embodied, Graphical User Interface, and Visual Design, with tasks formulated to probe the depth of LMMs’ understanding and interaction capabilities. | Xiao Liu; Tianjie Zhang; Yu Gu; Iat Long Iong; Song XiXuan; Yifan Xu; Shudan Zhang; Hanyu Lai; Jiadai Sun; Xinyue Yang; Yu Yang; Zehan Qi; Shuntian Yao; Xueqiao Sun; Siyi Cheng; Qinkai Zheng; Hao Yu; Hanchen Zhang; Wenyi Hong; Ming Ding; Lihang Pan; Xiaotao Gu; Aohan Zeng; Zhengxiao Du; Chan Hee Song; Yu Su; Yuxiao Dong; Jie Tang | code |
| 251 | Agent-Oriented Planning in Multi-Agent Systems. Highlight: In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy, to ensure that each sub-task can be effectively resolved, resulting in satisfactory responses to user queries. | Ao Li; Yuexiang Xie; Songze Li; Fugee Tsung; Bolin Ding; Yaliang Li | code |
| 252 | Large Language Models Often Say One Thing and Do Another. Highlight: This paper explores a critical issue in assessing the reliability of LLMs: the consistency between their words and deeds. To quantitatively explore this consistency, we developed a novel evaluation benchmark called the Words and Deeds Consistency Test (WDCT). | Ruoxi Xu; Hongyu Lin; Xianpei Han; Jia Zheng; Weixiang Zhou; Le Sun; Yingfei Sun | code |
| 253 | On Speeding Up Language Model Evaluation. Highlight: This exhaustive evaluation can be time-consuming and costly. In this paper, we propose an \textit{adaptive} approach to explore this space. | Jin Peng Zhou; Christian K Belardi; Ruihan Wu; Travis Zhang; Carla P Gomes; Wen Sun; Kilian Q Weinberger | code |
| 254 | Towards Foundation Models for Mixed Integer Linear Programming. Highlight: As existing datasets for MILP lack diversity and volume, we introduce MILP-Evolve, a novel LLM-based evolutionary framework that is capable of generating a large set of diverse MILP classes with an unlimited amount of instances. | Sirui Li; Janardhan Kulkarni; Ishai Menache; Cathy Wu; Beibin Li | code |
| 255 | GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment. Highlight: However, existing test-time approaches rely on trajectory-level RMs which are designed to evaluate complete responses, making them unsuitable for autoregressive text generation that requires computing next-token rewards from partial responses. To address this, we introduce GenARM, a test-time alignment approach that leverages the Autoregressive Reward Model—a novel reward parametrization designed to predict next-token rewards for efficient and effective autoregressive generation. | Yuancheng Xu; Udari Madhushani Sehwag; Alec Koppel; Sicheng Zhu; Bang An; Furong Huang; Sumitra Ganesh | code |
| 256 | Cut The Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems. Highlight: Though impressive in performance, existing multi-agent pipelines inherently introduce substantial token overhead, as well as increased economic costs, which pose challenges for their large-scale deployments. In response to this challenge, we propose an economical, simple, and robust multi-agent communication framework, termed $\texttt{AgentPrune}$, which can seamlessly integrate into mainstream multi-agent systems and prunes redundant or even malicious communication messages. | Guibin Zhang; Yanwei Yue; Zhixun Li; Sukwon Yun; Guancheng Wan; Kun Wang; Dawei Cheng; Jeffrey Xu Yu; Tianlong Chen | code |
| 257 | Think While You Generate: Discrete Diffusion with Planned Denoising. Highlight: In this work, we introduce *Discrete Diffusion with Planned Denoising* (DDPD), a novel framework that separates the generation process into two models: a planner and a denoiser. | Sulin Liu; Juno Nam; Andrew Campbell; Hannes Stark; Yilun Xu; Tommi Jaakkola; Rafael Gomez-Bombarelli | code |
| 258 | Cross-Embodiment Dexterous Grasping with Reinforcement Learning. Highlight: In this work, we study the learning of cross-embodiment dexterous grasping policies using reinforcement learning (RL). | Haoqi Yuan; Bohan Zhou; Yuhui Fu; Zongqing Lu | code |
| 259 | Atomas: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation. Highlight: In this paper, we propose Atomas, a hierarchical molecular representation learning framework that jointly learns representations from SMILES strings and text. | Yikun Zhang; Geyan Ye; Chaohao Yuan; Bo Han; Long-Kai Huang; Jianhua Yao; Wei Liu; Yu Rong | code |
| 260 | RelitLRM: Generative Relightable Radiance for Large Reconstruction Models. Highlight: We propose RelitLRM, a Large Reconstruction Model (LRM) for generating high-quality Gaussian splatting representations of 3D objects under novel illuminations from sparse (4-8) posed images captured under unknown static lighting. | Tianyuan Zhang; Zhengfei Kuang; Haian Jin; Zexiang Xu; Sai Bi; Hao Tan; He Zhang; Yiwei Hu; Milos Hasan; William T. Freeman; Kai Zhang; Fujun Luan | code |
| 261 | AFlow: Automating Agentic Workflow Generation. Highlight: To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFLOW, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. | Jiayi Zhang; Jinyu Xiang; Zhaoyang Yu; Fengwei Teng; Xiong-Hui Chen; Jiaqi Chen; Mingchen Zhuge; Xin Cheng; Sirui Hong; Jinlin Wang; Bingnan Zheng; Bang Liu; Yuyu Luo; Chenglin Wu | code |
| 262 | Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization. Highlight: To address the gap, this paper proposes Neural Exploratory Landscape Analysis (NeurELA), a novel framework that dynamically profiles landscape features through a two-stage, attention-based neural network, executed in an entirely end-to-end fashion. | Zeyuan Ma; Jiacheng Chen; Hongshu Guo; Yue-Jiao Gong | code |
| 263 | LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding. Highlight: While speculative decoding has proven effective for accelerating LLMs by generating multiple tokens in a single forward pass, its application in visual AR models remains largely unexplored. In this work, we identify a challenge in this setting, which we term \textit{token selection ambiguity}, wherein visual AR models frequently assign uniformly low probabilities to tokens, hampering the performance of speculative decoding. | Doohyuk Jang; Sihwan Park; June Yong Yang; Yeonsung Jung; Jihun Yun; Souvik Kundu; Sung-Yub Kim; Eunho Yang | code |
| 264 | BANGS: Game-theoretic Node Selection for Graph Self-Training. Highlight: While selecting highly confident nodes has proven effective for self-training, this pseudo-labeling strategy ignores the combinatorial dependencies between nodes and suffers from a local view of the distribution. To overcome these issues, we propose BANGS, a novel framework that unifies the labeling strategy with conditional mutual information as the objective of node selection. | Fangxin Wang; Kay Liu; Sourav Medya; Philip S. Yu | code |
| 265 | ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy. Highlight: To reduce the demonstration reliance, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. | Chenrui Tie; Yue Chen; Ruihai Wu; Boxuan Dong; Zeyi Li; Chongkai Gao; Hao Dong | code |
| 266 | Pyramidal Flow Matching for Efficient Video Generative Modeling. Highlight: This work introduces a unified pyramidal flow matching algorithm. | Yang Jin; Zhicheng Sun; Ningyuan Li; Kun Xu; Kun Xu; Hao Jiang; Nan Zhuang; Quzhe Huang; Yang Song; Yadong MU; Zhouchen Lin | code |
| 267 | Boltzmann-Aligned Inverse Folding Model As A Predictor of Mutational Effects on Protein-Protein Interactions. Highlight: In this work, we propose a Boltzmann Alignment technique to transfer knowledge from pre-trained inverse folding models to the prediction of $\Delta\Delta G$. | Xiaoran Jiao; Weian Mao; Wengong Jin; Peiyuan Yang; Hao Chen; Chunhua Shen | code |
| 268 | GraphBridge: Towards Arbitrary Transfer Learning in GNNs. Highlight: This paper introduces **GraphBridge**, a novel framework to enable knowledge transfer across disparate tasks and domains in GNNs, circumventing the need for modifications to task configurations or graph structures. | Li Ju; Xingyi Yang; Qi Li; Xinchao Wang | code |
| 269 | CS-Bench: A Comprehensive Benchmark for Large Language Models Towards Computer Science Mastery. Highlight: However, the current community of LLMs overly focuses on benchmarks for analyzing specific foundational skills (e.g., mathematics and code generation), neglecting an all-round evaluation of the computer science field. To bridge this gap, we introduce CS-Bench, the first multilingual (English, Chinese, French, German) benchmark dedicated to evaluating the performance of LLMs in computer science. | Xiaoshuai Song; Muxi Diao; Guanting Dong; Zhengyang Wang; Yujia Fu; Runqi Qiao; Zhexu Wang; Dayuan Fu; Huangxuan Wu; Bin Liang; Weihao Zeng; Yejie Wang; Zhuoma GongQue; Jianing Yu; Qiuna Tan; Weiran Xu | code |
| 270 | Selective Aggregation for Low-Rank Adaptation in Federated Learning. Highlight: In doing so, we uncover that $A$ matrices are responsible for learning general knowledge, while $B$ matrices focus on capturing client-specific knowledge. Based on this finding, we introduce Federated Share-A Low-Rank Adaptation (FedSA-LoRA), which employs two low-rank trainable matrices $A$ and $B$ to model the weight update, but only $A$ matrices are shared with the server for aggregation. | Pengxin Guo; Shuang Zeng; Yanran Wang; Huijie Fan; Feifei Wang; Liangqiong Qu | code |
| 271 | ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge. Highlight: Diffusion models break down the challenging task of generating data from high-dimensional distributions into a series of easier denoising steps. Inspired by this paradigm, we propose a novel approach that extends the diffusion framework into modality space, decomposing the complex task of RGB image generation into simpler, interpretable stages. | Eslam Mohamed BAKR; Liangbing Zhao; Vincent Tao Hu; Matthieu Cord; Patrick Perez; Mohamed Elhoseiny | code |
| 272 | From Commands to Prompts: LLM-based Semantic File System for AIOS. Highlight: This paradigm poses a bottleneck to the usability of these systems as users are required to navigate complex folder hierarchies and remember cryptic file names. To address this limitation, we propose an LLM-based Semantic File System (LSFS) for prompt-driven file management in LLM Agent Operating System (AIOS). | Zeru Shi; Kai Mei; Mingyu Jin; Yongye Su; Chaoji Zuo; Wenyue Hua; Wujiang Xu; Yujie Ren; Zirui Liu; Mengnan Du; Dong Deng; Yongfeng Zhang | code |
| 273 | LoRA-Pro: Are Low-Rank Adapters Properly Optimized? Highlight: And this low-rank gradient can be expressed in terms of the gradients of the two low-rank matrices in LoRA. Leveraging this insight, we introduce LoRA-Pro, a method that enhances LoRA’s performance by strategically adjusting the gradients of these low-rank matrices. | Zhengbo Wang; Jian Liang; Ran He; Zilei Wang; Tieniu Tan | code |
| 274 | TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice. Highlight: In this work, rather than modifying the routing mechanism as done in previous studies, we propose the Ternary Choice MoE (TC-MoE), a novel approach that expands the expert space by applying the ternary set {-1, 0, 1} to each expert. | Shen Yan; Xingyan Bin; Sijun Zhang; Yisen Wang; Zhouchen Lin | code |
| 275 | SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models. Highlight: Such practice can introduce content variations irrelevant to whether the instruction is precisely followed (e.g., different expressions about the same semantics), interfering with the goal of teaching models to recognize the key differences that lead to improved instruction following. In light of this, we introduce SPaR, a self-play framework integrating tree-search self-refinement to yield valid and comparable preference pairs free from distractions. | Jiale Cheng; Xiao Liu; Cunxiang Wang; Xiaotao Gu; Yida Lu; Dan Zhang; Yuxiao Dong; Jie Tang; Hongning Wang; Minlie Huang | code |
| 276 | Framer: Interactive Frame Interpolation. Highlight: We propose Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity. | Wen Wang; Qiuyu Wang; Kecheng Zheng; Hao OUYANG; Zhekai Chen; Biao Gong; Hao Chen; Yujun Shen; Chunhua Shen | code |
| 277 | Universal Image Restoration Pre-training Via Degradation Classification. Highlight: This paper proposes Degradation Classification Pre-Training (DCPT), which enables models to learn how to classify the degradation type of input images for universal image restoration pre-training. | JiaKui Hu; Lujia Jin; Zhengjian Yao; Yanye Lu | code |
| 278 | Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models. Highlight: In this work, inspired by the observation that the text-to-image generation process is the inverse of image-conditioned response generation in LVLMs, we explore the potential of leveraging text-to-image generative models to assist in mitigating hallucinations in LVLMs. | Ce Zhang; Zifu Wan; Zhehan Kan; Martin Q. Ma; Simon Stepputtis; Deva Ramanan; Russ Salakhutdinov; Louis-Philippe Morency; Katia P. Sycara; Yaqi Xie | code |
| 279 | Semi-Supervised CLIP Adaptation By Enforcing Semantic and Trapezoidal Consistency. Highlight: However, when the downstream tasks are constrained by limited image-text paired data, CLIP struggles to effectively address the domain gap between the pre-training and the target tasks. To address this limitation, we propose a novel semi-supervised CLIP training method coined SemiCLIP that leverages a small amount of image-text pairs alongside a large volume of images without text descriptions to enhance CLIP’s cross-modal alignment. | Kai Gan; Bo Ye; Min-Ling Zhang; Tong Wei | code |
| 280 | A Closer Look at Machine Unlearning for Large Language Models. Highlight: In this paper, we discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches. | Xiaojian Yuan; Tianyu Pang; Chao Du; Kejiang Chen; Weiming Zhang; Min Lin | code |
| 281 | Hidden in The Noise: Two-Stage Robust Watermarking for Images. Highlight: However, detecting the watermark requires comparing the initial noise reconstructed for an image to all previously used initial noises. To mitigate these issues, we propose a two-stage watermarking framework for efficient detection. | Kasra Arabi; Benjamin Feuer; R. Teal Witter; Chinmay Hegde; Niv Cohen | code |
| 282 | Adding Conditional Control to Diffusion Models with Reinforcement Learning. Highlight: While these diffusion models trained on large datasets have achieved success, there is often a need to introduce additional controls in downstream fine-tuning processes, treating these powerful models as pre-trained diffusion models. This work presents a novel method based on reinforcement learning (RL) to add such controls using an offline dataset comprising inputs and labels. | Yulai Zhao; Masatoshi Uehara; Gabriele Scalia; Sunyuan Kung; Tommaso Biancalani; Sergey Levine; Ehsan Hajiramezanali | code |
| 283 | Image Watermarks Are Removable Using Controllable Regeneration from Clean Noise. Highlight: In this paper, we introduce a watermark removal approach capable of effectively nullifying state-of-the-art watermarking techniques. | Yepeng Liu; Yiren Song; Hai Ci; Yu Zhang; Haofan Wang; Mike Zheng Shou; Yuheng Bu | code |
| 284 | Budgeted Online Continual Learning By Adaptive Layer Freezing and Frequency-based Sampling. Highlight: Arguing that different computational and storage budgets hinder fair comparison among CL algorithms in practice, we propose to use floating point operations (FLOPs) and total memory size in bytes as metrics for computational and memory budgets, respectively, to compare and develop CL algorithms in the same ‘total resource budget.’ | Minhyuk Seo; Hyunseo Koh; Jonghyun Choi | code |
| 285 | HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models. Highlight: However, there is still significant potential for improvement in current text-to-image inpainting models, particularly in better aligning the inpainted area with user prompts. Therefore, we introduce $\textit{HD-Painter}$, a $\textbf{training-free}$ approach that $\textbf{accurately follows prompts}$. | Hayk Manukyan; Andranik Sargsyan; Barsegh Atanyan; Zhangyang Wang; Shant Navasardyan; Humphrey Shi | code |
| 286 | Graph Sparsification Via Mixture of Graphs. Highlight: In this paper, we introduce Mixture-of-Graphs (MoG), leveraging the concept of Mixture-of-Experts (MoE), to dynamically select tailored pruning solutions for each node. | Guibin Zhang; Xiangguo Sun; Yanwei Yue; Chonghe Jiang; Kun Wang; Tianlong Chen; Shirui Pan | code |
| 287 | SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models. Highlight: To comprehensively evaluate their capabilities, we introduce SPORTU, a benchmark designed to assess MLLMs across multi-level sports reasoning tasks. | Haotian Xia; Zhengbang Yang; Junbo Zou; Rhys Tracy; Yuqing Wang; Chi Lu; Christopher Lai; Yanjun He; Xun Shao; Zhuoqing Xie; Yuan-fang Wang; Weining Shen; Hanjie Chen | code |
| 288 | Population Transformer: Learning Population-level Representations of Neural Activity. Highlight: We present a self-supervised framework that learns population-level codes for arbitrary ensembles of neural recordings at scale. | Geeling Chau; Christopher Wang; Sabera J Talukder; Vighnesh Subramaniam; Saraswati Soedarmadji; Yisong Yue; Boris Katz; Andrei Barbu | code |
| 289 | ACES: Automatic Cohort Extraction System for Event-Stream Datasets. Highlight: Datasets, model pipelines, and even task or cohort definitions are often private in this field, leading to a significant barrier in sharing, iterating, and understanding ML results on electronic health record (EHR) datasets. We address a significant part of this problem by introducing the Automatic Cohort Extraction System (ACES) for event-stream data. | Justin Xu; Jack Gallifant; ALISTAIR JOHNSON; Matthew B.A. McDermott | code |
| 290 | HShare: Fast LLM Decoding By Hierarchical Key-Value Sharing. Highlight: In this paper, we reveal substantial similarities in KV cache token criticality across neighboring queries, layers, and heads. | Huaijin Wu; Lianqiang Li; Hantao Huang; Tu Yi; Jihang Zhang; Minghui Yu; Junchi Yan | code |
| 291 | Noisy Test-Time Adaptation in Vision-Language Models. Highlight: Leveraging the zero-shot capability of pre-trained vision-language models (VLMs), this paper introduces Zero-Shot Noisy TTA (ZS-NTTA), focusing on adapting the model to target data with noisy samples during test-time in a zero-shot manner. | Chentao Cao; Zhun Zhong; Zhanke Zhou; Tongliang Liu; Yang Liu; Kun Zhang; Bo Han | code |
| 292 | DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback. Highlight: Automating this labor-intensive process by creating autonomous data generation agents – or teachers – is desirable, but requires environments that can simulate the feedback-driven, iterative, closed loop of data creation. To enable rapid and scalable testing for such agents and their modules, we introduce DataEnvGym, a testbed of teacher environments for data generation agents. | Zaid Khan; Elias Stengel-Eskin; Jaemin Cho; Mohit Bansal | code |
| 293 | Decoupling Angles and Strength in Low-rank Adaptation. Highlight: In this work, we propose DeLoRA, a novel finetuning method that normalizes and scales learnable low-rank matrices. | Massimo Bini; Leander Girrbach; Zeynep Akata | code |
| 294 | UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation. Highlight: In this paper, we propose a unified CoT distillation framework termed UniCoTT for considering diverse structural CoTs (\emph{i.e.}, chain, tree, and graph). | Xianwei Zhuang; Zhihong Zhu; Zhichang Wang; Xuxin Cheng; Yuexian Zou | code |
| 295 | Analyzing and Boosting The Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models. Highlight: In our study, we revisit three quintessential capabilities of MLLMs for FGVR, including object information extraction, category knowledge reserve, and object-category alignment, and position the root cause as a misalignment problem. To address this issue, we present Finedefics, an MLLM that enhances the model’s FGVR capability by incorporating informative attribute descriptions of objects into the training phase. | Hulingxiao He; Geng Li; Zijun Geng; Jinglin Xu; Yuxin Peng | code |
| 296 | Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models. Highlight: However, most existing work treats diffusion models as a standalone component for perception tasks, employing them either solely for off-the-shelf data augmentation or as mere feature extractors. In contrast to these isolated and thus sub-optimal efforts, we introduce an integrated, versatile, diffusion-based framework, Diff-2-in-1, that can simultaneously handle both multi-modal data generation and dense visual perception, through a unique exploitation of the diffusion-denoising process. | Shuhong Zheng; Zhipeng Bao; Ruoyu Zhao; Martial Hebert; Yu-Xiong Wang | code |
| 297 | Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Tasks such as insertion with different objects or cloth hanging require precise control and effective modelling of complex dynamics. In this work, we frame this problem through the lens of a heterogeneous graph that comprises smaller sub-graphs, such as actuators and objects, accompanied by different edge types describing their interactions. |
Tai Hoang; Huy Le; Philipp Becker; Vien Anh Ngo; Gerhard Neumann; | code |
| 298 | Rodimus*: Breaking The Accuracy-Efficiency Trade-Off with Efficient Attentions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the classical softmax attention incurs significant computational costs, leading to a $O(T)$ complexity for per-token generation, where $T$ represents the context length. This work explores reducing LLMs’ complexity while maintaining performance by introducing Rodimus and its enhanced version, Rodimus$+$. |
Zhihao He; Hang Yu; Zi Gong; Shizhan Liu; Jianguo Li; Weiyao Lin; | code |
| 299 | MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation Highlight: Unfortunately, current reasoning segmentation datasets predominantly focus on single-target object-level reasoning, which limits the detailed recognition of an object’s parts in multi-target contexts. To address this gap, we construct a large-scale dataset called Multi-target and Multi-granularity Reasoning (MMR). |
Donggon Jang; Yucheol Cho; Suin Lee; Taehyeon Kim; Daeshik Kim; | code |
| 300 | Unposed Sparse Views Room Layout Reconstruction in The Age of Pretrain Model Highlight: To this end, we introduce Plane-DUSt3R, a novel method for multi-view room layout estimation leveraging the 3D foundation model DUSt3R. |
Yaxuan Huang; Xili Dai; Jianan Wang; Xianbiao Qi; Yixing Yuan; Xiangyu Yue; | code |
| 301 | FreDF: Learning to Forecast in The Frequency Domain Highlight: In this work, we demonstrate that the learning objective of DF is biased in the presence of label correlation. |
Hao Wang; Lichen Pan; Yuan Shen; Zhichao Chen; Degui Yang; Yifei Yang; Sen Zhang; Xinggao Liu; Haoxuan Li; Dacheng Tao; | code |
| 302 | Fourier Head: Helping Large Language Models Learn Complex Probability Distributions Highlight: We introduce a neural network layer, constructed using Fourier series, which we can easily substitute for any linear layer if we want the outputs to have a more continuous structure. |
Nate Gillman; Daksh Aggarwal; Michael Freeman; Chen Sun; | code |
| 303 | FLIP: Flow-Centric Generative Planning As General-Purpose Manipulation World Model Highlight: To this end, we present FLow-CentrIc generative Planning (FLIP), a model-based planning algorithm on visual space that features three key modules: 1) a multi-modal flow generation model as the general-purpose action proposal module; 2) a flow-conditioned video generation model as the dynamics module; and 3) a vision-language representation learning model as the value module. |
Chongkai Gao; Haozhuo Zhang; Zhixuan Xu; Cai Zhehao; Lin Shao; | code |
| 304 | HelpSteer2-Preference: Complementing Ratings with Preferences Highlight: Using this data, we conduct the first head-to-head comparison of Bradley-Terry and Regression models when adequately matched for data. Based on insights derived from such a comparison, we propose a novel approach to combine Bradley-Terry and Regression reward modeling. |
Zhilin Wang; Alexander Bukharin; Olivier Delalleau; Daniel Egert; Gerald Shen; Jiaqi Zeng; Oleksii Kuchaiev; Yi Dong; | code |
| 305 | A CLIP-Powered Framework for Robust and Generalizable Data Selection Highlight: Existing works typically rely on single-modality information to assign importance scores for individual samples, which may lead to inaccurate assessments, especially when dealing with noisy or corrupted samples. To address this limitation, we propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection. |
Suorong Yang; Peng Ye; Wanli Ouyang; Dongzhan Zhou; Furao Shen; | code |
| 306 | BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks Highlight: In this paper, we focus on black-box defense for VLMs against jailbreak attacks. |
Yunhan Zhao; Xiang Zheng; Lin Luo; Yige Li; Xingjun Ma; Yu-Gang Jiang; | code |
| 307 | Test-time Alignment of Diffusion Models Without Reward Over-optimization Highlight: Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. |
Sunwoo Kim; Minkyu Kim; Dongmin Park; | code |
| 308 | Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring The Capabilities of Spoken Language Models with 180 Tasks Highlight: We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. |
Chien-yu Huang; Wei-Chih Chen; Shu-wen Yang; Andy T. Liu; Chen-An Li; Yu-Xiang Lin; Wei-Cheng Tseng; Anuj Diwan; Yi-Jen Shih; Jiatong Shi; William Chen; Chih-Kai Yang; Xuanjun Chen; Chi-Yuan Hsiao; Puyuan Peng; Shih-Heng Wang; Chun-Yi Kuan; Ke-Han Lu; Kai-Wei Chang; Fabian Alejandro Ritter Gutierrez; Huang Kuan-Po; Siddhant Arora; You-Kuan Lin; CHUANG Ming To; Eunjung Yeo; Kalvin Chang; Chung-Ming Chien; Kwanghee Choi; Cheng-Hsiu Hsieh; Yi-Cheng Lin; Chee-En Yu; I-Hsiang Chiu; Heitor Guimarães; Jionghao Han; Tzu-Quan Lin; Tzu-Yuan Lin; Homu Chang; Ting-Wu Chang; Chun Wei Chen; Shou-Jen Chen; Yu-Hua Chen; Hsi-Chun Cheng; Kunal Dhawan; Jia-Lin Fang; Shi-Xin Fang; KUAN YU FANG CHIANG; Chi An Fu; Hsien-Fu Hsiao; Ching Yu Hsu; Shao-Syuan Huang; Lee Chen Wei; Hsi-Che Lin; Hsuan-Hao Lin; Hsuan-Ting Lin; Jian-Ren Lin; Ting-Chun Liu; Li-Chun Lu; Tsung-Min Pai; Ankita Pasad; Shih-Yun Shan Kuan; Suwon Shon; Yuxun Tang; Yun-Shao Tsai; Wei Jui Chiang; Tzu-Chieh Wei; Chengxi Wu; Dien-Ruei Wu; Chao-Han Huck Yang; Chieh-Chi Yang; Jia Qi Yip; Shao-Xiang Yuan; Haibin Wu; Karen Livescu; David Harwath; Shinji Watanabe; Hung-yi Lee; | code |
| 309 | Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model Highlight: We introduce Policy Decorator, which uses a model-agnostic residual policy to refine large imitation learning models during online interactions. |
Xiu Yuan; Tongzhou Mu; Stone Tao; Yunhao Fang; Mengke Zhang; Hao Su; | code |
| 310 | STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes Highlight: We present STORM, a spatio-temporal reconstruction model designed for reconstructing dynamic outdoor scenes from sparse observations. |
Jiawei Yang; Jiahui Huang; Boris Ivanovic; Yuxiao Chen; Yan Wang; Boyi Li; Yurong You; Apoorva Sharma; Maximilian Igl; Peter Karkus; Danfei Xu; Yue Wang; Marco Pavone; | code |
| 311 | On Large Language Model Continual Unlearning Highlight: Without previous data, the utility preservation during unlearning is much harder. To overcome these challenges, we propose the OOO framework that includes an Orthogonal low-rank adapter (LoRA) for continually unlearning requested data and an Out-Of-Distribution (OOD) detector to measure the similarity between input and unlearning data. |
Chongyang Gao; Lixu Wang; Kaize Ding; Chenkai Weng; Xiao Wang; Qi Zhu; | code |
| 312 | Field-DiT: Diffusion Transformer on Unified Video, 3D, and Game Field Generation Highlight: This limitation can be attributed to their MLP architecture, which lacks sufficient inductive bias to capture global structures through uniform sampling. To address this, we propose a new and simple model that incorporates a view-wise sampling algorithm to focus on local structure learning, along with autoregressive generation to preserve global geometry. |
Kangfu Mei; Mo Zhou; Vishal M. Patel; | code |
| 313 | MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors Highlight: In this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. |
Qingming LIU; Yuan Liu; Jiepeng Wang; Xianqiang Lyu; Peng Wang; Wenping Wang; Junhui Hou; | code |
| 314 | PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task Highlight: The global self-attention mechanism in diffusion transformers involves redundant computation due to the sparse and redundant nature of visual information, and the attention map of tokens within a spatial window shows significant similarity. To address this redundancy, we propose the Proxy-Tokenized Diffusion Transformer (PT-DiT), which employs sparse representative token attention (where the number of representative tokens is much smaller than the total number of tokens) to efficiently model global visual information. |
Jing Wang; Ao Ma; Jiasong Feng; Dawei Leng; Yuhui Yin; Xiaodan Liang; | code |
| 315 | DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References Highlight: We introduce an approach that curates large-scale successful robot tracking demonstrations, comprising pairs of human references and robot actions, to train a neural controller. |
Xueyi Liu; Jianibieke Adalibieke; Qianwei Han; Yuzhe Qin; Li Yi; | code |
| 316 | REEF: Representation Encoding Fingerprints for Large Language Models Highlight: To this end, we propose a training-free REEF to identify the relationship between the suspect and victim models from the perspective of LLMs’ feature representations. |
Jie Zhang; Dongrui Liu; Chen Qian; Linfeng Zhang; Yong Liu; Yu Qiao; Jing Shao; | code |
| 317 | Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models Highlight: This paper introduces Score Forgetting Distillation (SFD), an innovative MU approach that promotes the forgetting of undesirable information in diffusion models by aligning the conditional scores of unsafe classes or concepts with those of safe ones. |
Tianqi Chen; Shujian Zhang; Mingyuan Zhou; | code |
| 318 | GraphArena: Evaluating and Exploring Large Language Models on Graph Computation Highlight: In this paper, we introduce GraphArena, a benchmarking tool designed to evaluate LLMs on real-world graph computational problems. |
Jianheng Tang; Qifan Zhang; Yuhan Li; Nuo Chen; Jia Li; | code |
| 319 | PaPaGei: Open Foundation Models for Optical Physiological Signals Highlight: Current research is limited by the use of single-device datasets, insufficient exploration of out-of-domain generalization, and a lack of publicly available models, which hampers reproducibility. To address these limitations, we present PaPaGei, the first open foundation model for PPG signals. |
Arvind Pillai; Dimitris Spathis; Fahim Kawsar; Mohammad Malekzadeh; | code |
| 320 | LOKI: A Comprehensive Synthetic Data Detection Benchmark Using Large Multimodal Models Highlight: In response, we introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities. |
Junyan Ye; Baichuan Zhou; Zilong Huang; Junan Zhang; Tianyi Bai; Hengrui Kang; Jun He; Honglin Lin; Zihao Wang; Tong Wu; Zhizheng Wu; Yiping Chen; Dahua Lin; Conghui He; Weijia Li; | code |
| 321 | Multimodal Quantitative Language for Generative Recommendation Highlight: To facilitate efficient recommendation knowledge transfer, we propose a novel approach called Multimodal Quantitative Language for Generative Recommendation (MQL4GRec). |
Jianyang Zhai; Zi-Feng Mai; Chang-Dong Wang; Feidiao Yang; Xiawu Zheng; Hui Li; Yonghong Tian; | code |
| 322 | LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid Highlight: Furthermore, many competitive methods have high resource requirements and computational overhead for quantizing models, making it challenging to scale them to hundreds of billions of parameters. In response to these challenges, we propose LeanQuant (Loss-error-aware network Quantization), a novel quantization method that is accurate, versatile, and scalable. |
Tianyi Zhang; Anshumali Shrivastava; | code |
| 323 | SEAL: Safety-enhanced Aligned LLM Fine-tuning Via Bilevel Data Selection Highlight: In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. |
Han Shen; Pin-Yu Chen; Payel Das; Tianyi Chen; | code |
| 324 | Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation Highlight: We propose a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control. |
Abdelrahman Eldesokey; Peter Wonka; | code |
| 325 | Deep Kernel Relative Test for Machine-generated Text Detection Highlight: As a result, it tends to make mistakes in identifying HWTs that deviate from the seen HWT distribution, limiting their use in sensitive areas like academic integrity verification. To address this issue, we propose to employ a non-parametric kernel relative test to detect MGTs by testing whether it is statistically significant that the distribution of a text to be tested is closer to the distribution of HWTs than to the MGTs’ distribution. |
Yiliao Song; Zhenqiao Yuan; Shuhai Zhang; Zhen Fang; Jun Yu; Feng Liu; | code |
| 326 | IgGM: A Generative Model for Functional Antibody and Nanobody Design Highlight: Here, we introduce IgGM, a generative model for the de novo design of immunoglobulins with functional specificity. |
Rubo Wang; Fandi Wu; Xingyu Gao; Jiaxiang Wu; Peilin Zhao; Jianhua Yao; | code |
| 327 | SiReRAG: Indexing Similar and Related Information for Multihop Reasoning Highlight: In this paper, we propose SiReRAG, a novel RAG indexing approach that explicitly considers both similar and related information. |
Nan Zhang; Prafulla Kumar Choubey; Alexander Fabbri; Gabriel Bernadett-Shapiro; Rui Zhang; Prasenjit Mitra; Caiming Xiong; Chien-Sheng Wu; | code |
| 328 | Making Text Embedders Few-Shot Learners Highlight: In this paper, we introduce a simple yet effective training strategy, which significantly improves text representation capabilities. |
Chaofan Li; Minghao Qin; Shitao Xiao; Jianlyu Chen; Kun Luo; Defu Lian; Yingxia Shao; Zheng Liu; | code |
| 329 | Vertical Federated Learning with Missing Features During Training and Inference Highlight: Missing feature blocks are therefore a key challenge limiting the applicability of vertical federated learning in real-world scenarios. To address this, we propose LASER-VFL, a vertical federated learning method for efficient training and inference of split neural network-based models that is capable of handling arbitrary sets of partitions. |
Pedro Valdeira; Shiqiang Wang; Yuejie Chi; | code |
| 330 | Learning to Discretize Denoising Diffusion ODEs Highlight: Therefore, reducing the number of NFEs while preserving generation quality is crucial. To address this, we propose LD3, a lightweight framework designed to learn the optimal time discretization for sampling. |
Vinh Tong; Dung Trung Hoang; Anji Liu; Guy Van den Broeck; Mathias Niepert; | code |
| 331 | TabDiff: A Mixed-type Diffusion Model for Tabular Data Generation Highlight: In this paper, we introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model. |
Juntong Shi; Minkai Xu; Harper Hua; Hengrui Zhang; Stefano Ermon; Jure Leskovec; | code |
| 332 | Tool-Planner: Task Planning with Clusters Across Multiple Tools Highlight: Additionally, designing a correct plan among multiple tools is also a challenge in tool learning. To address these issues, we propose Tool-Planner, a task-processing framework based on toolkits. |
Yanming Liu; Xinyue Peng; Jiannan Cao; Shi Bo; Yuwei Zhang; Xuhong Zhang; Sheng Cheng; Xun Wang; Jianwei Yin; Tianyu Du; | code |
| 333 | Mitigate The Gap: Improving Cross-Modal Alignment in CLIP Highlight: In this work, we propose AlignCLIP to improve the alignment between text and image embeddings, and thereby reduce the modality gap. |
Sedigheh Eslami; Gerard de Melo; | code |
| 334 | Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness Highlight: In this work, we challenge the prevailing belief of the accuracy-interpretability tradeoff, showing that monosemantic features not only enhance interpretability but also bring concrete gains in model performance on robustness-related tasks. |
Qi Zhang; Yifei Wang; Jingyi Cui; Xiang Pan; Qi Lei; Stefanie Jegelka; Yisen Wang; | code |
| 335 | CATCH: Channel-Aware Multivariate Time Series Anomaly Detection Via Frequency Patching Highlight: To contend with the limitations, we introduce CATCH, a framework based on frequency patching. |
Xingjian Wu; Xiangfei Qiu; Zhengyu Li; Yihang Wang; Jilin Hu; Chenjuan Guo; Hui Xiong; Bin Yang; | code |
| 336 | JPEG Inspired Deep Learning Highlight: Although it is traditionally believed that lossy image compression, such as JPEG compression, has a negative impact on the performance of deep neural networks (DNNs), it is shown by recent works that well-crafted JPEG compression can actually improve the performance of deep learning (DL). Inspired by this, we propose JPEG-DL, a novel DL framework that prepends any underlying DNN architecture with a trainable JPEG compression layer. |
Ahmed H. Salamah; Kaixiang Zheng; Yiwen Liu; EN-HUI YANG; | code |
| 337 | Can LLMs Understand Time Series Anomalies? Highlight: This study provides the first comprehensive analysis of contemporary LLM capabilities in time series anomaly detection. |
Zihao Zhou; Rose Yu; | code |
| 338 | Understanding The Stability-based Generalization of Personalized Federated Learning Highlight: To further understand the real performance from a generalization perspective, we propose the first algorithm-dependent generalization analysis with uniform stability for the typical PFL method, Partial Model Personalization, on smooth and non-convex objectives. |
Yingqi Liu; Qinglun Li; Jie Tan; Yifan Shi; Li Shen; Xiaochun Cao; | code |
| 339 | Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model Highlight: Specifically, these approaches measure the divergence between predicted and ground truth distributions, which leads to an indirect comparison between the predicted distributions and cannot represent the variances between model behaviors. To address these issues, we aim to measure the direct comparison between predicted distributions with an attribution score to analyse the training sample importance, which is achieved by the Diffusion Attribution Score (DAS). |
Jinxu Lin; Linwei Tao; Minjing Dong; Chang Xu; | code |
| 340 | Hotspot-Driven Peptide Design Via Multi-Fragment Autoregressive Extension Highlight: Third, realistic tasks for peptide drug development are still lacking. To address these challenges, we introduce PepHAR, a hot-spot-driven autoregressive generative model for designing peptides targeting specific proteins. |
Jiahan Li; Tong Chen; Shitong Luo; Chaoran Cheng; Jiaqi Guan; Ruihan Guo; Sheng Wang; Ge Liu; Jian Peng; Jianzhu Ma; | code |
| 341 | Diff-Prompt: Diffusion-driven Prompt Generator with Mask Supervision Highlight: In this paper, we propose Diffusion-Driven Prompt Generator (Diff-Prompt), aiming to use the diffusion model to generate rich and fine-grained prompt information for complex downstream tasks. |
Weicai Yan; Wang Lin; Zirun Guo; Ye Wang; Fangming Feng; Xiaoda Yang; Zehan Wang; Tao Jin; | code |
| 342 | DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery Highlight: In this paper, we tackle the problem of Generalized Category Discovery (GCD). |
Yuanpei Liu; Kai Han; | code |
| 343 | Adversarial Score Identity Distillation: Rapidly Surpassing The Teacher in One Step Highlight: In this paper, we introduce SiDA (SiD with Adversarial Loss), which not only enhances generation quality but also improves distillation efficiency by incorporating real images and adversarial loss. |
Mingyuan Zhou; Huangjie Zheng; Yi Gu; Zhendong Wang; Hai Huang; | code |
| 344 | Guided Score Identity Distillation for Data-Free One-Step Text-to-Image Generation Highlight: However, a significant limitation of these models is their slow sample generation process, which requires iterative refinement through the same network. To overcome this, we introduce a data-free guided distillation method that enables the efficient distillation of pretrained Stable Diffusion models without access to the real training data, often restricted due to legal, privacy, or cost concerns. |
Mingyuan Zhou; Zhendong Wang; Huangjie Zheng; Hai Huang; | code |
| 345 | Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding Highlight: These challenges often arise due to the complexity and ambiguity present in longer texts. To enhance the performance of LLMs in such scenarios, we introduce the Long Question Coreference Adaptation (LQCA) method. |
Yanming Liu; Xinyue Peng; Jiannan Cao; Shi Bo; Yanxin Shen; Tianyu Du; Sheng Cheng; Xun Wang; Jianwei Yin; Xuhong Zhang; | code |
| 346 | Mitigating The Backdoor Effect for Multi-Task Model Merging Via Safety-Aware Subspace Highlight: In this paper, we first investigate the vulnerabilities of existing model merging methods to backdoor attacks, identifying two critical challenges: backdoor succession and backdoor transfer. To address these issues, we propose a novel Defense-Aware Merging (DAM) approach that simultaneously mitigates task interference and backdoor vulnerabilities. |
Jinluan Yang; Anke Tang; Didi Zhu; Zhengyu Chen; Li Shen; Fei Wu; | code |
| 347 | SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Highlight: Following that, we propose SageAttention, a highly efficient and accurate quantization method for attention. |
Jintao Zhang; Jia wei; Pengle Zhang; Jun Zhu; Jianfei Chen; | code |
| 348 | Toward Generalizing Visual Brain Decoding to Unseen Subjects Highlight: With the emergence of larger and more comprehensive datasets, it is possible to train a brain decoding foundation model in the future. |
Xiangtao Kong; Kexin Huang; Ping Li; Lei Zhang; | code |
| 349 | Diffusion Bridge AutoEncoders for Unsupervised Representation Learning Highlight: The diffusion endpoint $\mathbf{x}_T$ is computationally expensive to obtain and inflexible in dimensionality. To address this problem, we introduce Diffusion Bridge AutoEncoders (DBAE), which enable $\mathbf{z}$-dependent endpoint $\mathbf{x}_T$ inference through a feed-forward architecture. |
Yeongmin Kim; Kwanghyeon Lee; Minsang Park; Byeonghu Na; Il-chul Moon; | code |
| 350 | CO-MOT: Boosting End-to-end Transformer-based Multi-Object Tracking Via Coopetition Label Assignment and Shadow Sets Highlight: As such, e2e-MOT tends to generate a tracking terminal without renewal or re-initialization, compared to other tracking-by-detection methods. To alleviate this problem, we propose Co-MOT, a simple yet effective method to facilitate e2e-MOT by a novel coopetition label assignment with a shadow concept. |
Feng yan; Weixin Luo; Yujie Zhong; Yiyang Gan; Lin Ma; | code |
| 351 | Boltzmann Priors for Implicit Transfer Operators Highlight: Here, we introduce Boltzmann Priors for ITO (BoPITO) to enhance ITO learning in two ways. |
Juan Viguera Diez; Mathias Jacob Schreiner; Ola Engkvist; Simon Olsson; | code |
| 352 | MVTokenFlow: High-quality 4D Content Generation Using Multiview Token Flow Highlight: In this paper, we present MVTokenFlow for high-quality 4D content creation from monocular videos. |
Hanzhuo Huang; Yuan Liu; Ge Zheng; Jiepeng Wang; Zhiyang Dou; Sibei Yang; | code |
| 353 | Procedural Synthesis of Synthesizable Molecules Highlight: Given a black-box oracle to optimize, we formulate a joint design space over syntactic templates and molecular descriptors and introduce evolutionary algorithms that optimize both syntactic and semantic dimensions synergistically. |
Michael Sun; Alston Lo; Minghao Guo; Jie Chen; Connor W. Coley; Wojciech Matusik; | code |
| 354 | Multi-Reward As Condition for Instruction-based Image Editing Highlight: In this paper, we propose to address the training data quality issue with multi-perspective reward data instead of refining the ground-truth image quality. We also build a challenging evaluation benchmark with real-world images/photos and diverse editing instructions, named Real-Edit. |
Xin Gu; Ming Li; Libo Zhang; Fan Chen; Longyin Wen; Tiejian Luo; Sijie Zhu; | code |
| 355 | Advancing Graph Generation Through Beta Diffusion Highlight: Graphs typically feature discrete structures and continuous node attributes that often exhibit rich statistical patterns, including sparsity, bounded ranges, skewed distributions, and long-tailed behavior. To address these challenges, we introduce Graph Beta Diffusion (GBD), a generative model specifically designed to handle the diverse nature of graph data. |
Xinyang Liu; Yilin He; Bo Chen; Mingyuan Zhou; | code |
| 356 | Dynamic Low-Rank Sparse Adaptation for Large Language Models Highlight: In this paper, we introduce dynamic Low-rank Sparse Adaptation (LoSA), a novel method that seamlessly integrates low-rank adaptation into LLM sparsity within a unified framework, thereby enhancing the performance of sparse LLMs without increasing the inference latency. |
Weizhong Huang; Yuxin Zhang; Xiawu Zheng; Liuyang; Jing Lin; Yiwu Yao; Rongrong Ji; | code |
| 357 | ElasticTok: Adaptive Tokenization for Image and Video Highlight: In this work, we introduce ElasticTok, a method that conditions on prior frames to adaptively encode a frame into a variable number of tokens. To enable this in a computationally scalable way, we propose a masking technique that drops a random number of tokens at the end of each frame’s token encoding. |
Wilson Yan; Volodymyr Mnih; Aleksandra Faust; Matei Zaharia; Pieter Abbeel; Hao Liu; | code |
| 358 | HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models Highlight: We propose Hadamard High-Rank Adaptation (HiRA), a parameter-efficient fine-tuning (PEFT) method that enhances the adaptability of Large Language Models (LLMs). |
Qiushi Huang; Tom Ko; Zhan Zhuang; Lilian Tang; Yu Zhang; | code |
| 359 | ComLoRA: A Competitive Learning Approach for Enhancing LoRA Highlight: We propose a Competitive Low-Rank Adaptation (ComLoRA) framework to address the limitations of the LoRA method, which either lacks capacity with a single rank-$r$ LoRA or risks inefficiency and overfitting with a larger rank-$Kr$ LoRA, where $K$ is an integer larger than 1. |
Qiushi Huang; Tom Ko; Lilian Tang; Yu Zhang; | code |
| 360 | HyperFace: Generating Synthetic Face Recognition Datasets By Exploring Face Embedding Hypersphere Highlight: In this paper, we formulate the dataset generation as a packing problem on the embedding space (represented on a hypersphere) of a face recognition model and propose a new synthetic dataset generation approach, called HyperFace. |
Hatef Otroshi Shahreza; Sébastien Marcel; | code |
| 361 | CBGBench: Fill in The Blank of Protein-Molecule Complex Binding Graph Highlight: Firstly, the absence of standardization can lead to unfair comparisons and inconclusive insights. To address this dilemma, we propose CBGBench, a comprehensive benchmark for SBDD that unifies the task as generative graph completion, analogous to fill-in-the-blank of the 3D complex binding graph. |
Haitao Lin; Guojiang Zhao; Odin Zhang; Yufei Huang; Lirong Wu; Cheng Tan; Zicheng Liu; Zhifeng Gao; Stan Z. Li; | code |
| 362 | MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection Highlight: However, despite their impressive problem-solving skills in many domains, MLLMs’ ability in industrial anomaly detection has not been systematically studied. To bridge this gap, we present MMAD, a full-spectrum MLLM benchmark in industrial Anomaly Detection. |
Xi Jiang; Jian Li; Hanqiu Deng; Yong Liu; Bin-Bin Gao; Yifeng Zhou; Jialin Li; Chengjie Wang; Feng Zheng; | code |
| 363 | Training-Free Diffusion Model Alignment with Sampling Demons Highlight: Existing methods for aligning diffusion models either require retraining or are limited to differentiable reward functions. To address these limitations, we propose a stochastic optimization approach, dubbed *Demon*, to guide the denoising process at inference time without backpropagation through reward functions or model retraining. |
Po-Hung Yeh; Kuang-Huei Lee; Jun-cheng Chen; | code |
| 364 | What Makes A Good Diffusion Planner for Decision Making? Highlight: While numerous studies have demonstrated the impressive performance of diffusion planning, the mechanisms behind the key components of a good diffusion planner remain unclear and the design choices are highly inconsistent in existing studies. In this work, we address this issue through systematic empirical experiments on diffusion planning in an offline reinforcement learning (RL) setting, providing practical insights into the essential components of diffusion planning. |
Haofei Lu; Dongqi Han; Yifei Shen; Dongsheng Li; | code |
| 365 | DRESSing Up LLM: Efficient Stylized Question-Answering Via Style Subspace Editing Highlight: We introduce DRESS, a novel approach for generating stylized large language model (LLM) responses through representation editing. We develop two stylized QA benchmark datasets to validate the effectiveness of DRESS, and the results demonstrate significant improvements compared to baseline methods such as prompting and ITI. |
Xinyu Ma; Yifeng Xu; Yang Lin; Tianlong Wang; Xu Chu; Xin Gao; Junfeng Zhao; Yasha Wang; | code |
| 366 | Incorporating Visual Correspondence Into Diffusion Model for Virtual Try-On Highlight: Nevertheless, it remains challenging to preserve the shape and every detail of the given garment due to the intrinsic stochasticity of the diffusion model. To alleviate this issue, we propose to explicitly capitalize on visual correspondence as a prior to tame the diffusion process, instead of simply feeding the whole garment into the UNet as the appearance reference. |
Siqi Wan; Jingwen Chen; Yingwei Pan; Ting Yao; Tao Mei; | code |
| 367 | VideoShield: Regulating Diffusion-based Video Generation Models Via Watermarking Highlight: In this paper, we propose VideoShield, a novel watermarking framework specifically designed for popular diffusion-based video generation models. |
Runyi Hu; Jie Zhang; Yiming Li; Jiwei Li; Qing Guo; Han Qiu; Tianwei Zhang; | code |
| 368 | Diffusion Bridge Implicit Models Highlight: In this work, we take the first step in fast sampling of DDBMs without extra training, motivated by the well-established recipes in diffusion models. |
Kaiwen Zheng; Guande He; Jianfei Chen; Fan Bao; Jun Zhu; | code |
| 369 | Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models Highlight: Well-trained MLLMs are accustomed to a fixed pathway, and a drastic change in their inference manner greatly impedes performance. To address these issues, we propose a novel dynamic expert routing method for existing MLLMs, termed Routing Experts (RoE), which can achieve example-dependent optimal path routing without obvious structure tweaks. |
Qiong Wu; Zhaoxi Ke; Yiyi Zhou; Xiaoshuai Sun; Rongrong Ji; | code |
| 370 | Towards A Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective Highlight: Building upon this modeling, we demonstrate that the generalization capability of the post-trained model is critically determined by the information gain derived from the generative model, as analyzed from a novel reverse-bottleneck perspective. |
Zeyu Gan; Yong Liu; | code |
| 371 | DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing Highlight: In this work, we propose DiffusionGuard, a robust and effective defense method against unauthorized edits by diffusion-based image editing models, even in challenging setups. Finally, we introduce a comprehensive benchmark designed to evaluate the effectiveness and robustness of methods in protecting against privacy threats in realistic scenarios. |
June Suk Choi; Kyungmin Lee; Jongheon Jeong; Saining Xie; Jinwoo Shin; Kimin Lee; | code |
| 372 | Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement Highlight: However, existing methods rely heavily on annotated data, and struggle with effectively disentangling timbre and style, leading to challenges in achieving controllable generation, especially in zero-shot scenarios. To address these issues, we propose Vevo, a versatile zero-shot voice imitation framework with controllable timbre and style. |
Xueyao Zhang; Xiaohui Zhang; Kainan Peng; Zhenyu Tang; Vimal Manohar; Yingru Liu; Jeff Hwang; Dangna Li; Yuhao Wang; Julian Chan; Yuan Huang; Zhizheng Wu; Mingbo Ma; | code |
| 373 | MLLM As Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Highlight: However, current retrieval methods primarily focus on surface-level similarities of textual or visual cues in trajectories, neglecting their effectiveness for the specific task at hand. To address this issue, we propose a novel method, MART, which enhances the performance of embodied agents by utilizing interaction data to fine-tune an MLLM retriever based on preference learning, such that the retriever fully considers the effectiveness of trajectories and prioritizes them for unseen tasks. |
Junpeng Yue; Xinrun Xu; Börje F. Karlsson; Zongqing Lu; | code |
| 374 | Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation Highlight: Simulation techniques for soft bodies are several orders of magnitude slower, thereby limiting the use of RL due to sample complexity requirements. To address this challenge, this paper presents both a novel RL algorithm and a simulation platform to enable scaling RL on tasks involving rigid bodies and deformables. |
Eliot Xing; Vernon Luk; Jean Oh; | code |
| 375 | SVG: 3D Stereoscopic Video Generation Via Denoising Frame Matrix Highlight: We propose a pose-free and training-free approach for generating 3D stereoscopic videos using an off-the-shelf monocular video generation model. |
Peng Dai; Feitong Tan; Qiangeng Xu; David Futschik; Ruofei Du; Sean Fanello; XIAOJUAN QI; Yinda Zhang; | code |
| 376 | Residual Stream Analysis with Multi-Layer SAEs Highlight: However, SAEs are usually trained separately on each transformer layer, making it difficult to use them to study how information flows across layers. To solve this problem, we introduce the multi-layer SAE (MLSAE): a single SAE trained on the residual stream activation vectors from every transformer layer. |
Tim Lawson; Lucy Farnik; Conor Houghton; Laurence Aitchison; | code |
| 377 | Robust Barycenter Estimation Using Semi-Unbalanced Neural Optimal Transport Highlight: However, in real-world scenarios, the presence of outliers and noise in the data measures can significantly hinder the performance of traditional statistical methods for estimating OT barycenters. To address this issue, we propose a novel scalable approach for estimating the *robust* continuous barycenter, leveraging the dual formulation of the *(semi-)unbalanced* OT problem. |
Milena Gazdieva; Jaemoo Choi; Alexander Kolesov; Jaewoong Choi; Petr Mokrov; Alexander Korotin; | code |
| 378 | Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination Highlight: To overcome those challenges, we propose to rethink robot world models as learnable digital twins. We introduce DreMa, a new approach for constructing digital twins automatically using learned explicit representations of the real world and its dynamics, bridging the gap between traditional digital twins and world models. |
Leonardo Barcellona; Andrii Zadaianchuk; Davide Allegro; Samuele Papa; Stefano Ghidoni; Efstratios Gavves; | code |
| 379 | ReSi: A Comprehensive Benchmark for Representational Similarity Measures Highlight: This paper presents the first comprehensive benchmark for evaluating representational similarity measures based on well-defined groundings of similarity. |
Max Klabunde; Tassilo Wald; Tobias Schumacher; Klaus Maier-Hein; Markus Strohmaier; Florian Lemmerich; | code |
| 380 | Standardizing Structural Causal Models Highlight: Existing metrics like $\operatorname{Var}$-sortability and $\operatorname{R^2}$-sortability quantify these patterns, but they do not provide tools to remedy them. To address this, we propose internally-standardized structural causal models (iSCMs), a modification of SCMs that introduces a standardization operation at each variable during the generative process. |
Weronika Ormaniec; Scott Sussex; Lars Lorch; Bernhard Schölkopf; Andreas Krause; | code |
| 381 | NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields Highlight: We propose NeRAF, a method that jointly learns acoustic and radiance fields. |
Amandine Brunetto; Sascha Hornauer; Fabien Moutarde; | code |
| 382 | Skill Expansion and Composition in Parameter Space Highlight: We propose Parametric Skill Expansion and Composition (PSEC), a new framework designed to iteratively evolve the agents’ capabilities and efficiently address new challenges by maintaining a manageable skill library. |
Tenglong Liu; Jianxiong Li; Yinan Zheng; Haoyi Niu; Yixing Lan; Xin Xu; Xianyuan Zhan; | code |
| 383 | Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models Highlight: Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance, which neglects the importance of customized preferences for broader document-level attributes. We release our dataset and code on https://github.com/EIT-NLP/InfoSearch. |
Jianqun Zhou; Yuanlei Zheng; Wei Chen; Qianqian Zheng; Shang Zeyuan; Wei Zhang; Rui Meng; Xiaoyu Shen; | code |
| 384 | RaSA: Rank-Sharing Low-Rank Adaptation Highlight: However, the limited expressive capacity of LoRA, stemming from the low-rank constraint, has been recognized as a bottleneck, particularly in rigorous tasks like code generation and mathematical reasoning. To address this limitation, we introduce Rank-Sharing Low-Rank Adaptation (RaSA), an innovative extension that enhances the expressive capacity of LoRA by leveraging partial rank sharing across layers. |
Zhiwei He; Zhaopeng Tu; Xing Wang; Xingyu Chen; Zhijie Wang; Jiahao Xu; Tian Liang; Wenxiang Jiao; Zhuosheng Zhang; Rui Wang; | code |
| 385 | AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models Highlight: In this work, we show that audio-visual LLMs struggle to discern subtle relationships between audio and visual signals, leading to hallucinations and highlighting the need for reliable benchmarks. |
Kim Sung-Bin; Oh Hyun-Bin; JungMok Lee; Arda Senocak; Joon Son Chung; Tae-Hyun Oh; | code |
| 386 | Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports Highlight: Another limitation is that widely used public datasets mainly focus on pedestrian movements with casual, loosely connected patterns, where interactions between individuals are not always present, especially at a long distance, making them less representative of more structured environments. To overcome these limitations, we propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs, adaptable to diverse scenarios in the domain of sports games. |
Yi Xu; Yun Fu; | code |
| 387 | Offline Model-Based Optimization By Learning to Rank Highlight: In this paper, we argue that regression models trained with MSE are not well-aligned with the primary goal of offline MBO, which is to *select* promising designs rather than to predict their scores precisely. |
Rong-Xi Tan; Ke Xue; Shen-Huan Lyu; Haopu Shang; yaowang; Yaoyuan Wang; Fu Sheng; Chao Qian; | code |
| 388 | MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents Highlight: To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation approach. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single- or multi-turn dialogues. |
Yanqi Dai; Huanran Hu; Lei Wang; Shengjie Jin; Xu Chen; Zhiwu Lu; | code |
| 389 | Smoothing The Shift: Towards Stable Test-Time Adaptation Under Complex Multimodal Noises Highlight: To this end, we reveal a new challenge named *multimodal wild TTA*. To address this challenging problem, we propose two novel strategies: sample identification with interquartile range **S**moothing and **u**nimodal assistance, and **M**utual **i**nformation sharing (SuMi). |
Zirun Guo; Tao Jin; | code |
| 390 | SqueezeAttention: 2D Management of KV-Cache in LLM Inference Via Layer-wise Optimal Budget Highlight: In this work, we found that by identifying the importance of attention layers, we could optimize the KV-cache jointly from two dimensions, i.e., sequence-wise and layer-wise. |
Zihao Wang; Bin CUI; Shaoduo Gan; | code |
| 391 | Decoupled Graph Energy-based Model for Node Out-of-Distribution Detection on Heterophilic Graphs Highlight: To tackle the sampling challenge, we introduce Decoupled Graph Energy-based Model (DeGEM), which decomposes the learning process into two parts: a graph encoder that leverages topology information for node representations and an energy head that operates in latent space. |
Yuhan Chen; Yihong Luo; Yifan Song; Pengwen Dai; Jing Tang; Xiaochun Cao; | code |
| 392 | Systematic Outliers in Large Language Models Highlight: In this work, we provide a detailed analysis of the formation process, underlying causes, and functions of outliers in LLMs. |
Yongqi An; Xu Zhao; Tao Yu; Ming Tang; Jinqiao Wang; | code |
| 393 | Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels Highlight: In this work, we propose learning diverse pedestrian movements from web videos. |
Zhizheng Liu; Joe Lin; Wayne Wu; Bolei Zhou; | code |
| 394 | Analytic DAG Constraints for Differentiable DAG Learning Highlight: Consequently, these operators can be leveraged to create novel DAG constraints based on existing ones. Using these properties, we design a series of DAG constraints and develop an efficient algorithm to evaluate them. |
Zhen Zhang; Ignavier Ng; Dong Gong; Yuhang Liu; Mingming Gong; Biwei Huang; Kun Zhang; Anton van den Hengel; Javen Qinfeng Shi; | code |
| 395 | Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions? Highlight: To tackle these issues, we propose a novel asymmetric contrastive objective named $\textbf{EgoNCE++}$. To address this question, we introduce a benchmark called $\textbf{EgoHOIBench}$, revealing the performance limitation of current egocentric models when confronted with such challenges. |
Boshen Xu; Ziheng Wang; Yang Du; Zhinan Song; Sipeng Zheng; Qin Jin; | code |
| 396 | Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality Highlight: After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with polynomial kernel. |
Ge Ya Luo; Gian Mario Favero; ZhiHao Luo; Alexia Jolicoeur-Martineau; Christopher Pal; | code |
| 397 | Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models Highlight: Additionally, these methods are typically designed for specific tasks, which limits their generalization to new domains. To address these limitations, we propose Progressive Thought Refinement (PTR), a framework that enables LLMs to progressively refine their responses. |
Chengyu Du; Jinyi Han; Yizhou Ying; Aili Chen; Qianyu He; Haokun Zhao; Haoran Guo; Sirui Xia; Jiaqing Liang; Zulong Chen; Liangyue Li; Yanghua Xiao; | code |
| 398 | Learning to Contextualize Web Pages for Enhanced Decision Making By LLM Agents Highlight: In this work, we introduce LCoW, a framework for Learning language models to Contextualize complex Web pages into a more comprehensible form, thereby enhancing decision making by LLM agents. |
Dongjun Lee; Juyong Lee; Kyuyoung Kim; Jihoon Tack; Jinwoo Shin; Yee Whye Teh; Kimin Lee; | code |
| 399 | Learning Harmonized Representations for Speculative Sampling Highlight: We also observe another discrepancy between the training and decoding objectives in existing speculative sampling methods. In this work, we propose a solution named HArmonized Speculative Sampling (HASS) that learns harmonized representations to address these issues. |
Lefan Zhang; Xiaodan Wang; Yanhua Huang; Ruiwen Xu; | code |
| 400 | Your Weak LLM Is Secretly A Strong Teacher for Alignment Highlight: We present a systematic study to evaluate and understand a weak LLM’s ability to generate feedback for alignment. |
Leitian Tao; Yixuan Li; | code |
| 401 | GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering Highlight: We present GI-GS, a novel inverse rendering framework that leverages 3D Gaussian Splatting (3DGS) and deferred shading to achieve photo-realistic novel view synthesis and relighting. |
HONGZE CHEN; Zehong Lin; Jun Zhang; | code |
| 402 | BrainOOD: Out-of-distribution Generalizable Brain Network Analysis Highlight: To bridge these gaps, we introduce BrainOOD, a novel framework tailored for brain networks that enhances GNNs’ OOD generalization and interpretability. We also propose the first OOD brain network benchmark, which provides a foundation for future research in this field. |
Jiaxing Xu; Yongqiang Chen; Xia Dong; Mengcheng Lan; Tiancheng HUANG; Qingtian Bian; James Cheng; Yiping Ke; | code |
| 403 | Neuralized Markov Random Field for Interaction-Aware Stochastic Human Trajectory Prediction Highlight: In this paper, we present a neuralized Markov random field (MRF)-based motion evolution method for probabilistic interaction-aware human trajectory prediction. |
Zilin Fang; David Hsu; Gim Hee Lee; | code |
| 404 | ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences Highlight: Drawing on principles from traditional linguistics, we define implicitness as the divergence between semantic meaning and pragmatic interpretation. To operationalize this definition, we introduce ImpScore, a reference-free metric formulated through an interpretable regression model. |
Yuxin Wang; Xiaomeng Zhu; Weimin Lyu; Saeed Hassanpour; Soroush Vosoughi; | code |
| 405 | NVS-Solver: Video Diffusion Model As Zero-Shot Novel View Synthesizer Highlight: By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose a new novel view synthesis paradigm that operates without the need for training. |
Meng YOU; Zhiyu Zhu; Hui LIU; Junhui Hou; | code |
| 406 | Trusted Multi-View Classification Via Evolutionary Multi-View Fusion Highlight: Additionally, the integration of a pseudo view exacerbates the issue of imbalanced multi-view learning, as it contains a disproportionate amount of information compared to individual views. To address these issues, we propose the Trusted multi-view classification via Evolutionary multi-view Fusion (TEF) approach. |
Xinyan Liang; Pinhan Fu; Yuhua Qian; Qian Guo; Guoqing Liu; | code |
| 407 | Generalized Consistency Trajectory Models for Image Manipulation Highlight: Thus, this work aims to unlock the full potential of CTMs by proposing generalized CTMs (GCTMs), which translate between arbitrary distributions via ODEs. |
Beomsu Kim; Jaemin Kim; Jeongsol Kim; Jong Chul Ye; | code |
| 408 | Distilling Dataset Into Neural Field Highlight: This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. |
Donghyeok Shin; HeeSun Bae; Gyuwon Sim; Wanmo Kang; Il-chul Moon; | code |
| 409 | TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting Highlight: However, the information density of patterns varies across different frequencies, and employing a uniform modeling approach for different frequency components can lead to inaccurate characterization. To address these challenges, inspired by the flexibility of the recent Kolmogorov-Arnold Network (KAN), we propose a KAN-based Frequency Decomposition Learning architecture (TimeKAN) to tackle the complex forecasting challenges caused by multiple frequency mixtures. |
Songtao Huang; Zhen Zhao; Can Li; LEI BAI; | code |
| 410 | Training-free Camera Control for Video Generation Highlight: We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. |
Chen Hou; Zhibo Chen; | code |
| 411 | Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models Highlight: Although language models (LMs) have excelled in natural language processing, their application in SBDD is underexplored. To bridge this gap, we introduce a method, known as Frag2Seq, to apply LMs to SBDD by generating molecules in a fragment-based manner in which fragments correspond to functional modules. |
Cong Fu; Xiner Li; Blake Olson; Heng Ji; Shuiwang Ji; | code |
| 412 | Learning to Discover Regulatory Elements for Gene Expression Prediction Highlight: Here, we introduce Seq2Exp, a Sequence to Expression network explicitly designed to discover and extract regulatory elements that drive target gene expression, enhancing the accuracy of the gene expression prediction. |
Xingyu Su; Haiyang Yu; Degui Zhi; Shuiwang Ji; | code |
| 413 | Optimal Brain Apoptosis Highlight: Unlike previous approaches that rely on approximations, we introduce Optimal Brain Apoptosis (OBA), a novel pruning method that calculates the Hessian-vector product value directly for each parameter. |
Mingyuan Sun; Zheng Fang; Jiaxu Wang; Junjie Jiang; Delei Kong; Chenming Hu; Yuetong FANG; Renjing Xu; | code |
| 414 | VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation Highlight: Existing VIC methods face challenges in learning rewards for long-horizon tasks due to their lack of sub-stage awareness, difficulty in modeling task complexities, and inadequate object state estimation. To address these challenges, we introduce VICtoR, a novel hierarchical VIC reward model capable of providing effective reward signals for long-horizon manipulation tasks. |
Kuo-Han Hung; Pang-Chi Lo; Jia-Fong Yeh; Han-Yuan Hsu; Yi-Ting Chen; Winston H. Hsu; | code |
| 415 | One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using A Single Prompt Highlight: Drawing inspiration from the inherent $\textit{context consistency}$, we propose a novel $\textit{training-free}$ method for consistent text-to-image (T2I) generation, termed One-Prompt-One-Story ($\textit{1Prompt1Story}$). |
Tao Liu; Kai Wang; Senmao Li; Joost van de Weijer; Fahad Shahbaz Khan; Shiqi Yang; Yaxing Wang; Jian Yang; Ming-Ming Cheng; | code |
| 416 | A Second-Order Perspective on Model Compositionality and Incremental Learning Highlight: We conduct a theoretical study that attempts to demystify compositionality in standard non-linear networks through the second-order Taylor approximation of the loss function. |
Angelo Porrello; Lorenzo Bonicelli; Pietro Buzzega; Monica Millunzi; Simone Calderara; Rita Cucchiara; | code |
| 417 | Edge Prompt Tuning for Graph Neural Networks Highlight: In this study, we propose EdgePrompt, a simple yet effective graph prompt tuning method from the perspective of edges. |
Xingbo Fu; Yinhan He; Jundong Li; | code |
| 418 | MarS: A Financial Market Simulation Engine Powered By Generative Foundation Model Highlight: We propose Large Market Model (LMM), an order-level generative foundation model, for financial market simulation, akin to language modeling in the digital world. |
Junjie Li; Yang Liu; Weiqing Liu; Shikai Fang; Lewen Wang; Chang Xu; Jiang Bian; | code |
| 419 | Towards Synergistic Path-based Explanations for Knowledge Graph Completion: Exploration and Evaluation Highlight: In this paper, based on the observation that a fact is usually determined by the synergy of multiple reasoning chains, we propose a novel explainable framework, dubbed KGExplainer, to explore synergistic pathways. |
Tengfei Ma; Xiang song; Wen Tao; Mufei Li; Jiani Zhang; Xiaoqin Pan; Yijun Wang; Bosheng Song; xiangxiang Zeng; | code |
| 420 | Find A Winning Sign: Sign Is All We Need to Win The Lottery Highlight: In this paper, we demonstrate that the parameter sign configuration plays a crucial role in conveying useful information for generalization to any randomly initialized network. |
Junghun Oh; Sungyong Baik; Kyoung Mu Lee; | code |
| 421 | NoVo: Norm Voting Off Hallucinations with Attention Heads in Large Language Models Highlight: This paper presents a lightweight method, Norm Voting (NoVo), which harnesses the untapped potential of attention head norms to dramatically enhance factual accuracy in zero-shot multiple-choice questions (MCQs). |
Zheng Yi Ho; Siyuan Liang; Sen Zhang; Yibing Zhan; Dacheng Tao; | code |
| 422 | Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning Highlight: In this paper, we propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model’s learning process. |
Xiaochuan Li; Zichun Yu; Chenyan Xiong; | code |
| 423 | OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs Highlight: However, we argue that attention scores cannot indicate the future importance of tokens in subsequent generation iterations, because attention scores are calculated based on current hidden states. Therefore, we propose OmniKV, a token-dropping-free and training-free inference method, which achieves a 1.68x speedup without any loss in performance. |
Jitai Hao; Yuke Zhu; Tian Wang; Jun Yu; Xin Xin; Bo Zheng; Zhaochun Ren; Sheng Guo; | code |
| 424 | SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding Highlight: Current benchmarks for video understanding typically emphasize isolated single-instance text inputs and fail to evaluate the capacity to sustain temporal reasoning throughout the entire duration of video streams. To address these limitations, we introduce SVBench, a pioneering benchmark with temporal multi-turn question-answering chains specifically designed to thoroughly assess the capabilities of streaming video understanding of current LVLMs. |
Zhenyu Yang; Yuhang Hu; Zemin Du; Dizhan Xue; Shengsheng Qian; Jiahong Wu; Fan Yang; Weiming Dong; Changsheng Xu; | code |
| 425 | Probe Before You Talk: Towards Black-box Defense Against Backdoor Unalignment for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce BEAT, a black-box defense that detects triggered samples during inference to deactivate the backdoor. |
Biao Yi; Tiansheng Huang; Sishuo Chen; Tong Li; Zheli Liu; Zhixuan Chu; Yiming Li; | code |
| 426 | Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches typically rely on highly compressed features and simple model architectures, leading to limited prediction accuracy and poor generalizability. To address these challenges, we introduce VenusVaccine, a novel deep learning solution with a dual attention mechanism that integrates pre-trained latent vector representations of protein sequences and structures. |
Song Li; Yang Tan; Song Ke; Liang Hong; Bingxin Zhou; | code |
| 427 | Quantized Spike-driven Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent research in the SNN domain has mainly focused on enhancing accuracy by designing large-scale Transformer structures, which typically rely on substantial computational resources, limiting their deployment on resource-constrained devices. To overcome this challenge, we propose a quantized spike-driven Transformer baseline (QSD-Transformer), which achieves reduced resource demands by utilizing a low bit-width parameter. |
Xuerui Qiu; Malu Zhang; Jieyuan Zhang; Wenjie Wei; Honglin Cao; Junsheng Guo; Rui-Jie Zhu; Yimeng Shan; Yang Yang; Haizhou Li; | code |
| 428 | Steering Large Language Models Between Code Execution and Textual Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We also discover that results from LLM-written code are not always better than those from textual reasoning, even if the task could be solved through code. To mitigate the above issues, we propose three methods to better steer LLM code/text generation and achieve a notable improvement. |
Yongchao Chen; Harsh Jhamtani; Srinagesh Sharma; Chuchu Fan; Chi Wang; | code |
| 429 | DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the challenges associated with quantizing text-to-image diffusion models from a distributional perspective. |
Hyogon Ryu; NaHyeon Park; Hyunjung Shim; | code |
| 430 | COME: Test-time Adaption By Conservatively Minimizing Entropy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, its fatal limitation (i.e., overconfidence) tends to result in model collapse. To address this issue, we propose to **Co**nservatively **M**inimize the **E**ntropy (**COME**), a simple drop-in replacement for traditional EM that elegantly addresses the limitation. |
Qingyang Zhang; Yatao Bian; Xinke Kong; Peilin Zhao; Changqing Zhang; | code |
| 431 | Training-free LLM-generated Text Detection By Mining Token Probability Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel training-free detector, termed **Lastde** (code and data released at https://github.com/TrustMedia-zju/Lastde_Detector). |
Yihuai Xu; Yongwei Wang; Yifei Bi; Huangsen Cao; Zhouhan Lin; Yu Zhao; Fei Wu; | code |
| 432 | TULIP: Token-length Upgraded CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a generalizable method, named TULIP, able to upgrade the token length to any length for CLIP-like models. |
Ivona Najdenkoska; Mohammad Mahdi Derakhshani; Yuki M Asano; Nanne Van Noord; Marcel Worring; Cees G. M. Snoek; | code |
| 433 | SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, to address the prevalent issue of limited supervision in real NAD tasks, previous methods tend to leverage synthetic data to collect auxiliary information, which is not an effective solution as shown in our experiments. To overcome these challenges, we introduce a novel SpaceGNN model designed for NAD tasks with extremely limited labels. |
Xiangyu Dong; Xingyi Zhang; Lei Chen; Mingxuan Yuan; Sibo Wang; | code |
| 434 | Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present **Stem** (**S**pa**T**ially resolved gene **E**xpression inference with diffusion **M**odel), a novel computational tool that leverages a conditional diffusion generative model to enable in silico gene expression inference from H&E stained images. |
Sichen Zhu; Yuchen Zhu; Molei Tao; Peng Qiu; | code |
| 435 | Implicit In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **Implicit In-context Learning** (I2CL), an innovative paradigm that reduces the inference cost of ICL to that of zero-shot learning with minimal information loss. |
Zhuowei Li; Zihao Xu; Ligong Han; Yunhe Gao; Song Wen; Di Liu; Hao Wang; Dimitris N. Metaxas; | code |
| 436 | Discovering Influential Neuron Path in Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the significance of influential neuron paths within vision Transformers, which is a path of neurons from the model input to output that impacts the model inference most significantly. |
Yifan Wang; Yifei Liu; Yingdong Shi; Changming Li; Anqi Pang; Sibei Yang; Jingyi Yu; Kan Ren; | code |
| 437 | A Multiscale Frequency Domain Causal Framework for Enhanced Pathological Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing MIL methods might identify patches that do not have true diagnostic significance, leading to false correlations, and experience difficulties in integrating multi-scale features and handling unobservable confounders. To address these issues, we propose a new Multi-Scale Frequency Domain Causal framework (MFC). |
Xiaoyu Cui; Weixing Chen; Jiandong Su; | code |
| 438 | Text-to-Image Rectified Flow As Plug-and-Play Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared to diffusion-based methods, rectified flow approaches surpass them in terms of generation quality and efficiency. In this work, we present theoretical and experimental evidence demonstrating that rectified flow based methods offer similar functionalities to diffusion models — they can also serve as effective priors. |
Xiaofeng Yang; Chen Cheng; Xulei Yang; Fayao Liu; Guosheng Lin; | code |
| 439 | SelKD: Selective Knowledge Distillation Via Optimal Transport Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This demand is especially pronounced in the era of foundation models, where the teacher model can be significantly larger than the student model. To address this issue, we propose to rethink the knowledge distillation problem from the perspective of Inverse Optimal Transport (IOT). |
Liangliang Shi; Zhengyan Shi; Junchi Yan; | code |
| 440 | COFlowNet: Conservative Constraints on Flows Enable High-Quality Candidate Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Tackling the challenge, we propose Conservative Offline GFlowNet (COFlowNet) in this paper. |
Yudong Zhang; Xuan Yu; Xu Wang; Zhaoyang Sun; Chen Zhang; Pengkun Wang; Yang Wang; | code |
| 441 | BodyGen: Advancing Towards Efficient Embodiment Co-Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To advance towards efficient embodiment co-design, we propose **BodyGen**, which utilizes (1) topology-aware self-attention for both design and control, enabling efficient morphology representation with lightweight model sizes; (2) a temporal credit assignment mechanism that ensures balanced reward signals for optimization. |
Haofei Lu; Zhe Wu; Junliang Xing; Jianshu Li; Ruoyu Li; Zhe Li; Yuanchun Shi; | code |
| 442 | RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Differentiable Data Rewards (DDR) method, which end-to-end trains RAG systems by aligning data preferences between different RAG modules. |
Xinze Li; Sen Mei; Zhenghao Liu; Yukun Yan; Shuo Wang; Shi Yu; Zheni Zeng; Hao Chen; Ge Yu; Zhiyuan Liu; Maosong Sun; Chenyan Xiong; | code |
| 443 | Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we observe that the alignment behavior in weaker models can be effectively transferred to stronger models and even exhibit an amplification effect. |
Wenhong Zhu; Zhiwei He; Xiaofeng Wang; Pengfei Liu; Rui Wang; | code |
| 444 | SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *SymmetricDiffusers*, a novel discrete diffusion model that simplifies the task of learning a complicated distribution over $S_n$ by decomposing it into learning simpler transitions of the reverse diffusion using deep neural networks. |
Yongxing Zhang; Donglin Yang; Renjie Liao; | code |
| 445 | Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-trained large language models (LLMs) have been reliably integrated with visual input for multimodal tasks. |
Leander Girrbach; Stephan Alaniz; Yiran Huang; Trevor Darrell; Zeynep Akata; | code |
| 446 | High-Precision Dichotomous Image Segmentation Via Probing Diffusion Capacity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models, specifically designed for high-resolution, fine-grained object segmentation. |
Qian Yu; Peng-Tao Jiang; Hao Zhang; Jinwei Chen; Bo Li; Lihe Zhang; Huchuan Lu; | code |
| 447 | ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While continual learning aims to mitigate these challenges, our study reveals that existing continual panoptic segmentation (CPS) methods often suffer from efficiency or scalability issues. To address these limitations, we propose an efficient adaptation framework that incorporates attentive self-distillation and dual-decoder prediction fusion to efficiently preserve prior knowledge while facilitating model generalization. |
Ze Yang; Shichao Dong; Ruibo Li; Nan Song; Guosheng Lin; | code |
| 448 | Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article demonstrates how a technique called “trivialization” can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. |
Yuchen Zhu; Tianrong Chen; Lingkai Kong; Evangelos Theodorou; Molei Tao; | code |
| 449 | Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, formulations based on continuous domains limit their applicability to discrete domains such as graphs. To overcome these limitations, we propose Discrete Diffusion Schrödinger Bridge Matching (DDSBM), a novel framework that utilizes continuous-time Markov chains to solve the SB problem in a high-dimensional discrete state space. |
Jun Hyeong Kim; Seonghwan Kim; Seokhyun Moon; Hyeongwoo Kim; Jeheon Woo; Woo Youn Kim; | code |
| 450 | Dataset Distillation Via Knowledge Distillation: Towards Efficient Self-Supervised Pre-training of Deep Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first effective DD method for SSL pre-training. Then, we generate a small synthetic dataset by matching the training trajectories of the student models. |
Siddharth Joshi; Jiayi Ni; Baharan Mirzasoleiman; | code |
| 451 | Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To demystify such a conflict, this paper introduces a comprehensive benchmark to measure and evaluate GNNs’ capability in capturing and leveraging the information encoded in different frequency components of the input graph data. Finally, we introduce a comprehensive benchmark on real-world datasets, revealing insights that challenge prevalent opinions from a spectral perspective. |
Yushun Dong; Patrick Soga; Yinhan He; Song Wang; Jundong Li; | code |
| 452 | DisPose: Disentangling Pose Guidance for Controllable Human Image Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present DisPose to mine more generalizable and effective control signals without additional dense input, which disentangles the sparse skeleton pose in human image animation into motion field guidance and keypoint correspondence. |
Hongxiang Li; Yaowei Li; Yuhang Yang; Junjie Cao; Zhihong Zhu; Xuxin Cheng; Long Chen; | code |
| 453 | Visual-O1: Understanding Ambiguous Instructions Via Multi-modal Multi-turn Chain-of-thoughts Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, this paper proposes Visual-O1, a multi-modal multi-turn chain-of-thought reasoning framework. We release our data and code at https://github.com/kodenii/Visual-O1. |
Minheng Ni; YuTao Fan; Lei Zhang; Wangmeng Zuo; | code |
| 454 | ESE: Espresso Sentence Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods leverage fixed-length sentence embeddings from full-layer language models, which lack the scalability to accommodate the diverse available resources across various applications. To address this gap, we propose a novel sentence embedding model, Espresso Sentence Embeddings (ESE), with two learning processes. |
Xianming LI; Zongxi Li; Jing Li; Haoran Xie; Qing Li; | code |
| 455 | Probabilistic Neural Pruning Via Sparsity Evolutionary Fokker-Planck-Kolmogorov Equation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study how to gradually sparsify the unpruned dense model to the target sparsity level with minimal performance drop. |
Zhanfeng Mo; Haosen Shi; Sinno Jialin Pan; | code |
| 456 | Parameter and Memory Efficient Pretraining Via Low-rank Riemannian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve efficient yet effective low-rank pretraining, we propose a **Lo**w-rank **R**iemannian **O**ptimizer (**LORO**). |
Zhanfeng Mo; Long-Kai Huang; Sinno Jialin Pan; | code |
| 457 | Robustness Inspired Graph Backdoor Defense Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we first empirically verify that prediction variance under edge dropping is a crucial indicator for identifying poisoned nodes. With this observation, we propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones. |
Zhiwei Zhang; Minhua Lin; Junjie Xu; Zongyu Wu; Enyan Dai; Suhang Wang; | code |
| 458 | Catastrophic Failure of LLM Unlearning Via Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision, which significantly increases to 83% after 4-bit quantization. Based on our empirical findings, we provide a theoretical explanation for the observed phenomenon and propose a quantization-robust unlearning strategy aimed at mitigating this intricate issue. |
Zhiwei Zhang; Fali Wang; Xiaomin Li; Zongyu Wu; Xianfeng Tang; Hui Liu; Qi He; Wenpeng Yin; Suhang Wang; | code |
| 459 | UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Next, our layerwise analysis of pre-trained attention maps uncovers that: (1) There are three typical attention patterns (local, hybrid, and global); (2) Pre-training tasks notably influence pattern distribution across layers; (3) The hybrid pattern is crucial for semantic segmentation as it attends to both nearby and foreground elements; (4) The texture bias impedes model generalization in infrared tasks. Building on these insights, we propose UNIP, a UNified Infrared Pre-training framework, to enhance the pre-trained model performance. |
Tao Zhang; Jinyong Wen; Zhen Chen; Kun Ding; Shiming Xiang; Chunhong Pan; | code |
| 460 | IGL-Bench: Establishing The Comprehensive Benchmark for Imbalanced Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the proliferation of IGL algorithms, the absence of consistent experimental protocols and fair performance comparisons poses a significant barrier to comprehending advancements in this field. To bridge this gap, we introduce **IGL-Bench**, a foundational comprehensive benchmark for imbalanced graph learning, covering **17** diverse graph datasets and **24** distinct IGL algorithms with uniform data processing and splitting strategies. |
Jiawen Qin; Haonan Yuan; Qingyun Sun; Lyujin Xu; Jiaqi Yuan; Pengfeng Huang; Zhaonan Wang; Xingcheng Fu; Hao Peng; Jianxin Li; Philip S. Yu; | code |
| 461 | Influence-Guided Diffusion for Dataset Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the remarkable capabilities of diffusion generative models in learning target dataset distributions and controllably sampling high-quality data tailored to user needs, we propose framing dataset distillation as a controlled diffusion generation task aimed at generating data specifically tailored for effective training purposes. |
Mingyang Chen; Jiawei Du; Bo Huang; Yi Wang; Xiaobo Zhang; Wei Wang; | code |
| 462 | Projection Head Is Secretly An Information Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop an in-depth theoretical understanding of the projection head from the information-theoretic perspective. |
Zhuo Ouyang; Kaiwen Hu; Qi Zhang; Yifei Wang; Yisen Wang; | code |
| 463 | Toward Exploratory Inverse Constraint Inference with Generative Diffusion Verifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of focusing solely on a single constraint, we propose the novel approach of Exploratory ICL (ExICL). |
Runyi Zhao; Sheng Xu; Bo Yue; Guiliang Liu; | code |
| 464 | SimulPL: Aligning Human Preferences in Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods focus solely on optimizing the generated responses, ignoring human preferences related to latency and the optimization of read/write policy during the preference optimization phase. To address these challenges, we propose Simultaneous Preference Learning (SimulPL), a preference learning framework tailored for the SiMT task. |
Donglei Yu; Yang Zhao; Jie Zhu; Yangyifan Xu; Yu Zhou; Chengqing Zong; | code |
| 465 | MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it challenging for the model to discern which actions contributed to preferred outcomes. This hinders learning efficiency and slows convergence. In this paper, we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions — sequences of tokens or higher-level language constructs — into the learning process. |
Yekun Chai; Haoran Sun; Huang Fang; Shuohuan Wang; Yu Sun; Hua Wu; | code |
| 466 | A General Framework for Producing Interpretable Semantic Text Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent approaches have improved interpretability by leveraging domain-expert-crafted or LLM-generated questions, but these methods rely heavily on expert input or careful prompt design, which restricts their generalizability and ability to generate discriminative questions across a wide range of tasks. To address these challenges, we introduce CQG-MBQA (Contrastive Question Generation – Multi-task Binary Question Answering), a general framework for producing interpretable semantic text embeddings across diverse tasks. |
Yiqun Sun; Qiang Huang; Yixuan Tang; Anthony Kum Hoe Tung; Jun Yu; | code |
| 467 | Robust Simulation-Based Inference Under Missing Data Via Neural Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize the problem of missing data in SBI and demonstrate that naive imputation methods can introduce bias in the estimation of SBI posterior. |
Yogesh Verma; Ayush Bharti; Vikas Garg; | code |
| 468 | RocketEval: Efficient Automated LLM Evaluation Via Grading Checklist Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a straightforward, replicable, and accurate automated evaluation method by leveraging a lightweight LLM as the judge, named RocketEval. |
Tianjun Wei; Wei Wen; Ruizhi Qiao; Xing Sun; Jianghong Ma; | code |
| 469 | FaceShot: Bring Any Character Into Life Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ***FaceShot***, a novel training-free portrait animation framework designed to bring any character into life from any driven video without fine-tuning or retraining. |
Junyao Gao; Yanan SUN; Fei Shen; Xin Jiang; Zhening Xing; Kai Chen; Cairong Zhao; | code |
| 470 | Perm: A Parametric Representation for Multi-Style 3D Hair Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Perm, a learned parametric representation of human 3D hair designed to facilitate various hair-related applications. |
Chengan He; Xin Sun; Zhixin Shu; Fujun Luan; Soren Pirk; Jorge Alejandro Amador Herrera; Dominik Michels; Tuanfeng Yang Wang; Meng Zhang; Holly Rushmeier; Yi Zhou; | code |
| 471 | OccProphet: Pushing The Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with An Observer-Forecaster-Refiner Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, *i.e.*, OccProphet, to efficiently and effectively learn occupancy forecasting with significantly lower computational requirements while improving forecasting accuracy. |
Junliang Chen; Huaiyuan Xu; Yi Wang; Lap-Pui Chau; | code |
| 472 | Exposure Bracketing Is All You Need For A High-Quality Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although multi-image processing methods (using burst, dual-exposure, or multi-exposure images) have made significant progress in addressing this issue, they typically focus on specific restoration or enhancement problems, and do not fully explore the potential of utilizing multiple images. Motivated by the fact that multi-exposure images are complementary in denoising, deblurring, high dynamic range imaging, and super-resolution, we propose to utilize exposure bracketing photography to get a high-quality image by combining these tasks in this work. |
Zhilu Zhang; Shuohao Zhang; Renlong Wu; Zifei Yan; Wangmeng Zuo; | code |
| 473 | DICE: End-to-end Deformation Capture of Hand-Face Interactions from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. |
Qingxuan Wu; Zhiyang Dou; Sirui Xu; Soshi Shimada; Chen Wang; Zhengming Yu; Yuan Liu; Cheng Lin; Zeyu Cao; Taku Komura; Vladislav Golyanik; Christian Theobalt; Wenping Wang; Lingjie Liu; | code |
| 474 | Capability Localization: Capabilities Can Be Localized Rather Than Individual Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further reveal this phenomenon, this paper proposes a **C**ommonality **N**euron **L**ocalization (**CNL**) method, which successfully locates commonality neurons and achieves a neuron overlap rate of 96.42% on the GSM8K dataset. Afterwards, we constructed a dataset for decoupling experiments and discovered the potential for localizing data commonalities. |
Xiusheng Huang; Jiaxiang Liu; Yequan Wang; Jun Zhao; Kang Liu; | code |
| 475 | iFormer: Integrating ConvNet and Transformer for Mobile Application Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new family of mobile hybrid vision networks, called iFormer, with a focus on optimizing latency and accuracy on mobile applications. |
Chuanyang Zheng; | code |
| 476 | Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an enhanced preference optimization method that incorporates a temporal decay factor controlled by a gamma parameter. |
Ruichen Shao; Bei Li; Gangao Liu; Yang Chen; ZhouXiang; Jingang Wang; Xunliang Cai; Peng Li; | code |
| 477 | TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing efforts to increase task diversity in fine-tuning datasets are hindered by the labor-intensive process of manual task labeling, which typically produces only a few hundred task types. To address this, we propose TaskGalaxy, a large-scale multimodal instruction fine-tuning dataset comprising 19,227 hierarchical task types and 413,648 samples. |
Jiankang Chen; Tianke Zhang; Changyi Liu; Haojie Ding; Yaya Shi; cheng.feng; Huihui Xiao; Bin Wen; Fan Yang; Tingting Gao; Di ZHANG; | code |
| 478 | Periodic Materials Generation Using Text-Guided Joint Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce TGDMat, a novel text-guided diffusion model designed for 3D periodic material generation. |
KISHALAY DAS; Subhojyoti Khastagir; Pawan Goyal; Seung-Cheol Lee; Satadeep Bhattacharjee; Niloy Ganguly; | code |
| 479 | Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve Via Self-Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing text-to-image diffusion models often fail to maintain high image quality and high prompt-image alignment for those challenging prompts. To mitigate this issue and enhance existing pretrained diffusion models, we mainly made three contributions in this paper. |
Bai LiChen; Shitong Shao; zikai zhou; Zipeng Qi; zhiqiang xu; Haoyi Xiong; Zeke Xie; | code |
| 480 | Pursuing Feature Separation Based on Neural Collapse for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we neatly fence off the problem based on an aggregation property of ID features named Neural Collapse (NC). |
Yingwen Wu; Ruiji Yu; Xinwen Cheng; Zhengbao He; Xiaolin Huang; | code |
| 481 | Track-On: Transformer-based Online Point Tracking with Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the problem of long-term point tracking, which requires consistent identification of points across multiple frames in a video, despite changes in appearance, lighting, perspective, and occlusions. |
Görkay Aydemir; Xiongyi Cai; Weidi Xie; Fatma Guney; | code |
| 482 | Learning Stochastic Dynamics from Snapshots Through Regularized Unbalanced Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce a new deep learning approach for solving regularized unbalanced optimal transport (RUOT) and inferring continuous unbalanced stochastic dynamics from observed snapshots. |
Zhenyi Zhang; Tiejun Li; Peijie Zhou; | code |
| 483 | CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Cascading and Adaptive KV cache Eviction (CAKE), a novel approach that frames KV cache eviction as a “cake-slicing problem.” |
Ziran Qin; Yuchen Cao; Mingbao Lin; Wen Hu; Shixuan Fan; Ke Cheng; Weiyao Lin; Jianguo Li; | code |
| 484 | 6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit 6D Gaussians and introduce 6D Gaussian Splatting (6DGS), which enhances color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. |
Zhongpai Gao; Benjamin Planche; Meng Zheng; Anwesa Choudhuri; Terrence Chen; Ziyan Wu; | code |
| 485 | Circuit Transformer: A Transformer That Preserves Logical Equivalence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the implemented circuit must be exactly equivalent, which hinders generative neural approaches on this task due to their occasionally wrong predictions. In this study, we introduce a generative neural model, the “Circuit Transformer”, which eliminates such wrong predictions and produces logic circuits strictly equivalent to given Boolean functions. |
Xihan Li; Xing Li; Lei Chen; Xing Zhang; Mingxuan Yuan; Jun Wang; | code |
| 486 | MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study whether MLLMs can perceive small visual details as effectively as large ones when answering questions about images. |
Jiarui Zhang; Mahyar Khayatkhoei; Prateek Chhikara; Filip Ilievski; | code |
| 487 | MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study aims to construct an audio-video generative model with minimal computational cost by leveraging pre-trained single-modal generative models for audio and video. To achieve this, we propose a novel method that guides single-modal models to cooperatively generate well-aligned samples across modalities. |
Akio Hayakawa; Masato Ishii; Takashi Shibuya; Yuki Mitsufuji; | code |
| 488 | LARP: Tokenizing Videos with A Learned Autoregressive Generative Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LARP, a novel video tokenizer designed to overcome limitations in current video tokenization methods for autoregressive (AR) generative models. |
Hanyu Wang; Saksham Suri; Yixuan Ren; Hao Chen; Abhinav Shrivastava; | code |
| 489 | Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing A Posterior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This assertion is substantiated through an examination of DPS on 512$\times$512 ImageNet images, revealing that: 1) DPS’s conditional score estimation significantly diverges from the score of a well-trained conditional diffusion model and is even inferior to the unconditional score; 2) The mean of DPS’s conditional score estimation deviates significantly from zero, rendering it an invalid score estimation; 3) DPS generates high-quality samples with significantly lower diversity. In light of the above findings, we posit that DPS more closely resembles MAP than a conditional score estimator, and accordingly propose the following enhancements to DPS: 1) we explicitly maximize the posterior through multi-step gradient ascent and projection; 2) we utilize a lightweight conditional score estimator trained with only 100 images and 8 GPU hours. |
Tongda Xu; Xiyan Cai; Xinjie Zhang; Xingtong Ge; Dailan He; Ming Sun; Jingjing Liu; Ya-Qin Zhang; Jian Li; Yan Wang; | code |
| 490 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Segment Anything model (SAM) has shown a generalized ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges. This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation. |
Pengfei Chen; Lingxi Xie; Xinyue Huo; Xuehui Yu; XIAOPENG ZHANG; Yingfei Sun; Zhenjun Han; Qi Tian; | code |
| 491 | Scalable and Certifiable Graph Unlearning: Overcoming The Approximation Error Barrier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, certified graph unlearning demands bounded model error on exact node embeddings to maintain its certified guarantee. To address this challenge, we present ScaleGUN, the first approach to scale certified graph unlearning to billion-edge graphs. |
Lu Yi; Zhewei Wei; | code |
| 492 | Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing masked modeling paradigms often prioritize structural information at the expense of abstract information such as circuit function. To address these limitations, we introduce MGVGA, a novel constrained masked modeling paradigm incorporating masked gate modeling (MGM) and Verilog-AIG alignment (VGA). |
Haoyuan WU; Haisheng Zheng; Yuan Pu; Bei Yu; | code |
| 493 | ImDy: Human Inverse Dynamics from Imitated Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional optimization-based ID requires expensive laboratory setups, restricting its availability. To alleviate this problem, we propose to exploit the recently progressive human motion imitation algorithms to learn human inverse dynamics in a data-driven manner. |
Xinpeng Liu; Junxuan Liang; Zili Lin; Haowen Hou; Yong-Lu Li; Cewu Lu; | code |
| 494 | HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem, we propose a robust HQGS that significantly enhances the 3DGS under various degradation scenarios. |
Xin Lin; Shi Luo; Xiaojun Shan; Xiaoyu Zhou; Chao Ren; Lu Qi; Ming-Hsuan Yang; Nuno Vasconcelos; | code |
| 495 | Optimal Transport for Time Series Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The primary obstacle is crafting a discrepancy measure that simultaneously (1) captures temporal patterns—accounting for periodicity and temporal dependencies inherent in time-series—and (2) accommodates non-stationarity, ensuring robustness amidst multiple coexisting temporal patterns. In response to these challenges, we introduce the Proximal Spectrum Wasserstein (PSW) discrepancy, a novel discrepancy tailored for comparing two *sets* of time-series based on optimal transport. |
Hao Wang; zhengnan li; Haoxuan Li; Xu Chen; Mingming Gong; BinChen; Zhichao Chen; | code |
| 496 | Contextualizing Biological Perturbation Experiments Through Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we hypothesize that large language models (LLMs) present a natural medium for representing complex biological relationships and rationalizing experimental outcomes. |
Menghua Wu; Russell Littman; Jacob Levine; Lin Qiu; Tommaso Biancalani; David Richmond; Jan-Christian Huetter; | code |
| 497 | GReaTer: Gradients Over Reasoning Makes Smaller Language Models Strong Prompt Optimizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce *GReaTer*, a novel prompt optimization technique that directly incorporates *gradient information over task-specific reasoning*. |
Sarkar Snigdha Sarathi Das; Ryo Kamoi; Bo Pang; Yusen Zhang; Caiming Xiong; Rui Zhang; | code |
| 498 | PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation Through Multi-agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage large-scale WSI datasets like TCGA to extract numerous high-quality image patches. Furthermore, we construct 200K instruction-tuning data based on PathGen-1.6M and integrate PathGen-CLIP with the Vicuna LLM to create more powerful multimodal models through instruction tuning. |
Yuxuan Sun; Yunlong Zhang; Yixuan Si; Chenglu Zhu; Kai Zhang; Zhongyi Shui; Jingxiong Li; Xuan Gong; XINHENG LYU; Tao Lin; Lin Yang; | code |
| 499 | SonicSim: A Customizable Simulation Platform for Speech Processing in Moving Sound Source Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, we introduce SonicSim, a synthetic toolkit based on the embodied AI simulation platform Habitat-sim, designed to generate highly customizable data for moving sound sources. Leveraging SonicSim, we constructed a benchmark dataset called SonicSet, utilizing LibriSpeech, Freesound Dataset 50k (FSD50K), Free Music Archive (FMA), and 90 scenes from Matterport3D to evaluate speech separation and enhancement models. |
Kai Li; Wendi Sang; Chang Zeng; Runxuan Yang; Guo Chen; Xiaolin Hu; | code |
| 500 | Solving Video Inverse Problems Using Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their application to video inverse problems arising from spatio-temporal degradation remains largely unexplored due to the challenges in training video diffusion models. To address this issue, here we introduce an innovative video inverse solver that leverages only image diffusion models. |
Taesung Kwon; Jong Chul Ye; | code |
| 501 | Multi-Task Dense Predictions Via Unleashing The Power of Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we unlock the potential of diffusion models in solving multi-task dense predictions and propose a novel diffusion-based method, called TaskDiffusion, which leverages the conditional diffusion process in the decoder. |
Yuqi Yang; Peng-Tao Jiang; Qibin Hou; Hao Zhang; Jinwei Chen; Bo Li; | code |
| 502 | Rethinking Multiple-Instance Learning From Feature Space to Probability Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Probability-Space MIL network (PSMIL) as a countermeasure. |
Zhaolong Du; Shasha Mao; Xuequan Lu; Mengnan Qi; Yimeng Zhang; Jing Gu; Licheng Jiao; | code |
| 503 | Restyling Unsupervised Concept Based Interpretable Networks with Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose here a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. |
Jayneel Parekh; Quentin Bouniot; Pavlo Mozharovskyi; Alasdair Newson; Florence d’Alché-Buc; | code |
| 504 | Shape As Line Segments: Accurate and Flexible Implicit Surface Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, their gradients are ill-defined at certain locations, causing distortions in the extracted surfaces. To address this limitation, we propose Shape as Line Segments (SALS), an accurate and efficient implicit geometry representation based on attributed line segments, which can handle arbitrary structures. |
Siyu Ren; Junhui Hou; | code |
| 505 | Investigating Pattern Neurons in Urban Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first investigate how UTSMs handle these infrequent patterns from a neural perspective. Based on our findings, we propose $\textbf{P}$attern $\textbf{N}$euron guided $\textbf{Train}$ing ($\texttt{PN-Train}$), a novel training method that features (i) a $\textit{perturbation-based detector}$ to identify neurons responsible for low-frequency patterns in UTSMs, and (ii) a $\textit{fine-tuning mechanism}$ that enhances these neurons without compromising representation learning on high-frequency patterns. |
Chengxin Wang; Yiran Zhao; Shaofeng Cai; Gary Tan; | code |
| 506 | Bridging The Gap Between Database Search and *De Novo* Peptide Sequencing with SearchNovo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SearchNovo, a novel framework that synergistically integrates the strengths of database search and *de novo* sequencing to enhance peptide sequencing. |
Jun Xia; Sizhe Liu; Jingbo Zhou; Shaorong Chen; hongxin xiang; Zicheng Liu; Yue Liu; Stan Z. Li; | code |
| 507 | GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we fill this blank by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). We provide all datasets and source codes at https://github.com/GlycanML/GlycanML and maintain a leaderboard at https://GlycanML.github.io/project |
Minghao Xu; Yunteng Geng; Yihang Zhang; Ling Yang; Jian Tang; Wentao Zhang; | code |
| 508 | Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite simplicity, these zero object queries, due to lacking target-specific cues, are hard to learn discriminative target information from interactions with multimodal features in complicated scenarios (e.g., with distractors or occlusion), resulting in degradation. Addressing this, we introduce a novel $\textbf{T}$arget-$\textbf{A}$ware Transformer for $\textbf{STVG}$ ($\textbf{TA-STVG}$), which seeks to adaptively generate object queries via exploring target-specific cues from the given video-text pair, for improving STVG. |
Xin Gu; Yaojie Shen; Chenxi Luo; Tiejian Luo; Yan Huang; Yuewei Lin; Heng Fan; Libo Zhang; | code |
| 509 | SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks, which poses significant scalability challenges as the number of tasks grows. To address these limitations, we propose Scalable Decoupled LoRA (SD-LoRA) for class incremental learning, which continually separates the learning of the magnitude and direction of LoRA components without rehearsal. |
Yichen Wu; Hongming Piao; Long-Kai Huang; Renzhen Wang; Wanhua Li; Hanspeter Pfister; Deyu Meng; Kede Ma; Ying Wei; | code |
| 510 | Learning Molecular Representation in A Cell Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. |
Gang Liu; Srijit Seal; John Arevalo; Zhenwen Liang; Anne E Carpenter; Meng Jiang; Shantanu Singh; | code |
| 511 | MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While most studies focus on cross-dataset shifts, such as changes in environments and object geometries, practical corruptions from sensor variations and weather conditions remain underexplored. In this work, we propose a novel online test-time adaptation framework for 3D detectors that effectively tackles these shifts, including a challenging $\textit{cross-corruption}$ scenario where cross-dataset shifts and corruptions co-occur. |
Zhuoxiao Chen; Junjie Meng; Mahsa Baktashmotlagh; Yonggang Zhang; Zi Huang; Yadan Luo; | code |
| 512 | ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, vanilla TopK routers are trained in a discontinuous, non-differentiable way, limiting their performance and scalability. To address this issue, we propose ReMoE, a fully differentiable MoE architecture that offers a simple yet effective drop-in replacement for the conventional TopK+Softmax routing, utilizing ReLU as the router instead. |
Ziteng Wang; Jun Zhu; Jianfei Chen; | code |
| 513 | Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel approach that trains diffusion path samplers (DPS) to address the transition path sampling (TPS) problem without requiring CVs. |
Kiyoung Seong; Seonghyun Park; Seonghwan Kim; Woo Youn Kim; Sungsoo Ahn; | code |
| 514 | DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not adequately handle safety constraints. In this work, we propose a novel learning approach to address these concerns. |
Mathias Jackermeier; Alessandro Abate; | code |
| 515 | SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a principal way to enhance the quality of widely pre-existing coarse masks, enabling them to serve as reliable training data for segmentation models to reduce the annotation cost. |
Yuqi Lin; Hengjia Li; Wenqi Shao; Zheng Yang; Jun Zhao; Xiaofei He; Ping Luo; Kaipeng Zhang; | code |
| 516 | GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce GoodDrag, a novel approach to improve the stability and image quality of drag editing. |
Zewei Zhang; Huan Liu; Jun Chen; Xiangyu Xu; | code |
| 517 | Dist Loss: Enhancing Regression in Few-Shot Region Through Distribution Distance Constraint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent studies have highlighted the benefits of incorporating distributional information in imbalanced classification tasks, similar strategies have been largely unexplored in imbalanced regression. To address this gap, we propose Dist Loss, a novel loss function that integrates distributional information into model training by jointly optimizing the distribution distance between model predictions and target labels, alongside sample-wise prediction errors. |
Guangkun Nie; Gongzheng Tang; Shenda Hong; | code |
| 518 | Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce Time-MoE, a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models while reducing inference costs. |
Xiaoming Shi; Shiyu Wang; Yuqi Nie; Dianqi Li; Zhou Ye; Qingsong Wen; Ming Jin; | code |
| 519 | SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although much progress has been made for general pick-and-place manipulation, far fewer studies have investigated contact-rich assembly tasks, where precise control is essential. We introduce SRSA (Skill Retrieval and Skill Adaptation), a novel framework designed to address this problem by utilizing a pre-existing skill library containing policies for diverse assembly tasks. |
Yijie Guo; Bingjie Tang; Iretiayo Akinola; Dieter Fox; Abhishek Gupta; Yashraj Narang; | code |
| 520 | GMValuator: Similarity-based Data Valuation for Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce Generative Model Valuator (GMValuator), the first training-free and model-agnostic approach to providing data valuation for image generation tasks. |
Jiaxi Yang; Wenlong Deng; Benlin Liu; Yangsibo Huang; James Zou; Xiaoxiao Li; | code |
| 521 | Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models Via Deciphering Attention Causality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing decoding-based mitigation methods focus on statistical correlations and overlook the causal relationships between attention mechanisms and model output, limiting their effectiveness in addressing these biases. To tackle this issue, we propose a causal inference framework termed CausalMM that applies structural causal modeling to MLLMs, treating modality priors as a confounder between attention mechanisms and output. |
Guanyu Zhou; Yibo Yan; Xin Zou; Kun Wang; Aiwei Liu; Xuming Hu; | code |
| 522 | ARB-LLM: Alternating Refined Binarizations for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current binarization methods struggle to narrow the distribution gap between binarized and full-precision weights, while also overlooking the column deviation in LLM weight distribution. To tackle these issues, we propose ARB-LLM, a novel 1-bit post-training quantization (PTQ) technique tailored for LLMs. |
Zhiteng Li; Xianglong Yan; Tianao Zhang; Haotong Qin; Dong Xie; Jiang Tian; zhongchao shi; Linghe Kong; Yulun Zhang; Xiaokang Yang; | code |
| 523 | WeatherGFM: Learning A Weather Generalist Foundation Model Via In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the in-context learning paradigm from visual foundation models and large language models, in this paper, we introduce the first weather generalist foundation model (WeatherGFM) to address weather understanding tasks in a unified manner. |
Xiangyu Zhao; Zhiwang Zhou; zhangwenlong; Yihao Liu; Xiangyu Chen; Junchao Gong; Hao Chen; Ben Fei; Shiqi Chen; Wanli Ouyang; Xiao-Ming Wu; LEI BAI; | code |
| 524 | Improved Training Technique for Latent Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers, which significantly degrade the performance of iCT in the latent space. |
Quan Dao; Khanh Doan; Di Liu; Trung Le; Dimitris N. Metaxas; | code |
| 525 | Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although label, training, and data efficiency have improved, many state-of-the-art VLMs still require task-specific hyperparameter tuning and fail to fully exploit test samples. To overcome these challenges, we propose a graph-based approach for label-efficient adaptation and inference. |
Yushu Li; Yongyi Su; Adam Goodge; Kui Jia; Xun Xu; | code |
| 526 | Rethinking Light Decoder-based Solvers for Vehicle Routing Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper revisits light decoder-based approaches, analyzing the implications of their reliance on static embeddings and the inherent challenges that arise. |
Ziwei Huang; Jianan Zhou; Zhiguang Cao; Yixin XU; | code |
| 527 | ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve learning the challenging multi-objective task posed by PINNs, we propose the ConFIG method, which provides conflict-free updates by ensuring a positive dot product between the final update and each loss-specific gradient. |
Qiang Liu; Mengyu Chu; Nils Thuerey; | code |
| 528 | DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present effective algorithms for both approaches, demonstrating our model’s versatility and superior performance in various motion synthesis tasks. |
Kaifeng Zhao; Gen Li; Siyu Tang; | code |
| 529 | TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TANGO, a framework for generating co-speech body-gesture videos. |
Haiyang Liu; Xingchao Yang; Tomoya Akiyama; Yuantian Huang; Qiaoge Li; Shigeru Kuriyama; Takafumi Taketomi; | code |
| 530 | Streamlining Prediction in Bayesian Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we examine streamlining prediction in BDL through a single forward pass without sampling. |
Rui Li; Marcus Klasson; Arno Solin; Martin Trapp; | code |
| 531 | Tree of Attributes Prompt Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prompt learning has proven effective in adapting vision language models for downstream tasks. |
Tong Ding; Wanhua Li; Zhongqi Miao; Hanspeter Pfister; | code |
| 532 | Weighted-Reward Preference Optimization for Implicit Model Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an implicit fusion method, Weighted-Reward Preference Optimization (WRPO), which leverages preference optimization between the source LLMs and the target LLM to transfer their capabilities effectively. |
Ziyi Yang; Fanqi Wan; Longguang Zhong; Tianyuan Shi; Xiaojun Quan; | code |
| 533 | Spectral Compressive Imaging Via Unmixing-driven Subspace Diffusion Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although diffusion models offer promising solutions to this challenge, their application is constrained by the limited training data and high computational demands associated with multispectral images (MSIs), complicating direct training. To address these issues, we propose a novel Predict-and-unmixing-driven-Subspace-Refine framework (PSR-SCI). |
Haijin Zeng; Benteng Sun; Yongyong Chen; Jingyong Su; Yong Xu; | code |
| 534 | PIG: Physics-Informed Gaussians As Adaptive Parametric Mesh Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, the fixed positions of the mesh parameters restrict their flexibility, making accurate approximation of complex PDEs challenging. To overcome these limitations, we propose Physics-Informed Gaussians (PIGs), which combine feature embeddings using Gaussian functions with a lightweight neural network. |
Namgyu Kang; Jaemin Oh; Youngjoon Hong; Eunbyung Park; | code |
| 535 | AdaFisher: Adaptive Second Order Optimization Via Fisher Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *AdaFisher*–an adaptive second-order optimizer that leverages a *diagonal block-Kronecker* approximation of the Fisher information matrix for adaptive gradient preconditioning. |
Damien MARTINS GOMES; Yanlei Zhang; Eugene Belilovsky; Guy Wolf; Mahdi S. Hosseini; | code |
| 536 | Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better capture the dataset-specific label semantics for downstream adaptation, we propose to enhance the inter-dispersion among text features via greedy text ensemble and refinement. |
Zhixiang Chi; Li Gu; Huan Liu; Ziqiang Wang; Yanan Wu; Yang Wang; Konstantinos N Plataniotis; | code |
| 537 | Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the memory footprint of LoRA is largely dominated by the original model parameters. To mitigate this, we propose LoRAM, a memory-efficient LoRA training scheme founded on the intuition that many neurons in over-parameterized LLMs have low training utility but are essential for inference. |
Jun Zhang; Jue WANG; Huan Li; Lidan Shou; Ke Chen; Yang You; Guiming Xie; Xuejian Gong; Kunlong Zhou; | code |
| 538 | SAVA: Scalable Learning-Agnostic Data Valuation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the *LAVA* algorithm requires the entire dataset as input, which limits its application to larger datasets. Inspired by the scalability of stochastic (gradient) approaches which carry out computations on *batches* of data points instead of the entire dataset, we analogously propose *SAVA*, a scalable variant of *LAVA* with its computation on batches of data points. |
Samuel Kessler; Tam Le; Vu Nguyen; | code |
| 539 | Towards Robust Multimodal Open-set Test-time Adaptation Via Adaptive Entropy-aware Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Adaptive Entropy-aware Optimization (AEO), a novel framework specifically designed to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA) for the first time. |
Hao Dong; Eleni Chatzi; Olga Fink; | code |
| 540 | Improving Deep Regression with Tightness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, our findings reveal that typical regression losses do little to reduce $H(Z|Y)$, even though it is vital for generalization performance. With this motivation, we introduce an optimal transport-based regularizer to preserve the similarity relationships of targets in the feature space to reduce $H(Z|Y)$. |
Shihao Zhang; Yuguang Yan; Angela Yao; | code |
| 541 | ADAM: An Embodied Causal Agent in Open-World Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce ADAM, An emboDied causal Agent in Minecraft, which can autonomously navigate the open world, perceive multimodal context, learn causal world knowledge, and tackle complex tasks through lifelong learning. |
Shu Yu; Chaochao Lu; | code |
| 542 | Proximal Mapping Loss: Understanding Loss Functions in Crowd Counting & Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most counting methods are based on density regression and rely on an “intersection” hypothesis, *i.e.*, one pixel is influenced by multiple points in the ground truth, which is inconsistent with reality since one pixel would not contain two objects. This paper proposes Proximal Mapping Loss (PML), a density regression method that eliminates this hypothesis. |
Wei Lin; Jia Wan; Antoni B. Chan; | code |
| 543 | Learning Transformer-based World Models with Contrastive Predictive Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that the next state prediction objective adopted in previous approaches is insufficient to fully exploit the representation capabilities of Transformers. |
Maxime Burchi; Radu Timofte; | code |
| 544 | Dreamweaver: Learning Compositional World Models from Pixels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose __Dreamweaver__, a neural architecture designed to discover hierarchical and compositional representations from raw videos and generate compositional future simulations. |
Junyeob Baek; Yi-Fu Wu; Gautam Singh; Sungjin Ahn; | code |
| 545 | VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, aiming to address the limitations, we collect a vision-tactile dataset of humans performing 10 daily tasks with 182 objects. Also, we introduce a novel benchmark, featuring six complex dexterous manipulation tasks and a reinforcement learning-based vision-tactile skill learning framework. |
Qingtao Liu; Yu Cui; Zhengnan Sun; Gaofeng Li; Jiming Chen; Qi Ye; | code |
| 546 | Fine-tuning with Reserved Majority for Noise Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this framework, we propose **No**ise reduction with **R**eserved **M**ajority (NoRM), which decomposes the LoRA parameters into majority parts and redundant parts with random singular value decomposition. |
Shuyang Jiang; Yusheng Liao; Ya Zhang; Yanfeng Wang; Yu Wang; | code |
| 547 | What Are Good Positional Encodings for Directed Graphs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify the limitations of existing PE methods in representing walk profiles and propose a novel *Multi-q Magnetic Laplacian PE*, which extends the Magnetic Laplacian eigenvector-based PE by incorporating multiple potential factors. |
Yinan Huang; Haoyu Peter Wang; Pan Li; | code |
| 548 | Boosting The Visual Interpretability of CLIP Via Adversarial Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an unsupervised adversarial fine-tuning (AFT) with norm-regularization to enhance the visual interpretability of CLIP. |
Shizhan Gong; Haoyu LEI; Qi Dou; Farzan Farnia; | code |
| 549 | ThermalGaussian: Thermal 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose ThermalGaussian, the first thermal 3DGS approach capable of rendering high-quality images in RGB and thermal modalities. |
Rongfeng Lu; Hangyu Chen; Zunjie Zhu; Yuhang Qin; Ming Lu; Le Zhang; Chenggang Yan; Anke Xue; | code |
| 550 | Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider customizing pre-trained LLMs with new human preferences. |
Yi-Chen Li; Fuxiang Zhang; Wenjie Qiu; Lei Yuan; Chengxing Jia; Zongzhang Zhang; Yang Yu; Bo An; | code |
| 551 | Beyond Surface Structure: A Causal Assessment of LLMs’ Comprehension Ability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work provides new insights into LLMs’ deep structure comprehension and offers novel methods for evaluating LLMs. |
Yujin Han; Lei Xu; Sirui Chen; Difan Zou; Chaochao Lu; | code |
| 552 | Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that combining GA with low-rank adaptation results in poor trade-offs between computational cost and generative performance. To address these challenges, we propose two novel techniques for robust and efficient unlearning for LLMs. |
Sungmin Cha; Sungjun Cho; Dasol Hwang; Moontae Lee; | code |
| 553 | Towards Domain Adaptive Neural Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first general domain adaptation method for contextual bandits. |
Ziyan Wang; Xiaoming Huo; Hao Wang; | code |
| 554 | HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To effectively and efficiently utilize human feedback, we develop a framework, HERO, which leverages online human feedback collected on the fly during model learning. |
Ayano Hiranaka; Shang-Fu Chen; Chieh-Hsin Lai; Dongjun Kim; Naoki Murata; Takashi Shibuya; Wei-Hsiang Liao; Shao-Hua Sun; Yuki Mitsufuji; | code |
| 555 | Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we assess key design choices in contrastive RL, identifying those that most effectively stabilize and enhance training performance. With this approach, we provide a foundation for future research in self-supervised GCRL, enabling researchers to quickly iterate on new ideas and evaluate them in diverse and challenging environments. |
Michał Bortkiewicz; Władysław Pałucki; Vivek Myers; Tadeusz Dziarmaga; Tomasz Arczewski; Łukasz Kuciński; Benjamin Eysenbach; | code |
| 556 | Precise Parameter Localization for Textual Generation in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce several applications that benefit from localizing the layers responsible for textual content generation. |
Łukasz Staniszewski; Bartosz Cywiński; Franziska Boenisch; Kamil Deja; Adam Dziedzic; | code |
| 557 | Score-based Self-supervised MRI Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Corruption2Self (C2S), a novel score-based self-supervised framework for MRI denoising. |
Jiachen Tu; Yaokun Shi; Fan Lam; | code |
| 558 | Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SSR, a novel framework that utilizes only 16 navigation-guided tokens as Sparse Scene Representation, efficiently extracting crucial scene information for E2EAD. |
Peidong Li; Dixiao Cui; | code |
| 559 | You Only Sample Once: Taming One-Step Text-to-Image Synthesis By Self-Cooperative Diffusion GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works in this line suffer from either training instability and mode collapse or subpar one-step generation learning efficiency. To address these issues, we introduce YOSO, a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis with high training stability and mode coverage. |
Yihong Luo; Xiaolong Chen; Xinghua Qu; Tianyang Hu; Jing Tang; | code |
| 560 | Diffusion Models As Cartoonists: The Curious Case of High Density Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than usual samplers. |
Rafal Karczewski; Markus Heinonen; Vikas Garg; | code |
| 561 | GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce GravMAD, a sub-goal-driven, language-conditioned action diffusion framework that combines the strengths of imitation learning and foundation models. |
Yangtao Chen; Zixuan Chen; Junhui Yin; Jing Huo; Pinzhuo Tian; Jieqi Shi; Yang Gao; | code |
| 562 | LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present LASeR — Large Language Model-Aided Evolutionary Search for Robot Design Automation. |
Junru Song; Yang Yang; Huan Xiao; Wei Peng; Wen Yao; Feifei Wang; | code |
| 563 | Filtered Not Mixed: Filtering-Based Online Gating for Mixture of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MoE-F — a formalized mechanism for combining N pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. |
Raeid Saqur; Anastasis Kratsios; Florian Krach; Yannick Limmer; Blanka Horvath; Frank Rudzicz; | code |
| 564 | TopoNets: High Performing Vision and Language Models with Brain-like Topography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present *TopoLoss*, a new loss function that promotes spatially organized topographic representations in AI models without significantly sacrificing task performance. |
Mayukh Deb; Mainak Deb; Apurva Ratan Murty; | code |
| 565 | XAIguiFormer: Explainable Artificial Intelligence Guided Transformer for Brain Disorder Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most research focuses solely on interpretability analysis based on the insights from XAI, overlooking XAI’s potential to improve model performance. To bridge this gap, we propose a dynamical-system-inspired architecture, XAI guided transformer (XAIguiFormer), where XAI not only provides explanations but also contributes to enhancing the transformer by refining the originally coarse information in the self-attention mechanism to capture more relevant dependency relationships. |
Hanning Guo; Farah Abdellatif; Yu Fu; N. Jon Shah; Abigail Morrison; Jürgen Dammers; | code |
| 566 | Recovery of Causal Graph Involving Latent Variables Via Homologous Surrogates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To identify latent variables and infer their causal relations, most existing works rely on the assumption that latent variables have pure children. Considering that this assumption is potentially restrictive in practice and not strictly necessary in theory, in this paper, by introducing the concept of homologous surrogate, we eliminate the need for pure children in the context of causal discovery with latent variables. |
Xiu-Chuan Li; Jun Wang; Tongliang Liu; | code |
| 567 | Efficient and Trustworthy Causal Discovery with Latent Variables and Complex Relations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most traditional causal discovery methods assume that all task-relevant variables are observed, an assumption often violated in practice. |
Xiu-Chuan Li; Tongliang Liu; | code |
| 568 | A Large-scale Training Paradigm for Graph Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To remedy this crucial gap, we propose a large-scale training paradigm that uses a large corpus of graphs (over 5000 graphs) from 13 domains, leading to the development of large graph generative models (LGGMs). We release the code, the model checkpoint, and the datasets at https://github.com/KINDLab-Fly/LGGM. |
Yu Wang; Ryan A. Rossi; Namyong Park; Huiyuan Chen; Nesreen K. Ahmed; Puja Trivedi; Franck Dernoncourt; Danai Koutra; Tyler Derr; | code |
| 569 | DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most current research on MT-LLMs still faces significant challenges in maintaining translation consistency and accuracy when processing entire documents. In this paper, we introduce DelTA, a Document-levEL Translation Agent designed to overcome these limitations. |
Yutong Wang; Jiali Zeng; Xuebo Liu; Derek F. Wong; Fandong Meng; Jie Zhou; Min Zhang; | code |
| 570 | Stable Segment Anything Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key finding reveals that given such low-quality prompts, SAM’s mask decoder tends to activate image features that are biased towards the background or confined to specific object parts. To mitigate this issue, our key idea consists of calibrating solely SAM’s mask attention by adjusting the sampling locations and amplitudes of image features, while the original SAM model architecture and weights remain unchanged. |
Qi Fan; Xin Tao; Lei Ke; Mingqiao Ye; Di ZHANG; Pengfei Wan; Yu-Wing Tai; Chi-Keung Tang; | code |
| 571 | Residual-MPPI: Online Policy Customization for Continuous Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a generic online planning algorithm for customizing continuous-control policies at the execution time, which we call Residual-MPPI. |
Pengcheng Wang; Chenran Li; Catherine Weaver; Kenta Kawamoto; Masayoshi Tomizuka; Chen Tang; Wei Zhan; | code |
| 572 | DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extend the IGM principle to the Individual-Global-identically-Distributed (IGD) principle. |
Chao Li; Ziwei Deng; Chenxing Lin; Wenqi Chen; Yongquan Fu; Weiquan Liu; Chenglu Wen; Cheng Wang; Siqi Shen; | code |
| 573 | Achieving Dimension-Free Communication in Federated Learning Via Zeroth-Order Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel dimension-free communication algorithm – DeComFL, which leverages the zeroth-order optimization techniques and reduces the communication cost from $\mathcal{O}(d)$ to $\mathcal{O}(1)$ by transmitting only a constant number of scalar values between clients and the server in each round, regardless of the dimension $d$ of the model parameters. |
Zhe Li; Bicheng Ying; Zidong Liu; Chaosheng Dong; Haibo Yang; | code |
| 574 | Breaking Class Barriers: Efficient Dataset Distillation Via Inter-Class Feature Compensator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to inefficient utilization of the distillation budget and oversight of inter-class feature distributions, which ultimately limits the effectiveness and efficiency, as demonstrated in our analysis. To overcome these constraints, this paper presents the Inter-class Feature Compensator (INFER), an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods. |
Xin Zhang; Jiawei Du; Ping Liu; Joey Tianyi Zhou; | code |
| 575 | Scale-Free Graph-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the former often relies on artificial assumptions about the underlying edge distribution, while the latter requires extensive data annotations. To tackle these challenges, this paper introduces a novel GLM that integrates graph generation and text embedding within a unified framework. |
Jianglin Lu; Yixuan Liu; Yitian Zhang; Yun Fu; | code |
| 576 | Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Information-theoretic (IT) generalization bounds have been used to study the generalization of learning algorithms. |
Ze Peng; Jian Zhang; Yisen Wang; Lei Qi; Yinghuan Shi; Yang Gao; | code |
| 577 | Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This may lead to severe interference between tokens within an expert. To address this problem, in this paper we propose Solving Token Gradient Conflict (STGC), which uses token-level gradient analysis. |
Longrong Yang; Dong Shen; Chaoxiang Cai; Fan Yang; Tingting Gao; Di ZHANG; Xi Li; | code |
| 578 | ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel Adversarial Diffusion Bridge Model, termed ADBM. |
Xiao Li; Wenxuan Sun; Huanran Chen; Qiongxiu Li; Yingzhe He; Jie Shi; Xiaolin Hu; | code |
| 579 | ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in this paper, we find that the position-based token embedding offsets of DePT restrict its ability to generalize across diverse model inputs, and that the shared embedding offsets across many token embeddings result in sub-optimization. To tackle these issues, we introduce **A**daptive **De**composed **P**rompt **T**uning (ADePT), which is composed of a short soft prompt and a shallow token-shared feed-forward neural network. |
Pengwei Tang; Xiaolin Hu; Yong Liu; | code |
| 580 | FlashMask: Efficient and Rich Mask Extension of FlashAttention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FlashMask, an extension of FlashAttention that introduces a column-wise sparse representation of attention masks. |
Guoxia Wang; Jinle Zeng; Xiyuan Xiao; Siming Wu; Jiabin Yang; Lujing Zheng; Zeyu Chen; Jiang Bian; Dianhai Yu; Haifeng Wang; | code |
| 581 | Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an extensive new dataset of MIDI files, created by transcribing audio recordings of piano performances into their constituent notes. |
Louis Bradshaw; Simon Colton; | code |
| 582 | Efficient Neuron Segmentation in Electron Microscopy By Affinity-Guided Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we find that directly applying existing query-based methods faces great challenges due to the large memory requirement of the 3D data and considerably different morphology of neurons. To tackle these challenges, we introduce affinity-guided queries and integrate them into a lightweight query-based framework. |
Hang Chen; Chufeng Tang; Xiao Li; Xiaolin Hu; | code |
| 583 | Simple, Good, Fast: Self-Supervised World Models Free of Baggage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces SGF, a Simple, Good, and Fast world model that uses self-supervised representation learning, captures short-time dependencies through frame and action stacking, and enhances robustness against model errors through data augmentation. |
Jan Robine; Marc Höftmann; Stefan Harmeling; | code |
| 584 | A Distributional Approach to Uncertainty-Aware Preference Alignment Using Offline Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our approach employs a Maximum A Posteriori (MAP) objective to update trajectory rewards and incorporates an informative prior to account for the uncertainties. Building upon this reward update, we propose a generative reward model to capture the reward distribution, utilizing the offline distributional Bellman operator and the Conditional Value-at-Risk (CVaR) metric to train a risk-sensitive policy. |
Sheng Xu; Bo Yue; Hongyuan Zha; Guiliang Liu; | code |
| 585 | ParaSolver: A Hierarchical Parallel Integral Solver for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a unified framework that generalizes the sequential sampling process of DPMs as solving a system of banded nonlinear equations. |
Jianrong Lu; Zhiyu Zhu; Junhui Hou; | code |
| 586 | ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, we present *ClimaQA-Gold*, an expert-annotated benchmark dataset alongside *ClimaQA-Silver*, a large-scale, comprehensive synthetic QA dataset for climate science. |
Veeramakali Vignesh Manivannan; Yasaman Jafari; Srikar Eranky; Spencer Ho; Rose Yu; Duncan Watson-Parris; Yian Ma; Leon Bergen; Taylor Berg-Kirkpatrick; | code |
| 587 | The Belief State Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Belief State Transformer, a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previous token for the suffix. |
Edward S. Hu; Kwangjun Ahn; Qinghua Liu; Haoran Xu; Manan Tomar; Ada Langford; Dinesh Jayaraman; Alex Lamb; John Langford; | code |
| 588 | Diff3DS: Generating View-Consistent 3D Sketch Via Differentiable Curve Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Diff3DS, a novel differentiable rendering framework for generating view-consistent 3D sketch by optimizing 3D parametric curves under various supervisions. |
Yibo Zhang; Lihong Wang; Changqing Zou; Tieru Wu; Rui Ma; | code |
| 589 | Mix-LN: Unleashing The Power of Deeper Layers By Combining Pre-LN and Post-LN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, Post-Layer Normalization (Post-LN) preserves larger gradient norms in deeper layers but suffers from vanishing gradients in earlier layers. To address this, we introduce Mix-LN, a novel normalization technique that combines the strengths of Pre-LN and Post-LN within the same model. |
Pengxiang Li; Lu Yin; Shiwei Liu; | code |
| 590 | A Benchmark for Semantic Sensitive Information in LLMs Outputs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, we propose a novel and large-scale investigation on the existence of SemSI in SOTA LLMs induced by simple natural questions. First, we construct a comprehensive and labeled dataset of semantic sensitive information, SemSI-Set, by including three typical categories of SemSI. Then, we propose a large-scale benchmark, SemSI-Bench, to systematically evaluate semantic sensitive information in 25 SOTA LLMs. |
Qingjie Zhang; Han Qiu; Di Wang; Yiming Li; Tianwei Zhang; Wenyu Zhu; Haiqin Weng; Liu Yan; Chao Zhang; | code |
| 591 | RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing DPDM training approaches often suffer from significant utility loss, large memory footprint, and expensive inference cost, impeding their practical uses. To overcome such limitations, we present RAPID: Retrieval Augmented PrIvate Diffusion model, a novel approach that integrates retrieval augmented generation (RAG) into DPDM training. |
Tanqiu Jiang; Changjiang Li; Fenglong Ma; Ting Wang; | code |
| 592 | AgentSquare: Automatic LLM Agent Search in Modular Design Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new research problem: Modularized LLM Agent Search (MoLAS). |
Yu Shang; Yu Li; Keyu Zhao; Likai Ma; Jiahe Liu; Fengli Xu; Yong Li; | code |
| 593 | Inverse Constitutional AI: Compressing Preferences Into Principles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Inverse Constitutional AI (ICAI) problem, formulating the interpretation of pairwise text preference data as a compression task. |
Arduin Findeis; Timo Kaufmann; Eyke Hüllermeier; Samuel Albanie; Robert D. Mullins; | code |
| 594 | Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, these benchmarks merely focus on evaluating LVLMs on realistic-style images and clean scenarios, leaving multi-stylized images and noisy scenarios unexplored. In response to these challenges, we propose a dynamic and scalable benchmark named Dysca for evaluating LVLMs by leveraging synthesis images. |
Jie Zhang; Zhongqi Wang; Mengqi Lei; Zheng Yuan; Bei Yan; Shiguang Shan; Xilin Chen; | code |
| 595 | ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose ClawMachine, offering a new methodology that explicitly notates each entity using **token collectives**—groups of visual tokens that collaboratively represent higher-level semantics. |
Tianren Ma; Lingxi Xie; Yunjie Tian; Boyu Yang; Qixiang Ye; | code |
| 596 | Streamlining Redundant Layers to Compress Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces LLM-Streamline, a pioneer work on layer pruning for large language models (LLMs). |
Xiaodong Chen; Yuxuan Hu; Jing Zhang; Yanling Wang; Cuiping Li; Hong Chen; | code |
| 597 | Polynomial Composition Activations: Unleashing The Dynamics of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel category of polynomial composition activations (PolyCom), designed to optimize the dynamics of transformers. |
Zhijian Zhuo; Ya Wang; Yutao Zeng; Xiaoqing Li; Xun Zhou; Jinwen Ma; | code |
| 598 | Learning Graph Quantized Tokenizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the development of tokenizers for graphs has lagged behind other modalities, with existing approaches relying on heuristics or GNNs co-trained with Transformers. To address this, we introduce GQT (**G**raph **Q**uantized **T**okenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. |
Limei Wang; Kaveh Hassani; Si Zhang; Dongqi Fu; Baichuan Yuan; Weilin Cong; Zhigang Hua; Hao Wu; Ning Yao; Bo Long; | code |
| 599 | Learning Structured Representations By Embedding Class Hierarchy with Fast Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, class means may not be good representatives of the class conditional distributions, especially when they are multi-mode in nature. To address this limitation, under the CPCC framework, we propose to use the Earth Mover’s Distance (EMD) to measure the pairwise distances among classes in the feature space. |
Siqi Zeng; Sixian Du; Makoto Yamada; Han Zhao; | code |
| 600 | EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Empowering Multi-modal Mamba with Structural and Hierarchical Alignment (EMMA), which enables the MLLM to extract fine-grained visual information. |
Yifei Xing; Xiangyuan Lan; Ruiping Wang; Dongmei Jiang; Wenjun Huang; Zheng Qingfang; Yaowei Wang; | code |
| 601 | Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While extensive graph models have been developed to identify treatment effects, these models often rely on structural assumptions about networked interference, assuming it to be identical to the social network, which can lead to misspecification issues in real applications. To address these challenges, we propose an Interference-Agnostic Causal Graph Transformer (CauGramer), which aggregates peer information via an $L$-order Graph Transformer and employs cross-attention to infer the aggregation function for learning interference representations. |
Anpeng Wu; Haiyi Qiu; Zhengming Chen; Zijian Li; Ruoxuan Xiong; Fei Wu; Kun Zhang; | code |
| 602 | OmniRe: Omni Urban Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce OmniRe, a comprehensive system for efficiently creating high-fidelity digital twins of dynamic real-world scenes from on-device logs. |
Ziyu Chen; Jiawei Yang; Jiahui Huang; Riccardo de Lutio; Janick Martinez Esturo; Boris Ivanovic; Or Litany; Zan Gojcic; Sanja Fidler; Marco Pavone; Li Song; Yue Wang; | code |
| 603 | Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, previous methods suffer from the limited coverage of explored trajectories during training, which presents more pronounced challenges when only offline data is available. In this work, we propose a novel method called **R**etrospective **B**ackward **S**ynthesis (**RBS**) to address these critical problems. |
Haoran He; Can Chang; Huazhe Xu; Ling Pan; | code |
| 604 | DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional unsupervised methods, which decode encoded latent representations of unlabeled data with a reconstruction focus, often fail to capture critical discriminative content, leading to suboptimal anomaly detection. To address these challenges, we present a Diffusion-based Graph Anomaly Detector (DiffGAD). |
Jinghan Li; Yuan Gao; Jinda Lu; Junfeng Fang; Congcong Wen; Hui Lin; Xiang Wang; | code |
| 605 | E(n) Equivariant Topological Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces E(n)-Equivariant Topological Neural Networks (ETNNs), which are E(n)-equivariant message-passing networks operating on combinatorial complexes, formal objects unifying graphs, hypergraphs, simplicial, path, and cell complexes. |
Claudio Battiloro; Ege Karaismailoglu; Mauricio Tec; George Dasoulas; Michelle Audirac; Francesca Dominici; | code |
| 606 | Scalable Mechanistic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. |
Jiale Chen; Dingling Yao; Adeel Pervez; Dan Alistarh; Francesco Locatello; | code |
| 607 | 3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, no existing work can deal with both tasks to effectively leverage the duality between them, and current methods for each task are hindered by challenges in modeling 3D information and the limitations of available data. To address these issues, we propose 3DMolFormer, a unified dual-channel transformer-based framework applicable to both docking and 3D drug design tasks, which exploits their duality by utilizing docking functionalities within the drug design process. |
Xiuyuan Hu; Guoqing Liu; Can Chen; Yang Zhao; Hao Zhang; Xue Liu; | code |
| 608 | SplatFormer: Point Transformer for Robust 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. |
Yutong Chen; Marko Mihajlovic; Xiyi Chen; Yiming Wang; Sergey Prokudin; Siyu Tang; | code |
| 609 | Revisiting Random Walks for Learning on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit a simple model class for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. |
Jinwoo Kim; Olga Zaghen; Ayhan Suleymanzade; Youngmin Ryou; Seunghoon Hong; | code |
| 610 | Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address structure shifts in graphs, we propose Matcha, an innovative framework designed for effective and efficient adaptation to structure shifts by adjusting the hop-aggregation parameters in GNNs. |
Wenxuan Bao; Zhichen Zeng; Zhining Liu; Hanghang Tong; Jingrui He; | code |
| 611 | Interpretable Unsupervised Joint Denoising and Enhancement for Real-World Low-light Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Supervised methods tend to overfit to specific scenarios, while unsupervised methods, though better at generalization, struggle to model these degradations due to the lack of reference images. To address this issue, we propose an interpretable, zero-reference joint denoising and low-light enhancement framework tailored for real-world scenarios. |
Li Huaqiu; Hu Xiaowan; Haoqian Wang; | code |
| 612 | Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This difficulty stems from the need for coherent autoregressive generation across texts and graphs. To address this, we introduce Llamole, the first multimodal LLM capable of interleaved text and graph generation, enabling molecular inverse design with retrosynthetic planning. |
Gang Liu; Michael Sun; Wojciech Matusik; Meng Jiang; Jie Chen; | code |
| 613 | Subgraph Federated Learning for Local Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since they focus solely on performing well on each client’s local data, they are prone to overfitting to their local distributions (i.e., local overfitting), which hinders their ability to generalize to unseen data with diverse label distributions. In contrast, our proposed method, FedLoG, effectively tackles this issue by mitigating local overfitting. |
Sungwon Kim; Yoonho Lee; Yunhak Oh; Namkyeong Lee; Sukwon Yun; Junseok Lee; Sein Kim; Carl Yang; Chanyoung Park; | code |
| 614 | FOSP: Fine-tuning Offline Safe Policy Through World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to improve safety during the deployment of vision-based robotic tasks through online fine-tuning an offline pretrained policy. |
Chenyang Cao; Yucheng Xin; Silang Wu; Longxiang He; Zichen Yan; Junbo Tan; Xueqian Wang; | code |
| 615 | Bootstrapped Model Predictive Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Bootstrapped Model Predictive Control (BMPC), a novel algorithm that performs policy learning in a bootstrapped manner. |
Yuhang Wang; Hanwei Guo; Sizhe Wang; Long Qian; Xuguang Lan; | code |
| 616 | CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for every single condition type, ControlNet requires independent training on millions of data pairs with hundreds of GPU hours, which is quite expensive and makes it challenging for ordinary users to explore and develop new types of conditions. To address this problem, we propose the CtrLoRA framework, which trains a Base ControlNet to learn the common knowledge of image-to-image generation from multiple base conditions, along with condition-specific LoRAs to capture distinct characteristics of each condition. |
Yifeng Xu; Zhenliang He; Shiguang Shan; Xilin Chen; | code |
| 617 | MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that the primary cause of failure in many low-level controllers is the absence of an episodic memory system. |
Junyeong Park; Junmo Cho; Sungjin Ahn; | code |
| 618 | SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, developing a T2S generative model that allows creators to efficiently conduct trial-and-error while producing high-quality sound remains a key challenge. To address these issues, we introduce Sound Consistency Trajectory Models (SoundCTM), which allow flexible transitions between high-quality $1$-step sound generation and superior sound quality through multi-step deterministic sampling. |
Koichi Saito; Dongjun Kim; Takashi Shibuya; Chieh-Hsin Lai; Zhi Zhong; Yuhta Takida; Yuki Mitsufuji; | code |
| 619 | Proxy Denoising for Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we observe that ViL’s supervision could be noisy and inaccurate at an unknown rate, potentially introducing additional negative effects during adaptation. To address this thus-far ignored challenge, we introduce a novel Proxy Denoising (**ProDe**) approach. |
Song Tang; Wenxin Su; Yan Gan; Mao Ye; Jianwei Zhang; Xiatian Zhu; | code |
| 620 | HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (3) The uncertainty introduced by load imbalance loss undermines the effective specialization of the experts. To address these challenges, we propose HMoRA, a Hierarchical fine-tuning method that combines MoE and LoRA, employing hybrid routing that integrates token-level and task-level routing in a hierarchical manner. |
Mengqi Liao; Wei Chen; Junfeng Shen; Shengnan Guo; Huaiyu Wan; | code |
| 621 | Multi-Label Test-Time Adaptation with Bound Entropy Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when encountering multi-label instances, the primary challenge stems from the varying number of labels per image, and prioritizing only the highest probability class inevitably undermines the adaptation of other positive labels. To address this issue, we investigate TTA within the multi-label scenario (ML-TTA), developing a Bound Entropy Minimization (BEM) objective to simultaneously increase the confidence of multiple top predicted labels. |
Xiangyu Wu; Feng Yu; Yang Yang; Qing-Guo Chen; Jianfeng Lu; | code |
| 622 | HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, long-form speech synthesis remains a significant challenge due to the high frame rate, which increases the length of audio tokens and makes it difficult for autoregressive language models to generate audio tokens for even a minute of speech. To address this challenge, this paper introduces two novel post-training approaches: 1) Multi-Resolution Requantization (MReQ) and 2) HALL-E. |
Yuto Nishimura; Takumi Hirose; Masanari Ohi; Hideki Nakayama; Nakamasa Inoue; | code |
| 623 | KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, LoRA and its successors disregard the knowledge that is noisy or irrelevant to the targeted task, detrimentally impacting model performance and leading to suboptimality. To address this limitation, we introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on its relevance to the task at hand. |
Fan Wang; Juyong Jiang; Chansung Park; Sunghun Kim; Jing Tang; | code |
| 624 | Adaptive Shrinkage Estimation for Personalized Deep Kernel Regression in Modeling Brain Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Herein, we introduce a novel personalized deep kernel regression framework for forecasting brain biomarkers, with application to regional volumetric measurements. |
Vasiliki Tassopoulou; Haochang Shou; Christos Davatzikos; | code |
| 625 | Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper conducts a thorough examination of existing clustered FL methods and introduces a four-tier framework, named HCFL, to encompass and extend the existing approaches. |
Yongxin Guo; Xiaoying Tang; Tao Lin; | code |
| 626 | Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a remedy, we introduce the Dynamic Mixture of Experts (DynMoE) technique. |
Yongxin Guo; Zhenglin Cheng; Xiaoying Tang; Zhaopeng Tu; Tao Lin; | code |
| 627 | CAMEx: Curvature-aware Merging of Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce CAMEx (Curvature-Aware Merging of Experts), a novel expert merging protocol that incorporates natural gradients to account for the non-Euclidean curvature of the parameter manifold. |
Viet Dung Nguyen; Minh Nguyen Hoang; Luc Nguyen; Rachel Teo; Tan Minh Nguyen; Linh Duy Tran; | code |
| 628 | Progressive Parameter Efficient Transfer Learning for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify that fine-tuning for semantic segmentation requires larger parameter adjustments due to shifts in semantic perception granularity. |
Nan Zhou; Huiqun Wang; Yaoyan Zheng; Di Huang; | code |
| 629 | Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Uni$^2$Det, a brand new framework for unified and universal multi-dataset training on 3D detection, enabling robust performance across diverse domains and generalization to unseen domains. |
Yubin Wang; Zhikang Zou; Xiaoqing Ye; Xiao Tan; Errui Ding; Cairong Zhao; | code |
| 630 | Diffusing to The Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces a graph-conditioned latent diffusion framework (GNN-Diff) to generate high-performing GNNs based on the model checkpoints of sub-optimal hyperparameters selected by a light-tuning coarse search. |
Lequan Lin; Dai Shi; Andi Han; Zhiyong Wang; Junbin Gao; | code |
| 631 | Reading Your Heart: Learning ECG Words and Sentences Via Pre-training ECG Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel perspective on ECG signals, treating heartbeats as words and rhythms as sentences. |
Jiarui Jin; Haoyu Wang; Hongyan Li; Jun Li; Jiahui Pan; Shenda Hong; | code |
| 632 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This lack of narrative restricts models’ ability to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To address this gap, we propose NarrativeBridge, an approach comprising: (1) a novel Causal-Temporal Narrative (CTN) captions benchmark generated using a large language model and few-shot prompting, explicitly encoding cause-effect temporal relationships in video descriptions; and (2) a Cause-Effect Network (CEN) with separate encoders for capturing cause and effect dynamics, enabling effective learning and generation of captions with causal-temporal narrative. |
Asmar Nadeem; Faegheh Sardari; Robert Dawes; Syed Sameed Husain; Adrian Hilton; Armin Mustafa; | code |
| 633 | ComPC: Completing A 3D Point Cloud with 2D Diffusion Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a test-time framework for completing partial point clouds across unseen categories without any requirement for training. |
Tianxin Huang; Zhiwen Yan; Yuyang Zhao; Gim Hee Lee; | code |
| 634 | Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following the DSCA-GNNs framework, we propose an instantiation method of CA, termed Few-Shot prompting Context-Alignment (FSCA), to enhance the capabilities of pre-trained LLMs in handling TS tasks. |
Yuxiao Hu; Qian Li; Dongxiao Zhang; Jinyue Yan; Yuntian Chen; | code |
| 635 | Learning Fine-Grained Representations Through Textual Token Disentanglement in Composed Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Text description serves as clear expressions of intent, but it requires models to distinguish subtle differences in the description of video semantics. Therefore, we propose a textual Feature Disentanglement and Cross-modal Alignment framework (FDCA) that disentangles features at both the sentence and token levels. |
Yue WU; Zhaobo Qi; Yiling Wu; Junshu Sun; Yaowei Wang; Shuhui Wang; | code |
| 636 | Progressive Compression with Universally Quantized Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike prior work based on Gaussian diffusion or conditional diffusion models, we propose a new form of diffusion model with uniform noise in the forward process, whose negative ELBO corresponds to the end-to-end compression cost using universal quantization. |
Yibo Yang; Justus Will; Stephan Mandt; | code |
| 637 | Are Large Vision Language Models Good Game Players? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These benchmarks are limited by issues such as inadequate assessment of detailed visual perception, data contamination, and a lack of focus on multi-turn reasoning. To address these challenges, we propose LVLM-Playground, a game-based evaluation framework designed to provide a comprehensive assessment of LVLMs’ cognitive and reasoning skills in structured environments. |
Xinyu Wang; Bohan Zhuang; Qi Wu; | code |
| 638 | Learning Long Range Dependencies on Graphs Via Random Walks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By treating random walks as sequences, our architecture leverages recent advances in sequence models to effectively capture long-range dependencies within these walks. Based on this concept, we propose a framework that offers (1) more expressive graph representations through random walk sequences, (2) the ability to utilize any sequence model for capturing long-range dependencies, and (3) the flexibility by integrating various GNN and GT architectures. |
Dexiong Chen; Till Hendrik Schulz; Karsten Borgwardt; | code |
| 639 | The Labyrinth of Links: Navigating The Associative Maze of Multi-modal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose benchmarking an essential but usually overlooked intelligence: $\textbf{association}$, a human’s basic capability to link observation and prior practice memory. |
Hong Li; Nanxi Li; Yuanjie Chen; Jianbin Zhu; Qinlu Guo; Cewu Lu; Yong-Lu Li; | code |
| 640 | AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes, $\textbf{AnalogGenie}$, a $\underline{\textbf{Gen}}$erat$\underline{\textbf{i}}$ve $\underline{\textbf{e}}$ngine for automatic design/discovery of $\underline{\textbf{Analog}}$ circuit topologies–the most challenging and creative task in the conventional manual design flow of analog ICs. |
Jian Gao; Weidong Cao; Junyi Yang; Xuan Zhang; | code |
| 641 | R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on R2GConv, we propose a Relaxed Rotation-Equivariant Network (R2Net) as the backbone and develop a Relaxed Rotation-Equivariant Object Detector (R2Det) for 2D object detection. |
Zhiqiang Wu; Yingjie Liu; Hanlin Dong; Xuan Tang; Jian Yang; Bo Jin; Mingsong Chen; Xian Wei; | code |
| 642 | Be More Diverse Than The Most Diverse: Optimal Mixtures of Generative Models Via Mixture-UCB Bandit Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we numerically show that a mixture of generative models on benchmark image datasets can indeed achieve a better evaluation score (based on FID and KID scores), compared to the individual models. |
Parham Rezaei; Farzan Farnia; Cheuk Ting Li; | code |
| 643 | Tackling Data Corruption in Offline Reinforcement Learning Via Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To unlock the full potential of sequence modeling, we propose **R**obust **D**ecision **T**ransformer (**RDT**) by incorporating three simple yet effective robust techniques: embedding dropout to improve the model’s robustness against erroneous inputs, Gaussian weighted learning to mitigate the effects of corrupted labels, and iterative data correction to eliminate corrupted data from the source. |
Jiawei Xu; Rui Yang; Shuang Qiu; Feng Luo; Meng Fang; Baoxiang Wang; Lei Han; | code |
| 644 | Robustness Reprogramming for Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work tackles an intriguing and fundamental open challenge in representation learning: Given a well-trained deep learning model, can it be reprogrammed to enhance its robustness against adversarial or noisy input perturbations without altering its parameters? To explore this, we revisit the core feature transformation mechanism in representation learning and propose a novel non-linear robust pattern matching technique as a robust alternative. |
Zhichao Hou; MohamadAli Torkamani; Hamid Krim; Xiaorui Liu; | code |
| 645 | Robust Root Cause Diagnosis Using In-Distribution Interventions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose In-Distribution Interventions (IDI), a novel algorithm that predicts root cause as nodes that meet two criteria: 1) Anomaly: root cause nodes should take on anomalous values; 2) Fix: had the root cause nodes assumed usual values, the target node would not have been anomalous. |
Lokesh Nagalapatti; Ashutosh Srivastava; Sunita Sarawagi; Amit Sharma; | code |
| 646 | Beyond Next Token Prediction: Patch-Level Training for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that it is possible to significantly reduce the training costs of LLMs without sacrificing their performance. |
Chenze Shao; Fandong Meng; Jie Zhou; | code |
| 647 | CL-DiffPhyCon: Closed-loop Diffusion Control of Complex Physical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient Closed-Loop Diffusion method for Physical systems Control (CL-DiffPhyCon). |
Long Wei; Haodong Feng; Yuchen Yang; Ruiqi Feng; Peiyan Hu; Xiang Zheng; Tao Zhang; Dixia Fan; Tailin Wu; | code |
| 648 | X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their success in modeling single-modality data marginal distribution, the mutual reliance between different modalities needed to describe complex driving scenes remains underexplored. To fill this gap, we propose a novel framework, X-DRIVE, to model the joint distribution of point clouds and multi-view images via a dual-branch latent diffusion model architecture. |
Yichen Xie; Chenfeng Xu; Chensheng Peng; Shuqi Zhao; Nhat Ho; Alexander T. Pham; Mingyu Ding; Masayoshi Tomizuka; Wei Zhan; | code |
| 649 | Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel autoregressive modeling approach for speech synthesis, combining a variational autoencoder (VAE) with a multi-modal latent space and an autoregressive model that uses Gaussian Mixture Models (GMM) as the conditional probability distribution. |
Weiwei Lin; Chenhang HE; | code |
| 650 | Towards Calibrated Deep Clustering Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel dual head (calibration head and clustering head) deep clustering model that can effectively calibrate the estimated confidence and the actual accuracy. |
Yuheng Jia; Jianhong Cheng; Hui LIU; Junhui Hou; | code |
| 651 | Zero-cost Proxy for Adversarial Robustness Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a zero-cost proxy to evaluate the adversarial robustness without training. |
Yuqi Feng; Yuwei Ou; Jiahao Fan; Yanan Sun; | code |
| 652 | A Simple Framework for Open-Vocabulary Zero-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This deficiency is often attributed to the absence of localization cues in captions and the intertwined nature of the learning process, which encompasses both image/text representation learning and cross-modality alignment. To tackle these issues, we propose SimZSS, a $\textbf{Sim}$ple framework for open-vocabulary $\textbf{Z}$ero-$\textbf{S}$hot $\textbf{S}$egmentation. |
Thomas Stegmüller; Tim Lebailly; Nikola Đukić; Behzad Bozorgtabar; Tinne Tuytelaars; Jean-Philippe Thiran; | code |
| 653 | TabWak: A Watermark for Tabular Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design TabWak, the first watermarking method to embed invisible signatures that control the sampling of Gaussian latent codes used to synthesize table rows via the diffusion backbone. |
Chaoyi Zhu; Jiayi Tang; Jeroen M. Galjaard; Pin-Yu Chen; Robert Birke; Cornelis Bos; Lydia Y. Chen; | code |
| 654 | DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models often struggle with domain-specific nuances and underrepresented fine-grained categories. To address these challenges, we introduce DynAlign, a two-stage framework that integrates UDA with foundation models to bridge both the image-level and label-level domain gaps. |
Han Sun; Rui Gong; Ismail Nejjar; Olga Fink; | code |
| 655 | Concept Pinpoint Eraser for Text-to-image Diffusion Models Via Residual Attention Gate Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Remarkable progress in text-to-image diffusion models has brought a major concern about potentially generating images on inappropriate or trademarked concepts. Concept erasing has … |
Byung Hyun Lee; Sungjin Lim; Seunggyu Lee; Dong Un Kang; Se Young Chun; | code |
| 656 | Moner: Motion Correction in Undersampled Radial MRI with Unsupervised Neural Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Moner, an unsupervised MoCo method that jointly reconstructs artifact-free MR images and estimates accurate motion from undersampled, rigid motion-corrupted k-space data, without requiring any training data. |
Qing Wu; Chenhe Du; Xuanyu Tian; Jingyi Yu; Yuyao Zhang; Hongjiang Wei; | code |
| 657 | UV-Attack: Physical-World Adversarial Attacks on Person Detection Via Dynamic-NeRF-based UV Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce \texttt{UV-Attack}, a novel physical adversarial attack achieving high attack success rates in scenarios involving extensive and unseen actions. |
Yanjie Li; Kaisheng Liang; Bin Xiao; | code |
| 658 | MCNC: Manifold-Constrained Reparameterization for Neural Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel model compression method, which we term Manifold-Constrained Neural Compression (MCNC). |
Chayne Thrash; Reed Andreas; Ali Abbasi; Parsa Nooralinejad; Soroush Abbasi Koohpayegani; Hamed Pirsiavash; Soheil Kolouri; | code |
| 659 | Open-CK: A Large Multi-Physics Fields Coupling Benchmarks in Combustion Kinetics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use the Fire Dynamics Simulator (FDS) combined with supercomputer support to create a \textbf{C}ombustion \textbf{K}inetics (CK) dataset for machine learning and scientific research. We also introduce three benchmarks to demonstrate their potential in enhancing the exploration of downstream tasks: (a) capturing continuous changes in combustion kinetics; (b) a neural partial differential equation solver for learning temperature fields and turbulence; (c) reconstruction of sparse physical observations. |
Zaige Fei; Fan Xu; Junyuan Mao; Yuxuan Liang; Qingsong Wen; Kun Wang; Hao Wu; Yang Wang; | code |
| 660 | Zero-shot Model-based Reinforcement Learning Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how pre-trained LLMs can be leveraged to predict in context the dynamics of continuous Markov decision processes. |
Abdelhakim Benechehab; Youssef Attia El Hili; Ambroise Odonnat; Oussama Zekri; Albert Thomas; Giuseppe Paolo; Maurizio Filippone; Ievgen Redko; Balázs Kégl; | code |
| 661 | Physics-aligned Field Reconstruction with Diffusion Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, except for the low accuracy on complex physical systems, these models often fail to comply with essential physical constraints, such as governing equations and boundary conditions. To overcome this limitation, we introduce a novel data-driven field reconstruction framework, termed the Physics-aligned Schrödinger Bridge (PalSB). |
Zeyu Li; Hongkun Dou; Shen Fang; Wang Han; Yue Deng; Lijun Yang; | code |
| 662 | TASAR: Transfer-based Attack on Skeletal Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate this phenomenon via the characterization of the loss function. For exhaustive evaluation, we build the first large-scale robust S-HAR benchmark, comprising 7 S-HAR models, 10 attack methods, 3 S-HAR datasets and 2 defense models. |
Yunfeng Diao; Baiqi Wu; Ruixuan Zhang; Ajian Liu; Xiaoshuai Hao; Xingxing Wei; Meng Wang; He Wang; | code |
| 663 | Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework for learning policies modeled by contractive dynamical systems, ensuring that all policy rollouts converge regardless of perturbations, and in turn, enable efficient OOS recovery. |
Amin Abyaneh; Mahrokh Ghoddousi Boroujeni; Hsiu-Chin Lin; Giancarlo Ferrari-Trecate; | code |
| 664 | PnP-Flow: Plug-and-Play Image Restoration with Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Plug-and-Play (PnP) Flow Matching, an algorithm for solving imaging inverse problems. |
Ségolène Tiffany Martin; Anne Gagneux; Paul Hagemann; Gabriele Steidl; | code |
| 665 | SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing individual kinase-based approaches focus solely on sequence inputs, neglecting crucial structural information. To address this limitation, we introduce SAGEPhos (Structure-aware kinAse-substrate bio-coupled and bio-auGmented nEtwork for Phosphorylation site prediction), a novel framework that modifies the semantic space of main protein inputs using auxiliary inputs at two distinct modality levels. |
Jingjie Zhang; Hanqun CAO; Zijun Gao; Xiaorui Wang; Chunbin Gu; | code |
| 666 | AstroCompress: A Benchmark Dataset for Multi-purpose Compression of Astronomical Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces [AstroCompress](https://huggingface.co/AstroCompress): a neural compression challenge for astrophysics data, featuring four new datasets (and one legacy dataset) with 16-bit unsigned integer imaging data in various modes: space-based, ground-based, multi-wavelength, and time-series imaging. |
Tuan Truong; Rithwik Sudharsan; Yibo Yang; Peter Xiangyuan Ma; Ruihan Yang; Stephan Mandt; Joshua S. Bloom; | code |
| 667 | Discrete Distribution Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel generative model, the Discrete Distribution Networks (DDN), that approximates data distribution using hierarchical discrete distributions. |
Lei Yang; | code |
| 668 | Understanding and Mitigating Bottlenecks of State Space Models Through The Lens of Recency and Over-smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This *fundamental dilemma* between recency and over-smoothing hinders the scalability of existing SSMs. Inspired by our theoretical findings, we propose to *polarize* two channels of the state transition matrices in SSMs, setting them to zero and one, respectively, simultaneously addressing recency bias and over-smoothing. |
Peihao Wang; Ruisi Cai; Yuehao Wang; Jiajun Zhu; Pragya Srivastava; Zhangyang Wang; Pan Li; | code |
| 669 | Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we endeavor to develop a scalable communication protocol for MARL. |
Xinran Li; Xiaolu Wang; Chenjia Bai; Jun Zhang; | code |
| 670 | FreCaS: Efficient Higher-Resolution Image Generation Via Frequency-aware Cascaded Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods typically perform the entire sampling process at full resolution and process all frequency components simultaneously, contradicting the inherent coarse-to-fine nature of latent diffusion models and wasting computation on processing premature high-frequency details at early diffusion stages. To address this issue, we introduce an efficient $\textbf{Fre}$quency-aware $\textbf{Ca}$scaded $\textbf{S}$ampling framework, $\textbf{FreCaS}$ in short, for higher-resolution image generation. |
Zhengqiang ZHANG; Ruihuang Li; Lei Zhang; | code |
| 671 | TDDBench: A Benchmark for Training Data Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce TDDBench, which consists of 13 datasets spanning three data modalities: image, tabular, and text. |
Zhihao Zhu; Yi Yang; Defu Lian; | code |
| 672 | Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By facilitating the clients to download multiple pre-aggregated prompts as fixed non-local experts, we propose Personalized Federated Mixture of Adaptive Prompts (pFedMoAP), a novel FL framework that personalizes the prompt learning process through the lens of Mixture of Experts (MoE). |
Jun Luo; Chen Chen; Shandong Wu; | code |
| 673 | PaCA: Partial Connection Adaptation for Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, LoRA and its variants do not reduce activation memory, as the first low-rank adapter matrix still requires the input activations to the pretrained weights to compute weight gradients. To mitigate this issue, we propose **Pa**rtial **C**onnection **A**daptation (**PaCA**), which fine-tunes randomly selected partial connections within the pretrained weights instead of introducing adapter layers in the model. |
Sunghyeon Woo; Sol Namkung; Sunwoo Lee; Inho Jeong; Beomseok Kim; Dongsuk Jeon; | code |
| 674 | Hummingbird: High Fidelity Image Generation Via Multimodal Context Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the first model to address the task of maintaining both diversity and fidelity given a multimodal context, we introduce a new benchmark formulation incorporating MME Perception and Bongard HOI datasets. |
Minh-Quan Le; Gaurav Mittal; Tianjian Meng; A S M Iftekhar; Vishwas Suryanarayanan; Barun Patra; Dimitris Samaras; Mei Chen; | code |
| 675 | A Large-scale Dataset and Benchmark for Commuting Origin-Destination Flow Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works developed models based on different techniques and achieved improvement on different datasets with different evaluation metrics, which hinders the establishment of a unified standard for comparing model performance. To bridge this gap, we introduce a large-scale dataset containing commuting OD flows for 3,333 areas including a wide range of urban environments around the United States. |
Can Rong; Jingtao Ding; Yan Liu; Yong Li; | code |
| 676 | Fictitious Synthetic Data Can Improve LLM Factuality Via Prerequisite Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent studies have identified one aggravating factor of LLM hallucinations as the knowledge inconsistency between pre-training and fine-tuning, where unfamiliar fine-tuning data mislead the LLM to fabricate plausible but wrong outputs. In this paper, we propose a novel fine-tuning strategy called Prereq-Tune to address this knowledge inconsistency and reduce hallucinations. |
Yujian Liu; Shiyu Chang; Tommi Jaakkola; Yang Zhang; | code |
| 677 | ContraDiff: Planning Towards High Return States Via Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method called Contrastive Diffuser (ContraDiff) to make full use of low-return trajectories and improve the performance of offline RL algorithms. |
Yixiang Shan; Zhengbang Zhu; Ting Long; Liang Qifan; Yi Chang; Weinan Zhang; Liang Yin; | code |
| 678 | Forgetting Transformer: Softmax Attention with A Forget Gate Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that FoX outperforms the Transformer on long-context language modeling, length extrapolation, and short-context downstream tasks, while performing on par with the Transformer on long-context downstream tasks. |
Zhixuan Lin; Evgenii Nikishin; Xu He; Aaron Courville; | code |
| 679 | Open-Set Graph Anomaly Detection Via Normal Structure Regularisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, existing open-set AD models were introduced to handle Euclidean data, failing to effectively capture discriminative features from graph structure and node attributes for GAD. In this work, we propose a novel open-set GAD approach, namely $\underline{n}ormal$ $\underline{s}tructure$ $\underline{reg}ularisation$ (**NSReg**), to achieve generalised detection ability to unseen anomalies, while maintaining its effectiveness on detecting seen anomalies. |
Qizhou Wang; Guansong Pang; Mahsa Salehi; Xiaokun Xia; Christopher Leckie; | code |
| 680 | Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they are often hindered by strong assumptions or intrinsic randomness. To overcome these challenges, we propose the Narrowing Information Bottleneck Theory, a novel framework that fundamentally redefines the traditional bottleneck approach. |
Zhiyu Zhu; Zhibo Jin; Jiayu Zhang; NAN YANG; Jiahao Huang; Jianlong Zhou; Fang Chen; | code |
| 681 | On The Adversarial Risk of Test Time Adaptation: An Investigation Into Realistic Test-Time Data Poisoning Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Test-time adaptation (TTA) updates the model weights during the inference stage using testing data to enhance generalization. However, this practice exposes TTA to adversarial … |
Yongyi Su; Yushu Li; Nanqing Liu; Kui Jia; Xulei Yang; Chuan-Sheng Foo; Xun Xu; | code |
| 682 | ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. |
Zihan Ye; Shreyank N Gowda; Shiming Chen; Xiaowei Huang; Haotian Xu; Fahad Shahbaz Khan; Yaochu Jin; Kaizhu Huang; Xiaobo Jin; | code |
| 683 | Towards Hierarchical Rectified Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate a hierarchical rectified flow to model data distributions. |
Yichi Zhang; Yici Yan; Alex Schwing; Zhizhen Zhao; | code |
| 684 | Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome this, this paper, for the first time, proposes a new Vision-Language-based SA (**VLSA**) paradigm. |
Pei Liu; Luping Ji; Jiaxiang Gou; Bo Fu; Mao Ye; | code |
| 685 | SOO-Bench: Benchmarks for Evaluating The Stability of Offline Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper proposes benchmarks named SOO-Bench (i.e., Stable Offline Optimization Benchmarks) for offline black-box optimization algorithms, so as to systematically evaluate the stability of surpassing the offline dataset under different data distributions. |
Hong Qian; Yiyi Zhu; Xiang Shu; Shuo Liu; Yaolin Wen; Xin An; Huakang Lu; Aimin Zhou; Ke Tang; Yang Yu; | code |
| 686 | DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing inference systems for tree-based applications are inefficient due to improper partitioning of queries and KV cache during attention calculation. This leads to two main issues: (1) a lack of memory access (IO) reuse for KV cache of shared prefixes, and (2) poor load balancing. As a result, there is redundant KV cache IO between GPU global memory and shared memory, along with low GPU utilization. To address these challenges, we propose DeFT (Decoding with Flash Tree-Attention), a hardware-efficient attention algorithm with prefix-aware and load-balanced KV cache partitions. |
Jinwei Yao; Kaiqi Chen; Kexun Zhang; Jiaxuan You; Binhang Yuan; Zeke Wang; Tao Lin; | code |
| 687 | AutoCGP: Closed-Loop Concept-Guided Policies from Unlabeled Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel imitation learning framework to train closed-loop concept-guided policies that enhance long-horizon task performance by leveraging discovered manipulation concepts. |
Pei Zhou; Ruizhe Liu; Qian Luo; Fan Wang; Yibing Song; Yanchao Yang; | code |
| 688 | Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Few-Class Arena (FCA), as a unified benchmark with focus on testing efficient image classification models for few classes. |
Bryan Bo Cao; Lawrence O’Gorman; Michael Coss; Shubham Jain; | code |
| 689 | M^3PC: Test-time Model Predictive Control Using Pretrained Masked Trajectory Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this information has not been fully exploited during the inference phase, where the agent needs to generate an optimal policy instead of just reconstructing masked components from unmasked ones. Given that a pretrained trajectory model can act as both a Policy Model and a World Model with appropriate mask patterns, we propose using Model Predictive Control (MPC) at test time to leverage the model’s own predictive capacity to guide its action selection. |
Kehan Wen; Yutong Hu; Yao Mu; Lei Ke; | code |
| 690 | Medium-Difficulty Samples Constitute Smoothed Decision Boundary for Knowledge Distillation on Pruned Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles a new problem of dataset pruning for Knowledge Distillation (KD), from a fresh perspective of Decision Boundary (DB) preservation and drifts. |
Yudong Chen; Xuwei Xu; Frank de Hoog; Jiajun Liu; Sen Wang; | code |
| 691 | Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards a solution, we introduce MIRAGE (Multi-Image Retrieval Augmented Generation), an open-source, lightweight visual-RAG framework that processes up to 10k images on a single 40G A100 GPU—far surpassing the 1k-image limit of contemporary models. |
Tsung-Han Wu; Giscard Biamby; Jerome Quenum; Ritwik Gupta; Joseph E. Gonzalez; Trevor Darrell; David Chan; | code |
| 692 | Atlas Gaussians Diffusion for 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. |
Haitao Yang; Yuan Dong; Hanwen Jiang; Dejia Xu; Georgios Pavlakos; Qixing Huang; | code |
| 693 | Accelerating Neural ODEs: A Variational Formulation-based Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose VF-NODE, a novel approach based on the variational formulation (VF) to accelerate the training of NODEs. |
Hongjue Zhao; Yuchen Wang; Hairong Qi; Zijie Huang; Han Zhao; Lui Sha; Huajie Shao; | code |
| 694 | GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose GALA, a novel representation of 3D shapes that (i) excels at capturing and reproducing complex geometry and surface details, (ii) is computationally efficient, and (iii) lends itself to 3D generative modelling with modern, diffusion-based schemes. |
Dingdong Yang; Yizhi Wang; Konrad Schindler; Ali Mahdavi Amiri; Hao Zhang; | code |
| 695 | REBIND: Enhancing Ground-state Molecular Conformation Prediction Via Force-Based Graph Rewiring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, significant prediction errors occur for atoms with low degree (i.e., low coordination numbers) whose conformations are primarily influenced by non-bonded interactions. To address this, we propose ReBIND, a novel framework that rewires molecular graphs by adding edges based on the Lennard-Jones potential to capture non-bonded interactions for low-degree atoms. |
Taewon Kim; Hyunjin Seo; Sungsoo Ahn; Eunho Yang; | code |
| 696 | Advantage-Guided Distillation for Preference Alignment in Small Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their impact often diminishes when applied to Small Language Models (SLMs), likely due to the limited capacity of these models. Instead of directly applying existing alignment techniques to SLMs, we propose to utilize a well-aligned teacher LLM to guide the alignment process for these models, thereby facilitating the transfer of the teacher’s knowledge of human preferences to the student model. |
Shiping Gao; Fanqi Wan; Jiajian Guo; Xiaojun Quan; Qifan Wang; | code |
| 697 | An Engorgio Prompt Makes Large Language Model Babble on Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore their vulnerability to inference cost attacks, where a malicious user crafts Engorgio prompts to intentionally increase the computation cost and latency of the inference process. |
Jianshuo Dong; Ziyuan Zhang; Qingjie Zhang; Tianwei Zhang; Hao Wang; Hewu Li; Qi Li; Chao Zhang; Ke Xu; Han Qiu; | code |
| 698 | How New Data Permeates LLM Knowledge and How to Dilute It Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that when learning new information, LLMs exhibit a priming effect: learning a new fact can cause the model to inappropriately apply that knowledge in unrelated contexts. To systematically study this phenomenon, we introduce Outlandish, a carefully curated dataset of 1320 diverse text samples designed to probe how new knowledge permeates through an LLM’s existing knowledge base. |
Chen Sun; Renat Aksitov; Andrey Zhmoginov; Nolan Andrew Miller; Max Vladymyrov; Ulrich Rueckert; Been Kim; Mark Sandler; | code |
| 699 | Preserving Deep Representations in One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SNOWS, a one-shot post-training pruning framework aimed at reducing the cost of vision network inference without retraining. |
Ryan Lucas; Rahul Mazumder; | code |
| 700 | LongMamba: Enhancing Mamba’s Long-Context Capabilities Via Training-Free Receptive Field Enlargement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite their efficiency in handling long contexts, recent studies have shown that SSMs, such as Mamba models, generally underperform compared to Transformers in long-context understanding tasks. To address this significant shortfall and achieve both efficient and accurate long-context understanding, we propose LongMamba, a training-free technique that significantly enhances the long-context capabilities of Mamba models. |
Zhifan Ye; Kejing Xia; Yonggan Fu; Xin Dong; Jihoon Hong; Xiangchi Yuan; Shizhe Diao; Jan Kautz; Pavlo Molchanov; Yingyan Celine Lin; | code |
| 701 | SMITE: Segment Me In TimE Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The difficulty increases when the segmentation is with arbitrary granularity, meaning the number of segments can vary arbitrarily, and masks are defined based on only one or a few sample images. In this paper, we address this issue by employing a pre-trained text-to-image diffusion model supplemented with an additional tracking mechanism. |
Amirhossein Alimohammadi; Sauradip Nag; Saeid Asgari; Andrea Tagliasacchi; Ghassan Hamarneh; Ali Mahdavi Amiri; | code |
| 702 | Adaptive Teachers for Amortized Samplers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to use an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student). |
Minsu Kim; Sanghyeok Choi; Taeyoung Yun; Emmanuel Bengio; Leo Feng; Jarrid Rector-Brooks; Sungsoo Ahn; Jinkyoo Park; Nikolay Malkin; Yoshua Bengio; | code |
| 703 | InfoGS: Efficient Structure-Aware 3D Gaussians Via Lightweight Information Shaping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is especially problematic if one wants to animate or edit objects in the scene, as this requires coordination among the many Gaussians involved in representing each object. To address this issue, we develop a mutual information shaping technique that enforces resonance and coordination between correlated Gaussians via a Gaussian attribute decoding network. |
Yunchao Zhang; Guandao Yang; Leonidas Guibas; Yanchao Yang; | code |
| 704 | Equivariant Masked Position Prediction for Efficient Molecular Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the limited availability of molecular data raises concerns regarding GNNs’ ability to effectively capture the fundamental principles of physics and chemistry, which constrains their generalization capabilities. To address this challenge, we introduce a novel self-supervised approach termed Equivariant Masked Position Prediction (EMPP), grounded in intramolecular potential and force theory. |
Junyi An; Chao Qu; Yun-Fei Shi; XinHao Liu; Qianwei Tang; Fenglei Cao; Yuan Qi; | code |
| 705 | PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose Positive-Negative Generative Adversarial Imitation Learning (PN-GAIL), a novel approach that falls within the framework of Generative Adversarial Imitation Learning (GAIL). |
Qiang Liu; Huiqiao Fu; Kaiqiang Tang; Chunlin Chen; Daoyi Dong; | code |
| 706 | ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce ShortcutsBench, a large-scale benchmark for the comprehensive evaluation of API-based agents in solving real-world complex tasks. |
Haiyang SHEN; Yue Li; Desong Meng; Dongqi Cai; Sheng Qi; Li Zhang; Mengwei Xu; Yun Ma; | code |
| 707 | ConMix: Contrastive Mixup at Representation Level for Long-tailed Deep Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unaware of the data distribution and sample labels, long-tailed deep clustering is highly challenging. To tackle this problem, we propose a novel contrastive mixup method for long-tailed deep clustering, named ConMix. |
Zhixin Li; Yuheng Jia; | code |
| 708 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the high-dimensional nature of the problem space and scarcity of high-quality 3D data, these pre-trained models still struggle to generalize to many challenging circumstances, such as limited view overlap or low lighting. To address this, we propose LoRA3D, an efficient self-calibration pipeline to *specialize* the pre-trained models to target scenes using their own multi-view predictions. |
Ziqi Lu; Heng Yang; Danfei Xu; Boyi Li; Boris Ivanovic; Marco Pavone; Yue Wang; | code |
| 709 | ProtoSnap: Prototype Alignment For Cuneiform Signs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an unsupervised approach for recovering the fine-grained internal configuration of cuneiform signs by leveraging powerful generative models and the appearance and structure of prototype font images as priors. We provide a new benchmark of expert annotations and evaluate our method on this task. |
Rachel Mikulinsky; Morris Alper; Shai Gordin; Enrique Jiménez; Yoram Cohen; Hadar Averbuch-Elor; | code |
| 710 | The OMG Dataset: An Open MetaGenomic Corpus for Mixed-modality Genomic Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present the Open MetaGenomic (OMG) corpus, a genomic pretraining dataset totalling 3.1T base pairs and 3.3B protein coding sequences, obtained by combining the two largest metagenomic dataset repositories (JGI’s IMG and EMBL’s MGnify). |
Andre Cornman; Jacob West-Roberts; Antonio Pedro Camargo; Simon Roux; Martin Beracochea; Milot Mirdita; Sergey Ovchinnikov; Yunha Hwang; | code |
| 711 | Noise Separation Guided Candidate Label Reconstruction for Noisy Partial Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we theoretically prove that the generalization error of the classifier constructed under NPLL paradigm is bounded by the noise rate and the average length of the candidate label set. |
Xiaorui Peng; Yuheng Jia; Fuchao Yang; Ran Wang; Min-Ling Zhang; | code |
| 712 | Offline RL with Smooth OOD Generalization in Convex Hull and Its Neighborhood Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This over-constraint issue results in poor $Q$-value estimation and hinders policy improvement. In this paper, we introduce a novel approach to achieve better $Q$-value estimation by enhancing $Q$-function generalization in OOD regions within Convex Hull and its Neighborhood (CHN). |
Qingmao Yao; Zhichao Lei; Tianyuan Chen; Ziyue Yuan; Xuefan Chen; Jianxiang Liu; Faguo Wu; Xiao Zhang; | code |
| 713 | MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MOFFlow, the first deep generative model tailored for MOF structure prediction. |
Nayoung Kim; Seongsu Kim; Minsu Kim; Jinkyoo Park; Sungsoo Ahn; | code |
| 714 | Animate Your Thoughts: Reconstruction of Dynamic Natural Vision from Human Brain Activity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although prior video reconstruction methods have made substantial progress, they still suffer from several limitations, including: (1) difficulty in simultaneously reconciling semantic (e.g. categorical descriptions), structure (e.g. size and color), and consistent motion information (e.g. order of frames); (2) low temporal resolution of fMRI, which poses a challenge in decoding multiple frames of video dynamics from a single fMRI frame; (3) reliance on video generation models, which introduces ambiguity regarding whether the dynamics observed in the reconstructed videos are genuinely derived from fMRI data or are hallucinations from the generative model. To overcome these limitations, we propose a two-stage model named Mind-Animator. |
Yizhuo Lu; Changde Du; Chong Wang; Xuanliu Zhu; Liuyun Jiang; Xujin Li; Huiguang He; | code |
| 715 | Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that standard permutation equivariant denoisers face fundamental limitations in these tasks due to their inability to break symmetries in noisy inputs. To address this, we propose \emph{aligning} input and target graphs to break input symmetries while preserving permutation equivariance in non-matching graph portions. |
Najwa Laabid; Severi Rissanen; Markus Heinonen; Arno Solin; Vikas Garg; | code |
| 716 | LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces LaMP, a novel Language-Motion Pretraining model, which transitions from a language-vision to a more suitable language-motion latent space. |
Zhe Li; Weihao Yuan; Yisheng HE; Lingteng Qiu; Shenhao Zhu; Xiaodong Gu; Weichao Shen; Yuan Dong; Zilong Dong; Laurence Tianruo Yang; | code |
| 717 | DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although significant progress has been made in diffusion-based talking head generation, almost all methods rely on autoregressive strategies, which suffer from limited context utilization beyond the current generation step, error accumulation, and slower generation speed. To address these challenges, we present DAWN (\textbf{D}ynamic frame \textbf{A}vatar \textbf{W}ith \textbf{N}on-autoregressive diffusion), a framework that enables all-at-once generation of dynamic-length video sequences. |
Hanbo Cheng; Limin Lin; Chenyu Liu; Pengcheng Xia; Pengfei Hu; Jiefeng Ma; Jun Du; Jia Pan; | code |
| 718 | Adversarial Attacks on Data Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there has been little to no systematic research addressing this issue. In this work, we aim to bridge this gap by detailing a threat model with clear assumptions about the adversary’s goal and capabilities and proposing principled adversarial attack methods on data attribution. |
Xinhe Wang; Pingbang Hu; Junwei Deng; Jiaqi W. Ma; | code |
| 719 | Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While knowledge-guided context optimization has been proposed by constructing consistency constraints to handle catastrophic forgetting in the pre-trained backbone, it also introduces a bias toward pre-training. This paper proposes a novel and simple Divergence-enhanced Knowledge-guided Prompt Tuning (DeKg) method to address this issue. |
Yilun Li; Miaomiao Cheng; Xu Han; Wei Song; | code |
| 720 | Second-Order Fine-Tuning Without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose HiZOO, a diagonal Hessian informed Zeroth-Order Optimizer, which is the first work to leverage the diagonal Hessian to enhance ZOO for fine-tuning LLMs. |
Yanjun Zhao; Sizhe Dang; Haishan Ye; Guang Dai; Yi Qian; Ivor Tsang; | code |
| 721 | Linear Partial Gromov-Wasserstein Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite this, both GW and PGW face significant computational challenges due to their non-convex nature. To overcome these challenges, we propose the linear partial Gromov-Wasserstein (LPGW) embedding, a linearized embedding technique for the PGW problem. |
Yikun Bai; Abihith Kothapalli; Hengrong Du; Rocio Diaz Martin; Soheil Kolouri; | code |
| 722 | Partial Gromov-Wasserstein Metric Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a particular case of the UGW problem, termed Partial Gromov-Wasserstein (PGW). |
Yikun Bai; Rocio Diaz Martin; Abihith Kothapalli; Hengrong Du; Xinran Liu; Soheil Kolouri; | code |
| 723 | $F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To advance research in video understanding, we introduce $F^3Set$, a benchmark that consists of video datasets for precise $F^3$ event detection. |
Zhaoyu Liu; Kan Jiang; Murong Ma; Zhe Hou; Yun Lin; Jin Song Dong; | code |
| 724 | Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond merely extrapolating from supervised learning, which suggests a link between flat reward landscapes and enhanced generalization, we aim to formally connect the flatness of the reward surface to the robustness of RL models. |
Hyun Kyu Lee; Sung Whan Yoon; | code |
| 725 | Mitigating Information Loss in Tree-Based Reinforcement Learning Via Direct Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL. |
Sascha Marton; Tim Grams; Florian Vogt; Stefan Lüdtke; Christian Bartelt; Heiner Stuckenschmidt; | code |
| 726 | Improving Long-Text Alignment for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as text inputs become longer, existing encoding methods like CLIP face limitations, and aligning the generated images with long texts becomes challenging. To tackle these issues, we propose LongAlign, which includes a segment-level encoding method for processing long texts and a decomposed preference optimization method for effective alignment training. |
Luping Liu; Chao Du; Tianyu Pang; Zehan Wang; Chongxuan Li; Dong Xu; | code |
| 727 | Fast and Accurate Blind Flexible Docking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing docking methods often face limitations: they either overlook crucial structural changes by assuming protein rigidity or suffer from low computational efficiency due to their reliance on generative models for structure sampling. To address these challenges, we propose FABFlex, a fast and accurate regression-based multi-task learning model designed for realistic blind flexible docking scenarios, where proteins exhibit flexibility and binding pocket sites are unknown (blind). |
Zizhuo Zhang; Lijun Wu; Kaiyuan Gao; Jiangchao Yao; Tao Qin; Bo Han; | code |
| 728 | Revisit The Open Nature of Open Vocabulary Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This contradicts the open nature of OVS, where ambiguous categories may both be correct from an open-world perspective. To address this, in this work, we study the open nature of OVS and propose a mask-wise evaluation protocol that is based on matched and mismatched mask pairs between prediction and annotation respectively. |
Qiming Huang; Han Hu; Jianbo Jiao; | code |
| 729 | Spectral-Refiner: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these neural networks often entail considerable training expenses, and may not always achieve the desired accuracy required in many scientific and engineering disciplines. In this paper, we propose a new learning framework to address these issues. |
Shuhao Cao; Francesco Brarda; Ruipeng Li; Yuanzhe Xi; | code |
| 730 | Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Trajectory-LLM (Traj-LLM), a new approach that takes brief descriptions of vehicular interactions as input and generates corresponding trajectories. We have also created a new dataset, Language-to-Trajectory (L2T), which includes 240K textual descriptions of vehicle interactions and behaviors, each paired with corresponding map topologies and vehicle trajectory segments. |
Kairui Yang; Zihao Guo; Gengjie Lin; Haotian Dong; Zhao Huang; Yipeng Wu; Die Zuo; Jibin Peng; Ziyuan Zhong; Xin WANG; Qing Guo; Xiaosong Jia; Junchi Yan; Di Lin; | code |
| 731 | BoneMet: An Open Large-Scale Multi-Modal Murine Dataset for Breast Cancer Bone Metastasis Diagnosis and Prognosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As such, we introduce the Bone Metastasis (BoneMet) dataset, the first large-scale, publicly available, high-resolution medical resource, which is derived from a well-accepted murine BCBM model. |
Tiankuo Chu; Fudong Lin; Shubo Wang; Jason Jiang; Wiley Jia-Wei Gong; Xu Yuan; Liyun Wang; | code |
| 732 | Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following this new setting, challenges arise in leveraging incomplete pairs of ground truth and predictions for backpropagation, as well as in generalizing accurate information without overfitting to noise from recent data streams. To address these challenges, we propose a novel dual-stream framework for online forecasting (DSOF): a slow stream that updates with complete data using experience replay, and a fast stream that adapts to recent data through temporal difference learning. |
Ying-yee Ava Lau; Zhiwen Shao; Dit-Yan Yeung; | code |
| 733 | Mastering Task Arithmetic: $\tau$Jp As A Key Indicator for Weight Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present three key contributions in the context of task addition and task negation within task arithmetic. |
Kotaro Yoshida; Yuji Naraki; Takafumi Horie; Ryosuke Yamaki; Ryotaro Shimizu; Yuki Saito; Julian McAuley; Hiroki Naganuma; | code |
| 734 | Enhancing Federated Domain Adaptation with Multi-Domain Prototype-Based Federated Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This further undermines both in-domain and out-of-domain performance (within the same federated system but outside the local client), which is critical in certain business applications. To address this, we propose a novel framework called Multi-domain Prototype-based Federated Fine-Tuning (MPFT). |
Jingyuan Zhang; Yiyang Duan; Shuaicheng Niu; YANG CAO; Wei Yang Bryan Lim; | code |
| 735 | Learning A Fast Mixing Exogenous Block MDP Using A Single Trajectory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose STEEL, the first provably sample-efficient algorithm for learning the controllable dynamics of an Ex-BMDP from a single trajectory, in the function approximation setting. |
Alexander Levine; Peter Stone; Amy Zhang; | code |
| 736 | Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To implement the resolution attack, we propose an automated framework capable of generating dual-semantic images in a zero-shot manner. |
Wangjia Yu; Xiaomeng Fu; Qiao Li; Jizhong Han; Xiaodan Zhang; | code |
| 737 | GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel perspective by emphasizing the full utilization of labels. |
Xinyi Shang; Peng Sun; Tao Lin; | code |
| 738 | Local Patterns Generalize Better for Novel Anomalies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a framework that identifies the local patterns which generalize to novel samples and models the dynamics of local patterns. |
Yalong Jiang; | code |
| 739 | Dataset Ownership Verification in Contrastive Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first dataset ownership verification method tailored specifically for self-supervised pre-trained models by contrastive learning. |
Yuechen Xie; Jie Song; Mengqi Xue; Haofei Zhang; Xingen Wang; Bingde Hu; Genlang Chen; Mingli Song; | code |
| 740 | A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new hyperparameter-free graph construction of molecules and beyond with sparsity, connectivity, and rigidity guarantees. |
Shih-Hsin Wang; Yuhao Huang; Justin M. Baker; Yuan-En Sun; Qi Tang; Bao Wang; | code |
| 741 | Complementary Label Learning with Positive Label Guessing and Negative Label Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To be specific, we propose to split the inverse problem into two subtasks: positive label guessing (PLG) and negative label enhancement (NLE), collectively called PLNL. |
Yuhang Li; Zhuying Li; Yuheng Jia; | code |
| 742 | Learning Efficient Positional Encodings with Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify four key properties that graph PEs should satisfy: stability, expressive power, scalability, and genericness. |
Charilaos Kanatsoulis; Evelyn Choi; Stefanie Jegelka; Jure Leskovec; Alejandro Ribeiro; | code |
| 743 | Following The Human Thread in Social Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first Social Dynamics Adaptation model (SDA) based on the robot’s state-action history to infer the social dynamics. |
Luca Scofano; Alessio Sampieri; Tommaso Campari; Valentino Sacco; Indro Spinelli; Lamberto Ballan; Fabio Galasso; | code |
| 744 | Causal Discovery Via Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing score-based methods for directed acyclic graph (DAG) learning from observational data struggle to recover the causal graph accurately and sample-efficiently. To overcome this, in this study, we propose DrBO (DAG recovery via Bayesian Optimization)—a novel DAG learning framework leveraging Bayesian optimization (BO) to find high-scoring DAGs. |
Bao Duong; Sunil Gupta; Thin Nguyen; | code |
| 745 | Federated Domain Generalization with Data-free On-server Matching Gradient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel approach, dubbed Federated Learning via On-server Matching Gradient (FedOMG), which can efficiently leverage domain information from distributed domains. |
Trong Binh Nguyen; Duong Minh Nguyen; Jinsun Park; Viet Quoc Pham; Won-Joo Hwang; | code |
| 746 | TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general visual understanding tasks. |
Yixuan Zhou; Long Bai; Sijia Cai; Bing Deng; Xing Xu; Heng Tao Shen; | code |
| 747 | Scalable Bayesian Learning with Posteriors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although theoretically compelling, Bayesian learning with modern machine learning models is computationally challenging since it requires approximating a high dimensional posterior distribution. In this work, we (i) introduce **_posteriors_**, an easily extensible PyTorch library hosting general-purpose implementations making Bayesian learning accessible and scalable to large data and parameter regimes; (ii) present a tempered framing of stochastic gradient Markov chain Monte Carlo, as implemented in posteriors, that transitions seamlessly into optimization and unveils a minor modification to deep ensembles to ensure they are asymptotically unbiased for the Bayesian posterior; and (iii) demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models. |
Samuel Duffield; Kaelan Donatella; Johnathan Chiu; Phoebe Klett; Daniel Simpson; | code |
| 748 | SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limitation raises concerns about the practical robustness of SSL models in more realistic audio settings. To address this gap, we introduce Self-Supervised Learning from Audio Mixtures (SSLAM), a novel direction in audio SSL research, designed to improve the model’s ability to learn from polyphonic data while maintaining strong performance on monophonic data. |
Tony Alex; Sara Atito; Armin Mustafa; Muhammad Awais; Philip J B Jackson; | code |
| 749 | Spreading Out-of-Distribution Detection on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous approaches are evaluated using unrealistic benchmarks that consider only randomly selected OOD nodes, failing to reflect the interactions among nodes. In this paper, we introduce a new challenging task to model the interactions of OOD nodes in a graph, termed spreading OOD detection, where a newly emerged OOD node spreads its property to neighboring nodes. |
Daeho Um; Jongin Lim; Sunoh Kim; Yuneil Yeo; Yoonho Jung; | code |
| 750 | Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional, Black-box Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Efficient inference in high-dimensional models is a central challenge in machine learning. We introduce the Gaussian Ensemble Belief Propagation (GEnBP) algorithm, which combines the strengths of the Ensemble Kalman Filter (EnKF) and Gaussian Belief Propagation (GaBP) to address this challenge. |
Dan MacKinlay; Russell Tsuchida; Daniel Edward Pagendam; Petra Kuhnert; | code |
| 751 | Neural Eulerian Scene Flow Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We reframe scene flow as the task of estimating a continuous space-time ordinary differential equation (ODE) that describes motion for an entire observation sequence, represented … |
Kyle Vedder; Neehar Peri; Ishan Khatri; Siyi Li; Eric Eaton; Mehmet Kemal Kocamaz; Yue Wang; Zhiding Yu; Deva Ramanan; Joachim Pehserl; | code |
| 752 | EC-Diffuser: Multi-Object Manipulation Via Entity-Centric Behavior Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent approaches have utilized large-scale offline data to train models from pixel observations, achieving performance gains through scaling, these methods struggle with compositional generalization in unseen object configurations with constrained network and dataset sizes. To address these issues, we propose a novel behavioral cloning (BC) approach that leverages object-centric representations and an entity-centric Transformer with diffusion-based optimization, enabling efficient learning from offline image data. |
Carl Qi; Dan Haramati; Tal Daniel; Aviv Tamar; Amy Zhang; | code |
| 753 | EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Local SGD methods have been proposed to address these issues, but their effectiveness remains limited to small-scale training due to additional memory overhead and insufficient attention to efficiency and stability. To tackle these issues, we propose EDiT, an innovative Efficient Distributed Training method that combines a tailored Local SGD approach with model sharding techniques to enhance large-scale training efficiency. |
Jialiang Cheng; Ning Gao; Yun Yue; Zhiling Ye; Jiadi Jiang; Jian Sha; | code |
| 754 | Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel weight initialization method for neural networks with the tanh activation function. |
Hyun woo Lee; Hayoung Choi; Hyunju Kim; | code |
| 755 | Understanding Matrix Function Normalizations in Covariance Pooling Through The Lens of Riemannian Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The current literature does not provide a satisfactory explanation of why Euclidean classifiers can be applied directly to Riemannian features after the normalization of the matrix power. To mitigate this gap, this paper provides a comprehensive and unified understanding of the matrix logarithm and power from a Riemannian geometry perspective. |
Ziheng Chen; Yue Song; Xiaojun Wu; Gaowen Liu; Nicu Sebe; | code |
| 756 | Gyrogroup Batch Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Several Riemannian manifolds in machine learning, such as Symmetric Positive Definite (SPD), Grassmann, spherical, and hyperbolic manifolds, have been proven to admit gyro structures, thus enabling a principled and effective extension of Euclidean Deep Neural Networks (DNNs) to manifolds. Inspired by this, this study introduces a general Riemannian Batch Normalization (RBN) framework on gyrogroups, termed GyroBN. |
Ziheng Chen; Yue Song; Xiaojun Wu; Nicu Sebe; | code |
| 757 | TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As model sizes continue growing, this strategy results in increasingly high computational costs and becomes unsustainable. To overcome this problem, we introduce Tokenformer, a natively scalable architecture that leverages the attention mechanism not only for computations among input tokens but also for interactions between tokens and model parameters, thereby enhancing architectural flexibility. |
Haiyang Wang; Yue Fan; Muhammad Ferjad Naeem; Yongqin Xian; Jan Eric Lenssen; Liwei Wang; Federico Tombari; Bernt Schiele; | code |
| 758 | SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a framework for pre-training of 3D hand pose estimation from in-the-wild hand images sharing with similar hand characteristics, dubbed SiMHand. |
Nie Lin; Takehiko Ohkawa; Yifei Huang; Mingfang Zhang; Minjie Cai; Ming Li; Ryosuke Furuta; Yoichi Sato; | code |
| 759 | HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present HyPoGen, a novel optimization-biased hypernetwork for policy generation. |
Hanxiang Ren; Li Sun; Xulong Wang; Pei Zhou; Zewen Wu; Siyan Dong; Difan Zou; Youyi Zheng; Yanchao Yang; | code |
| 760 | PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PFDiff, a novel training-free and orthogonal timestep-skipping strategy, which enables existing fast ODE solvers to operate with fewer NFE. |
Guangyi Wang; Yuren Cai; lijiang Li; Wei Peng; Song-Zhi Su; | code |
| 761 | DICE: Data Influence Cascade in Decentralized Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges arising from the localized connections making influence “cascade” in a decentralized network. To overcome this, we design the first method to estimate Data Influence CascadE (DICE) in a decentralized environment. |
Tongtian Zhu; Wenhao Li; Can Wang; Fengxiang He; | code |
| 762 | Learning Gain Map for Inverse Tone Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a dual-branch network named GMNet, consisting of a Local Contrast Restoration (LCR) branch and a Global Luminance Estimation (GLE) branch to capture pixel-wise and image-wise information for GM estimation. Moreover, to facilitate future research on the GM-ITM task, we build both synthetic and real-world datasets for comprehensive evaluations: synthetic SDR-GM pairs are generated from existing HDR resources, and real-world SDR-GM pairs are captured by mobile devices. |
Yinuo Liao; Yuanshen Guan; Ruikang Xu; Jiacheng Li; Shida Sun; Zhiwei Xiong; | code |
| 763 | Can One Modality Model Synergize Training of Other Modality Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From a practical viewpoint, our work aims to broaden the scope of multimodal learning to encompass the synergistic usage of single-modality models, relieving a strong limitation of paired supervision. |
Jae-Jun Lee; Sung Whan Yoon; | code |
| 764 | Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the increased data complexity inherent in adversarial training, and the myriad of ways that OOD samples can arise during testing, often prevent these approaches from establishing robust decision boundaries. To address these limitations, we propose AROS, a novel approach leveraging neural ordinary differential equations (NODEs) with Lyapunov stability theorem in order to obtain robust embeddings for OOD detection. |
Hossein Mirzaei; Mackenzie W Mathis; | code |
| 765 | Information Theoretic Text-to-Image Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of relying on fine-grained linguistic analyses of prompts, human annotation, or auxiliary vision-language models, we use Mutual Information (MI) to guide model alignment. |
CHAO WANG; Giulio Franzese; Alessandro Finamore; Massimo Gallo; Pietro Michiardi; | code |
| 766 | Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transformers, on the other hand, suffer from the quadratic memory and computational complexity of self-attention mechanisms, scaling as $O(n^2)$, where $n$ is the sequence length. To address these challenges, we propose a state space model (SSM)-based world model, Drama, specifically leveraging Mamba, that achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies and enabling efficient training with longer sequences. |
Wenlong Wang; Ivana Dusparic; Yucheng Shi; Ke Zhang; Vinny Cahill; | code |
| 767 | Offline Hierarchical Reinforcement Learning Via Inverse Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose *OHIO*: a framework for offline reinforcement learning (RL) of hierarchical policies. |
Carolin Schmidt; Daniele Gammelli; James Harrison; Marco Pavone; Filipe Rodrigues; | code |
| 768 | LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as visuo-textual conversations and enables an efficient transfer of a pretrained VLM into a powerful VLA, motivated by the success of visual instruction tuning in Computer Vision. |
Xiang Li; Cristina Mata; Jongwoo Park; Kumara Kahatapitiya; Yoo Sung Jang; Jinghuan Shang; Kanchana Ranasinghe; Ryan D Burgert; Mu Cai; Yong Jae Lee; Michael S Ryoo; | code |
| 769 | Attribute-based Visual Reprogramming for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose *Attribute-based Visual Reprogramming* (AttrVR) for CLIP, utilizing *descriptive attributes* (DesAttrs) and *distinctive attributes* (DistAttrs), which respectively represent common and unique feature descriptions for different classes. |
Chengyi Cai; Zesheng Ye; Lei Feng; Jianzhong Qi; Feng Liu; | code |
| 770 | Locality Sensitive Avatars From Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present locality-sensitive avatar, a neural radiance field (NeRF) based network to learn human motions from monocular videos. |
Chunjin Song; Zhijie Wu; Shih-Yang Su; Bastian Wandt; Leonid Sigal; Helge Rhodin; | code |
| 771 | Mitigating Parameter Interference in Model Merging Via Sharpness-Aware Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the performance of a merged model, we note that a fine-tuning scheme should aim for (1) smaller parameter interference and (2) better performance of each fine-tuned model on the corresponding task. In this work, we aim to design a new fine-tuning objective function to work towards these two goals. |
Yeoreum Lee; Jinwook Jung; Sungyong Baik; | code |
| 772 | Online Reinforcement Learning in Non-Stationary Context-Driven Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Locally Constrained Policy Optimization (LCPO), an online RL approach that combats CF by anchoring policy outputs on old experiences while optimizing the return on current experiences. |
Pouya Hamadanian; Arash Nasr-Esfahany; Malte Schwarzkopf; Siddhartha Sen; Mohammad Alizadeh; | code |
| 773 | SSOLE: Rethinking Orthogonal Low-rank Embedding for Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, directly applying OLE to SSL poses significant challenges: (1) the virtually infinite number of classes in SSL makes achieving the OLE objective impractical, leading to representational collapse; and (2) low-rank constraints may fail to distinguish between positively and negatively correlated features, further undermining learning. To address these issues, we propose SSOLE (Self-Supervised Orthogonal Low-rank Embedding), a novel framework that integrates OLE principles into SSL by (1) decoupling the low-rank and high-rank enforcement to align with SSL objectives; and (2) applying low-rank constraints to feature deviations from their mean, ensuring better alignment of positive pairs by accounting for the signs of cosine similarities. |
Lun Huang; Qiang Qiu; Guillermo Sapiro; | code |
| 774 | On Quantizing Neural Representation for Variable-Rate Video Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces NeuroQuant, a novel post-training quantization (PTQ) approach tailored to non-generalized Implicit Neural Representations for variable-rate Video Coding (INR-VC). |
Junqi Shi; Zhujia Chen; Hanfei Li; Qi Zhao; Ming Lu; Tong Chen; Zhan Ma; | code |
| 775 | Sensitivity-Constrained Fourier Neural Operators for Forward and Inverse Problems in Parametric Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While deep learning frameworks like the Fourier Neural Operator (FNO) efficiently approximate differential equation solutions, they struggle with inverse problems, sensitivity calculations $\frac{\partial u}{\partial p}$, and concept drift. We address these challenges by introducing a novel sensitivity loss regularizer, demonstrated through Sensitivity-Constrained Fourier Neural Operators (SC-FNO). |
Abdolmehdi Behroozi; Chaopeng Shen; Daniel Kifer; | code |
| 776 | PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PeriodWave, a novel universal waveform generation model from Mel-spectrogram and neural audio codec. |
Sang-Hoon Lee; Ha-Yeong Choi; Seong-Whan Lee; | code |
| 777 | Reconstruction-Guided Policy: Enhancing Decision-Making Through Agent-Wise State Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the inconsistency between the states used in training and execution introduces additional errors. To resolve these issues, we propose a method called Reconstruction-Guided Policy (RGP) to reconstruct the agent-wise state, which represents the information of inter-agent relationships, as input for decision-making during both training and execution. |
Liang Qifan; Yixiang Shan; Haipeng Liu; Zhengbang Zhu; Ting Long; Weinan Zhang; Yuan Tian; | code |
| 778 | Hierarchical Uncertainty Estimation for Learning-based Registration in Neuroimaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the uncertainty estimation associated with these methods has been largely limited to the application of generic techniques (e.g., Monte Carlo dropout) that do not exploit the peculiarities of the problem domain, particularly spatial modeling. Here, we propose a principled way to propagate uncertainties (epistemic or aleatoric) estimated at the level of spatial location by these methods, to the level of global transformation models, and further to downstream tasks. |
Xiaoling Hu; Karthik Gopinath; Peirong Liu; Malte Hoffmann; Koen Van Leemput; Oula Puonti; Juan Eugenio Iglesias; | code |
| 779 | GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Graph Ensemble Temperature Scaling (GETS), a novel calibration framework that combines input and model ensemble strategies within a Graph Mixture-of-Experts (MoE) architecture. |
Dingyi Zhuang; Chonghe Jiang; Yunhan Zheng; Shenhao Wang; Jinhua Zhao; | code |
| 780 | The Case for Cleaner Biosignals: High-fidelity Neural Compressor Enables Transfer from Cleaner IEEG to Noisier EEG Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To help characterize the importance of clean data on the performance of DL models, we propose BrainCodec, a high-fidelity EEG and iEEG neural compressor. |
Francesco S. Carzaniga; Gary Tom Hoppeler; Michael Hersche; Kaspar Schindler; Abbas Rahimi; | code |
| 781 | Law of The Weakest Link: Cross Capabilities of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To systematically explore this concept, we first define seven core individual capabilities and then pair them to form seven common cross capabilities, each supported by a manually constructed taxonomy. Building on these definitions, we introduce *CrossEval*, a benchmark comprising 1,400 human-annotated prompts, with 100 prompts for each individual and cross capability. |
Ming Zhong; Aston Zhang; Xuewei Wang; Rui Hou; Wenhan Xiong; Chenguang Zhu; Zhengxing Chen; Liang Tan; Chloe Bi; Mike Lewis; Sravya Popuri; Sharan Narang; Melanie Kambadur; Dhruv Mahajan; Sergey Edunov; Jiawei Han; Laurens van der Maaten; | code |
| 782 | CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the cooperation problem among large language model (LLM) based embodied agents, where agents must cooperate to achieve a common goal. |
Jie Liu; Pan Zhou; Yingjun Du; Ah-Hwee Tan; Cees G. M. Snoek; Jan-Jakob Sonke; Efstratios Gavves; | code |
| 783 | TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we demonstrate that existing methods, such as GraphMixer and DyGFormer, are inherently incapable of learning simple sequential dynamics, such as the pattern that a user who has followed OpenAI and Anthropic is more likely to follow AI at Meta next. Motivated by this issue, we introduce the Temporal Graph Benchmark with Sequential Dynamics (TGB-Seq), a new benchmark carefully curated to minimize repeated edges, challenging models to learn sequential dynamics and generalize to unseen edges. |
Lu Yi; Jie Peng; Yanping Zheng; Fengran Mo; Zhewei Wei; Yuhang Ye; Yue Zixuan; Zengfeng Huang; | code |
| 784 | Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We notice that existing works that attempt to speed up AR generation by generating multiple tokens at once fundamentally cannot capture the output distribution due to the conditional dependencies between tokens, limiting their effectiveness for few-step generation. To overcome this, we propose Distilled Decoding (DD), which leverages flow matching to create a deterministic mapping from Gaussian distribution to the output distribution of the pre-trained AR model. |
Enshu Liu; Xuefei Ning; Yu Wang; Zinan Lin; | code |
| 785 | RetroInText: A Multimodal Large Language Model Enhanced Framework for Retrosynthetic Planning Via In-Context Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current machine learning-based methods often overlook the valuable context from the overall route, focusing only on predicting reactants from the product, requiring costly annotations for every reaction step, and ignoring the multi-faceted nature of molecules, resulting in inaccurate synthetic route predictions. Therefore, we introduce RetroInText, an advanced end-to-end framework based on a multimodal Large Language Model (LLM), featuring in-context learning with TEXT descriptions of synthetic routes. |
Chenglong Kang; Xiaoyi Liu; Fei Guo; | code |
| 786 | Efficient Active Imitation Learning with Random Network Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Random Network Distillation DAgger (RND-DAgger), a new active imitation learning method that limits expert querying by using a learned state-based out-of-distribution measure to trigger interventions. |
Emilien Biré; Anthony Kobanda; Ludovic Denoyer; Rémy Portelas; | code |
| 787 | FreeCG: Free The Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Freeing up the design space can greatly improve the model’s expressiveness while simultaneously decreasing computational demands. To reach this goal, we utilize a mathematical proposition, invariance transitivity, to show that implementing the CG transform layer on the permutation-invariant abstract edges allows complete freedom in the design of the layer without compromising the overall permutation equivariance. |
Shihao Shao; Haoran Geng; Zun Wang; Qinghua Cui; | code |
| 788 | CollabEdit: Towards Non-destructive Collaborative Knowledge Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this manuscript dives into the first investigation of collaborative KE, in which we start by carefully identifying the unique three challenges therein, including knowledge overlap, knowledge conflict, and knowledge forgetting. |
Jiamu Zheng; Jinghuai Zhang; Tianyu Du; Xuhong Zhang; Jianwei Yin; Tao Lin; | code |
| 789 | From Search to Sampling: Generative Models for Robust Algorithmic Recourse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce GenRe, a generative recourse model designed to train the three recourse objectives jointly. |
Prateek Garg; Lokesh Nagalapatti; Sunita Sarawagi; | code |
| 790 | Towards Counterfactual Fairness Through Auxiliary Variables Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing counterfactual fairness approaches usually overlook intrinsic information about sensitive features, limiting their ability to achieve fairness while simultaneously maintaining performance. To tackle this challenge, we introduce EXOgenous Causal reasoning (EXOC), a novel causal reasoning framework motivated by exogenous variables. |
Bowei Tian; Ziyao Wang; Shwai He; Wanghao Ye; Guoheng Sun; Yucong Dai; Yongkai Wu; Ang Li; | code |
| 791 | Measuring And Improving Engagement of Text-to-Image Generation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce the challenge of optimizing the image generation process for improved viewer engagement. We have released our code and dataset on behavior-in-the-wild. |
Varun Khurana; Yaman Kumar Singla; Jayakumar Subramanian; Changyou Chen; Rajiv Ratn Shah; zhiqiang xu; Balaji Krishnamurthy; | code |
| 792 | Teaching Human Behavior Improves Content Understanding Abilities Of VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We also release **BLIFT**, our **Behaviour-LLaVA IFT** dataset comprising 730k images and videos with their receiver behavior collected from multiple platforms on which we train our models to achieve this. |
Somesh Kumar Singh; Harini S I; Yaman Kumar Singla; Changyou Chen; Rajiv Ratn Shah; Veeky Baths; Balaji Krishnamurthy; | code |
| 793 | Measuring And Improving Persuasiveness Of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce **transsuasion** (trans = carrying across, suasion = the act of persuading), a novel task of transforming non-persuasive language into persuasive content while preserving other factors determining persuasiveness (sender, receiver, time, and channel). |
Somesh Kumar Singh; Yaman Kumar Singla; Harini S I; Balaji Krishnamurthy; | code |
| 794 | From Promise to Practice: Realizing High-performance Decentralized Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when and how decentralization can shorten the per-iteration runtimes. To support the decentralized training of transformer-based models, we introduce a decentralized Adam algorithm that overlaps communications with computations, prove its convergence, and propose an accumulation technique to mitigate the high variance caused by small local batch sizes. |
Zesen Wang; Jiaojiao Zhang; Xuyang Wu; Mikael Johansson; | code |
| 795 | U-shaped and Inverted-U Scaling Behind Emergent Abilities of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the phenomenon by grouping questions based on difficulty level and provide a possible explanation for emergent abilities. |
Tung-Yu Wu; Melody Lo; | code |
| 796 | $q$-exponential Family for Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a broader policy family that remains tractable: the $q$-exponential family. |
Lingwei Zhu; Haseeb Shah; Han Wang; Yukie Nagai; Martha White; | code |
| 797 | S4M: S4 for Multivariate Time Series Forecasting with Missing Values Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce S4M, an end-to-end time series forecasting framework that seamlessly integrates missing data handling into the Structured State Space Sequence (S4) model architecture. |
Peng Jing; Meiqi Yang; Qiong Zhang; Xiaoxiao Li; | code |
| 798 | Reflexive Guidance: Improving OoDD in Vision-Language Models Via Self-Guided Image-Adaptive Concept Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this gap, we evaluate and analyze the OoDD capabilities of various proprietary and open-source LVLMs. Our investigation contributes to a better understanding of how these foundation models represent confidence scores through their generated natural language responses. |
Jihyo Kim; Seulbi Lee; Sangheum Hwang; | code |
| 799 | Graph Transformers Dream of Electric Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present explicit weight configurations for implementing each algorithm, and we bound the constructed Transformers’ errors by the errors of the underlying algorithms. |
Xiang Cheng; Lawrence Carin; Suvrit Sra; | code |
| 800 | Group-robust Sample Reweighting for Subpopulation Shifts Via Influence Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the costliness of the labels, we propose to adopt a different paradigm to enhance group label efficiency: utilizing the group-labeled data as a target set to optimize the weights of other group-unlabeled data. We introduce Group-robust Sample Reweighting (GSR), a two-stage approach that first learns the representations from group-unlabeled data, and then tinkers the model by iteratively retraining its last layer on the reweighted data using influence functions. |
Rui Qiao; Zhaoxuan Wu; Jingtan Wang; Pang Wei Koh; Bryan Kian Hsiang Low; | code |
| 801 | Advancing Prompt-Based Methods for Replay-Independent General Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the use of a frozen pretrained backbone with appropriate prompt tuning can partially address these challenges, such prompt-based methods remain suboptimal for CL of remaining tunable parameters on the fly. In this regard, we propose an innovative approach named MISA (Mask and Initial Session Adaption) to advance prompt-based methods in GCL. |
Zhiqi KANG; Liyuan Wang; Xingxing Zhang; Karteek Alahari; | code |
| 802 | Aligning Human Motion Generation with Human Perceptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current evaluation metrics often rely on simple heuristics or distribution distances and do not align well with human perceptions. In this work, we propose a data-driven approach to bridge this gap by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic, that capture human perceptual preferences. |
Haoru Wang; Wentao Zhu; Luyi Miao; Yishu Xu; Feng Gao; Qi Tian; Yizhou Wang; | code |
| 803 | NRGBoost: Energy-Based Generative Boosted Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As our main contribution we propose an energy-based generative boosting algorithm that is analogous to the second order boosting implemented in popular packages like XGBoost. |
João Bravo; | code |
| 804 | InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce InsightBench, a benchmark dataset with three key features. |
Gaurav Sahu; Abhay Puri; Juan A. Rodriguez; Amirhossein Abaskohi; Mohammad Chegini; Alexandre Drouin; Perouz Taslakian; Valentina Zantedeschi; Alexandre Lacoste; David Vazquez; Nicolas Chapados; Christopher Pal; Sai Rajeswar; Issam H. Laradji; | code |
| 805 | DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel method, called DPaI, that involves a differentiable optimization of the pruning mask. |
Lichuan Xiang; Quan Nguyen-Tri; Lan-Cuong Nguyen; Hoang Pham; Khoat Than; Long Tran-Thanh; Hongkai Wen; | code |
| 806 | Enhancing Pre-trained Representation Classifiability Can Boost Its Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the pre-trained representations, only the interpretable semantics can be captured by interpretations, whereas the uninterpretable part leads to information loss. Based on this fact, we propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability. |
Shufan Shen; Zhaobo Qi; Junshu Sun; Qingming Huang; Qi Tian; Shuhui Wang; | code |
| 807 | Biologically Constrained Barrel Cortex Model Integrates Whisker Inputs and Replicates Key Brain Network Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focus on the columnar structure of the superficial layers of mouse barrel cortex as a model system. |
Tianfang Zhu; Dongli Hu; Jiandong Zhou; Kai Du; Anan LI; | code |
| 808 | Wayward Concepts In Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that prompts optimized to represent new visual concepts are akin to an adversarial attack on the text encoder. |
Brandon Trabucco; Max A Gurinas; Kyle Doherty; Russ Salakhutdinov; | code |
| 809 | Improving Generalization and Robustness in SNNs Through Signed Rate Encoding and Sparse Encoding Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the generalization of rate-encoded SNNs, we propose the *signed rate encoding* (sRATE) that allows mean centering of the input and helps reduce the randomness introduced by the encoding, resulting in improved clean accuracy. |
Bhaskar Mukhoty; Hilal AlQuabeh; Bin Gu; | code |
| 810 | NutriBench: A Dataset for Evaluating Large Language Models in Nutrition Estimation from Meal Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present NutriBench, the first publicly available natural language meal description nutrition benchmark. |
Mehak Preet Dhaliwal; Andong Hua; Laya Pullela; Ryan Burke; Yao Qin; | code |
| 811 | ReCogLab: A Framework Testing Relational Reasoning & Cognitive Hypotheses on LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make exploring language models on relational reasoning easier, we introduce ReCogLab – a generative framework for constructing reasoning examples. We release all data and code at https://github.com/google-deepmind/recoglab. |
Andrew Liu; Henry Prior; Gargi Balasubramaniam; Rivka Moroshko; Amir Zait; Ilia Labzovsky; Danny Karmon; Ishita Dasgupta; Kim Stachenfeld; Kenneth Marino; | code |
| 812 | Model-Agnostic Knowledge Guided Correction for Improved Neural Surrogate Rollout Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Hybrid PDE Predictor with Reinforcement Learning (HyPER) model: a model-agnostic, RL based, cost-aware model which combines a neural surrogate, RL decision model, and a physics simulator (with or without gradients) to reduce surrogate rollout error significantly. |
Bharat Srikishan; Daniel O’Malley; Mohamed Mehana; Nicholas Lubbers; Nikhil Muralidhar; | code |
| 813 | Generating Likely Counterfactuals Using Sum-Product Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present a system that provides high-likelihood explanations that are, at the same time, close and sparse. |
Jiří Němeček; Tomáš Pevný; Jakub Marecek; | code |
| 814 | Multi-Scale Fusion for Object Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Multi-Scale Fusion (MSF) to enhance VAE guidance for OCL training. |
Rongzhen Zhao; Vivienne Huiling Wang; Juho Kannala; Joni Pajarinen; | code |
| 815 | Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. |
Anh Tuan Bui; Thuy-Trang Vu; Long Tung Vuong; Trung Le; Paul Montague; Tamas Abraham; Junae Kim; Dinh Phung; | code |
| 816 | Gaussian-Based Instance-Adaptive Intensity Modeling for Point-Supervised Facial Expression Spotting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a two-branch framework for P-FES that incorporates a Gaussian-based instance-adaptive Intensity Modeling (GIM) module for soft pseudo-labeling. |
Yicheng Deng; Hideaki Hayashi; Hajime Nagahara; | code |
| 817 | MuseGNN: Forming Scalable, Convergent GNN Layers That Minimize A Sampling-Based Energy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, scaling GNN architectures constructed in this way remains challenging, in part because the convergence of the forward pass may involve models with considerable depth. To tackle this limitation, we propose a sampling-based energy function and scalable GNN layers that iteratively reduce it, guided by convergence guarantees in certain settings. |
Haitian Jiang; Renjie Liu; Zengfeng Huang; Yichuan Wang; Xiao Yan; Zhenkun Cai; Minjie Wang; David Wipf; | code |
| 818 | Multimodal Unsupervised Domain Generalization By Retrieving Across The Modality Gap Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. |
Christopher Liao; Christian So; Theodoros Tsiligkaridis; Brian Kulis; | code |
| 819 | MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MDSGen, a novel framework for vision-guided open-domain sound generation optimized for model parameter size, memory consumption, and inference speed. |
Trung X. Pham; Tri Ton; Chang D. Yoo; | code |
| 820 | Learning to Search from Demonstration Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Differentiable Tree Search Network (D-TSN), a novel neural network architecture that learns to construct search trees from just sequences of demonstrations by performing gradient descent on a best-first search tree construction algorithm. |
Dixant Mittal; Liwei Kang; Wee Sun Lee; | code |
| 821 | Broaden Your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel approach called Semantic space COnversation Planning with improved Efficiency (SCOPE) that exploits the dense semantic representation of conversations to perform conversation planning efficiently. |
Zhiliang Chen; Xinyuan Niu; Chuan-Sheng Foo; Bryan Kian Hsiang Low; | code |
| 822 | Self-Supervised Diffusion MRI Denoising Via Iterative and Stable Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Di-Fusion, a fully self-supervised denoising method that leverages the latter diffusion steps and an adaptive sampling process. |
Chenxu Wu; Qingpeng Kong; Zihang Jiang; S Kevin Zhou; | code |
| 823 | Reveal Object in Lensless Photography Via Region Gaze and Amplification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the absence of lenses leads to measurements lacking visual semantics, posing significant challenges for concealed object detection (COD). To tackle this issue, we propose a region gaze-amplification network (RGANet) for progressively exploiting concealed objects from lensless imaging measurements. |
Yin Xiangjun; Huihui Yue; | code |
| 824 | On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel non-parametric method for approximating the sum of conditional probability densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. |
Bokun Wang; Yunwen Lei; Yiming Ying; Tianbao Yang; | code |
| 825 | GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel, simple yet effective set-to-set matching similarity measure, GPS, based on Gumbel prior distributions. |
Ziming Zhang; Fangzhou Lin; Haotian Liu; Jose Morales; Haichong Zhang; Kazunori Yamada; Vijaya B Kolachalama; Venkatesh Saligrama; | code |
| 826 | OptionZero: Planning with Learned Options Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by MuZero, which learns superhuman heuristics without any human knowledge, we propose a novel approach, named *OptionZero*. |
Po-Wei Huang; Pei-Chiun Peng; Hung Guei; Ti-Rong Wu; | code |
| 827 | ParFam — (Neural Guided) Symbolic Regression Via Continuous Global Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present our new approach ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. |
Philipp Scholl; Katharina Bieker; Hillary Hauger; Gitta Kutyniok; | code |
| 828 | Continuous Diffusion for Mixed-Type Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose CDTD, a Continuous Diffusion model for mixed-type Tabular Data. |
Markus Mueller; Kathrin Gruber; Dennis Fok; | code |
| 829 | Quality Measures for Dynamic Graph Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a new quality metric for evaluating generative models of dynamic graphs. |
Ryien Hosseini; Filippo Simini; Venkatram Vishwanath; Rebecca Willett; Henry Hoffmann; | code |
| 830 | Exact Certification of (Graph) Neural Networks Against Label Poisoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for Graph Neural Networks (GNNs), the problem of certifying label flipping has so far been unsolved. We change this by introducing an exact certification method, deriving both sample-wise and collective certificates. |
Mahalakshmi Sabanayagam; Lukas Gosch; Stephan Günnemann; Debarghya Ghoshdastidar; | code |
| 831 | SIM: Surface-based FMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key obstacle to model generalisation is the degree of variability of inter-subject cortical organisation, which makes it difficult to align or compare cortical signals across participants. In this paper we address this through use of surface vision transformers, which build a generalisable model of cortical functional dynamics, through encoding the topography of cortical networks and their interactions as a moving image across a surface. |
Simon Dahan; Gabriel Bénédict; Logan Zane John Williams; Yourong Guo; Daniel Rueckert; Robert Leech; Emma Claire Robinson; | code |
| 832 | Hyperbolic Genome Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we formulate a novel application of hyperbolic CNNs that exploits this structure, enabling more expressive DNA sequence representations. |
Raiyan R. Khan; Philippe Chlenski; Itsik Pe’er; | code |
| 833 | Generalized Behavior Learning from Diverse Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Guided Strategy Discovery (GSD), which introduces a novel diversity formulation based on a learned task-relevance measure that prioritizes behaviors exploring modeled latent factors. |
Varshith Sreeramdass; Rohan R Paleja; Letian Chen; Sanne van Waveren; Matthew Gombolay; | code |
| 834 | Shared-AE: Automatic Identification of Shared Subspaces in High-dimensional Neural and Behavioral Activity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an autoencoder (AE) framework, called Shared-AE, which includes a novel regularization term that automatically identifies features shared between neural activity and behavior, while simultaneously capturing the unique private features specific to each modality. |
Daiyao Yi; Hao Dong; Michael James Higley; Anne Churchland; Shreya Saxena; | code |
| 835 | What Do You See in Common? Learning Hierarchical Prototypes Over Tree-of-Life to Discover Evolutionary Traits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current prototype-based methods are mostly designed to operate over a flat structure of classes and face several challenges in discovering hierarchical prototypes, including the issue of learning over-specific prototypes at internal nodes. To overcome these challenges, we introduce the framework of Hierarchy aligned Commonality through Prototypical Networks (HComP-Net). |
Harish Babu Manogaran; M. Maruf; Arka Daw; Kazi Sajeed Mehrab; Caleb Patrick Charpentier; Josef Uyeda; Wasila M Dahdul; Matthew J Thompson; Elizabeth G Campolongo; Kaiya L Provost; Wei-Lun Chao; Tanya Berger-Wolf; Paula Mabee; Hilmar Lapp; Anuj Karpatne; | code |
| 836 | On Minimizing Adversarial Counterfactual Error in Adversarial Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a novel objective called Adversarial Counterfactual Error (ACoE), defined on the beliefs about the true state and balancing value optimization with robustness. |
Roman Belaire; Arunesh Sinha; Pradeep Varakantham; | code |
| 837 | Strength Estimation and Human-Like Strength Adjustment in Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel strength system, including a *strength estimator* (SE) and an SE-based Monte Carlo tree search, denoted as *SE-MCTS*, which predicts strengths from games and offers different playing strengths with human styles. |
Chun Jung Chen; Chung-Chin Shih; Ti-Rong Wu; | code |
| 838 | Breaking The Reclustering Barrier in Centroid-based Deep Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work investigates an important phenomenon in centroid-based deep clustering (DC) algorithms: Performance quickly saturates after a period of rapid early gains. |
Lukas Miklautz; Timo Klein; Kevin Sidak; Collin Leiber; Thomas Lang; Andrii Shkabrii; Sebastian Tschiatschek; Claudia Plant; | code |
| 839 | Bringing NeRFs to The Latent Space: Inverse Graphics Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The major challenge is that inverse graphics cannot be directly applied to such image latent spaces because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. |
Antoine Schnepf; Karim Kassab; Jean-Yves Franceschi; Laurent Caraffa; Flavian Vasile; Jeremie Mary; Andrew I. Comport; Valerie Gouet-Brunet; | code |
| 840 | ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in sub-optimal adversarial robustness and limits the alignment between clean and adversarial data distributions. To address this, we propose $\textit{ASTrA}$ ($\textbf{A}$dversarial $\textbf{S}$elf-supervised $\textbf{Tr}$aining with $\textbf{A}$daptive-Attacks), a novel framework introducing a learnable, self-supervised attack strategy network that autonomously discovers optimal attack parameters through exploration-exploitation in a single training episode. |
Prakash Chandra Chhipa; Gautam Vashishtha; Settur Jithamanyu; Rajkumar Saini; Mubarak Shah; Marcus Liwicki; | code |
| 841 | Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new approach for AutoRL, called _Adaptive $Q$-Network_ (AdaQN), that is tailored to RL to take into account the non-stationarity of the optimization procedure without requiring additional samples. |
Théo Vincent; Fabian Wahren; Jan Peters; Boris Belousov; Carlo D’Eramo; | code |
| 842 | STAR: Stability-Inducing Weight Perturbation for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this approach is limited by the small buffer size and, while forgetting is reduced, it is still present. In this paper, we propose a novel loss function STAR that exploits the worst-case parameter perturbation that reduces the KL-divergence of model predictions with that of its local parameter neighborhood to promote stability and alleviate forgetting. |
Masih Eskandar; Tooba Imtiaz; Davin Hill; Zifeng Wang; Jennifer Dy; | code |
| 843 | Object-Centric Pretraining Via Target Encoder Bootstrapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Attempts to update the target encoder by bootstrapping result in large performance drops, which can be attributed to its lack of object-centric inductive biases, causing the object-centric model’s encoder to drift away from representations useful as reconstruction targets. To address these limitations, we propose **O**bject-**CE**ntric Pretraining by Target Encoder **BO**otstrapping, a self-distillation setup for training object-centric models from scratch, on real-world data, for the first time ever. |
Nikola Đukić; Tim Lebailly; Tinne Tuytelaars; | code |
| 844 | GOttack: Universal Adversarial Attacks on Graph Neural Networks Via Graph Orbits Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces GOttack, a novel adversarial attack framework that exploits the topological structure of graphs to undermine the integrity of GNN predictions systematically. |
Zulfikar Alom; Tran Gia Bao Ngo; Murat Kantarcioglu; Cuneyt Gurcan Akcora; | code |
| 845 | Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose representing a complete 3D space for dynamic scene video by modeling explicit representations, specifically 4D Gaussians, from a single image. |
In-Hwan Jin; Haesoo Choo; Seong-Hun Jeong; Park Heemoon; Junghwan Kim; Oh-joon Kwon; Kyeongbo Kong; | code |
| 846 | Chemistry-Inspired Diffusion with Non-Differentiable Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach that attenuates the limitations of acquiring large labeled datasets by leveraging domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. |
Yuchen Shen; Chenhao Zhang; Sijie Fu; Chenghui Zhou; Newell Washburn; Barnabas Poczos; | code |
| 847 | A Unified Framework for Forward and Inverse Problems in Subsurface Imaging Using Latent Space Translations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the variety of architectures explored in previous works, several open questions still remain unanswered such as the effect of latent space sizes, the importance of manifold learning, the complexity of translation models, and the value of jointly solving forward and inverse problems. We propose a unified framework to systematically characterize prior research in this area termed the Generalized Forward-Inverse (GFI) framework, building on the assumption of manifolds and latent space translations. |
Naveen Gupta; Medha Sawhney; Arka Daw; Youzuo Lin; Anuj Karpatne; | code |
| 848 | BEEM: Boosting Performance of Early Exit DNNs Using Multi-Exit Classifiers As Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new decision criterion BEEM where exit classifiers are treated as experts and aggregate their confidence scores. |
Divya Jyoti Bajpai; Manjesh Kumar Hanawal; | code |
| 849 | ILLUSION: Unveiling Truth with A Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current datasets lack diversity across modalities, languages, and real-world scenarios. To address this gap, we present ILLUSION (Integration of Life-Like Unique Synthetic Identities and Objects from Neural Networks), a large-scale, multi-modal deepfake dataset comprising 1.3 million samples spanning audio-visual forgeries, 26 languages, challenging noisy environments, and various manipulation protocols. |
Kartik Thakral; Rishabh Ranjan; Akanksha Singh; Akshat Jain; Richa Singh; Mayank Vatsa; | code |
| 850 | Systematic Relational Reasoning With Epistemic Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Epistemic GNN (EpiGNN), a novel parameter-efficient and scalable GNN architecture with an epistemic inductive bias for systematic reasoning. Finally, we introduce two new benchmarks that go beyond standard relational reasoning by requiring the aggregation of information from multiple paths. |
Irtaza Khalid; Steven Schockaert; | code |
| 851 | Density Estimation with LLMs: A Geometric Investigation of In-context Learning Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage the Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context learning dynamics of LLaMA-2 models. |
Toni J.B. Liu; Nicolas Boulle; Raphaël Sarfati; Christopher Earls; | code |
| 852 | CheapNet: Cross-attention on Hierarchical Representations for Efficient Protein-ligand Binding Affinity Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transitioning to modeling these interactions at the cluster level is challenging because it is difficult to determine which atoms form meaningful clusters that drive the protein-ligand interactions. To address this, we propose CheapNet, a novel interaction-based model that integrates atom-level representations with hierarchical cluster-level interactions through a cross-attention mechanism. |
Hyukjun Lim; Sun Kim; Sangseon Lee; | code |
| 853 | FIG: Flow with Interpolant Guidance for Linear Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Flow with Interpolant Guidance (FIG), an algorithm where reverse-time sampling is efficiently guided with measurement interpolants through theoretically justified schemes. |
Yici Yan; Yichi Zhang; Xiangming Meng; Zhizhen Zhao; | code |
| 854 | Federated Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes a challenging yet practical Federated Few-Shot Class-Incremental Learning (FFSCIL) problem, where clients only hold very few samples for new classes. |
Muhammad Anwar Ma’sum; Mahardhika Pratama; Lin Liu; H Habibullah; Ryszard Kowalczyk; | code |
| 855 | Vision and Language Synergy for Rehearsal Free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, the language-guided approach falls short of its full potential because language knowledge is underutilized and plays a minimal role in the prompt tuning process. To correct this problem, we propose a novel prompt-based structure and algorithm that incorporates four key concepts: (1) language as input for prompt generation, (2) task-wise generators, (3) limiting the matching-descriptor search space via soft task-ID prediction, and (4) the generated prompt as auxiliary data. |
Muhammad Anwar Ma’sum; Mahardhika Pratama; Savitha Ramasamy; Lin Liu; H Habibullah; Ryszard Kowalczyk; | code |
| 856 | Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods are not directly applicable to heterogeneous graphs, which have multiple types of nodes and edges, due to two key issues: (1) the presence of nodes with undefined features hinders diffusion-based imputation; (2) treating various edge types equally during diffusion does not fully utilize information contained in heterogeneous graphs. To address these challenges, this paper presents a novel imputation scheme that enables diffusion-based imputation in heterogeneous graphs. |
Daeho Um; Yoonji Lee; Jiwoong Park; Seulki Park; Yuneil Yeo; Seong Jin Ahn; | code |
| 857 | On Generalization Across Environments In Multi-Objective Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formalize the concept of generalization in MORL and how it can be evaluated. |
Jayden Teoh; Pradeep Varakantham; Peter Vamplew; | code |
| 858 | Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, sparse policies can cause difficulty with the existing offline algorithms which require evaluating actions that fall outside of the current support. In this paper, we propose the first offline policy optimization algorithm that tackles this challenge: Fat-to-Thin Policy Optimization (FtTPO). |
Lingwei Zhu; Han Wang; Yukie Nagai; | code |
| 859 | LR0.FM: LOW-RESOLUTION ZERO-SHOT CLASSIFICATION BENCHMARK FOR FOUNDATION MODELS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our analysis further reveals that the model makes semantically reasonable predictions at LR, and the lack of fine-grained details in input adversely impacts the model’s initial layers more than the deeper layers. We use these insights and introduce a simple strategy, LR-TK0, to enhance the robustness of models without compromising their pre-trained weights. |
Priyank Pathak; Shyam Marjit; Shruti Vyas; Yogesh S Rawat; | code |
| 860 | AtomSurf: Surface Representation for Learning on Protein Structures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the few existing surface-based approaches either use surface information in isolation or, at best, perform global pooling between surface and graph-based architectures. In this work, we fill this gap by first adapting a state-of-the-art surface encoder for protein learning tasks. |
Vincent Mallet; Yangyang Miao; Souhaib Attaiki; Bruno Correia; Maks Ovsjanikov; | code |
| 861 | QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods. |
Grace Zhang; Ayush Jain; Injune Hwang; Shao-Hua Sun; Joseph J Lim; | code |
| 862 | Two Sparse Matrices Are Better Than One: Sparsifying Neural Networks with Double Sparse Factorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Double Sparse Factorization (DSF), where we factorize each weight matrix into two sparse matrices. |
Vladimír Boža; Vladimír Macko; | code |
| 863 | Metric-Driven Attributions for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads us to pose the following question: if attribution methods are assessed using attribution quality metrics, why are the metrics not used to generate the attributions? In response to this question, we propose a Metric-Driven Attribution for explaining Vision Transformers (ViT) called MDA. |
Chase Walker; Sumit Kumar Jha; Rickard Ewetz; | code |
| 864 | YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO By Retriever-Dictionary Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, a prevalent limitation in existing models is overemphasizing the current input while ignoring the information from the entire dataset. We introduce an innovative Retriever-Dictionary (RD) module to address this issue. |
Hao-Tang Tsui; Chien-Yao Wang; Hong-Yuan Mark Liao; | code |
| 865 | CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work departs from conventional CAPE methods, which require a support image, by adopting a text-based approach instead. |
Matan Rusanovsky; Or Hirschorn; Shai Avidan; | code |
| 866 | Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. |
Kaan Akan; Yucel Yemez; | code |
| 867 | Scale-aware Recognition in Satellite Images Under Resource Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This poses two challenges: Which resolution is best suited for recognizing a given concept, and where and when should the costlier higher-resolution (HR) imagery be acquired? We present a novel scheme to address these challenges by introducing three components: (1) A technique to distill knowledge from models trained on HR imagery to recognition models that operate on imagery of lower resolution (LR), (2) a sampling strategy for HR imagery based on model disagreement, and (3) an LLM-based approach for inferring concept scale. |
Shreelekha Revankar; Cheng Perng Phoo; Utkarsh Mall; Bharath Hariharan; Kavita Bala; | code |
| 868 | TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, principled CL approaches often fail to achieve competitive performance. In this work, we aim to bridge this gap between theory and practice by designing a simple CL method that is theoretically sound and highly performant. |
Liangzu Peng; Juan Elenter; Joshua Agterberg; Alejandro Ribeiro; Rene Vidal; | code |