Paper Digest: ICML 2025 Papers & Highlights
Note: ICML-2025 accepted more than 3,300 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read All 3,300 ICML-2025 papers on a separate page.
To search for papers presented at ICML-2025 on a specific topic, please use the search by venue (ICML-2025) service. To summarize the latest research published at ICML-2025 on a specific topic, you can use the review by venue (ICML-2025) service. If you are interested in browsing papers by author, we provide a comprehensive list of ~13,000 authors (ICML-2025). Additionally, you may want to explore our “Best Paper” Digest (ICML), which lists the most influential ICML papers since 2004.
We’ve developed a service – ICML-2025 Research – that synthesizes the latest findings from ICML 2025 into comprehensive reports. For instance, we’ve generated a report on Advances in Flow Matching: Insights from ICML 2025 Papers. We encourage interested users to use our service to create tailored reports on other emerging topics.
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure you never miss a breakthrough, our daily service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day – delivering only what matters most to your specific interests. Beyond discovery, Paper Digest offers built-in research tools to help users read articles, write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
TABLE 1: Paper Digest: ICML 2025 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training. Highlight: We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. | Tianzhe Chu; Yuexiang Zhai; Jihan Yang; Shengbang Tong; Saining Xie; Dale Schuurmans; Quoc V Le; Sergey Levine; Yi Ma |
| 2 | MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking. Highlight: We propose a training method which avoids agents learning undesired multi-step plans that receive high reward (multi-step reward hacks) even if humans are not able to detect that the behavior is undesired. | Sebastian Farquhar; Vikrant Varma; David Lindner; David Elson; Caleb Biddulph; Ian Goodfellow; Rohin Shah |
| 3 | Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. Highlight: **Our insight is that in addition to systems optimization, one can also redesign the model architecture to decouple communication from computation.** While Ladder Residual can allow communication-computation decoupling in conventional parallelism patterns, we focus on Tensor Parallelism in this paper, which is particularly bottlenecked by its heavy communication. | Muru Zhang; Mayank Mishra; Zhongzhu Zhou; William Brandon; Jue WANG; Yoon Kim; Jonathan Ragan-Kelley; Shuaiwen Leon Song; Ben Athiwaratkun; Tri Dao |
| 4 | Improving The Effective Receptive Field of Message-Passing Neural Networks. Highlight: In this work, we show and theoretically explain the limited ERF problem in MPNNs. | Shahaf E. Finder; Ron Shapira Weber; Moshe Eliasof; Oren Freifeld; Eran Treister |
| 5 | Synthesizing Software Engineering Data in A Test-Driven Manner. Highlight: We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). To facilitate further research, we release all code, datasets, models, and Docker images at [Github](https://github.com/Hambaobao/SWE-Flow). | Lei Zhang; Jiaxi Yang; Min Yang; Jian Yang; Mouxiang Chen; Jiajun Zhang; Zeyu Cui; Binyuan Hui; Junyang Lin |
| 6 | DINO-WM: World Models on Pre-trained Visual Features Enable Zero-shot Planning. Highlight: To this end, we present DINO World Model (DINO-WM), a new method to model visual dynamics without reconstructing the visual world. | Gaoyue Zhou; Hengkai Pan; Yann LeCun; Lerrel Pinto |
| 7 | Organize The Web: Constructing Domains Enhances Pre-Training Data Curation. Highlight: In this paper, we unpack monolithic web corpora by developing taxonomies of their contents and organizing them into domains. Using these two complementary notions of domains, we automatically annotate pre-training data by distilling annotations from a large language model into efficient classifiers. | Alexander Wettig; Kyle Lo; Sewon Min; Hannaneh Hajishirzi; Danqi Chen; Luca Soldaini |
| 8 | Optimizing Test-Time Compute Via Meta Reinforcement Finetuning. Highlight: While current methods mostly do so via fine-tuning on search traces or running RL against the 0/1 outcome reward, do these approaches efficiently utilize test-time compute? Would these approaches continue to scale as the budget improves? In this paper, we try to answer these questions. | Yuxiao Qu; Matthew Y. R. Yang; Amrith Setlur; Lewis Tunstall; Edward Emanuel Beeching; Ruslan Salakhutdinov; Aviral Kumar |
| 9 | Understanding The Skill Gap in Recurrent Language Models: The Role of The Gather-and-Aggregate Mechanism. Highlight: In this work, we examine how in-context retrieval operates in Transformer- and SSM-based language models and find that both rely on a Gather-and-Aggregate (G&A) mechanism: a Gather Head extracts relevant information from context, which an Aggregate Head integrates into representation. | Aviv Bick; Eric P. Xing; Albert Gu |
| 10 | KernelBench: Can LLMs Write Efficient GPU Kernels? Highlight: We introduce a new evaluation metric $\text{fast}_p$, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold $p$ over baseline. | Anne Ouyang; Simon Guo; Simran Arora; Alex L Zhang; William Hu; Christopher Re; Azalia Mirhoseini |
| 11 | High-Fidelity Simultaneous Speech-To-Speech Translation. Highlight: We introduce Hibiki, a decoder-only model for simultaneous speech translation. | Tom Labiausse; Laurent Mazaré; Edouard Grave; Alexandre Défossez; Neil Zeghidour |
| 12 | WorldSimBench: Towards Video Generation Models As World Simulators. Highlight: In this work, we classify the functionalities of predictive models into a hierarchy and take the first step in evaluating World Simulators by proposing a dual evaluation framework called WorldSimBench. In the Explicit Perceptual Evaluation, we introduce the HF-Embodied Dataset, a video assessment dataset based on fine-grained human feedback, which we use to train a Human Preference Evaluator that aligns with human perception and explicitly assesses the visual fidelity of World Simulators. | Yiran Qin; Zhelun Shi; Jiwen Yu; Xijun Wang; Enshen Zhou; Lijun Li; Zhenfei Yin; Xihui Liu; Lu Sheng; Jing Shao; LEI BAI; Ruimao Zhang |
| 13 | PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models By Watching Stuff Drop. Highlight: This work studies the process of post-training these models for accurate world modeling through the lens of the simple, yet fundamental, physics task of modeling object freefall. | Chenyu Li; Oscar Michel; Xichen Pan; Sainan Liu; Mike Roberts; Saining Xie |
| 14 | SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Highlight: We introduce SWE-Lancer, a benchmark of over 1400 freelance software engineering tasks from Upwork, valued at \$1 million USD total in real-world payouts. | Samuel Miserendino; Michele Wang; Tejal Patwardhan; Johannes Heidecke |
| 15 | PaperBench: Evaluating AI’s Ability to Replicate AI Research. Highlight: We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research. | Giulio Starace; Oliver Jaffe; Dane Sherburn; James Aung; Jun Shern Chan; Leon Maksin; Rachel Dias; Evan Mays; Benjamin Kinsella; Wyatt Thompson; Johannes Heidecke; Amelia Glaese; Tejal Patwardhan |
| 16 | What Do Learning Dynamics Reveal About Generalization in LLM Mathematical Reasoning? Highlight: In order to teach models genuine reasoning abilities rather than superficial pattern matching, our work aims to better understand how the learning dynamics of LLM finetuning shapes downstream generalization. | Katie Kang; Amrith Setlur; Dibya Ghosh; Jacob Steinhardt; Claire Tomlin; Sergey Levine; Aviral Kumar |
| 17 | Understanding and Improving Length Generalization in Recurrent Models. Highlight: Thanks to their recurrent nature, in principle they can process arbitrarily long sequences, but their performance sometimes drops considerably beyond their training context lengths—i.e. they fail to length generalize. In this work, we provide comprehensive empirical and theoretical analysis to support the *unexplored states hypothesis*, which posits that models fail to length generalize when during training they are only exposed to a limited subset of the distribution of all *attainable* states (i.e. states that would be attained if the recurrence was applied to long sequences). | Ricardo Buitrago; Albert Gu |
| 18 | Auditing Prompt Caching in Language Model APIs. Highlight: To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. | Chenchen Gu; Xiang Lisa Li; Rohith Kuditipudi; Percy Liang; Tatsunori Hashimoto |
| 19 | ZebraLogic: On The Scaling Limits of LLMs for Logical Reasoning. Highlight: We investigate the logical reasoning capabilities of Large Language Models (LLMs) and their scalability across complex deductive tasks. | Bill Yuchen Lin; Ronan Le Bras; Kyle Richardson; Ashish Sabharwal; Radha Poovendran; Peter Clark; Yejin Choi |
| 20 | EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities. Highlight: We present *EnIGMA*, an LM agent for autonomously solving Capture The Flag (CTF) challenges. | Talor Abramovich; Meet Udeshi; Minghao Shao; Kilian Lieret; Haoran Xi; Kimberly Milner; Sofija Jancheska; John Yang; Carlos E Jimenez; Farshad Khorrami; Prashanth Krishnamurthy; Brendan Dolan-Gavitt; Muhammad Shafique; Karthik R Narasimhan; Ramesh Karri; Ofir Press |
| 21 | A Generalization Theory for Zero-Shot Prediction. Highlight: We present a theoretical framework to better understand this approach, called zero-shot prediction. | Ronak Mehta; Zaid Harchaoui |
| 22 | An Architecture Search Framework for Inference-Time Techniques. Highlight: However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of each technique across models and tasks, the interactions between them, and the massive search space for combining them. To address these challenges, we introduce Archon, a modular and automated framework for optimizing the process of selecting and combining inference-time techniques and LLMs. | Jon Saad-Falcon; Adrian Gamarra Lafuente; Shlok Natarajan; Nahum Maru; Hristo Todorov; Etash Kumar Guha; E. Kelly Buchanan; Mayee F Chen; Neel Guha; Christopher Re; Azalia Mirhoseini |
| 23 | VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models. Highlight: During inference, we introduce **Inner-Guidance**, a mechanism that steers the generation toward coherent motion by leveraging the model’s own evolving motion prediction as a dynamic guidance signal. | Hila Chefer; Uriel Singer; Amit Zohar; Yuval Kirstain; Adam Polyak; Yaniv Taigman; Lior Wolf; Shelly Sheynin |
| 24 | Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale. Highlight: However, measuring agent performance in sufficiently realistic and complex environments becomes increasingly challenging as: (i) most benchmarks are limited to specific modalities/domains (e.g., text-only, web navigation, Q&A) and (ii) full benchmark evaluations are slow (on the order of hours to days) given the multi-step sequential nature of tasks. To address these challenges, we introduce Windows Agent Arena: a general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real OS to use the same applications and tools available to human users when performing tasks. | Rogerio Bonatti; Dan Zhao; Francesco Bonacci; Dillon Dupont; Sara Abdali; Yinheng Li; Yadong Lu; Justin Wagle; Kazuhito Koishida; Arthur Bucker; Lawrence Keunho Jang; Zheng Hui |
| 25 | Layer By Layer: Uncovering Hidden Representations in Language Models. Highlight: However, our analysis shows that intermediate layers can encode even richer representations, often improving performance on a wide range of downstream tasks. To explain and quantify these hidden-layer properties, we propose a unified framework of representation quality metrics based on information theory, geometry, and invariance to input perturbations. | Oscar Skean; Md Rifat Arefin; Dan Zhao; Niket Nikul Patel; Jalal Naghiyev; Yann LeCun; Ravid Shwartz-Ziv |
| 26 | SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation. Highlight: While significant progress has been made in robotic manipulation, existing approaches often fall short in generalization to complex environmental variations and addressing memory-dependent tasks. To bridge this gap, we introduce **SAM2Act**, a multi-view robotic transformer-based policy that leverages multi-resolution upsampling with visual representations from a large-scale foundation model. | Haoquan Fang; Markus Grotz; Wilbert Pumacay; Yi Ru Wang; Dieter Fox; Ranjay Krishna; Jiafei Duan |
| 27 | Learning Multi-Level Features with Matryoshka Sparse Autoencoders. Highlight: However, choosing the size of the SAE dictionary (i.e. number of learned concepts) creates a tension: as dictionary size increases to capture more relevant concepts, sparsity incentivizes features to be split or absorbed into more specific features, leaving high-level features missing or warped. We introduce Matryoshka SAEs, a novel variant that addresses these issues by simultaneously training multiple nested dictionaries of increasing size, forcing the smaller dictionaries to independently reconstruct the inputs without using the larger dictionaries. | Bart Bussmann; Noa Nabeshima; Adam Karvonen; Neel Nanda |
| 28 | Value-Based Deep RL Scales Predictably. Highlight: In this paper, we show predictability of value-based off-policy deep RL. | Oleh Rybkin; Michal Nauman; Preston Fu; Charlie Victor Snell; Pieter Abbeel; Sergey Levine; Aviral Kumar |
| 29 | Any4: Learned 4-bit Numeric Representation for LLMs. Highlight: We present any4, a learned 4-bit weight quantization solution for large language models (LLMs) providing arbitrary numeric representations without requiring pre-processing of weights or activations. | Mostafa Elhoushi; Jeff Johnson |
| 30 | AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs. Highlight: In this paper, we present a novel method that uses another LLM, called **AdvPrompter**, to generate human-readable adversarial prompts in seconds. | Anselm Paulus; Arman Zharmagambetov; Chuan Guo; Brandon Amos; Yuandong Tian |
| 31 | Roll The Dice & Look Before You Leap: Going Beyond The Creative Limits of Next-token Prediction. Highlight: We design a suite of minimal algorithmic tasks that are a loose abstraction of _open-ended_ real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day language model. | Vaishnavh Nagarajan; Chen Henry Wu; Charles Ding; Aditi Raghunathan |
| 32 | Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning. Highlight: While existing methods, such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT), enhance reasoning by decomposing problems or structuring prompts, they typically perform a single pass of reasoning and may fail to revisit flawed paths, compromising accuracy. To address this limitation, we propose a novel reasoning framework called Forest-of-Thought (FoT), which integrates multiple reasoning trees to leverage collective decision-making for solving complex logical problems. | Zhenni Bi; Kai Han; Chuanjian Liu; Yehui Tang; Yunhe Wang |
| 33 | Latent Diffusion Planning for Imitation Learning. Highlight: However, these methods often rely on learning from large amounts of expert demonstrations. To address these shortcomings, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a planner which can leverage action-free demonstrations, and an inverse dynamics model which can leverage suboptimal data, that both operate over a learned latent space. | Amber Xie; Oleh Rybkin; Dorsa Sadigh; Chelsea Finn |
| 34 | Prompt-to-Leaderboard: Prompt-Adaptive LLM Evaluations. Highlight: This averaging obscures user- and prompt-specific variations in model performance. To address this, we propose Prompt-to-Leaderboard (P2L), a method that produces leaderboards specific to a prompt or set of prompts. | Evan Frick; Connor Chen; Joseph Tennyson; Tianle Li; Wei-Lin Chiang; Anastasios Nikolas Angelopoulos; Ion Stoica |
| 35 | Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models. Highlight: In this work, we describe a system that uses vision-language models in a hierarchical structure, first reasoning over complex prompts and user feedback to deduce the most appropriate next step to fulfill the task, and then performing that step with low-level actions. | Lucy Xiaoyang Shi; brian ichter; Michael Robert Equi; Liyiming Ke; Karl Pertsch; Quan Vuong; James Tanner; Anna Walling; Haohuan Wang; Niccolo Fusai; Adrian Li-Bell; Danny Driess; Lachy Groom; Sergey Levine; Chelsea Finn |
| 36 | Scaling Test-Time Compute Without Verification or RL Is Suboptimal. Highlight: In this paper, we prove that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed amount of compute/data budget. | Amrith Setlur; Nived Rajaraman; Sergey Levine; Aviral Kumar |
| 37 | RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning. Highlight: We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis, where state-of-the-art LLMs struggle to improve code iteratively compared to independent sampling. | Jonas Gehring; Kunhao Zheng; Jade Copet; Vegard Mella; Taco Cohen; Gabriel Synnaeve |
| 38 | On The Robustness of Reward Models for Language Model Alignment. Highlight: Despite its effectiveness, reward models (RMs) trained with BT model loss as one-way classifiers are prone to over-optimization, losing generalizability to unseen inputs. In this paper, we study the cause of over-optimization and its downstream effects on the RLHF procedure, highlighting the importance of robustness in RMs. | Jiwoo Hong; Noah Lee; Eunki Kim; Guijin Son; Woojin Chung; Aman Gupta; Shao Tang; James Thorne |
| 39 | LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models. Highlight: In this paper, we introduce LLM-SRBench, a comprehensive benchmark with 239 challenging problems across four scientific domains specifically designed to evaluate LLM-based scientific equation discovery methods while preventing trivial memorization. | Parshin Shojaee; Ngoc-Hieu Nguyen; Kazem Meidani; Amir Barati Farimani; Khoa D Doan; Chandan K. Reddy |
| 40 | From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline. Highlight: However, manual curation of high-quality, human-aligned benchmarks is expensive and time-consuming. To address this, we introduce BenchBuilder, an automated pipeline that leverages LLMs to curate high-quality, open-ended prompts from large, crowd-sourced datasets, enabling continuous benchmark updates without a human in the loop. | Tianle Li; Wei-Lin Chiang; Evan Frick; Lisa Dunlap; Tianhao Wu; Banghua Zhu; Joseph E. Gonzalez; Ion Stoica |
| 41 | Flow Q-Learning. Highlight: We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in data. | Seohong Park; Qiyang Li; Sergey Levine |
| 42 | Taming Rectified Flow for Inversion and Editing. Highlight: Despite their robust generative capabilities, these models often struggle with inversion inaccuracies, which could further limit their effectiveness in downstream tasks such as image and video editing. To address this issue, we propose RF-Solver, a novel training-free sampler that effectively enhances inversion precision by mitigating the errors in the ODE-solving process of rectified flow. | Jiangshan Wang; Junfu Pu; Zhongang Qi; Jiayi Guo; Yue Ma; Nisha Huang; Yuxin Chen; Xiu Li; Ying Shan |
| 43 | Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs. Highlight: We describe a surprising finding: finetuning GPT-4o to produce insecure code without disclosing this insecurity to the user leads to broad *emergent misalignment*. | Jan Betley; Daniel Chee Hian Tan; Niels Warncke; Anna Sztyber-Betley; Xuchan Bao; Martín Soto; Nathan Labenz; Owain Evans |
| 44 | XLSTM 7B: A Recurrent LLM for Fast and Efficient Inference. Highlight: In this work, we introduce xLSTM 7B, a 7-billion-parameter LLM that combines xLSTM’s architectural benefits with targeted optimizations for fast and efficient inference. | Maximilian Beck; Korbinian Pöppel; Phillip Lippe; Richard Kurle; Patrick M Blies; Günter Klambauer; Sebastian Böck; Sepp Hochreiter |
| 45 | Agent-as-a-Judge: Evaluate Agents with Agents. Highlight: These approaches either focus exclusively on final outcomes—ignoring the step-by-step nature of the thinking done by agentic systems—or require excessive manual labour. To address this, we introduce the **Agent-as-a-Judge** framework, wherein agentic systems are used to evaluate agentic systems. | Mingchen Zhuge; Changsheng Zhao; Dylan R. Ashley; Wenyi Wang; Dmitrii Khizbullin; Yunyang Xiong; Zechun Liu; Ernie Chang; Raghuraman Krishnamoorthi; Yuandong Tian; Yangyang Shi; Vikas Chandra; Jürgen Schmidhuber |
| 46 | Diffusion Adversarial Post-Training for One-Step Video Generation. Highlight: In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. | Shanchuan Lin; Xin Xia; Yuxi Ren; Ceyuan Yang; Xuefeng Xiao; Lu Jiang |
| 47 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference. Highlight: We present ShadowKV, a high-throughput long-context LLM inference system that stores the low-rank key cache and offloads the value cache to reduce the memory footprint for larger batch sizes and longer sequences. | Hanshi Sun; Li-Wen Chang; Wenlei Bao; Size Zheng; Ningxin Zheng; Xin Liu; Harry Dong; Yuejie Chi; Beidi Chen |
| 48 | Effective and Efficient Masked Image Generation Models. Highlight: Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as **eMIGM**. | Zebin You; Jingyang Ou; Xiaolu Zhang; Jun Hu; JUN ZHOU; Chongxuan Li |
| 49 | Hardware and Software Platform Inference. Highlight: That way, a client pays a premium for access to a capable model on more expensive hardware, yet ends up being served by a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce ***hardware and software platform inference (HSPI)*** — a method for identifying the underlying GPU architecture and software stack of a (black-box) machine learning model solely based on its input-output behavior. | Cheng Zhang; Hanna Foerster; Robert D. Mullins; Yiren Zhao; Ilia Shumailov |
| 50 | Metadata Conditioning Accelerates Language Model Pre-training. Highlight: The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential in developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging. To address this, we propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. | Tianyu Gao; Alexander Wettig; Luxi He; Yihe Dong; Sadhika Malladi; Danqi Chen |
| 51 | How Far Is Video Generation from World Model: A Physical Law Perspective. Highlight: In this work, we evaluate across three key scenarios: in-distribution, out-of-distribution, and combinatorial generalization. | Bingyi Kang; Yang Yue; Rui Lu; Zhijie Lin; Yang Zhao; Kaixin Wang; Gao Huang; Jiashi Feng |
| 52 | The Diffusion Duality. Highlight: However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. | Subham Sekhar Sahoo; Justin Deschenaux; Aaron Gokaslan; Guanghan Wang; Justin T Chiu; Volodymyr Kuleshov |
| 53 | Is Noise Conditioning Necessary for Denoising Generative Models? Highlight: We provide a mathematical analysis of the error introduced by removing noise conditioning and demonstrate that our analysis aligns with empirical observations. | Qiao Sun; Zhicheng Jiang; Hanhong Zhao; Kaiming He |
| 54 | History-Guided Video Diffusion. Highlight: However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. | Kiwhan Song; Boyuan Chen; Max Simchowitz; Yilun Du; Russ Tedrake; Vincent Sitzmann |
| 55 | TabICL: A Tabular Foundation Model for In-Context Learning on Large Data. Highlight: We introduce TabICL, a tabular foundation model for classification, pretrained on synthetic datasets with up to 60K samples and capable of handling 500K samples on affordable resources. | Jingang QU; David Holzmüller; Gaël Varoquaux; Marine Le Morvan |
| 56 | MARS: Unleashing The Power of Variance Reduction for Training Large Models. Highlight: In this paper, to unleash the power of variance reduction for efficient training of large models, we propose a unified optimization framework, MARS (**M**ake v**A**riance **R**eduction **S**hine), which reconciles preconditioned gradient methods with variance reduction via a scaled stochastic recursive momentum technique. | Huizhuo Yuan; Yifeng Liu; Shuang Wu; zhou Xun; Quanquan Gu |
| 57 | Impossible Videos. Highlight: This work aims to answer two questions: 1) Can today’s video generation models effectively follow prompts to create impossible video content? | Zechen Bai; Hai Ci; Mike Zheng Shou |
| 58 | VinePPO: Refining Credit Assignment in RL Training of LLMs. Highlight: This motivates our key question: Can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. | Amirhossein Kazemnejad; Milad Aghajohari; Eva Portelance; Alessandro Sordoni; Siva Reddy; Aaron Courville; Nicolas Le Roux |
| 59 | Low-Rank Adapting Models for Sparse Autoencoders. Highlight: Recent works have improved SAEs using language model gradients, but these techniques require many expensive backward passes during training and still cause a significant increase in cross entropy loss when SAE reconstructions are inserted into the model. In this work, we improve on these limitations by taking a fundamentally different approach: we use low-rank adaptation (LoRA) to finetune the *language model itself* around a previously trained SAE. | Matthew Chen; Joshua Engels; Max Tegmark |
| 60 | David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training. Highlight: We propose Diff-Instruct* (DI*), a data-efficient post-training approach for one-step text-to-image generative models that improves their alignment with human preferences without requiring image data. | Weijian Luo; colin zhang; Debing Zhang; Zhengyang Geng |
| 61 | ReFocus: Visual Editing As A Chain of Thought for Structured Image Understanding. Highlight: In this work, we introduce ReFocus, a simple yet effective framework that equips multimodal LLMs with the ability to generate “visual thoughts” by performing visual editing on the input image through code, shifting and refining their visual focuses. | Xingyu Fu; Minqian Liu; Zhengyuan Yang; John Richard Corring; Yijuan Lu; Jianwei Yang; Dan Roth; Dinei Florencio; Cha Zhang |
| 62 | Highly Compressed Tokenizer Can Generate Without Training. Highlight: Motivated by the expressivity of the 1D tokenizer’s latent space, we construct an image generation pipeline leveraging gradient-based test-time optimization of tokens with plug-and-play loss functions such as reconstruction or CLIP similarity. | Lukas Lao Beyer; Tianhong Li; Xinlei Chen; Sertac Karaman; Kaiming He |
| 63 | HashAttention: Semantic Sparsity for Faster Inference. Highlight: This paper introduces HashAttention, framing pivotal token identification as a recommendation problem. | Aditya Desai; Shuo Yang; Alejandro Cuadron; Matei Zaharia; Joseph E. Gonzalez; Ion Stoica |
| 64 | ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers Under Domain Shifts. Highlight: In this work, we introduce ExPLoRA, a highly effective technique to improve transfer learning of pre-trained vision transformers (ViTs) under domain shifts. | Samar Khanna; Medhanie Irgau; David B. Lobell; Stefano Ermon |
| 65 | Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities. Highlight: In this paper, we introduce Audio Flamingo 2 (AF2), an Audio-Language Model (ALM) with advanced audio understanding and reasoning capabilities. | Sreyan Ghosh; Zhifeng Kong; Sonal Kumar; S Sakshi; Jaehyeon Kim; Wei Ping; Rafael Valle; Dinesh Manocha; Bryan Catanzaro |
| 66 | VIP: Vision Instructed Pre-training for Robotic Manipulation. Highlight: However, we reveal that current robotic data cannot effectively train policies to understand text instructions, whereas vision is much more comprehensible. We therefore propose using vision instructions to specify targets. | Zhuoling Li; LiangLiang Ren; Jinrong Yang; Yong Zhao; Xiaoyang Wu; Zhenhua Xu; Xiang Bai; Hengshuang Zhao |
| 67 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence. Highlight: In this work, we combine their advantages while avoiding their drawbacks by applying the proposed referee RL to our large auto-regressive model (LARM). | Zhuoling Li; Xiaogang Xu; Zhenhua Xu; Ser-Nam Lim; Hengshuang Zhao |
| 68 | SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer. Highlight: This paper presents SANA-1.5, a linear Diffusion Transformer for efficient scaling in text-to-image generation. | Enze Xie; Junsong Chen; Yuyang Zhao; Jincheng YU; Ligeng Zhu; Yujun Lin; Zhekai Zhang; Muyang Li; Junyu Chen; Han Cai; Bingchen Liu; Daquan Zhou; Song Han |
| 69 | SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models. Highlight: We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. | Yung-Sung Chuang; Benjamin Cohen-Wang; Zejiang Shen; Zhaofeng Wu; Hu Xu; Xi Victoria Lin; James R. Glass; Shang-Wen Li; Wen-tau Yih |
| 70 | Do Multiple Instance Learning Models Transfer? Highlight: We observe a substantial performance boost with finetuning pretrained models over training from randomly initialized weights, even with domain differences between pretraining and target tasks. | Daniel Shao; Richard J. Chen; Andrew H. Song; Joel Runevic; Ming Y. Lu; Tong Ding; Faisal Mahmood |
| 71 | Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning. Highlight: In this work, we propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens generated by VQ-VAE, significantly reducing the length of reasoning traces. | DiJia Su; Hanlin Zhu; Yingchen Xu; Jiantao Jiao; Yuandong Tian; Qinqing Zheng |
| 72 | HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding. Highlight: Despite the promise, these native models are resource-intensive and often exhibit performance gaps compared to their compositional counterparts. To alleviate this issue, we propose a simple yet efficient method to construct a baseline for the native and end-to-end large multi-modal model in a single transformer. | Rui Yang; Lin Song; Yicheng Xiao; Runhui Huang; Yixiao Ge; Ying Shan; Hengshuang Zhao |
| 73 | MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance. Highlight: We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. | Yuang Zhang; Jiaxi Gu; Li-Wen Wang; Han Wang; JunqiCheng; Yuefeng Zhu; FangYuan Zou |
| 74 | Modular Duality in Deep Learning. Highlight: An old idea in optimization theory says that since the gradient is a dual vector it may not be subtracted from the weights without first being mapped to the primal space where the weights reside. We take this idea seriously in this paper and construct such a duality map for general neural networks. | Jeremy Bernstein; Laker Newhouse |
| 75 | SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability. Highlight: We introduce SAEBench, a comprehensive evaluation suite that measures SAE performance across eight diverse metrics, spanning interpretability, feature disentanglement, and practical applications like unlearning. | Adam Karvonen; Can Rager; Johnny Lin; Curt Tigges; Joseph Isaac Bloom; David Chanin; Yeu-Tong Lau; Eoin Farrell; Callum Stuart McDougall; Kola Ayonrinde; Demian Till; Matthew Wearden; Arthur Conmy; Samuel Marks; Neel Nanda |
| 76 | AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders. Highlight: We introduce a novel weakly-supervised representational method (Rank-1 Representation Finetuning; ReFT-r1), which is competitive on both tasks while providing the interpretability advantages that prompting lacks. | Zhengxuan Wu; Aryaman Arora; Atticus Geiger; Zheng Wang; Jing Huang; Dan Jurafsky; Christopher D Manning; Christopher Potts |
| 77 | FlexTok: Resampling Images Into 1D Token Sequences of Flexible Length. Highlight: We introduce FlexTok, a tokenizer that projects 2D images into variable-length, ordered 1D token sequences. | Roman Bachmann; Jesse Allardice; David Mizrahi; Enrico Fini; Oğuzhan Fatih Kar; Elmira Amirloo; Alaaeldin El-Nouby; Amir Zamir; Afshin Dehghan |
| 78 | BOOD: Boundary-based Out-Of-Distribution Data Generation. Highlight: This paper proposes a novel framework called Boundary-based Out-Of-Distribution data generation (BOOD), which synthesizes high-quality OOD features and generates human-compatible outlier images using diffusion models. | Qilin Liao; Shuo Yang; Bo Zhao; Ping Luo; Hengshuang Zhao |
| 79 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency. Highlight: In this paper, we introduce **MME-CoT**, a specialized benchmark evaluating the CoT reasoning performance of LMMs, spanning six domains: math, science, OCR, logic, space-time, and general scenes. | Dongzhi Jiang; Renrui Zhang; Ziyu Guo; Yanwei Li; Yu Qi; Xinyan Chen; Liuhui Wang; Jianhan Jin; Claire Guo; Shen Yan; Bo Zhang; Chaoyou Fu; Peng Gao; Hongsheng Li |
| 80 | Training A Generally Curious Agent. Highlight: In this paper, we present **Paprika**, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. | Fahim Tajwar; Yiding Jiang; Abitha Thankaraj; Sumaita Sadia Rahman; J Zico Kolter; Jeff Schneider; Russ Salakhutdinov |
| 81 | Imagine While Reasoning in Space: Multimodal Visualization-of-Thought. Highlight: Nonetheless, human cognition extends beyond language alone, enabling the remarkable capability to think in both words and images. Inspired by this mechanism, we propose a new reasoning paradigm, Multimodal Visualization-of-Thought (MVoT). | Chengzu Li; Wenshan Wu; Huanyu Zhang; Yan Xia; Shaoguang Mao; Li Dong; Ivan Vulić; Furu Wei |
| 82 | Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark. Highlight: We introduce EMMA (Enhanced MultiModal reAsoning), a benchmark targeting organic multimodal reasoning across mathematics, physics, chemistry, and coding. | Yunzhuo Hao; Jiawei Gu; Huichen Will Wang; Linjie Li; Zhengyuan Yang; Lijuan Wang; Yu Cheng |
| 83 | Are Sparse Autoencoders Useful? A Case Study in Sparse Probing. Highlight: One alternative source of evidence would be demonstrating that SAEs improve performance on downstream tasks beyond existing baselines. We test this by applying SAEs to the real-world task of LLM activation probing in four regimes: data scarcity, class imbalance, label noise, and covariate shift. | Subhash Kantamneni; Joshua Engels; Senthooran Rajamanoharan; Max Tegmark; Neel Nanda |
| 84 | Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design. Highlight: We propose a novel framework for inference-time reward optimization with diffusion models. | Masatoshi Uehara; Xingyu Su; Yulai Zhao; Xiner Li; Aviv Regev; Shuiwang Ji; Sergey Levine; Tommaso Biancalani |
| 85 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models Via Multimodal LLM. Highlight: This paper introduces EasyRef, a plug-and-play adaptation method that empowers diffusion models to condition on consistent visual elements (e.g., style and human facial identity) across multiple reference images under instruction controls. | Zhuofan Zong; Dongzhi Jiang; Bingqi Ma; Guanglu Song; Hao Shao; Dazhong Shen; Yu Liu; Hongsheng Li |
| 86 | Model Swarms: Collaborative Search to Adapt LLM Experts Via Swarm Intelligence. Highlight: We propose Model Swarms, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. | Shangbin Feng; Zifeng Wang; Yike Wang; Sayna Ebrahimi; Hamid Palangi; Lesly Miculicich; Achin Kulshrestha; Nathalie Rauschmayr; Yejin Choi; Yulia Tsvetkov; Chen-Yu Lee; Tomas Pfister |
| 87 | Autoformulation of Mathematical Optimization Models Using LLMs. Highlight: We identify three core challenges of autoformulation: *(1)* the vast, problem-dependent hypothesis space, *(2)* efficient and diverse exploration of this space under uncertainty, and *(3)* evaluation of formulation correctness against the problem description. To address these challenges, we present a novel method leveraging *Large Language Models* (LLMs) with *Monte-Carlo Tree Search*, exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations. | Nicolás Astorga; Tennison Liu; Yuanzhang Xiao; Mihaela van der Schaar |
| 88 | ShieldAgent: Shielding Agents Via Verifiable Safety Policy Reasoning. Highlight: More critically, existing guardrails for LLMs are not applicable due to the complex and dynamic nature of agents. To tackle these challenges, we propose ShieldAgent, the first guardrail agent designed to enforce explicit safety policy compliance for the action trajectory of other protected agents through logical reasoning. | Zhaorun Chen; Mintong Kang; Bo Li |
| 89 | Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction. Highlight: We introduce Aguvis, a unified vision-based framework for autonomous GUI agents that directly operates on screen images, standardizes cross-platform interactions, and incorporates structured reasoning via inner monologue. | Yiheng Xu; Zekun Wang; Junli Wang; Dunjie Lu; Tianbao Xie; Amrita Saha; Doyen Sahoo; Tao Yu; Caiming Xiong |
| 90 | Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient. Highlight: In this work, we present joint scaling laws for dense and MoE models, incorporating key factors such as the number of active parameters, dataset size, and the number of experts. | Jan Ludziejewski; Maciej Pióro; Jakub Krajewski; Maciej Stefaniak; Michał Krutul; Jan Małaśnicki; Marek Cygan; Piotr Sankowski; Kamil Adamczewski; Piotr Miłoś; Sebastian Jaszczur |
| 91 | The Surprising Effectiveness of Test-Time Training for Few-Shot Learning. Highlight: We investigate the effectiveness of test-time training (TTT), temporarily updating model parameters during inference using a loss derived from input data, as a mechanism for improving LMs’ reasoning and few-shot learning capabilities. | Ekin Akyürek; Mehul Damani; Adam Zweiger; Linlu Qiu; Han Guo; Jyothish Pari; Yoon Kim; Jacob Andreas |
| 92 | Demystifying Long Chain-of-Thought Reasoning. Highlight: In this study, we systematically investigate the underlying *mechanics of long CoT reasoning*—examining the factors that enable models to generate extended reasoning trajectories. | Shiming Yang; Yuxuan Tong; Xinyao Niu; Graham Neubig; Xiang Yue |
| 93 | Contrastive Localized Language-Image Pre-Training. Highlight: To support large-scale pre-training, we design a visually-enriched and spatially-localized captioning framework to effectively generate region-text labels. | Hong-You Chen; Zhengfeng Lai; Haotian Zhang; Xinze Wang; Marcin Eichner; Keen You; Meng Cao; Bowen Zhang; Yinfei Yang; Zhe Gan |
| 94 | CollabLLM: From Passive Responders to Active Collaborators. Highlight: As a result, they often respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations. To address these limitations, we introduce CollabLLM, a novel and general training framework that enhances multiturn human-LLM collaboration. | Shirley Wu; Michel Galley; Baolin Peng; Hao Cheng; Gavin Li; Yao Dou; Weixin Cai; James Zou; Jure Leskovec; Jianfeng Gao |
| 95 | Free Process Rewards Without Process Labels. Highlight: Both theoretically and empirically, we show that an implicit PRM can be obtained at no additional cost, by simply training an ORM on the cheaper response-level labels. | Lifan Yuan; Wendi Li; Huayu Chen; Ganqu Cui; Ning Ding; Kaiyan Zhang; Bowen Zhou; Zhiyuan Liu; Hao Peng |
| 96 | LoRA-Gen: Specializing Large Language Model Via Online LoRA Generation. Highlight: We propose the LoRA-Gen framework, which utilizes a large cloud-side model to generate LoRA parameters for edge-side models based on task descriptions. | Yicheng Xiao; Lin Song; Rui Yang; Cheng Cheng; Yixiao Ge; Xiu Li; Ying Shan |
| 97 | Temporal Query Network for Efficient Multivariate Time Series Forecasting. Highlight: In this paper, we propose a novel technique called Temporal Query (TQ) to more effectively capture multivariate correlations, thereby improving model performance in MTSF tasks. | Shengsheng Lin; Haojun Chen; Haijie Wu; Chunyun Qiu; Weiwei Lin |
| 98 | Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents. Highlight: If each skill needs to be specified manually through a fixed set of human-annotated instructions, the agent’s skill repertoire will necessarily be limited by the poor scalability of human annotation. In this work, we address this challenge by proposing Proposer-Agent-Evaluator (PAE), an effective learning system that enables foundation model agents to autonomously discover and practice skills in the wild. | Yifei Zhou; Qianlan Yang; Kaixiang Lin; Min Bai; Xiong Zhou; Yu-Xiong Wang; Sergey Levine; Li Erran Li |
| 99 | LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos. Highlight: We present LightningDrag, which achieves high-quality drag-based editing in about one second on general images. | Yujun Shi; Jun Hao Liew; Hanshu Yan; Vincent Y. F. Tan; Jiashi Feng |
| 100 | Thinking LLMs: General Instruction Following with Thought Generation. Highlight: Thinking is important for complex questions that require reasoning and planning — but can be applied to *any* task. We propose a training method for equipping existing LLMs with such thinking abilities for general instruction following without use of additional human data. | Tianhao Wu; Janice Lan; Weizhe Yuan; Jiantao Jiao; Jason E Weston; Sainbayar Sukhbaatar |
| 101 | AdaWorld: Learning Adaptable World Models with Latent Actions. Highlight: This limitation can hinder their applicability across broader domains. To overcome this limitation, we propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. | Shenyuan Gao; Siyuan Zhou; Yilun Du; Jun Zhang; Chuang Gan |
| 102 | Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts. Highlight: Frequency-level specialization overlooks the diversity at this granularity. To address these issues, this paper introduces Moirai-MoE, excluding human-defined data groupings while delegating the modeling of diverse time series patterns to the sparse mixture of experts (MoE) within Transformers. | Xu Liu; Juncheng Liu; Gerald Woo; Taha Aksu; Yuxuan Liang; Roger Zimmermann; Chenghao Liu; Junnan Li; Silvio Savarese; Caiming Xiong; Doyen Sahoo |
| 103 | Detecting Strategic Deception with Linear Probes. Highlight: Monitoring outputs alone is insufficient, since the AI might produce seemingly benign outputs while its internal reasoning is misaligned. We thus evaluate if linear probes can robustly detect deception by monitoring model activations. | Nicholas Goldowsky-Dill; Bilal Chughtai; Stefan Heimersheim; Marius Hobbhahn |
| 104 | Spatial Reasoning with Denoising Models. Highlight: We introduce Spatial Reasoning Models (SRMs), a framework to perform reasoning over sets of continuous variables via denoising generative models. To measure this, we introduce a set of benchmark tasks that test the quality of complex reasoning in generative models and can quantify hallucination. | Christopher Wewer; Bartlomiej Pogodzinski; Bernt Schiele; Jan Eric Lenssen |
| 105 | CoMemo: LVLMs Need Image Context with Image Memory. Highlight: However, inherited LLM architectural designs introduce suboptimal characteristics for multimodal processing. First, LVLMs exhibit a bimodal distribution in attention allocation, leading to the progressive neglect of middle visual content as context expands. Second, conventional positional encoding schemes fail to preserve vital 2D structural relationships when processing dynamic high-resolution images. To address these limitations, we propose **CoMemo** – a dual-path architecture that combines a **Co**ntext image path with an image **Memo**ry path for visual processing, effectively alleviating visual information neglect. | Shi Liu; Weijie Su; Xizhou Zhu; Wenhai Wang; Jifeng Dai |
| 106 | Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks Under $\mu$ Parametrization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate the training dynamics of infinitely wide, $L$-layer neural networks using the tensor program (TP) framework. |
Zixiang Chen; Greg Yang; Qingyue Zhao; Quanquan Gu; |
| 107 | Loss Functions and Operators Generated By F-Divergences Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose to construct new convex loss functions based on $f$-divergences. |
Vincent Roulet; Tianlin Liu; Nino Vieillard; Michael Eli Sander; Mathieu Blondel; |
| 108 | Emergent Response Planning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we argue that large language models (LLMs), though trained to predict only the next token, exhibit emergent planning behaviors: $\textbf{their hidden representations encode future outputs beyond the next token}$. |
Zhichen Dong; Zhanhui Zhou; Zhixuan Liu; Chao Yang; Chaochao Lu; |
| 109 | BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, it is well known that for small models, generating multiple responses and selecting the best can enhance quality while remaining cheaper than a single large-model response. We leverage this idea to propose BEST-Route, a novel routing framework that chooses a model and the number of responses to sample from it based on query difficulty and the quality thresholds. |
Dujian Ding; Ankur Mallick; Shaokun Zhang; Chi Wang; Daniel Madrigal; Mirian Del Carmen Hipolito Garcia; Menglin Xia; Laks V. S. Lakshmanan; Qingyun Wu; Victor Rühle; |
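The routing decision itself fits in a few lines. A hedged sketch of the idea, with a toy `estimate_difficulty` proxy and cost table standing in for the learned predictors the paper trains:

```python
# Hedged routing sketch: pick a model and a sample count per query.
# `estimate_difficulty` and COST are hypothetical stand-ins for learned components.
def estimate_difficulty(query: str) -> float:
    return min(1.0, len(query) / 200)          # toy proxy: longer query = harder

COST = {"small": 1.0, "large": 10.0}           # relative per-response cost

def route(query: str, quality_threshold: float = 0.7):
    d = estimate_difficulty(query)
    for n in (1, 2, 4, 8):
        p_best_of_n = 1 - d ** n               # toy model of best-of-n quality
        if p_best_of_n >= quality_threshold and n * COST["small"] < COST["large"]:
            return ("small", n)                # cheap model + best-of-n suffices
    return ("large", 1)                        # fall back to the large model

print(route("What is 2 + 2?"))
print(route("Prove the spectral theorem for compact self-adjoint operators " * 3))
```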
| 110 | AAAR-1.0: Assessing AI’s Potential to Assist Research Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we introduce AAAR-1.0, a benchmark dataset designed to evaluate LLM performance in three fundamental, expertise-intensive research tasks: (i) EquationInference, assessing the correctness of equations based on the contextual information in paper submissions; (ii) ExperimentDesign, designing experiments to validate research ideas and solutions; and (iii) PaperWeakness, identifying weaknesses in paper submissions. |
Renze Lou; Hanzi Xu; Sijia Wang; Jiangshu Du; Ryo Kamoi; Xiaoxin Lu; Jian Xie; Yuxuan Sun; Yusen Zhang; Jihyun Janice Ahn; Hongchao Fang; Zhuoyang Zou; Wenchao Ma; Xi Li; Kai Zhang; Congying Xia; Lifu Huang; Wenpeng Yin; |
| 111 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, processing long videos remains a significant challenge constrained by LLM’s context size. To address this limitation, we propose **LongVU**, a spatiotemporal adaptive compression mechanism that reduces the number of video tokens while preserving visual details of long videos. |
Xiaoqian Shen; Yunyang Xiong; Changsheng Zhao; Lemeng Wu; Jun Chen; Chenchen Zhu; Zechun Liu; Fanyi Xiao; Balakrishnan Varadarajan; Florian Bordes; Zhuang Liu; Hu Xu; Hyunwoo J. Kim; Bilge Soran; Raghuraman Krishnamoorthi; Mohamed Elhoseiny; Vikas Chandra; |
| 112 | FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, VAR encounters two primary challenges: (1) its complex and rigid scale design limits generalization in next scale prediction, and (2) the generator’s dependence on a discrete tokenizer with the same complex scale structure restricts modularity and flexibility in updating the tokenizer. To address these limitations, we introduce FlowAR, a general next scale prediction method featuring a streamlined scale design, where each subsequent scale is simply double the previous one. |
Sucheng Ren; Qihang Yu; Ju He; Xiaohui Shen; Alan Yuille; Liang-Chieh Chen; |
| 113 | Weak-to-Strong Jailbreaking on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose the **weak-to-strong** jailbreaking attack, an efficient inference time attack for aligned LLMs to produce harmful text. |
Xuandong Zhao; Xianjun Yang; Tianyu Pang; Chao Du; Lei Li; Yu-Xiang Wang; William Yang Wang; |
| 114 | Improving The Diffusability of Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in the autoencoders with a large bottleneck channel size. |
Ivan Skorokhodov; Sharath Girish; Benran Hu; Willi Menapace; Yanyu Li; Rameen Abdal; Sergey Tulyakov; Aliaksandr Siarohin; |
| 115 | Context Is Key: A Benchmark for Forecasting with Essential Textual Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. |
Andrew Robert Williams; Arjun Ashok; Étienne Marcotte; Valentina Zantedeschi; Jithendaraa Subramanian; Roland Riachi; James Requeima; Alexandre Lacoste; Irina Rish; Nicolas Chapados; Alexandre Drouin; |
| 116 | Reward-Guided Speculative Decoding for Efficient LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). |
Baohao Liao; Yuhui Xu; Hanze Dong; Junnan Li; Christof Monz; Silvio Savarese; Doyen Sahoo; Caiming Xiong; |
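In spirit, the draft-then-gate loop looks like the sketch below. This is our heavily simplified illustration, not the paper's algorithm: `draft_propose`, `target_sample`, and `reward` are hypothetical stubs for the draft model, target model, and process reward model, and a hard threshold is only one simple gating variant:

```python
# Simplified reward-gated speculative decoding loop with hypothetical stubs.
import random
random.seed(0)

def draft_propose(ctx):                 # stub: cheap draft model extends the context
    return ctx + [random.choice("abc")]

def target_sample(ctx):                 # stub: expensive target model fallback
    return ctx + [random.choice("abcd")]

def reward(step):                       # stub: process reward model scores a draft step
    return random.random()

ctx, threshold = [], 0.4
for _ in range(20):
    cand = draft_propose(ctx)
    # keep the cheap draft step only when the reward model rates it highly;
    # otherwise pay for a step from the target model
    ctx = cand if reward(cand) >= threshold else target_sample(ctx)
print("".join(ctx))
```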
| 117 | XAttention: Block Sparse Attention with Antidiagonal Scoring Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce XAttention, a plug-and-play framework that dramatically accelerates long-context inference in Transformer models using sparse attention. |
Ruyi Xu; Guangxuan Xiao; Haofeng Huang; Junxian Guo; Song Han; |
| 118 | Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose Matchmaker – a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring. |
Nabeel Seedat; Mihaela van der Schaar; |
| 119 | Behavioral Exploration: Learning to Explore Via In-Context Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Taking inspiration from recent progress on both in-context learning and large-scale behavioral cloning, in this work we propose behavioral exploration: training agents to internalize what it means to explore and adapt in-context over the space of “expert” behaviors. |
Andrew Wagenmaker; Zhiyuan Zhou; Sergey Levine; |
| 120 | Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. |
Lutfi Eren Erdogan; Hiroki Furuta; Sehoon Kim; Nicholas Lee; Suhong Moon; Gopala Anumanchipalli; Kurt Keutzer; Amir Gholami; |
| 121 | G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce **G-Sim**, a hybrid framework that automates simulator construction by synergizing LLM-driven structural design with rigorous empirical calibration. |
Samuel Holt; Max Ruiz Luyten; Antonin Berthon; Mihaela van der Schaar; |
| 122 | Scaling Sparse Feature Circuits For Studying In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we demonstrate the effectiveness of sparse autoencoders (SAEs) by using them to deepen our understanding of the mechanism behind in-context learning (ICL). |
Dmitrii Kharlapenko; Stepan Shabalin; Arthur Conmy; Neel Nanda; |
| 123 | Skip The Equations: Learning Behavior of Personalized Dynamical Systems Directly From Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we extend the original instantiation, limited to one-dimensional trajectories and inputs, to accommodate multi-dimensional trajectories with additional personalization, allowing evolution to depend on auxiliary static features (e.g., patient covariates). |
Krzysztof Kacprzyk; Julianna Piskorz; Mihaela van der Schaar; |
| 124 | Star Attention: Efficient LLM Inference Over Long Sequences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce Star Attention, a two-phase block-sparse approximation that improves computational efficiency by sharding attention across multiple hosts while minimizing communication overhead. |
Shantanu Acharya; Fei Jia; Boris Ginsburg; |
| 125 | Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Protriever, an end-to-end differentiable framework that learns to retrieve relevant homologs while simultaneously training for the target task. |
Ruben Weitzman; Peter Mørch Groth; Lood Van Niekerk; Aoi Otani; Yarin Gal; Debora Susan Marks; Pascal Notin; |
| 126 | Strategic Planning: A Top-Down Approach to Option Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a *top-down* framework for RL that explicitly leverages *human-like strategy* to reduce sample complexity, guide exploration, and enable high-level decision-making. |
Max Ruiz Luyten; Antonin Berthon; Mihaela van der Schaar; |
| 127 | Understanding The Logic of Direct Preference Alignment Through Logic Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While this has motivated the development of many new variants of the original DPO loss, understanding the differences between these recent proposals, as well as developing new DPA loss functions, remains difficult given the lack of a technical and conceptual framework for reasoning about the underlying semantics of these algorithms. In this paper, we attempt to remedy this by formalizing DPA losses in terms of discrete reasoning problems. |
Kyle Richardson; Vivek Srikumar; Ashish Sabharwal; |
| 128 | InfAlign: Inference-aware Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose a framework for inference-aware alignment (InfAlign), which aims to optimize *inference-time win rate* of the aligned policy against the base model. |
Ananth Balashankar; Ziteng Sun; Jonathan Berant; Jacob Eisenstein; Michael Collins; Adrian Hutter; Jong Lee; Chirag Nagpal; Flavien Prost; Aradhana Sinha; Ananda Theertha Suresh; Ahmad Beirami; |
| 129 | ZipAR: Parallel Autoregressive Image Generation Through Spatial Locality Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating autoregressive (AR) visual generation. |
Yefei He; Feng Chen; Yuanyu He; Shaoxuan He; Hong Zhou; Kaipeng Zhang; Bohan Zhuang; |
| 130 | Statistical Hypothesis Testing for Auditing Robustness in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce distribution-based perturbation analysis, a framework that reformulates LLM perturbation analysis as a frequentist hypothesis testing problem. |
Paulius Rauba; Qiyao Wei; Mihaela van der Schaar; |
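As background, the frequentist machinery involved can be as simple as a permutation test on any scalar metric of model output. The sketch below is a textbook version on toy data, not the paper's specific test statistic:

```python
# Generic permutation test: do outputs under perturbed prompts differ in distribution?
# The score arrays are toy stand-ins for any scalar metric of model output.
import numpy as np

rng = np.random.default_rng(0)
scores_orig = rng.normal(0.00, 1, size=200)   # metric under original prompts
scores_pert = rng.normal(0.25, 1, size=200)   # metric under perturbed prompts

obs = scores_pert.mean() - scores_orig.mean()
pooled = np.concatenate([scores_orig, scores_pert])
null = []
for _ in range(10_000):
    perm = rng.permutation(pooled)            # relabel scores under the null hypothesis
    null.append(perm[200:].mean() - perm[:200].mean())
p_value = (np.abs(null) >= abs(obs)).mean()
print(f"observed shift = {obs:.3f}, permutation p-value = {p_value:.4f}")
```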
| 131 | MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we explore a cost-effective framework for multilingual image generation. |
Sen Xing; Muyan Zhong; Zeqiang Lai; Liangchen Li; Jiawen Liu; Yaohui Wang; Jifeng Dai; Wenhai Wang; |
| 132 | MIB: A Mechanistic Interpretability Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. |
Aaron Mueller; Atticus Geiger; Sarah Wiegreffe; Dana Arad; Iván Arcuschin; Adam Belfki; Yik Siu Chan; Jaden Fried Fiotto-Kaufman; Tal Haklay; Michael Hanna; Jing Huang; Rohan Gupta; Yaniv Nikankin; Hadas Orgad; Nikhil Prakash; Anja Reusch; Aruna Sankaranarayanan; Shun Shao; Alessandro Stolfo; Martin Tutek; Amir Zur; David Bau; Yonatan Belinkov; |
| 133 | STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. |
Kefan Dong; Tengyu Ma; |
| 134 | Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these approaches are often constrained by a small and fixed number of input views, limiting their ability to capture diverse viewpoints and, even worse, leading to suboptimal generation results if the synthesized views are of poor quality. To address these limitations, we propose Flex3D, a novel two-stage framework capable of leveraging an arbitrary number of high-quality input views. |
Junlin Han; Jianyuan Wang; Andrea Vedaldi; Philip Torr; Filippos Kokkinos; |
| 135 | Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we study how unlabeled offline trajectory data can be leveraged to learn efficient exploration strategies. |
Max Wilcoxson; Qiyang Li; Kevin Frans; Sergey Levine; |
| 136 | ProofAug: Efficient Neural Theorem Proving Via Fine-grained Proof Structure Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. |
Haoxiong Liu; Jiacheng Sun; Zhenguo Li; Andrew C Yao; |
| 137 | LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Our paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for LLMs, together with an open-source research framework for getting started on multi-turn RL with offline value-based and online policy-based RL methods. |
Marwa Abdulhai; Isadora White; Charlie Victor Snell; Charles Sun; Joey Hong; Yuexiang Zhai; Kelvin Xu; Sergey Levine; |
| 138 | AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce AutoAdvExBench, a benchmark to evaluate if large language models (LLMs) can autonomously exploit defenses to adversarial examples. |
Nicholas Carlini; Edoardo Debenedetti; Javier Rando; Milad Nasr; Florian Tramèr; |
| 139 | Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In the spirit of advancing neural PDE solvers to real industrial applications, we present Transolver++, a highly parallel and efficient neural solver that can accurately solve PDEs on million-scale geometries. |
Huakun Luo; Haixu Wu; Hang Zhou; Lanxiang Xing; Yichen Di; Jianmin Wang; Mingsheng Long; |
| 140 | Overtrained Language Models Are Harder to Fine-Tune Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. |
Jacob Mitchell Springer; Sachin Goyal; Kaiyue Wen; Tanishq Kumar; Xiang Yue; Sadhika Malladi; Graham Neubig; Aditi Raghunathan; |
| 141 | MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We explore quantization for MoE models and highlight two key insights: 1) linear blocks exhibit varying quantization sensitivity, and 2) divergent expert activation frequencies create heterogeneous computational characteristics. Based on these observations, we introduce MxMoE, a mixed-precision optimization framework for MoE models that considers both algorithmic and system perspectives. |
Haojie Duanmu; Xiuhong Li; Zhihang Yuan; Size Zheng; Jiangfei Duan; Xingcheng Zhang; Dahua Lin; |
| 142 | Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning Via Autoregressive Search Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, we pose a new research problem: *Can we internalize the searching capabilities to fundamentally enhance the reasoning abilities of a single LLM?* |
Maohao Shen; Guangtao Zeng; Zhenting Qi; Zhang-Wei Hong; Zhenfang Chen; Wei Lu; Gregory W. Wornell; Subhro Das; David Daniel Cox; Chuang Gan; |
| 143 | UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: With UI-Vision, we aim to advance the development of more capable agents for real-world desktop tasks. |
Shravan Nayak; Xiangru Jian; Kevin Qinghong Lin; Juan A. Rodriguez; Montek Kalsi; Nicolas Chapados; M. Tamer Özsu; Aishwarya Agrawal; David Vazquez; Christopher Pal; Perouz Taslakian; Spandana Gella; Sai Rajeswar; |
| 144 | VideoRoPE: What Makes for Good Video Rotary Position Embedding? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: As part of our analysis, we introduce a challenging V-NIAH-D (Visual Needle-In-A-Haystack with Distractors) task, which adds periodic distractors into V-NIAH. |
Xilin Wei; Xiaoran Liu; Yuhang Zang; Xiaoyi Dong; Pan Zhang; Yuhang Cao; Jian Tong; Haodong Duan; Qipeng Guo; Jiaqi Wang; Xipeng Qiu; Dahua Lin; |
| 145 | How Compositional Generalization and Creativity Improve As Diffusion Models Are Trained Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We discuss connections between the hierarchical clustering mechanism we introduce here and the renormalization group in physics. |
Alessandro Favero; Antonio Sclocchi; Francesco Cagnetta; Pascal Frossard; Matthieu Wyart; |
| 146 | A Large Recurrent Action Model: XLSTM Enables Fast Inference for Robotics Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Recently, modern recurrent architectures, such as xLSTM and Mamba, have been proposed that exhibit parallelization benefits during training similar to the Transformer architecture while offering fast inference. In this work, we study the aptitude of these modern recurrent architectures for large action models. |
Thomas Schmied; Thomas Adler; Vihang Prakash Patil; Maximilian Beck; Korbinian Pöppel; Johannes Brandstetter; Günter Klambauer; Razvan Pascanu; Sepp Hochreiter; |
| 147 | SafeArena: Evaluating The Safety of Autonomous Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To systematically assess their susceptibility to harmful tasks, we introduce the Agent Risk Assessment framework that categorizes agent behavior across four risk levels. |
Ada Defne Tur; Nicholas Meade; Xing Han Lù; Alejandra Zambrano; Arkil Patel; Esin DURMUS; Spandana Gella; Karolina Stanczak; Siva Reddy; |
| 148 | The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce $\textit{TEDUO}$, a novel training pipeline for offline language-conditioned policy learning in symbolic environments. |
Thomas Pouplin; Kasia Kobalczyk; Hao Sun; Mihaela van der Schaar; |
| 149 | Preference Learning for AI Alignment: A Causal Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Reward modelling from preference data is a crucial step in aligning large language models (LLMs) with human values, requiring robust generalisation to novel prompt-response pairs. In this work, we propose to frame this problem in a causal paradigm, providing the rich toolbox of causality to identify the persistent challenges, such as causal misidentification, preference heterogeneity, and confounding due to user-specific factors. |
Kasia Kobalczyk; Mihaela van der Schaar; |
| 150 | Continuously Updating Digital Twins Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current approaches struggle in this regard, as they require fixed, well-defined modelling environments, and they cannot adapt to novel variables without re-designs, or incorporate new information without re-training. To address this, we frame digital twinning as an in-context learning problem using large language models, enabling seamless updates to the twin at inference time. |
Harry Amad; Nicolás Astorga; Mihaela van der Schaar; |
| 151 | Massive Values in Self-Attention Modules Are The Key to Contextual Knowledge Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. In this paper, we show for the first time that concentrated massive values consistently emerge in specific regions of attention queries (Q) and keys (K), while no such patterns appear in values (V), across various modern transformer-based LLMs. |
Mingyu Jin; Kai Mei; Wujiang Xu; Mingjie Sun; Ruixiang Tang; Mengnan Du; Zirui Liu; Yongfeng Zhang; |
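One can spot such concentration with basic magnitude statistics per hidden dimension. A toy sketch (random tensors with one planted dimension; in practice Q and K would be captured from a real attention layer via hooks):

```python
# Toy sketch: flag "massive" dimensions whose mean |value| dwarfs the median dimension.
# Random tensors stand in for Q/K/V pulled from a real attention layer.
import numpy as np

rng = np.random.default_rng(0)
seq, d_head = 128, 64
Q = rng.normal(size=(seq, d_head))
Q[:, 7] *= 20                                   # plant a massive-value dimension

def massive_dims(X, ratio=5.0):
    m = np.abs(X).mean(axis=0)                  # mean magnitude per dimension
    return np.where(m > ratio * np.median(m))[0]

print("massive dims in Q:", massive_dims(Q))    # -> [7]
```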
| 152 | No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we investigate the key characteristics of task matrices — weight update matrices applied to a pre-trained model — that enable effective merging. |
Daniel Marczak; Simone Magistri; Sebastian Cygert; Bartłomiej Twardowski; Andrew D. Bagdanov; Joost van de Weijer; |
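For readers unfamiliar with the term: a task matrix is simply the weight delta a fine-tune applies to the base model, and merging methods operate on these deltas. A bare-bones sketch of delta extraction and naive uniform merging (ours; the paper's isotropic merging manipulates the spectrum of these matrices rather than simply averaging them):

```python
# Bare-bones merging sketch: task matrices are fine-tuned minus pre-trained weights;
# a naive merge adds their (scaled) average back onto the base model.
import numpy as np

rng = np.random.default_rng(0)
base = {"layer.weight": rng.normal(size=(16, 16))}
finetunes = [{"layer.weight": base["layer.weight"] + 0.01 * rng.normal(size=(16, 16))}
             for _ in range(3)]                 # three hypothetical task-specific models

task_matrices = [{k: ft[k] - base[k] for k in base} for ft in finetunes]
alpha = 1.0                                     # merging coefficient
merged = {k: base[k] + alpha * np.mean([tm[k] for tm in task_matrices], axis=0)
          for k in base}
print(merged["layer.weight"].shape)
```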
| 153 | Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. To address these limitations, we introduce Cavia, a novel framework for camera-controllable, multi-view video generation, capable of converting an input image into multiple spatiotemporally consistent videos. |
Dejia Xu; Yifan Jiang; Chen Huang; Liangchen Song; Thorsten Gernoth; Liangliang Cao; Zhangyang Wang; Hao Tang; |
| 154 | Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a training-free framework termed Sparse VideoGen (SVG) that leverages the inherent sparsity in 3D full attention to boost inference efficiency. |
Haocheng Xi; Shuo Yang; Yilong Zhao; Chenfeng Xu; Muyang Li; Xiuyu Li; Yujun Lin; Han Cai; Jintao Zhang; Dacheng Li; Jianfei Chen; Ion Stoica; Kurt Keutzer; Song Han; |
| 155 | Instruction-Following Pruning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we move beyond the traditional static pruning approach of determining a fixed pruning mask for a model, and propose a dynamic approach to structured pruning. |
Bairu Hou; Qibin Chen; Jianyu Wang; Guoli Yin; Chong Wang; Nan Du; Ruoming Pang; Shiyu Chang; Tao Lei; |
| 156 | From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Identifying these features would not only shed light on how pLMs work, but potentially uncover novel protein biology: studying the model to study the biology. Motivated by this, we train sparse autoencoders (SAEs) on the residual stream of a pLM, ESM-2. |
Etowah Adams; Liam Bai; Minji Lee; Yiyang Yu; Mohammed AlQuraishi; |
| 157 | Improving LLM Safety Alignment with Dual-Objective Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Direct preference optimization (DPO), a widely deployed alignment method, exhibits limitations in both experimental and theoretical contexts as its loss function proves suboptimal for refusal learning. Through gradient-based analysis, we identify these shortcomings and propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge. |
Xuandong Zhao; Will Cai; Tianneng Shi; David Huang; Licong Lin; Song Mei; Dawn Song; |
| 158 | Simplicity Bias and Optimization Threshold in Two-Layer ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: It has instead been empirically observed that the trained models go from global minima to spurious local minima of the training loss as the number of training samples becomes larger than some level we call optimization threshold. This paper explores theoretically this phenomenon in the context of two-layer ReLU networks. |
Etienne Boursier; Nicolas Flammarion; |
| 159 | Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Consequently, previous approaches often (1) constrain reasoning traces to hand-designed components, such as a list of criteria, reference answers, or verification questions and (2) structure them such that planning is intertwined with the reasoning for evaluation. In this work, we propose EvalPlanner, a preference optimization algorithm for Thinking-LLM-as-a-Judge that first generates an unconstrained evaluation plan, followed by its execution, and then the final judgment. |
Swarnadeep Saha; Xian Li; Marjan Ghazvininejad; Jason E Weston; Tianlu Wang; |
| 160 | LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present a benchmark to pressure-test today’s frontier models’ multimodal decision-making capabilities in the very long-context regime (up to one million tokens) and investigate whether these models can learn from large numbers of expert demonstrations in their context. |
Anian Ruoss; Fabio Pardo; Harris Chan; Bonnie Li; Volodymyr Mnih; Tim Genewein; |
| 161 | AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose AdaDecode, which accelerates LLM decoding without requiring auxiliary models or changes to the original model parameters, while ensuring output consistency. |
Zhepei Wei; Wei-Lin Chen; Xinyu Zhu; Yu Meng; |
| 162 | Parameters Vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While scaling typically involves increasing both, the precise interplay between these factors and their combined contribution to overall capacity remains not fully understood. We explore this relationship in the context of sparse Mixture-of-Expert models (MoEs), which allow scaling the number of parameters without proportionally increasing the FLOPs per example. |
Samira Abnar; Harshay Shah; Dan Busbridge; Alaaeldin El-Nouby; Joshua M. Susskind; Vimal Thilak; |
| 163 | Pre-training Auto-regressive Robotic Models with 4D Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce ARM4R, an **A**uto-regressive **R**obotic **M**odel that leverages low-level **4**D **R**epresentations learned from human video data to yield a better pre-trained robotic model. |
Dantong Niu; Yuvan Sharma; Haoru Xue; Giscard Biamby; Junyi Zhang; Ziteng Ji; Trevor Darrell; Roei Herzig; |
| 164 | AlphaVerus: Bootstrapping Formally Verified Code Generation Through Self-Improving Translation and Treefinement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, generating formally verified code with LLMs is hindered by the scarcity of training data and the complexity of formal proofs. To tackle this challenge, we introduce AlphaVerus, a self-improving framework that bootstraps formally verified code generation by iteratively translating programs from a higher-resource language and leveraging feedback from a verifier. |
Pranjal Aggarwal; Bryan Parno; Sean Welleck; |
| 165 | Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. |
Marta Skreta; Tara Akhound-Sadegh; Viktor Ohanesian; Roberto Bondesan; Alan Aspuru-Guzik; Arnaud Doucet; Rob Brekelmans; Alexander Tong; Kirill Neklyudov; |
| 166 | Antidote: Post-fine-tuning Safety Alignment for Large Language Models Against Harmful Fine-tuning Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose Antidote, a post-fine-tuning stage solution, which remains ***agnostic to the training hyper-parameters in the fine-tuning stage***. |
Tiansheng Huang; Gautam Bhattacharya; Pratik Joshi; Joshua Kimball; Ling Liu; |
| 167 | Mitigating Object Hallucination in Large Vision-Language Models Via Image-Grounded Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these approaches require either costly training or fine-tuning, or API access to proprietary LLMs for post-generation correction. In response to these limitations, we propose Mitigating hallucinAtion via image-gRounded guIdaNcE (MARINE), a framework that is both training-free and API-free. |
Linxi Zhao; Yihe Deng; Weitong Zhang; Quanquan Gu; |
| 168 | On Teacher Hacking in Language Model Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate whether a similar phenomenon, that we call teacher hacking, can occur during knowledge distillation. |
Daniil Tiapkin; Daniele Calandriello; Johan Ferret; Sarah Perrin; Nino Vieillard; Alexandre Rame; Mathieu Blondel; |
| 169 | Joint Learning of Energy-based Models and Their Partition Function Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel min-min formulation for approximately learning probabilistic EBMs in combinatorially-large discrete spaces, such as sets or permutations. |
Michael Eli Sander; Vincent Roulet; Tianlin Liu; Mathieu Blondel; |
| 170 | RAGGED: Towards Informed Design of Scalable and Stable RAG Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce RAGGED, a framework for systematically evaluating RAG systems across diverse retriever-reader configurations, retrieval depths, and datasets. |
Jennifer Hsia; Afreen Shaikh; Zora Zhiruo Wang; Graham Neubig; |
| 171 | EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. |
Theodoros Kouzelis; Ioannis Kakogeorgiou; Spyros Gidaris; Nikos Komodakis; |
| 172 | ETTA: Elucidating The Design Space of Text-to-Audio Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our contributions include: 1) AF-Synthetic, a large dataset of high quality synthetic captions obtained from an audio understanding model; 2) a systematic comparison of different architectural, training, and inference design choices for TTA models; 3) an analysis of sampling methods and their Pareto curves with respect to generation quality and inference speed. |
Sang-gil Lee; Zhifeng Kong; Arushi Goel; Sungwon Kim; Rafael Valle; Bryan Catanzaro; |
| 173 | T1: Advancing Language Model Reasoning Through Reinforcement Learning and Inference Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present T1 to scale RL by encouraging exploration and to understand inference scaling. |
Zhenyu Hou; Xin Lv; Rui Lu; Jiajie Zhang; Yujiang Li; Zijun Yao; Juanzi Li; Jie Tang; Yuxiao Dong; |
| 174 | Quantum Algorithms for Finite-horizon Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we design quantum algorithms that are more efficient than classical algorithms to solve time-dependent and finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment’s dynamics (i.e., transition probabilities), we prove that our **Quantum Value Iteration (QVI)** algorithm **QVI-1** achieves a quadratic speedup in the size of the action space $(A)$ compared with the classical value iteration algorithm for computing the optimal policy ($\pi^{\ast}$) and the optimal V-value function ($V_{0}^{\ast}$). |
Bin Luo; Yuwen Huang; Jonathan Allcock; Xiaojun Lin; Shengyu Zhang; John C.S. Lui; |
| 175 | Understanding Synthetic Context Extension Via Retrieval Heads Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning. |
Xinyu Zhao; Fangcong Yin; Greg Durrett; |
| 176 | RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents Against Human Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce RE-Bench (Research Engineering Benchmark, V1), which consists of 7 challenging, open-ended ML research engineering environments and data from 71 8-hour attempts by 61 distinct human experts. |
Hjalmar Wijk; Tao Roa Lin; Joel Becker; Sami Jawhar; Neev Parikh; Thomas Broadley; Lawrence Chan; Michael Chen; Joshua M Clymer; Jai Dhyani; Elena Ericheva; Katharyn Garcia; Brian Goodrich; Nikola Jurkovic; Megan Kinniment; Aron Lajko; Seraphina Nix; Lucas Jun Koba Sato; William Saunders; Maksym Taran; Ben West; Elizabeth Barnes; |
| 177 | Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks Via Improved Localization and Solution Diversity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Enhancing LLMs’ performance in these scenarios requires careful consideration of the contextual information provided to the model, optimizing how the model leverages that information, and identifying tools that enable more effective navigation of the development environment. To address these challenges, we introduce Nemotron-CORTEXA, an agentic system built on a predefined scaffold that enhances LLMs’ ability to navigate and reason efficiently in complex software engineering contexts. |
Atefeh Sohrabizadeh; Jialin Song; Mingjie Liu; Rajarshi Roy; Chankyu Lee; Jonathan Raiman; Bryan Catanzaro; |
| 178 | What If We Recaption Billions of Web Images with LLaMA-3? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community effort, leveraging the powerful and $\textit{open-sourced}$ LLaMA-3, a GPT-4 level LLM. |
Xianhang Li; Haoqin Tu; Mude Hui; Zeyu Wang; Bingchen Zhao; Junfei Xiao; Sucheng Ren; Jieru Mei; Qing Liu; Huangjie Zheng; Yuyin Zhou; Cihang Xie; |
| 179 | Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In response, we present a new sandbox suite tailored for integrated data-model co-development. |
Daoyuan Chen; Haibin Wang; Yilun Huang; Ce Ge; Yaliang Li; Bolin Ding; Jingren Zhou; |
| 180 | Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, state-of-the-art unlearning methods face a critical vulnerability: they are susceptible to “relearning” the removed information from a small number of forget data points, known as relearning attacks. In this paper, we systematically investigate how to make unlearned models robust against such attacks. |
Chongyu Fan; Jinghan Jia; Yihua Zhang; Anil Ramakrishna; Mingyi Hong; Sijia Liu; |
| 181 | Teaching Language Models to Critique Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we study LLM critics for code generation and propose $\texttt{CTRL}$, a framework for $\texttt{C}$ritic $\texttt{T}$raining via $\texttt{R}$einforcement $\texttt{L}$earning, which trains a critic model to generate feedback that maximizes correction performance for a fixed generator model without human supervision. |
Zhihui Xie; Jie chen; Liyu Chen; Weichao Mao; Jingjing Xu; Lingpeng Kong; |
| 182 | Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, little is known about the mechanisms that enable this generalization or how they are learned during pretraining. We introduce extractive structures as a framework for describing how components in LMs (e.g., MLPs or attention heads) coordinate to enable this generalization. |
Jiahai Feng; Stuart Russell; Jacob Steinhardt; |
| 183 | Which Attention Heads Matter for In-Context Learning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. |
Kayo Yin; Jacob Steinhardt; |
| 184 | Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we examine *logical preference consistency* as a foundational requirement for building more dependable LLM systems, ensuring stable and coherent decision-making while minimizing erratic or contradictory outputs. |
Yinhong Liu; Zhijiang Guo; Tianya Liang; Ehsan Shareghi; Ivan Vulić; Nigel Collier; |
| 185 | Designing Cyclic Peptides Via Harmonic SDE with Atom-Bond Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These challenges include the scarcity of 3D structural data on target proteins and associated cyclic peptide ligands, the geometric constraints that cyclization imposes, and the involvement of non-canonical amino acids in cyclization. To address the above challenges, we introduce CpSDE, which consists of two key components: AtomSDE, a generative structure prediction model based on harmonic SDE, and ResRouter, a residue type predictor. |
Xiangxin Zhou; Mingyu Li; Yi Xiao; Jiahan Li; Dongyu Xue; Zaixiang Zheng; Jianzhu Ma; Quanquan Gu; |
| 186 | MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose **MetaAgent**, a **finite state machine**-based framework that can automatically generate a multi-agent system. |
Yaolun Zhang; Xiaogeng Liu; Chaowei Xiao; |
| 187 | Theoretical Guarantees on The Best-of-n Alignment Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A commonly used analytical expression in the literature claims that the KL divergence between the best-of-$n$ policy and the reference policy is equal to $\log(n) - (n-1)/n$. We disprove the validity of this claim, and show that it is an upper bound on the actual KL divergence. |
Ahmad Beirami; Alekh Agarwal; Jonathan Berant; Alexander Nicholas D’Amour; Jacob Eisenstein; Chirag Nagpal; Ananda Theertha Suresh; |
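The corrected claim is easy to check numerically: for a finite reference distribution with distinct rewards, the best-of-$n$ distribution follows in closed form from the reward-ordered CDF, and its KL to the reference stays below $\log(n) - (n-1)/n$. A small verification script (ours, not the authors'):

```python
# Numerical check that KL(best-of-n || reference) <= log(n) - (n-1)/n.
# Outcomes are indexed in order of strictly increasing reward, so best-of-n
# mass on outcome i is F_i^n - F_{i-1}^n, with F the reward-ordered CDF.
import numpy as np

def best_of_n_kl(p, n):
    F = np.cumsum(p)
    F_prev = np.concatenate(([0.0], F[:-1]))
    pi = F**n - F_prev**n                       # exact best-of-n distribution
    mask = pi > 0
    return float(np.sum(pi[mask] * np.log(pi[mask] / p[mask])))

p = np.random.default_rng(0).dirichlet(np.ones(50))   # random reference policy
for n in (2, 4, 16, 64):
    print(n, round(best_of_n_kl(p, n), 4), "<=", round(np.log(n) - (n - 1) / n, 4))
```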
| 188 | Accelerating Unbiased LLM Evaluation Via Synthetic Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose a statistically principled framework that integrates human and synthetic feedback to reduce reliance on human annotations while maintaining unbiased win-rate calculations. |
Zhaoyi Zhou; Yuda Song; Andrea Zanette; |
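One standard recipe for this kind of debiasing (in the spirit of prediction-powered inference; the paper's estimator may differ in its details) scores every prompt with a cheap synthetic judge and corrects the judge's bias with a small human-labeled slice:

```python
# Debiased win-rate sketch: synthetic judge on all prompts, humans on a small slice.
# The estimator is unbiased for the human win rate regardless of judge bias.
import numpy as np

rng = np.random.default_rng(0)
N, n = 20_000, 400
truth = rng.random(N) < 0.52                   # latent human verdicts (toy)
judge = truth ^ (rng.random(N) < 0.15)         # synthetic judge, 15% disagreement
idx = rng.choice(N, size=n, replace=False)     # prompts sent to human annotators

estimate = judge.mean() + (truth[idx].astype(float) - judge[idx].astype(float)).mean()
print(f"judge-only: {judge.mean():.3f}  debiased: {estimate:.3f}  truth: {truth.mean():.3f}")
```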
| 189 | Learning Parametric Distributions from Samples and Preferences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Leveraging the hard constraints revealed by deterministic preferences, we propose an estimator achieving an estimation error scaling of $\mathcal{O}(1/n)$—a significant improvement over the $\Theta(1/\sqrt{n})$ rate attainable with samples alone. |
Marc Jourdan; Gizem Yüce; Nicolas Flammarion; |
| 190 | Learning In-context $n$-grams with Transformers: Sub-$n$-grams Are Near-Stationary Points Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this article, we explore the loss landscape of next-token prediction with transformers. |
Aditya Varre; Gizem Yüce; Nicolas Flammarion; |
| 191 | LLM Alignment As Retriever Optimization: An Information Retrieval Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a novel direct optimization approach for LLM alignment by drawing on established Information Retrieval (IR) principles. |
Bowen Jin; Jinsung Yoon; Zhen Qin; Ziqi Wang; Wei Xiong; Yu Meng; Jiawei Han; Sercan O Arik; |
| 192 | Predictive Data Selection: The Data That Predicts Is The Data That Teaches Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we aim to directly estimate the contribution of data during pretraining and select pretraining data in an efficient manner. |
KaShun SHUM; Yuzhen Huang; Hongjian Zou; Dingqi; YiXuan Liao; Xiaoxin Chen; Qian Liu; Junxian He; |
| 193 | TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose TimeBridge, a novel framework designed to bridge the gap between non-stationarity and dependency modeling in long-term time series forecasting. |
Peiyuan Liu; Beiliang Wu; Yifan Hu; Naiqi Li; Tao Dai; Jigang Bao; Shu-Tao Xia; |
| 194 | Universal Length Generalization with Turing Programs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building on prior scratchpad and Chain-of-Thought (CoT) techniques, we propose *Turing Programs*, a novel CoT strategy that decomposes an algorithmic task into steps mimicking the computation of a Turing Machine. |
Kaiying Hou; David Brandfonbrener; Sham M. Kakade; Samy Jelassi; Eran Malach; |
| 195 | On The Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To simulate faulty agents, we propose two approaches—AutoTransform and AutoInject—which introduce mistakes into the agents’ responses. |
Jen-tse Huang; Jiaxu Zhou; Tailin Jin; Xuhui Zhou; Zixi Chen; Wenxuan Wang; Youliang Yuan; Michael Lyu; Maarten Sap; |
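The fault-injection idea can be pictured as a wrapper that corrupts a fraction of an agent's messages. A toy sketch (ours; the paper's AutoTransform and AutoInject use an LLM to craft realistic mistakes rather than a string edit):

```python
# Toy fault-injection wrapper: corrupt a fraction of an agent's responses.
import random
random.seed(0)

class FaultyAgent:
    def __init__(self, agent, error_rate=0.3):
        self.agent, self.error_rate = agent, error_rate

    def respond(self, message: str) -> str:
        out = self.agent(message)
        if random.random() < self.error_rate:
            out = out.replace("4", "5")        # toy corruption of a numeric answer
        return out

calculator = lambda msg: "2 + 2 = 4"           # stand-in for a real agent
faulty = FaultyAgent(calculator)
print([faulty.respond("2 + 2?") for _ in range(5)])
```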
| 196 | Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Additionally, as the environment dynamics change, certain expert states may become inaccessible, rendering their distributions less valuable for imitation. To address this, we propose a novel framework that integrates reward maximization with IfO, employing F-distance regularized policy optimization. |
Zhenghai Xue; Lang Feng; Jiacheng Xu; Kang Kang; Xiang Wen; Bo An; Shuicheng YAN; |
| 197 | WMAdapter: Adding WaterMark Control to Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose WMAdapter, a diffusion model watermark plugin that embeds user-specified watermark information seamlessly during the diffusion generation process. |
Hai Ci; Yiren Song; Pei Yang; Jinheng Xie; Mike Zheng Shou; |
| 198 | Sidechain Conditioning and Modeling for Full-atom Protein Sequence Design with FAMPNN Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Instead, these models implicitly reason about crucial sidechain interactions based solely on backbone geometry and amino-acid sequence. To address this, we present FAMPNN (Full-Atom MPNN), a sequence design method that explicitly models both sequence identity and sidechain conformation for each residue, where the per-token distribution of a residue’s discrete amino acid identity and its continuous sidechain conformation are learned with a combined categorical cross-entropy and diffusion loss objective. |
Talal Widatalla; Richard W. Shuai; Brian Hie; Possu Huang; |
| 199 | Reinforce LLM Reasoning Through Multi-Agent Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (**D**irect **P**olicy **S**earch by **D**ynamic **P**rogramming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. |
Yurun Yuan; Tengyang Xie; |
| 200 | An All-Atom Generative Model for Designing Protein Complexes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Despite these developments, the study and modeling of multi-chain proteins remain largely uncharted, though they are vital for understanding biological functions. Recognizing the importance of these interactions, we introduce APM (all-Atom Protein generative Model), a model specifically designed for modeling multi-chain proteins. |
Ruizhe Chen; Dongyu Xue; Xiangxin Zhou; Zaixiang Zheng; xiangxiang Zeng; Quanquan Gu; |
| 201 | NoLiMa: Long-Context Evaluation Beyond Literal Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, in these benchmarks, models can exploit existing literal matches between the needle and haystack to simplify the task. To address this, we introduce NoLiMa, a benchmark extending NIAH with a carefully designed needle set, where questions and needles have minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack. |
Ali Modarressi; Hanieh Deilamsalehy; Franck Dernoncourt; Trung Bui; Ryan A. Rossi; Seunghyun Yoon; Hinrich Schuetze; |
| 202 | RATE: Causal Explainability of Reward Models with Imperfect Counterfactuals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper we develop Rewrite-based Attribute Treatment Estimator (RATE) as an effective method for measuring the sensitivity of a reward model to high-level attributes of responses, such as sentiment, helpfulness, or complexity. |
David Reber; Sean M Richardson; Todd Nief; Cristina Garbacea; Victor Veitch; |
| 203 | Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although existing optimization methods partially mitigate this issue, they remain constrained by the sequential dependency between communication and computation operations. To address this challenge, we propose ScMoE, a novel shortcut-connected MoE architecture integrated with an overlapping parallelization strategy. |
Weilin Cai; Juyong Jiang; Le Qin; junweicui; Sunghun Kim; Jiayi Huang; |
| 204 | Fast Video Generation with Sliding Tile Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Diffusion Transformers (DiTs) with 3D full attention power state-of-the-art video generation, but suffer from prohibitive compute cost — when generating just a 5-second 720P video, attention alone takes 800 out of 950 seconds of total inference time. This paper introduces sliding tile attention (STA) to address this challenge. |
Peiyuan Zhang; Yongqi Chen; Runlong Su; Hangliang Ding; Ion Stoica; Zhengzhong Liu; Hao Zhang; |
| 205 | Unifying 2D and 3D Vision-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel language-conditioned mask decoder shared across 2D and 3D modalities to ground objects effectively in both RGB and RGB-D images, outperforming box-based approaches. |
Ayush Jain; Alexander Swerdlow; Yuzhou Wang; Sergio Arnaud; Ada Martin; Alexander Sax; Franziska Meier; Katerina Fragkiadaki; |
| 206 | Best of Both Worlds: Advantages of Hybrid Graph Sequence Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce the Graph Sequence Model (GSM), a unifying framework for applying sequence models to graph data. |
Ali Behrouz; Ali Parviz; Mahdi Karami; Clayton Sanford; Bryan Perozzi; Vahab Mirrokni; |
| 207 | Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we address three underexplored research questions: (1) How can activation sparsity be measured more accurately? |
Yuqi Luo; Chenyang Song; Xu Han; Yingfa Chen; Chaojun Xiao; Xiaojun Meng; Liqun Deng; Jiansheng Wei; Zhiyuan Liu; Maosong Sun; |
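As a baseline for question (1), the naive measurement is simply the fraction of near-zero activations after the nonlinearity; the paper argues for more precise metrics, but the starting point looks like this (our sketch on random inputs):

```python
# Naive activation-sparsity measurement: fraction of near-zero post-ReLU activations.
# Random inputs stand in for real hidden states from an LLM forward pass.
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(32, 4096))                # pre-activation hidden states
acts = np.maximum(h, 0.0)                      # ReLU
sparsity = (np.abs(acts) < 1e-6).mean()
print(f"activation sparsity: {sparsity:.3f}")  # ~0.5 for zero-mean Gaussian input
```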
| 208 | MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In response, we propose MMedPO, a novel multimodal medical preference optimization approach that considers the clinical relevance of preference samples to enhance Med-LVLM alignment. |
Kangyu Zhu; Peng Xia; Yun Li; Hongtu Zhu; Sheng Wang; Huaxiu Yao; |
| 209 | REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: If low attack success under such an objective is taken as a measure of robustness, the true robustness might be grossly overestimated. To alleviate these flaws, we propose an adaptive and semantic optimization problem over the population of responses. |
Simon Geisler; Tom Wollschläger; M. H. I. Abdalla; Vincent Cohen-Addad; Johannes Gasteiger; Stephan Günnemann; |
| 210 | Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Can such techniques be used to predict how models will behave on out-of-distribution examples? In this work, we provide a positive answer to this question. |
Jing Huang; Junyi Tao; Thomas Icard; Diyi Yang; Christopher Potts; |
| 211 | Do We Need to Verify Step By Step? Rethinking Process Supervision from A Theoretical Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Conventional wisdom suggests that outcome supervision is fundamentally more challenging due to the trajectory-level coverage problem, leading to significant investment in collecting fine-grained process supervision data. In this paper, we provide a possible theoretical resolution to this debate. |
Zeyu Jia; Alexander Rakhlin; Tengyang Xie; |
| 212 | ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here we present a framework for pretraining on large-scale microscopy datasets that includes three steps: (1) curating a set of diverse and self-consistent training samples, (2) scaling training of an appropriate foundation model architecture on this dataset, (3) evaluating intermediate layers of the trained model to identify the best representation for downstream tasks. Using this strategy, we present the largest foundation model for cell microscopy data to our knowledge, a new 1.9 billion-parameter ViT-G/8 MAE trained on over 8 billion microscopy image crops. |
Kian Kenyon-Dean; Zitong Jerry Wang; John Urbanik; Konstantin Donhauser; Jason Hartford; Saber Saberian; Nil Sahin; Ihab Bendidi; Safiye Celik; Juan Sebastián Rodríguez Vera; Marta Fay; Imran S Haque; Oren Kraus; |
| 213 | Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Moreover, such attention patterns exhibit substantial differences between familiar (e.g., “on the left side of”) and unfamiliar (e.g., “in front of”) spatial relationships. Motivated by these findings, we propose ADAPTVIS based on inference-time confidence scores to sharpen the attention on highly relevant regions when the model exhibits high confidence, while smoothing and broadening the attention window to consider a wider context when confidence is lower. |
Shiqi Chen; Tongyao Zhu; Ruochen Zhou; Jinghan Zhang; Siyang Gao; Juan Carlos Niebles; Mor Geva; Junxian He; Jiajun Wu; Manling Li; |
| 214 | Temporal Difference Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. |
Jesse Farebrother; Matteo Pirotta; Andrea Tirinzoni; Remi Munos; Alessandro Lazaric; Ahmed Touati; |
| 215 | A General Framework for Inference-time Scaling and Steering of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose FK steering, a framework for inference-time steering of diffusion models with reward functions. |
Raghav Singhal; Zachary Horvitz; Ryan Teehan; Mengye Ren; Zhou Yu; Kathleen McKeown; Rajesh Ranganath; |
| 216 | MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present MELON (Masked re-Execution and TooL comparisON), a novel IPI defense. |
Kaijie Zhu; Xianjun Yang; Jindong Wang; Wenbo Guo; William Yang Wang; |
| 217 | The Role of Sparsity for Length Generalization in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a new theoretical framework to study length generalization for the next-token prediction task, as performed by decoder-only transformers. |
Noah Golowich; Samy Jelassi; David Brandfonbrener; Sham M. Kakade; Eran Malach; |
| 218 | Bring Reason to Vision: Understanding Perception and Reasoning Through Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we explore composing perception and reasoning through model merging, which connects the parameters of different models. |
Shiqi Chen; Jinghan Zhang; Tongyao Zhu; Wei Liu; Siyang Gao; Miao Xiong; Manling Li; Junxian He; |
| 219 | Eliciting Language Model Behaviors with Investigator Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study the problem of behavioral elicitation, where the goal is to search for prompts that induce specific target behaviors (e.g., hallucinations, harmful responses) from a target language model. |
Xiang Lisa Li; Neil Chowdhury; Daniel D. Johnson; Tatsunori Hashimoto; Percy Liang; Sarah Schwettmann; Jacob Steinhardt; |
| 220 | Learning to (Learn at Test Time): RNNs with Expressive Hidden States Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present a practical framework for instantiating sequence modeling layers with linear complexity and expressive hidden states. |
Yu Sun; Xinhao Li; Karan Dalal; Jiarui Xu; Arjun Vikram; Genghan Zhang; Yann Dubois; Xinlei Chen; Xiaolong Wang; Sanmi Koyejo; Tatsunori Hashimoto; Carlos Guestrin; |
| 221 | Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model Is Secretly A GAN Discriminator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: While likelihood-based generative models, particularly diffusion and autoregressive models, have achieved remarkable fidelity in visual generation, the maximum likelihood estimation (MLE) objective, which minimizes the forward KL divergence, inherently suffers from a mode-covering tendency that limits the generation quality under limited model capacity. In this work, we propose Direct Discriminative Optimization (DDO) as a unified framework that integrates likelihood-based generative training and GAN-type discrimination to bypass this fundamental constraint by exploiting reverse KL and self-generated negative signals. |
Kaiwen Zheng; Yongxin Chen; Huayu Chen; Guande He; Ming-Yu Liu; Jun Zhu; Qinsheng Zhang; |
| 222 | Hidden No More: Attacking and Defending Private Third-Party LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a novel reconstruction technique that can recover original prompts from hidden states with nearly perfect accuracy across multiple state-of-the-art LLMs in the increasingly important open-weights setting. |
Rahul Krishna Thomas; Louai Zahran; Erica Choi; Akilesh Potti; Micah Goldblum; Arka Pal; |
| 223 | TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Using the obtained reward and Bradley-Terry model, this work establishes a framework of computable loss functions with token-level reward guidance for DPO, and proposes a practical reward guidance based on the induced DPO reward. |
Mingkang Zhu; Xi Chen; Zhongdao Wang; Bei Yu; Hengshuang Zhao; Jiaya Jia; |
| 224 | Agent Workflow Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. |
Zora Zhiruo Wang; Jiayuan Mao; Daniel Fried; Graham Neubig; |
| 225 | The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we propose a novel gradient-based approach to representation engineering and use it to identify refusal directions. |
Tom Wollschläger; Jannes Elstner; Simon Geisler; Vincent Cohen-Addad; Stephan Günnemann; Johannes Gasteiger; |
| 226 | Automated Red Teaming with GOAT: The Generative Offensive Agent Tester Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While manual testing addresses this gap, it is an inefficient and often expensive process. To address these limitations, we introduce the Generative Offensive Agent Tester (GOAT), an automated agentic red teaming system that simulates plain language adversarial conversations while leveraging multiple adversarial prompting techniques to identify vulnerabilities in LLMs. |
Maya Pavlova; Erik Brinkman; Krithika Iyer; Vítor Albiero; Joanna Bitton; Hailey Nguyen; Cristian Canton Ferrer; Ivan Evtimov; Aaron Grattafiori; |
| 227 | Subobject-level Image Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Inspired by subword tokenization, we introduce subobject-level adaptive token segmentation and explore several approaches, including superpixel, SAM, and a proposed Efficient and PanOptiC (EPOC) image tokenizer. |
Delong Chen; Samuel Cahyawijaya; Jianfeng Liu; Baoyuan Wang; Pascale Fung; |
| 228 | Certified Unlearning for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To this end, we propose a novel method for certified machine unlearning, leveraging the connection between unlearning and privacy amplification by stochastic post-processing. |
Anastasia Koloskova; Youssef Allouah; Animesh Jha; Rachid Guerraoui; Sanmi Koyejo; |
| 229 | Wasserstein Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Wasserstein Policy Optimization (WPO), an actor-critic algorithm for reinforcement learning in continuous action spaces. |
David Pfau; Ian Davies; Diana L Borsa; João Guilherme Madeira Araújo; Brendan Daniel Tracey; Hado van Hasselt; |
| 230 | DiffusionVLA: Scaling Robot Foundation Models Via Unified Diffusion and Autoregression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present DiffusionVLA, a novel framework that integrates autoregressive reasoning with diffusion policies to address the limitations of existing methods: while autoregressive Vision-Language-Action (VLA) models lack precise and robust action generation, diffusion-based policies inherently lack reasoning capabilities. |
Junjie Wen; Yichen Zhu; Minjie Zhu; Zhibin Tang; Jinming Li; Zhongyi Zhou; Xiaoyu Liu; Chaomin Shen; Yaxin Peng; Feifei Feng; |
| 231 | GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation Via Multi-Step Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unlike human evaluation, existing automated evaluation metrics lack high-level semantic understanding and reasoning capabilities for video, thus making them infeasible and unexplainable. To fill this gap, we curate **GRADEO-Instruct**, a multi-dimensional T2V evaluation instruction tuning dataset, including 3.3k videos from over 10 existing video generation models and multi-step reasoning assessments converted by 16k human annotations. We then introduce **GRADEO**, one of the first specifically designed video evaluation models, which **grades** AI-generated **videos** for explainable scores and assessments through multi-step reasoning. |
Zhun Mou; Bin Xia; Zhengchao Huang; Wenming Yang; Jiaya Jia; |
| 232 | CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building a benchmark for real-world vulnerabilities involves both specialized expertise to reproduce exploits and a systematic approach to evaluating unpredictable attacks. To address this challenge, we introduce CVE-Bench, a real-world cybersecurity benchmark based on critical-severity Common Vulnerabilities and Exposures. |
Yuxuan Zhu; Antony Kellermann; Dylan Bowman; Philip Li; Akul Gupta; Adarsh Danda; Richard Fang; Conner Jensen; Eric Ihli; Jason Benn; Jet Geronimo; Avi Dhir; Sudhit Rao; Kaicheng Yu; Twm Stone; Daniel Kang; |
| 233 | Diving Into Self-Evolving Training for Multimodal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STaR (**M**ultimodal **S**elf-evolving **T**r**a**ining for **R**easoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. |
Wei Liu; Junlong Li; Xiwen Zhang; Fan Zhou; Yu Cheng; Junxian He; |
| 234 | Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we start from a new perspective to excavate the reason behind the failure of generalization in AIGI detection, which we name the asymmetry phenomenon: a naively trained detector tends to overfit to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked, which is shown to seriously limit expressivity and generalization. |
Zhiyuan Yan; Jiangming Wang; Peng Jin; Ke-Yue Zhang; Chengchun Liu; Shen Chen; Taiping Yao; Shouhong Ding; Baoyuan Wu; Li Yuan; |
| 235 | AdvAgent: Controllable Blackbox Red-teaming on Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their access to sensitive resources and autonomous decision-making also introduce significant security risks, where successful attacks could lead to severe consequences. To systematically uncover these vulnerabilities, we propose AdvAgent, a black-box red-teaming framework for attacking web agents. |
Chejian Xu; Mintong Kang; Jiawei Zhang; Zeyi Liao; Lingbo Mo; Mengqi Yuan; Huan Sun; Bo Li; |
| 236 | LieRE: Lie Rotational Positional Encodings Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which generalizes RoPE to high-dimensional rotation matrices by leveraging their Lie group structure. |
Sophie Ostmeier; Brian Axelrod; Maya Varma; Michael Moseley; Akshay S Chaudhari; Curtis Langlotz; |
| 237 | CodeIO: Condensing Reasoning Patterns Via Code Input-Output Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While prior research predominantly focuses on enhancing narrow skills like math or code generation, improving performance on many other reasoning tasks remains challenging due to sparse and fragmented training data. To address this issue, we propose CodeI/O, a novel approach that systematically condenses diverse reasoning patterns inherently embedded in contextually-grounded codes, through transforming the original code into a code input-output prediction format. |
Junlong Li; Daya Guo; Dejian Yang; Runxin Xu; Yu Wu; Junxian He; |
| 238 | Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Alongside PhyGenBench, we propose a novel evaluation framework called PhyGenEval. We will release the data and code at https://github.com/OpenGVLab/PhyGenBench |
Fanqing Meng; Jiaqi Liao; Xinyu Tan; Quanfeng Lu; Wenqi Shao; Kaipeng Zhang; Yu Cheng; Dianqi Li; Ping Luo; |
| 239 | Great Models Think Alike and This Undermines AI Oversight Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We study how model similarity affects both aspects of AI oversight by proposing *Chance Adjusted Probabilistic Agreement (CAPA)*, a metric for LM similarity based on overlap in model mistakes. |
Shashwat Goel; Joschka Strüber; Ilze Amanda Auzina; Karuna K Chandra; Ponnurangam Kumaraguru; Douwe Kiela; Ameya Prabhu; Matthias Bethge; Jonas Geiping; |
| 240 | Improving Model Alignment Through Collective Intelligence of Open-Source Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Constructing such datasets is often expensive and hard to scale, and may face potential limitations on diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment (MoAA), that leverages the collective strengths of various language models to provide high-quality data for model alignment. |
Junlin Wang; Roy Xie; Shang Zhu; Jue WANG; Ben Athiwaratkun; Bhuwan Dhingra; Shuaiwen Leon Song; Ce Zhang; James Zou; |
| 241 | On Path to Multimodal Generalist: General-Level and General-Bench Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this project, we introduce an evaluation framework to delineate the capabilities and behaviors of current multimodal generalists. To evaluate the comprehensive abilities of various generalists, we present a massive multimodal benchmark, **General-Bench**, which encompasses a broader spectrum of skills, modalities, formats, and capabilities, including over 700 tasks and 325,800 instances. |
Hao Fei; Yuan Zhou; Juncheng Li; Xiangtai Li; Qingshan Xu; Bobo Li; Shengqiong Wu; Yaoting Wang; Junbao Zhou; Jiahao Meng; Qingyu Shi; Zhiyuan Zhou; Liangtao Shi; Minghe Gao; Daoan Zhang; Zhiqi Ge; Siliang Tang; Kaihang Pan; Yaobo Ye; Haobo Yuan; Tao Zhang; Weiming Wu; Tianjie Ju; Zixiang Meng; Shilin Xu; Liyu Jia; Wentao Hu; Meng Luo; Jiebo Luo; Tat-Seng Chua; Shuicheng YAN; Hanwang Zhang; |
| 242 | ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This lack of context-awareness can lead to suboptimal performance, as the same action may hold different meanings depending on its surrounding context. To address this issue, we propose ActionPiece to explicitly incorporate context when tokenizing action sequences. |
Yupeng Hou; Jianmo Ni; Zhankui He; Noveen Sachdeva; Wang-Cheng Kang; Ed H. Chi; Julian McAuley; Derek Zhiyuan Cheng; |
| 243 | GSM-$\infty$: How Do Your LLMs Behave Over Infinitely Increasing Reasoning Complexity and Context Length? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by the abstraction of GSM-8K problems as computational graphs, and the ability to introduce noise by adding unnecessary nodes and edges, we develop a grade-school math problem generator capable of producing arithmetic problems with infinite difficulty and context length under fine-grained control. |
Yang Zhou; Hongyi Liu; Zhuoming Chen; Yuandong Tian; Beidi Chen; |
| 244 | Scalable Equilibrium Sampling with Sequential Boltzmann Generators Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we extend the Boltzmann generator framework with two key contributions, and denote our framework Sequential Boltzmann Generators (SBG). |
Charlie B. Tan; Joey Bose; Chen Lin; Leon Klein; Michael M. Bronstein; Alexander Tong; |
| 245 | Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce *preference embedding*, an approach that embeds responses into a latent space to capture intricate preference structures efficiently, achieving linear query complexity. |
Yifan Zhang; Ge Zhang; Yue Wu; Kangping Xu; Quanquan Gu; |
| 246 | Compress Then Serve: Serving Thousands of LoRA Adapters with Little Overhead Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a method for the joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices. |
Rickard Brüel Gabrielsson; Jiacheng Zhu; Onkar Bhardwaj; Leshem Choshen; Kristjan Greenewald; Mikhail Yurochkin; Justin Solomon; |
| 247 | Contrastive Private Data Synthesis Via Weighted Multi-PLM Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods relying on pre-trained models for data synthesis often struggle in data-deficient scenarios, suffering from limited sample size, inevitable generation noise, and pre-trained model bias. To address these challenges, we propose a novel contr**A**stive private data **S**ynthesis via **W**eighted multiple **P**re-trained generative models framework, named **WASP**. |
Tianyuan Zou; Yang Liu; Peng Li; Yufei Xiong; Jianqing Zhang; Jingjing Liu; Xiaozhou Ye; Ye Ouyang; Ya-Qin Zhang; |
| 248 | OR-Bench: An Over-Refusal Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This study proposes a novel method for automatically generating large-scale over-refusal datasets. |
Justin Cui; Wei-Lin Chiang; Ion Stoica; Cho-Jui Hsieh; |
| 249 | MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning. To this end, we develop a reasoning-oriented subset to facilitate the assessment of o1-like models. |
Yuxin Zuo; Shang Qu; Yifei Li; Zhang-Ren Chen; Xuekai Zhu; Ermo Hua; Kaiyan Zhang; Ning Ding; Bowen Zhou; |
| 250 | Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We theoretically characterize gradient descent dynamics in deep linear networks trained at large width from random initialization and on large quantities of random data. |
Blake Bordelon; Cengiz Pehlevan; |
| 251 | AssistanceZero: Scalably Solving Assistance Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present the first scalable approach to solving assistance games and apply it to a new, challenging Minecraft-based assistance game with over $10^{400}$ possible goals. |
Cassidy Laidlaw; Eli Bronstein; Timothy Guo; Dylan Feng; Lukas Berglund; Justin Svegliato; Stuart Russell; Anca Dragan; |
| 252 | Elucidating The Design Space of Multimodal Protein Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. |
Cheng-Yen Hsieh; Xinyou Wang; Daiheng Zhang; Dongyu Xue; Fei Ye; Shujian Huang; Zaixiang Zheng; Quanquan Gu; |
| 253 | RStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. |
Xinyu Guan; Li Lyna Zhang; Yifei Liu; Ning Shang; Youran Sun; Yi Zhu; Fan Yang; Mao Yang; |
| 254 | Mind Your Step (by Step): Chain-of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we seek to identify the characteristics of tasks where CoT reduces performance by drawing inspiration from cognitive psychology, focusing on six representative tasks from the psychological literature where deliberation hurts performance in humans. |
Ryan Liu; Jiayi Geng; Addison J. Wu; Ilia Sucholutsky; Tania Lombrozo; Thomas L. Griffiths; |
| 255 | From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unlike sequential recommendation, which naturally fits a generative next-item prediction paradigm, it’s hard to formulate CTR models into this paradigm without explicit feature order. Therefore, we propose a novel Supervised Feature Generation framework for CTR models, shifting from the discriminative feature interaction paradigm to the generative feature generation paradigm. |
Mingjia Yin; Junwei Pan; Hao Wang; Ximei Wang; Shangyu Zhang; Jie Jiang; Defu Lian; Enhong Chen; |
| 256 | Accelerated Diffusion Models Via Speculative Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out-of-the-box to any diffusion model. |
Valentin De Bortoli; Alexandre Galashov; Arthur Gretton; Arnaud Doucet; |
| 257 | Distributional Diffusion Models with Scoring Rules Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This is expensive and has motivated the development of many acceleration methods. We propose to speed up sample generation by learning the posterior distribution of clean data samples given their noisy versions, instead of only the mean of this distribution. |
Valentin De Bortoli; Alexandre Galashov; J Swaroop Guntupalli; Guangyao Zhou; Kevin Patrick Murphy; Arthur Gretton; Arnaud Doucet; |
| 258 | Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These platforms are widely trusted as a fair and accurate measure of LLM capabilities. In this paper, we show that if bot protection and other defenses are not implemented, these voting-based benchmarks are potentially vulnerable to adversarial manipulation. |
Yangsibo Huang; Milad Nasr; Anastasios Nikolas Angelopoulos; Nicholas Carlini; Wei-Lin Chiang; Christopher A. Choquette-Choo; Daphne Ippolito; Matthew Jagielski; Katherine Lee; Ken Liu; Ion Stoica; Florian Tramèr; Chiyuan Zhang; |
| 259 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce the concept of critical tokens — elements within reasoning trajectories that significantly influence incorrect outcomes. |
Zicheng Lin; Tian Liang; Jiahao Xu; Qiuzhi Liu; Xing Wang; Ruilin Luo; Chufan Shi; Siheng Li; Yujiu Yang; Zhaopeng Tu; |
| 260 | Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we leverage the encoding-decoding framework to study how transformers form task vectors during pretraining and how their task encoding quality predicts ICL task performance. |
Seungwook Han; Jinyeop Song; Jeff Gore; Pulkit Agrawal; |
| 261 | DIS-CO: Discovering Copyrighted Content in VLMs Training Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: *How can we verify whether copyrighted content was used to train a large vision-language model (VLM) without direct access to its training data?* Motivated by the hypothesis that a VLM is able to recognize images from its training corpus, we propose DIS-CO, a novel approach to infer the inclusion of copyrighted content during the model’s development. |
André V. Duarte; Xuandong Zhao; Arlindo L. Oliveira; Lei Li; |
| 262 | $K^2$VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: As the forecast horizon extends, the inherent nonlinear dynamics have a significant adverse effect on prediction accuracy, and make generative models inefficient by increasing the cost of each iteration. To overcome these limitations, we introduce $K^2$VAE, an efficient VAE-based generative model that leverages a KoopmanNet to transform nonlinear time series into a linear dynamical system, and devises a KalmanNet to refine predictions and model uncertainty in such linear system, which reduces error accumulation in long-term forecasting. |
Xingjian Wu; Xiangfei Qiu; Hongfan Gao; Jilin Hu; Bin Yang; Chenjuan Guo; |
| 263 | General Agents Need World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. |
Jonathan Richens; Tom Everitt; David Abel; |
| 264 | Maximum Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, scaling them to handle more complex PDEs requires increasing the number of Fourier modes, which significantly expands the number of model parameters and makes hyperparameter tuning computationally impractical. To address this, we introduce $\mu$**Transfer-FNO**, a zero-shot hyperparameter transfer technique that enables optimal configurations, tuned on smaller FNOs, to be directly applied to billion-parameter FNOs _without_ additional tuning. |
Shanda Li; Shinjae Yoo; Yiming Yang; |
| 265 | Implicit Language Models Are RNNs: Balancing Parallelization and Expressivity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose implicit SSMs, which iterate a transformation until convergence to a fixed point. |
Mark Schöne; Babak Rahmani; Heiner Kremer; Fabian Falck; Hitesh Ballani; Jannes Gladrow; |
| 266 | Design Considerations in Offline Preference-based RL Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study how the different design choices made in methods such as DPO, IPO, SLiC and many variants influence the quality of the learned policy, from a theoretical perspective. |
Alekh Agarwal; Christoph Dann; Teodor Vanislavov Marinov; |
| 267 | Towards Flexible Perception with Visual Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We here explore a simple, compelling alternative by marrying the representational power of deep neural networks with the flexibility of a database. |
Robert Geirhos; Priyank Jaini; Austin Stone; Sourabh Medapati; Xi Yi; George Toderici; Abhijit Ogale; Jonathon Shlens; |
| 268 | OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose OTTER, a novel VLA architecture that leverages these existing alignments through explicit, text-aware visual feature extraction. |
Huang Huang; Fangchen Liu; Letian Fu; Tingfan Wu; Mustafa Mukadam; Jitendra Malik; Ken Goldberg; Pieter Abbeel; |
| 269 | TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce TimeBase, an ultra-lightweight network to harness the power of minimalism in LTSF. |
Qihe Huang; Zhengyang Zhou; Kuo Yang; Zhongchao Yi; Xu Wang; Yang Wang; |
| 270 | AnyEdit: Edit Any Knowledge Encoded in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These limitations arise from their reliance on editing a single token’s hidden state, a limitation we term the “efficacy barrier”. To solve this, we propose **AnyEdit**, a new autoregressive editing paradigm. |
Houcheng Jiang; Junfeng Fang; Ningyu Zhang; Mingyang Wan; Guojun Ma; Xiang Wang; Xiangnan He; Tat-Seng Chua; |
| 271 | NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, contemporary code language models (LMs) lack the ability to handle diverse types of code-edit requirements. In this work, we attempt to overcome this shortcoming through (1) a novel synthetic data generation pipeline and (2) a robust model adaptation algorithm. |
Tushar Aggarwal; Swayam Singh; Abhijeet Awasthi; Aditya Kanade; Nagarajan Natarajan; |
| 272 | Trajectory World Models for Heterogeneous Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we explore pre-training world models for heterogeneous environments by addressing key transfer barriers in both data diversity and model flexibility. |
Shaofeng Yin; Jialong Wu; Siqiao Huang; Xingjian Su; Xu He; Jianye HAO; Mingsheng Long; |
| 273 | Investigating Non-Transitivity in LLM-as-a-Judge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the validity of this assumption remains largely unexplored. In this study, we investigate the presence of non-transitivity within the AlpacaEval framework and analyze its effects on model rankings. |
Yi Xu; Laura Ruis; Tim Rocktäschel; Robert Kirk; |
| 274 | Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. |
Shaokun Zhang; Ming Yin; Jieyu Zhang; Jiale Liu; Zhiguang Han; Jingyang Zhang; Beibin Li; Chi Wang; Huazheng Wang; Yiran Chen; Qingyun Wu; |
| 275 | Sundial: A Family of Highly Capable Time Series Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce Sundial, a family of native, flexible, and scalable time series foundation models. |
Yong Liu; Guo Qin; Zhiyuan Shi; Zhi Chen; Caiyin Yang; Xiangdong Huang; Jianmin Wang; Mingsheng Long; |
| 276 | The Power of Random Features and The Limits of Distribution-Free Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study the relationship between gradient-based optimization of parametric models (e.g., neural networks) and optimization of linear combinations of random features. |
Ari Karchmer; Eran Malach; |
| 277 | Blink of An Eye: A Simple Theory for Feature Localization in Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This phenomenon is not unique to autoregressive models: in diffusion models, key features of the final output are decided in narrow “critical windows” of the generation process. In this work we develop a simple, unifying theory to explain this phenomenon. |
Marvin Li; Aayush Karan; Sitan Chen; |
| 278 | EpiCoder: Encompassing Diversity and Complexity in Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features derived from high-level abstractions of code. |
Yaoxiang Wang; Haoling Li; Xin Zhang; Jie Wu; Xiao Liu; Wenxiang Hu; Zhongxin Guo; Yangyu Huang; Ying Xin; Yujiu Yang; Jinsong Su; Qi Chen; Scarlett Li; |
| 279 | How Do Large Language Monkeys Get Their Power (Laws)? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify an apparent puzzle: a simple mathematical calculation predicts that on each problem, the failure rate should fall exponentially with the number of attempts. |
Rylan Schaeffer; Joshua Kazdan; John Hughes; Jordan Juravsky; Sara Price; Aengus Lynch; Erik Jones; Robert Kirk; Azalia Mirhoseini; Sanmi Koyejo; |
| 280 | Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While many factors are certainly responsible, this paper shines a light on a significant factor that makes predicting scaling behavior on widely used multiple-choice question answering benchmarks challenging and illuminates a path towards making such downstream evaluations predictable with scale. |
Rylan Schaeffer; Hailey Schoelkopf; Brando Miranda; Gabriel Mukobi; Varun Madan; Adam Ibrahim; Herbie Bradley; Stella Biderman; Sanmi Koyejo; |
| 281 | Watch Out Your Album! On The Inadvertent Privacy Memorization in Multi-Modal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we investigate how randomly generated task-irrelevant private content can become spuriously correlated with downstream objectives due to partial mini-batch training dynamics, thus causing inadvertent memorization. |
Tianjie Ju; Yi Hua; Hao Fei; Zhenyu Shao; Yubin Zheng; Haodong Zhao; Mong-Li Lee; Wynne Hsu; Zhuosheng Zhang; Gongshen Liu; |
| 282 | CommVQ: Commutative Vector Quantization for KV Cache Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context grows. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. |
Junyan Li; Yang Zhang; Muhammad Yusuf Hassan; Talha Chafekar; Tianle Cai; Zhile Ren; Pengsheng Guo; Foroozan Karimzadeh; Colorado Reed; Chong Wang; Chuang Gan; |
| 283 | Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose an algorithm named robust contextual dueling bandits ($\texttt{RCDB}$), which is based on uncertainty-weighted maximum likelihood estimation. |
Qiwei Di; Jiafan He; Quanquan Gu; |
| 284 | Copilot Arena: A Platform for Code LLM Evaluation in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce Copilot Arena, a platform to collect user preferences through native integration into a developer’s working environment. |
Wayne Chi; Valerie Chen; Anastasios Nikolas Angelopoulos; Wei-Lin Chiang; Aditya Mittal; Naman Jain; Tianjun Zhang; Ion Stoica; Chris Donahue; Ameet Talwalkar; |
| 285 | Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present Unisolver, a novel Transformer model trained on diverse data and conditioned on diverse PDEs, aiming towards a universal neural PDE solver capable of solving a wide scope of PDEs. |
Hang Zhou; Yuezhou Ma; Haixu Wu; Haowen Wang; Mingsheng Long; |
| 286 | Collapse or Thrive: Perils and Promises of Synthetic Data in A Self-Generating World Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Some prior work warns of “model collapse” as the web is overwhelmed by synthetic data; other work suggests the problem can be contained (i.e. collapse can be avoided) by managing how available data are used in pretraining. In this paper, we report experiments on three ways of using data (training-workflows), across three generative model task-settings (multivariate Gaussian estimation, kernel density estimation, and language-model fine-tuning), to further confirm the possibility of containment: (a) we confirm that the training-workflow of *replacing* all real data by successive generations of purely synthetic data indeed suffers model collapse in all task-settings studied; (b) we consider the training-workflow of *accumulating* synthetic data alongside real data and training on all data combined, and confirm that, although the proportion of real data eventually becomes zero, models remain stable and their test losses do not diverge under this training-workflow; (c) we consider a training-workflow where real and synthetic data accumulate together but successive generations of pretraining are constrained to use fixed-size data subsets each generation. |
Joshua Kazdan; Rylan Schaeffer; Apratim Dey; Matthias Gerstgrasser; Rafael Rafailov; David L. Donoho; Sanmi Koyejo; |
| 287 | Cradle: Empowering Foundation Agents Towards General Computer Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. |
Weihao Tan; Wentao Zhang; Xinrun Xu; Haochong Xia; Ziluo Ding; Boyu Li; Bohan Zhou; Junpeng Yue; Jiechuan Jiang; Yewen Li; Ruyi An; Molei Qin; Chuqiao Zong; Longtao Zheng; YuJie Wu; Xiaoqiang Chai; Yifei Bi; Tianbao Xie; Pengjie Gu; Xiyun Li; Ceyao Zhang; Long Tian; Chaojie Wang; Xinrun Wang; Börje F. Karlsson; Bo An; Shuicheng YAN; Zongqing Lu; |
| 288 | Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: By compressing the spatial size of images, this approach can effectively shorten the token sequence and reduce the computational cost of ViT-like plain architectures. In this work, we aim to thoroughly examine the information loss caused by this patchification-based compressive encoding paradigm and how it affects visual understanding. |
Feng Wang; Yaodong Yu; Wei Shao; Yuyin Zhou; Alan Yuille; Cihang Xie; |
| 289 | Idiosyncrasies in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we unveil and study idiosyncrasies in Large Language Models (LLMs) — unique patterns in their outputs that can be used to distinguish the models. |
Mingjie Sun; Yida Yin; Zhiqiu Xu; J Zico Kolter; Zhuang Liu; |
| 290 | The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Beyond empirical analysis, we further provide a theoretical foundation by proving that, under mild conditions, the increased energy loss reduces the upper bound of contextual relevance in LLMs, which is a critical aspect of reward hacking as the reduced contextual relevance typically indicates overfitting to reward model-favored patterns in RL. To address this issue, we propose an *Energy loss-aware PPO algorithm (EPPO)* which penalizes the increase in energy loss in the LLM’s final layer during reward calculation to prevent excessive energy loss, thereby mitigating reward hacking. |
Yuchun Miao; Sen Zhang; Liang Ding; Yuqi Zhang; Lefei Zhang; Dacheng Tao; |
| 291 | Domain2Vec: Vectorizing Datasets to Find The Optimal Data Mixture Without Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce *Domain2Vec*, a novel approach that decomposes any dataset into a linear combination of several *meta-domains*, a new concept designed to capture the key underlying features of datasets. |
Mozhi Zhang; Howe Tissue; Lu Wang; Xipeng Qiu; |
| 292 | KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Knowledge-Aware Bayesian Bandits (KABB), a novel framework that enhances multi-agent system coordination through semantic understanding and dynamic adaptation. |
Jusheng Zhang; Zimeng Huang; Yijia Fan; Ningyuan Liu; Mingyan Li; Zhuojie Yang; Jiawei Yao; Jian Wang; Keze Wang; |
| 293 | AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent offline methods like DPO and SimPO bypass reinforcement learning’s complexity, they face critical limitations: DPO relies on static reference models that degrade with policy updates, and SimPO assumes a uniform target reward margin that ignores instance-wise preference strength. We propose AlphaDPO, an adaptive preference optimization framework that dynamically reparameterizes the reference distribution to address these issues. |
Junkang Wu; Xue Wang; Zhengyi Yang; Jiancan Wu; Jinyang Gao; Bolin Ding; Xiang Wang; Xiangnan He; |
| 294 | Everything Everywhere All at Once: LLMs Can In-Context Learn Multiple Tasks in Superposition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we explore a surprising phenomenon related to ICL: LLMs can perform multiple, computationally distinct ICL tasks simultaneously, during a single inference call, a capability we term task superposition. |
Zheyang Xiong; Ziyang Cai; John Cooper; Albert Ge; Vasilis Papageorgiou; Zack Sifakis; Angeliki Giannou; Ziqian Lin; Liu Yang; Saurabh Agarwal; Grigorios Chrysos; Samet Oymak; Kangwook Lee; Dimitris Papailiopoulos; |
| 295 | FlipAttack: Jailbreak LLMs Via Flipping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper proposes a simple yet effective jailbreak attack named FlipAttack against black-box LLMs. |
Yue Liu; Xiaoxin He; Miao Xiong; Jinlan Fu; Shumin Deng; YINGWEI MA; Jiaheng Zhang; Bryan Hooi; |
| 296 | ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ConfPO, a method for preference learning in Large Language Models (LLMs) that identifies and optimizes preference-critical tokens based solely on the training policy’s confidence, without requiring any auxiliary models or compute. |
Hee Suk Yoon; Eunseop Yoon; Mark A. Hasegawa-Johnson; Sungwoong Kim; Chang D. Yoo; |
| 297 | GenMol: A Drug Discovery Generalist with Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present *Generalist Molecular generative model* (GenMol), a versatile framework that uses only a *single* discrete diffusion model to handle diverse drug discovery scenarios. |
Seul Lee; Karsten Kreis; Srimukh Prasad Veccham; Meng Liu; Danny Reidenbach; Yuxing Peng; Saee Gopal Paliwal; Weili Nie; Arash Vahdat; |
| 298 | AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper proposes *AutoML-Agent*, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. |
Patara Trirat; Wonyong Jeong; Sung Ju Hwang; |
| 299 | PertEval-scFM: Benchmarking Single-Cell Foundation Models for Perturbation Effect Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present PertEval-scFM, a standardized framework designed to evaluate models for perturbation effect prediction. |
Aaron Wenteler; Martina Occhetta; Nikhil Branson; Victor Curean; Magdalena Huebner; William Dee; William Connell; Siu Pui Chung; Alex Hawkins-Hooker; Yasha Ektefaie; César Miguel Valdez Córdova; Amaya Gallagher-Syed; |
| 300 | SToFM: A Multi-scale Foundation Model for Spatial Transcriptomics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This process requires integrating macro-scale tissue morphology, micro-scale cellular microenvironment, and gene-scale gene expression profile. To address this challenge, we propose **SToFM**, a multi-scale **S**patial **T**ranscript**o**mics **F**oundation **M**odel. |
Suyuan Zhao; YIZHEN LUO; Ganbo Yang; Yan Zhong; Hao Zhou; Zaiqing Nie; |
| 301 | CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we design a controllable image-text synthesis pipeline, CtrlSynth, for data-efficient and robust multimodal learning. |
Qingqing Cao; Mahyar Najibi; Sachin Mehta; |
| 302 | KV Shifting Attention Enhances Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To more effectively harness the model’s induction capabilities, we revisit the induction heads mechanism and provide theoretical proof that KV shifting attention reduces the model’s dependency on the depth and width of the induction heads mechanism. |
Mingyu Xu; Bingning Wang; Weipeng Chen; |
| 303 | Independence Tests for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In the constrained setting, we make assumptions about model architecture and training and propose statistical tests that yield exact p-values with respect to the null hypothesis that the models are trained from independent random initializations. |
Sally Zhu; Ahmed M Ahmed; Rohith Kuditipudi; Percy Liang; |
| 304 | Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with The Onsager-Machlup Functional Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of generative models, namely denoising diffusion and flow matching. |
Sanjeev Raja; Martin Sipka; Michael Psenka; Tobias Kreiman; Michal Pavelka; Aditi S. Krishnapriyan; |
| 305 | Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance—the former indicates that those insufficiently optimized data should be emphasized, while the latter stresses some critical data that are most influential for loss minimization. |
Puning Yang; Qizhou Wang; Zhuo Huang; Tongliang Liu; Chengqi Zhang; Bo Han; |
| 306 | Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To more comprehensively explore the space of heuristics, this paper proposes to use Monte Carlo Tree Search (MCTS) for LLM-based heuristic evolution. |
Zhi Zheng; Zhuoliang Xie; Zhenkun Wang; Bryan Hooi; |
| 307 | General Framework for Online-to-nonconvex Conversion: Schedule-free SGD Is Also Effective for Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable empirical success in training neural networks. |
Kwangjun Ahn; Gagik Magakyan; Ashok Cutkosky; |
| 308 | The Jailbreak Tax: How Useful Are Your Jailbreak Outputs? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we ask whether the model outputs produced by existing jailbreaks are actually *useful*. Overall, our work proposes jailbreak utility as an important new metric in AI safety, and introduces benchmarks to evaluate existing and future jailbreaks. |
Kristina Nikolić; Luze Sun; Jie Zhang; Florian Tramèr; |
| 309 | Self-Consistency Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we extend the self-consistency concept to help train models. |
Archiki Prasad; Weizhe Yuan; Richard Yuanzhe Pang; Jing Xu; Maryam Fazel-Zarandi; Mohit Bansal; Sainbayar Sukhbaatar; Jason E Weston; Jane Yu; |
| 310 | ReVISE: Learning to Refine at Test-Time Via Intrinsic Self-Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Refine via Intrinsic Self-Verification (ReVISE), an efficient and effective framework that enables LLMs to self-correct their outputs through self-verification. |
Hyunseok Lee; Seunghyuk Oh; Jaehyung Kim; Jinwoo Shin; Jihoon Tack; |
| 311 | Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We prove that prediction sets are optimal for decision makers who wish to optimize their value at risk. |
Shayan Kiyani; George J. Pappas; Aaron Roth; Hamed Hassani; |
| 312 | TUMTraf VideoQA: Dataset and Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present TUMTraf VideoQA, a novel dataset and benchmark designed for spatio-temporal video understanding in complex roadside traffic scenarios. |
Xingcheng Zhou; Konstantinos Larintzakis; Hao Guo; Walter Zimmer; Mingyu Liu; Hu Cao; Jiajie Zhang; Venkatnarayanan Lakshminarasimhan; Leah Strand; Alois Knoll; |
| 313 | Vision-Language Models Create Cross-Modal Task Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that VLMs align conceptually equivalent inputs into a shared task vector, which is invariant to modality (text, image) and format (examples, instruction), and may simplify VLM processing. |
Grace Luo; Trevor Darrell; Amir Bar; |
| 314 | FG-CLIP: Fine-Grained Visual and Textual Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To address this, we propose Fine-Grained CLIP (FG-CLIP), which enhances fine-grained understanding through three key innovations. We construct a comprehensive dataset, termed FineHARD, by integrating high-quality region-specific annotations with challenging fine-grained negative samples. |
Chunyu Xie; Bin Wang; Fanjing Kong; Jincheng Li; Dawei Liang; Gengshen Zhang; Dawei Leng; Yuhui Yin; |
| 315 | RealRAG: Retrieval-augmented Realistic Image Generation Via Self-reflective Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we present **the first** real-object-based retrieval-augmented generation framework (**RealRAG**), which augments fine-grained and unseen novel object generation by learning and retrieving real-world images to overcome the knowledge gaps of generative models. |
Yuanhuiyi Lyu; Xu Zheng; Lutao Jiang; Yibo Yan; Xin Zou; Huiyu Zhou; Linfeng Zhang; Xuming Hu; |
| 316 | Risk and Cross Validation in Ridge Regression with Correlated Samples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. |
Alexander Atanasov; Jacob A Zavatone-Veth; Cengiz Pehlevan; |
| 317 | Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Orthus, a unified multimodal model that excels in generating interleaved images and text from mixed-modality inputs by simultaneously handling discrete text tokens and continuous image features under the **AR** modeling principle. |
Siqi Kou; Jiachun Jin; Zhihong Liu; Chang Liu; Ye Ma; Jian Jia; Quan Chen; Peng Jiang; Zhijie Deng; |
| 318 | Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
| 319 | Principled Algorithms for Optimizing Generalized Metrics in Binary Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce principled algorithms for optimizing generalized metrics, supported by $H$-consistency and finite-sample generalization bounds. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
| 320 | Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Universal Sparse Autoencoders (USAEs), a framework for uncovering and aligning interpretable concepts spanning multiple pretrained deep neural networks. |
Harrish Thasarathan; Julian Forsyth; Thomas Fel; Matthew Kowal; Konstantinos G. Derpanis; |
| 321 | MedRAX: Medical Reasoning Agent for Chest X-ray Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. |
Adibvafa Fallahpour; Jun Ma; Alif Munim; Hongwei Lyu; Bo Wang; |
| 322 | DyPolySeg: Taylor Series-Inspired Dynamic Polynomial Fitting Network for Few-shot Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Additionally, existing methods using DGCNN as the backbone have limited geometric structure modeling capabilities and struggle to bridge the categorical information gap between query and support sets. To address these challenges, we propose DyPolySeg, a pre-training-free Dynamic Polynomial fitting network for few-shot point cloud semantic segmentation. |
Changshuo Wang; Xiang Fang; Prayag Tiwari; |
| 323 | (How) Do Language Models Track State? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). |
Belinda Z. Li; Zifan Carl Guo; Jacob Andreas; |
| 324 | Adversaries Can Misuse Combinations of Safe Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Developers try to evaluate whether an AI system can accomplish malicious tasks before releasing it; for example, they might test whether a model enables cyberoffense, user manipulation, or bioterrorism. In this work, we show that individually testing models for such misuse is inadequate; adversaries can misuse combinations of models even when each individual model is safe. |
Erik Jones; Anca Dragan; Jacob Steinhardt; |
| 325 | CFP-Gen: Combinatorial Functional Protein Generation Via Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we introduce CFP-GEN, a novel diffusion language model for Combinatorial Functional Protein GENeration. |
Junbo Yin; Chao Zha; Wenjia He; Chencheng Xu; Xin Gao; |
| 326 | CLOVER: Cross-Layer Orthogonal Vectors Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Decoder-only models generate tokens autoregressively by caching key/value vectors, but as the cache grows, inference becomes memory-bounded. To address this challenge, we introduce CLOVER (Cross-Layer Orthogonal Vectors) pruning, a novel approach that treats pairs of components of the attention mechanism as low-rank decompositions. |
Fanxu Meng; Pingzhi Tang; Fan Jiang; Muhan Zhang; |
| 327 | Exploring Representations and Interventions in Time Series Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we investigate the structure and redundancy of representations across various TSFMs, examining the self-similarity of model layers within and across different model sizes. |
Michał Wiliński; Mononito Goswami; Willa Potosnak; Nina Żukowska; Artur Dubrawski; |
| 328 | Scaling Laws for Differentially Private Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility trade-offs and the optimal training configurations in many settings. |
Ryan McKenna; Yangsibo Huang; Amer Sinha; Borja Balle; Zachary Charles; Christopher A. Choquette-Choo; Badih Ghazi; Georgios Kaissis; Ravi Kumar; Ruibo Liu; Da Yu; Chiyuan Zhang; |
| 329 | Reliable and Efficient Amortized Model-based Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unfortunately, question difficulty is expensive to estimate. Facing this challenge, we train a model that predicts question difficulty from its content, enabling a reliable measurement at a fraction of the cost. |
Sang T. Truong; Yuheng Tu; Percy Liang; Bo Li; Sanmi Koyejo; |
| 330 | Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, this periodicity is undermined by the spectrum damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components brought by time-domain truncation. Building on our observations, we propose ***Fourier Position Embedding (FoPE)***, which enhances attention’s frequency-domain properties to improve both its periodic extension and length generalization. |
Ermo Hua; Che Jiang; Xingtai Lv; Kaiyan Zhang; Youbang Sun; Yuchen Fan; Xuekai Zhu; Biqing Qi; Ning Ding; Bowen Zhou; |
| 331 | Efficient Federated Incomplete Multi-View Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Federated multi-view clustering (FMVC) has emerged as a potential solution, but existing approaches suffer from substantial limitations, including excessive communication overhead, insufficient privacy protection, and inadequate handling of missing views. To address these issues, we propose Efficient Federated Incomplete Multi-View Clustering (EFIMVC), a novel framework that introduces a localized optimization strategy to significantly reduce communication costs while ensuring theoretical convergence. |
Suyuan Liu; Hao Yu; Hao Tan; Ke Liang; Siwei Wang; Shengju Yu; En Zhu; Xinwang Liu; |
| 332 | Geometry Informed Tokenization of Molecules for Language Model Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although tokenization of molecular graphs exists, that for 3D geometries is largely unexplored. Here, we attempt to bridge this gap by proposing a novel method which converts molecular geometries into SE(3)-invariant 1D discrete sequences. |
Xiner Li; Limei Wang; Youzhi Luo; Carl Edwards; Shurui Gui; Yuchao Lin; Heng Ji; Shuiwang Ji; |
| 333 | CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Code and data used in the paper are available at https://anonymous.4open.science/r/CASEBench-D5DB. |
Guangzhi Sun; Xiao Zhan; Shutong Feng; Phil Woodland; Jose Such; |
| 334 | QLASS: Boosting Language Agent Inference Via Q-Guided Stepwise Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This may lead to sub-optimal policies and hinder the overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search), to automatically generate annotations by estimating Q-values in a stepwise manner for open language agents. |
Zongyu Lin; Yao Tang; Xingcheng Yao; Da Yin; Ziniu Hu; Yizhou Sun; Kai-Wei Chang; |
| 335 | Mastering Board Games By External and Internal Planning with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Advancing planning and reasoning capabilities of Large Language Models (LLMs) is one of the key prerequisites towards unlocking their potential for performing reliably in complex and impactful domains. In this paper, we aim to demonstrate this across board games (Chess, Fischer Random / Chess960, Connect Four, and Hex), and we show that search-based planning can yield significant improvements in LLM game-playing strength. |
John Schultz; Jakub Adamek; Matej Jusup; Marc Lanctot; Michael Kaisers; Sarah Perrin; Daniel Hennes; Jeremy Shar; Cannada A. Lewis; Anian Ruoss; Tom Zahavy; Petar Veličković; Laurel Prince; Satinder Singh; Eric Malmi; Nenad Tomasev; |
| 336 | MATH-Perturb: Benchmarking LLMs’ Math Reasoning Abilities Against Hard Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered the discussion of whether the performance is achieved by true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks in which questions undergo simple perturbations — modifications that still preserve the underlying reasoning patterns of the solutions. |
Kaixuan Huang; Jiacheng Guo; Zihao Li; Xiang Ji; Jiawei Ge; Wenzhe Li; Yingqing Guo; Tianle Cai; Hui Yuan; Runzhe Wang; Yue Wu; Ming Yin; Shange Tang; Yangsibo Huang; Chi Jin; Xinyun Chen; Chiyuan Zhang; Mengdi Wang; |
| 337 | BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Additionally, real-world applications demand streaming inference. To address these challenges, we propose a flow matching based streaming binaural speech synthesis framework called BinauralFlow. |
Susan Liang; Dejan Markovic; Israel D. Gebru; Steven Krenn; Todd Keebler; Jacob Sandakly; Frank Yu; Samuel Hassel; Chenliang Xu; Alexander Richard; |
| 338 | Training Software Engineering Agents and Verifiers with SWE-Gym Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. |
Jiayi Pan; Xingyao Wang; Graham Neubig; Navdeep Jaitly; Heng Ji; Alane Suhr; Yizhe Zhang; |
| 339 | A Hitchhiker’s Guide to Scaling Law Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We collect (and release) a large-scale dataset containing losses and downstream evaluations for 485 previously published pretrained models. We use these to estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families. |
Leshem Choshen; Yang Zhang; Jacob Andreas; |
| 340 | Understanding The Forgetting of (Replay-based) Continual Learning Via Feature Learning: Angle Matters Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work aims to build a unified theoretical framework for understanding CL using feature learning theory. |
Hongyi Wan; Shiyuan Ren; Wei Huang; Miao Zhang; Xiang Deng; Yixin Bao; Liqiang Nie; |
| 341 | FairPFN: A Tabular Foundation Model for Causal Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current causal fairness frameworks hold a key limitation in that they assume prior knowledge of the correct causal model, restricting their applicability in complex fairness scenarios where causal models are unknown or difficult to identify. To bridge this gap, we propose FairPFN, a tabular foundation model pre-trained on synthetic causal fairness data to identify and mitigate the causal effects of protected attributes in its predictions. |
Jake Robertson; Noah Hollmann; Samuel Müller; Noor Awad; Frank Hutter; |
| 342 | Wyckoff Transformer: Generation of Symmetric Crystals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, this is often inadequately addressed by existing generative models, making the consistent generation of stable and symmetrically valid crystal structures a significant challenge. We introduce WyFormer, a generative model that directly tackles this by formally conditioning on space group symmetry. |
Nikita Kazeev; Wei Nong; Ignat Romanov; Ruiming Zhu; Andrey E Ustyuzhanin; Shuya Yamazaki; Kedar Hippalgaonkar; |
| 343 | Parrot: Multilingual Visual Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To address this, we propose Parrot, a novel approach that leverages textual guidance for visual token alignment at the language level. Additionally, we introduce the Massive Multilingual Multimodal Benchmark (MMMB), a new benchmark comprising 6 languages, 15 categories, and 12,000 questions, to assess multilingual capabilities. |
Hai-Long Sun; Da-Wei Zhou; Yang Li; Shiyin Lu; Chao Yi; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; De-Chuan Zhan; Han-Jia Ye; |
| 344 | From Language Models Over Tokens to Language Models Over Characters Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents algorithms for converting token-level language models to character-level ones. |
Tim Vieira; Benjamin LeBrun; Mario Giulianelli; Juan Luis Gastaldi; Brian DuSell; John Terilla; Timothy J. O’Donnell; Ryan Cotterell; |
| 345 | Language Models Over Canonical Byte-Pair Encodings Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose methods to enforce canonicality in token-level language models, ensuring that only canonical token strings are assigned positive probability. |
Tim Vieira; Tianyu Liu; Clemente Pasti; Yahya Emara; Brian DuSell; Benjamin LeBrun; Mario Giulianelli; Juan Luis Gastaldi; Timothy J. O’Donnell; Ryan Cotterell; |
| 346 | M+: Extending MemoryLLM with Scalable Long-Term Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: While effective for sequence lengths up to 16k tokens, it struggles to retain knowledge beyond 20k tokens. In this work, we address this limitation by introducing M+, a memory-augmented model based on MemoryLLM that significantly enhances long-term information retention. |
Yu Wang; Dmitry Krotov; Yuanzhe Hu; Yifan Gao; Wangchunshu Zhou; Julian McAuley; Dan Gutfreund; Rogerio Feris; Zexue He; |
| 347 | Evaluating Judges As Evaluators: The JETTS Benchmark of LLM-as-Judges As Test-Time Scaling Evaluators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce the Judge Evaluation for Test-Time Scaling (JETTS) benchmark, which evaluates judge performance in three domains (math reasoning, code generation, and instruction following) under three task settings: response reranking, step-level beam search, and critique-based response refinement. |
Yilun Zhou; Austin Xu; PeiFeng Wang; Caiming Xiong; Shafiq Joty; |
| 348 | Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce ERank, Top-k Ranking Correlation, and Energy to systematize analysis. We also introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression impacts LLMs’ agentic abilities. |
Peijie Dong; Zhenheng Tang; Xiang Liu; Lujun Li; Xiaowen Chu; Bo Li; |
| 349 | The Logical Implication Steering Method for Conditional Interventions on Transformer Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Studies also show that model generation behavior can be steered toward a given concept by adding the concept’s vector to the corresponding activations. We show how to leverage these properties to build a form of logical implication into models, enabling transparent and interpretable adjustments that induce a chosen generation behavior in response to the presence of any given concept. |
Damjan Kalajdzievski; |
| 350 | DreamDPO: Aligning Text-to-3D Generation with Human Preferences Via Direct Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods often struggle to align generated content with human preferences, limiting their applicability and flexibility. To address these limitations, in this paper, we propose DreamDPO, an optimization-based framework that integrates human preferences into the 3D generation process, through direct preference optimization. |
Zhenglin Zhou; Xiaobo Xia; Fan Ma; Hehe Fan; Yi Yang; Tat-Seng Chua; |
| 351 | LongRoPE2: Near-Lossless LLM Context Window Scaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Abstract: LongRoPE2 is a novel approach that extends the effective context window of pre-trained large language models (LLMs) to the target length, while preserving the performance on the … |
Ning Shang; Li Lyna Zhang; Siyuan Wang; Gaokai Zhang; Gilsinia Lopez; Fan Yang; Weizhu Chen; Mao Yang; |
| 352 | Score As Action: Fine Tuning Diffusion Generative Models By Continuous-time Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The objective of this study is to develop a disciplined approach to fine-tuning diffusion models using *continuous-time* RL, formulated as a stochastic control problem with a reward function that aligns the end result (terminal state) with the input prompt. |
Hanyang Zhao; Haoxian Chen; Ji Zhang; David Yao; Wenpin Tang; |
| 353 | GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unlike prevailing data-driven methods that primarily rely on supervised learning, in this paper, we introduce a novel **G**raph-**o**riented **I**nverse **R**einforcement **L**earning (GoIRL) framework, which is an IRL-based predictor equipped with vectorized context representations. |
Muleilan Pei; Shaoshuai Shi; Lu Zhang; Peiliang Li; Shaojie Shen; |
| 354 | Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we introduce graph-constrained reasoning (GCR), a novel framework that bridges structured knowledge in KGs with unstructured reasoning in LLMs. |
Linhao Luo; Zicheng Zhao; Gholamreza Haffari; Yuan-Fang Li; Chen Gong; Shirui Pan; |
| 355 | Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The code and trained models are made publicly available through the NVIDIA NeMo Framework. |
Taejin Park; Ivan Medennikov; Kunal Dhawan; Weiqing Wang; He Huang; Nithin Rao Koluguri; Krishna C Puvvada; Jagadeesh Balam; Boris Ginsburg; |
| 356 | Improving Your Model Ranking on Chatbot Arena By Vote Rigging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, this strategy is practically inefficient because there are over $190$ models on Chatbot Arena and on average only about 1% of new battles will involve $m_{t}$. To overcome this, we propose an **omnipresent rigging** strategy, exploiting the fact that, under the Elo rating mechanism of Chatbot Arena, any new vote on a battle can influence the ranking of the target model $m_{t}$, even if $m_{t}$ is not directly involved in the battle. |
Rui Min; Tianyu Pang; Chao Du; Qian Liu; Minhao Cheng; Min Lin; |
| 357 | Generative Audio Language Modeling with Continuous-valued Tokens and Masked Next-Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We leverage token-wise diffusion to model the continuous distribution of the next continuous-valued token. |
Shu-wen Yang; Byeonggeun Kim; Kuan-Po Huang; Qingming Tang; Huy Phan; Bo-Ru Lu; Harshavardhan Sundar; Shalini Ghosh; Hung-yi Lee; Chieh-Chi Kao; Chao Wang; |
| 358 | Causal-PIK: Causality-based Physical Reasoning with A Physics-Informed Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These tasks require agents to iteratively improve their actions after actively exploring causes and effects in the environment. For these types of tasks, we propose Causal-PIK, a method that leverages Bayesian optimization to reason about causal interactions via a Physics-Informed Kernel to help guide efficient search for the best next action. |
Carlota Parés Morlans; Michelle Yi; Claire Chen; Sarah A Wu; Rika Antonova; Tobias Gerstenberg; Jeannette Bohg; |
| 359 | Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent advances have leveraged reinforcement learning algorithms to tackle this problem, much of the progress has been empirical, with limited theoretical understanding. To bridge this gap, we propose a stochastic control framework for fine-tuning diffusion models. |
Yinbin Han; Meisam Razaviyayn; Renyuan Xu; |
| 360 | Putnam-AXIOM: A Functional & Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Putnam-AXIOM, a benchmark of 522 university-level competition problems drawn from the prestigious William Lowell Putnam Mathematical Competition, and Putnam-AXIOM Variation, an unseen companion set of 100 functional variants generated by programmatically perturbing variables and constants. |
Aryan Gulati; Brando Miranda; Eric Chen; Emily Xia; Kai Fronsdal; Bruno de Moraes Dumont; Sanmi Koyejo; |
| 361 | Controlled Generation with Equivariant Variational Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We evaluate our approach on both uncontrolled and controlled molecular generation, achieving state-of-the-art performance on uncontrolled generation and outperforming state-of-the-art models in controlled generation, both with end-to-end training and in the Bayesian inference setting. |
Floor Eijkelboom; Heiko Zimmermann; Sharvaree Vadgama; Erik J Bekkers; Max Welling; Christian A. Naesseth; Jan-Willem van de Meent; |
| 362 | Scaling Laws for Pre-training Agents and World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This has been demonstrated in domains from robotics to video games, when generative learning objectives on offline datasets (pre-training) are used to model an agent’s behavior (imitation learning) or their environment (world modeling). This paper characterizes the role of scale in these tasks more precisely. |
Tim Pearce; Tabish Rashid; David Bignell; Raluca Georgescu; Sam Devlin; Katja Hofmann; |
| 363 | Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Nevertheless, discovering an effective pruning strategy is non-trivial, as existing attribution methods and prompt compression algorithms fail to deliver robust results, let alone human intuition. To this end, we propose PromptQuine, a self-discovering prompt optimization framework that uses evolutionary search to automatically find effective pruning strategies in low-data regimes. |
Jianyu Wang; Zhiqiang Hu; Lidong Bing; |
| 364 | $\mathcal{V}ista\mathcal{DPO}$: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Large Video Models (LVMs) built upon Large Language Models (LLMs) have shown promise in video understanding but often suffer from misalignment with human intuition and video hallucination issues. To address these challenges, we introduce **VistaDPO**, a novel framework for Video Hierarchical Spatial-Temporal Direct Preference Optimization. |
Haojian Huang; Haodong Chen; Shengqiong Wu; Meng Luo; Jinlan Fu; Xinya Du; Hanwang Zhang; Hao Fei; |
| 365 | Private Federated Learning Using Preference-Optimized Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Our key insight is that the private client feedback collected by prior DP synthetic data methods (Hou et al., 2024; Xie et al., 2024) can be viewed as a preference ranking. |
Charlie Hou; Mei-Yu Wang; Yige Zhu; Daniel Lazar; Giulia Fanti; |
| 366 | Lexico: Extreme KV Cache Compression Via Sparse Coding Over Universal Dictionaries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce Lexico, a novel KV cache compression method that leverages sparse coding with a universal dictionary. |
Junhyuck Kim; Jongho Park; Jaewoong Cho; Dimitris Papailiopoulos; |
| 367 | EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we introduce EraseAnything, the first method specifically developed to address concept erasure within the latest flow-based T2I framework. |
Daiheng Gao; Shilin Lu; Wenbo Zhou; Jiaming Chu; Jie Zhang; Mengxi Jia; Bang Zhang; Zhaoxin Fan; Weiming Zhang; |
| 368 | Generalists Vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In response, we propose LLOME (Language Model Optimization with Margin Expectation), a bilevel optimization routine for online black-box optimization. |
Angelica Chen; Samuel Don Stanton; Frances Ding; Robert G Alberstein; Andrew Martin Watkins; Richard Bonneau; Vladimir Gligorijevic; Kyunghyun Cho; Nathan C. Frey; |
| 369 | DistiLLM-2: A Contrastive Approach Boosts The Distillation of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: These strategies overlook the synergy between loss formulations and data types, leading to a suboptimal performance boost in student models. To address this, we propose DistiLLM-2, a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses by harnessing this synergy. |
Jongwoo Ko; Tianyi Chen; Sungnyun Kim; Tianyu Ding; Luming Liang; Ilya Zharkov; Se-Young Yun; |
| 370 | Compositional Condition Question Answering in Tabular Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we introduce a new Compositional Condition Tabular Understanding method, called CoCoTab. |
Jun-Peng Jiang; Tao Zhou; De-Chuan Zhan; Han-Jia Ye; |
| 371 | Eigenspectrum Analysis of Neural Networks Without Aspect Ratio Bias Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: It provides insight into how well a model is trained and can guide decisions on assigning better layer-wise training hyperparameters. In this paper, we address a challenge associated with such eigenspectrum methods: the impact of the aspect ratio of weight matrices on estimated heavy-tailedness metrics. |
Yuanzhe Hu; Kinshuk Goel; Vlad Killiakov; Yaoqing Yang; |
| 372 | Gaussian Mixture Flow Matching Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. |
Hansheng Chen; Kai Zhang; Hao Tan; Zexiang Xu; Fujun Luan; Leonidas Guibas; Gordon Wetzstein; Sai Bi; |
| 373 | Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we explore the untapped potential of GNNs through an enhanced framework, GNN+, which integrates six widely used techniques: edge feature integration, normalization, dropout, residual connections, feed-forward networks, and positional encoding, to effectively tackle graph-level tasks. |
Yuankai Luo; Lei Shi; Xiao-Ming Wu; |
| 374 | Unified Breakdown Analysis for Byzantine Robust Gossip Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce F-RG, a general framework for building robust decentralized algorithms with guarantees arising from robust-sum-like aggregation rules F. |
Renaud Gaucher; Aymeric Dieuleveut; Hadrien Hendrikx; |
| 375 | RLTHF: Targeted Human Feedback for LLM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the generalizability limitations of AI Feedback. To address these challenges, we propose RLTHF, a human-AI hybrid framework that combines LLM-based initial alignment with selective human annotations to achieve full-human annotation alignment with minimal effort. |
Yifei Xu; Tusher Chakraborty; Emre Kiciman; Bibek Aryal; Srinagesh Sharma; Songwu Lu; Ranveer Chandra; |
| 376 | Do NOT Think That Much for 2+3=? On The Overthinking of Long Reasoning Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Using a self-training paradigm, we propose strategies to mitigate overthinking, simplifying reasoning processes without compromising accuracy. |
Xingyu Chen; Jiahao Xu; Tian Liang; Zhiwei He; Jianhui Pang; Dian Yu; Linfeng Song; Qiuzhi Liu; Mengfei Zhou; Zhuosheng Zhang; Rui Wang; Zhaopeng Tu; Haitao Mi; Dong Yu; |
| 377 | MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present MENTOR, a method that improves both the *architecture* and *optimization* of RL agents. |
Suning Huang; Zheyu Aqa Zhang; Tianhai Liang; Yihan Xu; Zhehao Kou; Chenhao Lu; Guowei Xu; Zhengrong Xue; Huazhe Xu; |
| 378 | When Dynamic Data Selection Meets Data Augmentation: Achieving Enhanced Training Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To tackle the challenge, we propose a novel online data training framework that, for the first time, unifies dynamic data selection and augmentation, achieving both training efficiency and enhanced performance. |
Suorong Yang; Peng Ye; Furao Shen; Dongzhan Zhou; |
| 379 | The Canary’s Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we assume an adversary has access to some synthetic data generated by an LLM. |
Matthieu Meeus; Lukas Wutschitz; Santiago Zanella-Beguelin; Shruti Tople; Reza Shokri; |
| 380 | Adaptive Localization of Knowledge Negation for Continual LLM Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building on the task vector framework, we propose a new method named ALKN (Adaptive Localization of Knowledge Negation), which uses dynamic masking to sparsify training gradients and adaptively adjusts unlearning intensity based on inter-task relationships. |
Abudukelimu Wuerkaixi; Qizhou Wang; Sen Cui; Wutong Xu; Bo Han; Gang Niu; Masashi Sugiyama; Changshui Zhang; |
| 381 | Interpreting The Repeated Token Phenomenon in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This unexplained failure mode represents a *vulnerability*, allowing even end users to diverge models away from their intended behavior. We aim to explain the causes for this phenomenon and link it to the concept of attention sinks, an emergent LLM behavior crucial for fluency, in which the initial token receives disproportionately high attention scores. |
Itay Yona; Ilia Shumailov; Jamie Hayes; Yossi Gandelsman; |
| 382 | Mastering Massive Multi-Task Reinforcement Learning Via Mixture-of-Expert Decision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we first revisit the key impact of the number of tasks on current MTRL methods, and further reveal that naively expanding the parameters proves insufficient to counteract the performance degradation as the number of tasks escalates. Building upon these insights, we propose M3DT, a novel mixture-of-experts (MoE) framework that tackles task scalability by further unlocking the model’s parameter scalability. |
Yilun Kong; Guozheng Ma; Qi Zhao; Haoyu Wang; Li Shen; Xueqian Wang; Dacheng Tao; |
| 383 | Control and Realism: Best of Both Worlds in Layout-to-Image Without Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing works have demonstrated that pre-trained Text-to-Image diffusion models can achieve this goal without training on any specific data; however, they often face challenges with imprecise localization and unrealistic artifacts. Focusing on these drawbacks, we propose a novel training-free method, WinWinLay. |
Bonan Li; Yinhan Hu; Songhua Liu; Xinchao Wang; |
| 384 | PARM: Multi-Objective Test-Time Alignment Via Preference-Aware Autoregressive Reward Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recently, GenARM (Xu et al., 2025) first independently trains Autoregressive Reward Models (ARMs) for each preference dimension without awareness of each other, then combines their outputs based on user-specific preference vectors during inference to achieve multi-objective test-time alignment, leading to two key limitations: the need for *multiple* ARMs increases the inference cost, and the *separate* training of ARMs causes the misalignment between the guided generation and the user preferences. To address these issues, we propose Preference-aware ARM (PARM), a *single* unified ARM trained across *all* preference dimensions. |
Baijiong Lin; Weisen Jiang; Yuancheng Xu; Hao Chen; Ying-Cong Chen; |
| 385 | Nonparametric Modern Hopfield Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present a nonparametric interpretation for deep learning compatible modern Hopfield models and utilize this new perspective to debut efficient variants. |
Jerry Yao-Chieh Hu; Bo-Yu Chen; Dennis Wu; Feng Ruan; Han Liu; |
| 386 | Balancing The Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. |
Corinna Cortes; Anqi Mao; Mehryar Mohri; Yutao Zhong; |
| 387 | VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Our key insight is that a visual masked autoencoder, pre-trained on the ImageNet dataset, can naturally be a numeric series forecaster. |
Mouxiang Chen; Lefei Shen; Zhuo Li; Xiaoyun Joy Wang; Jianling Sun; Chenghao Liu; |
| 388 | GMAIL: Generative Modality Alignment for Generated Image Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel framework for discriminative use of generated images, coined *GMAIL*, that explicitly treats generated images as a separate modality from real images. |
Shentong Mo; Sukmin Yun; |
| 389 | DyCodeEval: Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a dynamic data generation method and conduct extensive empirical studies on two seed datasets involving 18 Code LLMs. |
Simin Chen; Pranav Pusarla; Baishakhi Ray; |
| 390 | Invariant Deep Uplift Modeling for Incentive Assignment in Online Marketing Via Probability of Necessity and Sufficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, an effective uplift modeling method for out-of-distribution data is crucial. To address this, we propose a novel uplift modeling method, **I**nvariant **D**eep **U**plift **M**odeling (**IDUM**), which uses invariant learning to enhance out-of-distribution generalization by identifying causal factors that remain consistent across domains. |
Zexu Sun; Qiyu Han; Hao Yang; Anpeng Wu; Minqin Zhu; Dugang Liu; Chen Ma; Yunpeng Weng; Xing Tang; Xiuqiang He; |
| 391 | Deep Reinforcement Learning from Hierarchical Preference Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Specifically, we propose HERON, a hierarchical reward design framework for two scenarios: (I) the feedback signals naturally present a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning. |
Alexander Bukharin; Yixiao Li; Pengcheng He; Tuo Zhao; |
| 392 | 3D-LMVIC: Learning-based Multi-View Image Compression with 3D Gaussian Geometric Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While effective for small disparities, such as those in stereo images, these methods struggle with the more complex disparities encountered in wide-baseline multi-camera systems, commonly found in virtual reality and autonomous driving applications. To address this limitation, we propose 3D-LMVIC, a novel learning-based multi-view image compression framework that leverages 3D Gaussian Splatting to derive geometric priors for accurate disparity estimation. |
Yujun Huang; Bin Chen; Niu Lian; Xin Wang; Baoyi An; Tao Dai; Shu-Tao Xia; |
| 393 | Optimizing Language Models for Inference Time Objectives Using Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we investigate the merits of explicitly optimizing for inference time algorithmic performance during model training. |
Yunhao Tang; Kunhao Zheng; Gabriel Synnaeve; Remi Munos; |
| 394 | DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: One formulation of the structure elucidation task is the conditional *de novo* generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. |
Montgomery Bohde; Mrunali Manjrekar; Runzhong Wang; Shuiwang Ji; Connor W. Coley; |
| 395 | All-atom Diffusion Transformers: Unified Generative Modelling of Molecules and Materials Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems using the same model: (1) An autoencoder maps a unified, all-atom representation of molecules and materials to a shared latent embedding space; and (2) A diffusion model is trained to generate new latent embeddings that the autoencoder can decode to sample new molecules or materials. |
Chaitanya K. Joshi; Xiang Fu; Yi-Lun Liao; Vahe Gharakhanyan; Benjamin Kurt Miller; Anuroop Sriram; Zachary Ward Ulissi; |
| 396 | PINNsAgent: Automated PDE Surrogation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce PINNsAgent, a novel surrogation framework that leverages large language models (LLMs) to bridge the gap between domain-specific knowledge and deep learning. |
Qingpo Wuwu; Chonghan Gao; Tianyu Chen; Yihang Huang; Yuekai Zhang; Jianing Wang; Jianxin Li; Haoyi Zhou; Shanghang Zhang; |
| 397 | Ultra-Resolution Adaptation with Ease Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, training models for high-resolution image generation remains challenging, particularly when training data and computational resources are limited. In this paper, we explore this practical problem from two key perspectives: data and parameter efficiency, and propose a set of key guidelines for ultra-resolution adaptation termed URAE. |
Ruonan Yu; Songhua Liu; Zhenxiong Tan; Xinchao Wang; |
| 398 | BSemiFL: Semi-supervised Federated Learning Via A Bayesian Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we first theoretically and empirically demonstrate that the local model achieves higher re-labeling accuracy over local data while the global model can progressively improve the re-labeling performance by introducing the extra data knowledge of other clients. |
Haozhao Wang; Shengyu Wang; Jiaming Li; Hao Ren; Xingshuo Han; Wenchao Xu; Shangwei Guo; Tianwei Zhang; Ruixuan Li; |
| 399 | Componential Prompt-Knowledge Alignment for Domain Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This arises from the random positioning of knowledge components within prompts, where irrelevant component fusion introduces interference. To address this, we propose Componential Prompt-Knowledge Alignment (KA-Prompt), a novel prompt-based DIL method that introduces component-aware prompt-knowledge alignment during training, significantly improving both the learning and inference capacity of the model. |
Kunlun Xu; Xu Zou; Gang Hua; Jiahuan Zhou; |
| 400 | Dendritic Localized Learning: Toward Biologically Plausible Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although various alternative learning approaches have been proposed to address these issues, most either fail to satisfy all three criteria simultaneously or yield suboptimal results. Inspired by the dynamics and plasticity of pyramidal neurons, we propose Dendritic Localized Learning (DLL), a novel learning algorithm designed to overcome these challenges. |
Changze Lv; Jingwen Xu; Yiyang Lu; Xiaohua Wang; Zhenghua Wang; Zhibo Xu; Di Yu; Xin Du; Xiaoqing Zheng; Xuanjing Huang; |
| 401 | ROPO: Robust Preference Optimization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing efforts for this problem either marginally alleviate the impact of noise without noise reduction, or rely on external LLMs that incur substantial computational costs. To address these challenges, we propose **RO**bust **P**reference **O**ptimization (**ROPO**), an iterative alignment approach that integrates *noise-tolerance* and *noise filtering* without the aid of external models. |
Xize Liang; Chao Chen; Shuang Qiu; Jie Wang; Yue Wu; Zhihang Fu; Hanzhu Chen; Feng Wu; Jieping Ye; |
| 402 | Test-Time Training Provably Improves Transformers As In-context Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. |
Halil Alperen Gozeten; Muhammed Emrullah Ildiz; Xuechen Zhang; Mahdi Soltanolkotabi; Marco Mondelli; Samet Oymak; |
| 403 | BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. |
Han Zhong; Yutong Yin; Shenao Zhang; Xiaojun Xu; Yuanxin Liu; Yifei Zuo; Zhihan Liu; Boyi Liu; Sirui Zheng; Hongyi Guo; Liwei Wang; Mingyi Hong; Zhaoran Wang; |
| 404 | Adversarial Reasoning at Jailbreaking Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs. |
Mahdi Sabbaghi; Paul Kassianik; George J. Pappas; Amin Karbasi; Hamed Hassani; |
| 405 | Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we present a novel approach to generate large-scale, high-resolution canopy height maps over time. |
Jan Pauls; Max Zimmer; Berkant Turan; Sassan Saatchi; Philippe CIAIS; Sebastian Pokutta; Fabian Gieseke; |
| 406 | MoH: Multi-Head Attention As Mixture-of-Head Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we upgrade the multi-head attention mechanism, the core of the Transformer model, to reduce computational costs while maintaining or surpassing the previous accuracy level. |
Peng Jin; Bo Zhu; Li Yuan; Shuicheng Yan; |
| 407 | Long-Term TalkingFace Generation Via Motion-Prior Conditional Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we introduce the **M**otion-priors **C**onditional **D**iffusion **M**odel (**MCDM**), which utilizes both archived and current clip motion priors to enhance motion prediction and ensure temporal consistency. We also introduce the TalkingFace-Wild dataset, a multilingual collection of over 200 hours of footage across 10 languages. |
Fei Shen; Cong Wang; Junyao Gao; Qin Guo; Jisheng Dang; Jinhui Tang; Tat-Seng Chua; |
| 408 | Efficient Time Series Processing for Transformers and State-Space Models Through Token Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. |
Leon Götz; Marcel Kollovieh; Stephan Günnemann; Leo Schwinn; |
| 409 | Towards A General Time Series Forecasting Model with Unified Representation and Adaptive Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we take a different approach by addressing two critical aspects of general forecasting models: (1) how to derive unified representations from heterogeneous multi-domain time series data, and (2) how to effectively capture domain-specific features to enable adaptive transfer across various downstream scenarios. |
Yihang Wang; Yuying Qiu; Peng Chen; Kai Zhao; Yang Shu; Zhongwen Rao; Lujia Pan; Bin Yang; Chenjuan Guo; |
| 410 | LightGTS: A Lightweight General Time Series Forecasting Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces LightGTS, a lightweight general time series forecasting model designed from the perspective of consistent periodical modeling. |
Yihang Wang; Yuying Qiu; Peng Chen; Yang Shu; Zhongwen Rao; Lujia Pan; Bin Yang; Chenjuan Guo; |
| 411 | SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce **SyncMind**, a framework that systematically defines the *out-of-sync* problem faced by large language model (LLM) agents in collaborative software engineering (CSE). |
Xuehang Guo; Xingyao Wang; Yangyi Chen; Sha Li; Chi Han; Manling Li; Heng Ji; |
| 412 | Addressing Misspecification in Simulation-based Inference Through Data-driven Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work introduces robust posterior estimation (RoPE), a framework that overcomes model misspecification with a small real-world calibration set of ground truth parameter measurements. |
Antoine Wehenkel; Juan L. Gamella; Ozan Sener; Jens Behrmann; Guillermo Sapiro; Joern-Henrik Jacobsen; Marco Cuturi; |
| 413 | Visual Autoregressive Modeling for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Building upon the tremendous success of autoregressive models in the language domain, we propose **VARSR**, a novel visual autoregressive modeling framework for ISR based on next-scale prediction. Furthermore, we collect large-scale data and design a training process to obtain robust generative priors. |
Yunpeng Qu; Kun Yuan; Jinhua Hao; Kai Zhao; Qizhi Xie; Ming Sun; Chao Zhou; |
| 414 | Streamline Without Sacrifice – Squeeze Out Computation Redundancy in LMM Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on our findings, we propose ProxyV, a novel approach that utilizes proxy vision tokens to alleviate the computational burden on original vision tokens. |
Penghao Wu; Lewei Lu; Ziwei Liu; |
| 415 | ReferSplat: Referring Segmentation in 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. |
Shuting He; Guangquan Jie; Changshuo Wang; Yun Zhou; Shuming Hu; Guanbin Li; Henghui Ding; |
| 416 | SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise with high accuracy. |
Wei Huang; Haotong Qin; Yangdong Liu; Yawei Li; Qinshuo Liu; Xianglong Liu; Luca Benini; Michele Magno; Shiming Zhang; Xiaojuan Qi; |
| 417 | Probing Visual Language Priors in VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Vision-Language Models (VLMs) may over-rely on visual language priors from their training data rather than true visual reasoning. To investigate this, we introduce ViLP, a benchmark featuring deliberately out-of-distribution images synthesized via image generation models and out-of-distribution Q&A pairs. |
Tiange Luo; Ang Cao; Gunhee Lee; Justin Johnson; Honglak Lee; |
| 418 | FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose FOUNDER, a framework that integrates the generalizable knowledge embedded in FMs with the dynamic modeling capabilities of WMs to enable open-ended task solving in embodied environments in a reward-free manner. |
Yucen Wang; Rui Yu; Shenghua Wan; Le Gan; De-Chuan Zhan; |
| 419 | Learning to Route LLMs with Confidence Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Depending on whether an answer is trustworthy, a system can then choose to route the question to another expert, or otherwise fall back on a safe default behavior. In this work, we study the extent to which LLMs can reliably indicate confidence in their answers, and how this notion of confidence can translate into downstream accuracy gains. |
Yu-Neng Chuang; Prathusha Kameswara Sarma; Parikshit Gopalan; John Boccio; Sara Bolouki; Xia Hu; Helen Zhou; |
| 420 | Right Now, Wrong Then: Non-Stationary Direct Preference Optimization Under Preference Drift Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current Large Language Model (LLM) preference optimization algorithms do not account for temporal preference drift, which can lead to severe misalignment. To address this limitation, we propose **Non-Stationary Direct Preference Optimisation (NS-DPO)** that models time-dependent reward functions with a Dynamic Bradley-Terry model. |
Seongho Son; William Bankes; Sayak Ray Chowdhury; Brooks Paige; Ilija Bogunovic; |
| 421 | Adjoint Sampling: Highly Scalable Diffusion Samplers Via Adjoint Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Adjoint Sampling, a highly scalable and efficient algorithm for learning diffusion processes that sample from unnormalized densities, or energy functions. |
Aaron J Havens; Benjamin Kurt Miller; Bing Yan; Carles Domingo-Enrich; Anuroop Sriram; Daniel S. Levine; Brandon M Wood; Bin Hu; Brandon Amos; Brian Karrer; Xiang Fu; Guan-Horng Liu; Ricky T. Q. Chen; |
| 422 | Compression Via Pre-trained Transformers: A Study on Byte-Level Multimodal Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that even small models can be trained to perform well on multiple modalities, but unlike large-scale foundation models, transfer to unseen modalities is generally weak. |
David Heurtel-Depeiges; Anian Ruoss; Joel Veness; Tim Genewein; |
| 423 | Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To rigorously assess the quality of dictionaries learned by SAEs, we introduce two new benchmarks that test (i) plausibility, whether dictionaries recover “true” classification directions, and (ii) identifiability, whether dictionaries disentangle synthetic concept mixtures. |
Thomas Fel; Ekdeep Singh Lubana; Jacob S. Prince; Matthew Kowal; Victor Boutin; Isabel Papadimitriou; Binxu Wang; Martin Wattenberg; Demba E. Ba; Talia Konkle; |
| 424 | The Dark Side of The Forces: Assessing Non-conservative Force Models for Atomistic Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify and demonstrate several fundamental issues, from ill-defined convergence of geometry optimization to instability in various types of molecular dynamics. |
Filippo Bigi; Marcel F. Langer; Michele Ceriotti; |
| 425 | Does Data Scaling Lead to Visual Compositional Generalization? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our work motivates stronger emphasis on constructing diverse datasets for compositional generalization, and considering the importance of representational structure that enables efficient compositional learning. |
Arnas Uselis; Andrea Dittadi; Seong Joon Oh; |
| 426 | GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Moreover, the development of the grounding conversation capability of LMMs within RS is hindered by the lack of granular, RS domain-specific grounded data. Addressing these limitations, we propose GeoPixel – the first end-to-end high-resolution RS-LMM that supports pixel-level grounding. |
Akashah Shabbir; Mohammed Zumri; Mohammed Bennamoun; Fahad Shahbaz Khan; Salman Khan; |
| 427 | DPO Meets PPO: Reinforced Token Optimization for RLHF Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Despite the great successes of PPO in the alignment of state-of-the-art closed-source large language models (LLMs), its open-source implementation is still largely sub-optimal, as widely reported by numerous research studies. To address these issues, we introduce a framework that models RLHF problems as a Markov decision process (MDP), enabling the capture of fine-grained token-wise information. |
Han Zhong; Zikang Shan; Guhao Feng; Wei Xiong; Xinle Cheng; Li Zhao; Di He; Jiang Bian; Liwei Wang; |
| 428 | Inductive Moment Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Diffusion models and Flow Matching generate high-quality samples but are slow at inference, and distilling them into few-step models often leads to instability and extensive tuning. To resolve these trade-offs, we propose Moment Matching Self-Distillation (MMSD), a new class of generative models for one- or few-step sampling with a single-stage training procedure. |
Linqi Zhou; Stefano Ermon; Jiaming Song; |
| 429 | Ranked from Within: Ranking Large Multimodal Models Without Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We avoid manual marking and the associated labor of determining ground-truth answers. Instead, we explore other elicited signals and assess how well models know their own limits, evaluating the effectiveness of these signals for unsupervised model ranking. |
Weijie Tu; Weijian Deng; Dylan Campbell; Yu Yao; Jiyang Zheng; Tom Gedeon; Tongliang Liu; |
| 430 | Beyond The Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. |
Binchi Zhang; Zaiyi Zheng; Zhengzhang Chen; Jundong Li; |
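The rotation symmetry is easy to check numerically: right-multiplying the query and key projections by the same orthogonal matrix cancels inside the attention logits. A self-contained sanity check (our illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                       # token embeddings
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Random orthogonal matrix via QR decomposition.
Rot, _ = np.linalg.qr(rng.normal(size=(d, d)))

logits = (X @ W_q) @ (X @ W_k).T
logits_rot = (X @ W_q @ Rot) @ (X @ W_k @ Rot).T  # rotate both projections

# Rot @ Rot.T = I, so the attention logits are unchanged.
assert np.allclose(logits, logits_rot)
```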
| 431 | EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we measure LLMs’ (in)ability to make optimal decisions in bandits, a stateless reinforcement learning setting relevant to many applications. |
Allen Nie; Yi Su; Bo Chang; Jonathan Lee; Ed H. Chi; Quoc V Le; Minmin Chen; |
| 432 | Tuning LLM Judge Design Decisions for 1/1000 of The Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose to systematically analyze and tune the hyperparameters of LLM judges. |
David Salinas; Omar Swelam; Frank Hutter; |
| 433 | Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose **Querent**, *i.e.*, the **quer**y-awar**e** long co**nt**extual dynamic modeling framework, which achieves a theoretically bounded approximation of full self-attention while delivering practical efficiency. |
Zhengrui Guo; Qichen Sun; Jiabo MA; Lishuang Feng; Jinzhuo Wang; Hao Chen; |
| 434 | Lightweight Dataset Pruning Without Full Training Via Example Difficulty and Prediction Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, many existing methods require training a model with a full dataset over a large number of epochs before being able to prune the dataset, which ironically makes the pruning process more expensive than just training the model on the entire dataset. To overcome this limitation, we introduce the **Difficulty and Uncertainty-Aware Lightweight (DUAL)** score, which aims to identify important samples from the early training stage by considering both example difficulty and prediction uncertainty. |
Yeseul Cho; Baekrok Shin; Changmin Kang; Chulhee Yun; |
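A rough sketch of the early-training idea (our stand-in, not the paper's exact formula): score each example by how low and how unstable its true-label probability is over a few early epochs, then keep only the highest-scoring examples.

```python
import numpy as np

def dual_score(prob_history):
    """prob_history: (epochs, n_examples) array of each example's predicted
    probability of its true label over a few early epochs. The combination
    below is illustrative only."""
    difficulty = 1.0 - prob_history.mean(axis=0)   # hard examples stay low-prob
    uncertainty = prob_history.std(axis=0)         # fluctuating predictions
    return difficulty * uncertainty

# Prune to the 400 highest-scoring examples out of 1000.
probs = np.random.rand(5, 1000)
keep = np.argsort(dual_score(probs))[-400:]
```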
| 435 | Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: (2) Do safety vulnerabilities exist in more common, simple human-LLM interactions? In this paper, we demonstrate that LLM responses most effectively facilitate harmful actions when they are both *actionable* and *informative*—two attributes easily elicited in multi-step, multilingual interactions. |
Yik Siu Chan; Narutatsu Ri; Yuxin Xiao; Marzyeh Ghassemi; |
| 436 | Improving Soft Unification with Knowledge Graph Embedding Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose several strategies to integrate the strengths of NTPs and KGEs, and demonstrate substantial improvements in both accuracy and computational efficiency. |
Xuanming Cui; Chionh Wei Peng; Adriel Kuek; Ser-Nam Lim; |
| 437 | TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unfortunately, the capabilities of LoRA-tuned diffusion models are limited, since the same LoRA is used for different timesteps of the diffusion process. To tackle this problem, we introduce a general and concise TimeStep Master (TSM) paradigm with two key fine-tuning stages. |
Shaobin Zhuang; Yiwei Guo; Yanbo Ding; Kunchang Li; Xinyuan Chen; Yaohui Wang; Fangyikang Wang; Ying Zhang; Chen Li; Yali Wang; |
| 438 | Inductive Gradient Adjustment for Spectral Bias in Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we delve into the linear dynamics model of MLPs and theoretically identify the empirical Neural Tangent Kernel (eNTK) matrix as a reliable link between spectral bias and training dynamics. |
Kexuan Shi; Hai Chen; Leheng Zhang; Shuhang Gu; |
| 439 | PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Preference learning is a central component for aligning current LLMs, but this process can be vulnerable to data poisoning attacks. To address this concern, we introduce PoisonBench, a benchmark for evaluating large language models’ susceptibility to data poisoning during preference learning. |
Tingchen Fu; Mrinank Sharma; Philip Torr; Shay B Cohen; David Krueger; Fazl Barez; |
| 440 | NeuronTune: Towards Self-Guided Spurious Bias Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we take a step towards self-guided mitigation of spurious bias by proposing NeuronTune, a post hoc method that directly intervenes in a model’s internal decision process. |
Guangtao Zheng; Wenqian Ye; Aidong Zhang; |
| 441 | Heads Up! Large Language Models Can Perform Tasks Without Your Instruction Via Selective Attention Head Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate the modules inside LLMs and demonstrate that, by simply masking or retaining specific attention heads during inference, LLMs can exhibit specific task functionalities without requiring explicit instructions or modifications to the model parameters. |
Senyu Han; Hongchuan Zeng; Kai Yu; Lu Chen; |
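A toy sketch of head masking (our illustration): zero out all but the selected heads in the concatenated per-head outputs. Applying this to a real LLM requires a hook at the matching point inside each attention block, before the output projection.

```python
import torch

def mask_heads(attn_out, head_dim, heads_to_keep):
    """attn_out: (batch, seq, n_heads * head_dim), the per-head outputs
    concatenated before the output projection."""
    b, s, d = attn_out.shape
    out = attn_out.view(b, s, d // head_dim, head_dim).clone()
    mask = torch.zeros(d // head_dim, dtype=torch.bool)
    mask[heads_to_keep] = True
    out[:, :, ~mask, :] = 0.0            # silence all other heads
    return out.view(b, s, d)

masked = mask_heads(torch.randn(2, 16, 64), head_dim=8, heads_to_keep=[0, 3])
```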
| 442 | What Has A Foundation Model Found? Inductive Bias Reveals World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. |
Keyon Vafa; Peter G. Chang; Ashesh Rambachan; Sendhil Mullainathan; |
| 443 | Sample, Scrutinize and Scale: Effective Inference-Time Search By Scaling Verification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we study the scaling trends governing sampling-based search. |
Eric Zhao; Pranjal Awasthi; Sreenivas Gollapudi; |
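The simplest point on this scaling curve is best-of-n with a verifier. The sketch below assumes hypothetical `generate` and `verify` callables wrapping a sampler and a scorer; scaling `n` is the knob the paper studies.

```python
def best_of_n(generate, verify, prompt, n=16):
    """Sample n candidate solutions and keep the one the verifier
    scores highest (sampling-based search in its minimal form)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=verify)
```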
| 444 | Bifurcate Then Alienate: Incomplete Multi-view Clustering Via Coupled Distribution Learning with Linear Overhead Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite remarkable advances, existing incomplete multi-view clustering (IMC) methods typically leverage either perspective-shared or perspective-specific determinants to encode cluster representations. To address this limitation, we introduce a BACDL algorithm designed to explicitly capture both concurrently, thereby exploiting heterogeneous data more effectively. |
Shengju Yu; Yiu-ming Cheung; Siwei Wang; Xinwang Liu; En Zhu; |
| 445 | Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We investigate two complementary search strategies applicable to such environments: 1-step lookahead and trajectory selection, both guided by a learned action-value function estimator. On the SWE-bench Verified benchmark, a key testbed for agentic software engineering, we find that these methods double the average success rate of a fine-tuned Qwen-72B model, achieving 40.8%, a new state of the art for open-weights models. |
Karina Zainullina; Alexander Golubev; Maria Trofimova; Sergei Polezhaev; Ibragim Badertdinov; Daria Litvintseva; Simon Karasik; Filipp Fisin; Sergei Skvortsov; Maksim Nekrashevich; Anton Shevtsov; Boris Yangel; |
| 446 | Efficient Generative Modeling with Residual Vector Quantization-Based Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ResGen, an efficient Residual Vector Quantization (RVQ)-based generative model for high-fidelity generation with fast sampling. |
Jaehyeon Kim; Taehong Moon; Keon Lee; Jaewoong Cho; |
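For readers unfamiliar with RVQ, the generic encode/decode loop is short (a textbook construction, not the ResGen code): quantize, subtract, and repeat, so each level refines the residual left by the previous one.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """x: (d,) vector; codebooks: list of (K, d) arrays, one per level.
    Returns one code index per level; residuals shrink level by level."""
    codes, residual = [], x.copy()
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes, codebooks):
    """Sum the selected codewords across levels."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))
```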
| 447 | MMInference: Accelerating Pre-filling for Long-Context Visual Language Models Via Modality-Aware Permutation Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the quadratic attention complexity during the pre-filling phase remains a significant obstacle to real-world deployment. To overcome this limitation, we introduce MMInference (Multimodality Million tokens Inference), a dynamic sparse attention method that accelerates the prefilling stage for long-context multi-modal inputs. |
Yucheng Li; Huiqiang Jiang; Chengruidong Zhang; Qianhui Wu; Xufang Luo; Surin Ahn; Amir H. Abdi; Dongsheng Li; Jianfeng Gao; Yuqing Yang; Lili Qiu; |
| 448 | Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these challenges, this work proposes **S**imultaneous **M**RMP **D**iffusion (SMD), a novel approach integrating constrained optimization into the diffusion sampling process to produce collision-free, kinematically feasible trajectories. Additionally, the paper introduces a comprehensive MRMP benchmark to evaluate trajectory planning algorithms across scenarios with varying robot densities, obstacle complexities, and motion constraints. |
Jinhao Liang; Jacob K Christopher; Sven Koenig; Ferdinando Fioretto; |
| 449 | Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Conversely, vision captures intricate temporal patterns but lacks semantic context, limiting the complementary potential of these modalities. To address this, we propose Time-VLM, a novel multimodal framework that leverages pre-trained Vision-Language Models (VLMs) to bridge temporal, visual, and textual modalities for enhanced forecasting. |
Siru Zhong; Weilin Ruan; Ming Jin; Huan Li; Qingsong Wen; Yuxuan Liang; |
| 450 | Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a self-improvement approach where models iteratively generate and learn from their own solutions, progressively tackling harder problems while maintaining a standard transformer architecture. |
Nayoung Lee; Ziyang Cai; Avi Schwarzschild; Kangwook Lee; Dimitris Papailiopoulos; |
| 451 | How to Synthesize Text Data Without Model Collapse? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we focus on two questions: what is the impact of synthetic data on language model training, and how to synthesize data without model collapse? |
Xuekai Zhu; Daixuan Cheng; Hengli Li; Kaiyan Zhang; Ermo Hua; Xingtai Lv; Ning Ding; Zhouhan Lin; Zilong Zheng; Bowen Zhou; |
| 452 | Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce **O**utcome **R**efining **P**rocess **S**upervision, which unifies process and outcome supervision by leveraging executable verification: a tree-structured search framework generates strategic alternatives, profiles execution metrics, and scores candidates via self-critique mechanisms that integrate runtime feedback with reasoning. |
Zhuohao Yu; Weizheng Gu; Yidong Wang; Xingru Jiang; Zhengran Zeng; Jindong Wang; Wei Ye; Shikun Zhang; |
| 453 | Memorization Sinks: Isolating Memorization During LLM Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we put forward a new paradigm of MemSinks that promotes isolation of memorization by design. |
Gaurav Rohit Ghosal; Pratyush Maini; Aditi Raghunathan; |
| 454 | PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present **PepTune**, a multi-objective discrete diffusion model for simultaneous generation and optimization of therapeutic peptide SMILES. |
Sophia Tang; Yinuo Zhang; Pranam Chatterjee; |
| 455 | BaxBench: Can LLMs Generate Correct and Secure Backends? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, to achieve full automation, LLMs should be able to generate production-quality, self-contained application modules. To evaluate the capabilities of LLMs in solving this challenge, we introduce BaxBench, a novel evaluation benchmark consisting of 392 tasks for the generation of backend applications. |
Mark Vero; Niels Mündler; Victor Chibotaru; Veselin Raychev; Maximilian Baader; Nikola Jovanović; Jingxuan He; Martin Vechev; |
| 456 | MIPT: Multilevel Informed Prompt Tuning for Robust Molecular Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose Multilevel Informed Prompt-Tuning (MIPT), a novel framework for effectively tailoring pretrained models to molecule-related tasks. |
Yeyun Chen; Jiangming Shi;
| 457 | Towards Graph Foundation Models: Learning Generalities Across Graphs Via Task-Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In contrast, discovering such generalities in graph-structured data, especially across heterogeneous graph tasks, remains an open challenge. To address this, we propose a novel approach to cross-task generalization in graphs via task-trees, which serve as unified learning instances aligning node-, edge-, and graph-level tasks. |
Zehong Wang; Zheyuan Zhang; Tianyi Ma; Nitesh V Chawla; Chuxu Zhang; Yanfang Ye; |
| 458 | Beyond Message Passing: Neural Graph Pattern Machine Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce the Neural Graph Pattern Machine (GPM), a novel framework that bypasses message passing by learning directly from graph substructures. |
Zehong Wang; Zheyuan Zhang; Tianyi Ma; Nitesh V Chawla; Chuxu Zhang; Yanfang Ye; |
| 459 | Reducing Tool Hallucination Via Reliability Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To systematically address this issue, we define and categorize tool hallucinations into two main types: tool selection hallucination and tool usage hallucination. To evaluate and mitigate these issues, we introduce RelyToolBench, which integrates specialized test cases and novel metrics to assess hallucination-aware task success and efficiency. |
Hongshen Xu; Zichen Zhu; Lei Pan; Zihan Wang; Su Zhu; Da Ma; Ruisheng Cao; Lu Chen; Kai Yu; |
| 460 | DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current approaches, which adapt the same parameters across different domains, struggle in such dynamic conditions—they face convergence issues with brief domain exposures, risk forgetting previously learned knowledge, or misapplying it to irrelevant domains. To remedy this, we propose **DPCore**, a method designed for robust performance across diverse domain change patterns while ensuring computational efficiency. |
Yunbei Zhang; Akshay Mehra; Shuaicheng Niu; Jihun Hamm; |
| 461 | Train for The Worst, Plan for The Best: Understanding Token Ordering in Masked Diffusions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work we closely examine these two competing effects. On the training front, we theoretically and empirically demonstrate that MDMs indeed train on computationally intractable subproblems compared to their autoregressive counterparts. On the inference front, we show that a suitable strategy for adaptively choosing the token decoding order significantly enhances the capabilities of MDMs, allowing them to sidestep hard subproblems. |
Jaeyeon Kim; Kulin Shah; Vasilis Kontonis; Sham M. Kakade; Sitan Chen; |
| 462 | The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present the Berkeley Function Calling Leaderboard (BFCL), a comprehensive benchmark designed to evaluate function calling capabilities in a wide range of real-world settings. We construct the benchmark using a combination of expert-curated and user-contributed functions and associated prompts. |
Shishir G Patil; Huanzhi Mao; Fanjia Yan; Charlie Cheng-Jie Ji; Vishnu Suresh; Ion Stoica; Joseph E. Gonzalez; |
| 463 | Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a unified framework for Venn and Venn-Abers calibration that extends Vovk’s approach beyond binary classification to a broad class of prediction tasks defined by generic loss functions. |
Lars van der Laan; Ahmed Alaa; |
| 464 | Towards Efficient Online Tuning of VLM Agents Via Counterfactual Soft Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel online fine-tuning method, Counterfactual Soft Reinforcement Learning (CoSo), better suited to the textual output space of VLM agents. |
Lang Feng; Weihao Tan; Zhiyi Lyu; Longtao Zheng; Haiyang Xu; Ming Yan; Fei Huang; Bo An; |
| 465 | Logits Are All We Need to Adapt Closed Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose a token-level probability reweighting framework that, given access to logits and a small amount of task-specific data, can effectively steer black-box LLMs toward application-specific content generation. |
Gaurush Hiranandani; Haolun Wu; Subhojyoti Mukherjee; Sanmi Koyejo; |
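A minimal sketch of token-level reweighting in log space, assuming access to the black-box model's next-token logits. The `token_weights` vector is a placeholder for whatever adjustment the method learns from the small task dataset.

```python
import numpy as np

def reweighted_probs(logits, token_weights, alpha=1.0):
    """logits: (vocab,) next-token logits from the closed model.
    token_weights: learned log-domain adjustments (here just given)."""
    adjusted = logits + alpha * token_weights   # reweight in log space
    p = np.exp(adjusted - adjusted.max())       # numerically stable softmax
    return p / p.sum()
```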
| 466 | The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The manner in which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. |
Walter Mayor; Johan Obando-Ceron; Aaron Courville; Pablo Samuel Castro; |
| 467 | Discrete Neural Algorithmic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states. |
Gleb Rodionov; Liudmila Prokhorenkova; |
| 468 | Mixture of Lookup Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Their large parameter size still limits deployment, and offloading, which loads experts into VRAM only when needed, significantly increases inference latency. To address this, we propose Mixture of Lookup Experts (MoLE), a new MoE architecture that is efficient in both communication and VRAM usage. |
Shibo Jie; Yehui Tang; Kai Han; Yitong Li; Duyu Tang; Zhi-Hong Deng; Yunhe Wang; |
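One way to picture the lookup idea (our hypothetical sketch, not MoLE's actual architecture): if an expert's contribution can be re-parameterized as a function of the input token id, it can be precomputed into a table and served as an embedding lookup, trading FFN compute for cheap memory reads.

```python
import torch
import torch.nn as nn

class LookupExpert(nn.Module):
    """Serve a precomputed per-token expert output as a table lookup."""
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.table = nn.Embedding(vocab_size, d_model)  # filled offline

    def forward(self, input_ids):
        return self.table(input_ids)   # no expert FFN compute at inference
```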
| 469 | How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Measuring dataset contamination thus becomes essential to ensure that performance evaluations genuinely reflect a model’s ability to generalize to unseen data, rather than relying on memorized examples. To address this problem, we propose the Kernel Divergence Score (KDS), a novel method that evaluates dataset contamination by computing the divergence between the kernel similarity matrices of sample embeddings before and after fine-tuning on the benchmark dataset. |
Hyeong Kyu Choi; Maxim Khanov; Hongxin Wei; Yixuan Li; |
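One plausible instantiation of the score (our stand-in, not the paper's exact divergence measure): build kernel similarity matrices over sample embeddings before and after fine-tuning; how much the kernel moves differs systematically between already-seen and unseen benchmarks.

```python
import numpy as np

def kernel_matrix(emb, gamma=1.0):
    """RBF kernel similarity matrix over sample embeddings of shape (n, d)."""
    sq = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_divergence(emb_before, emb_after, gamma=1.0):
    """Frobenius distance between pre/post fine-tuning kernel matrices."""
    K0 = kernel_matrix(emb_before, gamma)
    K1 = kernel_matrix(emb_after, gamma)
    return np.linalg.norm(K0 - K1)
```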
| 470 | The Best of Both Worlds: Bridging Quality and Diversity in Data Selection with Bipartite Graph Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, current data selection methods often prioritize one aspect over the other, resulting in suboptimal training outcomes. To address this, we formulate data selection as a set cover problem and present GraphFilter, a novel approach that balances both quality and diversity in data selection. |
Minghao Wu; Thuy-Trang Vu; Lizhen Qu; Gholamreza Haffari; |
| 471 | Distillation Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. |
Dan Busbridge; Amitis Shidani; Floris Weers; Jason Ramapuram; Etai Littwin; Russell Webb; |
| 472 | Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: They remain trapped within the gravitational pull of multi-step planning complexity, failing to generalize as task demands increase. To overcome these limitations, we propose a scalable Bayesian ToM planner. |
Chunhui Zhang; Zhongyu Ouyang; Kwonjoon Lee; Nakul Agarwal; Sean Dae Houlihan; Soroush Vosoughi; Shao-Yuan Lo; |
| 473 | FreeMesh: Boosting Mesh Generation with Coordinates Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Building upon PTME, we propose a plug-and-play tokenization technique called coordinate merging. |
Jian Liu; Haohan Weng; Biwen Lei; Xianghui Yang; Zibo Zhao; Zhuo Chen; Song Guo; Tao Han; Chunchao Guo; |
| 474 | Demystifying Cost-Efficiency in LLM Serving Over Heterogeneous GPUs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Subsequently, we design a scheduling algorithm via mixed-integer linear programming, aiming at deducing the most cost-efficient serving plan under the constraints of price budget and real-time GPU availability. |
YOUHE JIANG; Fangcheng Fu; Xiaozhe Yao; Guoliang HE; Xupeng Miao; Ana Klimovic; Bin CUI; Binhang Yuan; Eiko Yoneki; |
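A toy version of such a serving plan fits in a few lines of MILP, e.g. with PuLP; the GPU names, prices, and throughputs below are made-up placeholders, and the real formulation also handles parallelism configurations and availability constraints.

```python
import pulp

gpus = {"A100": {"price": 3.0, "tput": 90},   # $/hour, tokens/sec (made up)
        "L4":   {"price": 0.8, "tput": 20}}
demand = 250                                   # required tokens/sec

prob = pulp.LpProblem("serving_plan", pulp.LpMinimize)
n = {g: pulp.LpVariable(g, lowBound=0, cat="Integer") for g in gpus}
prob += pulp.lpSum(gpus[g]["price"] * n[g] for g in gpus)           # cost
prob += pulp.lpSum(gpus[g]["tput"] * n[g] for g in gpus) >= demand  # load
prob.solve(pulp.PULP_CBC_CMD(msg=False))

plan = {g: int(n[g].value()) for g in gpus}    # cheapest feasible mix
```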
| 475 | AutoEval Done Right: Using Synthetic Data for Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. |
Pierre Boyeau; Anastasios Nikolas Angelopoulos; Tianle Li; Nir Yosef; Jitendra Malik; Michael I. Jordan; |
| 476 | On The Interplay Between Graph Structure and Learning Algorithms in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing theoretical studies on the learning dynamics of GNNs primarily focus on the convergence rates of learning algorithms under the interpolation regime (noise-free) and offer only a crude connection between these dynamics and the actual graph structure (e.g., maximum degree). This paper aims to bridge this gap by investigating the excess risk (generalization performance) of learning algorithms in GNNs within the generalization regime (with noise). |
Junwei Su; Chuan Wu; |
| 477 | A Non-Asymptotic Convergent Analysis for Scored-Based Graph Generative Model Via A System of Stochastic Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present the first convergence analysis for SGGMs, focusing on the convergence bound (the risk of generative error) across three key graph generation paradigms: (1) feature generation with a fixed graph structure, (2) graph structure generation with fixed node features, and (3) joint generation of both graph structure and node features. |
Junwei Su; Chuan Wu; |
| 478 | H-Tuning: Toward Low-Cost and Efficient ECG-based Cardiovascular Disease Detection with Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we propose a holistic method (H-Tuning) for low-cost and efficient fine-tuning of pre-trained models on downstream datasets. |
Rushuang Zhou; Yuanting Zhang; Yining Dong; |
| 479 | A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In contrast, humans are less susceptible to such bias, partly due to *infantile amnesia*, where the formation of new neurons disrupts early memory traces, leading to the forgetting of initial experiences. Inspired by these dual processes of forgetting and growing in neuroscience, we propose *Forget and Grow* (**FoG**), a new deep RL algorithm that introduces two corresponding mechanisms. |
Zilin Kang; Chenyuan Hu; Yu Luo; Zhecheng Yuan; Ruijie Zheng; Huazhe Xu; |
| 480 | DIME: Diffusion-Based Maximum Entropy Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges—primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). |
Onur Celik; Zechu Li; Denis Blessing; Ge Li; Daniel Palenicek; Jan Peters; Georgia Chalvatzaki; Gerhard Neumann; |
| 481 | Addressing Imbalanced Domain-Incremental Learning Through Dual-Balance Collaborative Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These challenges significantly hinder model performance, as intra-domain imbalance leads to underfitting of few-shot classes, while cross-domain shifts require maintaining well-learned many-shot classes and transferring knowledge to improve few-shot class performance in old domains. To overcome these challenges, we introduce the Dual-Balance Collaborative Experts (DCE) framework. |
Lan Li; Da-Wei Zhou; Han-Jia Ye; De-Chuan Zhan; |
| 482 | SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To maximize sparsity while retaining essential information, we introduce a rank-based strategy to adaptively determine the sparsification ratio for each layer, alongside a token recycling method that compresses pruned tokens into more compact representations. |
Yuan Zhang; Chun-Kai Fan; Junpeng Ma; Wenzhao Zheng; Tao Huang; Kuan Cheng; Denis A Gudovskiy; Tomoyuki Okuno; Yohei Nakata; Kurt Keutzer; Shanghang Zhang; |
| 483 | Diffusion Models Are Secretly Exchangeable: Parallelizing DDPMs Via Auto Speculation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we utilize the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property. |
Hengyuan Hu; Aniket Das; Dorsa Sadigh; Nima Anari; |
| 484 | CPCF: A Cross-Prompt Contrastive Framework for Referring Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these models often suffer from suboptimal performance due to incorrect responses tailored to misleading areas adjacent to or similar to the target region. This work introduces CPCF, a novel framework to address this issue and achieve superior results. |
Lanyun Zhu; Deyi Ji; Tianrun Chen; Haiyang Wu; De Wen Soh; Jun Liu; |
| 485 | Otter: Generating Tests from Issues to Validate SWE Patches Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Second, it also validates SWE (software engineering) agents, which generate code patches for resolving issues. This paper introduces TDD-Bench-Verified, a benchmark for generating tests from issues, and Otter, an LLM-based solution for this task. |
Toufique Ahmed; Jatin Ganhotra; Rangeet Pan; Avraham Shinnar; Saurabh Sinha; Martin Hirzel; |
| 486 | A Near Linear Query Lower Bound for Submodular Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We revisit the problem of selecting $k$-out-of-$n$ elements with the goal of optimizing an objective function, and ask whether it can be solved approximately with sublinear query complexity. |
Binghui Peng; Aviad Rubinstein; |
| 487 | DataDecide: How to Predict Best Pretraining Data with Small Experiments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Because large language models are expensive to pretrain on different datasets, using smaller-scale experiments to decide on data is crucial for reducing costs. Which benchmarks and methods of making decisions from observed performance at small scale most accurately predict the datasets that yield the best large models? To empower open exploration of this question, we release models, data, and evaluations in DataDecide, the most extensive open suite of models over differences in data and scale. |
Ian Magnusson; Nguyen Tai; Ben Bogin; David Heineman; Jena D. Hwang; Luca Soldaini; Akshita Bhagia; Jiacheng Liu; Dirk Groeneveld; Oyvind Tafjord; Noah A. Smith; Pang Wei Koh; Jesse Dodge; |
| 488 | Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we build upon flow-matching and propose two mechanisms for accelerating training and inference of generative models for 3D molecular conformer generation. |
Zhonglin Cao; Mario Geiger; Allan Dos Santos Costa; Danny Reidenbach; Karsten Kreis; Tomas Geffner; Franco Pellegrini; Guoqing Zhou; Emine Kucukbenli; |
| 489 | Self-Bootstrapping for Versatile Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we seek to develop a versatile test-time adaptation (TTA) objective for a variety of tasks — classification and regression across image-, object-, and pixel-level predictions. |
Shuaicheng Niu; Guohao Chen; Peilin Zhao; Tianyi Wang; Pengcheng Wu; Zhiqi Shen; |
| 490 | An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: These methods largely succeed in coercing the target output in their original settings, but their attacks vary substantially in fluency and computational effort. In this work, we propose a unified threat model for the principled comparison of these methods. |
Valentyn Boreiko; Alexander Panfilov; Vaclav Voracek; Matthias Hein; Jonas Geiping; |
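An N-gram threat model needs little machinery; an add-alpha smoothed bigram scorer like the sketch below (our illustration, not the paper's code) already assigns high perplexity to disfluent adversarial suffixes.

```python
import math
from collections import Counter

def bigram_perplexity(tokens, unigram, bigram, vocab_size, alpha=1.0):
    """Perplexity of a token sequence under an add-alpha bigram model."""
    logp = 0.0
    for a, b in zip(tokens, tokens[1:]):
        num = bigram[(a, b)] + alpha
        den = unigram[a] + alpha * vocab_size
        logp += math.log(num / den)
    return math.exp(-logp / max(len(tokens) - 1, 1))

corpus = "the cat sat on the mat".split()      # stand-in reference corpus
unigram = Counter(corpus)
bigram = Counter(zip(corpus, corpus[1:]))
ppl = bigram_perplexity("the cat sat".split(), unigram, bigram,
                        vocab_size=len(unigram))
```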
| 491 | A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees due to safety being enforced as soft constraints, limiting their use in safety-critical settings. |
Manan Tayal; Aditya Singh; Shishir Kolathaya; Somil Bansal; |
| 492 | Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For example, supervised fine-tuning improves reasoning quality but requires vast labeled data, while reward-maximizing reinforcement learning finds top-reward solutions while neglecting the solution diversity. To fill this gap, we propose Flow of Reasoning (FoR), an efficient diversity-seeking LLM finetuning method aimed at improving reasoning quality and diversity with minimal data. |
Fangxu Yu; Lai Jiang; Haoqiang Kang; Shibo Hao; Lianhui Qin; |
| 493 | ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we introduce ALMTokenizer, a novel low-bitrate and semantically rich audio codec tokenizer for audio language models. |
Dongchao Yang; Songxiang Liu; Haohan Guo; Jiankun Zhao; Yuanyuan Wang; Helin Wang; Zeqian Ju; Xubo Liu; Xueyuan Chen; Xu Tan; Xixin Wu; Helen M. Meng; |
| 494 | Video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper proposes video-SALMONN-o1, the first open-source reasoning-enhanced audio-visual LLM designed for general video understanding tasks. To enhance its reasoning abilities, we develop a reasoning-intensive dataset featuring challenging audio-visual questions with step-by-step solutions. |
Guangzhi Sun; Yudong Yang; Jimin Zhuang; Changli Tang; Yixuan Li; Wei Li; Zejun MA; Chao Zhang; |
| 495 | Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we explore how to effectively characterize users and items in LLM-based recommender systems from the token construction view. |
Ting-Ji Huang; Jia-Qi Yang; Chunxu Shen; Kai-Qi Liu; De-Chuan Zhan; Han-Jia Ye; |
| 496 | FLAM: Frame-Wise Language-Audio Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce FLAM, an open-vocabulary contrastive audio-language model capable of localizing specific sound events. |
Yusong Wu; Christos Tsirigotis; Ke Chen; Cheng-Zhi Anna Huang; Aaron Courville; Oriol Nieto; Prem Seetharaman; Justin Salamon; |
| 497 | Measuring Diversity in Synthetic Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce DCScore, a novel method for measuring synthetic dataset diversity from a classification perspective. |
Yuchang Zhu; Huizhe Zhang; Bingzhe Wu; Jintang Li; Zibin Zheng; Peilin Zhao; Liang Chen; Yatao Bian; |
| 498 | Latent Mamba Operator for Partial Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing neural operators struggle with scalability in high-dimensional spaces, incur high computational costs, and face challenges in capturing continuous and long-range dependencies in PDE dynamics. To address these limitations, we introduce the Latent Mamba Operator (LaMO), which integrates the efficiency of state-space models (SSMs) in latent space with the expressive power of kernel integral formulations in neural operators. |
Karn Tiwari; Niladri Dutta; N M Anoop Krishnan; Prathosh AP; |
| 499 | Hierarchical Graph Tokenization for Molecule-Language Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that neglecting the hierarchical information in tokenization will lead to subpar molecule-language alignment and severe hallucination. To address this limitation, we propose HIerarchical GrapH Tokenization (HIGHT). |
Yongqiang Chen; Quanming Yao; Juzheng Zhang; James Cheng; Yatao Bian; |
| 500 | FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The inherent in-distribution (ID) data heterogeneity among different clients makes it more challenging to maintain this trade-off. To fill this gap, we introduce a Federated OOD-aware Context Optimization (FOCoOp) framework, which captures diverse distributions among clients using ID global prompts, local prompts, and OOD prompts. |
Xinting Liao; Weiming Liu; Jiaming Qian; Pengyang Zhou; Jiahe Xu; Wenjie Wang; Chaochao Chen; Xiaolin Zheng; Tat-Seng Chua; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~3,300 papers), please visit Paper Digest: ICML-2025 (Full List).