Paper Digest: ICLR 2025 Papers & Highlights
Note: ICLR-2025 accepted more than 3,700 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read all 3,700 ICLR-2025 papers on a separate page, which takes quite some time to load.
To search for papers presented at ICLR-2025 on a specific topic, please make use of the search by venue (ICLR-2025) service. To summarize the latest research published at ICLR-2025 on a specific topic, you can utilize the review by venue (ICLR-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~15,000 authors (ICLR-2025). Using this year’s data, our system also generates a report on recent machine learning topics. Additionally, you may want to explore our “Best Paper” Digest (ICLR), which lists the most influential ICLR papers since 2018.
We’ve developed a service, ICLR-2025 Research, that synthesizes the latest findings from ICLR 2025 into comprehensive reports. For instance, we’ve generated a report on Advances in Large Language Model Training: Insights from ICLR 2025 Papers. We encourage interested users to utilize our service to create tailored reports on other emerging topics.
This curated list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that delivers personalized, comprehensive daily paper digests on the latest research in your field. It also empowers you to read articles, write articles, get answers, conduct literature reviews, and generate research reports.
Experience the full potential of our services today!
TABLE 1: Paper Digest: ICLR 2025 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | SAM 2: Segment Anything in Images and Videos. Highlight: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. | Nikhila Ravi; Valentin Gabeur; Yuan-Ting Hu; Ronghang Hu; Chaitanya Ryali; Tengyu Ma; Haitham Khedr; Roman Rädle; Chloe Rolland; Laura Gustafson; Eric Mintun; Junting Pan; Kalyan Vasudev Alwala; Nicolas Carion; Chao-Yuan Wu; Ross Girshick; Piotr Dollar; Christoph Feichtenhofer |
| 2 | T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching. Highlight: In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. | Zizheng Pan; Bohan Zhuang; De-An Huang; Weili Nie; Zhiding Yu; Chaowei Xiao; Jianfei Cai; Anima Anandkumar |
| 3 | Scaling and Evaluating Sparse Autoencoders. Highlight: We propose using k-sparse autoencoders [Makhzani and Frey, 2013] to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. *(A minimal k-sparse autoencoder sketch follows this table.)* | Leo Gao; Tom Dupre la Tour; Henk Tillman; Gabriel Goh; Rajan Troll; Alec Radford; Ilya Sutskever; Jan Leike; Jeffrey Wu |
| 4 | Generative Representational Instruction Tuning. Highlight: Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. | Niklas Muennighoff; Hongjin SU; Liang Wang; Nan Yang; Furu Wei; Tao Yu; Amanpreet Singh; Douwe Kiela |
| 5 | Tamper-Resistant Safeguards for Open-Weight LLMs. Highlight: We develop a method, called TAR, for building tamper-resistant safeguards into open-weight LLMs such that adversaries cannot remove the safeguards even after hundreds of steps of fine-tuning. | Rishub Tamirisa; Bhrugu Bharathi; Long Phan; Andy Zhou; Alice Gatti; Tarun Suresh; Maxwell Lin; Justin Wang; Rowan Wang; Ron Arel; Andy Zou; Dawn Song; Bo Li; Dan Hendrycks; Mantas Mazeika |
| 6 | Generative Verifiers: Reward Modeling As Next-Token Prediction. Highlight: While LLM-based verifiers are typically trained as discriminative classifiers to score solutions, they do not utilize the text generation capabilities of pretrained LLMs. To overcome this limitation, we instead propose training verifiers using the ubiquitous next-token prediction objective, jointly on verification and solution generation. | Lunjun Zhang; Arian Hosseini; Hritik Bansal; Mehran Kazemi; Aviral Kumar; Rishabh Agarwal |
| 7 | Training Language Models to Self-Correct Via Reinforcement Learning. Highlight: Current methods for training self-correction typically depend on either multiple models, a more advanced model, or additional forms of supervision. To address these shortcomings, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM’s self-correction ability using entirely self-generated data. | Aviral Kumar; Vincent Zhuang; Rishabh Agarwal; Yi Su; John D Co-Reyes; Avi Singh; Kate Baumli; Shariq Iqbal; Colton Bishop; Rebecca Roelofs; Lei M Zhang; Kay McKinney; Disha Shrivastava; Cosmin Paduraru; George Tucker; Doina Precup; Feryal Behbahani; Aleksandra Faust |
| 8 | OLMoE: Open Mixture-of-Experts Language Models. Highlight: We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). | Niklas Muennighoff; Luca Soldaini; Dirk Groeneveld; Kyle Lo; Jacob Morrison; Sewon Min; Weijia Shi; Evan Pete Walsh; Oyvind Tafjord; Nathan Lambert; Yuling Gu; Shane Arora; Akshita Bhagia; Dustin Schwenk; David Wadden; Alexander Wettig; Binyuan Hui; Tim Dettmers; Douwe Kiela; Ali Farhadi; Noah A. Smith; Pang Wei Koh; Amanpreet Singh; Hannaneh Hajishirzi |
| 9 | DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search. Highlight: In this paper, we explore methodologies to leverage proof assistant feedback to augment the capabilities of large language models in constructing formal proofs. | Huajian Xin; Z.Z. Ren; Junxiao Song; Zhihong Shao; Wanjia Zhao; Haocheng Wang; Bo Liu; Liyue Zhang; Xuan Lu; Qiushi Du; Wenjun Gao; Haowei Zhang; Qihao Zhu; Dejian Yang; Zhibin Gou; Z.F. Wu; Fuli Luo; Chong Ruan |
| 10 | Diffusion Models Are Real-Time Game Engines. Highlight: We present GameNGen, the first game engine powered entirely by a neural model that also enables real-time interaction with a complex environment over long trajectories at high quality. | Dani Valevski; Yaniv Leviathan; Moab Arar; Shlomi Fruchter |
| 11 | Transfusion: Predict The Next Token and Diffuse Images with One Multi-Modal Model. Highlight: We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. | Chunting Zhou; LILI YU; Arun Babu; Kushal Tirumala; Michihiro Yasunaga; Leonid Shamis; Jacob Kahn; Xuezhe Ma; Luke Zettlemoyer; Omer Levy |
| 12 | Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think. Highlight: Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods. We argue that one main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. | Sihyun Yu; Sangkyung Kwak; Huiwon Jang; Jongheon Jeong; Jonathan Huang; Jinwoo Shin; Saining Xie |
| 13 | Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models. Highlight: Specifically, the Best-of-N (BoN) inference strategy, where an LLM generates multiple responses and a verifier selects the best, has shown strong empirical performance. Motivated by this, we develop a novel inference-aware fine-tuning paradigm, which encompasses the BoN-aware inference framework as a special case. *(A minimal BoN sketch follows this table.)* | Yinlam Chow; Guy Tennenholtz; Izzeddin Gur; Vincent Zhuang; Bo Dai; Aviral Kumar; Rishabh Agarwal; Sridhar Thiagarajan; Craig Boutilier; Aleksandra Faust |
| 14 | AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents. Highlight: We present AndroidWorld, a fully functional Android environment that provides reward signals for 116 programmatic tasks across 20 real-world Android apps. | Christopher Rawles; Sarah Clinckemaillie; Yifan Chang; Jonathan Waltz; Gabrielle Lau; Marybeth Fair; Alice Li; William E Bishop; Wei Li; Folawiyo Campbell-Ajala; Daniel Kenji Toyama; Robert James Berry; Divya Tyamagundlu; Timothy P Lillicrap; Oriana Riva |
| 15 | Scaling Up Masked Diffusion Models on Text. Highlight: Fully leveraging the probabilistic formulation of MDMs, we propose a simple yet effective *unsupervised classifier-free guidance* that effectively exploits large-scale unpaired data, boosting performance for conditional inference. | Shen Nie; Fengqi Zhu; Chao Du; Tianyu Pang; Qian Liu; Guangtao Zeng; Min Lin; Chongxuan Li |
| 16 | OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation. Highlight: Recent T2V methods have focused on vision transformers, using a simple cross attention module for video generation, which falls short of making full use of semantic information from text tokens. To address these issues, we introduce OpenVid-1M, a precise high-quality dataset with expressive captions. | Kepan Nan; Rui Xie; Penghao Zhou; Tiehan Fan; Zhenheng Yang; Zhijie Chen; Xiang Li; Jian Yang; Ying Tai |
| 17 | Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. Highlight: Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. | Chenglei Si; Diyi Yang; Tatsunori Hashimoto |
| 18 | RouteLLM: Learning to Route LLMs from Preference Data. Highlight: Powerful models offer better results but are expensive, while smaller models are more cost-effective but less capable. To address this trade-off, we introduce a training framework for learning efficient router models that dynamically select between a stronger and weaker LLM during inference. | Isaac Ong; Amjad Almahairi; Vincent Wu; Wei-Lin Chiang; Tianhao Wu; Joseph E. Gonzalez; M Waleed Kadous; Ion Stoica |
| 19 | Your Absorbing Discrete Diffusion Secretly Models The Conditional Distributions of Clean Data. Highlight: In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time-dependent scalar in an analytic form. | Jingyang Ou; Shen Nie; Kaiwen Xue; Fengqi Zhu; Jiacheng Sun; Zhenguo Li; Chongxuan Li |
| 20 | LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models. Highlight: To this end, we introduce LLaVA-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs. | Feng Li; Renrui Zhang; Hao Zhang; Yuanhan Zhang; Bo Li; Wei Li; Zejun MA; Chunyuan Li |
| 21 | LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias. Highlight: We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. | Haian Jin; Hanwen Jiang; Hao Tan; Kai Zhang; Sai Bi; Tianyuan Zhang; Fujun Luan; Noah Snavely; Zexiang Xu |
| 22 | Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving. Highlight: We study inference scaling laws (aka test-time scaling laws) and compute-optimal inference, focusing on the trade-offs between model sizes and generating additional tokens with different inference strategies. | Yangzhen Wu; Zhiqing Sun; Shanda Li; Sean Welleck; Yiming Yang |
| 23 | SpinQuant: LLM Quantization with Learned Rotations. Highlight: In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures while enhancing quantization accuracy. | Zechun Liu; Changsheng Zhao; Igor Fedorov; Bilge Soran; Dhruv Choudhary; Raghuraman Krishnamoorthi; Vikas Chandra; Yuandong Tian; Tijmen Blankevoort |
| 24 | On Scaling Up 3D Gaussian Splatting Training. Highlight: We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. | Hexu Zhao; Haoyang Weng; Daohan Lu; Ang Li; Jinyang Li; Aurojit Panda; Saining Xie |
| 25 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads. Highlight: In this paper, we identify that only a fraction of attention heads, a.k.a. Retrieval Heads, are critical for processing long contexts and require full attention across all tokens. | Guangxuan Xiao; Jiaming Tang; Jingwei Zuo; junxian guo; Shang Yang; Haotian Tang; Yao Fu; Song Han |
| 26 | AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents. Highlight: To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. | Maksym Andriushchenko; Alexandra Souly; Mateusz Dziemian; Derek Duenas; Maxwell Lin; Justin Wang; Dan Hendrycks; Andy Zou; J Zico Kolter; Matt Fredrikson; Yarin Gal; Xander Davies |
| 27 | JudgeLM: Fine-tuned Large Language Models Are Scalable Judges. Highlight: To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM) to evaluate LLMs efficiently and effectively in open-ended benchmarks. We first propose a comprehensive, large-scale, high-quality dataset containing task seeds, LLMs-generated answers, and GPT-4-generated judgments for fine-tuning high-performance judges, as well as a new benchmark for evaluating the judges. | Lianghui Zhu; Xinggang Wang; Xinlong Wang |
| 28 | Does Refusal Training in LLMs Generalize to The Past Tense? Highlight: We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., *How to make a Molotov cocktail?* to *How did people make a Molotov cocktail?*) is often sufficient to jailbreak many state-of-the-art LLMs. | Maksym Andriushchenko; Nicolas Flammarion |
| 29 | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. Highlight: In this way, we achieve 100% attack success rate—according to GPT-4 as a judge—on Vicuna-13B, Mistral-7B, Phi-3-Mini, Nemotron-4-340B, Llama-2-Chat-7B/13B/70B, Llama-3-Instruct-8B, Gemma-7B, GPT-3.5, GPT-4o, and R2D2 from HarmBench that was adversarially trained against the GCG attack. | Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion |
| 30 | Can In-context Learning Really Generalize to Out-of-distribution Tasks? Highlight: In this work, we explore the mechanism of in-context learning (ICL) on out-of-distribution (OOD) tasks that were not encountered during training. | Qixun Wang; Yifei Wang; Xianghua Ying; Yisen Wang |
| 31 | Magpie: Alignment Data Synthesis from Scratch By Prompting Aligned LLMs with Nothing. Highlight: We present a self-synthesis method for generating large-scale alignment data named Magpie. | Zhangchen Xu; Fengqing Jiang; Luyao Niu; Yuntian Deng; Radha Poovendran; Yejin Choi; Bill Yuchen Lin |
| 32 | Latent Action Pretraining from Videos. Highlight: In this work, we propose a method to learn from internet-scale videos that do not have robot action labels. | Seonghyeon Ye; Joel Jang; Byeongguk Jeon; Se June Joo; Jianwei Yang; Baolin Peng; Ajay Mandlekar; Reuben Tan; Yu-Wei Chao; Bill Yuchen Lin; Lars Liden; Kimin Lee; Jianfeng Gao; Luke Zettlemoyer; Dieter Fox; Minjoon Seo |
| 33 | MMTEB: Massive Multilingual Text Embedding Benchmark. Highlight: To circumvent this limitation and to provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) — a large-scale community-driven initiative expanding MTEB to over 500 quality-controlled evaluation tasks across 1,000+ languages. For instance, we introduce a new zero-shot English benchmark that maintains a similar ordering at a fraction of the cost. | Kenneth Enevoldsen; Isaac Chung; Imene Kerboua; Márton Kardos; Ashwin Mathur; David Stap; Jay Gala; Wissam Siblini; Dominik Krzemiński; Genta Indra Winata; Saba Sturua; Saiteja Utpala; Mathieu Ciancone; Marion Schaeffer; Diganta Misra; Shreeya Dhakal; Jonathan Rystrøm; Roman Solomatin; Ömer Veysel Çağatan; Akash Kundu; Martin Bernstorff; Shitao Xiao; Akshita Sukhlecha; Bhavish Pahwa; Rafał Poświata; Kranthi Kiran GV; Shawon Ashraf; Daniel Auras; Björn Plüster; Jan Philipp Harries; Loïc Magne; Isabelle Mohr; Dawei Zhu; Hippolyte Gisserot-Boukhlef; Tom Aarsen; Jan Kostkan; Konrad Wojtasik; Taemin Lee; Marek Suppa; Crystina Zhang; Roberta Rocca; Mohammed Hamdy; Andrianos Michail; John Yang; Manuel Faysse; Aleksei Vatolin; Nandan Thakur; Manan Dey; Dipam Vasani; Pranjal A Chitale; Simone Tedeschi; Nguyen Tai; Artem Snegirev; Mariya Hendriksen; Michael Günther; Mengzhou Xia; Weijia Shi; Xing Han Lù; Jordan Clive; Gayatri K; Maksimova Anna; Silvan Wehrli; Maria Tikhonova; Henil Shalin Panchal; Aleksandr Abramov; Malte Ostendorff; Zheng Liu; Simon Clematide; Lester James Validad Miranda; Alena Fenogenova; Guangyu Song; Ruqiya Bin Safi; Wen-Ding Li; Alessia Borghini; Federico Cassano; Lasse Hansen; Sara Hooker; Chenghao Xiao; Vaibhav Adlakha; Orion Weller; Siva Reddy; Niklas Muennighoff |
| 34 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens. Highlight: Scaling up autoregressive models in vision has not proven as beneficial as in large language models. In this work, we investigate this scaling problem in the context of text-to-image generation, focusing on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or fixed raster order using BERT- or GPT-like transformer architectures. | Lijie Fan; Tianhong Li; Siyang Qin; Yuanzhen Li; Chen Sun; Michael Rubinstein; Deqing Sun; Kaiming He; Yonglong Tian |
| 35 | Automated Design of Agentic Systems. Highlight: Given that programming languages are Turing Complete, this approach theoretically enables the learning of any possible agentic system: including novel prompts, tool use, workflows, and combinations thereof. We present a simple yet effective algorithm named Meta Agent Search to demonstrate this idea, where a meta agent iteratively programs interesting new agents based on an ever-growing archive of previous discoveries. | Shengran Hu; Cong Lu; Jeff Clune |
| 36 | Descent with Misaligned Gradients and Applications to Hidden Convexity. Highlight: We consider the problem of minimizing a convex objective given access to an oracle that outputs misaligned stochastic gradients, where the expected value of the output is guaranteed to be correlated with, but not necessarily equal to the true gradient of the objective. | Aditya Bhaskara; Ashok Cutkosky; Ravi Kumar; Manish Purohit |
| 37 | JudgeBench: A Benchmark for Evaluating LLM-Based Judges. Highlight: Existing benchmarks primarily focus on a judge’s alignment with human preferences, but often fail to account for more challenging tasks where crowdsourced human preference is a poor indicator of factual and logical correctness. To address this, we propose a novel evaluation framework to objectively evaluate LLM-based judges. | Sijun Tan; Siyuan Zhuang; Kyle Montgomery; William Yuan Tang; Alejandro Cuadron; Chenguang Wang; Raluca Popa; Ion Stoica |
| 38 | SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? Highlight: This limited coverage motivates our inquiry into how existing systems might perform on unrepresented software engineering domains (e.g., front-end, game development, DevOps), which use different programming languages and paradigms. Therefore, we propose SWE-bench Multimodal (SWE-bench M), to evaluate systems on their ability to fix bugs in visual, user-facing JavaScript software. | John Yang; Carlos E Jimenez; Alex L Zhang; Kilian Lieret; Joyce Yang; Xindi Wu; Ori Press; Niklas Muennighoff; Gabriel Synnaeve; Karthik R Narasimhan; Diyi Yang; Sida Wang; Ofir Press |
| 39 | MAVIS: Mathematical Visual Instruction Tuning with An Automatic Data Engine. Highlight: This draws forth an urgent demand for an effective training paradigm and a large-scale, comprehensive dataset with detailed CoT rationales, which is challenging to collect and costly to annotate manually. To tackle this issue, we propose MAVIS, a MAthematical VISual instruction tuning pipeline for MLLMs, featuring an automatic data engine to efficiently create mathematical visual datasets. | Renrui Zhang; Xinyu Wei; Dongzhi Jiang; Ziyu Guo; Yichi Zhang; Chengzhuo Tong; Jiaming Liu; Aojun Zhou; Shanghang Zhang; Peng Gao; Hongsheng Li |
| 40 | Simple Guidance Mechanisms for Discrete Diffusion Models. Highlight: However, controllable diffusion on discrete data faces challenges given that continuous guidance methods do not directly apply to discrete diffusion. Here, we provide a straightforward derivation of classifier-free and classifier-based guidance for discrete diffusion, as well as a new class of diffusion models that leverage uniform noise and that are more guidable because they can continuously edit their outputs. | Yair Schiff; Subham Sekhar Sahoo; Hao Phung; Guanghan Wang; Sam Boshar; Hugo Dalla-torre; Bernardo P de Almeida; Alexander M Rush; Thomas PIERROT; Volodymyr Kuleshov |
| 41 | Depth Pro: Sharp Monocular Metric Depth in Less Than A Second. Highlight: We present a foundation model for zero-shot metric monocular depth estimation. | Alexey Bochkovskiy; Amaël Delaunoy; Hugo Germain; Marcel Santos; Yichao Zhou; Stephan Richter; Vladlen Koltun |
| 42 | Improving Pretraining Data Using Perplexity Correlations. Highlight: However, progress in understanding pretraining data has been slow due to the costly pretraining runs required for data selection experiments. We present a framework that avoids these costs and selects high-quality pretraining data without any LLM training of our own. | Tristan Thrush; Christopher Potts; Tatsunori Hashimoto |
| 43 | Adaptive Length Image Tokenization Via Recurrent Allocation. Highlight: This contrasts with human intelligence, and even large language models, which allocate varying representational capacities based on entropy, context and familiarity. Inspired by this, we propose an approach to learn variable-length token representations for 2D images. | Shivam Duggal; Phillip Isola; Antonio Torralba; William T. Freeman |
| 44 | Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution. Highlight: To solve the problem, we propose Oryx, a unified multimodal architecture for the spatial-temporal understanding of images, videos, and multi-view 3D scenes. | Zuyan Liu; Yuhao Dong; Ziwei Liu; Winston Hu; Jiwen Lu; Yongming Rao |
| 45 | AutoBencher: Towards Declarative Benchmark Construction. Highlight: We present AutoBencher, a declarative framework for automatic benchmark construction, and use it to scalably discover novel insights and vulnerabilities of existing language models. | Xiang Lisa Li; Farzaan Kaiyom; Evan Zheran Liu; Yifan Mai; Percy Liang; Tatsunori Hashimoto |
| 46 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data. Highlight: With the goal of creating a high-quality finetuning (SFT) dataset for math reasoning, we conduct careful ablation experiments on data synthesis using the recently released Llama3.1 family of models. Based on these insights, we create the OpenMathInstruct-2 dataset which consists of 14M question-solution pairs (≈ 600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset. Finally, to accelerate the open-source efforts, we release the code, the finetuned models, and the OpenMathInstruct-2 dataset under a commercially permissive license. | Shubham Toshniwal; Wei Du; Ivan Moshkov; Branislav Kisacanin; Alexan Ayrapetyan; Igor Gitman |
| 47 | Synthetic Continued Pretraining. Highlight: This poses a challenge when adapting a pretrained model to a small corpus of domain-specific documents, where each fact may appear rarely or only once. We propose to bridge this gap with synthetic continued pretraining: using the small domain-specific corpus to synthesize a large corpus more amenable to learning, and then performing continued pretraining on the synthesized corpus. | Zitong Yang; Neil Band; Shuangping Li; Emmanuel Candes; Tatsunori Hashimoto |
| 48 | HELMET: How to Evaluate Long-context Models Effectively and Thoroughly. Highlight: In this work, we introduce HELMET (How to Evaluate Long-context Models Effectively and Thoroughly), a comprehensive benchmark encompassing seven diverse, application-centric categories. | Howard Yen; Tianyu Gao; Minmin Hou; Ke Ding; Daniel Fleischer; Peter Izsak; Moshe Wasserblat; Danqi Chen |
| 49 | KBLaM: Knowledge Base Augmented Language Model. Highlight: In this paper, we propose Knowledge Base augmented Language Model (KBLAM), a new method for augmenting Large Language Models (LLMs) with external knowledge. | Xi Wang; Taketomo Isazawa; Liana Mikaelyan; James Hensman |
| 50 | Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning. Highlight: However, these studies often rely on real-world data that LLMs may have encountered during pre-training or employ anonymization techniques that can inadvertently introduce factual inconsistencies. In this work, we address these limitations by introducing novel synthetic datasets specifically designed to assess LLM temporal reasoning abilities in various scenarios. | Bahare Fatemi; Mehran Kazemi; Anton Tsitsulin; Karishma Malkan; Jinyeong Yim; John Palowitch; Sungyong Seo; Jonathan Halcrow; Bryan Perozzi |
| 51 | UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models. Highlight: We introduce UniCon, a novel architecture designed to enhance control and efficiency in training adapters for large-scale diffusion models. | Fanghua Yu; Jinjin Gu; Jinfan Hu; Zheyuan Li; Chao Dong |
| 52 | LLMs Know More Than They Show: On The Intrinsic Representation of LLM Hallucinations. Highlight: In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. | Hadas Orgad; Michael Toker; Zorik Gekhman; Roi Reichart; Idan Szpektor; Hadas Kotek; Yonatan Belinkov |
| 53 | LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code. Highlight: In this work, we propose LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which collects new problems over time from contests across three competition platforms, Leetcode, Atcoder, and Codeforces. | Naman Jain; King Han; Alex Gu; Wen-Ding Li; Fanjia Yan; Tianjun Zhang; Sida Wang; Armando Solar-Lezama; Koushik Sen; Ion Stoica |
| 54 | NV-Embed: Improved Techniques for Training LLMs As Generalist Embedding Models. Highlight: In this work, we introduce the NV-Embed model, incorporating architectural designs, training procedures, and curated datasets to significantly enhance the performance of LLM as a versatile embedding model, while maintaining its simplicity and reproducibility. | Chankyu Lee; Rajarshi Roy; Mengyao Xu; Jonathan Raiman; Mohammad Shoeybi; Bryan Catanzaro; Wei Ping |
| 55 | Locality Alignment Improves Vision-Language Models. Highlight: We hypothesize that this is due to VLMs adopting pre-trained vision backbones, specifically vision transformers (ViTs) trained with image-level supervision and minimal inductive biases. Such models may fail to encode the class contents at each position in the image, and our goal is to resolve this with a vision backbone that effectively captures both local and global image semantics. | Ian Connick Covert; Tony Sun; James Zou; Tatsunori Hashimoto |
| 56 | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. Highlight: We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos that align seamlessly with text prompts, with a frame rate of 16 fps and resolution of 768 x 1360 pixels. | Zhuoyi Yang; Jiayan Teng; Wendi Zheng; Ming Ding; Shiyu Huang; Jiazheng Xu; Yuanming Yang; Wenyi Hong; Xiaohan Zhang; Guanyu Feng; Da Yin; Yuxuan.Zhang; Weihan Wang; Yean Cheng; Bin Xu; Xiaotao Gu; Yuxiao Dong; Jie Tang |
| 57 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. Highlight: We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. | Jinheng Xie; Weijia Mao; Zechen Bai; David Junhao Zhang; Weihao Wang; Kevin Qinghong Lin; Yuchao Gu; Zhijie Chen; Zhenheng Yang; Mike Zheng Shou |
| 58 | EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation. Highlight: In this paper, we propose an Auto-regressive Auto-encoder (ArAE) model capable of generating high-quality 3D meshes with up to 4,000 faces at a spatial resolution of $512^3$. | Jiaxiang Tang; Zhaoshuo Li; Zekun Hao; Xian Liu; Gang Zeng; Ming-Yu Liu; Qinsheng Zhang |
| 59 | GSM-Symbolic: Understanding The Limitations of Mathematical Reasoning in Large Language Models. Highlight: To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. | Seyed Iman Mirzadeh; Keivan Alizadeh; Hooman Shahrokhi; Oncel Tuzel; Samy Bengio; Mehrdad Farajtabar |
| 60 | WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in The Wild. Highlight: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. | Bill Yuchen Lin; Yuntian Deng; Khyathi Chandu; Abhilasha Ravichander; Valentina Pyatkin; Nouha Dziri; Ronan Le Bras; Yejin Choi |
| 61 | Bidirectional Decoding: Improving Action Chunking Via Guided Test-Time Sampling. Highlight: We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity to unexpected states. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. | Yuejiang Liu; Jubayer Ibn Hamid; Annie Xie; Yoonho Lee; Max Du; Chelsea Finn |
| 62 | Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models. Highlight: In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. | Marianne Arriola; Aaron Gokaslan; Justin T Chiu; Zhihan Yang; Zhixuan Qi; Jiaqi Han; Subham Sekhar Sahoo; Volodymyr Kuleshov |
| 63 | Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data. Highlight: In this paper, we show that retaining offline data is unnecessary as long as we use a properly-designed online RL approach for fine-tuning offline RL initializations. | Zhiyuan Zhou; Andy Peng; Qiyang Li; Sergey Levine; Aviral Kumar |
| 64 | Gated Delta Networks: Improving Mamba2 with Delta Rule. Highlight: We observe that these mechanisms are complementary—gating enables rapid memory erasure while the delta rule facilitates targeted updates. Building on this insight, we introduce the gated delta rule and develop a parallel training algorithm optimized for modern hardware. | Songlin Yang; Jan Kautz; Ali Hatamizadeh |
| 65 | Perplexed By Perplexity: Perplexity-Based Data Pruning With Small Reference Models. Highlight: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. | Zachary Ankner; Cody Blakeney; Kartik Sreenivasan; Max Marion; Matthew L Leavitt; Mansheej Paul |
| 66 | Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents. Highlight: In this paper, we develop an approach, called Digi-Q, to train VLM-based action-value Q-functions which are then used to extract the agent policy. | Hao Bai; Yifei Zhou; Li Erran Li; Sergey Levine; Aviral Kumar |
| 67 | Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models. Highlight: Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects if an entity is one it can recall facts about. | Javier Ferrando; Oscar Balcells Obeso; Senthooran Rajamanoharan; Neel Nanda |
| 68 | How to Evaluate Reward Models for RLHF. Highlight: We introduce a new benchmark for reward models that quantifies their ability to produce strong language models through RLHF (Reinforcement Learning from Human Feedback). | Evan Frick; Tianle Li; Connor Chen; Wei-Lin Chiang; Anastasios Nikolas Angelopoulos; Jiantao Jiao; Banghua Zhu; Joseph E. Gonzalez; Ion Stoica |
| 69 | Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Parameters for Reasoning. Highlight: Enabling LLMs to improve their outputs by using more test-time compute is a critical step towards building self-improving agents that can operate on open-ended natural language. In this paper, we scale up inference-time computation in LLMs, with a focus on answering: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? | Charlie Victor Snell; Jaehoon Lee; Kelvin Xu; Aviral Kumar |
| 70 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning. Highlight: Our key insight is that, to be effective, the process reward for a step should measure *progress*: a change in the likelihood of producing a correct response in the future, before and after taking the step, as measured under a *prover* policy distinct from the base policy. | Amrith Setlur; Chirag Nagpal; Adam Fisch; Xinyang Geng; Jacob Eisenstein; Rishabh Agarwal; Alekh Agarwal; Jonathan Berant; Aviral Kumar |
| 71 | To Code or Not To Code? Exploring Impact of Code in Pre-training. Highlight: In this work, we systematically investigate the impact of code data on general performance. | Viraat Aryabumi; Yixuan Su; Raymond Ma; Adrien Morisot; Ivan Zhang; Acyr Locatelli; Marzieh Fadaee; Ahmet Üstün; Sara Hooker |
| 72 | MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. Highlight: We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. | Jun Shern Chan; Neil Chowdhury; Oliver Jaffe; James Aung; Dane Sherburn; Evan Mays; Giulio Starace; Kevin Liu; Leon Maksin; Tejal Patwardhan; Aleksander Madry; Lilian Weng |
| 73 | Scaling Diffusion Language Models Via Adaptation from Autoregressive Models. Highlight: We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. | Shansan Gong; Shivam Agarwal; Yizhe Zhang; Jiacheng Ye; Lin Zheng; Mukai Li; Chenxin An; Peilin Zhao; Wei Bi; Jiawei Han; Hao Peng; Lingpeng Kong |
| 74 | EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage. Highlight: However, web tasks, such as booking flights, usually involve users’ personally identifiable information (PII), which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites—a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. | Zeyi Liao; Lingbo Mo; Chejian Xu; Mintong Kang; Jiawei Zhang; Chaowei Xiao; Yuan Tian; Bo Li; Huan Sun |
| 75 | Mixture of Parrots: Experts Improve Memorization More Than Reasoning. Highlight: In this paper, we show that as we increase the number of experts (while fixing the number of active parameters), the memorization performance consistently increases while the reasoning capabilities saturate. | Samy Jelassi; Clara Mohri; David Brandfonbrener; Alex Gu; Nikhil Vyas; Nikhil Anand; David Alvarez-Melis; Yuanzhi Li; Sham M. Kakade; eran malach |
| 76 | 3DitScene: Editing Any Scene Via Language-guided Disentangled Gaussian Splatting. Highlight: In this work, we propose 3DitScene, a novel and unified scene editing framework leveraging language-guided disentangled Gaussian Splatting that enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects. | Qihang Zhang; Yinghao Xu; Chaoyang Wang; Hsin-Ying Lee; Gordon Wetzstein; Bolei Zhou; Ceyuan Yang |
| 77 | CameraCtrl: Enabling Camera Control for Video Diffusion Models. Highlight: Existing models, however, lack control of camera pose that serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for video diffusion models. | Hao He; Yinghao Xu; Yuwei Guo; Gordon Wetzstein; Bo Dai; Hongsheng Li; Ceyuan Yang |
| 78 | Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems. Highlight: Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to “self-correct” their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating “error-correction” data directly into the pretraining stage. | Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu |
| 79 | Physics of Language Models: Part 2.1, Grade-School Math and The Hidden Reasoning Process. Highlight: Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. | Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu |
| 80 | Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. Highlight: More broadly, we present 12 results on how (1) training duration, (2) model architecture, (3) quantization, (4) sparsity constraints such as MoE, and (5) data signal-to-noise ratio affect a model’s knowledge storage capacity. | Zeyuan Allen-Zhu; Yuanzhi Li |
| 81 | Physics of Language Models: Part 3.2, Knowledge Manipulation. Highlight: Moreover, their performance in inverse knowledge search is virtually 0%, regardless of the prompts. Our primary contribution is a *controlled, synthetic experiment* that confirms these weaknesses are *inherent* to language models: they cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored in the models, despite adequate training and sufficient model size. | Zeyuan Allen-Zhu; Yuanzhi Li |
| 82 | Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models. Highlight: We introduce methods for discovering and applying **sparse feature circuits**. | Samuel Marks; Can Rager; Eric J Michaud; Yonatan Belinkov; David Bau; Aaron Mueller |
| 83 | EqNIO: Subequivariant Neural Inertial Odometry. Highlight: These priors learn to produce denoised displacement measurements but need to ignore data variations due to specific IMU mount orientation and motion directions, hindering generalization. This work introduces EqNIO, which addresses this challenge with _canonical displacement priors_, i.e., priors that are invariant to the orientation of the gravity-aligned frame in which the IMU data is expressed. | Royina Karegoudra Jayanth; Yinshuang Xu; Ziyun Wang; Evangelos Chatzipantazis; Kostas Daniilidis; Daniel Gehrig |
| 84 | Reasoning with Latent Thoughts: On The Power of Looped Transformers. Highlight: In this work, we make a stronger claim — many reasoning problems require a large depth but not necessarily many parameters. | Nikunj Saunshi; Nishanth Dikkala; Zhiyuan Li; Sanjiv Kumar; Sashank J. Reddi |
| 85 | Improving Instruction-Following in Language Models Through Activation Steering. Highlight: The ability to follow instructions is crucial for numerous real-world applications of language models. In pursuit of deeper insights and more powerful capabilities, we derive instruction-specific vector representations from language models and use them to steer models accordingly. | Alessandro Stolfo; Vidhisha Balachandran; Safoora Yousefi; Eric Horvitz; Besmira Nushi |
| 86 | Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach. Highlight: A key challenge in TVR is the information asymmetry between video and text: videos are inherently richer in information, while their textual descriptions often capture only fragments of this complexity. This paper introduces a novel, data-centric framework to bridge this gap by enriching textual representations to better match the richness of video content. | Zechen Bai; Tianjun Xiao; Tong He; Pichao WANG; Zheng Zhang; Thomas Brox; Mike Zheng Shou |
| 87 | Advancing LLM Reasoning Generalists with Preference Trees. Highlight: We introduce EURUS, a suite of large language models (LLMs) optimized for reasoning. | Lifan Yuan; Ganqu Cui; Hanbin Wang; Ning Ding; Xingyao Wang; Boji Shan; Zeyuan Liu; Jia Deng; Huimin Chen; Ruobing Xie; Yankai Lin; Zhenghao Liu; Bowen Zhou; Hao Peng; Zhiyuan Liu; Maosong Sun |
| 88 | Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages. Highlight: This paper introduces PANGEA, a multilingual multimodal LLM trained on PANGEAINS, a diverse 6M instruction dataset spanning 39 languages. | Xiang Yue; Yueqi Song; Akari Asai; Seungone Kim; Jean de Dieu Nyandwi; Simran Khanuja; Anjali Kantharuban; Lintang Sutawika; Sathyanarayanan Ramamoorthy; Graham Neubig |
| 89 | BOND: Aligning LLMs with Best-of-N Distillation. Highlight: Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time. | Pier Giuseppe Sessa; Robert Dadashi-Tazehozi; Leonard Hussenot; Johan Ferret; Nino Vieillard; Alexandre Rame; Bobak Shahriari; Sarah Perrin; Abram L. Friesen; Geoffrey Cideron; Sertan Girgin; Piotr Stanczyk; Andrea Michi; Danila Sinopalnikov; Sabela Ramos Garea; Amélie Héliou; Aliaksei Severyn; Matthew Hoffman; Nikola Momchev; Olivier Bachem |
| 90 | A Decade’s Battle on Dataset Bias: Are We There Yet? Highlight: We revisit the “dataset classification” experiment suggested by Torralba & Efros (2011) a decade ago, in the new era with large-scale, diverse, and hopefully less biased datasets as well as more capable neural network architectures. | Zhuang Liu; Kaiming He |
| 91 | Diffusion Policy Policy Optimization. Highlight: We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). | Allen Z. Ren; Justin Lidard; Lars Lien Ankile; Anthony Simeonov; Pulkit Agrawal; Anirudha Majumdar; Benjamin Burchfiel; Hongkai Dai; Max Simchowitz |
| 92 | MPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models. Highlight: In this work, we introduce the versatile multi-modal large language model, mPLUG-Owl3, which enhances the capability for long image-sequence understanding in scenarios that incorporate retrieved image-text knowledge, multimodal in-context examples, and lengthy videos. | Jiabo Ye; Haiyang Xu; Haowei Liu; Anwen Hu; Ming Yan; Qi Qian; Ji Zhang; Fei Huang; Jingren Zhou |
| 93 | SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal. Highlight: We investigate design choices for creating a fast, accurate automated safety evaluator. | Tinghao Xie; Xiangyu Qi; Yi Zeng; Yangsibo Huang; Udari Madhushani Sehwag; Kaixuan Huang; Luxi He; Boyi Wei; Dacheng Li; Ying Sheng; Ruoxi Jia; Bo Li; Kai Li; Danqi Chen; Peter Henderson; Prateek Mittal |
| 94 | Agent S: An Open Agentic Framework That Uses Computers Like A Human. Highlight: We present Agent S, an open agentic framework that enables autonomous interaction with computers through Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks. | Saaket Agashe; Jiuzhou Han; Shuyu Gan; Jiachen Yang; Ang Li; Xin Eric Wang |
| 95 | Navigating The Digital World As Humans Do: Universal Visual Grounding for GUI Agents. Highlight: In this paper, we advocate a human-like embodiment for GUI agents that perceive the environment entirely visually and directly perform pixel-level operations on the GUI. We collect the largest dataset for GUI visual grounding so far, containing 10M GUI elements and their referring expressions over 1.3M screenshots, and use it to train UGround, a strong universal visual grounding model for GUI agents. | Boyu Gou; Ruohan Wang; Boyuan Zheng; Yanan Xie; Cheng Chang; Yiheng Shu; Huan Sun; Yu Su |
| 96 | Not All Language Model Features Are One-Dimensionally Linear. Highlight: We begin by developing a rigorous definition of irreducible multi-dimensional features based on whether they can be decomposed into either independent or non-co-occurring lower-dimensional features. Motivated by these definitions, we design a scalable method that uses sparse autoencoders to automatically find multi-dimensional features in GPT-2 and Mistral 7B. | Joshua Engels; Eric J Michaud; Isaac Liao; Wes Gurnee; Max Tegmark |
| 97 | Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models. Highlight: However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on the MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs’ mathematical reasoning at the Olympiad level. | Bofei Gao; Feifan Song; Zhe Yang; Zefan Cai; Yibo Miao; Qingxiu Dong; Lei Li; Chenghao Ma; Liang Chen; Runxin Xu; Zhengyang Tang; Benyou Wang; Daoguang Zan; Shanghaoran Quan; Ge Zhang; Lei Sha; Yichang Zhang; Xuancheng Ren; Tianyu Liu; Baobao Chang |
| 98 | Deconstructing Denoising Diffusion Models for Self-Supervised Learning. Highlight: In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation. | Xinlei Chen; Zhuang Liu; Saining Xie; Kaiming He |
| 99 | AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs. Highlight: Despite diverse strategies (e.g., cipher, low-resource language, persuasions, and so on) that have been proposed and shown success, these strategies are still manually designed, limiting their scope and effectiveness as a red-teaming tool. In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming. | Xiaogeng Liu; Peiran Li; G. Edward Suh; Yevgeniy Vorobeychik; Zhuoqing Mao; Somesh Jha; Patrick McDaniel; Huan Sun; Bo Li; Chaowei Xiao |
| 100 | HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing. Highlight: This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. | Mude Hui; Siwei Yang; Bingchen Zhao; Yichun Shi; Heng Wang; Peng Wang; Cihang Xie; Yuyin Zhou |
| 101 | SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the impact of different memory granularities and present two key findings: (1) Both turn-level and session-level memory units are suboptimal, affecting not only the quality of final responses, but also the accuracy of the retrieval process. |
Zhuoshi Pan; Qianhui Wu; Huiqiang Jiang; Xufang Luo; Hao Cheng; Dongsheng Li; Yuqing Yang; Chin-Yew Lin; H. Vicky Zhao; Lili Qiu; Jianfeng Gao; |
| 102 | Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on a popular class of vision-language models (VLMs) that generate text outputs conditioned on visual and textual inputs. |
Rylan Schaeffer; Dan Valentine; Luke Bailey; James Chua; Cristobal Eyzaguirre; Zane Durante; Joe Benton; Brando Miranda; Henry Sleight; Tony Tong Wang; John Hughes; Rajashree Agrawal; Mrinank Sharma; Scott Emmons; Sanmi Koyejo; Ethan Perez; |
| 103 | Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing By Imposing Consistent Light Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without appropriate constraints, directly training the latest large image models with complex, varied, or in-the-wild data is likely to produce a structure-guided random image generator, rather than achieving the intended goal of precise illumination manipulation. We propose Imposing Consistent Light (IC-Light) transport during training, rooted in the physical principle that the linear blending of an object’s appearances under different illumination conditions is consistent with its appearance under mixed illumination. |
Lvmin Zhang; Anyi Rao; Maneesh Agrawala; |
| 104 | On The Self-verification Limitations of Large Language Models on Reasoning and Planning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we set out to systematically investigate the effectiveness of iterative prompting in the context of reasoning and planning. |
Kaya Stechly; Karthik Valmeekam; Subbarao Kambhampati; |
| 105 | Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study *behavioral self-awareness*, which we define as an LLM’s capability to articulate its behavioral policies without relying on in-context examples. |
Jan Betley; Xuchan Bao; Martín Soto; Anna Sztyber-Betley; James Chua; Owain Evans; |
| 106 | RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the Robotics Diffusion Transformer (RDT), a pioneering diffusion foundation model for bimanual manipulation. |
Songming Liu; Lingxuan Wu; Bangguo Li; Hengkai Tan; Huayu Chen; Zhengyi Wang; Ke Xu; Hang Su; Jun Zhu; |
| 107 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce NoPoSplat, a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from unposed sparse multi-view images. |
Botao Ye; Sifei Liu; Haofei Xu; Xueting Li; Marc Pollefeys; Ming-Hsuan Yang; Songyou Peng; |
| 108 | Safety Alignment Should Be Made More Than Just A Few Tokens Deep Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We refer to this issue collectively as shallow safety alignment. In this paper, we present case studies to explain why shallow safety alignment can exist and show how this issue universally contributes to multiple recently discovered vulnerabilities in LLMs, including the susceptibility to adversarial suffix attacks, prefilling attacks, decoding parameter attacks, and fine-tuning attacks. |
Xiangyu Qi; Ashwinee Panda; Kaifeng Lyu; Xiao Ma; Subhrajit Roy; Ahmad Beirami; Prateek Mittal; Peter Henderson; |
| 109 | On Evaluating The Durability of Safeguards for Open-Weight LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Several recent studies have proposed methods to produce durable LLM safeguards for open-weight LLMs that can withstand adversarial modifications of the model’s weights via fine-tuning. This holds the promise of raising adversaries’ costs even under strong threat models where adversaries can directly fine-tune parameters. However, we caution against over-reliance on such methods in their current state. Through several case studies, we demonstrate that even the evaluation of these defenses is exceedingly difficult and can easily mislead audiences into thinking that safeguards are more durable than they really are. |
Xiangyu Qi; Boyi Wei; Nicholas Carlini; Yangsibo Huang; Tinghao Xie; Luxi He; Matthew Jagielski; Milad Nasr; Prateek Mittal; Peter Henderson; |
| 110 | Preference Optimization for Reasoning with Pseudo Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce a novel approach to generate pseudo feedback for reasoning tasks by framing the labeling of solutions to reasoning problems as an evaluation against associated *test cases*. |
Fangkai Jiao; Geyang Guo; Xingxing Zhang; Nancy F. Chen; Shafiq Joty; Furu Wei; |
| 111 | SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current large language model (LLM)-based software agents often follow linear, sequential processes that prevent backtracking and exploration of alternative solutions, limiting their ability to rethink their strategies when initial approaches prove ineffective. To address these challenges, we propose SWE-Search, a multi-agent framework that integrates Monte Carlo Tree Search (MCTS) with a self-improvement mechanism to enhance software agents’ performance on repository-level software tasks. |
Antonis Antoniades; Albert Örwall; Kexun Zhang; Yuxi Xie; Anirudh Goyal; William Yang Wang; |
| 112 | Selective Attention Improves Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Selective Attention, a simple parameter-free change to the standard attention mechanism which reduces attention to unneeded elements. |
Yaniv Leviathan; Matan Kalman; Yossi Matias; |
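A rough illustration of the selection idea above, in PyTorch: reuse the attention logits themselves as selection scores, accumulate them causally, and subtract the accumulated mass from later queries' logits so unneeded keys receive less attention. This is our own sketch of a parameter-free selection mechanism, not the paper's exact formulation, and all names in it are ours.

```python
import torch
import torch.nn.functional as F

def selective_attention(q, k, v):
    """Illustrative sketch: causal attention where each token can down-weight
    ("select away") keys for all later queries, with no extra parameters."""
    seq, dim = q.shape
    logits = q @ k.T / dim ** 0.5                         # (seq, seq)
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    logits = logits.masked_fill(~causal, float("-inf"))

    # Selection scores: non-negative maskings derived from the logits;
    # a token never masks itself.
    sel = F.relu(logits.masked_fill(~causal, 0.0))
    sel = sel - torch.diag(torch.diag(sel))               # zero the diagonal
    accum = torch.cumsum(sel, dim=0) - sel                # maskings from strictly earlier queries
    logits = logits - accum                               # reduce attention to selected-away keys
    return F.softmax(logits, dim=-1) @ v

q = k = v = torch.randn(8, 16)
out = selective_attention(q, k, v)                        # (8, 16)
```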
| 113 | Layerwise Recurrent Router for Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Being a crucial part of MoE, current routers in different layers independently assign tokens without leveraging historical routing information, potentially leading to suboptimal token-expert combinations and the parameter inefficiency problem. To alleviate this issue, we introduce the Layerwise Recurrent Router for Mixture-of-Experts (RMoE). |
Zihan Qiu; Zeyu Huang; Shuang Cheng; Yizhi Zhou; Zili Wang; Ivan Titov; Jie Fu; |
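A minimal sketch of the recurrent-routing idea, assuming a GRU carries each token's routing history across layers; the real model's per-layer gates, load balancing, and expert computation are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class RecurrentRouter(nn.Module):
    """Layerwise recurrent MoE router in the spirit of RMoE: a GRU carries each
    token's routing history across layers, and the gate conditions on it."""
    def __init__(self, d_model, n_experts, d_hidden=128):
        super().__init__()
        self.cell = nn.GRUCell(d_model, d_hidden)   # shared across layers here
        self.gate = nn.Linear(d_hidden, n_experts)  # per-layer in a real model

    def forward(self, x, h):
        # x: (tokens, d_model) hidden states entering this layer
        # h: (tokens, d_hidden) recurrent routing state from previous layers
        h = self.cell(x, h)
        top1 = self.gate(h).argmax(dim=-1)          # top-1 expert per token
        return top1, h                              # h is passed to the next layer

router = RecurrentRouter(d_model=512, n_experts=8)
x, h = torch.randn(16, 512), torch.zeros(16, 128)
for _ in range(4):                                  # four MoE layers
    experts, h = router(x, h)                       # routing reuses history via h
```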
| 114 | Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the “Diffuse Risk Management” problem, aiming to balance the average-case safety and usefulness in the deployment of untrusted models over a large sequence of tasks. |
Jiaxin Wen; Vivek Hebbar; Caleb Larson; Aryan Bhatt; Ansh Radhakrishnan; Mrinank Sharma; Henry Sleight; Shi Feng; He He; Ethan Perez; Buck Shlegeris; Akbir Khan; |
| 115 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs), enabling a broader search scenario, termed universal multimodal retrieval, where multiple modalities and diverse retrieval tasks are accommodated. |
Sheng-Chieh Lin; Chankyu Lee; Mohammad Shoeybi; Jimmy Lin; Bryan Catanzaro; Wei Ping; |
| 116 | Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Add-it, a training-free approach that extends diffusion models’ attention mechanisms to incorporate information from three key sources: the scene image, the text prompt, and the generated image itself. |
Yoad Tewel; Rinon Gal; Dvir Samuel; Yuval Atzmon; Lior Wolf; Gal Chechik; |
| 117 | SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our analysis reveals that LLMs exhibit great potential for self-acceleration through layer sparsity and the task-specific nature of this sparsity. Building on these insights, we introduce SWIFT, an on-the-fly self-speculative decoding algorithm that adaptively selects intermediate layers of LLMs to skip during inference. |
Heming Xia; Yongqi Li; Jun Zhang; Cunxiao Du; Wenjie Li; |
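The draft-then-verify loop behind self-speculative decoding can be sketched as follows, assuming greedy decoding and two hypothetical callables: `draft_step` (the same model with some intermediate layers skipped) and `full_step` (the full model), each returning next-token logits for every position. SWIFT's on-the-fly layer-selection logic itself is not shown.

```python
import torch

def self_speculative_decode(full_step, draft_step, prompt_ids, n_draft=4, max_new=64):
    """Greedy self-speculative decoding sketch: draft cheaply with skipped
    layers, verify all drafted tokens in one full forward pass, and accept
    the longest prefix on which the two models agree."""
    ids = prompt_ids
    target_len = prompt_ids.shape[-1] + max_new
    while ids.shape[-1] < target_len:
        draft = ids
        for _ in range(n_draft):                     # 1) cheap drafting
            nxt = draft_step(draft)[..., -1, :].argmax(-1, keepdim=True)
            draft = torch.cat([draft, nxt], dim=-1)
        logits = full_step(draft)                    # 2) one full verification pass
        start = ids.shape[-1]
        verified = logits[..., start - 1:-1, :].argmax(-1)   # full model's choices
        drafted = draft[..., start:]
        match = (verified == drafted).long().cumprod(-1)     # longest agreeing prefix
        n_ok = int(match.sum())
        ids = torch.cat([ids, drafted[..., :n_ok]], dim=-1)
        if n_ok < n_draft:                           # replace the first mismatch
            ids = torch.cat([ids, verified[..., n_ok:n_ok + 1]], dim=-1)
    return ids

# Demo with a stand-in "model" returning random logits (replace with real ones).
dummy = lambda ids: torch.randn(1, ids.shape[-1], 100)
out = self_speculative_decode(dummy, dummy, torch.zeros(1, 4, dtype=torch.long), max_new=8)
```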
| 118 | OGBench: Benchmarking Offline Goal-Conditioned RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose OGBench, a new, high-quality benchmark for algorithms research in offline goal-conditioned RL. |
Seohong Park; Kevin Frans; Benjamin Eysenbach; Sergey Levine; |
| 119 | OpenHands: An Open Platform for AI Software Developers As Generalist Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce OpenHands, a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to a human developer: by writing code, interacting with a command line, and browsing the web. |
Xingyao Wang; Boxuan Li; Yufan Song; Frank F. Xu; Xiangru Tang; Mingchen Zhuge; Jiayi Pan; Yueqi Song; Bowen Li; Jaskirat Singh; Hoang H. Tran; Fuqiang Li; Ren Ma; Mingzhang Zheng; Bill Qian; Yanjun Shao; Niklas Muennighoff; Yizhe Zhang; Binyuan Hui; Junyang Lin; Robert Brennan; Hao Peng; Heng Ji; Graham Neubig; |
| 120 | KAN: Kolmogorov–Arnold Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). |
Ziming Liu; Yixuan Wang; Sachin Vaidya; Fabian Ruehle; James Halverson; Marin Soljacic; Thomas Y. Hou; Max Tegmark; |
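A toy sketch of the KAN idea, where learnable univariate functions sit on the edges rather than fixed activations on the nodes. Here each edge function is a linear combination of fixed Gaussian bumps with learnable coefficients; the paper uses B-splines plus a base activation, so this is a simplification.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Toy KAN-style layer: each edge (i -> o) applies its own learnable
    univariate function, parameterized over a fixed Gaussian basis."""
    def __init__(self, d_in, d_out, n_basis=8, lo=-2.0, hi=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(lo, hi, n_basis))
        self.width = (hi - lo) / n_basis
        self.coef = nn.Parameter(0.1 * torch.randn(d_in, d_out, n_basis))

    def forward(self, x):                    # x: (batch, d_in)
        # phi: (batch, d_in, n_basis), each input evaluated on the basis
        phi = torch.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # output[b, o] = sum over edges i of f_io(x_i), f_io = coef . phi
        return torch.einsum("bin,ion->bo", phi, self.coef)

net = nn.Sequential(KANLayer(2, 5), KANLayer(5, 1))   # activations live on edges
y = net(torch.randn(4, 2))
```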
| 121 | Walk The Talk? Measuring The Faithfulness of Large Language Model Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new approach for measuring the faithfulness of LLM explanations. |
Katie Matton; Robert Ness; John Guttag; Emre Kiciman; |
| 122 | RB-Modulation: Training-Free Stylization Using Reference-Based Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. |
Litu Rout; Yujia Chen; Nataniel Ruiz; Abhishek Kumar; Constantine Caramanis; Sanjay Shakkottai; Wen-Sheng Chu; |
| 123 | Semantic Image Inversion and Editing Using Rectified Stochastic Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator, and prove that the resulting vector field is equivalent to a rectified stochastic differential equation. |
Litu Rout; Yujia Chen; Nataniel Ruiz; Constantine Caramanis; Sanjay Shakkottai; Wen-Sheng Chu; |
| 124 | Scaling Speech-Text Pre-training with Synthetic Interleaved Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach to scaling speech-text pre-training by leveraging large-scale synthetic interleaved data derived from text corpora, eliminating the need for parallel speech-text datasets. |
Aohan Zeng; Zhengxiao Du; Mingdao Liu; Lei Zhang; shengmin jiang; Yuxiao Dong; Jie Tang; |
| 125 | Smaller, Weaker, Yet Better: Training LLM Reasoners Via Compute-Optimal Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs. In this work, we revisit whether this strategy is compute-optimal under a fixed inference budget (e.g., FLOPs). |
Hritik Bansal; Arian Hosseini; Rishabh Agarwal; Vinh Q. Tran; Mehran Kazemi; |
| 126 | CREMA: Generalizable and Efficient Video-Language Reasoning Via Multimodal Modular Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite impressive advancements in recent multimodal reasoning approaches, they are still limited in flexibility and efficiency, as these models typically process only a few fixed modality inputs and require updates to numerous parameters. This paper tackles these critical challenges and proposes CREMA, a generalizable, highly efficient, and modular modality-fusion framework that can incorporate many new modalities to enhance video reasoning. |
Shoubin Yu; Jaehong Yoon; Mohit Bansal; |
| 127 | VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the potential of building universal multimodal embeddings capable of handling a broad range of downstream tasks. |
Ziyan Jiang; Rui Meng; Xinyi Yang; Semih Yavuz; Yingbo Zhou; Wenhu Chen; |
| 128 | Data Mixing Laws: Optimizing Data Mixtures By Predicting Language Modeling Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While existing endeavors rely on heuristics or qualitative strategies to tune the proportions, we discover the quantitative predictability of model performance regarding the mixture proportions in function forms, which we refer to as the data mixing laws. |
Jiasheng Ye; Peiju Liu; Tianxiang Sun; Jun Zhan; Yunhua Zhou; Xipeng Qiu; |
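A hedged sketch of how such a law might be fit in practice: assume an exponential functional form over the mixture proportions, fit it on a handful of small-scale runs, and pick the mixture the fitted law predicts to be best. The functional form, measurements, and numbers below are all illustrative, not the paper's.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical small-scale runs: mixture proportions over 3 domains -> val loss.
mixtures = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6],
                     [0.4, 0.4, 0.2], [0.34, 0.33, 0.33]])
losses = np.array([3.12, 3.05, 3.21, 3.02, 3.04])      # made-up measurements

def mixing_law(r, c, k, t1, t2, t3):
    # Assumed instance of a "data mixing law": L(r) = c + k * exp(t . r)
    return c + k * np.exp(r @ np.array([t1, t2, t3]))

params, _ = curve_fit(mixing_law, mixtures, losses,
                      p0=[3.0, 0.1, 0.0, 0.0, 0.0], maxfev=10000)

# Predict unseen mixtures, then pick the best one on a candidate simplex grid.
grid = np.random.dirichlet(np.ones(3), size=1000)
best = grid[np.argmin(mixing_law(grid, *params))]
print("predicted-best mixture:", best)
```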
| 129 | LiveBench: A Challenging, Contamination-Limited LLM Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new benchmark for LLMs designed to be resistant to both test set contamination and the pitfalls of LLM judging and human crowdsourcing. |
Colin White; Samuel Dooley; Manley Roberts; Arka Pal; Benjamin Feuer; Siddhartha Jain; Ravid Shwartz-Ziv; Neel Jain; Khalid Saifullah; Sreemanti Dey; Shubh-Agrawal; Sandeep Singh Sandha; Siddartha Venkat Naidu; Chinmay Hegde; Yann LeCun; Tom Goldstein; Willie Neiswanger; Micah Goldblum; |
| 130 | Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, our imprecise understanding of ground-truth features in realistic scenarios makes it difficult to measure the success of SAEs. To address this challenge, we propose to evaluate SAEs on specific tasks by comparing them to supervised feature dictionaries computed with knowledge of the concepts relevant to the task. |
Aleksandar Makelov; Georg Lange; Neel Nanda; |
| 131 | ParamΔ for Direct Mixing: Post-Train Large Language Model At Zero Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces ParamΔ, an innovative approach that streamlines the post-training process by transferring knowledge and capability from an existing post-trained model to a newly upgraded base model without additional training. |
Sheng Cao; Mingrui Wu; Karthik Prasad; Yuandong Tian; Zechun Liu; |
| 132 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose to tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism that incorporates spatiotemporal camera embeddings based on Plücker coordinates. |
Sherwin Bahmani; Ivan Skorokhodov; Aliaksandr Siarohin; Willi Menapace; Guocheng Qian; Michael Vasilkovsky; Hsin-Ying Lee; Chaoyang Wang; Jiaxu Zou; Andrea Tagliasacchi; David B. Lindell; Sergey Tulyakov; |
| 133 | Simplifying Deep Temporal Difference Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, the delayed updating of frozen network parameters in the target network harms the sample efficiency and, similarly, the large replay buffer introduces memory and implementation overheads. In this paper, we investigate whether it is possible to accelerate and simplify off-policy TD training while maintaining its stability. |
Matteo Gallici; Mattie Fellows; Benjamin Ellis; Bartomeu Pou; Ivan Masmitja; Jakob Nicolaus Foerster; Mario Martin; |
| 134 | DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike humans who can reliably apply solution steps to similar problems with minor modifications, we found that state-of-the-art VLMs like GPT-4o can consistently fail in these scenarios, revealing limitations in their mathematical reasoning capabilities. In this paper, we investigate the **mathematical reasoning robustness** in VLMs and evaluate how well these models perform under different variants of the same question, such as changes in visual numerical values or function graphs. |
Chengke Zou; Xingang Guo; Rui Yang; Junyu Zhang; Bin Hu; Huan Zhang; |
| 135 | ColPali: Efficient Document Retrieval with Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To benchmark current systems on visually rich document retrieval, we introduce the Visual Document Retrieval Benchmark *ViDoRe*, composed of various page-level retrieval tasks spanning multiple domains, languages, and practical settings. We release models, data, code and benchmarks under open licenses at https://hf.co/vidore. |
Manuel Faysse; Hugues Sibille; Tony Wu; Bilel Omrani; Gautier Viaud; CELINE HUDELOT; Pierre Colombo; |
| 136 | Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework—CALM—which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. |
Jiayi Ye; Yanbo Wang; Yue Huang; Dongping Chen; Qihui Zhang; Nuno Moniz; Tian Gao; Werner Geyer; Chao Huang; Pin-Yu Chen; Nitesh V Chawla; Xiangliang Zhang; |
| 137 | MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce MIA-Bench, a benchmark designed to assess MLLMs’ ability to strictly adhere to complex instructions. Additionally, we create extra training data and explore supervised fine-tuning and direct preference optimization to enhance the models’ ability to strictly follow instructions without compromising performance on other tasks. |
Yusu Qian; Hanrong Ye; Jean-Philippe Fauconnier; Peter Grasch; Yinfei Yang; Zhe Gan; |
| 138 | AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce AHA, an open-source VLM designed to detect and reason about failures in robotic manipulation using natural language. |
Jiafei Duan; Wilbert Pumacay; Nishanth Kumar; Yi Ru Wang; Shulin Tian; Wentao Yuan; Ranjay Krishna; Dieter Fox; Ajay Mandlekar; Yijie Guo; |
| 139 | RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim to understand whether RNNs can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. |
Kaiyue Wen; Xingyu Dang; Kaifeng Lyu; |
| 140 | JetFormer: An Autoregressive Generative Model of Raw Images and Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we further streamline joint generative modeling of images and text. |
Michael Tschannen; André Susano Pinto; Alexander Kolesnikov; |
| 141 | CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce CyberHost, a one-stage audio-driven talking body generation framework that addresses common synthesis degradations in half-body animation, including hand integrity, identity consistency, and natural motion. |
Gaojie Lin; Jianwen Jiang; Chao Liang; Tianyun Zhong; Jiaqi Yang; Zerong Zheng; Yanbo Zheng; |
| 142 | How Does Vision-Language Adaptation Impact The Safety of Vision Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, our findings demonstrate that the objectives of VL adaptation and safety tuning are divergent, which often results in their simultaneous application being suboptimal. To address this, we suggest the weight merging approach as an optimal solution effectively reducing safety degradation while maintaining helpfulness. |
Seongyun Lee; Geewook Kim; Jiyeon Kim; Hyunji Lee; Hoyeon Chang; Sue Hyun Park; Minjoon Seo; |
| 143 | Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we ask: how do successful models perform formatted MCQA? |
Sarah Wiegreffe; Oyvind Tafjord; Yonatan Belinkov; Hannaneh Hajishirzi; Ashish Sabharwal; |
| 144 | Understanding Factual Recall in Transformers Via Associative Memories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we show that shallow transformers can use a combination of associative memories to obtain such near optimal storage capacity. |
Eshaan Nichani; Jason D. Lee; Alberto Bietti; |
| 145 | B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the critical factors underlying the mechanism of these self-improving methods remain poorly understood, such as under what conditions self-improvement is effective, and what are the bottlenecks in the current iterations. In this work, we identify and propose methods to monitor two pivotal factors in this iterative process: (1) the model’s ability to explore and generate high-quality responses among multiple candidates (exploration); and (2) the reliability of external rewards in selecting the best responses from the generated outputs (exploitation). |
Weihao Zeng; Yuzhen Huang; Lulu Zhao; Yijun Wang; Zifei Shan; Junxian He; |
| 146 | MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel evaluation paradigm for Large Language Models (LLMs) that compels them to transition from a traditional question-answering role, akin to a student, to a solution-scoring role, akin to a teacher. To prove our point, we applied our paradigm to the GSM8K dataset and developed the MR-GSM8K benchmark. |
Zhongshen Zeng; Pengguang Chen; Shu Liu; Haiyun Jiang; Jiaya Jia; |
| 147 | Fantastic Copyrighted Beasts and How (Not) to Generate Them Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, little research has systematically examined these problems: (1) Can users easily prompt models to generate copyrighted characters, even if it is unintentional? (2) How effective are the existing mitigation strategies? To address these questions, we introduce a novel evaluation framework with metrics that assess both the generated image’s similarity to copyrighted characters and its consistency with user intent, grounded in a set of popular copyrighted characters from diverse studios and regions. |
Luxi He; Yangsibo Huang; Weijia Shi; Tinghao Xie; Haotian Liu; Yue Wang; Luke Zettlemoyer; Chiyuan Zhang; Danqi Chen; Peter Henderson; |
| 148 | Why Does The Effective Context Length of LLMs Fall Short? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent work reveals that the effective context lengths of open-source LLMs often fall short, typically not exceeding half of their training lengths. In this work, we attribute this limitation to the left-skewed frequency distribution of relative positions formed in LLMs pretraining and post-training stages, which impedes their ability to effectively gather distant information. |
Chenxin An; Jun Zhang; Ming Zhong; Lei Li; Shansan Gong; Yao Luo; Jingjing Xu; Lingpeng Kong; |
| 149 | Decision Tree Induction Through LLMs Via Semantically-Aware Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current tree induction methods often face limitations such as suboptimal solutions from greedy methods or prohibitive computational costs and limited applicability of exact optimization approaches. To address these challenges, we propose an evolutionary optimization method for decision tree induction based on genetic programming (GP). |
Tennison Liu; Nicolas Huynh; Mihaela van der Schaar; |
| 150 | MEGA-Bench: Scaling Multimodal Evaluation to Over 500 Real-World Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MEGA-Bench, an evaluation suite that scales multimodal evaluation to over 500 real-world tasks, to address the highly heterogeneous daily use cases of end users. |
Jiacheng Chen; Tianhao Liang; Sherman Siu; Zhengqing Wang; Kai Wang; Yubo Wang; Yuansheng Ni; Ziyan Jiang; Wang Zhu; Bohan Lyu; Dongfu Jiang; Xuan He; Yuan Liu; Hexiang Hu; Xiang Yue; Wenhu Chen; |
| 151 | VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, VILA-U employs a single autoregressive next-token prediction framework for both tasks, eliminating the need for additional components like diffusion models. |
Yecheng Wu; Zhuoyang Zhang; Junyu Chen; Haotian Tang; Dacheng Li; Yunhao Fang; Ligeng Zhu; Enze Xie; Hongxu Yin; Li Yi; Song Han; Yao Lu; |
| 152 | Sparse Autoencoders Do Not Find Canonical Units of Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To train meta-SAEs we introduce BatchTopK SAEs, an improved variant of the popular TopK SAE method that enforces only a fixed average sparsity. |
Patrick Leask; Bart Bussmann; Michael T Pearce; Joseph Isaac Bloom; Curt Tigges; Noura Al Moubayed; Lee Sharkey; Neel Nanda; |
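The BatchTopK variant mentioned above is easy to state in code: rather than keeping the top k activations per example, keep the top k×B across the whole batch, which fixes only the average sparsity per example. A minimal sketch:

```python
import torch

def batch_topk(acts, k):
    """BatchTopK sketch: keep the k*B largest activations across the whole
    batch rather than exactly k per sample, enforcing a fixed *average*
    sparsity of k active latents per example.
    acts: (B, n_latents) SAE latent activations."""
    B, _ = acts.shape
    flat = acts.flatten()
    idx = torch.topk(flat, k * B).indices
    mask = torch.zeros_like(flat)
    mask[idx] = 1.0
    return (flat * mask).view_as(acts)

codes = torch.relu(torch.randn(32, 4096))
sparse = batch_topk(codes, k=16)     # ~16 active latents per row on average
```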
| 153 | Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through extensive evaluations of eleven representative watermarking methods against prevalent editing techniques, we demonstrate that most methods fail to detect watermarks after such edits. To address this limitation, we propose VINE, a watermarking method that significantly enhances robustness against various image editing techniques while maintaining high image quality. |
Shilin Lu; Zihan Zhou; Jiayou Lu; Yuanzhi Zhu; Adams Wai-Kin Kong; |
| 154 | OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present OmniEdit, which is an omnipotent editor to handle seven different image editing tasks with any aspect ratio seamlessly. |
Cong Wei; Zheyang Xiong; Weiming Ren; Xeron Du; Ge Zhang; Wenhu Chen; |
| 155 | Turning Up The Heat: Min-p Sampling for Creative and Coherent LLM Outputs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, popular sampling methods like top-p (nucleus sampling) often struggle to balance quality and diversity, especially at higher temperatures, leading to incoherent or repetitive outputs. To address this challenge, we propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model’s confidence by scaling according to the top token’s probability. |
Nguyen Nhat Minh; Andrew Baker; Clement Neo; Allen G Roush; Andreas Kirsch; Ravid Shwartz-Ziv; |
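The min-p rule is simple enough to state directly: keep tokens whose probability is at least p_base times the top token's probability, renormalize, and sample. A minimal sketch; the ordering of temperature and truncation below follows common practice and may differ from the paper's exact setup.

```python
import torch

def min_p_sample(logits, p_base=0.1, temperature=1.0):
    """Min-p sampling sketch: the truncation threshold scales with the model's
    confidence via the top token's probability."""
    probs = torch.softmax(logits / temperature, dim=-1)
    threshold = p_base * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)

next_token = min_p_sample(torch.randn(1, 50_000), p_base=0.05, temperature=1.5)
```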
| 156 | Harnessing Webpage UIs for Text-Rich Visual Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Text-rich visual understanding—the ability to interpret both textual content and visual elements within a scene—is crucial for multimodal large language models (MLLMs) to effectively interact with structured environments. We propose leveraging webpage UIs as a naturally structured and diverse data source to enhance MLLMs’ capabilities in this area. |
Junpeng Liu; Tianyue Ou; Yifan Song; Yuxiao Qu; Wai Lam; Chenyan Xiong; Wenhu Chen; Graham Neubig; Xiang Yue; |
| 157 | 3D StreetUnveiler with Semantic-aware 2DGS – A Simple Baseline Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the temporal consistency of the inpainting, we introduce a novel time-reversal framework to inpaint frames in reverse order and use later frames as references for earlier frames to fully utilize the long-trajectory observations. |
Jingwei Xu; Yikai Wang; Yiqun Zhao; Yanwei Fu; Shenghua Gao; |
| 158 | One Step Diffusion Via Shortcut Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Shortcut Models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. |
Kevin Frans; Danijar Hafner; Sergey Levine; Pieter Abbeel; |
| 159 | Sail Into The Headwind: Alignment Via Robust Rewards and Dynamic Labels Against Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate reward hacking in offline preference optimization, which aims to improve an initial model using a preference dataset. |
Paria Rashidinejad; Yuandong Tian; |
| 160 | WebRL: Training LLM Web Agents Via Self-Evolving Online Curriculum Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces WebRL, a novel self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. |
Zehan Qi; Xiao Liu; Iat Long Iong; Hanyu Lai; Xueqiao Sun; Jiadai Sun; Xinyue Yang; Yu Yang; Shuntian Yao; Wei Xu; Jie Tang; Yuxiao Dong; |
| 161 | Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce DeCapBench along with a novel metric, DCScore, specifically designed for detailed captioning tasks. |
Qinghao Ye; Xianhan Zeng; Fu Li; Chunyuan Li; Haoqi Fan; |
| 162 | Generative World Explorer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve this human-like ability, we introduce the **Generative World Explorer (Genex)**, a video generation model that allows an agent to mentally explore a large-scale 3D world (e.g., urban scenes) and acquire imagined observations to update its belief. To train Genex, we create a synthetic urban scene dataset, Genex-DB. |
TaiMing Lu; Tianmin Shu; Alan Yuille; Daniel Khashabi; Jieneng Chen; |
| 163 | UniDrive: Towards Universal Driving Perception Across Camera Configurations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present UniDrive, a novel framework for vision-centric autonomous driving to achieve universal perception across camera configurations. To evaluate the effectiveness of our framework, we collect a dataset on CARLA by driving the same routes while only modifying the camera configurations. |
Ye Li; Wenzhao Zheng; Xiaonan Huang; Kurt Keutzer; |
| 164 | Watermark Anything With Localized Messages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a deep-learning model for localized image watermarking, dubbed the Watermark Anything Model (WAM). |
Tom Sander; Pierre Fernandez; Alain Oliviero Durmus; Teddy Furon; Matthijs Douze; |
| 165 | Dualformer: Controllable Fast and Slow Thinking By Learning with Randomized Reasoning Traces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Dualformer, a single Transformer model that seamlessly integrates both the fast and slow reasoning modes by training on randomized reasoning traces, where different parts of the traces are strategically dropped during training. |
DiJia Su; Sainbayar Sukhbaatar; Michael Rabbat; Yuandong Tian; Qinqing Zheng; |
| 166 | DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present DreamBench++, a human-aligned benchmark that advanced multimodal GPT models automate. Further, we construct a comprehensive dataset comprising diverse images and prompts. |
Yuang Peng; Yuxin Cui; Haomiao Tang; Zekun Qi; Runpei Dong; Jing Bai; Chunrui Han; Zheng Ge; Xiangyu Zhang; Shu-Tao Xia; |
| 167 | TorchTitan: One-stop PyTorch Native Solution for Production Ready LLM Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces **TorchTitan**, a PyTorch-native distributed training system that unifies and advances state-of-the-art techniques, streamlining integration and reducing engineering overhead. |
Wanchao Liang; Tianyu Liu; Less Wright; Will Constable; Andrew Gu; Chien-Chin Huang; Iris Zhang; Wei Feng; Howard Huang; Junjie Wang; Sanket Purandare; Gokul Nadathur; Stratos Idreos; |
| 168 | SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current video guardrails, however, are either overly simplistic, relying on pure classification models trained on simple policies with limited unsafe categories, which lack detailed explanations, or prompting multimodal large language models (MLLMs) with long safety guidelines, which are inefficient and impractical for guardrailing real-world content. To bridge this gap, we propose SafeWatch, an efficient MLLM-based video guardrail model designed to follow customized safety policies and provide multi-label video guardrail outputs with content-specific explanations in a zero-shot manner. |
Zhaorun Chen; Francesco Pinto; Minzhou Pan; Bo Li; |
| 169 | When Attention Sink Emerges in Language Models: An Empirical View Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We highlight that attention sink emerges after effective optimization on sufficient training data. |
Xiangming Gu; Tianyu Pang; Chao Du; Qian Liu; Fengzhuo Zhang; Cunxiao Du; Ye Wang; Min Lin; |
| 170 | Dissecting Adversarial Robustness of Multimodal LM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To systematically examine the robustness of agents, we propose the Agent Robustness Evaluation (ARE) framework. |
Chen Henry Wu; Rishi Rajesh Shah; Jing Yu Koh; Russ Salakhutdinov; Daniel Fried; Aditi Raghunathan; |
| 171 | DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the dense BEV representation adopted by existing methods brings computational challenges for long-range perception and long-term temporal fusion. To address these challenges, we present DriveTransformer, a simplified E2E-AD framework for the ease of scaling up, characterized by three key features: Task Parallelism (all agent, map, and planning queries directly interact with each other at each block), Sparse Representation (task queries directly interact with raw sensor features), and Streaming Processing (task queries are stored and passed as history information). |
Xiaosong Jia; Junqi You; Zhiyuan Zhang; Junchi Yan; |
| 172 | LLM-SR: Scientific Equation Discovery Via Programming with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They also employ limited representations such as expression trees, constraining the search space and expressiveness of equations. To bridge this gap, we introduce LLM-SR, a novel approach that leverages the extensive scientific knowledge and robust code generation capabilities of Large Language Models (LLMs) to discover scientific equations from data. |
Parshin Shojaee; Kazem Meidani; Shashank Gupta; Amir Barati Farimani; Chandan K. Reddy; |
| 173 | Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench, have become popular for evaluating language models due to their cost-effectiveness and scalability compared to human evaluation. |
Xiaosen Zheng; Tianyu Pang; Chao Du; Qian Liu; Jing Jiang; Min Lin; |
| 174 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. |
Zhangheng LI; Keen You; Haotian Zhang; Di Feng; Harsh Agrawal; Xiujun Li; Mohana Prasad Sathya Moorthy; Jeffrey Nichols; Yinfei Yang; Zhe Gan; |
| 175 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. |
Yushi Bai; Jiajie Zhang; Xin Lv; Linzhi Zheng; Siqi Zhu; Lei Hou; Yuxiao Dong; Jie Tang; Juanzi Li; |
| 176 | High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, 3D volumetric predictions at each timestamp remain largely unexplored. To address such a challenge, we introduce a comprehensive framework for 3D radar sequence prediction in weather nowcasting, using the newly proposed SpatioTemporal Coherent Gaussian Splatting (STC-GS) for dynamic radar representation and GauMamba for efficient and accurate forecasting. |
Ziye Wang; Yiran Qin; Lin Zeng; Ruimao Zhang; |
| 177 | Lean-STaR: Learning to Interleave Thinking and Proving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof, thereby boosting the model’s theorem-proving capabilities. |
Haohan Lin; Zhiqing Sun; Sean Welleck; Yiming Yang; |
| 178 | Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the implicit bias of the family of steepest descent algorithms with infinitesimal learning rate, including gradient descent, sign gradient descent and coordinate descent, in deep homogeneous neural networks. |
Nikolaos Tsilivis; Gal Vardi; Julia Kempe; |
| 179 | Real2Code: Reconstruct Articulated Objects Via Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Real2Code, a novel approach to reconstructing articulated objects via code generation. |
Zhao Mandi; Yijia Weng; Dominik Bauer; Shuran Song; |
| 180 | Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, continuing with the current architectures will present a computational roadblock. To address this gap, we propose Mixture-of-Denoising Experts (MoDE) as a novel policy for Imitation Learning. |
Moritz Reuss; Jyothish Pari; Pulkit Agrawal; Rudolf Lioutikov; |
| 181 | From Exploration to Mastery: Enabling LLMs to Master Tools Via Self-Driven Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel framework, DRAFT, aimed at Dynamically Refining tool documentation through the Analysis of Feedback and Trials emanating from LLMs’ interactions with external tools. |
Changle Qu; Sunhao Dai; Xiaochi Wei; Hengyi Cai; Shuaiqiang Wang; Dawei Yin; Jun Xu; Ji-Rong Wen; |
| 182 | TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce a progressive multi-granularity framework. |
Leqi Shen; Tianxiang Hao; Tao He; Sicheng Zhao; Yifeng Zhang; pengzhang liu; Yongjun Bao; Guiguang Ding; |
| 183 | MMSearch: Unveiling The Potential of Large Models As Multi-modal Search Engines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we first design a delicate pipeline, MMSearch-Engine, to empower any LMMs with multimodal search capabilities. On top of this, we introduce MMSearch, a comprehensive evaluation benchmark to assess the multimodal search performance of LMMs. |
Dongzhi Jiang; Renrui Zhang; Ziyu Guo; Yanmin Wu; jiayi lei; Pengshuo Qiu; Pan Lu; Zehui Chen; Guanglu Song; Peng Gao; Yu Liu; Chunyuan Li; Hongsheng Li; |
| 184 | MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. |
Hanrong Ye; Haotian Zhang; Erik Daxberger; Lin Chen; Zongyu Lin; Yanghao Li; Bowen Zhang; Haoxuan You; Dan Xu; Zhe Gan; Jiasen Lu; Yinfei Yang; |
| 185 | ThinK: Thinner Key Cache By Query-Driven Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels. |
Yuhui Xu; Zhanming Jie; Hanze Dong; Lei Wang; Xudong Lu; Aojun Zhou; Amrita Saha; Caiming Xiong; Doyen Sahoo; |
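One way to picture query-driven key-channel pruning, as a rough sketch only: score each key-cache channel by the magnitude of its contribution to the query-key product and zero out the weakest channels. The paper's actual criterion and its handling of pruned channels differ; all names below are ours.

```python
import torch

def prune_key_channels(K, q, keep_ratio=0.6):
    """Illustrative ThinK-style channel pruning of the key cache.
    K: (seq, d_head) cached keys, q: (d_head,) current query."""
    # Per-channel contribution to |q . k| summed over cached positions.
    scores = (q.abs()[None, :] * K.abs()).sum(dim=0)
    n_keep = int(keep_ratio * K.shape[-1])
    keep = torch.topk(scores, n_keep).indices
    mask = torch.zeros(K.shape[-1], dtype=K.dtype)
    mask[keep] = 1.0
    return K * mask                         # pruned channels contribute 0

K, q = torch.randn(1024, 128), torch.randn(128)
K_pruned = prune_key_channels(K, q)         # attention then uses K_pruned
```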
| 186 | What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. |
Guangkai Xu; Yongtao Ge; Mingyu Liu; Chengxiang Fan; Kangyang Xie; Zhiyue Zhao; Hao Chen; Chunhua Shen; |
| 187 | U-Nets As Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel interpretation of the U-Net architecture by studying certain generative hierarchical models, which are tree-structured graphical models extensively utilized in both language and image domains. |
Song Mei; |
| 188 | The KoLMogorov Test: Compression By Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the *KoLMogorov-Test* (KT), a compression-as-intelligence test for code generation LLMs. |
Ori Yoran; Kunhao Zheng; Fabian Gloeckle; Jonas Gehring; Gabriel Synnaeve; Taco Cohen; |
| 189 | Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Samba, a simple hybrid architecture that layer-wise combines Mamba, a selective State Space Model (SSM), with Sliding Window Attention (SWA). |
Liliang Ren; Yang Liu; Yadong Lu; yelong shen; Chen Liang; Weizhu Chen; |
| 190 | MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluation suite designed to assess LVLMs across a wide range of multi-image tasks. |
Fanqing Meng; Jin Wang; Chuanhao Li; Quanfeng Lu; Hao Tian; Tianshuo Yang; Jiaqi Liao; Xizhou Zhu; Jifeng Dai; Yu Qiao; Ping Luo; Kaipeng Zhang; Wenqi Shao; |
| 191 | Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Spider 2.0, an evaluation framework comprising 632 real-world text-to-SQL workflow problems derived from enterprise-level database use cases. |
Fangyu Lei; Jixuan Chen; Yuxiao Ye; Ruisheng Cao; Dongchan Shin; Hongjin SU; ZHAOQING SUO; Hongcheng Gao; Wenjing Hu; Pengcheng Yin; Victor Zhong; Caiming Xiong; Ruoxi Sun; Qian Liu; Sida Wang; Tao Yu; |
| 192 | Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces rStar, a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. |
Zhenting Qi; Mingyuan MA; Jiahang Xu; Li Lyna Zhang; Fan Yang; Mao Yang; |
| 193 | Improved Diffusion-based Generative Model with Better Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Fortunately, this issue can be mitigated by adversarial training (AT) as well. Based on these insights, we propose to conduct efficient AT on both the diffusion probabilistic model (DPM) and the consistency model (CM). |
Zekun Wang; Mingyang Yi; Shuchen Xue; Zhenguo Li; Ming Liu; Bing Qin; Zhi-Ming Ma; |
| 194 | Planning in Natural Language Improves LLM Search for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We empirically demonstrate that this lack of diversity can be mitigated by searching over candidate plans for solving a problem in natural language. Based on this insight, we propose PlanSearch, a novel search algorithm which shows strong results across HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive coding). |
Evan Z Wang; Federico Cassano; Catherine Wu; Yunfeng Bai; William Song; Vaskar Nath; Ziwen Han; Sean M. Hendryx; Summer Yue; Hugh Zhang; |
| 195 | Text4Seg: Reimagining Image Segmentation As Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the segmentation process. |
Mengcheng Lan; Chaofeng Chen; Yue Zhou; Jiaxing Xu; Yiping Ke; Xinjiang Wang; Litong Feng; Wayne Zhang; |
| 196 | Energy-Based Diffusion Language Models for Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Energy-based Diffusion Language Model (EDLM), an energy-based model operating at the full sequence level for each diffusion step, introduced to improve the underlying approximation used by diffusion models. |
Minkai Xu; Tomas Geffner; Karsten Kreis; Weili Nie; Yilun Xu; Jure Leskovec; Stefano Ermon; Arash Vahdat; |
| 197 | Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. |
Vighnesh Subramaniam; Yilun Du; Joshua B. Tenenbaum; Antonio Torralba; Shuang Li; Igor Mordatch; |
| 198 | Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these holistic input disturbances sometimes induce potential noise and also double the inference cost. To tackle these issues, we propose a simple yet effective method named *Self-Introspective Decoding* (SID). |
Fushuo Huo; Wenchao Xu; Zhong Zhang; Haozhao Wang; Zhicheng Chen; Peilin Zhao; |
| 199 | MaxInfoRL: Boosting Exploration in Reinforcement Learning Through Information Gain Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. |
Bhavya Sukhija; Stelian Coros; Andreas Krause; Pieter Abbeel; Carmelo Sferrazza; |
| 200 | Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce Qinco2 which extends and improves Qinco with (i) improved vector encoding using codeword pre-selection and beam-search, (ii) a fast approximate decoder leveraging codeword pairs to establish accurate short-lists for search, and (iii) an optimized training procedure and network architecture. |
Théophane Vallaeys; Matthew J. Muckley; Jakob Verbeek; Matthijs Douze; |
| 201 | BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. |
Hongjin SU; Howard Yen; Mengzhou Xia; Weijia Shi; Niklas Muennighoff; Han-yu Wang; Liu Haisu; Quan Shi; Zachary S Siegel; Michael Tang; Ruoxi Sun; Jinsung Yoon; Sercan O Arik; Danqi Chen; Tao Yu; |
| 202 | Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Promptriever, the first retrieval model able to be prompted like an LM. |
Orion Weller; Benjamin Van Durme; Dawn Lawrie; Ashwin Paranjape; Yuhao Zhang; Jack Hessel; |
| 203 | X-Gen: Ego-centric Video Prediction By Watching Exo-centric Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the cross-view video prediction task, where given an exo-centric video, the first frame of the corresponding ego-centric video, and textual instructions, the goal is to generate future frames of the ego-centric video. |
Jilan Xu; Yifei Huang; Baoqi Pei; Junlin Hou; Qingqiu Li; Guo Chen; Yuejie Zhang; Rui Feng; Weidi Xie; |
| 204 | Variational Best-of-N Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its effectiveness, BoN is computationally expensive; it reduces sampling throughput by a factor of N. To make BoN more efficient at inference time, one strategy is to fine-tune the language model to mimic what BoN does during inference. To achieve this, we derive the distribution induced by the BoN algorithm. |
Afra Amini; Tim Vieira; Elliott Ash; Ryan Cotterell; |
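For reference, the Best-of-N procedure that the paper's variational method distills into the model is just the following, with `generate` and `reward` as hypothetical callables standing in for the base policy and the reward model.

```python
import torch

def best_of_n(generate, reward, prompt, n=8):
    """Plain Best-of-N: draw n samples from the base policy and return the one
    the reward model prefers. This is the inference-time procedure whose cost
    the variational approach removes by fine-tuning the model to imitate it."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = torch.tensor([reward(prompt, c) for c in candidates])
    return candidates[int(scores.argmax())]
```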
| 205 | Automatic Curriculum Expert Iteration for Reliable LLM Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, some approaches render LLMs overly conservative, limiting their problem-solving capabilities. To mitigate hallucination and laziness in reasoning tasks, we propose Automatic Curriculum Expert Iteration (Auto-CEI) to enhance LLM reasoning and align responses to the model’s capabilities: assertively answering within its limits and declining when tasks exceed them. |
Zirui Zhao; Hanze Dong; Amrita Saha; Caiming Xiong; Doyen Sahoo; |
| 206 | Self-Boosting Large Language Models with Synthetic Preference Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SynPO, a self-boosting paradigm that leverages synthetic preference data for model alignment. |
Qingxiu Dong; Li Dong; Xingxing Zhang; Zhifang Sui; Furu Wei; |
| 207 | Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Vision-RWKV (VRWKV), a model that builds upon the RWKV architecture from the NLP field with key modifications tailored specifically for vision tasks. |
Yuchen Duan; Weiyun Wang; Zhe Chen; Xizhou Zhu; Lewei Lu; Tong Lu; Yu Qiao; Hongsheng Li; Jifeng Dai; Wenhai Wang; |
| 208 | SOAP: Improving and Stabilizing Shampoo Using Adam for Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor — a memory-efficient approximation of Adam — showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo’s preconditioner. |
Nikhil Vyas; Depen Morwani; Rosie Zhao; Itai Shapira; David Brandfonbrener; Lucas Janson; Sham M. Kakade; |
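The stated equivalence suggests a concrete recipe: rotate gradients into the eigenbasis of Shampoo's Kronecker factors, run Adam there, and rotate the update back. A single-matrix sketch with illustrative hyperparameters; bias correction and other production details are omitted.

```python
import torch

def soap_like_step(W, G, state, lr=3e-4, beta1=0.9, beta2=0.95, eps=1e-8,
                   update_basis_every=10):
    """SOAP-flavored update sketch for one weight matrix: Adam moments are
    maintained on gradients rotated into the eigenbasis of GG^T and G^TG."""
    state["L"] = beta2 * state["L"] + (1 - beta2) * (G @ G.T)
    state["R"] = beta2 * state["R"] + (1 - beta2) * (G.T @ G)
    if state["step"] % update_basis_every == 0:     # amortize eigendecompositions
        state["QL"] = torch.linalg.eigh(state["L"]).eigenvectors
        state["QR"] = torch.linalg.eigh(state["R"]).eigenvectors
    Gr = state["QL"].T @ G @ state["QR"]            # gradient in the eigenbasis
    state["m"] = beta1 * state["m"] + (1 - beta1) * Gr
    state["v"] = beta2 * state["v"] + (1 - beta2) * Gr ** 2
    upd = state["m"] / (state["v"].sqrt() + eps)
    state["step"] += 1
    return W - lr * (state["QL"] @ upd @ state["QR"].T)  # rotate the update back

d_out, d_in = 64, 32
state = {"step": 0, "L": torch.zeros(d_out, d_out), "R": torch.zeros(d_in, d_in),
         "QL": torch.eye(d_out), "QR": torch.eye(d_in),
         "m": torch.zeros(d_out, d_in), "v": torch.zeros(d_out, d_in)}
W = soap_like_step(torch.randn(d_out, d_in), torch.randn(d_out, d_in), state)
```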
| 209 | SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096$\times$4096 resolution. |
Enze Xie; Junsong Chen; Junyu Chen; Han Cai; Haotian Tang; Yujun Lin; Zhekai Zhang; Muyang Li; Ligeng Zhu; Yao Lu; Song Han; |
| 210 | Follow My Instruction and Spill The Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Extending our study to production RAG models, GPTs, we design an attack that can cause datastore leakage with a near-perfect success rate on 25 randomly selected customized GPTs with at most 2 queries, and we extract text data verbatim at a rate of 41% from a book of 77,000 words and 3% from a corpus of 1,569,000 words by prompting the GPTs with only 100 queries generated by themselves. |
Zhenting Qi; Hanlin Zhang; Eric P. Xing; Sham M. Kakade; Himabindu Lakkaraju; |
| 211 | ControlAR: Controllable Image Generation with Autoregressive Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce ControlAR, an efficient and effective framework for integrating spatial controls into autoregressive image generation models. |
Zongming Li; Tianheng Cheng; Shoufa Chen; Peize Sun; Haocheng Shen; Longjin Ran; Xiaoxin Chen; Wenyu Liu; Xinggang Wang; |
| 212 | Round and Round We Go! What Makes Rotary Positional Encodings Useful? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common belief is that RoPE is useful because it helps to decay token dependency as relative distance increases. In this work, we argue that this is unlikely to be the core reason. |
Federico Barbero; Alex Vitvitskyi; Christos Perivolaropoulos; Razvan Pascanu; Petar Veličković; |
| 213 | BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To assess how well LLMs can solve challenging and practical tasks via programs, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks. |
Terry Yue Zhuo; Vu Minh Chien; Jenny Chim; Han Hu; Wenhao Yu; Ratnadira Widyasari; Imam Nur Bani Yusuf; Haolan Zhan; Junda He; Indraneil Paul; Simon Brunner; Chen GONG; James Hoang; Armel Randy Zebaze; Xiaoheng Hong; Wen-Ding Li; Jean Kaddour; Ming Xu; Zhihan Zhang; Prateek Yadav; Naman Jain; Alex Gu; Zhoujun Cheng; Jiawei Liu; Qian Liu; Zijian Wang; Binyuan Hui; Niklas Muennighoff; David Lo; Daniel Fried; Xiaoning Du; Harm de Vries; Leandro Von Werra; |
| 214 | OpenRCA: Can Large Language Models Locate The Root Cause of Software Failures? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current research predominantly focuses on the early stages of development, such as code generation, while overlooking the post-development phases that are crucial to user experience. To explore the potential of LLMs in this direction, we propose OpenRCA, a benchmark dataset and evaluation framework for assessing LLMs’ ability to identify the root cause of software failures. |
Junjielong Xu; Qinan Zhang; Zhiqing Zhong; Shilin He; Chaoyun Zhang; Qingwei Lin; Dan Pei; Pinjia He; Dongmei Zhang; Qi Zhang; |
| 215 | MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel method for generating mathematical code accompanied with corresponding reasoning steps for continued pretraining. |
Zimu Lu; Aojun Zhou; Ke Wang; Houxing Ren; Weikang Shi; Junting Pan; Mingjie Zhan; Hongsheng Li; |
| 216 | Strong Preferences Affect The Robustness of Preference Models and Value Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the robustness of value alignment by examining the sensitivity of preference models. |
Ziwei Xu; Mohan Kankanhalli; |
| 217 | A Transfer Attack to Image Watermarks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new transfer evasion attack to image watermark in the no-box setting. |
Yuepeng Hu; Zhengyuan Jiang; Moyang Guo; Neil Zhenqiang Gong; |
| 218 | ALLaM: Large Language Models for Arabic and English Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT). |
M Saiful Bari; Yazeed Alnumay; Norah A. Alzahrani; Nouf M. Alotaibi; Hisham Abdullah Alyahya; Sultan AlRashed; Faisal Abdulrahman Mirza; Shaykhah Z. Alsubaie; Hassan A. Alahmed; Ghadah Alabduljabbar; Raghad Alkhathran; Yousef Almushayqih; Raneem Alnajim; Salman Alsubaihi; Maryam Al Mansour; Saad Amin Hassan; Dr. Majed Alrubaian; Ali Alammari; Zaki Alawami; Abdulmohsen Al-Thubaity; Ahmed Abdelali; Jeril Kuriakose; Abdalghani Abujabal; Nora Al-Twairesh; Areeb Alowisheq; Haidar Khan; |
| 219 | Vision Language Models Are In-Context Value Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, learning such progress estimator, or temporal value function, across different tasks and domains requires both a large amount of diverse data and methods which can scale and generalize. To address these challenges, we present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress. |
Yecheng Jason Ma; Joey Hejna; Chuyuan Fu; Dhruv Shah; Jacky Liang; Zhuo Xu; Sean Kirmani; Peng Xu; Danny Driess; Ted Xiao; Osbert Bastani; Dinesh Jayaraman; Wenhao Yu; Tingnan Zhang; Dorsa Sadigh; Fei Xia; |
| 220 | Measuring Non-Adversarial Reproduction of Training Data in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate an intermediate regime of memorization that we call non-adversarial reproduction, where we quantify the overlap between model responses and pretraining data when responding to natural and benign prompts. |
Michael Aerni; Javier Rando; Edoardo Debenedetti; Nicholas Carlini; Daphne Ippolito; Florian Tramèr; |
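Digest note: for readers wanting intuition for how such output/pretraining overlap can be quantified, a toy character-level metric is sketched below. The window length, the coverage definition, and the in-memory index are illustrative assumptions, not the paper's exact methodology (which matches against pretraining-scale corpora).

```python
def reproduced_fraction(text: str, corpus: str, k: int = 50) -> float:
    """Illustrative metric: fraction of characters in `text` covered by
    length-k substrings that also occur verbatim in `corpus`."""
    if len(text) < k:
        return float(text in corpus)
    # Index every length-k character window of the corpus.
    grams = {corpus[i:i + k] for i in range(len(corpus) - k + 1)}
    covered = [False] * len(text)
    for i in range(len(text) - k + 1):
        if text[i:i + k] in grams:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(text)

# Example: a response that copies part of a "training document" verbatim.
doc = "the quick brown fox jumps over the lazy dog, " * 5
print(reproduced_fraction(doc[:100] + " entirely novel continuation", doc, k=20))
```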
| 221 | Adversarial Search Engine Optimization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce _Preference Manipulation Attacks_, a new class of attacks that manipulate an LLM’s selections to favor the attacker. |
Fredrik Nestaas; Edoardo Debenedetti; Florian Tramèr; |
| 222 | Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LEARN-BY-INTERACT, a data-centric framework to adapt LLM agents to any given environments without human annotations. |
Hongjin SU; Ruoxi Sun; Jinsung Yoon; Pengcheng Yin; Tao Yu; Sercan O Arik; |
| 223 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided—offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model without the need for fine-tuning or external knowledge. |
Koichi Namekata; Sherwin Bahmani; Ziyi Wu; Yash Kant; Igor Gilitschenski; David B. Lindell; |
| 224 | MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine, covering over 25 million images across 10 modalities with multigranular annotations for more than 65 diseases. |
Yunfei Xie; Ce Zhou; Lang Gao; Juncheng Wu; Xianhang Li; Hong-Yu Zhou; Sheng Liu; Lei Xing; James Zou; Cihang Xie; Yuyin Zhou; |
| 225 | ThinkBot: Embodied Instruction Following with Thought Chain Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the contrary, we propose ThinkBot that reasons the thought chain in human instruction to recover the missing action descriptions, so that the agent can successfully complete human goals by following the coherent instruction. |
Guanxing Lu; Ziwei Wang; Changliu Liu; Jiwen Lu; Yansong Tang; |
| 226 | OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset. |
Qingyun Li; Zhe Chen; Weiyun Wang; Wenhai Wang; Shenglong Ye; Zhenjiang Jin; Guanzhou Chen; Yinan He; Zhangwei Gao; Erfei Cui; Jiashuo Yu; Hao Tian; Jiasheng Zhou; Chao Xu; Bin Wang; Xingjian Wei; Wei Li; Wenjian Zhang; Bo Zhang; Pinlong Cai; Licheng Wen; Xiangchao Yan; Pei Chu; Yi Wang; Min Dou; Changyao Tian; Xizhou Zhu; Lewei Lu; Yushi Chen; Junjun He; Tong Lu; Yali Wang; Limin Wang; Dahua Lin; Yu Qiao; Botian Shi; Conghui He; Jifeng Dai; |
| 227 | Combining Induction and Transduction for Abstract Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When learning an input-output mapping from very few examples, is it better to first infer a latent function that explains the examples, or is it better to directly predict new test outputs, e.g. using a neural network? We study this question on ARC by training neural models for *induction* (inferring latent functions) and *transduction* (directly predicting the test output for a given test input). |
Wen-Ding Li; Keya Hu; Carter Larsen; Yuqing Wu; Simon Alford; Caleb Woo; Spencer M. Dunn; Hao Tang; Wei-Long Zheng; Yewen Pu; Kevin Ellis; |
| 228 | Consistency Models Made Easy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, as of 2024, training a state-of-the-art CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an effective scheme for training CMs that largely improves the efficiency of building such models. |
Zhengyang Geng; Ashwini Pokle; Weijian Luo; Justin Lin; J Zico Kolter; |
| 229 | Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning Via Dynamic Data Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on empirical analyses that show that selecting the best data subset using a static importance measure is often ineffective for multi-task datasets with evolving distributions, we propose Adapt-$\infty$, a new multi-way and adaptive data selection approach that dynamically balances sample efficiency and effectiveness during LiIT. |
Adyasha Maharana; Jaehong Yoon; Tianlong Chen; Mohit Bansal; |
| 230 | Strong Model Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish a strong form of the model collapse phenomenon, a critical performance degradation due to synthetic data in the training corpus. |
Elvis Dohmatob; Yunzhen Feng; Arjun Subramonian; Julia Kempe; |
| 231 | Scaling FP8 Training to Trillion-token LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Interestingly, we show, both analytically and empirically, that this amplification happens only over prolonged training periods, and link it to a SwiGLU weight alignment process. To address this newly identified issue, we introduce Smooth-SwiGLU, a novel modification that ensures stable FP8 training without altering function behavior. |
Maxim Fishman; Brian Chmiel; Ron Banner; Daniel Soudry; |
| 232 | Arithmetic Without Algorithms: Language Models Solve Math with A Bag of Heuristics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Do large language models (LLMs) solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? |
Yaniv Nikankin; Anja Reusch; Aaron Mueller; Yonatan Belinkov; |
| 233 | STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, while these generalist policies can improve the average performance across many tasks, the performance of generalist policies on any one task is often suboptimal due to negative transfer between partitions of the data, compared to task-specific specialist policies. In this work, we argue for the paradigm of training policies during deployment given the scenarios they encounter: rather than deploying pre-trained policies to unseen problems in a zero-shot manner, we non-parametrically retrieve and train models directly on relevant data at test time. |
Marius Memmel; Jacob Berg; Bingqing Chen; Abhishek Gupta; Jonathan Francis; |
| 234 | The Geometry of Categorical and Hierarchical Concepts in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as _vectors_. |
Kiho Park; Yo Joong Choe; Yibo Jiang; Victor Veitch; |
| 235 | Presto! Distilling Steps and Layers for Accelerating Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. |
Zachary Novack; Ge Zhu; Jonah Casebeer; Julian McAuley; Taylor Berg-Kirkpatrick; Nicholas J. Bryan; |
| 236 | VideoPhy: Evaluating Physical Commonsense for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities (e.g. marbles will roll down when placed on a slanted surface). |
Hritik Bansal; Zongyu Lin; Tianyi Xie; Zeshun Zong; Michal Yarom; Yonatan Bitton; Chenfanfu Jiang; Yizhou Sun; Kai-Wei Chang; Aditya Grover; |
| 237 | Instant Policy: In-Context Imitation Learning Via Graph Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Instant Policy, which learns new tasks instantly from just one or two demonstrations, achieving ICIL through two key components. |
Vitalis Vosylius; Edward Johns; |
| 238 | Q-SFT: Q-Learning for Language Models Via Supervised Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This setting requires effectively leveraging pretraining, scaling to large architectures with billions of parameters, and training on large datasets, all of which represent major challenges for current value-based RL methods. In this work, we propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning (SFT) problem where the probabilities of tokens directly translate to Q-values. |
Joey Hong; Anca Dragan; Sergey Levine; |
| 239 | Internet of Agents: Weaving A Web of Heterogeneous Agents for Collaborative Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. |
Weize Chen; Ziming You; Ran Li; yitong guan; Chen Qian; Chenyang Zhao; Cheng Yang; Ruobing Xie; Zhiyuan Liu; Maosong Sun; |
| 240 | Diverse Preference Learning for Capabilities and Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This causes the model to overweight majority opinions and sacrifice diversity in exchange for optimal reward. To address this, we propose Soft Preference Learning, which decouples the entropy and cross-entropy terms in the KL penalty — allowing for fine-grained control over LLM generation diversity. |
Stewart Slocum; Asher Parker-Sartori; Dylan Hadfield-Menell; |
| 241 | Repetition Improves Language Model Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Bidirectional models are considered essential for strong text embeddings. Recent approaches to adapt autoregressive language models (LMs) into strong text embedding models have largely required modifying the LM architecture to be bidirectional. We challenge this premise by introducing “echo embeddings”, which convert autoregressive LMs into high-quality text embedding models *without* changing the architecture or requiring fine-tuning. |
Jacob Mitchell Springer; Suhas Kotha; Daniel Fried; Graham Neubig; Aditi Raghunathan; |
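Digest note: a minimal sketch of the echo idea follows, assuming a generic HuggingFace causal LM. The prompt template, the use of gpt2 as a stand-in model, and the token-counting heuristic for locating the second occurrence are all illustrative assumptions rather than the paper's exact recipe.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
model = AutoModel.from_pretrained("gpt2").eval()

def echo_embed(text: str) -> torch.Tensor:
    # Repeat the input so that tokens of the second copy can attend to the
    # full sentence despite the causal mask.
    prompt = f"Rewrite the passage: {text} Rewritten passage: {text}"
    enc = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    # Pool only over the tokens of the second occurrence of `text`
    # (heuristic: in-context tokenization can differ slightly).
    n_second = len(tok(text)["input_ids"])
    return hidden[-n_second:].mean(dim=0)

print(echo_embed("cats sit on mats").shape)
```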
| 242 | Learning How Hard to Think: Input-Adaptive Allocation of LM Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an approach that predicts the distribution of rewards given an input and computation budget, then allocates additional computation to inputs for which it is predicted to be most useful. |
Mehul Damani; Idan Shenfeld; Andi Peng; Andreea Bobu; Jacob Andreas; |
| 243 | TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a new vision and language assistant called TEOChat that can engage in conversations about temporal sequences of earth observation data. We publicly release our data, models, and code at https://github.com/ermongroup/TEOChat. |
Jeremy Andrew Irvin; Emily Ruoyu Liu; Joyce C. Chen; Ines Dormoy; Jinyoung Kim; Samar Khanna; Zhuo Zheng; Stefano Ermon; |
| 244 | Better Instruction-Following Through Minimum Bayes Risk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that MBR decoding with reference-based LLM judges substantially improves over greedy decoding, best-of-N decoding with reference-free judges and MBR decoding with lexical and embedding-based metrics on AlpacaEval and MT-Bench. |
Ian Wu; Patrick Fernandes; Amanda Bertsch; Seungone Kim; Sina Khoshfetrat Pakazad; Graham Neubig; |
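Digest note: for concreteness, the generic Minimum Bayes Risk decoding rule the highlight builds on is sketched below with a toy lexical utility. The paper's key ingredient, a reference-based LLM judge as the utility function, is deliberately not reproduced here.

```python
def mbr_decode(candidates, utility):
    """Return the candidate with the highest average utility against the
    other candidates, i.e., a Monte Carlo estimate of expected utility."""
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], c) for c in others) / max(len(others), 1)
    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]

# Toy utility: word-overlap F1 between two outputs.
def overlap_f1(a, b):
    wa, wb = set(a.split()), set(b.split())
    if not wa or not wb:
        return 0.0
    p, r = len(wa & wb) / len(wa), len(wa & wb) / len(wb)
    return 2 * p * r / (p + r) if p + r else 0.0

samples = ["the cat sat", "a cat sat down", "dogs bark loudly"]
print(mbr_decode(samples, overlap_f1))   # picks a consensus-like candidate
```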
| 245 | CLoSD: Closing The Loop Between Simulation and Diffusion for Multi-task Character Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a method that combines their respective strengths. |
Guy Tevet; Sigal Raab; Setareh Cohan; Daniele Reda; Zhengyi Luo; Xue Bin Peng; Amit Haim Bermano; Michiel van de Panne; |
| 246 | Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agents, we propose DEI (Diversity Empowered Intelligence), a framework that leverages their unique expertise. |
Kexun Zhang; Weiran Yao; Zuxin Liu; Yihao Feng; Zhiwei Liu; Rithesh R N; Tian Lan; Lei Li; Renze Lou; Jiacheng Xu; Bo Pang; Yingbo Zhou; Shelby Heinecke; Silvio Savarese; Huan Wang; Caiming Xiong; |
| 247 | Language Models Scale Reliably with Over-training and on Downstream Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address both shortcomings, we create a testbed of 104 models with 0.011B to 6.9B parameters trained with various numbers of tokens on three data distributions. |
Samir Yitzhak Gadre; Georgios Smyrnis; Vaishaal Shankar; Suchin Gururangan; Mitchell Wortsman; Rulin Shao; Jean Mercat; Alex Fang; Jeffrey Li; Sedrick Keh; Rui Xin; Marianna Nezhurina; Igor Vasiljevic; Luca Soldaini; Jenia Jitsev; Alex Dimakis; Gabriel Ilharco; Pang Wei Koh; Shuran Song; Thomas Kollar; Yair Carmon; Achal Dave; Reinhard Heckel; Niklas Muennighoff; Ludwig Schmidt; |
| 248 | COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces COAT (**C**ompressing **O**ptimizer States and **A**ctivations for FP8 **T**raining), a novel FP8 training framework designed to significantly reduce memory footprint when training large models. |
Haocheng Xi; Han Cai; Ligeng Zhu; Yao Lu; Kurt Keutzer; Jianfei Chen; Song Han; |
| 249 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Stable Video 4D (SV4D) — a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. |
Yiming Xie; Chun-Han Yao; Vikram Voleti; Huaizu Jiang; Varun Jampani; |
| 250 | Benchmarking Agentic Workflow Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. |
Shuofei Qiao; Runnan Fang; Zhisong Qiu; Xiaobin Wang; Ningyu Zhang; Yong Jiang; Pengjun Xie; Fei Huang; Huajun Chen; |
| 251 | AI Sandbagging: Language Models Can Strategically Underperform on Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we assess sandbagging capabilities in contemporary language models (LMs). |
Teun van der Weij; Felix Hofstätter; Oliver Jaffe; Samuel F. Brown; Francis Rhys Ward; |
| 252 | DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech Without Domain-Specific Factors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce DiTTo-TTS, a Diffusion Transformer (DiT)-based TTS model, to investigate whether LDM-based TTS can achieve state-of-the-art performance without domain-specific factors. |
Keon Lee; Dong Won Kim; Jaehyeon Kim; Seungjun Chung; Jaewoong Cho; |
| 253 | Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Agent-to-Sim (ATS), a framework for learning interactive behavior models of 3D agents from casual longitudinal video collections. |
Gengshan Yang; Andrea Bajcsy; Shunsuke Saito; Angjoo Kanazawa; |
| 254 | A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work tackles the information loss bottleneck of vector-quantization (VQ) autoregressive image generation by introducing a novel model architecture called the 2-Dimensional Autoregression (DnD) Transformer. |
Liang Chen; Sinan Tan; Zefan Cai; Weichu Xie; Haozhe Zhao; Yichi Zhang; Junyang Lin; Jinze Bai; Tianyu Liu; Baobao Chang; |
| 255 | StringLLM: Understanding The String Processing Capability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our evaluations indicate that LLMs struggle with accurately processing strings compared to humans. To uncover the underlying reasons for this limitation, we conduct an in-depth analysis and subsequently propose an effective approach that significantly enhances LLMs’ string processing capability via fine-tuning. |
Xilong Wang; Hao Fu; Jindong Wang; Neil Zhenqiang Gong; |
| 256 | DPLM-2: A Multimodal Diffusion Protein Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures. |
Xinyou Wang; Zaixiang Zheng; Fei YE; Dongyu Xue; Shujian Huang; Quanquan Gu; |
| 257 | What’s The Move? Hybrid Imitation Learning Via Salient Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **SPHINX**: **S**alient **P**oint-based **H**ybrid **I**mitatio**N** and e**X**ecution, a flexible IL policy that leverages multimodal observations (point clouds and wrist images), along with a hybrid action space of low-frequency, sparse waypoints and high-frequency, dense end effector movements. |
Priya Sundaresan; Hengyuan Hu; Quan Vuong; Jeannette Bohg; Dorsa Sadigh; |
| 258 | DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life. |
Yu Ying Chiu; Liwei Jiang; Yejin Choi; |
| 259 | Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the theory, we introduce WSD-S, a variant of WSD that reuses previous checkpoints’ decay phases and keeps only one main branch, where we resume from a decayed checkpoint. |
Kaiyue Wen; Zhiyuan Li; Jason S. Wang; David Leo Wright Hall; Percy Liang; Tengyu Ma; |
| 260 | Matryoshka Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the concept of Matryoshka Dolls, we propose Matryoshka Multimodal Models, which learns to represent visual content as nested sets of visual tokens that capture information across multiple coarse-to-fine granularities. |
Mu Cai; Jianwei Yang; Jianfeng Gao; Yong Jae Lee; |
| 261 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we propose LLaMA-Omni, a novel model architecture designed for low-latency and high-quality speech interaction with LLMs. To align the model with speech interaction scenarios, we construct a dataset named InstructS2S-200K, which includes 200K speech instructions and corresponding speech responses. |
Qingkai Fang; Shoutao Guo; Yan Zhou; Zhengrui Ma; Shaolei Zhang; Yang Feng; |
| 262 | Implicit Search Via Discrete Diffusion: A Study on Chess Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose DiffuSearch, a model that does *implicit search* by looking into the future world via discrete diffusion modeling. |
Jiacheng Ye; Zhenyu Wu; Jiahui Gao; Zhiyong Wu; Xin Jiang; Zhenguo Li; Lingpeng Kong; |
| 263 | Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. |
Jiacheng Ye; Jiahui Gao; Shansan Gong; Lin Zheng; Xin Jiang; Zhenguo Li; Lingpeng Kong; |
| 264 | Is In-Context Learning Sufficient for Instruction Following in LLMs? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that, while effective, ICL alignment with URIAL still underperforms compared to instruction fine-tuning on established benchmarks such as MT-Bench and AlpacaEval 2.0 (LC), especially with more capable base LLMs. |
Hao Zhao; Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; |
| 265 | Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel, controllable, and scalable captioning pipeline designed to generate diverse caption formats tailored to various multimodal models. |
Zhengfeng Lai; Vasileios Saveris; Chen Chen; Hong-You Chen; Haotian Zhang; Bowen Zhang; Wenze Hu; Juan Lao Tebar; Zhe Gan; Peter Grasch; Meng Cao; Yinfei Yang; |
| 266 | On The Role of Attention Heads in Large Language Model Safety Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing research tends to overlook the safety impact of multi-head attention mechanisms, despite their crucial role in various model functionalities. Hence, in this paper, we aim to explore the connection between standard attention mechanisms and safety capability to fill this gap in the safety-related mechanistic interpretability. |
Zhenhong Zhou; Haiyang Yu; Xinghua Zhang; Rongwu Xu; Fei Huang; Kun Wang; Yang Liu; Junfeng Fang; Yongbin Li; |
| 267 | Proteina: Scaling Flow-based Protein Structure Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. |
Tomas Geffner; Kieran Didi; Zuobai Zhang; Danny Reidenbach; Zhonglin Cao; Jason Yim; Mario Geiger; Christian Dallago; Emine Kucukbenli; Arash Vahdat; Karsten Kreis; |
| 268 | OMNI-EPIC: Open-endedness Via Models of Human Notions of Interestingness with Environments Programmed in Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. |
Maxence Faldor; Jenny Zhang; Antoine Cully; Jeff Clune; |
| 269 | MagicPIG: LSH Sampling for Efficient LLM Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To make the sampling-based approximation practical in LLM generation, we propose MagicPIG, a heterogeneous system based on Locality Sensitive Hashing (LSH). |
Zhuoming Chen; Ranajoy Sadhukhan; Zihao Ye; Yang Zhou; Jianyu Zhang; Niklas Nolte; Yuandong Tian; Matthijs Douze; Leon Bottou; Zhihao Jia; Beidi Chen; |
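Digest note: to illustrate the mechanism the highlight names, here is a toy SimHash-style selection of attention keys. The dimensions, number of tables and bits, and the any-table collision rule are illustrative assumptions; MagicPIG's actual contribution (importance sampling with bias correction on a heterogeneous CPU/GPU system) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_keys, n_bits, n_tables = 64, 4096, 8, 10

keys = rng.standard_normal((n_keys, d))
query = rng.standard_normal(d)
planes = rng.standard_normal((n_tables, n_bits, d))   # random hyperplanes

def code(x, table):
    # SimHash: the sign pattern under random hyperplanes defines a bucket id.
    bits = (planes[table] @ x > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# A key becomes an attention candidate if it collides with the query
# in at least one hash table.
candidates = set()
for t in range(n_tables):
    q = code(query, t)
    for i in range(n_keys):
        if code(keys[i], t) == q:
            candidates.add(i)

print(f"attending to {len(candidates)}/{n_keys} keys")
```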
| 270 | Benchmarking Vision Language Model Unlearning Via Fictitious Facial Identity Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, with the increasing integration of visual data, privacy concerns in Vision Language Models (VLMs) remain underexplored. To address this, we introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms under the Right to be Forgotten setting. |
Yingzi Ma; Jiongxiao Wang; Fei Wang; Siyuan Ma; Jiazhao Li; Jinsheng Pan; Xiujun Li; Furong Huang; Lichao Sun; Bo Li; Yejin Choi; Muhao Chen; Chaowei Xiao; |
| 271 | Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Its key components include: 1) using the linear interpolating diffusion form of flow-matching, 2) employing $\boldsymbol v$-prediction, and 3) performing rectification (a.k.a. reflow). In this paper, we argue that the success of rectification primarily lies in using a pretrained diffusion model to obtain matched pairs of noise and samples, followed by retraining with these matched noise-sample pairs. |
Fu-Yun Wang; Ling Yang; Zhaoyang Huang; Mengdi Wang; Hongsheng Li; |
| 272 | Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we propose a straightforward but consistently effective approach that involves training a model specifically attuned to negative preferences. |
Fu-Yun Wang; Yunhao Shui; Jingtan Piao; Keqiang Sun; Hongsheng Li; |
| 273 | KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While benchmarks exist for testing visual reasoning in LMMs, they require advanced skills and omit basic visual analogies that even young children can make. Inspired by developmental psychology, we propose a new benchmark of 4,300 visual transformations of everyday objects to test LMMs on visual analogical reasoning and compare them to children (ages three to five) and to adults. |
Eunice Yiu; Maan Qraitem; Anisa Noor Majhi; Charlie Wong; Yutong Bai; Shiry Ginosar; Alison Gopnik; Kate Saenko; |
| 274 | PolyPythias: Stability and Outliers Across Fifty Language Model Pre-Training Runs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the PolyPythias, a set of 45 new training runs for the Pythia model suite: 9 new seeds across 5 model sizes, from 14M to 410M parameters, resulting in about 7k new checkpoints that we release. |
Oskar van der Wal; Pietro Lesci; Max Müller-Eberstein; Naomi Saphra; Hailey Schoelkopf; Willem Zuidema; Stella Biderman; |
| 275 | FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ***FasterCache***, a novel training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. |
Zhengyao Lv; Chenyang Si; Junhao Song; Zhenyu Yang; Yu Qiao; Ziwei Liu; Kwan-Yee K. Wong; |
| 276 | The Unreasonable Ineffectiveness of The Deeper Layers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Surprisingly, with this method we find minimal degradation of performance until after a large fraction (up to half) of the layers are removed for some common open-weight models. |
Andrey Gromov; Kushal Tirumala; Hassan Shapourian; Paolo Glorioso; Dan Roberts; |
| 277 | ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To generate high-quality, dynamic, and temporally consistent long videos, this paper presents ARLON, a novel framework that boosts diffusion Transformers with autoregressive (**AR**) models for long (**LON**) video generation, by integrating the coarse spatial and long-range temporal information provided by the AR model to guide the DiT model effectively. |
Zongyi Li; Shujie HU; Shujie LIU; Long Zhou; Jeongsoo Choi; Lingwei Meng; Xun Guo; Jinyu Li; Hefei Ling; Furu Wei; |
| 278 | Interpreting The Second-Order Effects of Neurons in CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, each effect can be approximated by a single direction in the text-image space of CLIP. We describe neurons by decomposing these directions into sparse sets of text representations. |
Yossi Gandelsman; Alexei A Efros; Jacob Steinhardt; |
| 279 | Autoregressive Video Generation Without Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel approach that enables autoregressive video generation with high efficiency. |
Haoge Deng; Ting Pan; Haiwen Diao; Zhengxiong Luo; Yufeng Cui; Huchuan Lu; Shiguang Shan; Yonggang Qi; Xinlong Wang; |
| 280 | Do As We Do, Not As You Think: The Conformity of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we introduce BenchForm, a new conformity-oriented benchmark, featuring reasoning-intensive tasks and five distinct interaction protocols designed to probe LLMs’ behavior in collaborative scenarios. |
Zhiyuan Weng; Guikun Chen; Wenguan Wang; |
| 281 | Booster: Tackling Harmful Fine-tuning for Large Language Models Via Attenuating Harmful Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to attenuate the negative impact of harmful perturbation, we propose an alignment-stage solution, dubbed Booster. |
Tiansheng Huang; Sihao Hu; Fatih Ilhan; Selim Furkan Tekin; Ling Liu; |
| 282 | Preble: Efficient Distributed Prompt Scheduling for LLM Serving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Preble, the first distributed LLM serving platform that targets and optimizes for prompt sharing. |
Vikranth Srivatsa; Zijian He; Reyna Abhyankar; Dongming Li; Yiying Zhang; |
| 283 | No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a conceptual shift to modeling low-dimensional dynamical systems by departing from the traditional two-step modeling process. |
Krzysztof Kacprzyk; Mihaela van der Schaar; |
| 284 | CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CHASE-SQL, a novel framework addressing large language model (LLM) performance challenges for Text-to-SQL tasks by leveraging multi-agent modeling and test-time compute for improved candidate generation and selection. |
Mohammadreza Pourreza; Hailong Li; Ruoxi Sun; Yeounoh Chung; Shayan Talaei; Gaurav Tarlok Kakkar; Yu Gan; Amin Saberi; Fatma Ozcan; Sercan O Arik; |
| 285 | Dynamic Multimodal Evaluation with Flexible Complexity By Vision-Language Bootstrapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these benchmarks are static in nature and overlap with the pre-training data, resulting in fixed complexity constraints and data contamination issues, which raises concerns about the validity of the evaluation. To address these two challenges, we introduce a dynamic multimodal evaluation protocol called Vision-Language Bootstrapping (VLB). |
Yue Yang; Shuibo Zhang; Kaipeng Zhang; Yi Bin; Yu Wang; Ping Luo; Wenqi Shao; |
| 286 | MIND: Math Informed SyNthetic Dialogues for Pretraining LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method that improves the mathematical reasoning ability of LLMs. |
Syeda Nahida Akter; Shrimai Prabhumoye; John Kamalu; Sanjeev Satheesh; Eric Nyberg; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; |
| 287 | MixEval-X: Any-to-any Evaluations from Real-world Data Mixture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose multi-modal benchmark mixture and adaptation-rectification pipelines to reconstruct real-world task distributions, ensuring evaluations generalize effectively to real-world use cases. |
Jinjie Ni; Yifan Song; Deepanway Ghosal; Bo Li; David Junhao Zhang; Xiang Yue; Fuzhao Xue; Yuntian Deng; Zian Zheng; Kaichen Zhang; Mahir Shah; Kabir Jain; Yang You; Michael Shieh; |
| 288 | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. |
Guanting Dong; Keming Lu; Chengpeng Li; Tingyu Xia; Bowen Yu; Chang Zhou; Jingren Zhou; |
| 289 | Agents’ Room: Narrative Generation Through Multi-step Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Agents’ Room, a generation framework inspired by narrative theory, that decomposes narrative writing into subtasks tackled by specialized agents. |
Fantine Huot; Reinald Kim Amplayo; Jennimaria Palomaki; Alice Shoshana Jakobovits; Elizabeth Clark; Mirella Lapata; |
| 290 | Long-Context Linear System Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the problem of long-context linear system identification, where the state $x_t$ of the system at time $t$ depends linearly on previous states $x_s$ over a fixed context window of length $p$. |
Oğuz Kaan Yüksel; Mathieu Even; Nicolas Flammarion; |
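Digest note: written out, the setting in the highlight is the order-$p$ vector autoregression below; the ordinary-least-squares estimator shown is the natural baseline for this problem (the paper's precise estimators and assumptions may differ):

$$
x_t \;=\; \sum_{i=1}^{p} A_i^{\star}\, x_{t-i} + w_t,
\qquad
\widehat{A}_{1:p} \;=\; \arg\min_{A_1,\dots,A_p} \sum_{t > p} \Big\lVert x_t - \sum_{i=1}^{p} A_i\, x_{t-i} \Big\rVert_2^2 .
$$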
| 291 | LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a unified learning-based framework called LLMOPT to boost optimization generalization. |
Caigao JIANG; Xiang Shu; Hong Qian; Xingyu Lu; JUN ZHOU; Aimin Zhou; Yang Yu; |
| 292 | Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We identify the core issue as a lack of true visual perception in LVLMs: although they can accurately recognize visual elements, they struggle to fully interpret these elements in the context of the input prompt and effectively link this recognition to their internal knowledge, which is critical for reasoning. To address this gap, we introduce Visual Description Grounded Decoding (VDGD), a simple, robust, and training-free method designed to enhance visual perception and improve reasoning capabilities in LVLMs. |
Sreyan Ghosh; Chandra Kiran Reddy Evuru; Sonal Kumar; Utkarsh Tyagi; Oriol Nieto; Zeyu Jin; Dinesh Manocha; |
| 293 | Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data. |
Sreyan Ghosh; Sonal Kumar; Zhifeng Kong; Rafael Valle; Bryan Catanzaro; Dinesh Manocha; |
| 294 | Interpreting Emergent Planning in Model-Free Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first mechanistic evidence that model-free reinforcement learning agents can learn to plan. |
Thomas Bush; Stephen Chung; Usman Anwar; Adrià Garriga-Alonso; David Krueger; |
| 295 | Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework. |
Zikun Zhang; Zixiang Chen; Quanquan Gu; |
| 296 | TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A significant remaining issue lies in the major differences between teacher and student models, namely the substantial capacity gap, mode averaging, and mode collapse, which pose barriers during distillation. To address these issues, we introduce *Temporally Adaptive Interpolated Distillation (TAID)*, a novel knowledge distillation approach that dynamically interpolates student and teacher distributions through an adaptive intermediate distribution, gradually shifting from the student’s initial distribution towards the teacher’s distribution. |
Makoto Shing; Kou Misaki; Han Bao; Sho Yokoi; Takuya Akiba; |
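Digest note: one natural way to write the time-interpolated target the highlight describes, with student $q_\theta$, teacher $p$, and a schedule $\alpha_t$ increasing from 0 to 1. This is our paraphrase; the paper's exact parameterization, e.g. where stop-gradients $\mathrm{sg}[\cdot]$ are placed, may differ:

$$
p_t(y \mid x) \;=\; (1-\alpha_t)\,\mathrm{sg}\big[q_\theta(y \mid x)\big] + \alpha_t\, p(y \mid x),
\qquad
\mathcal{L}_t(\theta) \;=\; \mathrm{KL}\big(p_t(\cdot \mid x)\,\big\|\, q_\theta(\cdot \mid x)\big).
$$

Early in training the target stays close to the student (easing the capacity gap); by the end it coincides with the teacher.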
| 297 | Scaling Laws for Downstream Task Performance in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. |
Berivan Isik; Natalia Ponomareva; Hussein Hazimeh; Dimitris Paparas; Sergei Vassilvitskii; Sanmi Koyejo; |
| 298 | Learning Clustering-based Prototypes for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop ClusPro, a robust clustering-based prototype mining framework for CZSL that defines the conceptual boundaries of primitives through a set of diversified prototypes. |
Hongyu Qu; Jianan Wei; Xiangbo Shu; Wenguan Wang; |
| 299 | Quantifying Generalization Complexity for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While large language models (LLMs) have shown exceptional capabilities in understanding complex queries and performing sophisticated tasks, their generalization abilities are often deeply entangled with memorization, necessitating more precise evaluation. To address this challenge, we introduce Scylla, a dynamic evaluation framework that quantitatively measures the generalization abilities of LLMs. |
Zhenting Qi; Hongyin Luo; Xuliang Huang; Zhuokai Zhao; Yibo Jiang; Xiangjun Fan; Himabindu Lakkaraju; James R. Glass; |
| 300 | Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we cast reward fine-tuning as stochastic optimal control (SOC). |
Carles Domingo-Enrich; Michal Drozdzal; Brian Karrer; Ricky T. Q. Chen; |
| 301 | MagicDec: Breaking The Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We leverage a draft model with a sparse KV cache to address the KV bottleneck, which scales with both sequence length and batch size. |
Ranajoy Sadhukhan; Jian Chen; Zhuoming Chen; Vashisth Tiwari; Ruihang Lai; Jinyuan Shi; Ian En-Hsu Yen; Avner May; Tianqi Chen; Beidi Chen; |
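Digest note: for background, the standard speculative-decoding acceptance test that systems like this build on is sketched below; MagicDec's specific contribution, drafting with a sparse KV cache at large batch sizes, is not shown.

```python
import numpy as np

def speculative_step(draft_probs, target_probs, draft_tokens, rng):
    """Verify drafted tokens against the target model.
    draft_probs[i] / target_probs[i]: the distributions under which
    token i was drafted / is verified."""
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        if rng.random() < min(1.0, p / q):      # standard acceptance test
            accepted.append(tok)
        else:
            # Resample from the residual distribution max(p - q, 0), renormalized.
            resid = np.maximum(target_probs[i] - draft_probs[i], 0)
            resid /= resid.sum()
            accepted.append(int(rng.choice(len(resid), p=resid)))
            break                                # later draft tokens are discarded
    return accepted

rng = np.random.default_rng(0)
V = 5
draft = [np.full(V, 1 / V) for _ in range(3)]          # uniform draft distributions
target = [np.array([0.4, 0.3, 0.1, 0.1, 0.1])] * 3     # target distributions
print(speculative_step(draft, target, draft_tokens=[0, 4, 2], rng=rng))
```

This scheme provably samples from the target distribution while letting the cheap draft model do most of the work, which is what makes the latency-throughput tradeoff movable.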
| 302 | VL-ICL Bench: The Devil in The Details of Multimodal In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce a comprehensive benchmark VL-ICL Bench for multimodal in-context learning, encompassing a broad spectrum of tasks that involve both images and text as inputs and outputs, and different types of challenges, from {perception to reasoning and long context length}. |
Yongshuo Zong; Ondrej Bohdal; Timothy Hospedales; |
| 303 | Limits of Deep Learning: Sequence Modeling Through The Lens of Complexity Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. |
Nikola Zubic; Federico Soldà; Aurelio Sulser; Davide Scaramuzza; |
| 304 | ChatQA 2: Bridging The Gap to Proprietary LLMs in Long Context and RAG Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce ChatQA 2, a Llama 3.0-based model with a 128K context window, designed to bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo-2024-04-09) in long context understanding and retrieval-augmented generation (RAG) capabilities. |
Peng Xu; Wei Ping; Xianchao Wu; Chejian Xu; Zihan Liu; Mohammad Shoeybi; Bryan Catanzaro; |
| 305 | Improving Uncertainty Estimation Through Semantically Diverse Language Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. |
Lukas Aichberger; Kajetan Schweighofer; Mykyta Ielanskyi; Sepp Hochreiter; |
| 306 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MM1.5, a new family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. |
Haotian Zhang; Mingfei Gao; Zhe Gan; Philipp Dufter; Nina Wenzel; Forrest Huang; Dhruti Shah; Xianzhi Du; Bowen Zhang; Yanghao Li; Sam Dodge; Keen You; Zhen Yang; Aleksei Timofeev; Mingze Xu; Hong-You Chen; Jean-Philippe Fauconnier; Zhengfeng Lai; Haoxuan You; Zirui Wang; Afshin Dehghan; Peter Grasch; Yinfei Yang; |
| 307 | SLMRec: Distilling Large Language Models Into Small for Sequential Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the influence of LLMs’ depth by conducting extensive experiments on large-scale industry datasets. |
Wujiang Xu; Qitian Wu; Zujie Liang; Jiaojiao Han; Xuying Ning; Yunxiao Shi; Wenfang Lin; Yongfeng Zhang; |
| 308 | From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs By Finetuning on Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach utilizing a carefully designed synthetic dataset comprising numerical key-value retrieval tasks. |
Zheyang Xiong; Vasilis Papageorgiou; Kangwook Lee; Dimitris Papailiopoulos; |
| 309 | LLaMaFlex: Many-in-one LLMs Via Generalized Pruning and Weight Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel nested weight-shared architecture named LLaMaFlex that can be pruned across both width and depth dimensions in a zero-shot manner to instantly yield a large number of highly accurate compressed models. |
Ruisi Cai; Saurav Muralidharan; Hongxu Yin; Zhangyang Wang; Jan Kautz; Pavlo Molchanov; |
| 310 | Not-So-Optimal Transport Flows for 3D Point Cloud Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One of the key properties of point clouds is their permutation invariance, i.e., changing the order of points in a point cloud does not change the shape they represent. In this paper, we analyze the recently proposed equivariant OT flows that learn permutation invariant generative models for point-based molecular data and we show that these models scale poorly on large point clouds. |
Ka-Hei Hui; Chao Liu; Xiaohui Zeng; Chi-Wing Fu; Arash Vahdat; |
| 311 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a unified approach to online and offline RLHF — value-incentivized preference optimization (VPO) — which regularizes the maximum-likelihood estimate of the reward function with the corresponding value function, modulated by a sign to indicate whether the optimism or pessimism is chosen. |
Shicong Cen; Jincheng Mei; Katayoon Goshvadi; Hanjun Dai; Tong Yang; Sherry Yang; Dale Schuurmans; Yuejie Chi; Bo Dai; |
| 312 | Monitoring Latent World States in Language Models with Propositional Probes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that LMs faithfully represent their input contexts in a latent world model, and we seek to extract these latent world states as logical propositions. |
Jiahai Feng; Stuart Russell; Jacob Steinhardt; |
| 313 | Forewarned Is Forearmed: Harnessing LLMs for Data Synthesis Via Failure-induced Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel approach, ReverseGen, designed to automatically generate effective training samples that expose the weaknesses of LLMs. |
Qintong Li; Jiahui Gao; Sheng Wang; Renjie Pi; Xueliang Zhao; Chuan Wu; Xin Jiang; Zhenguo Li; Lingpeng Kong; |
| 314 | Anyprefer: An Agentic Framework for Preference Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent methods often adopt a self-rewarding approach, where the target model generates and annotates its own preference data, but this can lead to inaccuracies since the reward model shares weights with the target model, thereby amplifying inherent biases. To address these issues, we propose Anyprefer, a framework designed to synthesize high-quality preference data for aligning the target model. |
Yiyang Zhou; Zhaoyang Wang; Tianle Wang; Shangyu Xing; Peng Xia; Bo Li; Kaiyuan Zheng; Zijian Zhang; Zhaorun Chen; Wenhao Zheng; Xuchao Zhang; Chetan Bansal; Weitong Zhang; Ying Wei; Mohit Bansal; Huaxiu Yao; |
| 315 | Privacy Auditing of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our method can be used to provide a privacy audit of $\varepsilon \approx 1$ for a model trained with theoretical $\varepsilon$ of 4. |
Ashwinee Panda; Xinyu Tang; Christopher A. Choquette-Choo; Milad Nasr; Prateek Mittal; |
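Digest note: the standard conversion behind such audits is worth spelling out. Any membership-inference attack on an $(\varepsilon,\delta)$-DP mechanism must satisfy $\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR} + \delta$, so an attack's measured operating point yields the lower bound below (the confidence intervals on TPR/FPR that a rigorous audit needs are omitted):

$$
\varepsilon \;\ge\; \max\!\left\{ \log\frac{\mathrm{TPR}-\delta}{\mathrm{FPR}},\;\; \log\frac{1-\delta-\mathrm{FPR}}{1-\mathrm{TPR}} \right\}.
$$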
| 316 | VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce $\textbf{VibeCheck}$, a system for automatically comparing a pair of LLMs by discovering identifying traits of a model (vibes) that are well-defined, differentiating, and user-aligned. |
Lisa Dunlap; Krishna Mandal; Trevor Darrell; Jacob Steinhardt; Joseph E. Gonzalez; |
| 317 | Neural Phylogeny: Fine-Tuning Relationship Detection Among Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present two approaches for neural phylogeny detection: a learning-free method and a learning-based method. |
Runpeng Yu; Xinchao Wang; |
| 318 | Differential Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. |
Tianzhu Ye; Li Dong; Yuqing Xia; Yutao Sun; Yi Zhu; Gao Huang; Furu Wei; |
| 319 | LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LLaVA-Mini, an efficient LMM with minimal vision tokens. |
Shaolei Zhang; Qingkai Fang; Zhe Yang; Yang Feng; |
| 320 | OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To harness the respective strengths of different classifiers, we propose a principled approach, OCCAM, to compute the best classifier assignment strategy over classification queries (termed as the optimal model portfolio) so that the aggregated accuracy is maximized, under user-specified cost budgets. |
Dujian Ding; Bicheng Xu; Laks V. S. Lakshmanan; |
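The highlight frames a budgeted assignment problem: one classifier per query, maximizing total estimated accuracy under a cost budget. The greedy upgrade rule below is only an illustrative sketch of that formulation, not the paper's principled solution (which also supplies the accuracy estimates):

```python
def assign_models(acc, cost, budget):
    """acc[q][m] = estimated accuracy of model m on query q; cost[m] = its cost.
    Greedy heuristic: start every query on the cheapest model, then repeatedly
    apply the single upgrade with the best accuracy gain per unit cost."""
    cheapest = min(range(len(cost)), key=lambda m: cost[m])
    choice = [cheapest] * len(acc)
    spent = cost[cheapest] * len(acc)
    while True:
        best = None
        for q in range(len(acc)):
            for m in range(len(cost)):
                dc = cost[m] - cost[choice[q]]
                da = acc[q][m] - acc[q][choice[q]]
                if dc > 0 and da > 0 and spent + dc <= budget:
                    if best is None or da / dc > best[0]:
                        best = (da / dc, q, m)
        if best is None:
            return choice  # no affordable, beneficial upgrade remains
        _, q, m = best
        spent += cost[m] - cost[choice[q]]
        choice[q] = m
```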
| 321 | Selective Induction Heads: How Transformers Select Causal Structures in Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel synthetic framework designed to enable the theoretical analysis of transformers’ ability to dynamically handle causal structures. |
Francesco D’Angelo; Francesco Croce; Nicolas Flammarion; |
| 322 | Long Context Compression with Activation Beacon Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Activation Beacon, a plug-in module for transformer-based LLMs that targets effective, efficient, and flexible compression of long contexts. |
Peitian Zhang; Zheng Liu; Shitao Xiao; Ninglu Shao; Qiwei Ye; Zhicheng Dou; |
| 323 | VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current RAG systems are solely based on text, rendering it impossible to utilize vision information like layout and images that play crucial roles in real-world multi-modality documents. In this paper, we introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline. |
Shi Yu; Chaoyue Tang; Bokai Xu; Junbo Cui; Junhao Ran; Yukun Yan; Zhenghao Liu; Shuo Wang; Xu Han; Zhiyuan Liu; Maosong Sun; |
| 324 | $\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional KV Cache eviction strategies, which discard less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce **D**ynamic **D**iscriminative **O**perations ($\mathbf{D_2 O}$), a novel method that optimizes KV cache size dynamically and discriminatively at two levels without fine-tuning, while preserving essential context. |
Zhongwei Wan; Xinjian Wu; Yu Zhang; Yi Xin; Chaofan Tao; Zhihong Zhu; Xin Wang; Siqi Luo; Jing Xiong; Longyue Wang; Mi Zhang; |
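As background, the attention-score eviction baseline that this entry criticizes looks roughly like the sketch below; D2O's contribution is to operate discriminatively at two levels and to recycle rather than blindly discard evicted pairs (details in the paper):

```python
import numpy as np

def evict_kv(keys, values, cum_attn, keep):
    """Baseline eviction: keep the `keep` KV pairs with the largest
    cumulative attention mass. keys/values: (seq, dim); cum_attn: (seq,)."""
    idx = np.sort(np.argsort(cum_attn)[-keep:])  # preserve positional order
    return keys[idx], values[idx]
```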
| 325 | Inference Scaling for Long-Context Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate inference scaling for retrieval augmented generation (RAG), exploring the combination of multiple strategies beyond simply increasing the quantity of knowledge, including in-context learning and iterative prompting. |
Zhenrui Yue; Honglei Zhuang; Aijun Bai; Kai Hui; Rolf Jagerman; Hansi Zeng; Zhen Qin; Dong Wang; Xuanhui Wang; Michael Bendersky; |
| 326 | UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose UniWav, an encoder-decoder framework designed to unify pre-training representation learning and generative tasks. |
Alexander H. Liu; Sang-gil Lee; Chao-Han Huck Yang; Yuan Gong; Yu-Chiang Frank Wang; James R. Glass; Rafael Valle; Bryan Catanzaro; |
| 327 | Towards Automated Knowledge Integration From Human-Interpretable Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the success of informed machine learning methods, designing algorithms with explicit inductive biases remains largely a manual process. In this work, we explore how prior knowledge represented in its native formats, e.g. in natural language, can be integrated into machine learning models in an automated manner. |
Kasia Kobalczyk; Mihaela van der Schaar; |
| 328 | Active Task Disambiguation with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the impressive performance of large language models (LLMs) across various benchmarks, their ability to address ambiguously specified problems—frequent in real-world interactions—remains underexplored. To address this gap, we introduce a formal definition of task ambiguity and frame the problem of task disambiguation through the lens of Bayesian Experimental Design. |
Kasia Kobalczyk; Nicolás Astorga; Tennison Liu; Mihaela van der Schaar; |
| 329 | Going Beyond Static: Understanding Shifts with Time-Series Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to support an empirically grounded inductive approach to research, we introduce our **T**ime-**S**eries **S**hift **A**ttribution (TSSA) framework, which analyzes *problem-specific* patterns of distribution shifts. |
Jiashuo Liu; Nabeel Seedat; Peng Cui; Mihaela van der Schaar; |
| 330 | Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a novel problem setting where every collected sample is paired with a vector indicating the data quality: risk vector. |
Yangming Li; Max Ruiz Luyten; Mihaela van der Schaar; |
| 331 | Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel approach to LLM weight pruning that directly optimizes for approximating the attention matrix, a core component of transformer architectures. |
Yingyu Liang; Jiangxuan Long; Zhenmei Shi; Zhao Song; Yufa Zhou; |
| 332 | Personalized Representation from Personalized Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we explore a potential connection between these ideas, and formalize the challenge of using personalized synthetic data to learn personalized representations, which encode knowledge about an object of interest and may be flexibly applied to any downstream task relating to the target object. We introduce an evaluation suite for this challenge, including reformulations of two existing datasets and a novel dataset explicitly constructed for this purpose, and propose a contrastive learning approach that makes creative use of image generators. |
Shobhita Sundaram; Julia Chae; Yonglong Tian; Sara Beery; Phillip Isola; |
| 333 | Large Language Models Assume People Are More Rational Than We Really Are Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order for AI systems to communicate effectively with people, they must understand how we make decisions. |
Ryan Liu; Jiayi Geng; Joshua Peterson; Ilia Sucholutsky; Thomas L. Griffiths; |
| 334 | Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Meissonic, which elevates non-autoregressive text-to-image Masked Image Modeling (MIM) to a level comparable with state-of-the-art diffusion models like SDXL. |
Jinbin Bai; Tian Ye; Wei Chow; Enxin Song; Qing-Guo Chen; Xiangtai Li; Zhen Dong; Lei Zhu; Shuicheng YAN; |
| 335 | APE: Faster and Longer Context-Augmented Generation Via Adaptive Parallel Encoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enable effective and efficient CAG, we propose Adaptive Parallel Encoding (**APE**), which brings shared prefix, attention temperature, and scaling factor to align the distribution of parallel encoding with sequential encoding. |
Xinyu Yang; Tianqi Chen; Beidi Chen; |
| 336 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this limitation, we present WorkflowLLM, a data-centric framework designed to enhance the capability of LLMs in workflow orchestration. Specifically, the construction process can be divided into three phases: (1) Data Collection: we collect real-world workflow data from Apple Shortcuts and RoutineHub, transcribing them into Python-style code. |
Shengda Fan; Xin Cong; Yuepeng Fu; Zhong Zhang; Shuyan Zhang; Yuanwei Liu; Yesai Wu; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
| 337 | Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Group Position Embedding (GPE), a novel and efficient technique to enhance the layout understanding capabilities of LLMs without architectural changes or additional pre-training. We also introduce a challenging benchmark called BLADE, specifically designed to assess layout comprehension. |
Yuke Zhu; Yue Zhang; Dongdong Liu; Chi Xie; Zihua Xiong; Bo Zheng; Sheng Guo; |
| 338 | (Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we survey over 50 papers that study scaling trends: while 45 of these papers quantify these trends using a power law, most under-report crucial details needed to reproduce their findings. To mitigate this, we propose a checklist for authors to consider while contributing to scaling law research. |
Margaret Li; Sneha Kudugunta; Luke Zettlemoyer; |
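For readers reproducing such fits, a typical saturating power law and a least-squares fit look like the sketch below. The data points are synthetic, and the paper's checklist concerns exactly the details (initialization, fitting loss, data range) that a sketch like this leaves unspecified:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # L(N) = a * N^(-b) + c: an irreducible loss c plus a power-law term.
    return a * n ** (-b) + c

n = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
loss = np.array([4.2, 3.9, 3.6, 3.4, 3.2])  # synthetic, for illustration only
(a, b, c), _ = curve_fit(power_law, n, loss, p0=(10.0, 0.1, 2.0), maxfev=20000)
print(f"L(N) ~= {a:.2f} * N^-{b:.3f} + {c:.2f}")
```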
| 339 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Herein, we present Video Self-Training with augmented Reasoning (Video-STaR), the first self-training approach for video instruction tuning. |
Orr Zohar; Xiaohan Wang; Yonatan Bitton; Idan Szpektor; Serena Yeung-Levy; |
| 340 | GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a new dataset, termed GUI-World, which features meticulously crafted Human-MLLM annotations, extensively covering six GUI scenarios and eight types of GUI-oriented questions in three formats. We believe our work provides valuable insights for future research in dynamic GUI content understanding. |
Dongping Chen; Yue Huang; Siyuan Wu; Jingyu Tang; Huichi Zhou; Qihui Zhang; Zhigang He; Yilin Bai; Chujie Gao; Liuyi Chen; Yiqiang Li; Chenlong Wang; Yue Yu; Tianshuo Zhou; Zhen Li; Yi Gui; Yao Wan; Pan Zhou; Jianfeng Gao; Lichao Sun; |
| 341 | Scaling Autonomous Agents Via Automatic Reward Modeling And Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address LLM agents’ limitations, we propose a framework that can automatically learn a reward model from the environment without human annotations. |
Zhenfang Chen; Delin Chen; Rui Sun; Wenjun Liu; Chuang Gan; |
| 342 | Collab: Controlled Decoding Using Mixture of Agents for LLM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To strengthen the test-time performance w.r.t the target task, we propose a mixture of agents-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies. |
Souradip Chakraborty; Sujay Bhatt; Udari Madhushani Sehwag; Soumya Suvra Ghosal; Jiahao Qiu; Mengdi Wang; Dinesh Manocha; Furong Huang; Alec Koppel; Sumitra Ganesh; |
| 343 | Speculative Knowledge Distillation: Bridging The Teacher-Student Gap Through Interleaved Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples with which teacher models are not familiar, resulting in inaccurate teacher feedback. To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student’s inference-time distribution. |
Wenda Xu; Rujun Han; Zifeng Wang; Long Le; Dhruv Madeka; Lei Li; William Yang Wang; Rishabh Agarwal; Chen-Yu Lee; Tomas Pfister; |
| 344 | InstructRAG: Instructing Retrieval-Augmented Generation Via Self-Synthesized Rationales Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales — First, we instruct the LM to explain how the ground-truth answer is derived from retrieved documents. |
Zhepei Wei; Wei-Lin Chen; Yu Meng; |
| 345 | MUSE: Machine Unlearning Six-Way Evaluation for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The evaluation of the efficacy of these algorithms has traditionally been narrow in scope, failing to precisely quantify the success and practicality of the algorithm from the perspectives of both the model deployers and the data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. |
Weijia Shi; Jaechan Lee; Yangsibo Huang; Sadhika Malladi; Jieyu Zhao; Ari Holtzman; Daogao Liu; Luke Zettlemoyer; Noah A. Smith; Chiyuan Zhang; |
| 346 | Deconstructing What Makes A Good Optimizer for Autoregressive Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to compare several optimization algorithms, including SGD, Adafactor, Adam, Lion, and Sophia in the context of autoregressive language modeling across a range of model sizes, hyperparameters, and architecture variants. |
Rosie Zhao; Depen Morwani; David Brandfonbrener; Nikhil Vyas; Sham M. Kakade; |
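Of the optimizers this entry compares, Lion is perhaps the least familiar; a minimal single-tensor step of it (per Chen et al., 2023) is sketched below for reference:

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: the step direction is the sign of an interpolation
    between the momentum buffer m and the current gradient g."""
    update = np.sign(beta1 * m + (1.0 - beta1) * g)
    w = w - lr * (update + wd * w)      # decoupled weight decay
    m = beta2 * m + (1.0 - beta2) * g   # momentum uses a second coefficient
    return w, m
```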
| 347 | MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Masked Image Modeling Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. |
Benedikt Alkin; Lukas Miklautz; Sepp Hochreiter; Johannes Brandstetter; |
| 348 | Vision-LSTM: XLSTM As Generic Vision Backbone Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Vision-LSTM (ViL), an adaptation of the xLSTM building blocks to computer vision. |
Benedikt Alkin; Maximilian Beck; Korbinian Pöppel; Sepp Hochreiter; Johannes Brandstetter; |
| 349 | RMB: Comprehensively Benchmarking Reward Models in LLM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address these limitations, we propose RMB, a comprehensive RM benchmark that covers over 49 real-world scenarios and includes both pairwise and Best-of-N (BoN) evaluations to better reflect the effectiveness of RMs in guiding alignment optimization. We will release our evaluation code and datasets upon publication. |
Enyu Zhou; Guodong Zheng; Binghai Wang; Zhiheng Xi; Shihan Dou; Rong Bao; Wei Shen; Limao Xiong; Jessica Fan; Yurong Mou; Rui Zheng; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 350 | Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide two partial solutions. |
Zhuohua Li; Maoli Liu; Xiangxiang Dai; John C.S. Lui; |
| 351 | DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. |
Chenguo Lin; Panwang Pan; Bangbang Yang; Zeming Li; Yadong MU; |
| 352 | Compositional Simulation-based Inference for Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Scientific simulators frequently emulate real-world dynamics through thousands of single-state transitions over time. We propose an SBI approach that can exploit such Markovian simulators by locally identifying parameters consistent with individual state transitions. |
Manuel Gloeckler; Shoji Toyota; Kenji Fukumizu; Jakob H. Macke; |
| 353 | Personalized Visual Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework designed to enable MLLMs to identify target individuals within an image and engage in personalized and coherent dialogues. To evaluate the personalized potential of MLLMs, we present a benchmark called P-Bench, which encompasses various question types with different levels of difficulty. |
Renjie Pi; Jianshu Zhang; Tianyang Han; Jipeng Zhang; Rui Pan; Tong Zhang; |
| 354 | Recite, Reconstruct, Recollect: Memorization in LMs As A Multifaceted Phenomenon Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate the usefulness of our taxonomy by using it to construct a predictive model for memorization. |
USVSN Sai Prashanth; Alvin Deng; Kyle O’Brien; Jyothir S V; Mohammad Aflah Khan; Jaydeep Borkar; Christopher A. Choquette-Choo; Jacob Ray Fuehne; Stella Biderman; Tracy Ke; Katherine Lee; Naomi Saphra; |
| 355 | Fugatto 1: Foundational Generative Audio Transformer Opus 1 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is because audio data does not inherently contain the instructions that were used to generate it. To overcome this challenge, we introduce a specialized dataset generation approach optimized for producing a wide range of audio generation and transformation tasks, ensuring the data reveals meaningful relationships between audio and language. |
Rafael Valle; Rohan Badlani; Zhifeng Kong; Sang-gil Lee; Arushi Goel; Sungwon Kim; Joao Felipe Santos; Shuqi Dai; Siddharth Gururani; Aya Aljafari; Alexander H. Liu; Kevin J. Shih; Ryan Prenger; Wei Ping; Chao-Han Huck Yang; Bryan Catanzaro; |
| 356 | The Superposition of Diffusion Models Using The Itô Density Estimator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. |
Marta Skreta; Lazar Atanackovic; Joey Bose; Alexander Tong; Kirill Neklyudov; |
| 357 | Jailbreaking As A Reward Misspecification Problem Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a new perspective that attributes this vulnerability to reward misspecification during the alignment process. |
Zhihui Xie; Jiahui Gao; Lei Li; Zhenguo Li; Qi Liu; Lingpeng Kong; |
| 358 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses this challenge using a zero-shot approach with a pre-trained diffusion model. Despite this potential, achieving our goals is difficult due to the diffusion model's lack of understanding of "where" and "how" objects interact with the human body. |
Yukang Cao; Liang Pan; Kai Han; Kwan-Yee K. Wong; Ziwei Liu; |
| 359 | Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models Via Energy Hessians Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a method for transferring general-purpose representations from MLFF foundation models to smaller, faster MLFFs specialized to specific regions of chemical space. |
Ishan Amin; Sanjeev Raja; Aditi S. Krishnapriyan; |
| 360 | MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing benchmarks suffer from limitations in data scale, scope, and evaluation depth, while current evaluation metrics are often costly or biased, lacking in reliability for practical applications. To address these challenges, we introduce MMIE, a large-scale knowledge-intensive benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs). |
Peng Xia; Siwei Han; Shi Qiu; Yiyang Zhou; Zhaoyang Wang; Wenhao Zheng; Zhaorun Chen; Chenhang Cui; Mingyu Ding; Linjie Li; Lijuan Wang; Huaxiu Yao; |
| 361 | MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs. |
Peng Xia; Kangyu Zhu; Haoran Li; Tianze Wang; Weijia Shi; Sheng Wang; Linjun Zhang; James Zou; Huaxiu Yao; |
| 362 | McEval: Massively Multilingual Code Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further facilitate the research of code LLMs, we propose a massively multilingual code benchmark covering 40 programming languages (McEval) with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. |
Linzheng Chai; Shukai Liu; Jian Yang; Yuwei Yin; JinKe; Jiaheng Liu; Tao Sun; Ge Zhang; Changyu Ren; Hongcheng Guo; Noah Wang; Boyang Wang; Xianjie Wu; Bing Wang; Tongliang Li; Liqun Yang; Sufeng Duan; Zhaoxiang Zhang; Zhoujun Li; |
| 363 | Vector-ICL: In-context Learning with Continuous Vector Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models (LLMs) have shown remarkable in-context learning (ICL) capabilities on textual data. We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders. |
Yufan Zhuang; Chandan Singh; Liyuan Liu; Jingbo Shang; Jianfeng Gao; |
| 364 | DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose DaWin, a training-free dynamic weight interpolation method that leverages the entropy of individual models over each unlabeled test sample to assess model expertise and computes per-sample interpolation coefficients dynamically. |
Changdae Oh; Yixuan Li; Kyungwoo Song; Sangdoo Yun; Dongyoon Han; |
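A minimal sketch of the entropy-to-coefficient idea for two models is below. How DaWin actually maps entropy to coefficients, and whether it interpolates weights or outputs, follows the paper; the exponential weighting here is an assumption for illustration:

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)

def interpolation_coeffs(probs_a, probs_b):
    """Per-sample coefficients for two models from predictive entropy:
    the more confident (lower-entropy) model receives the larger weight."""
    h = np.stack([entropy(probs_a), entropy(probs_b)], axis=-1)
    w = np.exp(-h)
    return w / w.sum(axis=-1, keepdims=True)  # (batch, 2), rows sum to 1
```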
| 365 | Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate *diverse* and *effective* attack prompts. |
Seanie Lee; Minsu Kim; Lynn Cherif; David Dobre; Juho Lee; Sung Ju Hwang; Kenji Kawaguchi; Gauthier Gidel; Yoshua Bengio; Nikolay Malkin; Moksh Jain; |
| 366 | Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation. |
Yao Teng; Han Shi; Xian Liu; Xuefei Ning; Guohao Dai; Yu Wang; Zhenguo Li; Xihui Liu; |
| 367 | Dynamic-LLaVA: Efficient Multimodal Large Language Models Via Dynamic Vision-language Context Sparsification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, the efficiency benefits of the vision context reduction in the prefill stage gradually diminish during the decoding stage. To address this problem, we propose Dynamic-LLaVA, a dynamic vision-language context sparsification framework that dynamically reduces the redundancy of vision context in the prefill stage and decreases the memory and computation overhead of the generated language context during decoding. |
Wenxuan Huang; Zijie Zhai; Yunhang Shen; Shaosheng Cao; Fei Zhao; Xiangfeng Xu; Zheyu Ye; Shaohui Lin; |
| 368 | Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, it is essential to explore the environment and collect data to achieve beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty to potentially improve the convergence to the expert policy, and (2) curiosity-driven exploration of the states that deviate from the demonstration trajectories to potentially yield beyond-expert performance. |
Heyang Zhao; Xingrui Yu; David Mark Bossens; Ivor Tsang; Quanquan Gu; |
| 369 | Self-Play Preference Optimization for Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. |
Yue Wu; Zhiqing Sun; Huizhuo Yuan; Kaixuan Ji; Yiming Yang; Quanquan Gu; |
| 370 | Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the limited control of audio signals in driving human motion, existing methods often add auxiliary spatial signals such as movement regions to stabilize movements, which compromise the naturalness and freedom of motion. To address this issue, we propose an end-to-end audio-only conditioned video diffusion model named Loopy. |
Jianwen Jiang; Chao Liang; Jiaqi Yang; Gaojie Lin; Tianyun Zhong; Yanbo Zheng; |
| 371 | Scaling Large Language Model-based Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the neural scaling law, where adding neurons enhances performance, this study explores whether continuously adding collaborative agents can yield similar benefits. |
Chen Qian; Zihao Xie; YiFei Wang; Wei Liu; Kunlun Zhu; Hanchen Xia; Yufan Dang; Zhuoyun Du; Weize Chen; Cheng Yang; Zhiyuan Liu; Maosong Sun; |
| 372 | ProteinBench: A Holistic Evaluation of Protein Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. To promote transparency and facilitate further research, we publicly release the evaluation dataset, code, leaderboard, and a general modular toolkit for further analysis. |
Fei YE; Zaixiang Zheng; Dongyu Xue; Yuning Shen; Lihao Wang; Yiming Ma; Yan Wang; Xinyou Wang; Xiangxin Zhou; Quanquan Gu; |
| 373 | Probing The Latent Hierarchical Structure of Data Via Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that forward-backward experiments in diffusion-based models, where data is noised and then denoised to generate new samples, are a promising tool to probe the latent structure of data. |
Antonio Sclocchi; Alessandro Favero; Noam Itzhak Levi; Matthieu Wyart; |
| 374 | BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. |
Shaozhe Hao; Xuantong LIU; Xianbiao Qi; Shihao Zhao; Bojia Zi; Rong Xiao; Kai Han; Kwan-Yee K. Wong; |
| 375 | Knowledge Entropy Decay During Language Model Pretraining Hinders New Knowledge Acquisition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate how a model’s tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. |
Jiyeon Kim; Hyunji Lee; Hyowon Cho; Joel Jang; Hyeonbin Hwang; Seungpil Won; Youbin Ahn; Dohaeng Lee; Minjoon Seo; |
| 376 | To CoT or Not to CoT? Chain-of-thought Helps Mainly on Math and Symbolic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But for what kinds of tasks is this extra thinking really helpful? To analyze this, we conducted a quantitative meta-analysis covering over 100 papers using CoT and ran our own evaluations of 20 datasets across 14 models. |
Zayne Rea Sprague; Fangcong Yin; Juan Diego Rodriguez; Dongwei Jiang; Manya Wadhwa; Prasann Singhal; Xinyu Zhao; Xi Ye; Kyle Mahowald; Greg Durrett; |
| 377 | Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we draw inspiration from the LLM-as-a-judge framework, which demonstrated that LLMs are able to rate answers in a versatile way. |
Gregor Bachmann; Sotiris Anagnostidis; Albert Pumarola; Markos Georgopoulos; Artsiom Sanakoyeu; Yuming Du; Edgar Schönfeld; Ali Thabet; Jonas K Kohler; |
| 378 | On Targeted Manipulation and Deception When Optimizing LLMs for User Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, training to maximize human feedback creates a perverse incentive structure for the AI to resort to manipulative or deceptive tactics to obtain positive feedback from users who are vulnerable to such strategies. We study this phenomenon by training LLMs with Reinforcement Learning with simulated user feedback in environments of practical LLM usage. |
Marcus Williams; Micah Carroll; Adhyyan Narang; Constantin Weisser; Brendan Murphy; Anca Dragan; |
| 379 | Rational Decision-Making Agent with Learning Internal Utility Judgment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For genuine autonomous decision-making for LLM-based agents, it is imperative to develop rationality from their posterior experiences to judge the utility of each decision independently. In this work, we propose RaDAgent (Rational Decision-Making Agent), which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning. |
Yining Ye; Xin Cong; Shizuo Tian; Yujia Qin; Chong Liu; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
| 380 | Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large Language Models (LLMs) are increasingly trained on data generated by other LLMs, either because generated text and images become part of the pre-training corpus, or because synthesized data is used as a replacement for expensive human annotation. |
Yunzhen Feng; Elvis Dohmatob; Pu Yang; Francois Charton; Julia Kempe; |
| 381 | SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike existing methods focused on multi-view generation of single objects for 4D reconstruction, our interest lies in generating open-world videos from arbitrary viewpoints, incorporating six degrees of freedom (6 DoF) camera poses. To achieve this, we propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation, ensuring consistent content across different viewpoints. |
Jianhong Bai; Menghan Xia; Xintao Wang; Ziyang Yuan; Zuozhu Liu; Haoji Hu; Pengfei Wan; Di ZHANG; |
| 382 | Perturbation-Restrained Sequential Model Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, a framework termed Perturbation Restraint on Upper bouNd for Editing (PRUNE) is proposed, which applies the condition number restraints in sequential editing. |
Jun-Yu Ma; Hong Wang; Hao-Xiang Xu; Zhen-Hua Ling; Jia-Chen Gu; |
| 383 | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we re-evaluate the necessity of additional modules and analyze how to improve training efficiency and reduce redundant steps in the inference process. Based on these insights, we propose CatVTON, a simple and efficient virtual try-on diffusion model that transfers in-shop or worn garments of arbitrary categories to target individuals by concatenating them along spatial dimensions as inputs of the diffusion model. |
Zheng Chong; Xiao Dong; Haoxiang Li; shiyue Zhang; Wenqing Zhang; Hanqing Zhao; xujie zhang; Dongmei Jiang; Xiaodan Liang; |
| 384 | LongPO: Long Context Self-Evolution of Large Language Models Through Short-to-Long Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This alignment process remains challenging due to the impracticality of human annotation for extended contexts and the difficulty in balancing short- and long-context performance. To address these challenges, we introduce LongPO, which enables short-context LLMs to self-evolve to excel on long-context tasks by internally transferring short-context capabilities. |
Guanzheng Chen; Xin Li; Michael Shieh; Lidong Bing; |
| 385 | Model Equality Testing: Which Model Is This API Serving? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to cut costs or add functionality, API providers may quantize, watermark, or finetune the underlying model, changing the output distribution — possibly without notifying users. We formalize detecting such distortions as Model Equality Testing, a two-sample testing problem, where the user collects samples from the API and a reference distribution and conducts a statistical test to see if the two distributions are the same. |
Irena Gao; Percy Liang; Carlos Guestrin; |
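The two-sample formulation is easy to instantiate. The permutation test below is a generic sketch: the paper designs a statistic suited to LLM string outputs, whereas the toy length-gap statistic here is an assumption for illustration only:

```python
import numpy as np

def permutation_test(stat, xs, ys, n_perm=10_000, seed=0):
    """p-value for H0: xs and ys come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = stat(xs, ys)
    pooled, n = np.concatenate([xs, ys]), len(xs)
    hits = sum(
        stat(p[:n], p[n:]) >= observed
        for p in (rng.permutation(pooled) for _ in range(n_perm))
    )
    return (hits + 1) / (n_perm + 1)

# Toy statistic over API samples: gap in mean completion length.
diff_mean_len = lambda a, b: abs(a.mean() - b.mean())
```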
| 386 | Energy-Weighted Flow Matching for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce energy-weighted flow matching (EFM), a method that directly learns the energy-guided flow without the need for auxiliary models. |
Shiyuan Zhang; Weitong Zhang; Quanquan Gu; |
| 387 | API Pack: A Massive Multi-Programming Language Dataset for API Call Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce API Pack, a massive multi-programming language dataset containing over one million instruction-API calls for improving the API call generation capabilities of large language models. |
Zhen Guo; Adriana Meza Soria; Wei Sun; Yikang Shen; Rameswar Panda; |
| 388 | Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we propose an online reaction policy, called Ready-to-React, to generate the next character pose based on past observed motions. |
Zhi Cen; Huaijin Pi; Sida Peng; Qing Shuai; Yujun Shen; Hujun Bao; Xiaowei Zhou; Ruizhen Hu; |
| 389 | Aligning Language Models with Demonstrated Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number ($<10$) of demonstrations as feedback. |
Omar Shaikh; Michelle S. Lam; Joey Hejna; Yijia Shao; Hyundong Justin Cho; Michael S. Bernstein; Diyi Yang; |
| 390 | CryoFM: A Flow-based Foundation Model for Cryo-EM Densities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present CryoFM, a foundation model designed as a generative model, learning the distribution of high-quality density maps and generalizing effectively to downstream tasks. |
Yi Zhou; Yilai Li; Jing Yuan; Quanquan Gu; |
| 391 | GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present GeoX, a multi-modal large model focusing on geometric understanding and reasoning tasks. |
Renqiu Xia; Mingsheng Li; Hancheng Ye; Wenjie Wu; Hongbin Zhou; Jiakang Yuan; Tianshuo Peng; Xinyu Cai; Xiangchao Yan; Bin Wang; Conghui He; Botian Shi; Tao Chen; Junchi Yan; Bo Zhang; |
| 392 | Autoregressive Pretraining with Mamba in Vision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper shows that Mamba’s visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. |
Sucheng Ren; Xianhang Li; Haoqin Tu; Feng Wang; Fangxun Shu; Lei Zhang; Jieru Mei; Linjie Yang; Peng Wang; Heng Wang; Alan Yuille; Cihang Xie; |
| 393 | PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a versatile image-to-image visual assistant, PixWizard, designed for image generation, manipulation, and translation based on free-form language instructions. |
Weifeng Lin; Xinyu Wei; Renrui Zhang; Le Zhuo; Shitian Zhao; Siyuan Huang; Junlin Xie; Peng Gao; Hongsheng Li; |
| 394 | Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the Draw-and-Understand framework, exploring how to integrate visual prompting understanding capabilities into Multimodal Large Language Models (MLLMs). |
Weifeng Lin; Xinyu Wei; Ruichuan An; Peng Gao; Bocheng Zou; Yulin Luo; Siyuan Huang; Shanghang Zhang; Hongsheng Li; |
| 395 | How Feature Learning Can Improve Neural Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a simple solvable model of neural scaling laws beyond the kernel limit. |
Blake Bordelon; Alexander Atanasov; Cengiz Pehlevan; |
| 396 | Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Fiddler, a resource-efficient inference system for MoE models with limited GPU resources. |
Keisuke Kamahori; Tian Tang; Yile Gu; Kan Zhu; Baris Kasikci; |
| 397 | Efficient Imitation Under Misspecification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the more general misspecified setting, where no assumptions are made about the expert policy’s realizability. |
Nicolas Espinosa-Dice; Sanjiban Choudhury; Wen Sun; Gokul Swamy; |
| 398 | OS-ATLAS: Foundation Action Model for Generalist GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Practitioners are often reluctant to use open-source VLMs due to their significant performance lag compared to their closed-source counterparts, particularly in GUI grounding and Out-Of-Distribution (OOD) scenarios. To facilitate future research in this area, we developed OS-Atlas—a foundational GUI action model that excels at GUI grounding and OOD agentic tasks through innovations in both data and modeling. |
Zhiyong Wu; Zhenyu Wu; Fangzhi Xu; Yian Wang; Qiushi Sun; Chengyou Jia; Kanzhi Cheng; Zichen Ding; Liheng Chen; Paul Pu Liang; Yu Qiao; |
| 399 | Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates a basic question in reinforcement learning from human feedback (RLHF) from a theoretical perspective: how to efficiently explore in an online manner under preference feedback and general function approximation. We take the initial step towards a theoretical understanding of this problem by proposing a novel algorithm, *Exploratory Preference Optimization* (XPO). |
Tengyang Xie; Dylan J Foster; Akshay Krishnamurthy; Corby Rosset; Ahmed Hassan Awadallah; Alexander Rakhlin; |
| 400 | GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing methods show promise, they face challenges in input formats, latent space structures, and output representations. This paper introduces a novel 3D generation framework that addresses these issues, enabling scalable and high-quality 3D generation with an interactive Point Cloud-structured Latent space. |
Yushi LAN; Shangchen Zhou; Zhaoyang Lyu; Fangzhou Hong; Shuai Yang; Bo Dai; Xingang Pan; Chen Change Loy; |
| 401 | Differentiable and Learnable Wireless Simulation with Geometric Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Wi-GATr, a fully-learnable neural simulation surrogate designed to predict the channel observations based on scene primitives (e.g., surface mesh, antenna position and orientation). |
Thomas Hehn; Markus Peschl; Tribhuvanesh Orekondy; Arash Behboodi; Johann Brehmer; |
| 402 | FormalAlign: Automated Alignment Evaluation for Autoformalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing approaches heavily rely on manual verification, hindering scalability. To address this, we introduce FormalAlign, a framework for automatically evaluating the alignment between natural and formal languages in autoformalization. |
Jianqiao Lu; Yingjia Wan; Yinya Huang; Jing Xiong; Zhengying Liu; Zhijiang Guo; |
| 403 | AdaIR: Adaptive All-in-One Image Restoration Via Frequency Mining and Modulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most methods purely operate in the spatial domain and do not delve into the distinct frequency variations inherent to different degradation types. To address this gap, we propose an adaptive all-in-one image restoration network based on frequency mining and modulation. |
Yuning Cui; Syed Waqas Zamir; Salman Khan; Alois Knoll; Mubarak Shah; Fahad Shahbaz Khan; |
| 404 | WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domain: 1) extreme compression. |
Shengpeng Ji; Ziyue Jiang; Wen Wang; Yifu Chen; Minghui Fang; Jialong Zuo; Qian Yang; Xize Cheng; Zehan Wang; Ruiqi Li; Ziang Zhang; Xiaoda Yang; Rongjie Huang; Yidi Jiang; Qian Chen; Siqi Zheng; Zhou Zhao; |
| 405 | Scaling Wearable Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, accelerometer, electrodermal activity, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. |
Girish Narayanswamy; Xin Liu; Kumar Ayush; Yuzhe Yang; Xuhai Xu; shun liao; Jake Garrison; Shyam A. Tailor; Jacob Sunshine; Yun Liu; Tim Althoff; Shrikanth Narayanan; Pushmeet Kohli; Jiening Zhan; Mark Malhotra; Shwetak Patel; Samy Abdel-Ghaffar; Daniel McDuff; |
| 406 | System 1.x: Learning to Balance Fast and Slow Planning with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose the System-1.x Planner, a framework for controllable planning with language models that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand. |
Swarnadeep Saha; Archiki Prasad; Justin Chen; Peter Hase; Elias Stengel-Eskin; Mohit Bansal; |
| 407 | ACE: All-round Creator and Editor Following Instructions Via Diffusion Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose ACE, an All-round Creator and Editor, which achieves performance comparable to expert models across a wide range of visual generation tasks. |
Zhen Han; Zeyinzi Jiang; Yulin Pan; Jingfeng Zhang; Chaojie Mao; Chen-Wei Xie; Yu Liu; Jingren Zhou; |
| 408 | Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. |
Yaxi Lu; Shenzhi Yang; Cheng Qian; Guirong Chen; Qinyu Luo; Yesai Wu; Huadong Wang; Xin Cong; Zhong Zhang; Yankai Lin; Weiwen Liu; Yasheng Wang; Zhiyuan Liu; Fangming Liu; Maosong Sun; |
| 409 | Scaling Laws for Precision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we devise precision-aware scaling laws for both training and inference. |
Tanishq Kumar; Zachary Ankner; Benjamin Frederick Spector; Blake Bordelon; Niklas Muennighoff; Mansheej Paul; Cengiz Pehlevan; Christopher Re; Aditi Raghunathan; |
| 410 | Protecting Against Simultaneous Data Poisoning Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that multiple backdoors can be simultaneously installed in a single model through parallel data poisoning attacks without substantially degrading clean accuracy. |
Neel Alex; Shoaib Ahmed Siddiqui; Amartya Sanyal; David Krueger; |
| 411 | DiscoveryBench: Towards Data-Driven Discovery with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To answer this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. |
Bodhisattwa Prasad Majumder; Harshit Surana; Dhruv Agarwal; Bhavana Dalvi Mishra; Abhijeetsingh Meena; Aryan Prakhar; Tirth Vora; Tushar Khot; Ashish Sabharwal; Peter Clark; |
| 412 | Permute-and-Flip: An Optimally Stable and Watermarkable Decoder for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder. |
Xuandong Zhao; Lei Li; Yu-Xiang Wang; |
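The underlying Permute-and-Flip mechanism (McKenna & Sheldon, 2020) is simple to state. The decoding-flavored sketch below uses a temperature in place of the privacy parameter, which is an assumption on my part; how the paper integrates stability guarantees and watermarking is not shown here:

```python
import math
import random

def permute_and_flip(logits: dict, temperature: float = 1.0) -> str:
    """Visit tokens in random order; accept token t with probability
    exp((logit_t - max_logit) / temperature). The argmax token accepts
    with probability 1, so a single pass always terminates."""
    u_max = max(logits.values())
    order = list(logits)
    random.shuffle(order)
    for tok in order:
        if random.random() < math.exp((logits[tok] - u_max) / temperature):
            return tok

print(permute_and_flip({"a": 2.0, "b": 1.0, "c": 0.1}, temperature=0.5))
```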
| 413 | InstantPortrait: One-Step Portrait Editing Via Diffusion Multi-Objective Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the gap, this paper introduces an Instant-Portrait Network (IPNet), the first one-step diffusion-based model for portrait editing. |
Zhixin Lai; Keqiang Sun; Fu-Yun Wang; Dhritiman Sagar; Erli Ding; |
| 414 | Graph Neural Networks for Edge Signals: Orientation Equivariance and Invariance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They can neither model undirected edge signals nor distinguish if an edge itself is directed or undirected. We address these shortcomings by (i) revising the notion of *orientation equivariance* to enable edge direction-aware topological models, (ii) proposing *orientation invariance* as an additional requirement to describe signals without inherent direction, and (iii) developing EIGN, an architecture composed of novel direction-aware edge-level graph shift operators, that provably fulfils the aforementioned desiderata. |
Dominik Fuchsgruber; Tim Postuvan; Stephan Günnemann; Simon Geisler; |
| 415 | Trajectory Attention for Fine-grained Video Motion Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces trajectory attention, a novel approach that performs attention along available pixel trajectories for fine-grained camera motion control. |
Zeqi Xiao; Wenqi Ouyang; Yifan Zhou; Shuai Yang; Lei Yang; Jianlou Si; Xingang Pan; |
| 416 | HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, HiSplat, which introduces a hierarchical manner in generalizable 3D Gaussian Splatting to construct hierarchical 3D Gaussians via a coarse-to-fine strategy. |
Shengji Tang; Weicai Ye; Peng Ye; Weihao Lin; Yang Zhou; Tao Chen; Wanli Ouyang; |
| 417 | Facilitating Multi-turn Function Calling for LLMs Via Compositional Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While current research on function calling by LLMs primarily focuses on single-turn interactions, this paper addresses the overlooked necessity for LLMs to engage in multi-turn function calling—critical for handling compositional, real-world queries that require planning with functions rather than merely calling them. To facilitate this, we introduce an approach, BUTTON, which generates synthetic compositional instruction tuning data via bottom-up instruction construction and top-down trajectory generation. |
Mingyang Chen; sunhaoze; Tianpeng Li; Fan Yang; Hao Liang; KeerLu; Bin CUI; Wentao Zhang; Zenan Zhou; weipeng chen; |
| 418 | Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Without mitigating such gaps, diffusion for perception still struggles on tasks represented by multi-modal understanding (e.g., referring image segmentation). Motivated by these challenges, we analyze and improve the alignment between the generative diffusion process and perception objectives centering around the key observation: how perception quality evolves with the denoising process. |
Ziqi Pang; Xin Xu; Yu-Xiong Wang; |
| 419 | CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel method for generating 360° panoramas from text prompts or images. |
Nikolai Kalischek; Michael Oechsle; Fabian Manhardt; Philipp Henzler; Konrad Schindler; Federico Tombari; |
| 420 | FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce **FairMT-Bench**, a comprehensive benchmark for the fairness of LLMs in multi-turn scenarios. Based on these findings, we develop a more challenging dataset, FairMT-1K, and test 15 current state-of-the-art (SOTA) LLMs on it. |
Zhiting Fan; Ruizhe Chen; Tianxiang Hu; Zuozhu Liu; |
| 421 | Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, existing analysis for deterministic samplers usually focuses on specific examples, lacking a generalized approach for general forward processes and various deterministic samplers. Our paper addresses these limitations by introducing a unified convergence analysis framework. |
Runjia Li; Qiwei Di; Quanquan Gu; |
| 422 | Towards General-Purpose Model-Free Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. |
Scott Fujimoto; Pierluca D’Oro; Amy Zhang; Yuandong Tian; Michael Rabbat; |
| 423 | OmnixR: Evaluating Omni-modality Language Models on Reasoning Across Modalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce \textbf{OmnixR}, an evaluation suite designed to benchmark state-of-the-art Omni-modality Language Models (OLMs), such as GPT-4o and Gemini. |
Lichang Chen; Hexiang Hu; Mingda Zhang; Yiwen Chen; Zifeng Wang; YANDONG LI; Pranav Shyam; Tianyi Zhou; Heng Huang; Ming-Hsuan Yang; Boqing Gong; |
| 424 | SPA: 3D Spatial-Awareness Enables Effective Embodied Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce SPA, a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. |
Haoyi Zhu; Honghui Yang; Yating Wang; Jiange Yang; Limin Wang; Tong He; |
| 425 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce PrefEval, a benchmark for evaluating LLMs’ ability to infer, memorize and adhere to user preferences in long-context conversational settings. |
Siyan Zhao; Mingyi Hong; Yang Liu; Devamanyu Hazarika; Kaixiang Lin; |
| 426 | Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing benchmarks often rely on extensive human annotation or handcrafted templates, making it difficult to achieve the necessary complexity, scalability, and diversity for robust evaluation. To address these limitations, we propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models (LLMs) with the rigor and precision of symbolic provers, enabling the creation of a scalable, diverse, and high-quality FOL reasoning dataset, ProverQA. |
Chengwen Qi; Ren Ma; Bowen Li; He Du; Binyuan Hui; Jinwang Wu; Yuanjun Laili; Conghui He; |
| 427 | AI As Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models Via Systematic Attribution of Machine Text Against Web Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text by reconstructing it from existing text snippets on the web. |
Ximing Lu; Melanie Sclar; Skyler Hallinan; Niloofar Mireshghallah; Jiacheng Liu; Seungju Han; Allyson Ettinger; Liwei Jiang; Khyathi Chandu; Nouha Dziri; Yejin Choi; |
| 428 | A Sanity Check for AI-generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conduct a sanity check on whether the task of AI-generated image detection has been solved. To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. |
Shilin Yan; Ouxiang Li; Jiayin Cai; Yanbin Hao; Xiaolong Jiang; Yao Hu; Weidi Xie; |
| 429 | Fast Feedforward 3D Gaussian Splatting Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enhance compression efficiency, we propose a multi-path entropy module that assigns Gaussian attributes to different entropy constraint paths for balance between size and fidelity. |
Yihang Chen; Qianyi Wu; Mengyao Li; Weiyao Lin; Mehrtash Harandi; Jianfei Cai; |
| 430 | No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the core principles of CFG and introduce a new method, independent condition guidance (ICG), which provides the benefits of CFG without the need for any special training procedures. |
Seyedmorteza Sadat; Manuel Kansy; Otmar Hilliges; Romann M. Weber; |
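The core of ICG is easy to sketch: standard classifier-free guidance extrapolates away from a null-condition prediction, which requires special training with condition dropout, whereas ICG uses a prediction under an independently sampled condition as the reference instead. A minimal sketch, assuming a hypothetical denoiser interface `eps_model(x, t, c)` and a Gaussian stand-in for the independent condition (both assumptions, not the authors' exact implementation):

```python
import torch

def guided_noise(eps_model, x_t, t, cond, w, use_icg=True):
    """One guided noise prediction. With use_icg=True, the unconditional
    branch of classifier-free guidance is replaced by a prediction under
    an *independent*, randomly sampled condition embedding, so no special
    null-condition training is needed (a sketch of the ICG idea)."""
    eps_cond = eps_model(x_t, t, cond)
    if use_icg:
        c_indep = torch.randn_like(cond)                     # independent condition (assumed Gaussian)
        eps_ref = eps_model(x_t, t, c_indep)
    else:
        eps_ref = eps_model(x_t, t, torch.zeros_like(cond))  # classic null condition
    return eps_ref + w * (eps_cond - eps_ref)
```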
| 431 | Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. |
Seyedmorteza Sadat; Otmar Hilliges; Romann M. Weber; |
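One way to see why modifying the update rule can help: the CFG update eps_cond + (w−1)(eps_cond − eps_uncond) has a component parallel to the conditional prediction that mostly inflates magnitudes (saturation) and an orthogonal component that carries much of the quality gain. The sketch below shows a projection-based variant in that spirit; the paper's exact update rule and weighting are not reproduced here:

```python
import torch

def projected_guidance(eps_cond, eps_uncond, w, parallel_weight=0.0):
    """Split the guidance difference into components parallel and
    orthogonal to the conditional prediction and down-weight the parallel
    part, which mainly drives oversaturation at high guidance scales.
    With parallel_weight=1.0 this reduces to plain CFG."""
    delta = (eps_cond - eps_uncond).flatten(1)
    ref = eps_cond.flatten(1)
    unit = ref / ref.norm(dim=1, keepdim=True).clamp(min=1e-8)
    par = (delta * unit).sum(dim=1, keepdim=True) * unit     # parallel component
    orth = delta - par                                       # orthogonal component
    return (ref + (w - 1.0) * (orth + parallel_weight * par)).view_as(eps_cond)
```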
| 432 | Morphing Tokens Draw Strong Masked Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our pilot study identifies spatial inconsistency in supervisory signals and suggests that addressing it can improve representation learning. Building upon this insight, we introduce Dynamic Token Morphing (DTM), a novel method that dynamically aggregates tokens while preserving context to generate contextualized targets, thereby likely reducing spatial inconsistency. |
Taekyung Kim; Byeongho Heo; Dongyoon Han; |
| 433 | Mixture-of-Agents Enhances Large Language Model Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. |
Junlin Wang; Jue WANG; Ben Athiwaratkun; Ce Zhang; James Zou; |
| 434 | AgentStudio: A Toolkit for Building General Virtual Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, current evaluations lack in-depth analyses that decompose fundamental agent capabilities. We introduce AgentStudio, a trinity of environments, tools, and benchmarks to address these issues. |
Longtao Zheng; Zhiyuan Huang; Zhenghai Xue; Xinrun Wang; Bo An; Shuicheng YAN; |
| 435 | Is Your Multimodal Language Model Oversensitive to Safe Queries? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the initial step in investigating this behavior, we identify three representative types of stimuli that trigger the oversensitivity of existing MLLMs: ***Exaggerated Risk***, ***Negated Harm***, and ***Counterintuitive Interpretation***. To systematically evaluate MLLMs’ oversensitivity to these stimuli, we propose the **M**ultimodal **O**ver**S**en**S**itivity **Bench**mark (MOSSBench). |
Xirui Li; Hengguang Zhou; Ruochen Wang; Tianyi Zhou; Minhao Cheng; Cho-Jui Hsieh; |
| 436 | ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we argue that for an agent to fully automate scientific discovery, it must be able to complete all essential tasks in the workflow. |
Ziru Chen; Shijie Chen; Yuting Ning; Qianheng Zhang; Boshi Wang; Botao Yu; Yifei Li; Zeyi Liao; Chen Wei; Zitong Lu; Vishal Dey; Mingyi Xue; Frazier N. Baker; Benjamin Burns; Daniel Adu-Ampratwum; Xuhui Huang; Xia Ning; Song Gao; Yu Su; Huan Sun; |
| 437 | VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations. |
Yichao Liang; Nishanth Kumar; Hao Tang; Adrian Weller; Joshua B. Tenenbaum; Tom Silver; Joao F. Henriques; Kevin Ellis; |
| 438 | RazorAttention: Efficient KV Cache Compression Through Retrieval Heads Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel compression technique for KV cache that preserves all token information. |
Hanlin Tang; Yang Lin; Jing Lin; Qingsen Han; Danning Ke; Shikuan Hong; Yiwu Yao; Gongyi Wang; |
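The highlight's "preserves all token information" admits a compact reading: only a minority of attention heads (retrieval heads) actually attend far back, so the cache can stay full for those heads while the others keep a short recent window plus an aggregate of the dropped entries. A sketch under assumed details (window size, mean-pooled compensation token); the paper's head identification and compensation are more refined:

```python
import torch

def compress_kv(keys, values, is_retrieval_head, window=512):
    """Per-head cache policy in the spirit of RazorAttention: a retrieval
    head keeps its full KV cache; any other head keeps only the most
    recent `window` entries plus one compensation token that averages
    everything dropped, so no token's information is discarded outright.
    keys, values: (seq_len, head_dim) tensors for a single head."""
    if is_retrieval_head or keys.shape[0] <= window:
        return keys, values
    comp_k = keys[:-window].mean(dim=0, keepdim=True)   # compensation token
    comp_v = values[:-window].mean(dim=0, keepdim=True)
    return (torch.cat([comp_k, keys[-window:]], dim=0),
            torch.cat([comp_v, values[-window:]], dim=0))
```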
| 439 | Towards Semantic Equivalence of Tokenization in Multimodal LLM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods aggressively fragment visual input, corrupting the visual semantic integrity. To address this, this paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok), which groups visual features into semantic units via a dynamic clustering algorithm, flexibly determining the number of tokens based on image complexity. |
Shengqiong Wu; Hao Fei; Xiangtai Li; Jiayi Ji; Hanwang Zhang; Tat-Seng Chua; Shuicheng YAN; |
| 440 | Restructuring Vector Quantization with The Rotation Trick Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a way to propagate gradients through the vector quantization layer of VQ-VAEs. |
Christopher Fifty; Ronald Guenther Junkins; Dennis Duan; Aniketh Iyengar; Jerry Weihong Liu; Ehsan Amid; Sebastian Thrun; Christopher Re; |
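For intuition: the usual straight-through estimator copies gradients from the quantized vector q back to the encoder output e unchanged, while the rotation trick transports them through the fixed rotation-and-rescaling that carries e onto q. A minimal sketch, assuming the Householder-style map R = 2·r·rᵀ/‖r‖² − I with r = ê + q̂ (which sends ê to q̂); the transform is treated as a constant, and details may differ from the paper's construction:

```python
import torch

def rotation_trick(e, q):
    """Forward pass returns the codebook vector q exactly, but the
    gradient reaching e flows through the (detached) rotation-and-
    rescaling that maps e onto q, instead of being copied unchanged
    as in the straight-through estimator."""
    e_hat = (e / e.norm(dim=-1, keepdim=True)).detach()
    q_hat = (q / q.norm(dim=-1, keepdim=True)).detach()
    lam = (q.norm(dim=-1, keepdim=True) / e.norm(dim=-1, keepdim=True)).detach()
    r = e_hat + q_hat                                 # constant reflection axis
    # R e = 2 r (r . e) / ||r||^2 - e, with R built from detached unit vectors
    rot_e = 2 * r * (r * e).sum(-1, keepdim=True) / (r * r).sum(-1, keepdim=True) - e
    return lam * rot_e                                # numerically equal to q
```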
| 441 | Efficient Dictionary Learning with Switch Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Switch Sparse Autoencoders, a novel SAE architecture aimed at reducing the compute cost of training SAEs. |
Anish Mudide; Joshua Engels; Eric J Michaud; Max Tegmark; Christian Schroeder de Witt; |
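The compute saving comes from the same trick as Switch Transformers: a learned router picks one expert SAE per activation, so only that expert's encoder and decoder run. A minimal single-vector sketch with assumed top-1 routing, k-sparse (TopK) activations, and placeholder sizes:

```python
import torch
import torch.nn as nn

class SwitchSAE(nn.Module):
    """Sketch of a switch sparse autoencoder: each activation is routed
    to exactly one expert SAE, cutting training compute roughly by the
    number of experts. Routing and sparsity details are assumptions."""
    def __init__(self, d_model, d_dict, n_experts, k=32):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.enc = nn.ModuleList([nn.Linear(d_model, d_dict) for _ in range(n_experts)])
        self.dec = nn.ModuleList([nn.Linear(d_dict, d_model) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                        # x: (d_model,)
        expert = int(self.router(x).argmax())    # top-1 routing
        acts = self.enc[expert](x)
        top = torch.topk(acts, self.k)           # k-sparse activations
        sparse = torch.zeros_like(acts).scatter(0, top.indices, top.values)
        return self.dec[expert](sparse)          # reconstruction
```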
| 442 | Can Watermarks Be Used to Detect LLM IP Infringement For Free? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the potential of LLM watermarks for detecting model infringement. To demonstrate the effectiveness of this approach, we construct a challenging model set containing multiple suspect LLMs on which direct detection methods struggle to yield effective results. |
Zhengyue Zhao; Xiaogeng Liu; Somesh Jha; Patrick McDaniel; Bo Li; Chaowei Xiao; |
| 443 | Towards A General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods require training one specific model for each dataset, which exhibits limited generalization capability across different target datasets, hindering anomaly detection performance in various scenarios with scarce training data. To address this problem, we propose constructing a general time series anomaly detection model, which is pre-trained on extensive multi-domain datasets and can subsequently be applied to a multitude of downstream scenarios. |
Qichao Shentu; Beibu Li; Kai Zhao; Yang Shu; Zhongwen Rao; Lujia Pan; Bin Yang; Chenjuan Guo; |
| 444 | From Tokens to Words: On The Inner Lexicon of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present evidence that LLMs engage in an intrinsic detokenization process, where subword sequences are combined into coherent whole-word representations at their last token. |
Guy Kaplan; Matanel Oren; Yuval Reif; Roy Schwartz; |
| 445 | Input Space Mode Connectivity in Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Mode connectivity was originally studied within parameter space, where it describes the existence of low-loss paths between different solutions (loss minimizers) obtained through gradient descent. We present theoretical and empirical evidence of its presence in the input space of deep networks, thereby highlighting the broader nature of the phenomenon. |
Jakub Vrabel; Ori Shem-Ur; Yaron Oz; David Krueger; |
| 446 | PivotMesh: Generic 3D Mesh Generation Via Pivot Vertices Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a generic and scalable mesh generation framework PivotMesh, which makes an initial attempt to extend the native mesh generation to large-scale datasets. |
Haohan Weng; Yikai Wang; Tong Zhang; C. L. Philip Chen; Jun Zhu; |
| 447 | CREAM: Consistency Regularized Self-Rewarding Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We then introduce regularization into this generalized framework to mitigate overconfident preference labeling in the self-rewarding process. Based on this theoretical insight, we propose a Consistency Regularized sElf-rewarding lAnguage Model (CREAM) that leverages the consistency of rewards across different iterations to regularize the self-rewarding training, helping the model learn from more reliable preference data. |
Zhaoyang Wang; Weilei He; Zhiyuan Liang; Xuchao Zhang; Chetan Bansal; Ying Wei; Weitong Zhang; Huaxiu Yao; |
| 448 | $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our objective appears to work particularly well in lower-data regimes, with gains over CLIP of $17.2\%$ on ImageNet and $18.0\%$ on ImageNet Real when training with CC3M. |
Vlad Sobal; Mark Ibrahim; Randall Balestriero; Vivien Cabannes; Diane Bouchacourt; Pietro Astolfi; Kyunghyun Cho; Yann LeCun; |
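Conceptually, the objective replaces the one-hot (identity) targets of a CLIP/InfoNCE loss with soft targets read off a cross-sample similarity graph, so semantically related samples are no longer pushed apart as negatives. A hedged sketch of that idea, assuming a precomputed nonnegative `similarity_graph`; the paper's exact objective may differ:

```python
import torch
import torch.nn.functional as F

def xsample_loss(z_a, z_b, similarity_graph, temperature=0.1):
    """Contrastive loss with soft cross-sample targets.
    z_a, z_b: L2-normalized embeddings of two views, shape (n, d).
    similarity_graph: (n, n) nonnegative relatedness scores; each row is
    normalized into a target distribution over the batch."""
    logits = z_a @ z_b.t() / temperature
    targets = similarity_graph / similarity_graph.sum(dim=1, keepdim=True)
    return F.cross_entropy(logits, targets)  # accepts class-probability targets
```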
| 449 | FaithEval: Can Your Language Model Stay Faithful to Context, Even If The Moon Is Made of Marshmallows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce FaithEval, a novel and comprehensive benchmark tailored to evaluate the faithfulness of LLMs in contextual scenarios across three diverse tasks: unanswerable, inconsistent, and counterfactual contexts. |
Yifei Ming; Senthil Purushwalkam; Shrey Pandit; Zixuan Ke; Xuan-Phi Nguyen; Caiming Xiong; Shafiq Joty; |
| 450 | Dense Video Object Captioning from Disjoint Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new task and model for dense video object captioning — detecting, tracking and captioning trajectories of objects in a video. |
Xingyi Zhou; Anurag Arnab; Chen Sun; Cordelia Schmid; |
| 451 | Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates this phenomenon, identifying the detrimental impact of retrieved hard negatives as a key contributor. To mitigate this and enhance the robustness of long-context LLM-based RAG, we propose both training-free and training-based approaches. |
Bowen Jin; Jinsung Yoon; Jiawei Han; Sercan O Arik; |
| 452 | Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we present ISG, a comprehensive evaluation framework for interleaved text-and-image generation. In conjunction with ISG, we introduce a benchmark, ISG-Bench, encompassing 1,150 samples across 8 categories and 21 subcategories. |
Dongping Chen; Ruoxi Chen; Shu Pu; Zhaoyi Liu; Yanru Wu; Caixi Chen; Benlin Liu; Yue Huang; Yao Wan; Pan Zhou; Ranjay Krishna; |
| 453 | Capturing The Temporal Dependence of Training Data Influence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, exactly evaluating the trajectory-specific LOO presents a significant computational challenge. To address this, we propose *data value embedding*, a novel technique enabling efficient approximation of trajectory-specific LOO. |
Jiachen T. Wang; Dawn Song; James Zou; Prateek Mittal; Ruoxi Jia; |
| 454 | Data Shapley in One Training Run Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present several techniques that allow the efficient scaling of In-Run Data Shapley to the size of foundation models. |
Jiachen T. Wang; Prateek Mittal; Dawn Song; Ruoxi Jia; |
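For a sense of why within-run attribution can be cheap: to first order, a training example's effect on a validation loss at one SGD step is a gradient inner product, which can be accumulated while training proceeds. The sketch below shows only this standard influence-style first-order term, not the paper's actual In-Run Data Shapley estimator or its scaling techniques:

```python
def first_order_step_contribution(grad_train_example, grad_validation, lr):
    """First-order estimate of how one training example's gradient update
    at a single SGD step moves a validation loss:
        delta_loss ~= -lr * <grad L(z), grad L_val>.
    Accumulating such terms over a run gives an influence-style, in-run
    attribution; works on any array with elementwise * and .sum()
    (torch or numpy)."""
    return -lr * (grad_train_example * grad_validation).sum()
```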
| 455 | Intelligent Go-Explore: Standing on The Shoulders of Giant Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration (i.e., determine which states to save and explore from, and what actions to consider next), which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these handcrafted heuristics with the intelligence and internalized human notions of interestingness captured by giant pretrained foundation models (FMs). |
Cong Lu; Shengran Hu; Jeff Clune; |
| 456 | The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple heuristic privacy analysis of noisy clipped stochastic gradient descent (DP-SGD) in the setting where only the last iterate is released and the intermediate iterates remain hidden. |
Milad Nasr; Thomas Steinke; Borja Balle; Christopher A. Choquette-Choo; Arun Ganesh; Matthew Jagielski; Jamie Hayes; Abhradeep Guha Thakurta; Adam Smith; Andreas Terzis; |
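For reference, the mechanism being audited is standard noisy clipped SGD: clip each per-example gradient, average, and add calibrated Gaussian noise. A minimal sketch of one step (the paper's heuristic last-iterate analysis itself is not reproduced here):

```python
import torch

def dp_sgd_step(params, per_sample_grads, lr, clip_norm, noise_mult):
    """One DP-SGD step: clip each per-example gradient to clip_norm,
    average, and add Gaussian noise with std = noise_mult * clip_norm
    / batch_size. per_sample_grads: (batch, *params.shape)."""
    batch = per_sample_grads.shape[0]
    norms = per_sample_grads.flatten(1).norm(dim=1).clamp(min=1e-12)
    scale = (clip_norm / norms).clamp(max=1.0)            # per-example clip factor
    clipped = per_sample_grads * scale.view(-1, *([1] * params.dim()))
    noisy_mean = clipped.sum(0) / batch \
        + torch.randn_like(params) * (noise_mult * clip_norm / batch)
    return params - lr * noisy_mean
```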
| 457 | Distributional Associations Vs In-Context Reasoning: A Study of Feed-forward and Attention Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the core of the Transformer architecture behind such models are feed-forward and attention layers, which are often associated with knowledge and reasoning, respectively. In this paper, we study this distinction empirically and theoretically in a controlled synthetic setting where certain next-token predictions involve both distributional and in-context information. |
Lei Chen; Joan Bruna; Alberto Bietti; |
| 458 | The Foundations of Tokenization: Statistical and Computational Concerns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, the impact of tokenization on language model estimation has been investigated primarily through empirical means. The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models. |
Juan Luis Gastaldi; John Terilla; Luca Malagutti; Brian DuSell; Tim Vieira; Ryan Cotterell; |
| 459 | CycleResearcher: Improving Automated Research Via Automated Review Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our iterative preference training framework consists of CycleResearcher, which conducts research tasks, and CycleReviewer, which simulates the peer review process, providing iterative feedback via reinforcement learning. To train these models, we develop two new datasets, Review-5k and Research-14k, reflecting real-world machine learning research and peer review dynamics. |
Yixuan Weng; Minjun Zhu; Guangsheng Bao; Hongbo Zhang; Jindong Wang; Yue Zhang; Linyi Yang; |
| 460 | Controlling Space and Time with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 4DiM, a cascaded diffusion model for 4D novel view synthesis (NVS), supporting generation with arbitrary camera trajectories and timestamps, in natural scenes, conditioned on one or more images. |
Daniel Watson; Saurabh Saxena; Lala Li; Andrea Tagliasacchi; David J. Fleet; |
| 461 | A Little Goes A Long Way: Efficient Long Context Training and Inference with Partial Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper argues that integrating length extension with a GPU-friendly KV cache reduction architecture not only reduces training overhead during length extension, but also achieves better long-context performance. |
Suyu Ge; Xihui Lin; Yunan Zhang; Jiawei Han; Hao Peng; |
| 462 | Data Scaling Laws in Imitation Learning for Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot for any object within the same category in any environment. With four data collectors working for one afternoon, we collect sufficient data to enable the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects. |
Fanqi Lin; Yingdong Hu; Pingyue Sheng; Chuan Wen; Jiacheng You; Yang Gao; |
| 463 | RelCon: Relative Contrastive Learning for A Motion Foundation Model for Wearable Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. |
Maxwell A Xu; Jaya Narain; Gregory Darnell; Haraldur T Hallgrimsson; Hyewon Jeong; Darren Forde; Richard Andres Fineman; Karthik Jayaraman Raghuram; James Matthew Rehg; Shirley You Ren; |
| 464 | Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, asynchronous training relies on an underexplored regime, online but *off-policy* RLHF: learning on samples from previous iterations of our model which give a worse training signal. We tackle the fundamental challenge in this regime: how much off-policyness can we tolerate for asynchronous training to speed up learning but maintain performance? |
Michael Noukhovitch; Shengyi Huang; Sophie Xhonneux; Arian Hosseini; Rishabh Agarwal; Aaron Courville; |
| 465 | Artificial Kuramoto Oscillatory Neurons Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More recently, it was also hypothesized that dynamic (spatiotemporal) representations play an important role in both neuroscience and AI. Building on these ideas, we introduce Artificial Kuramoto Oscillatory Neurons (*AKOrN*) as a dynamical alternative to threshold units, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. |
Takeru Miyato; Sindy Löwe; Andreas Geiger; Max Welling; |
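The underlying dynamics are the classical Kuramoto model, in which coupled oscillators pull one another's phases toward synchrony. A one-line Euler step for intuition; AKOrN's units generalize these dynamics with learned, high-dimensional connectivity:

```python
import numpy as np

def kuramoto_step(theta, omega, coupling, dt=0.01):
    """One Euler step of the classical Kuramoto model:
        d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i).
    theta: (n,) oscillator phases; omega: (n,) natural frequencies;
    coupling: scalar K. Textbook system only, not AKOrN itself."""
    n = theta.size
    phase_diff = theta[None, :] - theta[:, None]      # theta_j - theta_i
    dtheta = omega + (coupling / n) * np.sin(phase_diff).sum(axis=1)
    return theta + dt * dtheta
```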
| 466 | Rotated Runtime Smooth: Training-Free Activation Smoother for Accurate INT4 Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Rotated Runtime Smooth (**RRS**), a plug-and-play activation smoother for quantization, consisting of Runtime Smooth and the Rotation operation. |
Ke Yi; Zengke Liu; jianwei zhang; Chengyuan Li; Tong Zhang; Junyang Lin; Jingren Zhou; |
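The "Runtime Smooth" half of the recipe can be sketched directly: divide each activation channel by its runtime maximum and fold the inverse scale into the weights so the matmul is unchanged, flattening channel-wise outliers before INT4 quantization. The rotation half (spreading spike outliers, e.g. Hadamard-style) is omitted here, and the exact scale statistic is an assumption:

```python
import torch

def runtime_smooth(x, w):
    """Scale activations per input channel and fold the inverse scale
    into the weights: (x / s) @ (s * w) == x @ w, but the scaled
    activations have flattened outliers and quantize better.
    x: (tokens, d_in) activations; w: (d_in, d_out) weights."""
    scale = x.abs().amax(dim=0, keepdim=True).clamp(min=1e-5)  # per-channel runtime max
    return x / scale, w * scale.t()   # quantize both outputs afterwards
```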
| 467 | DOTS: Learning to Reason Dynamically in LLMs Via Optimal Reasoning Trajectories Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose DOTS, an approach enabling LLMs to reason Dynamically via Optimal reasoning Trajectories Search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM. |
Murong Yue; Wenlin Yao; Haitao Mi; Dian Yu; Ziyu Yao; Dong Yu; |
| 468 | TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce visual trace prompting, a simple yet effective approach to facilitate VLA models’ spatial-temporal awareness for action prediction by encoding state-action trajectories visually. |
Ruijie Zheng; Yongyuan Liang; Shuaiyi Huang; Jianfeng Gao; Hal Daumé III; Andrey Kolobov; Furong Huang; Jianwei Yang; |
| 469 | RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods often overlook the need for repository-level code understanding, which is crucial for accurately grasping the broader context and developing effective solutions. On this basis, we present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. |
Siru Ouyang; Wenhao Yu; Kaixin Ma; Zilin Xiao; Zhihan Zhang; Mengzhao Jia; Jiawei Han; Hongming Zhang; Dong Yu; |
| 470 | Hymba: A Hybrid-head Architecture for Small Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates attention mechanisms and state space models (SSMs) within the same layer, offering parallel and complementary processing of the same inputs. |
Xin Dong; Yonggan Fu; Shizhe Diao; Wonmin Byeon; ZIJIA CHEN; Ameya Sunil Mahabaleshwarkar; Shih-Yang Liu; Matthijs Van keirsbilck; Min-Hung Chen; Yoshi Suhara; Yingyan Celine Lin; Jan Kautz; Pavlo Molchanov; |
| 471 | Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, Sarrof et al. (2024) demonstrated that the failure of LRNNs like Mamba to solve parity stems from restricting the value range of their diagonal state-transition matrices to $[0, 1]$ and that incorporating negative values can resolve this issue. We extend this result to non-diagonal LRNNs such as DeltaNet. |
Riccardo Grazzi; Julien Siems; Arber Zela; Jörg K.H. Franke; Frank Hutter; Massimiliano Pontil; |
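The parity example makes the eigenvalue-range point concrete: a diagonal linear RNN whose input-dependent transition can be −1 simply flips the sign of its state on every 1, something impossible when transitions are confined to [0, 1]. A toy illustration of the observation (not of DeltaNet or Mamba themselves):

```python
def parity_via_lrnn(bits):
    """One-dimensional diagonal linear RNN h_t = a(x_t) * h_{t-1} with
    input-dependent transition a(1) = -1, a(0) = +1: the state flips sign
    on every 1, so sign(h_T) encodes the parity of the input."""
    h = 1.0
    for x in bits:
        a = -1.0 if x == 1 else 1.0   # eigenvalue now allowed in [-1, 1]
        h = a * h
    return 0 if h > 0 else 1          # h = (-1)^(number of ones)

assert parity_via_lrnn([1, 0, 1, 1]) == 1   # even number of ones? no: three ones -> odd
assert parity_via_lrnn([1, 1, 0]) == 0      # two ones -> even parity
```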
| 472 | ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop ProtComposer to generate protein structures conditioned on spatial protein layouts that are specified via a set of 3D ellipsoids capturing substructure shapes and semantics. |
Hannes Stark; Bowen Jing; Tomas Geffner; Jason Yim; Tommi Jaakkola; Arash Vahdat; Karsten Kreis; |
| 473 | How Much Is A Noisy Image Worth? Data Scaling Laws for Ambient Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ambient Diffusion and related frameworks train diffusion models solely with corrupted data (which are usually cheaper to acquire), but ambient models significantly underperform models trained on clean data. We study this phenomenon at scale by training more than $80$ models on data with different corruption levels across three datasets ranging from $30,000$ to $\approx 1.3$M samples. |
Giannis Daras; Yeshwanth Cherapanamjeri; Constantinos Costis Daskalakis; |
| 474 | Towards Interpreting Visual Information Processing in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the processing of visual tokens in the language model component of LLaVA, a prominent VLM. |
Clement Neo; Luke Ong; Philip Torr; Mor Geva; David Krueger; Fazl Barez; |
| 475 | IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose IRIS, a neuro-symbolic approach that systematically combines LLMs with static analysis to perform whole-repository reasoning for security vulnerability detection. |
Ziyang Li; Saikat Dutta; Mayur Naik; |
| 476 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we evaluate and enhance the 3D awareness of ViT-based models. |
Yang You; Yixin Li; Congyue Deng; Yue Wang; Leonidas Guibas; |
| 477 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering to pioneer long-duration video generation with intricate plot progressions and high visual fidelity. |
Canyu Zhao; Mingyu Liu; Wen Wang; Weihua Chen; Fan Wang; Hao Chen; Bo Zhang; Chunhua Shen; |
| 478 | Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Principled solutions to reward hacking have been impeded by the lack of a good definition for the problem. To address this gap, we introduce a definition of reward hacking based on the correlation between proxy and true rewards for states and actions seen by a “reference policy” that breaks down under optimization. |
Cassidy Laidlaw; Shivam Singhal; Anca Dragan; |
| 479 | InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while existing methods can add watermarks or steganographic information to individual 3D assets, they often require time-consuming per-scene training and optimization, leading to watermarking overheads that can far exceed the time required for asset generation itself, making deployment impractical for generating large collections of 3D objects. To address this, we propose InstantSplamp, a framework that seamlessly integrates the 3D steganography pipeline into large 3D generative models without introducing explicit additional time costs. |
Chenxin Li; Hengyu Liu; Zhiwen Fan; Wuyang Li; Yifan Liu; Panwang Pan; Yixuan Yuan; |
| 480 | Non-myopic Generation of Language Models for Reasoning and Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper revisits LLM reasoning from an optimal control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. |
Chang Ma; Haiteng Zhao; Junlei Zhang; Junxian He; Lingpeng Kong; |
| 481 | SyllableLM: Learning Coarse Semantic Units for Speech Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a controllable self-supervised technique to merge speech representations into coarser syllable-like units while still preserving semantic information. |
Alan Baade; Puyuan Peng; David Harwath; |
| 482 | Consistency Checks for Language Model Forecasters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new, general consistency metric based on *arbitrage*: for example, if a forecasting AI illogically predicts that both the Democratic and Republican parties have 60% probability of winning the 2024 US presidential election, an arbitrageur could trade against the forecaster’s predictions and make a profit. We then build a standard, proper-scoring-rule forecasting benchmark, and show that our (instantaneous) consistency metrics correlate strongly with LLM forecasters’ ground truth Brier scores (which are only known in the future). We also release a consistency benchmark that resolves in 2028, providing a long-term evaluation tool for forecasting. |
Daniel Paleka; Abhimanyu Pallavi Sudhir; Alejandro Alvarez; Vineeth Bhat; Adam Shen; Evan Wang; Florian Tramèr; |
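The election example in the highlight already contains the whole metric in miniature: quoting 60% for both an event and its negation overprices the pair by 0.2, which is exactly an arbitrageur's guaranteed profit. A minimal check for this negation case (the paper's metrics cover richer logical relations between questions):

```python
def negation_arbitrage_gap(p_event, p_negation):
    """Coherent forecasts satisfy P(A) + P(not A) = 1; any gap is a
    risk-free arbitrage profit per unit stake (sell both contracts if
    the sum exceeds 1, buy both if it falls short)."""
    return abs(p_event + p_negation - 1.0)

# The 60%/60% election example: the pair is overpriced by 0.2.
assert abs(negation_arbitrage_gap(0.6, 0.6) - 0.2) < 1e-12
```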
| 483 | Diffusion-based Neural Network Weights Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Diffusion-based Neural Network Weights Generation, D2NWG, a novel framework that leverages diffusion processes to synthesize task-specific network weights. |
Bedionita Soro; Bruno Andreis; Hayeon Lee; Wonyong Jeong; Song Chong; Frank Hutter; Sung Ju Hwang; |
| 484 | Uncovering Gaps in How Humans and LLMs Interpret Subjective Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we uncover instances of *misalignment* between LLMs’ actual operational semantics and what humans expect. |
Erik Jones; Arjun Patrawala; Jacob Steinhardt; |
| 485 | Grounding Multimodal Large Language Model in GUI World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an effective GUI grounding framework, which includes an automated data collection engine that gathers extensive GUI screenshots and annotations to ensure broad generalization. |
Weixian Lei; Difei Gao; Mike Zheng Shou; |
| 486 | Mix-CPT: A Domain Adaptation Framework Via Decoupling Knowledge Learning and Format Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this method may lead to inefficient knowledge memorization due to a lack of awareness of knowledge utilization during continual pre-training, and it requires LLMs to simultaneously learn knowledge utilization and format alignment under divergent training objectives during fine-tuning. To enhance the domain adaptation of LLMs, we revise this process and propose a new domain adaptation framework, *Mix-CPT*, comprising domain knowledge learning and general format alignment. |
Jinhao Jiang; Junyi Li; Xin Zhao; Yang Song; Tao Zhang; Ji-Rong Wen; |
| 487 | AlphaEdit: Null-Space Constrained Model Editing for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While effective, current studies have demonstrated that this perturbation inevitably disrupts the originally preserved knowledge within LLMs, especially in sequential editing scenarios. To address this, we introduce AlphaEdit, a novel solution that projects the perturbation onto the null space of the preserved knowledge before applying it to the parameters. |
Junfeng Fang; Houcheng Jiang; Kun Wang; Yunshan Ma; Jie Shi; Xiang Wang; Xiangnan He; Tat-Seng Chua; |
| 488 | STBLLM: Breaking The 1-Bit Barrier with Structured Binary LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first structural binarization method for LLM compression to less than 1-bit precision. |
Peijie Dong; Lujun Li; Yuedong Zhong; DaYou Du; Ruibo FAN; Yuhan Chen; Zhenheng Tang; Qiang Wang; Wei Xue; Yike Guo; Xiaowen Chu; |
| 489 | Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Lumina-T2X family — a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a simple and scalable generative framework that can be adapted to various modalities, e.g., transforming noise into images, videos, multi-view 3D objects, or audio clips conditioned on text instructions. |
Peng Gao; Le Zhuo; Dongyang Liu; Ruoyi Du; Xu Luo; Longtian Qiu; Yuhang Zhang; Rongjie Huang; Shijie Geng; Renrui Zhang; Junlin Xie; Wenqi Shao; Zhengkai Jiang; Tianshuo Yang; Weicai Ye; Tong He; Jingwen He; Junjun He; Yu Qiao; Hongsheng Li; |
| 490 | Masked Diffusion Models Are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision, which results in inaccurate categorical sampling. |
Kaiwen Zheng; Yongxin Chen; Hanzi Mao; Ming-Yu Liu; Jun Zhu; Qinsheng Zhang; |
| 491 | World Model on Million-Length Video And Language With Blockwise RingAttention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Enabling long-context understanding remains a key challenge in scaling existing sequence models — a crucial component in developing generally intelligent models that can process and operate over long temporal horizons that potentially consist of millions of tokens. In this paper, we aim to address these challenges by providing a comprehensive exploration of the full development process for producing 1M context language models and video-language models, setting new benchmarks in language retrieval and new capabilities in long video understanding. |
Hao Liu; Wilson Yan; Matei Zaharia; Pieter Abbeel; |
| 492 | Rethinking Invariance in In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify two crucial elements in the design of an invariant ICL algorithm: information non-leakage and context interdependence, which are not simultaneously achieved by any of the existing methods. |
Lizhe Fang; Yifei Wang; Khashayar Gatmiry; Lei Fang; Yisen Wang; |
| 493 | What Is Wrong with Perplexity for Long-context Language Modeling? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. |
Lizhe Fang; Yifei Wang; Zhaoyang Liu; Chenheng Zhang; Stefanie Jegelka; Jinyang Gao; Bolin Ding; Yisen Wang; |
| 494 | Sparse Autoencoders Reveal Selective Remapping of Visual Concepts During Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms. |
Hyesu Lim; Jinho Choi; Jaegul Choo; Steffen Schneider; |
| 495 | Long-Sequence Recommendation Models Need Decoupled Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Initial attempts to address this issue with some common methods (e.g., linear projections—a technique borrowed from language processing) proved ineffective, shedding light on the unique challenges of recommendation models. To overcome this, we propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are initialized and learned separately to fully decouple attention and representation. |
Ningya Feng; Junwei Pan; Jialong Wu; Baixu Chen; Ximei Wang; Qian Li; Xian Hu; Jie Jiang; Mingsheng Long; |
| 496 | HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we design dual structure-aware adapters to adaptively fit task-related homogeneous and heterogeneous structural information. |
Yujie Mo; Runpeng Yu; Xiaofeng Zhu; Xinchao Wang; |
| 497 | Efficiently Democratizing Medical LLMs for 50 Languages Via A Mixture of Language Family Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we first construct a high-quality medical dataset and conduct analysis to ensure its quality. Technically, we propose a novel MoE routing method that employs language-specific experts and cross-lingual routing. |
Guorui Zheng; Xidong Wang; Juhao Liang; Nuo Chen; Yuping Zheng; Benyou Wang; |
| 498 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Real-world tasks require handling intricate interactions, advanced spatial reasoning, long-term planning, and continuous exploration of new strategies—areas in which we lack effective methodologies for comprehensively evaluating these capabilities. To address this gap, we introduce BALROG, a novel benchmark designed to assess the agentic capabilities of LLMs and VLMs through a diverse set of challenging games. |
Davide Paglieri; Bartłomiej Cupiał; Samuel Coward; Ulyana Piterbarg; Maciej Wolczyk; Akbir Khan; Eduardo Pignatelli; Łukasz Kuciński; Lerrel Pinto; Rob Fergus; Jakob Nicolaus Foerster; Jack Parker-Holder; Tim Rocktäschel; |
| 499 | MoDeGPT: Modular Decomposition for Large Language Model Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent compression methods based on low-rank matrices show potential solutions, they often suffer from significant loss of accuracy or introduce substantial overhead in parameters and inference time. In this paper, we introduce Modular Decomposition (MoDeGPT), a new, efficient, and structured compression framework that overcomes these limitations. |
Chi-Heng Lin; Shangqian Gao; James Seale Smith; Abhishek Patel; Shikhar Tuli; Yilin Shen; Hongxia Jin; Yen-Chang Hsu; |
| 500 | SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in deep RL, designing and scaling up networks have been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. |
Hojoon Lee; Dongyoon Hwang; Donghu Kim; Hyunseung Kim; Jun Jet Tai; Kaushik Subramanian; Peter R. Wurman; Jaegul Choo; Peter Stone; Takuma Seno; |
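As a flavor of what "injecting a simplicity bias" can mean architecturally, the sketch below shows a layer-normalized residual MLP block of the kind such scaled-up RL networks stack: the skip connection keeps the network close to the identity (a simple function) unless the data demands otherwise. Layer sizes are assumptions, and SimBa's full recipe (e.g., its observation normalization) is not reproduced:

```python
import torch
import torch.nn as nn

class ResidualFFBlock(nn.Module):
    """One layer-norm + MLP block wrapped in a residual connection,
    a common simplicity-biasing component for scaling up network
    parameters in deep RL (illustrative sizes, not SimBa's exact spec)."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.ff(self.norm(x))   # residual path biases toward identity
```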
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~3,700 papers), please visit Paper Digest: ICLR-2025 (Full List).