Paper Digest: NeurIPS 2024 Papers & Highlights
Note: NeurIPS-2024 accepted more than 4,500 papers; this page includes only 500 of them, selected by our daily paper digest ranking algorithm. To browse all accepted papers or learn more about the NeurIPS-2024 statistics, readers can view all 4,500 NeurIPS-2024 accepted papers on a separate page, which takes quite some time to load. On this page, readers can also filter papers by keywords. For example, using ‘related code’ as the filter keyword will produce a list of all papers with code available to download.
To search or review papers within NIPS-2024 related to a specific topic, please use the search by venue (NIPS-2024), review by venue (NIPS-2024), and question answering by venue (NIPS-2024) services. To browse papers by author, here is a list of all ~17,000 authors (NIPS-2024). You may also like to explore our “Best Paper” Digest (NeurIPS), which lists the most influential NeurIPS papers since 1987.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to read articles, write articles, get answers, conduct literature reviews and generate research reports.
Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: NeurIPS 2024 Papers & Highlights
# | Paper | Author(s) |
---|---|---|
1 | SGLang: Efficient Execution of Structured Language Model Programs. Highlight: We introduce SGLang, a system for efficient execution of complex language model programs. | Lianmin Zheng; Liangsheng Yin; Zhiqiang Xie; Chuyue (Livia) Sun; Jeff Huang; Cody Hao Yu; Shiyi Cao; Christos Kozyrakis; Ion Stoica; Joseph Gonzalez; Clark Barrett; Ying Sheng; |
2 | You Don’t Need Data-Augmentations in Self-Supervised Learning. Highlight: In this work, we challenge the importance of invariance and data-augmentation in JEAs at scale. | Théo Moutakanni; Maxime Oquab; Marc Szafraniec; Maria Vakalopoulou; Piotr Bojanowski; |
3 | The Mamba in The Llama: Distilling and Accelerating Hybrid Models. Highlight: Recent research suggests that state-space models (SSMs) like Mamba can be competitive with Transformer models for language modeling with advantageous deployment characteristics. Given the focus and expertise on training large-scale Transformer models, we consider the challenge of converting these pretrained models into SSMs for deployment. | Junxiong Wang; Daniele Paliotta; Avner May; Alexander Rush; Tri Dao; |
4 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. Highlight: We develop three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) block quantization and incoherent processing that leverages hardware support for FP8 low-precision. | Jay Shah; Ganesh Bikshandi; Ying Zhang; Vijay Thakkar; Pradeep Ramani; Tri Dao; |
5 | Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers. Highlight: In particular, we propose Hydra, a natural bidirectional extension of the Mamba model, parameterized as a quasiseparable matrix mixer, which demonstrates superior performance over other sequence models including Transformers on non-causal tasks. | Sukjun Hwang; Aakash Lahoti; Ratish Puduppully; Tri Dao; Albert Gu; |
6 | Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models. Highlight: In this work, we present a method that is able to distill a pre-trained Transformer architecture into alternative architectures such as state space models (SSMs). | Aviv Bick; Kevin Li; Eric Xing; J. Zico Kolter; Albert Gu; |
7 | Improving Alignment and Robustness with Short Circuiting. Highlight: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that short-circuits models as they respond with harmful outputs. | Andy Zou; Long Phan; Justin Wang; Derek Duenas; Maxwell Lin; Maksym Andriushchenko; J. Zico Kolter; Matt Fredrikson; Dan Hendrycks; |
8 | Repurposing Language Models Into Embedding Models: Finding The Compute-Optimal Recipe. Highlight: In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pretrained decoder-only language models. | Albert Q. Jiang; Alicja Ziarko; Bartosz Piotrowski; Wenda Li; Mateja Jamnik; Piotr Miłoś; |
9 | Multi-language Diversity Benefits Autoformalization. Highlight: In this work, we create mma, a large, flexible, multi-language, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. | Albert Q. Jiang; Wenda Li; Mateja Jamnik; |
10 | Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms. Highlight: In particular, we find that DAA methods deteriorate not only across a wide range of KL-budgets, but also often before even a single epoch of the dataset is completed. Through extensive empirical experimentation, this work formulates the reward over-optimization or hacking problem for DAAs and explores its consequences across objectives, training regimes, and model scales. | Rafael Rafailov; Yaswanth Chittepu; Ryan Park; Harshit Sushil Sikchi; Joey Hejna; Brad Knox; Chelsea Finn; Scott Niekum; |
11 | The FineWeb Datasets: Decanting The Web for The Finest Text Data at Scale. Highlight: In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. | Guilherme Penedo; Hynek Kydlíček; Loubna Ben Allal; Anton Lozhkov; Margaret Mitchell; Colin Raffel; Leandro Von Werra; Thomas Wolf; |
12 | Yo’LLaVA: Your Personalized Language and Vision Assistant. Highlight: Similarly, when looking at a friend’s image, the interest lies in seeing their activities (e.g., *my friend* is holding a cat), rather than merely observing generic human actions (e.g., *a man* is holding a cat). In this paper, we introduce the novel task of personalizing LMMs, so that they can have conversations about a specific subject. | Thao Nguyen; Haotian Liu; Yuheng Li; Mu Cai; Utkarsh Ojha; Yong Jae Lee; |
13 | MINT-1T: Scaling Open-Source Multimodal Data By 10x: A Multimodal Dataset with One Trillion Tokens. Highlight: In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. | Anas Awadalla; Le Xue; Oscar Lo; Manli Shu; Hannah Lee; Etash Guha; Sheng Shen; Mohamed Awadalla; Silvio Savarese; Caiming Xiong; Ran Xu; Yejin Choi; Ludwig Schmidt; |
14 | Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback. Highlight: Despite its widespread use, the way preference-based learning is applied varies wildly, with differing data, learning algorithms, and evaluations used, making disentangling the impact of each aspect difficult. In this work, we identify four core aspects of preference-based learning: preference data, learning algorithm, reward model, and policy training prompts; systematically investigate the impact of these components on downstream model performance; and suggest a recipe for strong learning from preference feedback. | Hamish Ivison; Yizhong Wang; Jiacheng Liu; Zeqiu Wu; Valentina Pyatkin; Nathan Lambert; Noah Smith; Yejin Choi; Hannaneh Hajishirzi; |
15 | LocCa: Visual Pretraining with Location-aware Captioners. Highlight: This opens up the largely-unexplored potential of using natural language as a flexible and powerful interface for handling diverse pretraining tasks. In this paper, we demonstrate this with a novel visual pretraining paradigm, LocCa, that incorporates location-aware tasks into captioners to teach models to extract rich information from images. | Bo Wan; Michael Tschannen; Yongqin Xian; Filip Pavetic; Ibrahim Alabdulmohsin; Xiao Wang; André Susano Pinto; Andreas Steiner; Lucas Beyer; Xiaohua Zhai; |
16 | Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. Highlight: We introduce MEGALODON, a neural architecture for efficient sequence modeling with unlimited context length. | Xuezhe Ma; Xiaomeng Yang; Wenhan Xiong; Beidi Chen; Lili Yu; Hao Zhang; Jonathan May; Luke Zettlemoyer; Omer Levy; Chunting Zhou; |
17 | ReVideo: Remake A Video with Motion and Content Control. Highlight: In this paper, we present a novel attempt to Remake a Video (ReVideo), which stands out from existing methods by allowing precise video editing in specific areas through the specification of both content and motion. | Chong Mou; Mingdeng Cao; Xintao Wang; Zhaoyang Zhang; Ying Shan; Jian Zhang; |
18 | LLM Circuit Analyses Are Consistent Across Training and Scale. Highlight: In this study, we track how model mechanisms, operationalized as circuits, emerge and evolve across 300 billion tokens of training in decoder-only LLMs, in models ranging from 70 million to 2.8 billion parameters. | Curt Tigges; Michael Hanna; Qinan Yu; Stella Biderman; |
19 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. Highlight: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. | Shengbang Tong; Ellis Brown; Lvhui Chen; Sanghyun Woo; Adithya Jairam Vedagiri Iyer; Sai Charitha Akula; Shusheng Yang; Jihan Yang; Manoj Middepogu; Ziteng Wang; Xichen Pan; Rob Fergus; Yann LeCun; Saining Xie; |
20 | Learning-to-Cache: Accelerating Diffusion Transformer Via Layer Caching. Highlight: In this study, we make an interesting and somewhat surprising observation: by introducing a caching mechanism, the computation of a large proportion of layers in the diffusion transformer can be readily removed, even without updating the model parameters. | Xinyin Ma; Gongfan Fang; Michael Bi Mi; Xinchao Wang; |
21 | QUEEN: QUantized Efficient ENcoding for Streaming Free-viewpoint Videos. Highlight: In this work, we propose a novel framework for QUantized and Efficient ENcoding (QUEEN) for streaming FVV using 3D Gaussian Splatting (3D-GS). | Sharath Girish; Tianye Li; Amrita Mazumdar; Abhinav Shrivastava; David Luebke; Shalini De Mello; |
22 | Chain-of-Thought Reasoning Without Prompting. Highlight: Our study takes a novel approach by asking: Can LLMs reason effectively without any prompting? | Xuezhi Wang; Denny Zhou; |
23 | Iterative Reasoning Preference Optimization. Highlight: In this work, we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. | Richard Yuanzhe Pang; Weizhe Yuan; He He; Kyunghyun Cho; Sainbayar Sukhbaatar; Jason Weston; |
24 | What Matters When Building Vision-language Models? Highlight: Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. | Hugo Laurençon; Leo Tronchon; Matthieu Cord; Victor Sanh; |
25 | Stylus: Automatic Adapter Selection for Diffusion Models. Highlight: We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt’s keywords. | Michael Luo; Justin Wong; Brandon Trabucco; Yanping Huang; Joseph Gonzalez; Zhifeng Chen; Ruslan Salakhutdinov; Ion Stoica; |
26 | Observational Scaling Laws and The Predictability of Language Model Performance. Highlight: We propose an alternative, *observational* approach that bypasses model training and instead builds scaling laws from ~80 publicly available models. | Yangjun Ruan; Chris Maddison; Tatsunori Hashimoto; |
27 | GenAI Arena: An Open Evaluation Platform for Generative Models. Highlight: This paper proposes GenAI Arena, an open platform to evaluate different image and video generative models, where users can actively participate in evaluating these models. | Dongfu Jiang; Max Ku; Tianle Li; Yuansheng Ni; Shizhuo Sun; Rongqi Fan; Wenhu Chen; |
28 | Depth Anything V2. Highlight: Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. | Lihe Yang; Bingyi Kang; Zilong Huang; Zhen Zhao; Xiaogang Xu; Jiashi Feng; Hengshuang Zhao; |
29 | The Art of Saying No: Contextual Noncompliance in Language Models. Highlight: We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should *not* comply with user requests. | Faeze Brahman; Sachin Kumar; Vidhisha Balachandran; Pradeep Dasigi; Valentina Pyatkin; Abhilasha Ravichander; Sarah Wiegreffe; Nouha Dziri; Khyathi Chandu; Jack Hessel; Yulia Tsvetkov; Noah Smith; Yejin Choi; Hannaneh Hajishirzi; |
30 | SafeSora: Towards Safety Alignment of Text2Video Generation Via A Human Preference Dataset. Highlight: To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the *SafeSora* dataset to promote research on aligning text-to-video generation with human values. | Josef Dai; Tianle Chen; Xuyao Wang; Ziran Yang; Taiye Chen; Jiaming Ji; Yaodong Yang; |
31 | VHELM: A Holistic Evaluation of Vision Language Models. Highlight: Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM). | Tony Lee; Haoqin Tu; Chi Heem Wong; Wenhao Zheng; Yiyang Zhou; Yifan Mai; Josselin Roberts; Michihiro Yasunaga; Huaxiu Yao; Cihang Xie; Percy Liang; |
32 | MAmmoTH2: Scaling Instructions from The Web. Highlight: We propose a paradigm to efficiently harvest 10 million naturally existing instruction data from the pre-training web corpus to enhance LLM reasoning. | Xiang Yue; Tianyu Zheng; Ge Zhang; Wenhu Chen; |
33 | JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models. Highlight: And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as *jailbreak artifacts*; (2) a jailbreaking dataset comprising 100 behaviors, both original and sourced from prior work, which align with OpenAI’s usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. | Patrick Chao; Edoardo Debenedetti; Alexander Robey; Maksym Andriushchenko; Francesco Croce; Vikash Sehwag; Edgar Dobriban; Nicolas Flammarion; George J. Pappas; Florian Tramer; Hamed Hassani; Eric Wong; |
34 | Efficient LLM Jailbreak Via Adaptive Dense-to-sparse Constrained Optimization. Highlight: This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which has been shown to successfully jailbreak multiple open-source LLMs. | Kai Hu; Weichen Yu; Tianjun Yao; Xiang Li; Wenhe Liu; Lijun Yu; Yining Li; Kai Chen; Zhiqiang Shen; Matt Fredrikson; |
35 | Parameter-Inverted Image Pyramid Networks. Highlight: However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). | Xizhou Zhu; Xue Yang; Zhaokai Wang; Hao Li; Wenhan Dou; Junqi Ge; Lewei Lu; Yu Qiao; Jifeng Dai; |
36 | Visual Autoregressive Modeling: Scalable Image Generation Via Next-Scale Prediction. Highlight: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines autoregressive learning on images as coarse-to-fine next-scale prediction or next-resolution prediction, diverging from the standard raster-scan next-token prediction. | Keyu Tian; Yi Jiang; Zehuan Yuan; Bingyue Peng; Liwei Wang; |
37 | Can LLMs Implicitly Learn Numeric Parameter Constraints in Data Science APIs? Highlight: However, this assumption has not been rigorously studied in the literature. In this paper, we empirically investigate the proficiency of LLMs to handle these implicit numerical constraints when generating DS programs. | Yinlin Deng; Chunqiu Steven Xia; Zhezhen Cao; Meiziniu Li; Lingming Zhang; |
38 | Large Scale Transfer Learning for Tabular Data Via Language Modeling. Highlight: However, while recent foundation models have reduced the need for developing task-specific datasets and predictors in domains such as language modeling and computer vision, this transfer learning paradigm has not had similar impact in the tabular domain. In this work, we seek to narrow this gap and present TabuLa-8B, a language model for tabular prediction. | Josh Gardner; Juan Perdomo; Ludwig Schmidt; |
39 | Tree of Attacks: Jailbreaking Black-Box LLMs Automatically. Highlight: In this work, we present *Tree of Attacks with Pruning* (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. | Anay Mehrotra; Manolis Zampetakis; Paul Kassianik; Blaine Nelson; Hyrum Anderson; Yaron Singer; Amin Karbasi; |
40 | Image2Struct: A Benchmark for Evaluating Vision-Language Models in Extracting Structured Information from Images. Highlight: We introduce three tasks in the domains of web pages, LaTeX, and music, and two new metrics that allow efficient and automatic comparison between a pair of images. | Josselin Roberts; Tony Lee; Chi Heem Wong; Michihiro Yasunaga; Yifan Mai; Percy Liang; |
41 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding. Highlight: Despite this progress, few public benchmarks are available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark that features video-language interleaved inputs up to an hour long. | Haoning Wu; Dongxu Li; Bei Chen; Junnan Li; |
42 | Learning to Reason Via Program Generation, Emulation, and Search. Highlight: To adapt the COGEX model to a new task, we introduce a method for performing program search to find a single program whose pseudo-execution yields optimal performance when applied to all the instances of a given dataset. | Nathaniel Weir; Muhammad Khalifa; Linlu Qiu; Orion Weller; Peter Clark; |
43 | Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-trained Models. Highlight: In this paper, we unveil a new vulnerability: the privacy backdoor attack. | Yuxin Wen; Leo Marchyok; Sanghyun Hong; Jonas Geiping; Tom Goldstein; Nicholas Carlini; |
44 | LLM Evaluators Recognize and Favor Their Own Generations. Highlight: In this paper, we investigate if self-recognition capability contributes to self-preference. | Arjun Panickssery; Samuel Bowman; Shi Feng; |
45 | Quantifying The Bitter Lesson: How Safety Benchmarks Measure Capabilities Instead of Safety. Highlight: In the spirit of the Bitter Lesson, we ask whether such effort is wasteful. To quantify this question, we leverage spectral analysis to measure an underlying capabilities component, the direction in benchmark-performance-space which explains most variation in model performance. | Richard Ren; Steven Basart; Adam Khoja; Alexander Pan; Alice Gatti; Long Phan; Xuwang Yin; Mantas Mazeika; Gabe Mukobi; Ryan Kim; Stephen Fitz; Dan Hendrycks; |
46 | DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents. Highlight: In this work, we introduce DiscoveryWorld, a virtual environment that enables benchmarking an agent’s ability to perform complete cycles of novel scientific discovery in an inexpensive, simulated, multi-modal, long-horizon, and fictional setting. | Peter A Jansen; Marc-Alexandre Côté; Tushar Khot; Erin Bransom; Bhavana Dalvi Mishra; Bodhisattwa Prasad Majumder; Oyvind Tafjord; Peter Clark; |
47 | Are More LLM Calls All You Need? Towards The Scaling Properties of Compound AI Systems. Highlight: In this paper, we initiate the study of scaling properties of compound inference systems. | Lingjiao Chen; Jared Quincy Davis; Boris Hanin; Peter Bailis; Ion Stoica; Matei A Zaharia; James Zou; |
48 | Refusal in Language Models Is Mediated By A Single Direction. Highlight: In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. (A minimal code sketch of this idea appears after the table.) | Andy Arditi; Oscar Obeso; Aaquib Syed; Nina Panickssery; Daniel Paleka; Wes Gurnee; Neel Nanda; |
49 | Humanoid Locomotion As Next Token Prediction. Highlight: We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. | Ilija Radosavovic; Bike Zhang; Baifeng Shi; Jathushan Rajasegaran; Sarthak Kamat; Trevor Darrell; Koushil Sreenath; Jitendra Malik; |
50 | TaskBench: Benchmarking Large Language Models for Task Automation. Highlight: However, there is a lack of systematic and standardized benchmarks to promote the development of LLMs in task automation. To address this, we introduce TaskBench to evaluate the capability of LLMs in task automation. | Yongliang Shen; Kaitao Song; Xu Tan; Wenqi Zhang; Kan Ren; Siyu Yuan; Weiming Lu; Dongsheng Li; Yueting Zhuang; |
51 | Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution. Highlight: These methods require efficient approximations, and although learning a network that directly predicts the desired output is a promising solution, training such models with exact labels is often infeasible. We therefore explore training amortized models with noisy labels, and we find that this is inexpensive and surprisingly effective. | Ian Covert; Chanwoo Kim; Su-In Lee; James Zou; Tatsunori Hashimoto; |
52 | Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks. Highlight: Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. | Michael Wornow; Avanika Narayan; Ben Viggiano; Ishan Khare; Tathagat Verma; Tibor Thompson; Miguel Hernandez; Sudharsan Sundar; Chloe Trujillo; Krrish Chawla; Rongfei Lu; Justin Shen; Divya Nagaraj; Joshua Martinez; Vardhan Agrawal; Althea Hudson; Nigam Shah; Christopher Ré; |
53 | Rethinking Score Distillation As A Bridge Between Image Distributions. Highlight: Unfortunately, SDS has a number of characteristic artifacts that limit its utility in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from some current source distribution to a target distribution. | David McAllister; Songwei Ge; Jia-Bin Huang; David Jacobs; Alexei Efros; Aleksander Holynski; Angjoo Kanazawa; |
54 | What Can Foundation Models’ Embeddings Do? Highlight: To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models’ embeddings with unified image and dataset-level understanding spanning modality and granularity. | Xueyan Zou; Linjie Li; Jianfeng Wang; Jianwei Yang; Mingyu Ding; Junyi Wei; Zhengyuan Yang; Feng Li; Hao Zhang; Shilong Liu; Arul Aravinthan; Yong Jae Lee; Lijuan Wang; |
55 | Graph-based Uncertainty Metrics for Long-form Language Model Generations. Highlight: Recent advancements in Large Language Models (LLMs) have significantly improved text generation capabilities, but these systems are still known to hallucinate, and granular uncertainty estimation for long-form LLM generations remains challenging. In this work, we propose Graph Uncertainty, which represents the relationship between LLM generations and claims within them as a bipartite graph and estimates the claim-level uncertainty with a family of graph centrality metrics. | Mingjian Jiang; Yangjun Ruan; Prasanna Sattigeri; Salim Roukos; Tatsunori Hashimoto; |
56 | Fractal Patterns May Illuminate The Success of Next-Token Prediction. Highlight: We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown. | Ibrahim Alabdulmohsin; Vinh Tran; Mostafa Dehghani; |
57 | Evaluating Copyright Takedown Methods for Language Models. Highlight: This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model’s ability to retain uncopyrightable factual knowledge from the copyrighted content, and how well the model maintains its general utility and efficiency. | Boyi Wei; Weijia Shi; Yangsibo Huang; Noah Smith; Chiyuan Zhang; Luke Zettlemoyer; Kai Li; Peter Henderson; |
58 | Fully Transparent Self-Alignment for Code Generation. Highlight: We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. | Yuxiang Wei; Federico Cassano; Jiawei Liu; Yifeng Ding; Naman Jain; Zachary Mueller; Harm de Vries; Leandro Von Werra; Arjun Guha; Lingming Zhang; |
59 | DataComp-LM: In Search of The Next Generation of Training Sets for Language Models. Highlight: We introduce DataComp for Language Models, a testbed for controlled dataset experiments with the goal of improving language models. | Amro Abbas; Alon Albalak; Kushal Arora; Hritik Bansal; Yonatan Bitton; Yair Carmon; Khyathi Chandu; Mayee Chen; Giannis Daras; Achal Dave; Alex Dimakis; Alaaeldin El-Nouby; Fartash Faghri; Alex Fang; Samir Yitzhak Gadre; Josh Gardner; Saurabh Garg; Dhruba Ghosh; Aaron Gokaslan; Dirk Groeneveld; Etash Guha; Suchin Gururangan; Reinhard Heckel; Cheng-Yu Hsieh; Gabriel Ilharco; Maor Ivgi; Jenia Jitsev; Matt Jordan; Sham Kakade; Sedrick Scott Keh; Maciej Kilian; Pang Wei Koh; Thomas Kollar; Jeffrey Li; Kyle Lo; Kalyani Marathe; Jean Mercat; Niklas Muennighoff; Marianna Nezhurina; Thao Nguyen; Sewoong Oh; Hadi Pouransari; Sarah Pratt; Sunny Sanyal; Ludwig Schmidt; Vaishaal Shankar; Rulin Shao; Georgios Smyrnis; Luca Soldaini; Shuran Song; Alexander Toshev; Igor Vasiljevic; Stephanie Wang; Mitchell Wortsman; Rui Xin; Luke Zettlemoyer; Hanlin Zhang; Jieyu Zhang; |
60 | Smoothie: Label Free Language Model Routing. Highlight: We propose Smoothie, a weak supervision-inspired routing approach that requires no labeled data. | Neel Guha; Mayee Chen; Trevor Chow; Ishan Khare; Christopher Ré; |
61 | You Only Cache Once: Decoder-Decoder Architectures for Language Models. Highlight: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. | Yutao Sun; Li Dong; Yi Zhu; Shaohan Huang; Wenhui Wang; Shuming Ma; Quanlu Zhang; Jianyong Wang; Furu Wei; |
62 | WizardArena: Post-training Large Language Models Via Simulated Offline Chatbot Arena. Highlight: To mitigate the manual and temporal costs associated with post-training, this paper introduces a Simulated Chatbot Arena named WizardArena, which is fully based on and powered by open-source LLMs. | Haipeng Luo; Qingfeng Sun; Can Xu; Pu Zhao; Qingwei Lin; Jian-Guang Lou; Shifeng Chen; Yansong Tang; Weizhu Chen; |
63 | MInference: Accelerating Pre-filling for Long-Context LLMs Via Dynamic Sparse Attention. Highlight: Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference, a sparse calculation method designed to accelerate pre-filling of long-sequence processing. | Huiqiang Jiang; Yucheng Li; Chengruidong Zhang; Qianhui Wu; Xufang Luo; Surin Ahn; Zhenhua Han; Amir Abdi; Dongsheng Li; Chin-Yew Lin; Yuqing Yang; Lili Qiu; |
64 | Geometric-Averaged Preference Optimization for Soft Preference Labels. Highlight: In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. | Hiroki Furuta; Kuang-Huei Lee; Shixiang (Shane) Gu; Yutaka Matsuo; Aleksandra Faust; Heiga Zen; Izzeddin Gur; |
65 | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding. Highlight: We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. | Xiang Li; Jian Ding; Mohamed Elhoseiny; |
66 | Visual Sketchpad: Sketching As A Visual Chain of Thought for Multimodal Language Models. Highlight: In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. | Yushi Hu; Weijia Shi; Xingyu Fu; Dan Roth; Mari Ostendorf; Luke Zettlemoyer; Noah Smith; Ranjay Krishna; |
67 | Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning. Highlight: We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the features learned are functionally important by minimizing the KL divergence between the output distributions of the original model and the model with SAE activations inserted. | Dan Braun; Jordan Taylor; Nicholas Goldowsky-Dill; Lee Sharkey; |
68 | Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders. Highlight: We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. | Senthooran Rajamanoharan; Arthur Conmy; Lewis Smith; Tom Lieberum; Vikrant Varma; Janos Kramar; Rohin Shah; Neel Nanda; |
69 | 3DCoMPaT200: Language Grounded Large-Scale 3D Vision Dataset for Compositional Recognition. Highlight: To foster richer and fine-grained part-level 3D understanding, we introduce 3DCoMPaT200, a large-scale dataset tailored for compositional understanding of object parts and materials, with 200 object categories, an object vocabulary approximately 5 times larger than 3DCoMPaT, and almost 4 times more part categories. | Mahmoud Ahmed; Xiang Li; Arpit Prajapati; Mohamed Elhoseiny; |
70 | DFBA: Data Free Backdoor Attacks. Highlight: In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack without changing the model architecture. | Bochuan Cao; Jinyuan Jia; Chuxuan Hu; Wenbo Guo; Zhen Xiang; Jinghui Chen; Bo Li; Dawn Song; |
71 | Make Your LLM Fully Utilize The Context. Highlight: We hypothesize that it stems from insufficient explicit supervision during long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents **information-intensive (IN2) training**, a purely data-driven solution to overcome lost-in-the-middle. | Shengnan An; Zexiong Ma; Zeqi Lin; Nanning Zheng; Jian-Guang Lou; Weizhu Chen; |
72 | Finding Transformer Circuits With Edge Pruning. Highlight: In this paper, we frame circuit discovery as an optimization problem and propose _Edge Pruning_ as an effective and scalable solution. | Adithya Bhaskar; Alexander Wettig; Dan Friedman; Danqi Chen; |
73 | Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning. Highlight: Despite its success, two primary limitations have been identified: the inefficacy of the Exponential Moving Average (EMA) from I-JEPA in preventing entire collapse and the inadequacy of I-JEPA prediction in accurately learning the mean of patch representations. Addressing these challenges, this study introduces a novel framework, namely C-JEPA (Contrastive-JEPA), which integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy. | Shentong Mo; Shengbang Tong; |
74 | Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision. Highlight: Our key insight is that an evaluator (reward model) trained on supervision for easier tasks can be effectively used for scoring candidate solutions to harder tasks, thereby facilitating easy-to-hard generalization over different levels of tasks. | Zhiqing Sun; Longhui Yu; Yikang Shen; Weiyang Liu; Yiming Yang; Sean Welleck; Chuang Gan; |
75 | Exploring Context Window of Large Language Models Via Decomposed Positional Vectors. Highlight: In this study, we explore the positional information within and beyond the context window for deciphering the underlying mechanism of LLMs. | Zican Dong; Junyi Li; Xin Men; Xin Zhao; Bingning Wang; Zhen Tian; Weipeng Chen; Ji-Rong Wen; |
76 | I Don’t Know: Explicit Modeling of Uncertainty with An [IDK] Token. Highlight: In this work, we propose a novel calibration method that can be used to combat hallucinations. | Roi Cohen; Konstantin Dobler; Eden Biran; Gerard de Melo; |
77 | SimPO: Simple Preference Optimization with A Reference-Free Reward. Highlight: In this work, we propose SimPO, a simpler yet more effective approach. (A minimal sketch of the SimPO loss appears after the table.) | Yu Meng; Mengzhou Xia; Danqi Chen; |
78 | Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising. Highlight: In this work, we investigate an alternative approach involving multiple experts for denoising, and introduce Remix-DiT, a novel method designed to enhance output quality at a low cost. | Gongfan Fang; Xinyin Ma; Xinchao Wang; |
79 | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models. Highlight: This work introduces MaskLLM, a learnable pruning method that establishes Semi-structured (or “N:M”) Sparsity in LLMs, aimed at reducing computational overhead during inference. | Gongfan Fang; Hongxu Yin; Saurav Muralidharan; Greg Heinrich; Jeff Pool; Jan Kautz; Pavlo Molchanov; Xinchao Wang; |
80 | VisionLLM V2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. Highlight: We present VisionLLM v2, an end-to-end generalist multimodal large language model (MLLM) that unifies visual perception, understanding, and generation within a single framework. | Jiannan Wu; Muyan Zhong; Sen Xing; Zeqiang Lai; Zhaoyang Liu; Wenhai Wang; Zhe Chen; Xizhou Zhu; Lewei Lu; Tong Lu; Ping Luo; Yu Qiao; Jifeng Dai; |
81 | Vision Model Pre-training on Interleaved Image-Text Data Via Latent Compression Learning. Highlight: Inspired by the recent success of compression learning in natural language processing, we propose a novel vision model pre-training method called Latent Compression Learning (LCL) for interleaved image-text data. | Chenyu Yang; Xizhou Zhu; Jinguo Zhu; Weijie Su; Junjie Wang; Xuan Dong; Wenhai Wang; Bin Li; Jie Zhou; Yu Qiao; Jifeng Dai; |
82 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs. Highlight: In this work, we propose CharXiv, a comprehensive evaluation suite involving 2,323 natural, challenging, and diverse charts from scientific papers. | Zirui Wang; Mengzhou Xia; Luxi He; Howard Chen; Yitao Liu; Richard Zhu; Kaiqu Liang; Xindi Wu; Haotian Liu; Sadhika Malladi; Chevalier; Sanjeev Arora; Danqi Chen; |
83 | Query-Based Adversarial Prompt Generation. Highlight: We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. | Jonathan Hayase; Ema Borevković; Nicholas Carlini; Florian Tramer; Milad Nasr; |
84 | Be Like A Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs. Highlight: To mitigate training data exposure without sacrificing model performance, we introduce a simple but subtle modification to the standard next-token prediction objective for autoregressive LLMs that we call the goldfish loss. (A minimal sketch of this objective appears after the table.) | Abhimanyu Hans; John Kirchenbauer; Yuxin Wen; Neel Jain; Hamid Kazemi; Prajwal Singhania; Siddharth Singh; Gowthami Somepalli; Jonas Geiping; Abhinav Bhatele; Tom Goldstein; |
85 | DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model. Highlight: With this question in mind, we further conduct qualitative and quantitative pre-experiments, which validate the negative impact of the detection-segmentation imbalance issue on model performance. To address this issue, this paper proposes the DI-MaskDINO model, the core idea of which is to improve the final performance by alleviating the detection-segmentation imbalance. | Zhixiong Nan; Li Xianghong; Tao Xiang; Jifeng Dai; |
86 | TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models. Highlight: However, the practical application of 3D-SBDD generative models is hampered by their slow processing speeds. To address this bottleneck, we introduce TurboHopp, an accelerated pocket-conditioned 3D scaffold hopping model that merges the strategic effectiveness of traditional scaffold hopping with the rapid generation capabilities of consistency models. | Kiwoong Yoo; Owen Oertell; Junhyun Lee; Sanghoon Lee; Jaewoo Kang; |
87 | Learning Segmentation from Point Trajectories. Highlight: In this work, we present a way to train a segmentation network using long-term point trajectories as a supervisory signal to complement optical flow. | Laurynas Karazija; Iro Laina; Christian Rupprecht; Andrea Vedaldi; |
88 | Achieving Efficient Alignment Through Learned Correction. Highlight: In this paper, we introduce *Aligner*, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model. | Jiaming Ji; Boyuan Chen; Hantao Lou; Donghai Hong; Borong Zhang; Xuehai Pan; Tianyi (Alex) Qiu; Juntao Dai; Yaodong Yang; |
89 | Learning 1D Causal Visual Representation with De-focus Attention Networks. Highlight: The issue of over-focus hinders the model’s ability to extract diverse visual features and to receive effective gradients for optimization. To address this, we propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns. | Tao Chenxin; Xizhou Zhu; Shiqian Su; Lewei Lu; Changyao Tian; Xuan Luo; Gao Huang; Hongsheng Li; Yu Qiao; Jie Zhou; Jifeng Dai; |
90 | Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models. Highlight: To guide progress in interpretable dictionary learning, we introduce a new SAE training technique, *p*-annealing, which demonstrates improved performance on our metric. | Adam Karvonen; Benjamin Wright; Can Rager; Rico Angell; Jannik Brinkmann; Logan Smith; Claudio Mayrink Verdun; David Bau; Samuel Marks; |
91 | Multistep Distillation of Diffusion Models Via Moment Matching. Highlight: We present a new method for making diffusion models faster to sample. | Tim Salimans; Emiel Hoogeboom; Thomas Mensink; Jonathan Heek; |
92 | WildVision: Evaluating Vision-Language Models in The Wild with Human Preferences. Highlight: Our comprehensive analysis of 20K real-world interactions reveals important insights into the failure cases of top-performing VLMs. | Yujie Lu; Dongfu Jiang; Wenhu Chen; William Yang Wang; Yejin Choi; Bill Yuchen Lin; |
93 | Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning Via The Lens of Representation Complexity. Highlight: This work investigates the potential hierarchy of representation complexity among these RL paradigms. | Guhao Feng; Han Zhong; |
94 | RL-GPT: Integrating Reinforcement Learning and Code-as-policy. Highlight: To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. | Shaoteng Liu; Haoqi Yuan; Minda Hu; Yanwei Li; Yukang Chen; Shu Liu; Zongqing Lu; Jiaya Jia; |
95 | QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs. Highlight: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache, in 4 bits. | Saleh Ashkboos; Amirkeivan Mohtashami; Maximilian Croci; Bo Li; Pashmina Cameron; Martin Jaggi; Dan Alistarh; Torsten Hoefler; James Hensman; |
96 | Many-Shot In-Context Learning. Highlight: While promising, many-shot ICL can be bottlenecked by the available amount of human-generated outputs. To mitigate this limitation, we explore two new settings: (1) Reinforced ICL, which uses model-generated chain-of-thought rationales in place of human rationales, and (2) Unsupervised ICL, where we remove rationales from the prompt altogether and prompt the model only with domain-specific inputs. | Rishabh Agarwal; Avi Singh; Lei Zhang; Bernd Bohnet; Luis Rosias; Stephanie Chan; Biao Zhang; Ankesh Anand; Zaheer Abbas; Azade Nova; John Co-Reyes; Eric Chu; Feryal Behbahani; Aleksandra Faust; Hugo Larochelle; |
97 | Boosting Text-to-Video Generative Model with MLLMs Feedback. Highlight: Building upon this finding, we utilize MLLMs to perform fine-grained video preference annotations across two dimensions, resulting in the creation of VideoPrefer, which includes 135,000 preference annotations. Utilizing this dataset, we introduce VideoRM, the first general-purpose reward model tailored for video preference in the text-to-video domain. | Xun Wu; Shaohan Huang; Guolong Wang; Jing Xiong; Furu Wei; |
98 | Multi-Head Mixture-of-Experts. Highlight: In this paper, we propose Multi-Head Mixture-of-Experts (MH-MoE). | Xun Wu; Shaohan Huang; Wenhui Wang; Shuming Ma; Li Dong; Furu Wei; |
99 | Multimodal Large Language Models Make Text-to-Image Generative Models Align Better. Highlight: Despite these advances, current human preference datasets are either prohibitively expensive to construct or suffer from a lack of diversity in preference dimensions, resulting in limited applicability for instruction tuning in open-source text-to-image generative models and hindering further exploration. To address these challenges and promote the alignment of generative models through instruction tuning, we leverage multimodal large language models to create VisionPrefer, a high-quality and fine-grained preference dataset that captures multiple preference aspects. | Xun Wu; Shaohan Huang; Guolong Wang; Jing Xiong; Furu Wei; |
100 | A Careful Examination of Large Language Model Performance on Grade School Arithmetic. Highlight: However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). | Hugh Zhang; Jeff Da; Dean Lee; Vaughn Robinson; Catherine Wu; William Song; Tiffany Zhao; Pranav Raja; Charlotte Zhuang; Dylan Slack; Qin Lyu; Sean Hendryx; Russell Kaplan; Michele Lunati; Summer Yue; |
101 | Benchmarking LLMs Via Uncertainty Quantification. Highlight: However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect: uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. | Fanghua Ye; Mingming Yang; Jianhui Pang; Longyue Wang; Derek Wong; Emine Yilmaz; Shuming Shi; Zhaopeng Tu; |
102 | Sparse Maximal Update Parameterization: A Holistic Approach to Sparse Training Dynamics. Highlight: Without stable dynamics and effective training recipes, it is costly to test sparsity at scale, which is key to surpassing dense networks and making the business case for sparsity acceleration in hardware. A holistic approach is needed to tackle these challenges, and we propose SμPar as one such approach. SμPar ensures activations, gradients, and weight updates all scale independently of sparsity level. | Nolan Dey; Shane Bergsma; Joel Hestness; |
103 | Learning Scene-specific Descriptions Via Adaptive Renormalization for Open-vocabulary Scene Graph Generation. Highlight: Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-language models such as CLIP and follow a standard zero-shot pipeline: computing similarity between the query image and the text embeddings for each category (i.e., text classifiers). In this work, we argue that the text classifiers adopted by existing OVSGG methods, i.e., category-/part-level prompts, are scene-agnostic as they remain unchanged across contexts. | Guikun Chen; Jin Li; Wenguan Wang; |
104 | Neural Model Checking. Highlight: We introduce a machine learning approach to model checking hardware designs. | Mirco Giacobbe; Daniel Kroening; Abhinandan Pal; Michael Tautschnig; |
105 | Interpreting The Weight Space of Customized Diffusion Models. Highlight: We investigate the space of weights spanned by a large collection of customized diffusion models. | Amil Dravid; Yossi Gandelsman; Kuan-Chieh Wang; Rameen Abdal; Gordon Wetzstein; Alexei Efros; Kfir Aberman; |
106 | Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation. Highlight: However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked into albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. | Ruihan Gao; Kangle Deng; Gengshan Yang; Wenzhen Yuan; Jun-Yan Zhu; |
107 | MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. Highlight: This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. | Yubo Wang; Xueguang Ma; Ge Zhang; Yuansheng Ni; Abhranil Chandra; Shiguang Guo; Weiming Ren; Aaran Arulraj; Xuan He; Ziyan Jiang; Tianle Li; Max Ku; Wang; Alex Zhuang; Rongqi Fan; Xiang Yue; Wenhu Chen; |
108 | Mind’s Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models. Highlight: Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind’s Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. | Wenshan Wu; Shaoguang Mao; Yadong Zhang; Yan Xia; Li Dong; Lei Cui; Furu Wei; |
109 | Neural Gaffer: Relighting Any Object Via Diffusion. Highlight: In this work, we propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer, that takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel environmental lighting condition, simply by conditioning an image generator on a target environment map, without an explicit scene decomposition. | Haian Jin; Yuan Li; Fujun Luan; Yuanbo Xiangli; Sai Bi; Kai Zhang; Zexiang Xu; Jin Sun; Noah Snavely; |
110 | Chain of Thoughtlessness? An Analysis of CoT in Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain, and examines the performance of two state-of-the-art LLMs across two axes: generality of examples given in prompt, and complexity of problems queried with each prompt. |
Kaya Stechly; Karthik Valmeekam; Subbarao Kambhampati; |
111 | Crafting Interpretable Embeddings By Asking LLMs Questions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. |
Vinamra Benara; Chandan Singh; John Morris; Richard Antonello; Ion Stoica; Alexander Huth; Jianfeng Gao; |
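To make the QA-Emb construction concrete, here is a minimal sketch: each embedding dimension is the binarized answer to one yes/no question posed to an LLM. The `ask_llm` callable and the prompt format are stand-ins, not the paper's exact prompts.

```python
from typing import Callable, List

def qa_embed(text: str, questions: List[str],
             ask_llm: Callable[[str], str]) -> List[float]:
    """QA-Emb sketch: one binary feature per yes/no question about the text."""
    feats = []
    for q in questions:
        prompt = f"Text: {text}\nQuestion: {q}\nAnswer yes or no:"
        answer = ask_llm(prompt)
        feats.append(1.0 if answer.strip().lower().startswith("yes") else 0.0)
    return feats

# toy usage with a keyword-matching stand-in for the LLM
questions = ["Does the text mention food?", "Is the text about sports?"]
emb = qa_embed(
    "We had pasta for dinner.",
    questions,
    ask_llm=lambda p: "yes" if "food" in p and "pasta" in p else "no",
)
print(emb)  # [1.0, 0.0]
```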
112 | BitDelta: Your Fine-Tune May Only Be Worth One Bit Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. |
James Liu; Guangxuan Xiao; Kai Li; Jason Lee; Song Han; Tri Dao; Tianle Cai; |
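A minimal sketch of the 1-bit delta idea: store the base weights once, and for each fine-tune keep only the sign of the weight delta plus a per-matrix scale. Initializing the scale from the mean absolute delta is an assumption here; the paper further calibrates the scales by distillation.

```python
import torch

def bitdelta_compress(w_base: torch.Tensor, w_ft: torch.Tensor):
    """Keep only the sign of the fine-tuning delta plus one scale per matrix."""
    delta = w_ft - w_base
    sign = torch.sign(delta)          # 1 bit per weight (bit-packed in practice)
    scale = delta.abs().mean()        # init from mean |delta|; the paper distills this further
    return sign, scale

def bitdelta_apply(w_base, sign, scale):
    """Reconstruct an approximate fine-tuned matrix from base + 1-bit delta."""
    return w_base + scale * sign

# toy usage
base = torch.randn(4, 4)
ft = base + 0.01 * torch.randn(4, 4)
sign, scale = bitdelta_compress(base, ft)
w_hat = bitdelta_apply(base, sign, scale)
```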
113 | Knowledge Circuit in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge. |
Yunzhi Yao; Ningyu Zhang; Zekun Xi; Mengru Wang; Ziwen Xu; Shumin Deng; Huajun Chen; |
114 | HYDRA: Model Factorization Framework for Black-Box LLM Personalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing solutions have primarily focused on prompt design to incorporate user-specific profiles and behaviors; however, such approaches often struggle to generalize effectively due to their inability to capture shared knowledge among all users. To address these challenges, we propose HYDRA, a model factorization framework that captures both user-specific behavior patterns from historical data and shared general knowledge among all users to deliver personalized generation. |
Yuchen Zhuang; Haotian Sun; Yue Yu; Rushi Qiang; Qifan Wang; Chao Zhang; Bo Dai; |
115 | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that enables language model agents to autonomously use computers to solve software engineering tasks. |
John Yang; Carlos Jimenez; Alexander Wettig; Kilian Lieret; Shunyu Yao; Karthik Narasimhan; Ofir Press; |
116 | Guiding A Diffusion Model with A Bad Version of Itself Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. |
Tero Karras; Miika Aittala; Tuomas Kynkäänniemi; Jaakko Lehtinen; Timo Aila; Samuli Laine; |
117 | How Do Large Language Models Handle Multilingualism? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To verify MWork, we introduce Parallel Language-specific Neuron Detection (PLND) to identify activated neurons for inputs in different languages without any labeled data. |
Yiran Zhao; Wenxuan Zhang; Guizhen Chen; Kenji Kawaguchi; Lidong Bing; |
118 | Stabilize The Latent Space for Image Autoregressive Modeling: A Unified Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence. To address this discrepancy, we introduce a unified perspective on the relationship between latent space and generative models, emphasizing the stability of latent space in image generative modeling. |
Yongxin Zhu; Bocheng Li; Hang Zhang; Xin Li; Linli Xu; Lidong Bing; |
119 | Algorithmic Capabilities of Random Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To what extent do they depend on the supervisory signal provided to models, and to what extent are they attributable to behavior already present in models at the beginning of training? To investigate these questions, we study which functions can be learned by randomly initialized transformers in which only the embedding layers are optimized, so that the only input–output mappings learnable from data are those already implemented (up to a choice of encoding scheme) by the randomly initialized model. |
Ziqian Zhong; Jacob Andreas; |
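The training setup is easy to reproduce in miniature: freeze a randomly initialized transformer body and optimize only the embedding and unembedding layers. The toy copy objective below is illustrative, not one of the paper's tasks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model = 100, 64

embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
unembed = nn.Linear(d_model, vocab)

# Freeze the randomly initialized transformer body; train only (un)embeddings.
for p in body.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(list(embed.parameters()) + list(unembed.parameters()), lr=1e-3)

x = torch.randint(0, vocab, (8, 16))   # toy token batch
logits = unembed(body(embed(x)))       # body stays fixed at its random init
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab), x.reshape(-1))
loss.backward()
opt.step()
```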
120 | Confidence Regulation Neurons in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. |
Alessandro Stolfo; Ben Wu; Wes Gurnee; Yonatan Belinkov; Xingyi Song; Mrinmaya Sachan; Neel Nanda; |
121 | MathPile: A Billion-Token-Scale Pretraining Corpus for Math Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. |
Zengzhi Wang; Xuefeng Li; Rui Xia; Pengfei Liu; |
122 | Analysing The Generalisation and Reliability of Steering Vectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that steering vectors have substantial limitations both in- and out-of-distribution. |
Daniel Tan; David Chanin; Aengus Lynch; Brooks Paige; Dimitrios Kanoulas; Adrià Garriga-Alonso; Robert Kirk; |
123 | Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFN), which are less studied than attention blocks. |
Xiuying Wei; Skander Moalla; Razvan Pascanu; Caglar Gulcehre; |
124 | Who’s Asking? User Personas and The Mechanics of Latent Misalignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite investments in improving model safety, studies show that misaligned capabilities remain latent in safety-tuned models. In this work, we shed light on the mechanics of this phenomenon. |
Asma Ghandeharioun; Ann Yuan; Marius Guerard; Emily Reif; |
125 | LLM Dataset Inference: Detect Datasets, Not Strings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a new *dataset inference* method to accurately identify the datasets used to train large language models. |
Pratyush Maini; Hengrui Jia; Nicolas Papernot; Adam Dziedzic; |
126 | Not All Tokens Are What You Need for Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. |
Zhenghao Lin; Zhibin Gou; Yeyun Gong; Xiao Liu; Yelong Shen; Ruochen Xu; Chen Lin; Yujiu Yang; Jian Jiao; Nan Duan; Weizhu Chen; |
127 | Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. |
Boyuan Chen; Diego Martí Monsó; Yilun Du; Max Simchowitz; Russ Tedrake; Vincent Sitzmann; |
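The core training signal is simple to sketch: each token gets its own independently sampled noise level, and the denoiser conditions on those levels. The cosine schedule below is an assumption for illustration; the independent per-token level `k` is the point.

```python
import torch

def diffusion_forcing_noise(tokens: torch.Tensor, num_levels: int = 1000):
    """Noise each token with its own independently sampled noise level.

    tokens: (batch, seq, dim) continuous token representations.
    A variance-preserving cosine schedule stands in for whatever schedule
    the model uses; the key idea is the per-token level k.
    """
    b, s, _ = tokens.shape
    k = torch.randint(0, num_levels, (b, s, 1))               # per-token noise level
    alpha_bar = torch.cos(0.5 * torch.pi * k / num_levels) ** 2
    eps = torch.randn_like(tokens)
    noisy = alpha_bar.sqrt() * tokens + (1 - alpha_bar).sqrt() * eps
    return noisy, eps, k.squeeze(-1)

# a denoiser would then be trained to predict eps from (noisy, k)
noisy, eps, k = diffusion_forcing_noise(torch.randn(2, 8, 32))
```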
128 | CALVIN: Improved Contextual Video Captioning Via Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Scene descriptions, especially in movies, require a deeper contextual understanding, unlike general-purpose video captioning. To address this challenge, we propose a model, CALVIN, a specialized video LLM that leverages previous movie context to generate fully contextual scene descriptions. |
Gowthami Somepalli; Arkabandhu Chowdhury; Jonas Geiping; Basri Ronen; Tom Goldstein; David Jacobs; |
129 | Scaling Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. |
Biao Zhang; Garrett Tanzer; Orhan Firat; |
130 | Trajectory Flow Matching with Applications to Clinical Time Series Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current algorithms for training Neural SDEs require backpropagation through the SDE dynamics, greatly limiting their scalability and stability. To address this, we propose Trajectory Flow Matching (TFM), which trains a Neural SDE in a simulation-free manner, bypassing backpropagation through the dynamics. |
Xi (Nicole) Zhang; Yuan Pu; Yuki Kawamura; Andrew Loza; Yoshua Bengio; Dennis Shung; Alexander Tong; |
131 | LiT: Unifying LiDAR Languages with LiDAR Translator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These gaps, akin to language barriers, hinder the synergistic use of diverse LiDAR datasets, limiting the scalability and unification of perception models. To address this challenge, we present the LiDAR Translator (LiT), a novel framework designed to unify LiDAR data into a single target “language”. |
Yixing Lao; Tao Tang; Xiaoyang Wu; Peng Chen; Kaicheng Yu; Hengshuang Zhao; |
132 | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in sub-4-bit precision. Our work, KVQuant, facilitates low precision KV cache quantization by incorporating several novel methods: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; and (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges. |
Coleman Hooper; Sehoon Kim; Hiva Mohammadzadeh; Michael Mahoney; Sophia Shao; Kurt Keutzer; Amir Gholami; |
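Technique (i) is easy to illustrate in isolation: Keys are quantized with per-channel statistics (one range per channel, computed across tokens) while Values keep the usual per-token ranges. Plain min-max uniform quantization stands in here for the paper's sensitivity-weighted non-uniform datatypes.

```python
import torch

def quantize_minmax(x: torch.Tensor, dim: int, bits: int = 4) -> torch.Tensor:
    """Uniform min-max quantize-dequantize along `dim` (a simplified stand-in
    for KVQuant's non-uniform datatypes)."""
    lo = x.amin(dim=dim, keepdim=True)
    hi = x.amax(dim=dim, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / (2**bits - 1)
    q = ((x - lo) / scale).round()
    return q * scale + lo          # dequantized for clarity

keys = torch.randn(16, 128)        # (tokens, channels); pre-RoPE in the paper
values = torch.randn(16, 128)

keys_q = quantize_minmax(keys, dim=0)      # per-channel: statistics across tokens
values_q = quantize_minmax(values, dim=1)  # per-token: statistics across channels
```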
133 | JiuZhang3.0: Efficiently Improving Mathematical Reasoning By Training Small Data Synthesis Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reduce the cost, based on open-source available texts, we propose an efficient way that trains a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data. |
Kun Zhou; Beichen Zhang; Jiapeng Wang; Zhipeng Chen; Xin Zhao; Jing Sha; Zhichao Sheng; Shijin Wang; Ji-Rong Wen; |
134 | Metric Flow Matching for Smooth Interpolations on The Data Manifold Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Metric Flow Matching (MFM), a novel simulation-free framework for conditional flow matching where interpolants are approximate geodesics learned by minimizing the kinetic energy of a data-induced Riemannian metric. |
Kacper Kapusniak; Peter Potaptchik; Teodora Reu; Leo Zhang; Alexander Tong; Michael Bronstein; Joey Bose; Francesco Di Giovanni; |
135 | Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using image models naively for solving inverse video problems often suffers from flickering, texture-sticking, and temporal inconsistency in generated videos. To tackle these problems, in this paper, we view frames as continuous functions in the 2D space, and videos as a sequence of continuous warping transformations between different frames. |
Giannis Daras; Weili Nie; Karsten Kreis; Alex Dimakis; Morteza Mardani; Nikola Kovachki; Arash Vahdat; |
136 | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents UltraEdit, a large-scale (~ 4M editing samples), automatically generated dataset for instruction-based image editing. |
Haozhe Zhao; Xiaojian (Shawn) Ma; Liang Chen; Shuzheng Si; Rujie Wu; Kaikai An; Peiyu Yu; Minjia Zhang; Qing Li; Baobao Chang; |
137 | XLSTM: Extended Long Short-Term Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. |
Maximilian Beck; Korbinian Pöppel; Markus Spanring; Andreas Auer; Oleksandra Prudnikova; Michael Kopp; Günter Klambauer; Johannes Brandstetter; Sepp Hochreiter; |
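A sketch of a single sLSTM step as I read the paper's update equations: exponential input and forget gates kept finite by a running-max stabilizer state `m`, a scalar cell `c`, and a normalizer state `n` that divides the output. Packing the four gates into one matmul is an implementation assumption, not the paper's exact parameterization.

```python
import torch

def slstm_step(x, h, c, n, m, W, R, b):
    """One sLSTM step sketch (scalar-memory branch of xLSTM)."""
    zt, it, ft, ot = (W @ x + R @ h + b).chunk(4)
    z = torch.tanh(zt)                      # cell input
    o = torch.sigmoid(ot)                   # output gate
    m_new = torch.maximum(ft + m, it)       # stabilizer keeps the exps finite
    i = torch.exp(it - m_new)               # exponential input gate
    f = torch.exp(ft + m - m_new)           # exponential forget gate
    c_new = f * c + i * z                   # scalar cell update (new memory mixing)
    n_new = f * n + i                       # normalizer update
    h_new = o * (c_new / n_new)             # normalized hidden state
    return h_new, c_new, n_new, m_new

# toy usage
d = 8
W, R, b = torch.randn(4 * d, d), torch.randn(4 * d, d), torch.zeros(4 * d)
h = c = n = m = torch.zeros(d)
h, c, n, m = slstm_step(torch.randn(d), h, c, n, m, W, R, b)
```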
138 | BAKU: An Efficient Transformer for Multi-Task Policy Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present BAKU, a simple transformer architecture that enables efficient learning of multi-task robot policies. |
Siddhant Haldar; Zhuoran Peng; Lerrel Pinto; |
139 | Improved Distribution Matching Distillation for Fast Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is not only computationally expensive for large-scale text-to-image synthesis, but it also limits the student’s quality, tying it too closely to the teacher’s original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. |
Tianwei Yin; Michaël Gharbi; Taesung Park; Richard Zhang; Eli Shechtman; Fredo Durand; Bill Freeman; |
140 | Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel black-box approach for producing a diverse collection of adversarial prompts. |
Mikayel Samvelyan; Sharath Chandra Raparthy; Andrei Lupu; Eric Hambro; Aram Markosyan; Manish Bhatt; Yuning Mao; Minqi Jiang; Jack Parker-Holder; Jakob Foerster; Tim Rocktäschel; Roberta Raileanu; |
141 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, we introduce a rule-based camera trajectory generation method, enabling the synthetic pipeline to incorporate diverse and precise camera motion annotation, which can rarely be found in real-world data. |
Zhenzhi Wang; Yixuan Li; Yanhong Zeng; Youqing Fang; Yuwei Guo; Wenran Liu; Jing Tan; Kai Chen; Bo Dai; Tianfan Xue; Dahua Lin; |
142 | The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About The Subjective and Multicultural Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of feedback processes. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. |
Hannah Rose Kirk; Alexander Whitefield; Paul Rottger; Andrew M. Bean; Katerina Margatina; Rafael Mosquera; Juan Ciro; Max Bartolo; Adina Williams; He He; Bertie Vidgen; Scott Hale; |
143 | Transformers Can Do Arithmetic with The Right Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. |
Sean McLeish; Arpit Bansal; Alex Stein; Neel Jain; John Kirchenbauer; Brian Bartoldson; Bhavya Kailkhura; Abhinav Bhatele; Jonas Geiping; Avi Schwarzschild; Tom Goldstein; |
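A minimal sketch of the idea: compute each digit's offset from the start of its number and add a learned embedding for that offset on top of the usual token embeddings. Single-digit tokenization is assumed, and training details from the paper (e.g., randomizing offsets for length generalization) are omitted.

```python
import torch
import torch.nn as nn

def digit_positions(tokens: torch.Tensor, digit_ids: torch.Tensor) -> torch.Tensor:
    """1-based offset of each digit token from the start of its number;
    0 for non-digit tokens."""
    is_digit = torch.isin(tokens, digit_ids)
    pos = torch.zeros_like(tokens)
    for b in range(tokens.size(0)):
        run = 0
        for t in range(tokens.size(1)):
            run = run + 1 if is_digit[b, t] else 0
            pos[b, t] = run
    return pos

vocab, d_model, max_digits = 32, 16, 30
tok_embed = nn.Embedding(vocab, d_model)
abacus_embed = nn.Embedding(max_digits + 1, d_model)   # index 0 = "not a digit"

tokens = torch.tensor([[12, 3, 7, 1, 13, 9, 9]])       # here ids 0..9 are digits
pos = digit_positions(tokens, digit_ids=torch.arange(10))  # [[0,1,2,3,0,1,2]]
h = tok_embed(tokens) + abacus_embed(pos)              # added to the usual embeddings
```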
144 | TAPVid-3D: A Benchmark for Tracking Any Point in 3D Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). |
Skanda Koppula; Ignacio Rocco; Yi Yang; joseph heyward; Joao Carreira; Andrew Zisserman; Gabriel Brostow; Carl Doersch; |
145 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to analytical evaluation of LLM agents. |
Ma Chang; Junlei Zhang; Zhihao Zhu; Cheng Yang; Yujiu Yang; Yaohui Jin; Zhenzhong Lan; Lingpeng Kong; Junxian He; |
146 | NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing benchmarks focus on outdated content and limited fields, facing difficulties in real-time updating and leaving new terms unexplored. To address this problem, we propose an adaptive benchmark, NewTerm, for real-time evaluation of new terms. |
Hexuan Deng; Wenxiang Jiao; Xuebo Liu; Min Zhang; Zhaopeng Tu; |
147 | One-Step Effective Diffusion Network for Real-World Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, the random noise introduces uncertainty in the output, which is unfriendly to image restoration tasks. To address these issues, we propose a one-step effective diffusion network, namely OSEDiff, for the Real-ISR problem. |
Rongyuan Wu; Lingchen Sun; Zhiyuan Ma; Lei Zhang; |
148 | TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets of size larger than 1000. In this work, we overcome these limitations and substantially improve the performance of PFNs via context optimization. |
Benjamin Feuer; Robin Schirrmeister; Valeriia Cherepanova; Chinmay Hegde; Frank Hutter; Micah Goldblum; Niv Cohen; Colin White; |
149 | Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reduce the within-class similarity, we introduce class-wise supervision during the image synthesizing process by batching the samples within classes, instead of across classes. |
Lingao Xiao; Yang He; |
150 | The Road Less Scheduled Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing learning rate schedules that do not require specification of the optimization stopping step $T$ are greatly outperformed by learning rate schedules that depend on $T$. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. |
Aaron Defazio; Xingyu Yang; Ahmed Khaled; Konstantin Mishchenko; Harsh Mehta; Ashok Cutkosky; |
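As I read the schedule-free recipe, the stopping time disappears because gradients are taken at an interpolation `y` of the fast SGD iterate `z` and a running average `x`, and the average `x` is what you evaluate; no decay schedule appears anywhere. The constants below (and the 1/t averaging weight) are my reading of the paper, so treat this as a sketch.

```python
def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=100):
    """Schedule-free SGD sketch: z is the SGD iterate, x an online average,
    and the gradient is evaluated at y, an interpolation of the two."""
    x = z = x0
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x    # gradient evaluation point
        z = z - lr * grad_fn(y)          # plain SGD step on z, constant lr
        c = 1.0 / t                      # uniform averaging weight
        x = (1 - c) * x + c * z          # x is the running mean of the z iterates
    return x                             # x, not z, is the returned solution

# toy usage on f(w) = w^2, whose gradient is 2w
w_star = schedule_free_sgd(lambda w: 2 * w, x0=5.0)
```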
151 | Large Language Model-Driven Audio Codec Is A Few-Shot Audio Task Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel LLM-driven audio codec model, LLM-Codec, to transfer the audio modality into the textual space, i.e., representing audio tokens with words or sub-words in the vocabulary of LLMs, while keeping high audio reconstruction quality. |
Dongchao Yang; Haohan Guo; Yuanyuan Wang; Rongjie Huang; Xiang Li; Xu Tan; Xixin Wu; Helen Meng; |
152 | Self-Retrieval: End-to-End Information Retrieval with One Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce \emph{Self-Retrieval}, a novel end-to-end LLM-driven information retrieval architecture. |
Qiaoyu Tang; Jiawei Chen; Zhuoqun Li; Bowen Yu; Yaojie Lu; Cheng Fu; Haiyang Yu; Hongyu Lin; Fei Huang; Ben He; Xianpei Han; Le Sun; Yongbin Li; |
153 | UNITS: A Unified Multi-Task Time Series Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce UniTS, a multi-task time series model that uses task tokenization to express predictive and generative tasks within a single model. |
Shanghua Gao; Teddy Koker; Owen Queen; Tom Hartvigsen; Theodoros Tsiligkaridis; Marinka Zitnik; |
154 | RoPINN: Region Optimized Physics-Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, since PDEs are usually defined on continuous domains, solely optimizing models on scattered points may be insufficient to obtain an accurate solution for the whole domain. To mitigate this inherent deficiency of the default scatter-point optimization, this paper proposes and theoretically studies a new training paradigm as region optimization. |
Haixu Wu; Huakun Luo; Yuezhou Ma; Jianmin Wang; Mingsheng Long; |
155 | GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach for single-shot novel view synthesis, a semantic-preserving generative warping framework that enables T2I generative models to learn where to warp and where to generate, through augmenting cross-view attention with self-attention. |
Junyoung Seo; Kazumi Fukuda; Takashi Shibuya; Takuya Narihira; Naoki Murata; Shoukang Hu; Chieh-Hsin Lai; Seungryong Kim; Yuki Mitsufuji; |
156 | Counterfactual PPO Enhanced Shared Reflector for LLM-based Multi-agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, named COPPER, to enhance the collaboration ability of multi-agent systems through a learnable self-reflection mechanism. |
Xiaohe Bo; Zeyu Zhang; Quanyu Dai; Xueyang Feng; Lei Wang; Rui Li; Xu Chen; Ji-Rong Wen; |
157 | Transcoders Find Interpretable LLM Feature Circuits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we explore **transcoders**, which seek to faithfully approximate a densely activating MLP layer with a wider, sparsely activating MLP layer. We successfully train transcoders on language models with 120M, 410M, and 1.4B parameters, and find them to perform at least on par with SAEs in terms of sparsity, faithfulness, and human-interpretability. |
Jacob Dunefsky; Philippe Chlenski; Neel Nanda; |
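The training objective is straightforward to sketch: fit a wider, sparsely activating MLP to reproduce the original MLP's input-to-output map, with an L1 penalty encouraging sparse hidden activations. Sizes and the penalty weight below are placeholders.

```python
import torch
import torch.nn as nn

d_model, d_hidden = 64, 1024   # transcoder is wider than the MLP it approximates
transcoder = nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                           nn.Linear(d_hidden, d_model))

def transcoder_loss(mlp_in, mlp_out, l1_coeff=1e-3):
    """Fidelity to the original MLP's input->output map, plus a sparsity
    penalty on the transcoder's hidden activations."""
    acts = torch.relu(transcoder[0](mlp_in))   # wide, sparsely activating hidden layer
    recon = transcoder[2](acts)
    return ((recon - mlp_out) ** 2).mean() + l1_coeff * acts.abs().mean()

# mlp_in / mlp_out would be activations captured from the language model;
# random tensors stand in here so the sketch runs
mlp_in, mlp_out = torch.randn(256, d_model), torch.randn(256, d_model)
loss = transcoder_loss(mlp_in, mlp_out)
loss.backward()
```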
158 | One-Shot Safety Alignment for Large Language Models Via Optimal Dualization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem. |
Xinmeng Huang; Shuo Li; Edgar Dobriban; Osbert Bastani; Hamed Hassani; Dongsheng Ding; |
159 | Understanding Emergent Abilities of Language Models from The Loss Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to study emergent abilities through the lens of pre-training loss, instead of model size or training compute. |
Zhengxiao Du; Aohan Zeng; Yuxiao Dong; Jie Tang; |
160 | Mr.Bean: A Comprehensive Meta-Reasoning Benchmark for Analyzing Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a process-based benchmark, Mr.Bean, that demands a meta-reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. |
Zhongshen Zeng; Yinhong Liu; Yingjia Wan; Jingyao Li; Pengguang Chen; Jianbo Dai; Yuxuan Yao; Rongwu Xu; Zehan Qi; Wanru Zhao; Linling Shen; Jianqiao Lu; Haochen Tan; Yukang Chen; Hao Zhang; Zhan Shi; Bailin Wang; Zhijiang Guo; Jiaya Jia; |
161 | CosAE: Learnable Fourier Series for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Cosine Autoencoder (CosAE), a novel, generic Autoencoder that seamlessly leverages the classic Fourier series with a feed-forward neural network. |
Sifei Liu; Shalini De Mello; Jan Kautz; |
162 | Super Consistency of Neural Network Landscapes and Learning Rate Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From an optimization perspective, this phenomenon is puzzling, as it implies that the loss landscape is consistently similar across very different model sizes. In this work, we study the landscape through the lens of the Hessian, with a focus on its largest eigenvalue (i.e. the sharpness), and find that certain spectral properties under $\mu$P are largely independent of the width and depth of the network along the training trajectory. |
Lorenzo Noci; Alexandru Meterez; Thomas Hofmann; Antonio Orvieto; |
163 | CAT3D: Create Anything in 3D with Multi-View Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. |
Ruiqi Gao; Aleksander Holynski; Philipp Henzler; Arthur Brussee; Ricardo Martin Brualla; Pratul Srinivasan; Jonathan Barron; Ben Poole; |
164 | MADiff: Offline Multi-agent Learning with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite the effectiveness shown for single-agent learning, it remains unclear how DMs can operate in multi-agent problems, where independently modeling each agent’s trajectories can hardly achieve the coordination that teamwork requires. In this paper, we propose MADiff, a novel generative multi-agent learning framework to tackle this problem. |
Zhengbang Zhu; Minghuan Liu; Liyuan Mao; Bingyi Kang; Minkai Xu; Yong Yu; Stefano Ermon; Weinan Zhang; |
165 | Segment Anything Without Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Unsupervised SAM (UnSAM), a segment anything model for interactive and automatic whole-image segmentation which does not require human annotations. |
Xudong Wang; Jingfeng Yang; Trevor Darrell; |
166 | Sirius: Contextual Sparsity with Correction for Efficient LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Sirius, an efficient correction mechanism, which enables accurate LLM inference with contextual sparsity. |
Yang Zhou; Zhuoming Chen; Zhaozhuo Xu; Victoria Lin; Beidi Chen; |
167 | InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents InterpBench, a collection of semi-synthetic yet realistic transformers with known circuits for evaluating these techniques. |
Rohan Gupta; Iván Arcuschin Moreno; Thomas Kwa; Adrià Garriga-Alonso; |
168 | Out-of-Distribution Detection with A Single Unconditional Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we introduce our method, Diffusion Paths (DiffPath), in this work. |
Alvin Heng; Alexandre Thiery; Harold Soh; |
169 | Is Value Function Learning Really The Main Bottleneck of Offline RL? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to understand bottlenecks in current offline RL algorithms. |
Seohong Park; Kevin Frans; Sergey Levine; Aviral Kumar; |
170 | Video Diffusion Models Are Training-free Motion Interpreter and Controller Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging MOFT, we propose a novel training-free video motion control framework. |
Zeqi Xiao; Yifan Zhou; Shuai Yang; Xingang Pan; |
171 | Scalable Optimization in The Modular Norm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the natural norm particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. |
Jeremy Bernstein; Tim Large; Yang Liu; Jacob Huh; Hyojin Bahng; Phillip Isola; |
172 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. |
Shubham Toshniwal; Ivan Moshkov; Sean Narenthiran; Daria Gitman; Fei Jia; Igor Gitman; |
173 | Aligning to Thousands of Varying Preferences Via System Message Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A major challenge in adopting a more individualized approach to LLM alignment is scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual’s preferences. To address these challenges, we propose a new paradigm where users specify what they value most within the system messages steering the LLM’s generation behavior to better align with the user’s intentions. |
Seongyun Lee; Sue Park; Seungone Kim; Minjoon Seo; |
174 | S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning By Structured Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Utilizing this key insight, we propose a family of Structured Sparse Fine-Tuning (S$^{2}$FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. |
Xinyu Yang; Jixuan Leng; Geyang Guo; Jiawei Zhao; Ryumei Nakada; Linjun Zhang; Huaxiu Yao; Beidi Chen; |
175 | Neural Isometries: Taming Transformations for Equivariant ML Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. |
Thomas Mitchel; Michael Taylor; Vincent Sitzmann; |
176 | Sequoia: Scalable and Robust Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Sequoia, a scalable and robust algorithm for speculative decoding. |
Zhuoming Chen; Avner May; Ruslan Svirschevski; Yu-Hsun Huang; Max Ryabinin; Zhihao Jia; Beidi Chen; |
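For context, the vanilla speculative decoding loop that Sequoia builds on looks like the sketch below (greedy variant, batch size 1, with stand-in `draft` and `target` callables). Sequoia's actual contribution, the optimally shaped speculation tree and its robust verification rule, is not shown here.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ctx, k=4):
    """One round of plain greedy speculative decoding: the small draft model
    proposes k tokens, the large target model verifies them in a single
    forward pass, and the longest agreeing prefix is kept.
    `target`/`draft` map token ids (1, L) to logits (1, L, vocab)."""
    proposal = ctx.clone()
    for _ in range(k):                                   # cheap draft rollout
        nxt = draft(proposal)[:, -1:].argmax(-1)
        proposal = torch.cat([proposal, nxt], dim=1)
    logits = target(proposal)                            # one target pass
    t_pred = logits[:, ctx.size(1) - 1:-1].argmax(-1)    # target's token at each drafted slot
    drafted = proposal[:, ctx.size(1):]
    n_accept = int((t_pred == drafted).long().cumprod(dim=1).sum())
    fix = t_pred[:, n_accept:n_accept + 1]               # first disagreement, if any
    return torch.cat([ctx, drafted[:, :n_accept], fix], dim=1)

# interface demo only: random-logit stand-ins in place of real models
vocab = 50
f = lambda ids: torch.randn(1, ids.size(1), vocab)
new_ctx = speculative_step(f, f, torch.randint(0, vocab, (1, 8)))
```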
177 | Dissecting The Failure of Invariant Learning on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a Structural Causal Model (SCM) to theoretically dissect the performance of two prominent invariant learning methods, Invariant Risk Minimization (IRM) and Variance-Risk Extrapolation (VREx), in node-level OOD settings. |
Qixun Wang; Yifei Wang; Yisen Wang; Xianghua Ying; |
178 | Paloma: A Benchmark for Evaluating Language Model Fit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Perplexity Analysis for Language Model Assessment (Paloma), a benchmark to measure LM fit to 546 English and code domains, instead of assuming perplexity on one distribution extrapolates to others. |
Ian Magnusson; Akshita Bhagia; Valentin Hofmann; Luca Soldaini; Ananya Harsh Jha; Oyvind Tafjord; Dustin Schwenk; Evan Walsh; Yanai Elazar; Kyle Lo; Dirk Groeneveld; Iz Beltagy; Hannaneh Hajishirzi; Noah Smith; Kyle Richardson; Jesse Dodge; |
179 | FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a framework called FlowTurbo to accelerate the sampling of flow-based models while still enhancing the sampling quality. |
Wenliang Zhao; Minglei Shi; Xumin Yu; Jie Zhou; Jiwen Lu; |
180 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. |
Hoyeon Chang; Jinho Park; Seonghyeon Ye; Sohee Yang; Youngkyung Seo; Du-Seong Chang; Minjoon Seo; |
181 | Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems’ multiple abilities in a closed-loop manner. |
Xiaosong Jia; Zhenjie Yang; Qifeng Li; Zhiyuan Zhang; Junchi Yan; |
182 | OneBit: Towards Extremely Low-bit Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the extremely low bit-width deployment of LLMs. For this target, we introduce a 1-bit model compressing framework named OneBit, including a novel 1-bit parameter representation method to better quantize LLMs as well as an effective parameter initialization method based on matrix decomposition to improve the convergence speed of the quantization framework. |
Yuzhuang Xu; Xu Han; Zonghan Yang; Shuo Wang; Qingfu Zhu; Zhiyuan Liu; Weidong Liu; Wanxiang Che; |
183 | Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose multiple metrics to rigorously quantify agents’ performance and alignment with the assigned role. |
Sahar Abdelnabi; Amr Gomaa; Sarath Sivaprasad; Schönherr; Mario Fritz; |
184 | ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ZipCache, an accurate and efficient KV cache quantization method for large language models (LLMs). |
Yefei He; Luoming Zhang; Weijia Wu; Jing Liu; Hong Zhou; Bohan Zhuang; |
185 | BertaQA: How Much Do Language Models Know About Local Culture? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This raises the question of how well these models perform on topics relevant to other cultures, whose presence on the web is not that prominent. To address this gap, we introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque. |
Julen Etxaniz; Gorka Azkune; Aitor Soroa; Oier Lacalle; Mikel Artetxe; |
186 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hypothesizing that difficult queries are crucial to learn complex reasoning, we propose Difficulty-Aware Rejection Tuning (DART), a method that allocates difficult queries more trials during the synthesis phase, enabling more extensive training on difficult samples. |
Yuxuan Tong; Xiwen Zhang; Rui Wang; Ruidong Wu; Junxian He; |
187 | LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low Resource and Extinct Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. |
Andrew M. Bean; Simeon Hellsten; Harry Mayne; Jabez Magomere; Ethan Chi; Ryan Chi; Scott Hale; Hannah Rose Kirk; |
188 | A Universal Growth Rate for Learning with Smooth Surrogate Losses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
189 | ReFT: Representation Finetuning for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. |
Zhengxuan Wu; Aryaman Arora; Zheng Wang; Atticus Geiger; Dan Jurafsky; Christopher D Manning; Christopher Potts; |
190 | Zero-shot Image Editing with Reference Imitation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. |
Xi Chen; Yutong Feng; Mengting Chen; Yiyang Wang; Shilong Zhang; Yu Liu; Yujun Shen; Hengshuang Zhao; |
191 | SyncVIS: Synchronized Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite remarkable progress, existing works follow asynchronous designs, which model video sequences either via video-level queries only or by adopting query-sensitive cascade structures, resulting in difficulties when handling complex and challenging video scenarios. In this work, we analyze the cause of this phenomenon and the limitations of the current solutions, and propose to conduct synchronized modeling via a new framework named SyncVIS. |
Rongkun Zheng; Lu Qi; Xi Chen; Yi Wang; Kun Wang; Yu Qiao; Hengshuang Zhao; |
192 | Emu3D: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Emu3D, a significant advancement in text-to-3D which produces faithful, high-quality meshes with full material control. |
Yawar Siddiqui; Filippos Kokkinos; Tom Monnier; Mahendra Kariya; Yanir Kleiman; Emilien Garreau; Oran Gafni; Natalia Neverova; Andrea Vedaldi; David Novotny; Roman Shapovalov; |
193 | Catastrophic Goodhart: Regularizing RLHF with KL Divergence Does Not Mitigate Heavy-tailed Reward Misspecification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, if error is heavy-tailed, some policies obtain arbitrarily high reward despite achieving no more utility than the base model—a phenomenon we call catastrophic Goodhart. We adapt a discrete optimization method developed for adversarial attacks to measure the tails of open-source reward models, finding that they are consistent with light-tailed error. |
Thomas Kwa; Adrià Garriga-Alonso; |
194 | FLAME: Factuality-Aware Alignment for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL). |
Sheng-Chieh Lin; Luyu Gao; Barlas Oguz; Wenhan Xiong; Jimmy Lin; Scott Yih; Xilun Chen; |
195 | SemCoder: Training Code Language Models with Comprehensive Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to bridge the gap between Code LLMs’ reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. |
Yangruibo Ding; Jinjun Peng; Marcus Min; Gail Kaiser; Junfeng Yang; Baishakhi Ray; |
196 | Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs), particularly GPT-4V, to present a novel approach, Make-it-Real: 1) We demonstrate that GPT-4V can effectively recognize and describe materials, allowing the construction of a detailed material library. |
Ye Fang; Zeyi Sun; Tong Wu; Jiaqi Wang; Ziwei Liu; Gordon Wetzstein; Dahua Lin; |
197 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following. |
Rudolf Laine; Bilal Chughtai; Jan Betley; Kaivalya Hariharan; Mikita Balesni; Jérémy Scheurer; Marius Hobbhahn; Alexander Meinke; Owain Evans; |
198 | GSDF: 3DGS Meets SDF for Improved Neural Rendering and Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although both neural implicit surfaces and explicit Gaussian primitives have advanced with neural rendering techniques, current methods impose strict constraints on density fields or primitive shapes, which enhances the affinity for geometric reconstruction at the sacrifice of rendering quality. To address this dilemma, we introduce GSDF, a dual-branch architecture combining 3D Gaussian Splatting (3DGS) and neural Signed Distance Fields (SDF). |
Mulin Yu; Tao Lu; Linning Xu; Lihan Jiang; Yuanbo Xiangli; Bo Dai; |
199 | G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we develop a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning. |
Xiaoxin He; Yijun Tian; Yifei Sun; Nitesh Chawla; Thomas Laurent; Yann LeCun; Xavier Bresson; Bryan Hooi; |
200 | Adam with Model Exponential Moving Average Is Effective for Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). |
Kwangjun Ahn; Ashok Cutkosky; |
201 | Normalization and Effective Learning Rates in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project (NaP), which couples the insertion of normalization layers with weight projection, ensuring that the effective learning rate remains constant throughout training. |
Clare Lyle; Zeyu Zheng; Khimya Khetarpal; James Martens; Hado van Hasselt; Razvan Pascanu; Will Dabney; |
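The mechanism is easy to sketch: with normalization layers in place, rescaling a weight matrix does not change the network's function, but unchecked norm growth silently shrinks the effective learning rate, so NaP-style training projects each weight matrix back to a fixed norm after every optimizer step. The projection below is a simplified reading, not the paper's exact procedure.

```python
import torch

def normalize_and_project_step(opt, weights, target_norms):
    """Take the optimizer step, then project each (normalization-preceded)
    weight matrix back to its fixed norm, so norm growth cannot silently
    decay the effective learning rate."""
    opt.step()
    with torch.no_grad():
        for w, r in zip(weights, target_norms):
            w.mul_(r / w.norm().clamp(min=1e-12))

# toy usage
w = torch.nn.Parameter(torch.randn(64, 64))
opt = torch.optim.SGD([w], lr=1e-2)
r0 = w.norm().item()                 # norm fixed at its initial value
loss = (w ** 2).sum()
loss.backward()
normalize_and_project_step(opt, [w], [r0])
```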
202 | Scaling Transformer Neural Networks for Skillful and Reliable Medium-range Weather Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we introduce Stormer, a simple transformer model that achieves state-of-the art performance on weather forecasting with minimal changes to the standard transformer backbone. |
Tung Nguyen; Rohan Shah; Hritik Bansal; Troy Arcomano; Romit Maulik; Rao Kotamarthi; Ian Foster; Sandeep Madireddy; Aditya Grover; |
203 | Optimal Multiclass U-Calibration Error and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously. |
Haipeng Luo; Spandan Senapati; Vatsal Sharan; |
204 | PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents’ actions. |
Yijia Shao; Tianshi Li; Weiyan Shi; Yanchen Liu; Diyi Yang; |
205 | Poseidon: Efficient Foundation Models for PDEs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Poseidon, a foundation model for learning the solution operators of PDEs. |
Maximilian Herde; Bogdan Raonic; Tobias Rohner; Roger Käppeli; Roberto Molinaro; Emmanuel de Bézenac; Siddhartha Mishra; |
206 | SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for The Legal Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SaulLM-medium and SaulLM-large, two large language model (LLM) families tailored for the legal sector. |
Pierre Colombo; Telmo Pessoa Pires; Malik Boudiaf; Rui Melo; Gabriel Hautreux; Etienne Malaboeuf; Johanne Charpentier; Dominic Culver; Michael Desa; |
207 | Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that it is possible to manipulate the internal model representations as well as edit model weights, based on the mechanism we discover, in order to significantly improve performance on our synthetic Laundry List task, which requires recall from a list, often improving task accuracy by over 20%. |
Jack Merullo; Carsten Eickhoff; Ellie Pavlick; |
208 | Preference Learning Algorithms Do Not Learn Preference Rankings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via *ranking accuracy*. |
Angelica Chen; Sadhika Malladi; Lily Zhang; Xinyi Chen; Qiuyi (Richard) Zhang; Rajesh Ranganath; Kyunghyun Cho; |
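The paper's central quantity is simple to compute: given sequence log-likelihoods for preferred and dispreferred completions, ranking accuracy is the fraction of pairs the model orders correctly, as in this sketch.

```python
import torch

def ranking_accuracy(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor) -> torch.Tensor:
    """Fraction of preference pairs where the model assigns a higher sequence
    log-likelihood to the preferred completion."""
    return (logp_chosen > logp_rejected).float().mean()

# toy usage: sequence log-probs for four preference pairs
acc = ranking_accuracy(torch.tensor([-12.3, -8.1, -20.4, -5.0]),
                       torch.tensor([-11.0, -9.6, -22.0, -5.5]))
print(acc)  # 0.75
```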
209 | Large Language Model Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study how to perform unlearning, i.e. forgetting undesirable (mis)behaviors, on large language models (LLMs). |
Yuanshun Yao; Xiaojun Xu; Yang Liu; |
210 | CV-VAE: A Compatible Video VAE for Latent Generative Video Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering the compatibility with existing T2I models will result in a latent space gap between them; bridging this gap demands huge computational resources for training, even with the T2I models as initialization. To address this issue, we propose a method for training a video VAE of latent video models, namely CV-VAE, whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). |
Sijie Zhao; Yong Zhang; Xiaodong Cun; Shaoshu Yang; Muyao Niu; Xiaoyu Li; Wenbo HU; Ying Shan; |
211 | Turning Indirect Knowledge Into Direct Demonstrations for Computer Agents at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Synatra, an approach that effectively transforms indirect knowledge into direct supervision at scale. |
Tianyue Ou; Frank F. Xu; Aman Madaan; Jiarui Liu; Robert Lo; Abishek Sridhar; Sudipta Sengupta; Dan Roth; Graham Neubig; Shuyan Zhou; |
212 | Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Heterogeneous Pre-trained Transformers (HPT), which pre-train a large, shareable trunk of a policy neural network to learn a task and embodiment agnostic shared representation. |
Lirui Wang; Xinlei Chen; Jialiang Zhao; Kaiming He; |
213 | Learn More, But Bother Less: Parameter Efficient Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel parameter-efficient approach for continual learning in LLMs, which empirically investigates knowledge transfer from previously learned tasks to new tasks through low-rank matrix parameters, enhancing the learning of new tasks without significant interference. |
Fuli Qiao; Mehrdad Mahdavi; |
214 | FasterDiT: Towards Faster Diffusion Transformers Training Without Architecture Modification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to accelerate DiT training without any architectural modification. |
Jingfeng Yao; Cheng Wang; Wenyu Liu; Xinggang Wang; |
215 | Fine-grained Analysis of In-context Linear Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a stronger characterization of the optimization and generalization landscape of ICL through contributions on architectures, low-rank parameterization, and correlated designs: (1) We study the landscape of 1-layer linear attention and 1-layer H3, a state-space model. |
Yingcong Li; Ankit Rawat; Samet Oymak; |
216 | Prospective Representation Learning for Non-Exemplar Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a Prospective Representation Learning (PRL) scheme to prepare the model for handling conflicts in advance. |
Wuxuan Shi; Mang Ye; |
217 | Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in deep learning. |
Chia-Hsiang Kao; Bharath Hariharan; |
218 | Recurrent Neural Networks: Vanishing and Exploding Gradients Are Not The End of The Story Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The recent success of state-space models (SSMs), a subclass of RNNs, to overcome such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. |
Nicolas Zucchet; Antonio Orvieto; |
219 | Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems. First, the absence of a comprehensive benchmark with unified problem settings hinders a clear understanding of the comparative effectiveness and practical value of different text-space GFMs. Second, there is a lack of sufficient datasets to thoroughly explore the methods’ full potential and verify their effectiveness across diverse settings. To address these issues, we conduct a comprehensive benchmark providing novel text-space datasets and comprehensive evaluation under unified problem settings. |
Zhikai Chen; Haitao Mao; Jingzhe Liu; Yu Song; Bingheng Li; Wei Jin; Bahare Fatemi; Anton Tsitsulin; Bryan Perozzi; Hui Liu; Jiliang Tang; |
220 | Invisible Image Watermarks Are Provably Removable Using Generative AI Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They also prevent people from misusing images, especially those generated by AI models. We propose a family of regeneration attacks to remove these invisible watermarks. |
Xuandong Zhao; Kexun Zhang; Zihao Su; Saastha Vasan; Ilya Grishchenko; Christopher Kruegel; Giovanni Vigna; Yu-Xiang Wang; Lei Li; |
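To make the regeneration idea concrete, here is a toy Python sketch: a watermarked image is corrupted with noise and then "regenerated" by a denoiser, which wipes out the pixel-level watermark signal. The paper uses a generative model (such as a diffusion model) as the regenerator; the box blur, image size, and watermark construction below are stand-ins of our own, not the authors' code.

```python
import numpy as np

def regeneration_attack(img, noise_std=0.1, blur_width=3, rng=None):
    """Toy regeneration attack: add Gaussian noise, then 'regenerate' with a
    simple denoiser (separable box blur, standing in for a generative model)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.clip(img + rng.normal(0.0, noise_std, img.shape), 0.0, 1.0)
    kernel = np.ones(blur_width) / blur_width
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, noisy)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, out)
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))                               # hypothetical cover image
watermark = 0.01 * np.sign(rng.normal(size=image.shape))   # tiny invisible perturbation
marked = np.clip(image + watermark, 0.0, 1.0)
attacked = regeneration_attack(marked, rng=rng)

# Detector stand-in: correlate the residual with the known watermark pattern.
before = np.corrcoef(watermark.ravel(), (marked - image).ravel())[0, 1]
after = np.corrcoef(watermark.ravel(), (attacked - image).ravel())[0, 1]
print(f"watermark correlation before: {before:.2f}, after: {after:.2f}")
```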
221 | Amortized Planning with Large-Scale Transformers: A Case Study on Chess Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper uses chess, a landmark planning problem in AI, to assess transformers’ performance on a planning task where memorization is futile – even at large scale. |
Anian Ruoss; Grégoire Delétang; Sourabh Medapati; Jordi Grau-Moya; Kevin Li; Elliot Catt; John Reid; Cannada Lewis; Tim Genewein; Joel Veness; |
222 | Probabilistic Emulation of A Global Climate Model with Spherical DYffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present the first conditional generative model able to produce global climate ensemble projections that are accurate and physically consistent. |
Salva Rühling Cachay; Brian Henn; Oliver Watt-Meyer; Christopher S. Bretherton; Rose Yu; |
223 | InterDreamer: Less Supervision for More Generalizable Text-Driven 3D Human-Object Interaction Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. |
Ziyin Wang; Sirui Xu; Yu-Xiong Wang; Liangyan Gui; |
224 | MotionBooth: Motion-Aware Customized Text-to-Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. |
Jianzong Wu; Xiangtai Li; Yanhong Zeng; Jiangning Zhang; Qianyu Zhou; Yining Li; Yunhai Tong; Kai Chen; |
225 | CorDA: Context-Oriented Decomposition Adaptation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable adapters from weight decomposition oriented by the context of downstream task or world knowledge. |
Yibo Yang; Xiaojie Li; Zhongzhu Zhou; Shuaiwen Song; Jianlong Wu; Liqiang Nie; Bernard Ghanem; |
226 | MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce model-aware data selection with data influence models (MATES), where a data influence model continuously adapts to the evolving data preferences of the main pretraining model, thus selecting data most effective for the model’s current learning progress. |
Zichun Yu; Spandan Das; Chenyan Xiong; |
227 | Diversity Is Not All You Need: Training A Robust Cooperative Agent Needs Specialist Partners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a principled method for quantifying both the diversity and specialization of a partner population based on the concept of mutual information. |
Rujikorn Charakorn; Poramate Manoonpong; Nat Dilokthanakul; |
228 | IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce IQA-EVAL, an automated evaluation framework for interactive question answering. More specifically, we introduce an LLM-based Evaluation Agent (LEA) that can: (1) simulate human behaviors to generate interactions with IQA models; and (2) automatically evaluate the generated interactions. |
Ruosen Li; Ruochen Li; Barry Wang; Xinya Du; |
229 | MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel semi-automatic question generation strategy by composing event structures from information extraction (IE) datasets and present the first Multi-hop Event-centric Question Answering (MEQA) benchmark. |
Ruosen Li; Zimu Wang; Son Tran; Lei Xia; Xinya Du; |
230 | Code Agents Are State of The Art Software Testers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth patches, and golden tests. |
Niels Mündler; Mark Müller; Jingxuan He; Martin Vechev; |
231 | DARG: Dynamic Evaluation of Large Language Models Via Adaptive Reasoning Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity. |
Zhehao Zhang; Jiaao Chen; Diyi Yang; |
232 | Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through extensive experiments on several chat models (Meta’s Llama 2-Chat, Mistral AI’s Mistral 7B Instruct v0.2, and OpenAI’s GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the “Pure Tuning, Safe Testing” (PTST) strategy — fine-tune models without a safety prompt, but include it at test time. |
Kaifeng Lyu; Haoyu Zhao; Xinran Gu; Dingli Yu; Anirudh Goyal; Sanjeev Arora; |
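The PTST recipe above is concrete enough to sketch in a few lines. The snippet below shows the asymmetry it prescribes: the safety system prompt is omitted when formatting fine-tuning examples and restored at inference. The chat template, prompt wording, and helper names are our own illustration, not the paper's code.

```python
# Minimal sketch of "Pure Tuning, Safe Testing" (PTST): fine-tune WITHOUT a
# safety system prompt, but prepend one at test time. The template markup and
# the safety wording below are hypothetical stand-ins.

SAFETY_PROMPT = "You are a helpful assistant. Always respond safely."  # assumed wording

def format_example(user_msg, assistant_msg=None, safety=False):
    """Render a single-turn chat example in a toy template."""
    system = SAFETY_PROMPT if safety else ""
    prompt = f"<system>{system}</system>\n<user>{user_msg}</user>\n<assistant>"
    return prompt if assistant_msg is None else prompt + assistant_msg + "</assistant>"

# Fine-tuning data: PTST deliberately omits the safety prompt here ...
train_text = format_example("Summarize this article ...", "Here is a summary ...", safety=False)
# ... and restores it at inference time.
test_prompt = format_example("Summarize this article ...", safety=True)

print(train_text)
print(test_prompt)
```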
233 | Connecting The Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Could an LLM infer the dangerous knowledge by piecing together these hints? As a step towards answering this question, we study inductive out-of-context reasoning (OOCR). |
Johannes Treutlein; Dami Choi; Jan Betley; Cem Anil; Samuel Marks; Roger Grosse; Owain Evans; |
234 | QTIP: Quantization with Trellises and Incoherence Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce QTIP, which instead uses trellis coded quantization (TCQ) to achieve ultra-high-dimensional quantization. |
Albert Tseng; Qingyao Sun; David Hou; Christopher De Sa; |
235 | Robust Reinforcement Learning from Corrupted Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For various reasons (e.g., personal bias, context ambiguity, or lack of training), human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach, R^3M, which models the potentially corrupted preference label as sparse outliers. |
Alexander Bukharin; Ilgee Hong; Haoming Jiang; Zichong Li; Qingru Zhang; Zixuan Zhang; Tuo Zhao; |
236 | Implicit Bias of Mirror Flow on Separable Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. |
Scott Pesme; Radu-Alexandru Dragomir; Nicolas Flammarion; |
237 | Calibrated Self-Rewarding Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These approaches are resource-intensive and may not effectively reflect the target LVLM’s preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. |
Yiyang Zhou; Zhiyuan Fan; Dongjie Cheng; Sihan Yang; Zhaorun Chen; Chenhang Cui; Xiyao Wang; Yun Li; Linjun Zhang; Huaxiu Yao; |
238 | Universal In-Context Approximation By Prompting Fully Recurrent Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. |
Aleksandar Petrov; Tom Lamb; Alasdair Paren; Philip Torr; Adel Bibi; |
239 | Approaching Human-Level Forecasting with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. |
Danny Halawi; Fred Zhang; Chen Yueh-Han; Jacob Steinhardt; |
240 | Revisiting Few-Shot Object Detection with Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit the task of few-shot object detection (FSOD) in the context of recent foundational VLMs. |
Anish Madan; Neehar Peri; Shu Kong; Deva Ramanan; |
241 | HelpSteer2: Open-source Dataset for Training Top-performing Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). |
Zhilin Wang; Yi Dong; Olivier Delalleau; Jiaqi Zeng; Gerald Shen; Daniel Egert; Jimmy Zhang; Makesh Narsimhan Sreedhar; Oleksii Kuchaiev; |
242 | Return of Unconditional Generation: A Self-supervised Representation Generation Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This gap can be attributed to the lack of semantic information provided by labels. In this work, we show that one can close this gap by generating semantic representations in the representation space produced by a self-supervised encoder. |
Tianhong Li; Dina Katabi; Kaiming He; |
243 | Autoregressive Image Generation Without Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space. |
Tianhong Li; Yonglong Tian; He Li; Mingyang Deng; Kaiming He; |
244 | Training-Free Visual Prompt Learning for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. |
Mingrui Wu; Xinyue Cai; Jiayi Ji; Jiale Li; Oucheng Huang; Gen Luo; Hao Fei; Guannan Jiang; Xiaoshuai Sun; Rongrong Ji; |
245 | GrootVL: Tree Topology Is All You Need in State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, limited by the inherent geometric constraints of sequences, it still falls short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree topology based on spatial relationships and input features. |
Yicheng Xiao; Lin Song; Shaoli Huang; Jiangshan Wang; Siyu Song; Yixiao Ge; Xiu Li; Ying Shan; |
246 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that VLMs still struggle with natural images and questions that humans can easily answer, which we term natural adversarial samples. |
Baiqi Li; Zhiqiu Lin; WENXUAN PENG; Jean de Dieu Nyandwi; Daniel Jiang; Zixian Ma; Simran Khanuja; Ranjay Krishna; Graham Neubig; Deva Ramanan; |
247 | Weak-to-Strong Search: Align Large Language Models Via Searching Over Small Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce weak-to-strong search, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large model. |
Zhanhui Zhou; Zhixuan Liu; Jie Liu; Zhichen Dong; Chao Yang; Yu Qiao; |
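As a concrete picture of the search criterion, here is a toy sketch in which the three "models" are random-logit stand-ins: candidates are sampled from the frozen large model and ranked by the summed log-probability difference between the small tuned and untuned models. In real use the three logit functions would be actual language models; the vocabulary size, candidate count, and candidate-level greedy selection are our own simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50  # toy vocabulary size

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

# Stand-ins for next-token logits; real usage would query a frozen large LM
# plus a small tuned / small untuned pair.
def large_logits(prefix):   return rng.normal(size=VOCAB)
def tuned_logits(prefix):   return rng.normal(size=VOCAB)
def untuned_logits(prefix): return rng.normal(size=VOCAB)

def guidance_score(seq):
    """Weak-to-strong criterion: sum over tokens of
    log p_tuned(token) - log p_untuned(token)."""
    score = 0.0
    for i, tok in enumerate(seq):
        score += log_softmax(tuned_logits(seq[:i]))[tok]
        score -= log_softmax(untuned_logits(seq[:i]))[tok]
    return score

def weak_to_strong_search(n_candidates=8, length=5):
    candidates = []
    for _ in range(n_candidates):
        seq = []
        for _ in range(length):  # sample each candidate from the frozen large model
            probs = np.exp(log_softmax(large_logits(seq)))
            seq.append(int(rng.choice(VOCAB, p=probs)))
        candidates.append(seq)
    return max(candidates, key=guidance_score)  # keep the best-scored candidate

print(weak_to_strong_search())
```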
248 | Aligning LLM Agents By Learning Latent Preference from User Edits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a learning framework, PRELUDE, that infers a description of the user’s latent preference from historic edit data and uses it to define a prompt policy that drives future response generation. |
Ge Gao; Alexey Taymanov; Eduardo Salinas; Paul Mineiro; Dipendra Misra; |
249 | Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods, including manual design, procedural generation, diffusion-based scene generation, and large language model (LLM) guided scene design, are hindered by limitations such as excessive human effort, reliance on predefined rules or training datasets, and limited 3D spatial reasoning ability. Since pre-trained 2D image generative models better capture scene and object configuration than LLMs, we address these challenges by introducing Architect, a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting. |
Yian Wang; Xiaowen Qiu; Jiageng Liu; Zhehuan Chen; Jiting Cai; Yufei Wang; Tsun-Hsuan Johnson Wang; Zhou Xian; Chuang Gan; |
250 | Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO), to create robust system-level defenses. |
Andy Zhou; Bo Li; Haohan Wang; |
251 | Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. |
Frederik Kunstner; Robin Yadav; Alan Milligan; Mark Schmidt; Alberto Bietti; |
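A toy numpy illustration of the paper's setting, under our own assumptions (random Gaussian features, Zipf-like class counts, a linear softmax model at initialization): per-class rows of the cross-entropy gradient differ in norm by roughly an order of magnitude, which is the imbalance plain gradient descent inherits and which Adam's per-coordinate normalization is hypothesized to correct.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 10, 32
# Heavy-tailed (Zipf-like) class frequencies.
counts = np.array([int(2000 / (k + 1) ** 2) + 1 for k in range(n_classes)])
X = rng.normal(size=(counts.sum(), dim))
y = np.repeat(np.arange(n_classes), counts)

W = np.zeros((n_classes, dim))          # linear model at initialization
logits = X @ W.T
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
grad = (probs - np.eye(n_classes)[y]).T @ X / len(y)  # cross-entropy gradient

print("class counts:         ", counts)
print("per-class grad norms: ", np.round(np.linalg.norm(grad, axis=1), 4))
# Rows for rare classes are tiny, so GD barely updates them; Adam's
# normalization rescales each coordinate and reduces this disparity.
```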
252 | InfLLM: Training-Free Long-Context Extrapolation for LLMs with An Efficient Context Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. |
Chaojun Xiao; Pengle Zhang; Xu Han; Guangxuan Xiao; Yankai Lin; Zhengyan Zhang; Zhiyuan Liu; Maosong Sun; |
253 | Visual CoT: Advancing Multi-Modal Language Models with A Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts. |
Hao Shao; Shengju Qian; Han Xiao; Guanglu Song; Zhuofan Zong; Letian Wang; Yu Liu; Hongsheng Li; |
254 | Near-Minimax-Optimal Distributional Reinforcement Learning with A Generative Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions in the generative model regime (up to logarithmic factors), the first result of this kind for any distributional RL algorithm. |
Mark Rowland; Kevin Li; Remi Munos; Clare Lyle; Yunhao Tang; Will Dabney; |
255 | FinBen: An Holistic Financial Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. |
Qianqian Xie; Weiguang Han; Zhengyu Chen; Ruoyu Xiang; Xiao Zhang; Yueru He; Mengxi Xiao; Dong Li; Yongfu Dai; Duanyu Feng; Yijing Xu; Haoqiang Kang; Ziyan Kuang; Chenhan Yuan; Kailai Yang; Zheheng Luo; Tianlin Zhang; Zhiwei Liu; Guojun Xiong; Zhiyang Deng; Yuechen Jiang; Zhiyuan Yao; Haohang Li; Yangyang Yu; Gang Hu; Huang Jiajia; Xiaoyang Liu; Alejandro Lopez-Lira; Benyou Wang; Yanzhao Lai; Hao Wang; Min Peng; Sophia Ananiadou; Jimin Huang; |
256 | T2V-Turbo: Breaking The Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve both fast and high-quality video generation. |
Jiachen Li; Weixi Feng; Tsu-Jui Fu; Xinyi Wang; S Basu; Wenhu Chen; William Yang Wang; |
257 | Statistical Estimation in The Spiked Tensor Model Via The Quantum Approximate Optimization Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the performance of the QAOA on the spiked tensor model, a statistical estimation problem that exhibits a large computational-statistical gap classically. |
Leo Zhou; Joao Basso; Song Mei; |
258 | Faster Neighborhood Attention: Reducing The O(n^2) Cost of Self Attention at The Threadblock Level Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to massively improve upon existing infrastructure by providing two new methods for implementing neighborhood attention. |
Ali Hassani; Wen-Mei Hwu; Humphrey Shi; |
259 | An Image Is Worth 32 Tokens for Reconstruction and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these 2D tokenizations face challenges in managing the inherent redundancies present in images, where adjacent regions frequently display similarities. To overcome this issue, we introduce Transformer-based 1-Dimensional Tokenizer (TiTok), an innovative approach that tokenizes images into 1D latent sequences. |
Qihang Yu; Mark Weber; Xueqing Deng; Xiaohui Shen; Daniel Cremers; Liang-Chieh Chen; |
260 | Smoothed Energy Guidance: Guiding Diffusion Models By Attenuating Energy Curvature of Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation. |
Susung Hong; |
261 | No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and loss of plasticity. |
Skander Moalla; Andrea Miele; Razvan Pascanu; Caglar Gulcehre; |
262 | TableRAG: Million-Token Tabular Reasoning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods often require the entire table as input, leading to scalability challenges due to the positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. |
Si-An Chen; Lesly Miculicich; Julian Eisenschlos; Zifeng Wang; Zilong Wang; Yanfei Chen; Yasuhisa Fujii; Hsuan-Tien Lin; Chen-Yu Lee; Tomas Pfister; |
263 | No-Regret Learning for Fair Multi-Agent Social Welfare Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. |
Mengxiao Zhang; Ramiro Deo-Campo Vuong; Haipeng Luo; |
264 | Contextual Multinomial Logit Bandits with General Value Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. |
Mengxiao Zhang; Haipeng Luo; |
265 | ARC: A Generalist Graph Anomaly Detector with In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current GAD methods necessitate training specific to each dataset, resulting in high training costs, substantial data requirements, and limited generalizability when applied to new datasets and domains. To address these limitations, this paper proposes ARC, a generalist GAD approach that enables a “one-for-all” GAD model to detect anomalies across various graph datasets on-the-fly. |
Yixin Liu; Shiyuan Li; Yu Zheng; Qingfeng Chen; Chengqi Zhang; Shirui Pan; |
266 | Fast Sampling Via Discrete Non-Markov Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a discrete non-Markov diffusion model, which admits an accelerated reverse sampling for discrete data generation. |
Zixiang Chen; Angela Yuan; Yongqian Li; Yiwen Kou; Junkai Zhang; Quanquan Gu; |
267 | Benchmarking Complex Instruction-Following with Multiple Constraints Composition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and manually collect a high-quality dataset accordingly. |
Bosi Wen; Pei Ke; Xiaotao Gu; Lindong Wu; Hao Huang; Jinfeng Zhou; Wenchuang Li; Binxin Hu; Wendy Gao; Jiaxing Xu; Yiming Liu; Jie Tang; Hongning Wang; Minlie Huang; |
268 | Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an innovative framework termed Image Textualization, which automatically produces high-quality image descriptions by leveraging existing multi-modal large language models (MLLMs) and multiple vision expert models in a collaborative manner. |
Renjie Pi; Jianshu Zhang; Jipeng Zhang; Rui Pan; Zhekai Chen; Tong Zhang; |
269 | Reversing The Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. |
Jiabao Ji; Yujian Liu; Yang Zhang; Gaowen Liu; Ramana Kompella; Sijia Liu; Shiyu Chang; |
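The logit-difference construction at the core of ULD is easy to show numerically. In this sketch, an assistant model that is confident on a token from the forget set is subtracted from the target model's logits, which suppresses that token in the combined distribution. The vocabulary size, the boosted token index, and the strength parameter alpha are hypothetical.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(0)
vocab = 8
target_logits = rng.normal(size=vocab)     # the original LLM's next-token logits
assistant_logits = rng.normal(size=vocab)  # assistant trained to REMEMBER the forget set
assistant_logits[3] += 5.0                 # assistant is confident on a "forget" token

alpha = 1.0  # subtraction strength (hypothetical hyperparameter)
unlearned_logits = target_logits - alpha * assistant_logits

print("target prob of forget token:   ", round(float(softmax(target_logits)[3]), 3))
print("unlearned prob of forget token:", round(float(softmax(unlearned_logits)[3]), 3))
```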
270 | VMamba: Visual State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we transplant Mamba, a state-space language model, into VMamba, a vision backbone that works in linear time complexity. |
Liu Yue; Yunjie Tian; Yuzhong Zhao; Hongtian Yu; Lingxi Xie; Yaowei Wang; Qixiang Ye; Jianbin Jiao; Yunfan Liu; |
271 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing question-answering (QA) datasets based on scientific papers are limited in scale and focus solely on textual content. To address this limitation, we introduce SPIQA (Scientific Paper Image Question Answering), the first large-scale QA dataset specifically designed to interpret complex figures and tables within the context of scientific research articles across various domains of computer science. |
Shraman Pramanick; Rama Chellappa; Subhashini Venugopalan; |
272 | GTBench: Uncovering The Strategic Reasoning Capabilities of LLMs Via Game-Theoretic Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper evaluates LLMs’ reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. |
Jinhao Duan; Renming Zhang; James Diffenderfer; Bhavya Kailkhura; Lichao Sun; Elias Stengel-Eskin; Mohit Bansal; Tianlong Chen; Kaidi Xu; |
273 | MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new variant of the Adam optimizer called MicroAdam that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. |
Ionut-Vlad Modoranu; Mher Safaryan; Grigory Malinovsky; Eldar Kurtić; Thomas Robert; Peter Richtarik; Dan Alistarh; |
274 | Universal Neural Functionals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes an algorithm that automatically constructs permutation equivariant models, which we refer to as universal neural functionals (UNFs), for any weight space. |
Allan Zhou; Chelsea Finn; James Harrison; |
275 | Learning Action and Reasoning-Centric Image Editing from Videos and Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a new automatic metric that focuses on discriminative understanding. |
Benno Krojer; Dheeraj Vattikonda; Luis Lara; Varun Jampani; Eva Portelance; Chris Pal; Siva Reddy; |
276 | DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to domain-information catastrophic forgetting in collaborative training and therefore makes the model perform sub-optimally on the individual domain. To address this issue, we introduce DoFIT, a new Domain-aware FIT framework that alleviates catastrophic forgetting through two new designs. |
Binqian Xu; Xiangbo Shu; Haiyang Mei; Zechen Bai; Basura Fernando; Mike Zheng Shou; Jinhui Tang; |
277 | MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous ones in video duration, caption detail, motion strength, and visual quality. |
Xuan Ju; Yiming Gao; Zhaoyang Zhang; Ziyang Yuan; Xintao Wang; Ailing Zeng; Yu Xiong; Qiang Xu; Ying Shan; |
278 | R²-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By carefully deriving X-ray rasterization functions, we discover a previously unknown integration bias in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. |
Ruyi Zha; Tao Jun Lin; Yuanhao Cai; Jiwen Cao; Yanhao Zhang; Hongdong Li; |
279 | Predicting Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, despite sustained widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. |
Alexander Havrilla; Wenjing Liao; |
280 | Offline Behavior Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose two naive OBD objectives, DBC and PBC, which measure distillation performance via the decision difference between policies trained on distilled data and either offline data or a near-expert policy. |
Shiye Lei; Sen Zhang; Dacheng Tao; |
281 | One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. |
Zechen Bai; Tong He; Haiyang Mei; Pichao Wang; Ziteng Gao; Joya Chen; Lei Liu; Zheng Zhang; Mike Zheng Shou; |
282 | StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple but effective self-attention mechanism, termed Consistent Self-Attention, that boosts the consistency between the generated images. |
Yupeng Zhou; Daquan Zhou; Ming-Ming Cheng; Jiashi Feng; Qibin Hou; |
283 | Elo Uncovered: Robustness and Best Practices in Language Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct extensive evaluation of Elo behaviour, illustrating that individual Elo computations exhibit volatility and investigating the impact of varying the Elo rating system’s hyperparameters. |
Meriem Boubdir; Edward Kim; Beyza Ermis; Sara Hooker; Marzieh Fadaee; |
284 | Reranking Laws for Language Generation: A Communication-Theoretic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A simple and often used strategy is to first let the LLM generate multiple hypotheses and then employ a reranker to choose the best one. In this paper, we draw a parallel between this strategy and the use of redundancy to decrease the error rate in noisy communication channels. |
António Farinhas; Haau-Sing Li; André Martins; |
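The channel-coding analogy can be simulated directly: treat each of the N hypotheses as an independent transmission that is acceptable with some probability, and the reranker as a decoder that picks a good hypothesis (when one exists) with some accuracy. In the Monte Carlo toy below, with all probabilities hypothetical, the error rate drops quickly with N and then floors at the reranker's own error rate, the qualitative shape of a reranking law.

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rate(n_hyp, p_good=0.3, reranker_acc=0.9, trials=20000):
    """P(final answer is bad) when sampling n_hyp hypotheses, each good with
    probability p_good, and reranking with an imperfect reranker that picks a
    good hypothesis with probability reranker_acc whenever one exists."""
    good = rng.random((trials, n_hyp)) < p_good
    any_good = good.any(axis=1)
    picked_good = any_good & (rng.random(trials) < reranker_acc)
    return 1.0 - picked_good.mean()

for n in (1, 2, 4, 8, 16, 32):
    print(f"N={n:3d}  error={error_rate(n):.3f}")
```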
285 | Even Sparser Graph Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We establish theoretical conditions when a narrow network’s attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets. |
Hamed Shirzad; Honghao Lin; Balaji Venkatachalam; Ameya Velingker; David Woodruff; Danica J. Sutherland; |
286 | Learning from Highly Sparse Spatio-temporal Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a theoretical analysis revealing that such iterative models are not only susceptible to data sparsity but also to graph sparsity, causing unstable performance across different datasets. To overcome these limitations, we introduce a novel method named One-step Propagation and Confidence-based Refinement (OPCR). |
Leyan Deng; Chenwang Wu; Defu Lian; Enhong Chen; |
287 | Understanding The Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, greatly influencing performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. |
Jerome Sieber; Carmen Amo Alonso; Alexandre Didier; Melanie Zeilinger; Antonio Orvieto; |
288 | CALE: Continuous Arcade Learning Environment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. |
Jesse Farebrother; Pablo Samuel Castro; |
289 | Vivid-ZOO: Multi-View Video Generation with Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text. |
Bing Li; Cheng Zheng; Wenxuan Zhu; Jinjie Mai; Biao Zhang; Peter Wonka; Bernard Ghanem; |
290 | Probing The Decision Boundaries of In-context Learning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification. |
Siyan Zhao; Tung Nguyen; Aditya Grover; |
291 | A General Protocol to Probe Large Vision Models for 3D Physical Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our objective in this paper is to probe large vision models to determine to what extent they ‘understand’ different physical properties of the 3D scene depicted in an image. |
Guanqi Zhan; Chuanxia Zheng; Weidi Xie; Andrew Zisserman; |
292 | Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. |
Zhengfei Kuang; Shengqu Cai; Hao He; Yinghao Xu; Hongsheng Li; Leonidas Guibas; Gordon Wetzstein; |
293 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via 360-degree novel view synthesis. |
Wen-Hsuan Chu; Lei Ke; Katerina Fragkiadaki; |
294 | Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through a new analysis of BC with the logarithmic loss, we show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. |
Dylan J Foster; Adam Block; Dipendra Misra; |
295 | Adaptive Preference Scaling for Reinforcement Learning with Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address this uncertainty in preference strength. |
Ilgee Hong; Zichong Li; Alexander Bukharin; Yixiao Li; Haoming Jiang; Tianbao Yang; Tuo Zhao; |
296 | Base of RoPE Bounds Context Length Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit the role of RoPE in LLMs and propose a novel property of long-term decay; we derive that the base of RoPE bounds context length: there is an absolute lower bound for the base value to obtain certain context length capability. |
Xin Men; Mingyu Xu; Qingyu Zhang; Bingning Wang; Hongyu Lin; Xianpei Han; Weipeng Chen; |
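The long-term-decay property is easy to probe with the standard RoPE frequencies θ_i = base^(-2i/d), which we assume here. The script below rotates the same random vector to increasing relative distances and compares query-key scores for the common base 10^4 against an enlarged base: the larger base keeps scores high at long range, in line with the paper's lower bound on the base. The distances and dimension are arbitrary choices.

```python
import numpy as np

def rope_rotate(x, pos, base, d):
    """Standard RoPE: rotate consecutive dimension pairs by pos * base^(-2i/d)."""
    theta = base ** (-np.arange(0, d, 2) / d)
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

rng = np.random.default_rng(0)
d = 128
q = rng.normal(size=d)

for base in (1e4, 1e6):  # common default vs an enlarged base
    scores = [rope_rotate(q, 0, base, d) @ rope_rotate(q, m, base, d)
              for m in (1, 100, 10000)]
    print(f"base={base:.0e}  q.k at distances 1/100/10000:", np.round(scores, 1))
```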
297 | WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce WildGuard—an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. |
Seungju Han; Kavel Rao; Allyson Ettinger; Liwei Jiang; Bill Yuchen Lin; Nathan Lambert; Nouha Dziri; Yejin Choi; |
298 | SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs’ spatial perception and reasoning capabilities. |
AnChieh Cheng; Hongxu Yin; Yang Fu; Qiushan Guo; Ruihan Yang; Jan Kautz; Xiaolong Wang; Sifei Liu; |
299 | Iteratively Refined Behavior Regularization for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration. |
Yi Ma; Jianye Hao; Xiaohan Hu; Yan Zheng; Chenjun Xiao; |
300 | Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present Vitron, a universal pixel-level vision LLM designed for comprehensive understanding, generating, segmenting, and editing of both static images and dynamic videos. |
Hao Fei; Shengqiong Wu; Hanwang Zhang; Tat-Seng Chua; Shuicheng Yan; |
301 | Needle In A Multimodal Haystack Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents. |
Weiyun Wang; Shuibo Zhang; Yiming Ren; Yuchen Duan; Tiantong Li; Shuo Liu; Mengkang Hu; Zhe Chen; Kaipeng Zhang; Lewei Lu; Xizhou Zhu; Ping Luo; Yu Qiao; Jifeng Dai; Wenqi Shao; Wenhai Wang; |
302 | Transferable Boltzmann Generators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, flow matching has been employed to train Boltzmann Generators for small molecular systems in Cartesian coordinates. We extend this work and propose a first framework for Boltzmann Generators that are transferable across chemical space, such that they predict zero-shot Boltzmann distributions for test molecules without being retrained on these systems. |
Leon Klein; Frank Noe; |
303 | TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. |
Avi Caciularu; Alon Jacovi; Eyal Ben-David; Sasha Goldshtein; Tal Schuster; Jonathan Herzig; Gal Elidan; Amir Globerson; |
304 | WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically explore how model weights interact with unlearning processes in LLMs and we design the weight attribution-guided LLM unlearning method, WAGLE, which unveils the interconnections between ‘influence’ of weights and ‘influence’ of data to forget and retain in LLM generation. |
Jinghan Jia; Jiancheng Liu; Yihua Zhang; Parikshit Ram; Nathalie Baracaldo; Sijia Liu; |
305 | SpeechAlign: Speech Language Models Can Self-Improve Via Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences. |
Dong Zhang; Zhaowei Li; Shimin Li; Xin Zhang; Pengyu Wang; Yaqian Zhou; Xipeng Qiu; |
306 | MambaTalk: Co-Speech Gesture Generation with Selective State Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore the potential of state space models (SSMs). |
Zunnan Xu; Yukang Lin; Haonan Han; Sicheng Yang; Ronghui Li; Yachao Zhang; Xiu Li; |
307 | Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision. |
Bowen Ping; Shuo Wang; Hanqing Wang; Xu Han; Yuzhuang Xu; Yukun Yan; Yun Chen; Baobao Chang; Zhiyuan Liu; Maosong Sun; |
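A minimal sketch of the mixed-precision idea, assuming a plain SVD split and a symmetric uniform quantizer (the paper's actual quantizer and bit allocation are more refined): singular directions with large singular values are kept in full precision while the long tail is quantized to few bits.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Symmetric uniform quantizer, a stand-in for the paper's quantizer."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def mixed_precision_delta(delta, k_hi=8, bits_lo=3):
    """Compress delta = W_finetuned - W_base: keep the top-k_hi singular
    directions in full precision, quantize the residual tail coarsely."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    head = (U[:, :k_hi] * s[:k_hi]) @ Vt[:k_hi]   # large singular values: precise
    tail = delta - head                           # long tail: low-bit
    return head + uniform_quantize(tail, bits_lo)

rng = np.random.default_rng(0)
# Synthetic delta with a long-tailed singular value spectrum.
delta = rng.normal(size=(256, 64)) @ np.diag(1.0 / np.arange(1, 65) ** 1.5) @ rng.normal(size=(64, 256))
approx = mixed_precision_delta(delta)
print("relative error:", round(float(np.linalg.norm(delta - approx) / np.linalg.norm(delta)), 4))
```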
308 | Adaptive Proximal Gradient Method for Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). |
Yura Malitsky; Konstantin Mishchenko; |
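As a concrete instance of an adaptive proximal gradient method, the sketch below solves a lasso problem with a stepsize adapted from observed gradient differences, a rule in the spirit of the authors' earlier adaptive gradient descent; the exact update in this paper may differ, and the problem data are synthetic.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def adaptive_proxgd(A, b, lam, iters=300):
    """Proximal gradient for 0.5*||Ax-b||^2 + lam*||x||_1 where the stepsize
    is adapted from observed gradient differences (in the spirit of the
    authors' adaptive rule; the exact update in the paper may differ)."""
    f_grad = lambda x: A.T @ (A @ x - b)
    x_prev = np.zeros(A.shape[1])
    g_prev = f_grad(x_prev)
    step, theta = 1e-6, 1e9
    x = soft_threshold(x_prev - step * g_prev, step * lam)
    for _ in range(iters):
        g = f_grad(x)
        # Local inverse-curvature estimate from successive iterates.
        local = np.linalg.norm(x - x_prev) / (2.0 * np.linalg.norm(g - g_prev) + 1e-12)
        new_step = min(np.sqrt(1.0 + theta) * step, local)
        theta, step = new_step / step, new_step
        x_prev, g_prev = x, g
        x = soft_threshold(x - step * g, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 50))
x_true = np.zeros(50)
x_true[:5] = [2.0, -3.0, 1.5, 2.5, -1.0]
b = A @ x_true + 0.01 * rng.normal(size=100)
x_hat = adaptive_proxgd(A, b, lam=5.0)
print("recovered support:", np.flatnonzero(np.abs(x_hat) > 1e-2))
```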
309 | OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability. To address this issue, we introduce OSWorld, the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems such as Ubuntu, Windows, and macOS. |
Tianbao Xie; Danyang Zhang; Jixuan Chen; Xiaochuan Li; Siheng Zhao; Ruisheng Cao; Jing Hua Toh; Zhoujun Cheng; Dongchan Shin; Fangyu Lei; Yitao Liu; Yiheng Xu; Shuyan Zhou; Silvio Savarese; Caiming Xiong; Victor Zhong; Tao Yu; |
310 | PeRFlow: Piecewise Rectified Flow As Universal Plug-and-Play Accelerator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. |
Hanshu Yan; Xingchao Liu; Jiachun Pan; Jun Hao Liew; Qiang Liu; Jiashi Feng; |
311 | L4GM: Large 4D Gaussian Reconstruction Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input — in a single feed-forward pass that takes only a second. |
Jiawei Ren; Cheng Xie; Ashkan Mirzaei; Hanxue Liang; Xiaohui Zeng; Karsten Kreis; Ziwei Liu; Antonio Torralba; Sanja Fidler; Seung Wook Kim; Huan Ling; |
312 | From An Image to A Scene: Learning to Imagine The World from A Million 360° Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce 360-1M, a 360° video dataset consisting of 1 million videos, and a process for efficiently finding corresponding frames from diverse viewpoints at scale. |
Matthew Wallingford; Anand Bhattad; Aditya Kusupati; Vivek Ramanujan; Matt Deitke; Aniruddha Kembhavi; Roozbeh Mottaghi; Wei-Chiu Ma; Ali Farhadi; |
313 | Efficient Lifelong Model Evaluation in An Era of Rapid Progress Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies. In our work, we seek to mitigate this challenge by compiling ever-expanding large-scale benchmarks called Lifelong Benchmarks. |
Ameya Prabhu; Vishaal Udandarao; Philip Torr; Matthias Bethge; Adel Bibi; Samuel Albanie; |
314 | DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, recreating existing figures that are not stored in formats preserving semantic information is equally complex. To tackle this problem, we introduce DeTikZify, a novel multimodal language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs based on sketches and existing figures. |
Jonas Belouadi; Simone Ponzetto; Steffen Eger; |
315 | Who Evaluates The Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce T2IScoreScore (TS2), a curated set of semantic error graphs containing a prompt and a set of increasingly erroneous images. |
Michael Saxon; Fatima Jahara; Mahsa Khoshnoodi; Yujie Lu; Aditya Sharma; William Yang Wang; |
316 | Generalizable Implicit Motion Modeling for Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing paradigms either simply consider linear combinations of bidirectional flows or directly predict bilateral flows with the condition of timestamps, lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. |
Zujin Guo; Wei Li; Chen Change Loy; |
317 | Conservative Fine-Tuning of Diffusion Models from Offline Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In offline scenarios, existing approaches tend to suffer from overoptimization, as they may be misled by the reward model in out-of-distribution regions. To address this, we introduce a conservative fine-tuning approach, BRAID, by optimizing a conservative reward model, which includes additional penalization outside of offline data distributions. |
Masatoshi Uehara; Yulai Zhao; Ehsan Hajiramezanali; Gabriele Scalia; Gokcen Eraslan; Avantika Lal; Sergey Levine; Tommaso Biancalani; |
318 | APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents APIGen, an automated data generation pipeline designed to produce verifiable high-quality datasets for function-calling applications. |
Zuxin Liu; Thai Hoang; Jianguo Zhang; Ming Zhu; Tian Lan; Shirley Kokane; Juntao Tan; Weiran Yao; Zhiwei Liu; Yihao Feng; Rithesh R N; Liangwei Yang; Silvio Savarese; Juan Carlos Niebles; Huan Wang; Shelby Heinecke; Caiming Xiong; |
319 | Decoupled Kullback-Leibler Divergence Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we delve deeper into the Kullback–Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. |
Jiequan Cui; Zhuotao Tian; Zhisheng Zhong; Xiaojuan Qi; Bei Yu; Hanwang Zhang; |
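The starting point for such decompositions is the standard identity KL(p||q) = CE(p, q) - H(p), which the snippet below verifies numerically: since the teacher entropy H(p) is constant in q, minimizing the KL loss against a fixed teacher is exactly soft-label cross-entropy minimization. The paper's finer decoupling into a weighted-MSE term plus a soft-label CE term goes beyond this sketch.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
p = softmax(rng.normal(size=10))  # fixed "teacher" / soft-label distribution
q = softmax(rng.normal(size=10))  # model distribution

kl = np.sum(p * np.log(p / q))    # KL(p||q)
ce = -np.sum(p * np.log(q))       # cross-entropy with soft labels
h = -np.sum(p * np.log(p))        # teacher entropy, constant w.r.t. q
print(f"KL(p||q) = {kl:.6f}   CE - H = {ce - h:.6f}")
```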
320 | Improved Off-policy Training of Diffusion Samplers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. |
Marcin Sendera; Minsu Kim; Sarthak Mittal; Pablo Lemos; Luca Scimeca; Jarrid Rector-Brooks; Alexandre Adam; Yoshua Bengio; Nikolay Malkin; |
321 | Multiple Physics Pretraining for Spatiotemporal Surrogate Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. |
Michael McCabe; Bruno Régaldo-Saint Blancard; Liam Parker; Ruben Ohana; Miles Cranmer; Alberto Bietti; Michael Eickenberg; Siavash Golkar; Geraud Krawezik; Francois Lanusse; Mariel Pettee; Tiberiu Tesileanu; Kyunghyun Cho; Shirley Ho; |
322 | Mixture of Tokens: Continuous MoE Through Cross-Example Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the observation that the adaptation of fully continuous methods has been an overarching trend in deep learning, we develop Mixture of Tokens (MoT), a simple, continuous architecture that is capable of scaling the number of parameters similarly to sparse MoE models. |
Szymon Antoniak; Michał Krutul; Maciej Pióro; Jakub Krajewski; Jan Ludziejewski; Kamil Ciebiera; Krystian Król; Tomasz Odrzygóźdź; Marek Cygan; Sebastian Jaszczur; |
323 | Transfer Q-star: Principled Decoding for LLM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Transfer Q*, which implicitly estimates the optimal value function for a target reward r through a baseline model ρ_BL aligned with a baseline reward r_BL (which can be different from the target reward r). |
Souradip Chakraborty; Soumya Suvra Ghosal; Ming Yin; Dinesh Manocha; Mengdi Wang; Amrit Singh Bedi; Furong Huang; |
324 | Improving Context-Aware Preference Modeling for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we contribute several context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. |
Silviu Pitis; Ziang Xiao; Nicolas Le Roux; Alessandro Sordoni; |
325 | HEST-1k: A Dataset For Spatial Transcriptomics and Histology Image Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce HEST-1k, a collection of 1,108 spatial transcriptomic profiles, each linked to a WSI and metadata. |
Guillaume Jaume; Paul Doucet; Andrew Song; Ming Y. Lu; Cristina Almagro Pérez; Sophia Wagner; Anurag Vaidya; Richard Chen; Drew Williamson; Ahrong Kim; Faisal Mahmood; |
326 | When Does Perceptual Alignment Benefit Vision Representations? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we investigate how aligning vision model representations to human perceptual judgments impacts their usability in standard computer vision tasks. |
Shobhita Sundaram; Stephanie Fu; Lukas Muttenthaler; Netanel Tamir; Lucy Chai; Simon Kornblith; Trevor Darrell; Phillip Isola; |
327 | MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Meta-Objective Aligner (MetaAligner), the first policy-agnostic and generalizable method for multi-objective preference alignment. |
Kailai Yang; Zhiwei Liu; Qianqian Xie; Jimin Huang; Tianlin Zhang; Sophia Ananiadou; |
328 | Scaling Laws and Compute-Optimal Training Without Fixed Training Duration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across different lengths for the same model size. |
Alexander Hägele; Elie Bakouch; Atli Kosson; Loubna Ben allal; Leandro Von Werra; Martin Jaggi; |
329 | SplitNeRF: Split Sum Approximation Neural Field for Joint Geometry, Illumination, and Material Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel approach for digitizing real-world objects by estimating their geometry, material properties, and environmental lighting from a set of posed images with fixed lighting. |
Jesus Zarzar; Bernard Ghanem; |
330 | From Trojan Horses to Castle Walls: Unveiling Bilateral Data Poisoning Effects in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While state-of-the-art diffusion models (DMs) excel in image generation, concerns regarding their security persist. Earlier research highlighted DMs’ vulnerability to data poisoning attacks, but these studies placed stricter requirements than conventional methods like ‘BadNets’ in image classification. |
Zhuoshi Pan; Yuguang Yao; Gaowen Liu; Bingquan Shen; H. Vicky Zhao; Ramana Kompella; Sijia Liu; |
331 | PaGoDA: Progressive Growing of A One-Step Generator from A Low-Resolution Diffusion Teacher Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a technique to progressively grow the resolution of the generator beyond that of the original teacher DM. |
Dongjun Kim; Chieh-Hsin Lai; Wei-Hsiang Liao; Yuhta Takida; Naoki Murata; Toshimitsu Uesaka; Yuki Mitsufuji; Stefano Ermon; |
332 | MVSplat360: Benchmarking 360° Generalizable 3D Novel View Synthesis from Sparse Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) in wild scene scenarios given only sparse observations. |
Yuedong Chen; Chuanxia Zheng; Haofei Xu; Bohan Zhuang; Andrea Vedaldi; Tat-Jen Cham; Jianfei Cai; |
333 | Data-Efficient Learning with Neural Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an algorithm for learning neural programs, called ISED, that only relies on input-output samples of black-box components. |
Alaia Solko-Breslin; Seewon Choi; Ziyang Li; Neelay Velingker; Rajeev Alur; Mayur Naik; Eric Wong; |
334 | TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. |
Chenyang Le; Yao Qian; Dongmei Wang; Long Zhou; Shujie LIU; Xiaofei Wang; Midia Yousefi; Yanmin Qian; Jinyu Li; Michael Zeng; |
335 | The Iterative Optimal Brain Surgeon: Faster Sparse Recovery By Leveraging Second-Order Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, these results still lack a solid theoretical understanding, and it is unclear whether they can be improved by leveraging connections to the wealth of work on sparse recovery algorithms. In this paper, we draw new connections between these two areas and present new sparse recovery algorithms inspired by the OBS framework that come with theoretical guarantees under reasonable assumptions and have strong practical performance. |
Diyuan Wu; Ionut-Vlad Modoranu; Mher Safaryan; Denis Kuznedelev; Dan Alistarh; |
336 | FilterNet: Harnessing Frequency Filters for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore a novel perspective of enlightening signal processing for deep time series forecasting. |
Kun Yi; Wei Fan; Qi Zhang; Hui He; Jingru Fei; Shufeng Hao; Defu Lian; |
337 | OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. |
Zihao Wang; Shaofei Cai; Zhancun Mu; Haowei Lin; Ceyao Zhang; Xuejie Liu; Qing Li; Anji Liu; Xiaojian (Shawn) Ma; Yitao Liang; |
338 | Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Constrained Human-AI Cooperation (CHAIC), an inclusive embodied social intelligence challenge for testing social perception and cooperation in embodied agents. |
Weihua Du; Qiushi Lyu; Jiaming Shan; Zhenting Qi; Hongxin Zhang; Sunli Chen; Andi Peng; Tianmin Shu; Kwonjoon Lee; Behzad Dariush; Chuang Gan; |
339 | Recursive Introspection: Teaching Foundation Model Agents How to Self-Improve Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even strong proprietary large language models (LLMs) do not exhibit the ability of continually improving their responses sequentially, even in scenarios where they are explicitly told that they are making a mistake. In this paper, we develop $\textbf{RISE}$: $\textbf{R}$ecursive $\textbf{I}$ntro$\textbf{s}$p$\textbf{e}$ction, an approach for fine-tuning LLMs to introduce this ability. |
Yuxiao Qu; Tianjun Zhang; Naman Garg; Aviral Kumar; |
340 | WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce WildTeaming, an automatic red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes selections of multiple mined tactics for systematic exploration of novel and even more challenging jailbreaks. |
Liwei Jiang; Kavel Rao; Seungju Han; Allyson Ettinger; Faeze Brahman; Sachin Kumar; Niloofar Mireshghallah; Ximing Lu; Maarten Sap; Nouha Dziri; Yejin Choi; |
341 | InterControl: Zero-shot Human Interaction Generation By Controlling Every Joint Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs. |
Zhenzhi Wang; Jingbo Wang; Yixuan Li; Dahua Lin; Bo Dai; |
342 | CogVLM: Visual Expert for Pretrained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce CogVLM, a powerful open-source visual language foundation model. |
Weihan Wang; Qingsong Lv; Wenmeng Yu; Wenyi Hong; Ji Qi; Yan Wang; Junhui Ji; Zhuoyi Yang; Lei Zhao; Song XiXuan; Jiazheng Xu; Keqin Chen; Bin Xu; Juanzi Li; Yuxiao Dong; Ming Ding; Jie Tang; |
343 | Differentiable Structure Learning with Partial Orders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main difficulty lies in adapting these constraints, typically suited for the space of total orderings, to the continuous optimization context of structure learning in the graph space. To bridge this gap, this paper formalizes a set of equivalent constraints that map partial orders onto graph spaces and introduces a plug-and-play module for their efficient application. |
Taiyu Ban; Lyuzhou Chen; Xiangyu Wang; Xin Wang; Derui Lyu; Huanhuan Chen; |
344 | CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the two challenges, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with the image-to-text concept matching mechanism. |
DONGZHI JIANG; Guanglu Song; Xiaoshi Wu; Renrui Zhang; Dazhong Shen; ZHUOFAN ZONG; Yu Liu; Hongsheng Li; |
345 | Towards Visual Text Design Transfer Across Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions. |
Yejin Choi; Jiwan Chung; Sumin Shim; Giyeong Oh; Youngjae Yu; |
346 | Demystify Mamba in Vision: A Linear Attention Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the similarities and disparities between Mamba and linear attention Transformer, providing comprehensive analyses to demystify the key factors behind Mamba’s success. |
Dongchen Han; Ziyi Wang; Zhuofan Xia; Yizeng Han; Yifan Pu; Chunjiang Ge; Jun Song; Shiji Song; Bo Zheng; Gao Huang; |
347 | Bridging The Divide: Reconsidering Softmax and Linear Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take a step forward to close the gap between the linear and Softmax attention with novel theoretical analyses, which demystify the core factors behind the performance deviations. |
Dongchen Han; Yifan Pu; Zhuofan Xia; Yizeng Han; Xuran Pan; Xiu Li; Jiwen Lu; Shiji Song; Gao Huang; |
348 | Parallelizing Linear Transformers with The Delta Rule Over Sequence Length Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work describes a hardware-efficient algorithm for training a generalized variant of linear Transformers (of which DeltaNet is a special case) which exploits the WY representation for computing products of Householder matrices. |
Songlin Yang; Bailin Wang; Yu Zhang; Yikang Shen; Yoon Kim; |
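For readers who want the delta rule above in concrete form, the sketch below gives the plain sequential recurrence that DeltaNet-style linear Transformers implement; the paper's contribution is a hardware-efficient, chunked parallelization of this recurrence over sequence length via the WY representation, not this naive loop. Shapes and names here are illustrative assumptions.

```python
import numpy as np

def delta_rule_sequential(q, k, v, beta):
    """Naive O(T) reference for delta-rule linear attention.

    q, k: (T, d_k) queries/keys; v: (T, d_v) values;
    beta: (T,) per-step write strengths. Returns (T, d_v) outputs.
    """
    d_k, d_v = k.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v))              # fast-weight state
    out = np.zeros((len(q), d_v))
    for t in range(len(q)):
        v_old = S.T @ k[t]                # value the state holds for k[t]
        # Delta rule: overwrite the stored value with a beta-weighted blend,
        # instead of purely accumulating as in vanilla linear attention.
        S = S + np.outer(k[t], beta[t] * (v[t] - v_old))
        out[t] = S.T @ q[t]
    return out
```

The per-step update is $S_t = S_{t-1} + \beta_t k_t (v_t - S_{t-1}^\top k_t)^\top$; chaining the resulting rank-one updates is exactly what the WY representation turns into parallel, matmul-friendly form.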
349 | Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. |
Yuxi Ren; Xin Xia; Yanzuo Lu; Jiacheng Zhang; Jie Wu; Pan Xie; XING WANG; Xuefeng Xiao; |
350 | Cardinality-Aware Set Prediction and Top-$k$ Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. |
Corinna Cortes; Anqi Mao; Christopher Mohri; Mehryar Mohri; Yutao Zhong; |
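As a concrete (if simplistic) illustration of cardinality-aware top-$k$ prediction, the sketch below grows the predicted set only until cumulative confidence reaches a target, so confident inputs get small sets and ambiguous ones get larger sets. This thresholding heuristic is our own illustrative assumption, not the surrogate-loss algorithms the paper actually proposes.

```python
import numpy as np

def adaptive_top_k(probs, coverage=0.9, k_max=10):
    """Smallest prediction set whose total mass reaches `coverage`."""
    order = np.argsort(probs)[::-1]    # labels sorted by confidence
    chosen, mass = [], 0.0
    for label in order[:k_max]:
        chosen.append(int(label))
        mass += probs[label]
        if mass >= coverage:
            break
    return chosen

# A confident prediction yields a singleton; an uncertain one grows the set.
print(adaptive_top_k(np.array([0.90625, 0.0625, 0.03125])))          # [0]
print(adaptive_top_k(np.array([0.5, 0.25, 0.125, 0.0625, 0.03125]))) # [0, 1, 2, 3]
```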
351 | SimGen: Simulator-conditioned Driving Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a simulator-conditioned scene generation framework called SimGen that can learn to generate diverse driving scenes by mixing data from the simulator and the real world. |
Yunsong Zhou; Michael Simon; Zhenghao (Mark) Peng; Sicheng Mo; Hongzi Zhu; Minyi Guo; Bolei Zhou; |
352 | Pandora’s Box: Towards Building Universal Attackers Against Real-World Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the research gap and practical demands, in this paper, we make the first attempt to build a universal attacker against real-world LVLMs, focusing on two critical aspects: (i) restricting access to only the LVLM inputs and outputs. |
Daizong Liu; Mingyu Yang; Xiaoye Qu; Pan Zhou; Xiang Fang; Keke Tang; Yao Wan; Lichao Sun; |
353 | CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. |
Peng Xia; Ze Chen; Juanxi Tian; Yangrui Gong; Ruibo Hou; Yue Xu; Zhenbang Wu; Zhiyuan Fan; Yiyang Zhou; Kangyu Zhu; Wenhao Zheng; Zhaoyang Wang; Xiao Wang; Xuchao Zhang; Chetan Bansal; Marc Niethammer; Junzhou Huang; Hongtu Zhu; Yun Li; Jimeng Sun; Zongyuan Ge; Gang Li; James Zou; Huaxiu Yao; |
354 | Towards Neuron Attributions in Multi-Modal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while neuron attribution has made significant progress in deciphering text-only LLMs, its application to Multimodal LLMs (MLLMs) remains less explored. To address this gap, we propose a novel Neuron Attribution method tailored for MLLMs, termed NAM. |
Junfeng Fang; Zac Bi; Ruipeng Wang; Houcheng Jiang; Yuan Gao; Kun Wang; An Zhang; Jie Shi; Xiang Wang; Tat-Seng Chua; |
355 | Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. |
Jiacheng Ye; Shansan Gong; Liheng Chen; Lin Zheng; Jiahui Gao; Han Shi; Chuan Wu; Xin Jiang; Zhenguo Li; Wei Bi; Lingpeng Kong; |
356 | QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of sampling a set of high-quality and diverse translations. |
Gonçalo Faria; Sweta Agrawal; António Farinhas; Ricardo Rei; José de Souza; André Martins; |
357 | MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. |
Minghua Liu; Chong Zeng; Xinyue Wei; Ruoxi Shi; Linghao Chen; Chao Xu; Mengqi Zhang; Zhaoning Wang; Xiaoshuai Zhang; Isabella Liu; Hongzhi Wu; Hao Su; |
358 | Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, while existing SHGL methods share an essential similarity with clustering approaches, they encounter two significant limitations: (i) noise in graph structures is often introduced during the message-passing process, weakening node representations, and (ii) cluster-level information may be inadequately captured and leveraged, diminishing performance in downstream tasks. In this paper, we address these limitations by theoretically revisiting SHGL from the spectral clustering perspective and introducing a novel framework enhanced by rank and dual consistency constraints. |
YUJIE MO; Zhihe Lu; Runpeng Yu; Xiaofeng Zhu; Xinchao Wang; |
359 | DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents through fine-tuning a pre-trained VLM in two stages: offline and offline-to-online RL. |
Yifei Zhou; Hao Bai; Jiayi Pan; Mert Cemri; Alane Suhr; Sergey Levine; Aviral Kumar; |
360 | Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that it is possible to take Multi-Query Attention a step further by also sharing key and value heads between adjacent layers, yielding a new attention design we call Cross-Layer Attention (CLA). |
William Brandon; Mayank Mishra; Aniruddha Nrusimha; Rameswar Panda; Jonathan Ragan-Kelley; |
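The key/value sharing behind CLA can be sketched in a few lines: pair up adjacent layers so that only one layer in each pair owns K/V projections while the other reuses them, halving what must be cached. The module below is a minimal illustration under assumed names and a simple every-other-layer pairing, not the authors' implementation.

```python
import torch
from torch import nn
from torch.nn import functional as F

class CLABlock(nn.Module):
    """Attention block that computes its own K/V or reuses a neighbor's."""

    def __init__(self, d_model: int, n_heads: int, owns_kv: bool):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.owns_kv = owns_kv
        self.q_proj = nn.Linear(d_model, d_model)
        if owns_kv:  # K/V projections exist only in KV-producing layers
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def _heads(self, t):
        B, T, _ = t.shape
        return t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x, shared_kv=None):
        q = self._heads(self.q_proj(x))
        if self.owns_kv:
            shared_kv = (self._heads(self.k_proj(x)),
                         self._heads(self.v_proj(x)))
        k, v = shared_kv  # non-owning layers reuse the layer below
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(y.transpose(1, 2).reshape(x.shape)), shared_kv
```

Stacking blocks with `owns_kv=(i % 2 == 0)` and threading `shared_kv` between adjacent layers is what cuts the KV-cache size roughly in half.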
361 | Scalable and Effective Arithmetic Tree Generation for RL-Driven Adder and Multiplier Designs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To boost computing performance, this work focuses on the two most common and fundamental arithmetic modules, adders and multipliers. |
Yao Lai; Jinxin Liu; David Pan; Ping Luo; |
362 | Amortizing Intractable Inference in Diffusion Models for Vision, Language, and Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning. |
Siddarth Venkatraman; Moksh Jain; Luca Scimeca; Minsu Kim; Marcin Sendera; Mohsin Hasan; Luke Rowe; Sarthak Mittal; Pablo Lemos; Emmanuel Bengio; Alexandre Adam; Jarrid Rector-Brooks; Yoshua Bengio; Glen Berseth; Nikolay Malkin; |
363 | MiniCache: KV Cache Compression in Depth Dimension for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference. |
Akide Liu; Jing Liu; Zizheng Pan; Yefei He; Reza Haffari; Bohan Zhuang; |
364 | UQE: A Query Engine for Unstructured Databases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections. |
Hanjun Dai; Bethany Wang; Xingchen Wan; Bo Dai; Sherry Yang; Azade Nova; Pengcheng Yin; Mangpo Phothilimthana; Charles Sutton; Dale Schuurmans; |
365 | A Practitioner’s Guide to Real-World Continual Multimodal Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How to best update foundation models, in cases beyond small edits but not warranting re-pretraining, remains unclear. This work aims to provide extensive guidance on effective continual model updates in such scenarios. |
Karsten Roth; Vishaal Udandarao; Sebastian Dziadzio; Ameya Prabhu; Mehdi Cherti; Oriol Vinyals; Olivier Henaff; Samuel Albanie; Matthias Bethge; Zeynep Akata; |
366 | Neural Network Learns Low-dimensional Polynomials with SGD Near The Information-theoretic Limit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). |
Kazusato Oko; Denny Wu; Jason Lee; Taiji Suzuki; |
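For reference, the information exponent named in the highlight is the lowest degree $j \ge 1$ carrying a nonzero coefficient in the Hermite expansion of the link function; a standard concrete instance (our illustration, not taken from the paper):

```latex
% Probabilists' Hermite polynomial: He_3(z) = z^3 - 3z.
\sigma_*(z) = \mathrm{He}_3(z) = z^3 - 3z
\quad\Longrightarrow\quad
c_3 \neq 0,\; c_1 = c_2 = 0,
\quad\text{so the information exponent is } p = 3
\text{ and the degree is } q = 3.
```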
367 | Deep Graph Mating Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first learning-free model reuse task within the non-Euclidean domain, termed Deep Graph Mating (Grama). |
Yongcheng Jing; Seok-Hee Hong; Dacheng Tao; |
368 | Decoupling Semantic Similarity from Spatial Alignment for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Revisiting the established similarity calculations for RSMs, we expose their sensitivity to spatial alignment. In this paper, we propose to solve this through _semantic RSMs_, which are invariant to spatial permutation. |
Tassilo Wald; Constantin Ulrich; Priyank Jaini; Gregor Koehler; David Zimmerer; Stefan Denner; Fabian Isensee; Michael Baumgartner; Klaus Maier-Hein; |
369 | The Impact of Initialization on The Finetuning Dynamics in LoRA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021). |
Soufiane Hayou; Nikhil Ghosh; Bin Yu; |
370 | ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ‘ODGS’ which includes a new rasterization appropriate for omnidirectional image projection. |
Suyoung Lee; Jaeyoung Chung; Jaeyoo Huh; Kyoung Mu Lee; |
371 | LM-HT SNN: Enhancing The Performance of SNN to ANN Counterpart Through Learnable Multi-hierarchical Threshold Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we rigorously analyze the relationship among the multi-threshold model, vanilla spiking model and quantized ANNs from a mathematical perspective, then propose a novel LM-HT model, which is an equidistant multi-threshold model that can dynamically regulate the global input current and membrane potential leakage on the time dimension. |
Zecheng Hao; Xinyu Shi; Yujia Liu; Zhaofei Yu; Tiejun Huang; |
372 | ActionAtlas: A VideoQA Benchmark for Fine-grained Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our world is full of varied actions and moves in specialized fields that we, as humans, seek to identify and learn about. To evaluate the effectiveness of multi-modal models in helping us recognize such fine-grained actions, we introduce ActionAtlas, a video question answering (VideoQA) benchmark on fine-grained action recognition with short videos across various sports. |
Mohammadreza (Reza) Salehi; Jae Sung Park; Aditya Kusupati; Ranjay Krishna; Yejin Choi; Hannaneh Hajishirzi; Ali Farhadi; |
373 | NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This has resulted in an inability to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, where we use large datasets in combination with a non-reactive simulator to enable large-scale real-world benchmarking. |
Daniel Dauner; Marcel Hallgarten; Tianyu Li; Xinshuo Weng; Zhiyu Huang; Zetong Yang; Hongyang Li; Igor Gilitschenski; Boris Ivanovic; Marco Pavone; Andreas Geiger; Kashyap Chitta; |
374 | Unveiling Encoder-Free Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we bridge the gap between encoder-based and encoder-free models and present a simple yet effective training recipe towards pure LVLMs. |
Haiwen Diao; Yufeng Cui; Xiaotong Li; Yueze Wang; Huchuan Lu; Xinlong Wang; |
375 | Understanding Visual Feature Reliance Through The Lens of Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a new metric for quantifying feature complexity, based on V-information and capturing whether a feature requires complex computational transformations to be extracted. |
Thomas FEL; Louis Béthune; Andrew Lampinen; Thomas Serre; Katherine Hermann; |
376 | Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a variety of discrete latent representations, including textual descriptions, detection bounding boxes, object blobs, and visual tokens. |
Jiatao Gu; Ying Shen; Shuangfei Zhai; Yizhe Zhang; Navdeep Jaitly; Joshua Susskind; |
377 | Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we instead demonstrate that Transformers learn to approximate higher-order optimization methods for ICL. |
Deqing Fu; Tian-qi Chen; Robin Jia; Vatsal Sharan; |
378 | A Tractable Inference Perspective of Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While it is still possible to approximate such queries, we observe that such crude estimates undermine the benefits brought by expressive sequence models. To overcome this problem, this paper proposes Trifle (Tractable Inference for Offline RL), which leverages modern tractable generative models to bridge the gap between good sequence models and high expected returns at evaluation time. |
Xuejie Liu; Anji Liu; Guy Van den Broeck; Yitao Liang; |
379 | ReMI: A Dataset for Reasoning with Multiple Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ReMI, a dataset designed to assess LLMs’ ability to reason with multiple images. |
Mehran Kazemi; Nishanth Dikkala; Ankit Anand; Petar Devic; Ishita Dasgupta; Fangyu Liu; Bahare Fatemi; Pranjal Awasthi; Sreenivas Gollapudi; Dee Guo; Ahmed Qureshi; |
380 | PointMamba: A Simple State Space Model for Point Cloud Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. |
Dingkang Liang; Xin Zhou; Wei Xu; xingkui zhu; Zhikang Zou; Xiaoqing Ye; Xiao Tan; Xiang Bai; |
381 | Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to design dual-target drugs with diffusion models that are trained on single-target protein-ligand complex pairs. |
Xiangxin Zhou; Jiaqi Guan; Yijia Zhang; Xingang Peng; Liang Wang; Jianzhu Ma; |
382 | Breaking The False Sense of Security in Backdoor Defense Through Re-Activation Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More practically, we extend our backdoor re-activation to the black-box scenario, where the defense model can only be queried by the adversary during inference, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks. |
Mingli Zhu; Siyuan Liang; Baoyuan Wu; |
383 | Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. |
Aniket Didolkar; Anirudh Goyal; Nan Rosemary Ke; Siyuan Guo; Michal Valko; Timothy Lillicrap; Danilo Jimenez Rezende; Yoshua Bengio; Michael Mozer; Sanjeev Arora; |
384 | ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ConceptMix, a scalable, controllable, and customizable benchmark consisting of two stages: (a) With categories of visual concepts (e.g., objects, colors, shapes, spatial relationships), it randomly samples an object and $k$-tuples of visual concepts to generate text prompts with GPT-4o for image generation. |
Xindi Wu; Dingli Yu; Yangsibo Huang; Olga Russakovsky; Sanjeev Arora; |
385 | Towards Flexible Visual Relationship Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the complexity and interconnectedness of these tasks, it is crucial to have a flexible framework that can effectively address these tasks in a cohesive manner. In this work, we propose Flex-VRS, a single model that seamlessly integrates the above three aspects in standard and promptable visual relationship segmentation, and further possesses the capability for open-vocabulary segmentation to adapt to novel scenarios. |
Fangrui Zhu; Jianwei Yang; Huaizu Jiang; |
386 | Learning with Fitzpatrick Losses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Fitzpatrick losses, a new family of convex loss functions based on the Fitzpatrick function. |
Seta Rakotomandimby; Jean-Philippe Chancelier; Michel De Lara; Mathieu Blondel; |
387 | Understanding and Minimising Outlier Features in Neural Network Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As highlights, we emphasise the importance of controlling signal propagation throughout training, and propose the Outlier Protected transformer block, which removes standard Pre-Norm layers to mitigate OFs, without loss of convergence speed or training stability. |
Bobby He; Lorenzo Noci; Daniele Paliotta; Imanol Schlag; Thomas Hofmann; |
388 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods typically focus solely on adapting VLMs from a single modality and fail to accumulate task-specific knowledge as more samples are processed. To address this, we introduce Dual Prototype Evolving (DPE), a novel test-time adaptation approach for VLMs that effectively accumulates task-specific knowledge from multi-modalities. |
Ce Zhang; Simon Stepputtis; Katia Sycara; Yaqi Xie; |
389 | BonBon Alignment for Large Language Models: on The Sweetness of Best-of-n Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We derive *BonBon Alignment* as a method for achieving this. |
Lin Gui; Cristina Garbacea; Victor Veitch; |
390 | Panacea: Pareto Alignment Via Preference Adaptation for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Panacea, an innovative approach that reframes alignment as a multi-dimensional preference optimization problem. |
Yifan Zhong; Chengdong Ma; Xiaoyuan Zhang; Ziran Yang; Haojun Chen; Qingfu Zhang; Siyuan Qi; Yaodong Yang; |
391 | Normalization Layer Per-Example Gradients Are Sufficient to Predict Gradient Noise Scale in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Observing the tensor contractions required to compute them, we propose a method with minimal FLOPs in 3D or greater tensor regimes by simultaneously computing the norms while computing the parameter gradients. |
Gavia Gray; aman tiwari; Shane Bergsma; Joel Hestness; |
392 | Unveiling The Power of Diffusion Features For Personalized Segmentation and Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a significant flaw in these models is evident: they struggle to locate a desired instance when other instances within the same class are presented. In this paper, we explore text-to-image diffusion models for these tasks. |
Dvir Samuel; Rami Ben-Ari; Matan Levy; Nir Darshan; Gal Chechik; |
393 | Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length training technique, to tackle these challenges. |
Hadi Pouransari; Chun-Liang Li; Jen-Hao Chang; Pavan Kumar Anasosalu Vasu; Cem Koc; Vaishaal Shankar; Oncel Tuzel; |
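The decomposition referenced above can be pictured as splitting each document along the binary expansion of its token count and then batching equal-length chunks together, so no batch mixes sequence lengths or needs padding. The sketch below is a simplified rendering under assumed chunk caps and bucketing, not the authors' pipeline.

```python
from collections import defaultdict

def decompose(token_ids, max_log2=13):
    """Split one document into power-of-two-length chunks, largest first
    (i.e., along the binary expansion of its length, capped at 2**13)."""
    chunks, i = [], 0
    while i < len(token_ids):
        remaining = len(token_ids) - i
        size = 1 << min(max_log2, remaining.bit_length() - 1)
        chunks.append(token_ids[i:i + size])
        i += size
    return chunks

def bucketize(corpus):
    """Group equal-length chunks; each bucket batches with no padding."""
    buckets = defaultdict(list)
    for doc in corpus:
        for chunk in decompose(doc):
            buckets[len(chunk)].append(chunk)
    return buckets

buckets = bucketize([list(range(10_000)), list(range(300))])
print({length: len(chunks) for length, chunks in sorted(buckets.items())})
# {4: 1, 8: 1, 16: 1, 32: 1, 256: 2, 512: 1, 1024: 1, 8192: 1}
```

A curriculum then only has to schedule which bucket each optimization step draws from, e.g., favoring short sequences early in training and long ones later.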
394 | Can LLMs Learn By Teaching? A Preliminary Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In this paper, we provide a preliminary exploration of this ambitious agenda. |
Xuefei Ning; Zifu Wang; Shiyao Li; Zinan Lin; Peiran Yao; Tianyu Fu; Matthew Blaschko; Guohao Dai; Huazhong Yang; Yu Wang; |
395 | Knowledge Composition Using Task Vectors with Learned Anisotropic Scaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This paper builds on these properties of task vectors and aims to answer (1) whether components of task vectors, particularly parameter blocks, exhibit similar characteristics, and (2) how such blocks can be used to enhance knowledge composition and transfer. |
Frederic Z. Zhang; Paul Albert; Cristian Rodriguez-Opazo; Ehsan Abbasnejad; Anton van den Hengel; |
396 | Diffusion Models Are Certifiably Robust Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. |
Huanran Chen; Yinpeng Dong; Shitong Shao; Hao Zhongkai; Xiao Yang; Hang Su; Jun Zhu; |
397 | FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formulate a more effective approach to decompose the aesthetics of a picture into specific visual attributes, letting users apply characteristics like lighting, texture, and dynamics from different images. |
Tong Wu; Yinghao Xu; Ryan Po; Mengchen Zhang; Guandao Yang; Jiaqi Wang; Ziwei Liu; Dahua Lin; Gordon Wetzstein; |
398 | MediQ: Question-Asking LLMs for Adaptive and Reliable Medical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MEDIQ, a framework to simulate realistic clinical interactions, which incorporates a Patient System and an adaptive Expert System. |
Shuyue Stella Li; Vidhisha Balachandran; Shangbin Feng; Jonathan Ilgen; Emma Pierson; Pang Wei Koh; Yulia Tsvetkov; |
399 | Consistency Purification: Effective and Efficient Diffusion Purification Towards Certified Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that an ideal purification pipeline should, for effectiveness, generate purified images that lie on the data manifold and stay semantically aligned with the original images, and, for efficiency, do so in a single step. |
Yiquan Li; Zhongzhu Chen; Kun Jin; Jiongxiao Wang; Bo Li; Chaowei Xiao; |
400 | Variational Distillation of Diffusion Policies Into Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. |
Hongyi Zhou; Denis Blessing; Ge Li; Onur Celik; Xiaogang Jia; Gerhard Neumann; Rudolf Lioutikov; |
401 | TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Even though SRL has become the foundation of almost all geospatial artificial intelligence (GeoAI) research, we have not yet seen significant efforts to develop an extensive deep learning framework and benchmark to support SRL model development and evaluation. To fill this gap, we propose TorchSpatial, a learning framework and benchmark for location (point) encoding, which is one of the most fundamental data types of spatial representation learning. |
Nemin Wu; Qian Cao; Zhangyu Wang; Zeping Liu; Yanlin Qi; Jielu Zhang; Joshua Ni; X. Yao; Hongxu Ma; Lan Mu; Stefano Ermon; Tanuja Ganu; Akshay Nambi; Ni Lao; Gengchen Mai; |
402 | Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present deep prior assembly, a novel framework that assembles diverse deep priors from large models for scene generation from single images in a zero-shot manner. |
Junsheng Zhou; Yu-Shen Liu; Zhizhong Han; |
403 | Dimension-free Deterministic Equivalents for Random Feature Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we investigate the generalization performance of random feature ridge regression (RFRR). |
Leonardo Defilippis; Bruno Loureiro; Theodor Misiakiewicz; |
404 | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability. |
Shenyuan Gao; Jiazhi Yang; Li Chen; Kashyap Chitta; Yihang Qiu; Andreas Geiger; Jun Zhang; Hongyang Li; |
405 | Full-Atom Peptide Design with Geometric Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD). |
Xiangzhe Kong; Yinjun Jia; Wenbing Huang; Yang Liu; |
406 | Metric Transforms and Low Rank Representations of Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new linear-algebraic tool based on Abelian group representation theory, and use it to address three key problems in machine learning. |
Timothy Chu; Josh Alman; Gary L. Miller; Shyam Narayanan; Mark Sellke; Zhao Song; |
407 | CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. |
Leying Zhang; Yao Qian; Long Zhou; Shujie LIU; Dongmei Wang; Xiaofei Wang; Midia Yousefi; Yanmin Qian; Jinyu Li; Lei He; sheng zhao; Michael Zeng; |
408 | CAPE: Context-Adaptive Positional Encoding for Length Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Context-Adaptive Positional Encoding (CAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. |
Chuanyang Zheng; Yihang Gao; Han Shi; Minbin Huang; Jingyao Li; Jing Xiong; Xiaozhe Ren; Michael Ng; Xin Jiang; Zhenguo Li; Yu Li; |
409 | SafeWorld: Geo-Diverse Safety Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On top of it, we propose a multi-dimensional automatic safety evaluation framework that assesses the contextual appropriateness, accuracy, and comprehensiveness of responses. |
Da Yin; Haoyi Qiu; Kung-Hsiang Huang; Kai-Wei Chang; Nanyun Peng; |
410 | Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. |
Jonas Spinner; Victor Breso; Pim de Haan; Tilman Plehn; Jesse Thaler; Johann Brehmer; |
411 | Transformer Efficiently Learns Low-dimensional Target Functions In-context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study ICL of a nonlinear function class via transformer with nonlinear MLP layer: given a class of single-index target functions $f_*(x) = \sigma_*(\langle x,\beta\rangle)$, where the index features $\beta\in\mathbb{R}^d$ are drawn from a rank-$r$ subspace, we show that a nonlinear transformer optimized by gradient descent on the empirical loss learns $f_*$ in-context with a prompt length that only depends on the dimension of function class $r$; in contrast, an algorithm that directly learns $f_*$ on test prompt yields a statistical complexity that scales with the ambient dimension $d$. |
Kazusato Oko; Yujin Song; Taiji Suzuki; Denny Wu; |
412 | Data Mixture Inference Attack: BPE Tokenizers Reveal Training Data Compositions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle a task which we call *data mixture inference*, which aims to uncover the distributional make-up of the pretraining data. |
Jonathan Hayase; Alisa Liu; Yejin Choi; Sewoong Oh; Noah Smith; |
413 | Q-VLM: Post-training Quantization for Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference. |
Changyuan Wang; Ziwei Wang; Xiuwei Xu; Yansong Tang; Jie Zhou; Jiwen Lu; |
414 | Empowering and Assessing The Utility of Large Language Models in Crop Science Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs’ professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs’ understanding of the domain knowledge. |
Hang Zhang; Jiawei SUN; Renqi Chen; Wei Liu; Zhonghang Yuan; Xinzhe Zheng; Zhefan Wang; Zhiyuan Yang; Hang Yan; Han-Sen Zhong; Xiqing Wang; Fan Yang; Nanqing Dong; Wanli Ouyang; |
415 | Language Models As Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We examine the property by considering lossless gradient compression — a critical application in distributed learning — that depends heavily on precise probability modeling. To achieve this, we introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. |
Hui-Po Wang; Mario Fritz; |
416 | FlexPlanner: Flexible 3D Floorplanning Via Deep Reinforcement Learning in Hybrid Action Space with Multi-Modality Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, they typically face difficulties in aligning the cross-die modules in 3D ICs due to their heuristic representations, which could potentially result in severe data transfer failures. To address these issues, we propose FlexPlanner, a flexible learning-based method in hybrid action space with multi-modality representation to simultaneously handle position, aspect ratio, and alignment of blocks. |
Ruizhe Zhong; Xingbo Du; Shixiong Kai; Zhentao Tang; Siyuan Xu; Jianye Hao; Mingxuan Yuan; Junchi Yan; |
417 | Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to The Edge of Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study whether transformers can learn to *implicitly* reason over parametric knowledge, a skill that even the most capable language models struggle with. |
Boshi Wang; Xiang Yue; Yu Su; Huan Sun; |
418 | A Simple Image Segmentation Framework Via In-Context Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods still struggle with task ambiguity in in-context segmentation, as not all in-context examples can accurately convey the task information. In order to address this issue, we present SINE, a simple image $\textbf{S}$egmentation framework utilizing $\textbf{in}$-context $\textbf{e}$xamples. |
Yang Liu; Chenchen Jing; Hengtao Li; Muzhi Zhu; Hao Chen; Xinlong Wang; Chunhua Shen; |
419 | Breaking The Multi-Task Barrier in Meta-Reinforcement Learning with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is difficult to scale towards more general behavior without confronting challenges in multi-task optimization, but few solutions are compatible with meta-RL’s goal of learning from large training sets of unlabeled tasks. To address this challenge, we revisit the idea that multi-task RL is bottlenecked by imbalanced training losses created by uneven return scales across different tasks. |
Jake Grigsby; Justin Sasek; Samyak Parajuli; Ikechukwu D. Adebi; Amy Zhang; Yuke Zhu; |
420 | Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. |
Ke Wang; Junting Pan; Weikang Shi; Zimu Lu; Houxing Ren; Aojun Zhou; Mingjie Zhan; Hongsheng Li; |
421 | Segment Any Change Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the segment any change models (AnyChange), a new type of change detection model that supports zero-shot prediction and generalization on unseen change types and data distributions. |
Zhuo Zheng; Yanfei Zhong; Liangpei Zhang; Stefano Ermon; |
422 | Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitive process. |
Jianbiao Mei; Yukai Ma; Xuemeng Yang; Licheng Wen; Xinyu Cai; Xin Li; Daocheng Fu; Bo Zhang; Pinlong Cai; Min Dou; Botian Shi; Liang He; Yong Liu; Yu Qiao; |
423 | PromptFix: You Prompt and We Fix The Photo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the stochastic nature of the diffusion process leads to deficiencies in image generation or editing tasks that require the detailed preservation of the generated images. To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks. |
yongsheng yu; Ziyun Zeng; Hang Hua; Jianlong Fu; Jiebo Luo; |
424 | Exploring Molecular Pretraining Model at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. |
Xiaohong Ji; Zhen Wang; Zhifeng Gao; Hang Zheng; Linfeng Zhang; Guolin Ke; |
425 | Unleashing The Potential of The Diffusion Model in Few-shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing from the extensive potential unveiled by the Diffusion Model in both semantic correspondence and open vocabulary segmentation, our work initiates an investigation into employing the Latent Diffusion Model for Few-shot Semantic Segmentation. |
Muzhi Zhu; Yang Liu; Zekai Luo; Chenchen Jing; Hao Chen; Guangkai Xu; Xinlong Wang; Chunhua Shen; |
426 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts. |
ZHUOFAN ZONG; Bingqi Ma; Dazhong Shen; Guanglu Song; Hao Shao; DONGZHI JIANG; Hongsheng Li; Yu Liu; |
427 | Stochastic Optimal Control Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. |
Carles Domingo i Enrich; Jiequn Han; Brandon Amos; Joan Bruna; Ricky T. Q. Chen; |
428 | Compact Language Models Via Pruning and Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate if pruning an existing LLM and then re-training it with a fraction <3% of the original training data can be a suitable alternative to repeated, full retraining. |
Saurav Muralidharan; Sharath Turuvekere Sreenivas; Raviraj Joshi; Marcin Chochowski; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; Jan Kautz; Pavlo Molchanov; |
429 | EM Distillation for One-step Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose EM Distillation (EMD), a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of perceptual quality. |
Sirui Xie; Zhisheng Xiao; Diederik Kingma; Tingbo Hou; Ying Nian Wu; Kevin Murphy; Tim Salimans; Ben Poole; Ruiqi Gao; |
430 | LSH-MoE: Communication-efficient MoE Training Via Locality-Sensitive Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose LSH-MoE, a communication-efficient MoE training framework using locality-sensitive hashing (LSH). |
Xiaonan Nie; Liu Qibin; Fangcheng Fu; Shenhan Zhu; Xupeng Miao; Xiaoyang Li; Yang Zhang; Shouda Liu; Bin CUI; |
431 | A Simplicity Bias in The Learning Dynamics of Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To conduct this analysis, we develop a procedure to generate *clones* of a given natural language data set, which capture the interactions between tokens up to a specified order. |
Riccardo Rende; Federica Gerace; Alessandro Laio; Sebastian Goldt; |
432 | Federated Fine-tuning of Large Language Models Under Heterogeneous Tasks and Client Resources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the buckets effect in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. |
Jiamu Bai; Daoyuan Chen; Bingchen Qian; Liuyi Yao; Yaliang Li; |
433 | Deep Bayesian Active Learning for Preference Modeling in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. |
Luckeciano Carvalho Melo; Panagiotis Tigas; Alessandro Abate; Yarin Gal; |
434 | InfoRM: Mitigating Reward Hacking in RLHF Via Information-Theoretic Reward Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this problem from an information-theoretic perspective and propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective to filter out irrelevant information. |
Yuchun Miao; Sen Zhang; Liang Ding; Rong Bao; Lefei Zhang; Dacheng Tao; |
435 | Fight Back Against Jailbreaking Via Prompt Adversarial Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, motivated by adversarial training paradigms for achieving reliable robustness, we propose an approach named **Prompt Adversarial Tuning (PAT)** that trains a prompt control attached to the user prompt as a guard prefix. |
Yichuan Mo; Yuji Wang; Zeming Wei; Yisen Wang; |
436 | Alleviating Distortion in Image Generation Via Multi-Resolution Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. |
Qihao Liu; Zhanpeng Zeng; Ju He; Qihang Yu; Xiaohui Shen; Liang-Chieh Chen; |
437 | FuseFL: One-Shot Federated Learning Through The Lens of Causality with Progressive Model Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the performance of advanced OFL methods is far behind that of normal FL. In this work, we provide a causal view and find that this performance drop of OFL methods comes from the isolation problem: local models trained in isolation in OFL may easily fit to spurious correlations due to the data heterogeneity. |
Zhenheng Tang; Yonggang Zhang; Peijie Dong; Yiu-ming Cheung; Amelie Zhou; Bo Han; Xiaowen Chu; |
438 | Mitigating Fine-tuning Based Jailbreak Attack with Backdoor Enhanced Safety Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. |
Jiongxiao Wang; Jiazhao LI; Yiquan Li; Xiangyu Qi; Junjie Hu; Sharon Li; Patrick McDaniel; Muhao Chen; Bo Li; Chaowei Xiao; |
439 | A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. |
Heyang Zhao; Jiafan He; Quanquan Gu; |
440 | RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve perception robustness, we leverage the recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. |
Fangqiang Ding; Xiangyu Wen; Yunzhou Zhu; Yiming Li; Chris Xiaoxuan Lu; |
441 | Explaining Text Datasets with Language Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make model parameters directly interpretable, we introduce a family of statistical models—including clustering, time-series, and classification models—parameterized by *natural language predicates*. |
Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt; |
442 | Multi-Label Learning with Stronger Consistency Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of surrogate losses and algorithms for multi-label learning, supported by $H$-consistency bounds. |
Anqi Mao; Yutao Zhong; Mehryar Mohri; |
443 | A Closer Look at Deep Learning Phenomena Through A Telescoping Lens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that this model presents a pedagogical formalism allowing us to isolate components of the training process even in complex contemporary settings, providing a sharp lens to reason about the effects of design choices such as architecture and optimization strategy, and reveals surprising parallels between neural network learning and gradient boosting. |
Alan Jeffares; Alicia Curth; Mihaela van der Schaar; |
444 | Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive study of surrogate loss functions for learning to defer. |
Anqi Mao; Yutao Zhong; Mehryar Mohri; |
445 | Don’t Compress Gradients in Random Reshuffling: Compress Gradient Differences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, it is well-known in practice and recently confirmed in theory that stochastic methods based on without-replacement sampling, e.g., the Random Reshuffling (RR) method, perform better than ones that sample the gradients with replacement. In this work, we close this gap in the literature and provide the first analysis of methods with gradient compression and without-replacement sampling. |
Abdurakhmon Sadiev; Grigory Malinovsky; Eduard Gorbunov; Igor Sokolov; Ahmed Khaled; Konstantin Burlachenko; Peter Richtarik; |
446 | Understanding Transformer Reasoning Capabilities Via Graph Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network’s depth, width, and number of extra tokens for algorithm execution. |
Clayton Sanford; Bahare Fatemi; Ethan Hall; Anton Tsitsulin; Mehran Kazemi; Jonathan Halcrow; Bryan Perozzi; Vahab Mirrokni; |
447 | Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting. |
Kaihang Pan; Zhaoyu Fan; Juncheng Li; Qifan Yu; Hao Fei; Siliang Tang; Richang Hong; Hanwang Zhang; Qianru Sun; |
448 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-modal Zero-shot Offboard Panoptic Perception (ZOPP) framework for autonomous driving scenes. |
Tao Ma; Hongbin Zhou; Qiusheng Huang; Xuemeng Yang; Jianfei Guo; Bo Zhang; Min Dou; Yu Qiao; Botian Shi; Hongsheng Li; |
449 | When and How Does Synthetic Data Improve Reasoning Capabilities of Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Training on model-generated synthetic data is a promising approach for finetuning LLMs, but it remains unclear when it helps or hurts. In this paper, we investigate this for reasoning problems via an empirical study, followed by a theoretical formalization of our observations. |
Amrith Setlur; Saurabh Garg; Naman Garg; Xinyang Geng; Virginia Smith; Aviral Kumar; |
450 | LACIE: Listener-Aware Finetuning for Calibration in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that directly models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. |
Elias Stengel-Eskin; Peter Hase; Mohit Bansal; |
451 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel markerless algorithm to track 3D human poses under severe occlusion and close interaction, obtaining our annotations with minimal manual intervention. |
Rawal Khirodkar; Jyun-Ting Song; Jinkun Cao; Zhengyi Luo; Kris Kitani; |
452 | Semi-Truths: A Large-Scale Dataset for Testing Robustness of AI-Generated Image Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Do they exhibit biases towards specific scenes or data distributions? To address these questions, we introduce Semi-Truths, featuring 27,635 real images, 245,360 masks, and 850,226 AI-augmented images with varying degrees of targeted and localized edits, created using diverse augmentation methods, diffusion models, and data distributions. |
Anisha Pal; Julia Kruk; Mansi Phute; Manognya Bhattaram; Diyi Yang; Duen Horng Chau; Judy Hoffman; |
453 | 4Diffusion: Multi-view Video Diffusion Model for 4D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel 4D generation pipeline, named 4Diffusion, aimed at generating spatial-temporally consistent 4D content from a monocular video. |
Haiyu Zhang; Xinyuan Chen; Yaohui Wang; Xihui Liu; Yunhong Wang; Yu Qiao; |
454 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework for zero-shot object navigation. |
Hang Yin; Xiuwei Xu; Zhenyu Wu; Jie Zhou; Jiwen Lu; |
455 | Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an innovative approach that could produce more effective steering vectors through bi-directional preference optimization. |
Yuanpu Cao; Tianrong Zhang; Bochuan Cao; Ziyi Yin; Lu Lin; Fenglong Ma; Jinghui Chen; |
456 | Simulated Humanoid Grasping on Diverse Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. |
Zhengyi Luo; Jinkun Cao; Sammy Christen; Alexander Winkler; Kris Kitani; Weipeng Xu; |
457 | GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. |
Pengcheng Chen; Jin Ye; Guoan Wang; Yanjun Li; Zhongying Deng; Wei Li; Tianbin Li; Haodong Duan; Ziyan Huang; Yanzhou Su; Benyou Wang; Shaoting Zhang; Bin Fu; Jianfei Cai; Bohan Zhuang; Eric Seibel; Junjun He; Yu Qiao; |
458 | Game-Traversal-Benchmark: Evaluating Planning Abilities Of Large Language Models Via Traversing 2D Game Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They have also shown potential outside the natural language domain, but can LLMs plan? There has been a debate around this question. We contribute to this debate by proposing Game-Traversal-Benchmark (GTB), a benchmark consisting of diverse 2D grid-based game maps to evaluate the planning and reasoning abilities of an LLM. |
Muhammad Umair Nasir; Steven James; Julian Togelius; |
459 | Membership Inference on Text-to-Image Diffusion Models Via Conditional Likelihood Discrepancy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also coming along with issues of privacy leakage and data copyrights. |
Shengfang Zhai; Huanran Chen; Yinpeng Dong; Jiajun Li; Qingni Shen; Yansong Gao; Hang Su; Yang Liu; |
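The attack signal can be illustrated generically: score each image by the gap between its unconditional and text-conditional denoising losses, then threshold; training members tend to benefit more from their own captions. The sketch below simulates those per-sample losses with random numbers (running a real diffusion model is out of scope here); the distributions and threshold are illustrative, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample denoising losses. A real attack would compute these
# by running the diffusion model on each image with and without its caption.
member_cond    = rng.normal(0.80, 0.05, 500)   # members benefit from conditioning
member_uncond  = rng.normal(1.00, 0.05, 500)
holdout_cond   = rng.normal(0.95, 0.05, 500)   # non-members benefit less
holdout_uncond = rng.normal(1.00, 0.05, 500)

# conditional likelihood discrepancy: a larger gap suggests membership
score_member  = member_uncond - member_cond
score_holdout = holdout_uncond - holdout_cond

tau = 0.1  # decision threshold
print(f"TPR={(score_member > tau).mean():.2f}, FPR={(score_holdout > tau).mean():.2f}")
```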
460 | GPT As Visual Explainer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Language Model as Visual Explainer (LVX), a systematic approach for interpreting the internal workings of vision models using a tree-structured linguistic explanation, without the need for model training. |
Xingyi Yang; Xinchao Wang; |
461 | Large Language Models As Hyper-Heuristics for Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation, featuring minimal manual intervention and open-ended heuristic spaces. |
Haoran Ye; Jiarui Wang; Zhiguang Cao; Federico Berto; Chuanbo Hua; Haeyeon Kim; Jinkyoo Park; Guojie Song; |
462 | Can Models Learn Skill Composition from Examples? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we employ a setup akin to Skill-Mix to evaluate the capacity of smaller models to learn compositional generalization from examples. |
Haoyu Zhao; Simran Kaur; Dingli Yu; Anirudh Goyal; Sanjeev Arora; |
463 | Stress-Testing Capability Elicitation With Password-Locked Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the conditions under which fine-tuning-based elicitation suffices to elicit capabilities. |
Ryan Greenblatt; Fabien Roger; Dmitrii Krasheninnikov; David Krueger; |
464 | Communication Bounds for The Distributed Experts Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the experts problem in the distributed setting where an expert’s cost needs to be aggregated across multiple servers. |
Zhihao Jia; Qi Pang; Trung Tran; David Woodruff; Zhihao Zhang; Wenting Zheng; |
465 | Learning Cooperative Trajectory Representations for Motion Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a forecasting-oriented representation paradigm to utilize motion and interaction features from cooperative information. |
Hongzhi Ruan; Haibao Yu; Wenxian Yang; Siqi Fan; Zaiqing Nie; |
466 | AnonFair: A Flexible Toolkit for Algorithmic Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AnonFair, a new open source toolkit for enforcing algorithmic fairness. |
Eoin Delaney; Zihao Fu; Chris Russell; |
467 | Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new weight decay technique, Selective Projection Decay (SPD), that selectively imposes a strong penalty on certain layers while allowing others to change freely. |
Junjiao Tian; Chengyue Huang; Zsolt Kira; |
468 | GenRec: Unifying Video Generation and Recognition with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition. |
Zejia Weng; Xitong Yang; Zhen Xing; Zuxuan Wu; Yu-Gang Jiang; |
469 | LoFiT: Localized Fine-tuning on LLM Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning serves as an effective alternative to such representation intervention methods. |
Fangcong Yin; Xi Ye; Greg Durrett; |
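The representation intervention the highlight mentions, adding learned bias vectors to the outputs of selected attention heads, can be sketched as a small trainable module. This is a hedged illustration: the module name, shapes, and hook placement are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HeadBias(nn.Module):
    """Add a learned bias vector to the outputs of selected attention heads,
    leaving all other heads (and the frozen base model) untouched."""
    def __init__(self, n_heads: int, head_dim: int, head_ids: list):
        super().__init__()
        assert max(head_ids) < n_heads
        self.head_ids = head_ids
        self.bias = nn.Parameter(torch.zeros(len(head_ids), head_dim))

    def forward(self, head_out: torch.Tensor) -> torch.Tensor:
        # head_out: (batch, seq, n_heads, head_dim)
        out = head_out.clone()
        for j, h in enumerate(self.head_ids):
            out[:, :, h, :] = out[:, :, h, :] + self.bias[j]
        return out

# usage sketch: intervene on heads 3 and 7 of a 12-head layer with 64-dim heads
bias_mod = HeadBias(n_heads=12, head_dim=64, head_ids=[3, 7])
x = torch.randn(2, 10, 12, 64)  # stand-in for a layer's attention head outputs
print(bias_mod(x).shape)        # torch.Size([2, 10, 12, 64])
```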
470 | UPS: Unified Projection Sharing for Lightweight Single-Image Super-resolution and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel Unified Projection Sharing algorithm (UPS) to decouple feature extraction from similarity modeling, achieving notable performance. |
Kun Zhou; Xinyu Lin; Zhonghang Liu; Xiaoguang Han; Jiangbo Lu; |
471 | DOFEN: Deep Oblivious Forest ENsemble Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces DOFEN, which stands for Deep Oblivious Forest ENsemble. |
KuanYu Chen; Ping-Han Chiang; Hsin-Rung Chou; Chih-Sheng Chen; Tien-Hao Chang; |
472 | Towards Understanding The Working Mechanism of Text-to-Image Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, the strong latent Diffusion Probabilistic Model (DPM) has been applied to high-quality Text-to-Image (T2I) generation (e.g., Stable Diffusion), by injecting the encoded target text prompt into the gradually denoised diffusion image generator. |
Mingyang Yi; Aoxue Li; Yi Xin; Zhenguo Li; |
473 | Lexicon3D: Probing Visual Encoding Models for Complex 3D Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. |
Yunze Man; Shuhong Zheng; Zhipeng Bao; Martial Hebert; Liangyan Gui; Yu-Xiong Wang; |
474 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To explore the feasibility of training a text-to-image generation model comparable to advanced models using publicly available resources, we introduce EvolveDirector. |
Rui Zhao; Hangjie Yuan; Yujie Wei; Shiwei Zhang; Yuchao Gu; Lingmin Ran; Xiang Wang; Jay Zhangjie Wu; David Junhao Zhang; Yingya Zhang; Mike Zheng Shou; |
475 | Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Universal Physics Transformers (UPTs), an efficient and unified learning paradigm for a wide range of spatio-temporal problems. |
Benedikt Alkin; Andreas Fürst; Simon Schmid; Lukas Gruber; Markus Holzleitner; Johannes Brandstetter; |
476 | 3D Gaussian Splatting As Markov Chain Monte Carlo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we rethink the set of 3D Gaussians as a random sample drawn from an underlying probability distribution describing the physical representation of the scene—in other words, Markov Chain Monte Carlo (MCMC) samples. |
Shakiba Kheradmand; Daniel Rebain; Gopal Sharma; Weiwei Sun; Yang-Che Tseng; Hossam Isack; Abhishek Kar; Andrea Tagliasacchi; Kwang Moo Yi; |
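Reading the splats as MCMC samples suggests Langevin-style training: each update is a gradient step plus noise calibrated to the step size, so the parameters explore a distribution instead of collapsing to a single mode. Below is a generic stochastic gradient Langevin dynamics (SGLD) step applied to toy Gaussian means; the paper's actual update, noise schedule, and relocation moves may differ.

```python
import torch

def sgld_step(params, grad, lr=1e-2, temperature=0.1):
    """One SGLD update: descend the loss, then inject Gaussian noise whose
    variance is 2 * lr * temperature, the Langevin balance condition."""
    noise = torch.randn_like(params) * (2.0 * lr * temperature) ** 0.5
    return params - lr * grad + noise

# toy: 100 Gaussian means drifting toward a target position under SGLD
means = torch.zeros(100, 3)
target = torch.tensor([1.0, 0.0, 0.0])
for _ in range(1000):
    grad = means - target                   # gradient of 0.5 * ||means - target||^2
    means = sgld_step(means, grad)
print(means.mean(dim=0), means.std(dim=0))  # near the target, with residual spread
```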
477 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present VisInContext, a method that processes long in-context text using visual tokens. |
Jinpeng Wang; Linjie Li; Yiqi Lin; Min Li; Lijuan Wang; Mike Zheng Shou; |
478 | Renovating Names in Open-Vocabulary Segmentation Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for renovating names in open-vocabulary segmentation benchmarks (RENOVATE). |
Haiwen Huang; Songyou Peng; Dan Zhang; Andreas Geiger; |
479 | Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we theoretically study the impact of data curation on iterated retraining of generative models and show that it can be seen as an implicit preference optimization mechanism. |
Damien Ferbach; Quentin Bertrand; Joey Bose; Gauthier Gidel; |
480 | Lean Workbook: A Large-scale Lean Problem Set Formalized from Natural Language Math Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A significant challenge in this area is the scarcity of training data available in these formal languages. To address this issue, we propose a novel pipeline that iteratively generates and filters synthetic data to translate natural language mathematical problems into Lean 4 statements, and vice versa. |
Huaiyuan Ying; Zijian Wu; Yihan Geng; Jiayu Wang; Dahua Lin; Kai Chen; |
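For flavor, the pipeline's target artifact is a natural-language statement rendered as a Lean 4 theorem; here is a toy formalization of that kind, written for illustration rather than drawn from the dataset.

```lean
-- "For every natural number n, n + 0 = n" rendered as a Lean 4 theorem.
-- The equation holds definitionally, so `rfl` closes the goal.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl
```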
481 | Selective Attention: Enhancing Transformer Through Principled Context Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While self-attention has enjoyed major success, it notably treats all queries $q$ in the same way by applying the mapping $V^\top\text{softmax}(Kq)$, where $V$ and $K$ are the value and key matrices respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance. |
Xuechen Zhang; Xiangyu Chang; Mingchen Li; Amit Roy-Chowdhury; Jiasi Chen; Samet Oymak; |
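For one query q, the mapping above returns a convex combination of the value rows with weights softmax(Kq), computed the same way for every query. One simple way to break that uniformity is a query-dependent softmax temperature that sharpens or flattens the weights per query; the temperature rule below is an illustrative assumption, not the paper's mechanism.

```python
import torch

def attention(q, K, V, tau=1.0):
    """Single-query attention V^T softmax(K q / tau).
    q: (d,), K: (n, d), V: (n, d_v); smaller tau -> sparser weights."""
    weights = torch.softmax(K @ q / tau, dim=0)  # (n,)
    return V.T @ weights                         # (d_v,)

torch.manual_seed(0)
q, K, V = torch.randn(8), torch.randn(5, 8), torch.randn(5, 4)

out_uniform = attention(q, K, V, tau=1.0)  # identical treatment of every query
# query-dependent temperature, e.g. driven by the query norm (illustrative rule)
out_selective = attention(q, K, V, tau=1.0 / (1.0 + q.norm().item()))
print(out_uniform, out_selective, sep="\n")
```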
482 | Inference Via Interpolation: Contrastive Representations Provably Enable Planning and Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These sorts of probabilistic inference questions are challenging when observations are high-dimensional. In this paper, we show how these questions can have compact, closed form solutions in terms of learned representations. |
Benjamin Eysenbach; Vivek Myers; Ruslan Salakhutdinov; Sergey Levine; |
483 | Theoretical and Empirical Insights Into The Origins of Degree Bias in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We validate our theoretical findings on 8 common real-world networks, and based on our theoretical and empirical insights, describe a roadmap to alleviate degree bias. |
Arjun Subramonian; Jian Kang; Yizhou Sun; |
484 | EGODE: An Event-attended Graph ODE Framework for Modeling Rigid Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach named Event-attended Graph ODE (EGODE) for effective rigid dynamics modeling. |
Jingyang Yuan; Gongbo Sun; Zhiping Xiao; Hang Zhou; Xiao Luo; Junyu Luo; Yusheng Zhao; Wei Ju; Ming Zhang; |
485 | XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, traditional techniques such as global feature alignment or vision-language model distillation tend to impose only approximate correspondence, struggling notably with delineating fine-grained segmentation boundaries. To address this gap, we propose a more meticulous mask-level alignment between 3D features and the 2D-text embedding space through a cross-modal mask reasoning framework, XMask3D. |
Ziyi Wang; Yanbo Wang; Xumin Yu; Jie Zhou; Jiwen Lu; |
486 | Learning 3D Garment Animation from Trajectories of A Piece of Cloth Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, instead of garment-wise supervised learning, we adopt a disentangled scheme to learn how to animate observed garments. |
Yidi Shao; Chen Change Loy; Bo Dai; |
487 | Online Adaptation of Language Models with A Memory of Amortized Contexts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a feature extraction and memory-augmentation approach to compress and extract information from new documents into compact modulations stored in a memory bank. |
Jihoon Tack; Jaehyung Kim; Eric Mitchell; Jinwoo Shin; Yee Whye Teh; Jonathan Richard Schwarz; |
488 | Unlocking The Potential of Global Human Expertise Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is difficult to identify, combine, and refine complementary information in an increasingly large and diverse knowledge base. This paper argues that artificial intelligence (AI) can play a crucial role in this process. |
Elliot Meyerson; Olivier Francon; Darren Sargent; Babak Hodjat; Risto Miikkulainen; |
489 | RankRAG: Unifying Retrieval-Augmented Generation and Context Ranking in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method called RankRAG, which instruction-tunes a single LLM for both context ranking and answer generation in RAG. |
Yue Yu; Wei Ping; Zihan Liu; Boxin Wang; Jiaxuan You; Chao Zhang; Mohammad Shoeybi; Bryan Catanzaro; |
490 | HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed Via Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user-specified exposure time. |
Yuanhao Cai; Zihao Xiao; Yixun Liang; Minghan Qin; Yulun Zhang; Xiaokang Yang; Yaoyao Liu; Alan Yuille; |
491 | $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Addressing the limitations of static $\beta$ values, we introduce a novel framework that dynamically calibrates $\beta$ at the batch level, informed by data quality considerations. |
Junkang Wu; Yuexiang Xie; Zhengyi Yang; Jiancan Wu; Jinyang Gao; Bolin Ding; Xiang Wang; Xiangnan He; |
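The DPO objective scales each pair's implicit reward margin by a fixed beta; the batch-level calibration in the highlight can be sketched by recomputing beta from the margins of the current batch. A minimal sketch assuming precomputed sequence log-probabilities under the policy and reference models, with an illustrative calibration rule rather than the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta):
    """DPO: -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l)))."""
    margin = (logp_w - ref_w) - (logp_l - ref_l)
    return -F.logsigmoid(beta * margin).mean()

def batch_beta(logp_w, logp_l, ref_w, ref_l, beta0=0.1, alpha=0.5):
    """Illustrative dynamic beta: damp beta when the batch margin is already
    large (easy, clean pairs) and keep it high on hard or noisy batches."""
    mean_margin = ((logp_w - ref_w) - (logp_l - ref_l)).mean()
    return beta0 / (1.0 + alpha * mean_margin.clamp(min=0.0))

# toy batch: summed log-probs of chosen (w) and rejected (l) responses
torch.manual_seed(0)
logp_w, logp_l = torch.randn(16) + 0.5, torch.randn(16)
ref_w, ref_l = torch.randn(16), torch.randn(16)

beta = batch_beta(logp_w, logp_l, ref_w, ref_l)
print("dynamic beta:", float(beta))
print("loss:", float(dpo_loss(logp_w, logp_l, ref_w, ref_l, beta)))
```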
492 | Antigen-Specific Antibody Design Via Direct Energy-based Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences, considering both rationality and functionality. |
Xiangxin Zhou; Dongyu Xue; Ruizhe Chen; Zaixiang Zheng; Liang Wang; Quanquan Gu; |
493 | CountGD: Multi-Modal Open-World Counting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to improve the generality and accuracy of open-vocabulary object counting in images. |
Niki Amini-Naieni; Tengda Han; Andrew Zisserman; |
494 | Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, … |
Xiaosen Zheng; Tianyu Pang; Chao Du; Qian Liu; Jing Jiang; Min Lin; |
495 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the finding that existing tokenizers are tailored to either image or video inputs, this paper presents OmniTokenizer, a transformer-based tokenizer for joint image and video tokenization. |
Junke Wang; Yi Jiang; Zehuan Yuan; Bingyue Peng; Zuxuan Wu; Yu-Gang Jiang; |
496 | The Best of Both Worlds: Toward An Honest and Helpful Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a training-free method named Curiosity-Driven Prompting, which enables LLMs to express their internal confusion and uncertainty about the given query and then optimize their responses. |
Chujie Gao; Qihui Zhang; Dongping Chen; Yue Huang; Siyuan Wu; Zhengyan Fu; Yao Wan; Xiangliang Zhang; Lichao Sun; |
497 | Vector Quantization Prompting for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these prompts are continuous, which lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. |
Li Jiao; Qiuxia Lai; Yu Li; Qiang Xu; |
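What makes discrete prompts trainable end-to-end is typically a vector-quantization step with a straight-through gradient: the forward pass picks the nearest prompt in a learnable pool, while the backward pass treats the selection as the identity. A minimal sketch of that mechanism; the class name and shapes are illustrative.

```python
import torch
import torch.nn as nn

class VQPromptSelector(nn.Module):
    """Snap a continuous query to its nearest entry in a learnable prompt pool,
    passing gradients through with the straight-through estimator."""
    def __init__(self, n_prompts: int, dim: int):
        super().__init__()
        self.pool = nn.Parameter(torch.randn(n_prompts, dim))

    def forward(self, query: torch.Tensor):
        # query: (batch, dim) -> distances to every prompt in the pool
        dists = torch.cdist(query, self.pool)  # (batch, n_prompts)
        idx = dists.argmin(dim=1)              # discrete prompt choice per sample
        chosen = self.pool[idx]                # (batch, dim)
        # forward: the discrete prompt; backward: identity w.r.t. the query
        return query + (chosen - query).detach(), idx

selector = VQPromptSelector(n_prompts=10, dim=32)
prompt, idx = selector(torch.randn(4, 32))
print(prompt.shape, idx)                       # torch.Size([4, 32]), chosen ids
```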
498 | Mitigating Backdoor Attack By Injecting Proactive Defensive Backdoor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. |
Shaokui Wei; Hongyuan Zha; Baoyuan Wu; |
499 | Measuring Per-Unit Interpretability at Scale Without Humans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the first scalable method to measure the per-unit interpretability in vision DNNs. |
Roland S. Zimmermann; David Klindt; Wieland Brendel; |
500 | MVGamba: Unify 3D Content Generation As State Space Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). |
Xuanyu Yi; Zike Wu; Qiuhong Shen; Qingshan Xu; Pan Zhou; Joo-Hwee Lim; Shuicheng Yan; Xinchao Wang; Hanwang Zhang; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~4,500 papers), please visit Paper Digest: NeurIPS-2024 (Full List).