Paper Digest: NeurIPS 2023 Highlights
Note: NeurIPS 2023 accepted more than 3,500 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read all 3,500 NeurIPS 2023 papers on a separate page, which takes quite some time to load.
To search or review papers within NIPS-2023 related to a specific topic, please use the search by venue (NIPS-2023) and review by venue (NIPS-2023) services. To browse papers by author, here is a list of all authors (NIPS-2023). You may also like to explore our “Best Paper” Digest (NeurIPS), which lists the most influential NeurIPS papers since 1987.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: NeurIPS 2023 Highlights

**1. Toolformer: Language Models Can Teach Themselves to Use Tools**
Highlight: In this paper, we show that LMs can teach themselves to *use external tools* via simple APIs and achieve the best of both worlds.
Authors: Timo Schick; Jane Dwivedi-Yu; Roberto Dessi; Roberta Raileanu; Maria Lomeli; Eric Hambro; Luke Zettlemoyer; Nicola Cancedda; Thomas Scialom

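A minimal sketch of the self-supervised filtering idea behind this approach, as we read it from the highlight: sample candidate API calls, execute them, and keep only the calls whose results make the following tokens easier to predict. `execute_call`, `lm_loss`, and the inline call format are illustrative placeholders, not the authors' implementation.

```python
# Sketch of Toolformer-style API-call filtering (assumptions noted above).
from typing import Callable, List, Tuple

def filter_api_calls(
    prefix: str,
    continuation: str,
    candidate_calls: List[str],
    execute_call: Callable[[str], str],       # runs the tool, returns its result
    lm_loss: Callable[[str, str], float],     # LM loss of continuation given context
    margin: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep only API calls whose results make the continuation easier to predict."""
    baseline = lm_loss(prefix, continuation)  # loss without any tool call
    kept = []
    for call in candidate_calls:
        result = execute_call(call)
        # Insert "[call -> result]" before the continuation and re-score it.
        augmented = f"{prefix} [{call} -> {result}] "
        if baseline - lm_loss(augmented, continuation) >= margin:
            kept.append((call, result))       # the call reduced loss enough to keep
    return kept
```
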
**2. Self-Refine: Iterative Refinement with Self-Feedback**
Highlight: Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
Authors: Aman Madaan; Niket Tandon; Prakhar Gupta; Skyler Hallinan; Luyu Gao; Sarah Wiegreffe; Uri Alon; Nouha Dziri; Shrimai Prabhumoye; Yiming Yang; Shashank Gupta; Bodhisattwa Prasad Majumder; Katherine Hermann; Sean Welleck; Amir Yazdanbakhsh; Peter Clark

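The generate/feedback/refine loop is easy to picture in code. Below is a minimal sketch assuming a single `llm` callable plays all three roles; the prompts and stop condition are illustrative, not the paper's.

```python
# Sketch of a Self-Refine-style loop (prompts and stop test are illustrative).
from typing import Callable

def self_refine(task: str, llm: Callable[[str], str], max_iters: int = 3) -> str:
    output = llm(f"Task: {task}\nProduce an initial answer.")
    for _ in range(max_iters):
        feedback = llm(f"Task: {task}\nAnswer: {output}\nGive concrete feedback.")
        if "no further improvements" in feedback.lower():
            break  # the model judged its own output good enough
        output = llm(
            f"Task: {task}\nAnswer: {output}\nFeedback: {feedback}\nRevise the answer."
        )
    return output
```
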
**3. Vicuna Evaluation: Exploring LLM-as-a-Judge and Chatbot Arena**
Highlight: To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them.
Authors: Lianmin Zheng; Wei-Lin Chiang; Ying Sheng; Siyuan Zhuang; Zhanghao Wu; Yonghao Zhuang; Zi Lin; Zhuohan Li; Dacheng Li; Eric Xing; Hao Zhang; Joseph Gonzalez; Ion Stoica

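One simple mitigation for position bias discussed in this line of work is to query the judge with both answer orders and accept only consistent verdicts. A hedged sketch, with an assumed `judge` callable and a crude string check on its verdict:

```python
# Sketch of order-swapped pairwise judging (judge interface is an assumption).
from typing import Callable, Optional

def debiased_pairwise_judge(
    question: str, answer_a: str, answer_b: str, judge: Callable[[str], str]
) -> Optional[str]:
    prompt = "Question: {q}\nAssistant 1: {x}\nAssistant 2: {y}\nWhich is better, 1 or 2?"
    first = judge(prompt.format(q=question, x=answer_a, y=answer_b))
    second = judge(prompt.format(q=question, x=answer_b, y=answer_a))
    if "1" in first and "2" in second:
        return "A"   # preferred under both orders
    if "2" in first and "1" in second:
        return "B"
    return None      # inconsistent verdicts: treat as a tie / position bias
```
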
**4. Mathematical Capabilities of ChatGPT**
Highlight: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology.
Authors: Simon Frieder; Luca Pinchetti; Chevalier; Ryan-Rhys Griffiths; Tommaso Salvatori; Thomas Lukasiewicz; Philipp Petersen; Julius Berner

**5. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only**
Highlight: However, as larger models requiring pretraining on trillions of tokens are considered, it is unclear how scalable curation is, and whether we will run out of unique high-quality data soon. At variance with previous beliefs, we show that properly filtered and deduplicated web data alone can lead to powerful models, even significantly outperforming models trained on The Pile.
Authors: Guilherme Penedo; Quentin Malartic; Daniel Hesslow; Ruxandra Cojocaru; Hamza Alobeidli; Alessandro Cappelli; Baptiste Pannier; Ebtesam Almazrouei; Julien Launay

**6. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning**
Highlight: In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models.
Authors: Wenliang Dai; Junnan Li; Dongxu Li; Anthony Meng Huat Tiong; Junqi Zhao; Weisheng Wang; Boyang Li; Pascale N Fung; Steven Hoi

**7. QLoRA: Efficient Finetuning of Quantized LLMs**
Highlight: We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance.
Authors: Tim Dettmers; Artidoro Pagnoni; Ari Holtzman; Luke Zettlemoyer

**8. Language Is Not All You Need: Aligning Perception with Language Models**
Highlight: In this work, we introduce KOSMOS-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot).
Authors: Shaohan Huang; Li Dong; Wenhui Wang; Yaru Hao; Saksham Singhal; Shuming Ma; Tengchao Lv; Lei Cui; Owais Khan Mohammed; Barun Patra; Qiang Liu; Kriti Aggarwal; Zewen Chi; Nils Bjorck; Vishrav Chaudhary; Subhojit Som; Xia Song; Furu Wei

**9. Reflexion: Language Agents with Verbal Reinforcement Learning**
Highlight: We propose Reflexion, a novel framework to reinforce language agents not by updating weights, but instead through linguistic feedback.
Authors: Noah Shinn; Federico Cassano; Ashwin Gopinath; Karthik Narasimhan; Shunyu Yao

**10. Scaling Data-Constrained Language Models**
Highlight: We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters.
Authors: Niklas Muennighoff; Alexander Rush; Boaz Barak; Teven Le Scao; Nouamane Tazi; Aleksandra Piktus; Thomas Wolf; Colin Raffel; Sampo Pyysalo

**11. LIMA: Less Is More for Alignment**
Highlight: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling.
Authors: Chunting Zhou; Pengfei Liu; Puxin Xu; Srinivasan Iyer; Jiao Sun; Yuning Mao; Xuezhe Ma; Avia Efrat; Ping Yu; Lili Yu; Susan Zhang; Gargi Ghosh; Mike Lewis; Luke Zettlemoyer; Omer Levy

**12. Segment Everything Everywhere All at Once**
Highlight: In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
Authors: Xueyan Zou; Jianwei Yang; Hao Zhang; Feng Li; Linjie Li; Jianfeng Wang; Lijuan Wang; Jianfeng Gao; Yong Jae Lee

**13. Language Models Can Solve Computer Tasks**
Highlight: However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent recursively criticizes and improves its output (RCI).
Authors: Geunwoo Kim; Pierre Baldi; Stephen McAleer

**14. Stable Bias: Evaluating Societal Representations in Diffusion Models**
Highlight: This evaluation, however, is made more difficult by the synthetic nature of these systems' outputs: common definitions of diversity are grounded in social categories of people living in the world, whereas the artificial depictions of fictive humans created by these systems have no inherent gender or ethnicity. To address this need, we propose a new method for exploring the social biases in TTI systems.
Authors: Sasha Alexandra Luccioni; Christopher Akiki; Margaret Mitchell; Yacine Jernite

**15. HuggingGPT: Solving AI Tasks with ChatGPT and Its Friends in Hugging Face**
Highlight: Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks.
Authors: Yongliang Shen; Kaitao Song; Xu Tan; Dongsheng Li; Weiming Lu; Yueting Zhuang

**16. Direct Preference Optimization: Your Language Model Is Secretly A Reward Model**
Highlight: However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper, we leverage a mapping between reward functions and optimal policies to show that this constrained reward maximization problem can be optimized exactly with a single stage of policy training, essentially solving a classification problem on the human preference data.
Authors: Rafael Rafailov; Archit Sharma; Eric Mitchell; Christopher D Manning; Stefano Ermon; Chelsea Finn

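The single-stage objective can be written as a logistic loss on preference pairs. Below is a minimal PyTorch sketch of the DPO loss in its commonly stated form; tensor names are illustrative, and per-example sequence log-probabilities are assumed to be precomputed.

```python
# Sketch of the DPO objective: classification-style loss on preference pairs.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_logp_chosen: torch.Tensor,    # log pi_theta(y_w | x)
    policy_logp_rejected: torch.Tensor,  # log pi_theta(y_l | x)
    ref_logp_chosen: torch.Tensor,       # log pi_ref(y_w | x), frozen reference
    ref_logp_rejected: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,
) -> torch.Tensor:
    # Implicit rewards are beta-scaled log-ratios against the reference model.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    # Binary classification: the chosen response should out-score the rejected one.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```
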
**17. StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners**
Highlight: We show that (1) when the generative model is properly configured, training self-supervised methods on synthetic images can match or beat the real image counterpart; (2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep.
Authors: Yonglong Tian; Lijie Fan; Phillip Isola; Huiwen Chang; Dilip Krishnan

**18. Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting**
Highlight: We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs (e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always "(A)"), which models systematically fail to mention in their explanations.
Authors: Miles Turpin; Julian Michael; Ethan Perez; Samuel Bowman

**19. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation**
Highlight: We present a comprehensive solution to learn and improve text-to-image models from human preference feedback.
Authors: Jiazheng Xu; Xiao Liu; Yuchen Wu; Yuxuan Tong; Qinkai Li; Ming Ding; Jie Tang; Yuxiao Dong

**20. Self-Supervised Learning with Lie Symmetries for Partial Differential Equations**
Highlight: In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision.
Authors: Grégoire Mialon; Quentin Garrido; Hannah Lawrence; Danyal Rehman; Bobak Kiani; Yann LeCun

**21. Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation**
Highlight: The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences.
Authors: Yuval Kirstain; Adam Polyak; Uriel Singer; Shahbuland Matiana; Joe Penna; Omer Levy

**22. Distributed Inference and Fine-tuning of Large Language Models Over The Internet**
Highlight: In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies.
Authors: Alexander Borzunov; Dmitry Baranchuk; Tim Dettmers; Max Ryabinin; Younes Belkada; Artem Chumachenko; Pavel Samygin; Colin Raffel

**23. Visual Instruction Tuning**
Highlight: We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding.
Authors: Haotian Liu; Chunyuan Li; Qingyang Wu; Yong Jae Lee

**24. Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from The Data Perspective**
Highlight: In this work, we aim to advance our understanding by presenting a straightforward and unified explanation from the data perspective.
Authors: Huayang Li; Tian Lan; Zihao Fu; Deng Cai; Lemao Liu; Nigel Collier; Taro Watanabe; Yixuan Su

**25. AlpacaFarm: A Simulation Framework for Methods That Learn from Human Feedback**
Highlight: Replicating and understanding this instruction-following process faces three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these bottlenecks with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost.
Authors: Yann Dubois; Xuechen Li; Rohan Taori; Tianyi Zhang; Ishaan Gulrajani; Jimmy Ba; Carlos Guestrin; Percy Liang; Tatsunori Hashimoto

**26. Dissecting Knowledge Distillation: An Exploration of Its Inner Workings and Applications**
Highlight: Do its data invariance properties become similar? Our work presents a comprehensive study that attempts to answer these questions.
Authors: Utkarsh Ojha; Yuheng Li; Anirudh Sundara Rajan; Yingyu Liang; Yong Jae Lee

**27. OpenProteinSet: Training Data for Structural Biology at Scale**
Highlight: Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions.
Authors: Gustaf Ahdritz; Nazim Bouatta; Sachin Kadyan; Lukas Jarosch; Dan Berenberg; Ian Fisk; Andrew Watkins; Stephen Ra; Richard Bonneau; Mohammed AlQuraishi

**28. Does Localization Inform Editing? Surprising Differences in Causality-Based Localization Vs. Knowledge Editing in Language Models**
Highlight: In this paper, we find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored.
Authors: Peter Hase; Mohit Bansal; Been Kim; Asma Ghandeharioun

**29. Towards Automated Circuit Discovery for Mechanistic Interpretability**
Highlight: This work proposes a novel algorithm, Automatic Circuit DisCovery (ACDC), to automate the identification of the important units in the network.
Authors: Arthur Conmy; Augustine Mavor-Parker; Aengus Lynch; Stefan Heimersheim; Adrià Garriga-Alonso

**30. Visual Instruction Inversion: Image Editing Via Image Prompting**
Highlight: We present a method for image editing via visual prompting.
Authors: Thao Nguyen; Yuheng Li; Utkarsh Ojha; Yong Jae Lee

**31. Paraphrasing Evades Detectors of AI-generated Text, But Retrieval Is An Effective Defense**
Highlight: To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
Authors: Kalpesh Krishna; Yixiao Song; Marzena Karpinska; John Wieting; Mohit Iyyer

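The defense is easy to picture: the API provider stores every generation it serves and flags a candidate text if it is semantically close to something in that store. A brute-force sketch, with an assumed `embed` sentence-embedding function (a real deployment would use an approximate nearest-neighbor index):

```python
# Sketch of a retrieval-based detector (embedding function is an assumption).
from typing import Callable, List
import numpy as np

def retrieval_detector(
    candidate: str,
    stored_generations: List[str],
    embed: Callable[[str], np.ndarray],
    threshold: float = 0.8,
) -> bool:
    q = embed(candidate)
    q = q / np.linalg.norm(q)
    for text in stored_generations:
        v = embed(text)
        if float(q @ (v / np.linalg.norm(v))) >= threshold:  # cosine similarity
            return True   # close paraphrase of a previously served generation
    return False
```
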
**32. SceneScape: Text-Driven Consistent Scene Generation**
Highlight: We present a method for text-driven perpetual view generation: synthesizing long-term videos of various scenes solely given an input text prompt describing the scene and camera poses.
Authors: Rafail Fridman; Amit Abecasis; Yoni Kasten; Tali Dekel

**33. Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation**
Highlight: This paper presents Annotator, a general and efficient active learning baseline, in which a plain voxel-centric online selection strategy is tailored to probe and annotate the salient and exemplar voxel grids within each LiDAR scan, broadening the potential of segmentation performance even under distribution shift.
Authors: Binhui Xie; Shuang Li; Qingju Guo; Chi Liu; Xinjing Cheng

**34. Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond**
Highlight: In this work, we show that commonly used homophily measures have critical drawbacks preventing the comparison of homophily levels across different datasets.
Authors: Oleg Platonov; Denis Kuznedelev; Artem Babenko; Liudmila Prokhorenkova

**35. Tree of Thoughts: Deliberate Problem Solving with Large Language Models**
Highlight: This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.
Authors: Shunyu Yao; Dian Yu; Jeffrey Zhao; Izhak Shafran; Tom Griffiths; Yuan Cao; Karthik Narasimhan

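One instantiation of this framework is a breadth-first search over partial "thought" sequences. A compact sketch under the assumption that `propose` and `score` are LM-backed helper callables (they are stand-ins, not the paper's API):

```python
# Sketch of a ToT-style beam/BFS search over thoughts.
from typing import Callable, List

def tree_of_thoughts_bfs(
    problem: str,
    propose: Callable[[str, List[str]], List[str]],  # next-thought candidates
    score: Callable[[str, List[str]], float],        # value of a partial path
    depth: int = 3,
    beam: int = 5,
) -> List[str]:
    frontier: List[List[str]] = [[]]
    for _ in range(depth):
        candidates = [path + [t] for path in frontier for t in propose(problem, path)]
        if not candidates:
            break
        candidates.sort(key=lambda p: score(problem, p), reverse=True)
        frontier = candidates[:beam]   # keep the most promising partial solutions
    return frontier[0]
```
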
**36. Guide Your Agent with Adaptive Multimodal Rewards**
Highlight: In this paper, we instead propose to utilize the knowledge captured within large vision-language models for improving the generalization capability of control agents.
Authors: Changyeon Kim; Younggyo Seo; Hao Liu; Lisa Lee; Jinwoo Shin; Honglak Lee; Kimin Lee

**37. C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models**
Highlight: We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context.
Authors: Yuzhen Huang; Yuzhuo Bai; Zhihao Zhu; Junlei Zhang; Jinghan Zhang; Tangjun Su; Junteng Liu; Chuancheng Lv; Yikai Zhang; Jiayi Lei; Yao Fu; Maosong Sun; Junxian He

**38. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control**
Highlight: Thus, if we want to make use of the semantic knowledge in a language model while still situating it in an embodied setting, we must construct an action sequence that is both likely according to the language model and also realizable according to grounded models of the environment. We frame this as a problem similar to probabilistic filtering: decode a sequence that both has high probability under the language model and high probability under a set of grounded model objectives.
Authors: Wenlong Huang; Fei Xia; Dhruv Shah; Danny Driess; Andy Zeng; Yao Lu; Pete Florence; Igor Mordatch; Sergey Levine; Karol Hausman; Brian Ichter

**39. BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning**
Highlight: Despite initial defenses proposed in recent studies, these methods have very limited generalizability and scalability. To address this issue, we propose BIRD, a technique to detect and remove backdoors from a pretrained DRL policy in a clean environment, without requiring any knowledge of the attack specifications or access to its training process.
Authors: Xuan Chen; Wenbo Guo; Guanhong Tao; Xiangyu Zhang; Dawn Song

**40. Battle of The Backbones: A Large-Scale Comparison of Pretrained Models Across Computer Vision Tasks**
Highlight: Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more.
Authors: Micah Goldblum; Hossein Souri; Renkun Ni; Manli Shu; Viraj Prabhu; Gowthami Somepalli; Prithvijit Chattopadhyay; Adrien Bardes; Mark Ibrahim; Judy Hoffman; Rama Chellappa; Andrew Wilson; Tom Goldstein

**41. A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning**
Highlight: We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features.
Authors: Valeriia Cherepanova; Gowthami Somepalli; Jonas Geiping; C. Bayan Bruss; Andrew Wilson; Tom Goldstein; Micah Goldblum

**42. Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition**
Highlight: Motivated by the belief that the inductive bias of a model architecture is more important than the bias mitigation strategy, we take a different approach to bias mitigation.
Authors: Samuel Dooley; Rhea Sukthanker; John Dickerson; Colin White; Frank Hutter; Micah Goldblum

**43. Are Aligned Neural Networks Adversarially Aligned?**
Highlight: They respond helpfully to user questions, but when asked to perform some behavior that would cause harm, will politely refuse. We study to what extent these models are aligned even when interacting with an adversarial user who constructs worst-case adversarial example inputs.
Authors: Nicholas Carlini; Florian Tramer; Daphne Ippolito; Ludwig Schmidt; Milad Nasr; Matthew Jagielski; Pang Wei Koh; Irena Gao; Christopher A. Choquette-Choo

**44. Patch N' Pack: NaViT, A Vision Transformer for Any Aspect Ratio and Resolution**
Highlight: However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT), which uses sequence packing during training to process inputs of arbitrary resolutions and aspect ratios.
Authors: Mostafa Dehghani; Basil Mustafa; Josip Djolonga; Jonathan Heek; Matthias Minderer; Mathilde Caron; Andreas Steiner; Joan Puigcerver; Robert Geirhos; Ibrahim Alabdulmohsin; Avital Oliver; Piotr Padlewski; Alexey Gritsenko; Mario Lucic; Neil Houlsby

**45. Likelihood-Based Diffusion Language Models**
Highlight: In this work, we take the first steps towards closing the perplexity gap between autoregressive and diffusion-based language models, with the goal of building and releasing a diffusion model which outperforms the smallest widely-adopted autoregressive model (GPT-2 124M).
Authors: Ishaan Gulrajani; Tatsunori Hashimoto

**46. Structural Pruning for Diffusion Models**
Highlight: The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training.
Authors: Gongfan Fang; Xinyin Ma; Xinchao Wang

**47. Data Selection for Language Models Via Importance Resampling**
Highlight: Instead, we extend the classic importance resampling approach used in low dimensions to LM data selection. We propose Data Selection with Importance Resampling (DSIR), an efficient and scalable framework that estimates importance weights in a reduced feature space for tractability and selects data with importance resampling according to these weights.
Authors: Sang Michael Xie; Shibani Santurkar; Tengyu Ma; Percy Liang

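The resampling step itself is simple once the feature-space log-densities are in hand. A sketch using the Gumbel-top-k trick to sample without replacement in proportion to the importance weights; the two log-density arrays are stand-ins for scores from the paper's cheap feature-space models:

```python
# Sketch of importance resampling for data selection.
import numpy as np

def select_with_importance_resampling(
    raw_logp_target: np.ndarray,  # log p_target(features(x)) per raw example
    raw_logp_source: np.ndarray,  # log p_raw(features(x)) per raw example
    k: int,
    rng: np.random.Generator,
) -> np.ndarray:
    log_w = raw_logp_target - raw_logp_source   # log importance weights
    gumbel = rng.gumbel(size=log_w.shape)       # Gumbel-top-k = weighted sampling
    return np.argsort(log_w + gumbel)[-k:]      # indices of selected examples
```
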
**48. Learning Universal Policies Via Text-Guided Video Generation**
Highlight: Recent progress in text-guided image synthesis has yielded models with an impressive ability to generate complex novel images, exhibiting combinatorial generalization across domains. Motivated by this success, we investigate whether such tools can be used to construct more general-purpose agents.
Authors: Yilun Du; Mengjiao (Sherry) Yang; Bo Dai; Hanjun Dai; Ofir Nachum; Josh Tenenbaum; Dale Schuurmans; Pieter Abbeel

**49. Counterfactual Memorization in Neural Language Models**
Highlight: We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training.
Authors: Chiyuan Zhang; Daphne Ippolito; Katherine Lee; Matthew Jagielski; Florian Tramer; Nicholas Carlini

**50. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision**
Highlight: However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and related issues of quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
Authors: Zhiqing Sun; Yikang Shen; Qinhong Zhou; Hongxin Zhang; Zhenfang Chen; David Cox; Yiming Yang; Chuang Gan

**51. Fine-Grained Human Feedback Gives Better Rewards for Language Model Training**
Highlight: In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal.
Authors: Zeqiu Wu; Yushi Hu; Weijia Shi; Nouha Dziri; Alane Suhr; Prithviraj (Raj) Ammanabrolu; Noah Smith; Mari Ostendorf; Hannaneh Hajishirzi

**52. VisoGender: A Dataset for Benchmarking Gender Bias in Image-text Pronoun Resolution**
Highlight: We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models.
Authors: Siobhan Mackenzie Hall; Fernanda Gonçalves Abrantes; Hanwen Zhu; Grace Sodunke; Aleksandar Shtedritski; Hannah Rose Kirk

**53. TextDiffuser: Diffusion Models As Text Painters**
Highlight: Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
Authors: Jingye Chen; Yupan Huang; Tengchao Lv; Lei Cui; Qifeng Chen; Furu Wei

**54. Isotropic Loss Design for Non-contrastive SSL**
Highlight: Here, we analytically study learning dynamics under cosine similarity in the eigenspace of the predictor network and show that collapse is avoided through implicit variance regularization similar to Euclidean loss but with fundamentally different dynamics.
Authors: Manu Srinath Halvagal; Axel Laborieux; Friedemann Zenke

**55. ToolkenGPT: Augmenting Frozen Language Models with Massive Tools Via Tool Embeddings**
Highlight: Although the latter method offers adaptability to new tools, it struggles with the inherent context length constraint of LLMs when many new tools are presented, and mastering a new set of tools with few-shot examples remains challenging, resulting in suboptimal performance. To address these limitations, we propose a novel solution, named **ToolkenGPT**, wherein LLMs effectively learn to master tools as predicting tokens through **tool embeddings** for solving complex tasks.
Authors: Shibo Hao; Tianyang Liu; Zhen Wang; Zhiting Hu

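The "toolken" idea can be sketched as appending one trainable logit head per tool to a frozen LM, so that invoking a tool is just predicting an extra token. Module names and layout below are illustrative, not the paper's code:

```python
# Sketch of a tool-token output head on top of a frozen LM.
import torch
import torch.nn as nn

class ToolkenHead(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, num_tools: int):
        super().__init__()
        self.word_head = nn.Linear(hidden_dim, vocab_size, bias=False)
        self.word_head.requires_grad_(False)  # frozen LM output head
        self.tool_head = nn.Linear(hidden_dim, num_tools, bias=False)  # trainable

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Concatenate word and tool logits: predicting an index >= vocab_size
        # means "call tool (index - vocab_size)".
        return torch.cat([self.word_head(hidden), self.tool_head(hidden)], dim=-1)
```

Only the small `tool_head` is trained, which is what makes adding many tools cheap.
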
**56. Generating Images with Multimodal Language Models**
Highlight: We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces.
Authors: Jing Yu Koh; Daniel Fried; Russ Salakhutdinov

**57. Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models**
Highlight: In this work, we propose using reinforcement learning (RL) to fine-tune text-to-image models.
Authors: Ying Fan; Olivia Watkins; Yuqing Du; Hao Liu; Moonkyung Ryu; Craig Boutilier; Pieter Abbeel; Mohammad Ghavamzadeh; Kangwook Lee; Kimin Lee

**58. Solving Inverse Problems Provably Via Posterior Sampling with Latent Diffusion Models**
Highlight: We present the first framework to solve general inverse problems leveraging pre-trained *latent* diffusion models.
Authors: Litu Rout; Negin Raoof; Giannis Daras; Constantine Caramanis; Alex Dimakis; Sanjay Shakkottai

**59. Ordering-based Conditions for Global Convergence of Policy Gradient Methods**
Highlight: We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on inter-related properties between the policy update and the representation.
Authors: Jincheng Mei; Bo Dai; Alekh Agarwal; Mohammad Ghavamzadeh; Csaba Szepesvari; Dale Schuurmans

**60. Optimizing Prompts for Text-to-Image Generation**
Highlight: Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.
Authors: Yaru Hao; Zewen Chi; Li Dong; Furu Wei

**61. Language Models Augmented with Decoupled Memory**
Highlight: Existing large language models (LLMs) can only handle fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Decoupled-Memory-Augmented LLMs (DeMA), which enables LLMs to memorize long history.
Authors: Weizhi Wang; Li Dong; Hao Cheng; Xiaodong Liu; Xifeng Yan; Jianfeng Gao; Furu Wei

**62. Extensible Prompts for Language Models on Zero-shot Language Style Customization**
Highlight: We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL).
Authors: Tao Ge; Hu Jing; Li Dong; Shaoguang Mao; Yan Xia; Xun Wang; Si-Qing Chen; Furu Wei

**63. LLaVA-Med: Training A Large Language-and-Vision Assistant for Biomedicine in One Day**
Highlight: In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
Authors: Chunyuan Li; Cliff Wong; Sheng Zhang; Naoto Usuyama; Haotian Liu; Jianwei Yang; Tristan Naumann; Hoifung Poon; Jianfeng Gao

**64. Tracr: Compiled Transformers As A Laboratory for Interpretability**
Highlight: We show how to compile human-readable programs into standard decoder-only transformer models.
Authors: David Lindner; Janos Kramar; Sebastian Farquhar; Matthew Rahtz; Tom McGrath; Vladimir Mikulik

**65. Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off**
Highlight: The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation.
Authors: Zichen Zhang; Johannes Kirschner; Junxi Zhang; Francesco Zanini; Alex Ayoub; Masood Dehghan; Dale Schuurmans

**66. Multi-Objective Agency Requires Non-Markovian Rewards**
Highlight: To this end, we propose a practical non-Markovian aggregation scheme that overcomes the impossibility with only one additional parameter for each objective.
Authors: Silviu Pitis

**67. Bypass Exponential Time Preprocessing: Fast Neural Network Training Via Weight-Data Correlation Preprocessing**
Highlight: In this work, we present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly, dynamically detect which neurons fire at each iteration.
Authors: Josh Alman; Jiehao Liang; Zhao Song; Ruizhe Zhang; Danyang Zhuo

**68. LLM-Pruner: On The Structural Pruning of Large Language Models**
Highlight: Since LLMs are general-purpose task solvers, we explore their compression in a task-agnostic manner, aiming to preserve the multi-task solving and language generation abilities of the original LLM.
Authors: Xinyin Ma; Gongfan Fang; Xinchao Wang

**69. Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells**
Highlight: We define a novel self-supervised learning (SSL) framework for properly arranging representations in an abstract coding space, and show that it can produce grid codes when constrained to perform high-efficiency representation of space with recurrent neural networks.
Authors: Rylan Schaeffer; Mikail Khona; Tzuhsuan Ma; Cristobal Eyzaguirre; Sanmi Koyejo; Ila Fiete

**70. Are Emergent Abilities of Large Language Models A Mirage?**
Highlight: We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test, and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities; (2) make, test, and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen, seemingly emergent abilities in multiple vision tasks across diverse deep networks.
Authors: Rylan Schaeffer; Brando Miranda; Sanmi Koyejo

**71. Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text**
Highlight: We use a linear assignment algorithm to place images into longer bodies of text using CLIP features, a process that we show outperforms alternatives.
Authors: Wanrong Zhu; Jack Hessel; Anas Awadalla; Samir Yitzhak Gadre; Jesse Dodge; Alex Fang; Youngjae Yu; Ludwig Schmidt; William Yang Wang; Yejin Choi

**72. DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization**
Highlight: This paper broadens the current scope of neural solvers for NPC problems by introducing a new graph-based diffusion framework, namely DIFUSCO.
Authors: Zhiqing Sun; Yiming Yang

**73. Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models**
Highlight: However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning.
Authors: Pan Lu; Baolin Peng; Hao Cheng; Michel Galley; Kai-Wei Chang; Ying Nian Wu; Song-Chun Zhu; Jianfeng Gao

**74. Unlimiformer: Long-Range Transformers with Unlimited Length Input**
Highlight: In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single $k$-nearest-neighbor ($k$NN) index, while the returned $k$NN distances are the attention dot-product scores.
Authors: Amanda Bertsch; Uri Alon; Graham Neubig; Matthew Gormley

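The core trick is that each decoder query attends only to its top-$k$ retrieved encoder keys, with the retrieval scores doubling as attention logits. An illustrative sketch where brute-force search stands in for a real $k$NN index:

```python
# Sketch of kNN-offloaded cross-attention.
import torch

def knn_cross_attention(
    queries: torch.Tensor,         # (num_queries, d)
    encoder_keys: torch.Tensor,    # (num_tokens, d), can be very long
    encoder_values: torch.Tensor,  # (num_tokens, d)
    k: int = 32,
) -> torch.Tensor:
    scores = queries @ encoder_keys.T              # dot products = kNN distances
    top_scores, top_idx = scores.topk(k, dim=-1)   # retrieve k keys per query
    weights = torch.softmax(top_scores, dim=-1)    # softmax over retrieved set only
    gathered = encoder_values[top_idx]             # (num_queries, k, d)
    return torch.einsum("qk,qkd->qd", weights, gathered)
```
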
**75. Simple and Controllable Music Generation**
Highlight: We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Authors: Jade Copet; Felix Kreuk; Itai Gat; Tal Remez; Gabriel Synnaeve; Yossi Adi; Alexandre Defossez

**76. Setting The Trap: Capturing and Defeating Backdoor Threats in PLMs Through Honeypots**
Highlight: In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, regardless of whether the fine-tuning dataset contains poisoned samples.
Authors: Ruixiang Tang; Jiayi Yuan; Yiming Li; Zirui Liu; Rui Chen; Xia Hu

**77. Faith and Fate: Limits of Transformers on Compositionality**
Highlight: As a measure of compositional complexity, we introduce computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures.
Authors: Nouha Dziri; Ximing Lu; Melanie Sclar; Xiang (Lorraine) Li; Liwei Jiang; Bill Yuchen Lin; Sean Welleck; Peter West; Chandra Bhagavatula; Ronan Le Bras; Jena Hwang; Soumya Sanyal; Xiang Ren; Allyson Ettinger; Zaid Harchaoui; Yejin Choi

**78. What Makes Good Examples for Visual In-Context Learning?**
Highlight: To demystify in-context learning in computer vision, we conduct extensive research and identify a critical problem: downstream performance is highly sensitive to the choice of visual in-context examples. To address this problem, we propose a prompt retrieval framework specifically for large vision models, allowing the selection of in-context examples to be fully automated.
Authors: Yuanhan Zhang; Kaiyang Zhou; Ziwei Liu

**79. Fast Attention Requires Bounded Entries**
Highlight: In this paper, we investigate whether faster algorithms are possible by *implicitly* making use of the matrix $A$.
Authors: Josh Alman; Zhao Song

**80. Simplifying and Empowering Transformers for Large-Graph Representations**
Highlight: In this paper, we critically demonstrate that even a single attention layer can deliver surprisingly competitive performance across node property prediction benchmarks where node numbers range from thousand-level to billion-level.
Authors: Qitian Wu; Wentao Zhao; Chenxiao Yang; Hengrui Zhang; Fan Nie; Haitian Jiang; Yatao Bian; Junchi Yan

**81. RealTime QA: What's The Answer Right Now?**
Highlight: We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version).
Authors: Jungo Kasai; Keisuke Sakaguchi; Yoichi Takahashi; Ronan Le Bras; Akari Asai; Xinyan Yu; Dragomir Radev; Noah Smith; Yejin Choi; Kentaro Inui

**82. Segment Anything in 3D with NeRFs**
Highlight: This paper aims to generalize SAM to segment 3D objects.
Authors: Jiazhong Cen; Zanwei Zhou; Jiemin Fang; Chen Yang; Wei Shen; Lingxi Xie; Dongsheng Jiang; Xiaopeng Zhang; Qi Tian

**83. LinkerNet: Fragment Poses and Linker Co-Design with 3D Equivariant Diffusion**
Highlight: In this work, we address a more general problem where the poses of the fragments are *unknown* in 3D space.
Authors: Jiaqi Guan; Xingang Peng; PeiQi Jiang; Yunan Luo; Jian Peng; Jianzhu Ma

**84. ClusterFormer: Clustering As A Universal Visual Learner**
Highlight: This paper presents ClusterFormer, a universal vision model that is based on the Clustering paradigm with TransFormer.
Authors: James Liang; Yiming Cui; Qifan Wang; Tong Geng; Wenguan Wang; Dongfang Liu

**85. Perfect Linear Concept Erasure in Closed Form**
Highlight: We prove that a rank $k-1$ orthogonal projection is sufficient to perfectly guard a $k$-class concept from all linear adversaries with convex loss functions, and provide the formula in closed form.
Authors: Nora Belrose; David Schneider-Joseph; Shauli Ravfogel; Ryan Cotterell; Edward Raff; Stella Biderman

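A simplified sketch of the rank argument: removing the rank-$(k-1)$ span of between-class mean differences makes all class-conditional means coincide, so no linear adversary with a convex loss can beat a constant predictor. This illustrates the rank intuition only; the paper's closed-form solution additionally accounts for feature covariance, which is omitted here.

```python
# Simplified rank-(k-1) linear concept erasure (covariance handling omitted).
import numpy as np

def erase_concept(X: np.ndarray, labels: np.ndarray) -> np.ndarray:
    classes = np.unique(labels)                              # k classes
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    directions = means[1:] - means[0]                        # (k-1, d) spanning set
    Q, _ = np.linalg.qr(directions.T)                        # orthonormal basis (d, k-1)
    P = np.eye(X.shape[1]) - Q @ Q.T                         # project out that span
    return X @ P.T                                           # class means now coincide
```
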
**86. Language Models Meet World Models: Embodied Experiences Enhance Language Models**
Highlight: The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities.
Authors: Jiannan Xiang; Tianhua Tao; Yi Gu; Tianmin Shu; Zirui Wang; Zichao Yang; Zhiting Hu

**87. Provably Bounding Neural Network Preimages**
Highlight: In this work, we present the INVPROP algorithm for verifying properties over the preimage of a linearly constrained output set of a neural network, which can be combined with branch-and-bound to increase precision.
Authors: Christopher Brix; Suhas Kotha; Huan Zhang; J. Zico Kolter; Krishnamurthy Dvijotham

**88. Does Progress on ImageNet Transfer to Real-world Datasets?**
Highlight: In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collected for comparing models.
Authors: Alex Fang; Simon Kornblith; Ludwig Schmidt

**89. What Is The Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models**
Highlight: We show that with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization.
Authors: Khashayar Gatmiry; Zhiyuan Li; Tengyu Ma; Sashank Reddi; Stefanie Jegelka; Ching-Yao Chuang

**90. VisionLLM: Large Language Model Is Also An Open-Ended Decoder for Vision-Centric Tasks**
Highlight: In this work, we present an LLM-based framework for vision-centric tasks, termed VisionLLM.
Authors: Wenhai Wang; Zhe Chen; Xiaokang Chen; Jiannan Wu; Xizhou Zhu; Gang Zeng; Ping Luo; Tong Lu; Jie Zhou; Yu Qiao; Jifeng Dai

**91. Fine-Tuning Language Models with Just Forward Passes**
Highlight: In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), adapting the classical ZO-SGD method to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference.
Authors: Sadhika Malladi; Tianyu Gao; Eshaan Nichani; Alex Damian; Jason Lee; Danqi Chen; Sanjeev Arora

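The memory saving comes from never materializing the perturbation: the random direction is replayed from a seed instead of stored. A sketch in the spirit of the method (hyperparameters and structure are illustrative):

```python
# Sketch of a seeded, in-place zeroth-order (SPSA-style) step.
import torch

@torch.no_grad()
def mezo_step(params, loss_fn, lr=1e-6, eps=1e-3, seed=0):
    def perturb(scale: float):
        gen = torch.Generator().manual_seed(seed)  # replay identical noise each call
        for p in params:
            z = torch.randn(p.shape, generator=gen)
            p.add_(scale * eps * z)

    perturb(+1.0)
    loss_plus = float(loss_fn())    # f(theta + eps * z), forward pass only
    perturb(-2.0)
    loss_minus = float(loss_fn())   # f(theta - eps * z)
    perturb(+1.0)                   # restore the original parameters
    grad_scale = (loss_plus - loss_minus) / (2 * eps)
    gen = torch.Generator().manual_seed(seed)
    for p in params:                # SGD step along the replayed direction z
        z = torch.randn(p.shape, generator=gen)
        p.add_(-lr * grad_scale * z)
    return loss_plus
```
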
**92. Incentives in Federated Learning: Equilibria, Dynamics, and Mechanisms for Welfare Maximization**
Highlight: In this paper, we model a collaborative FL framework, where every agent attempts to achieve an optimal trade-off between her learning payoff and data sharing cost.
Authors: Aniket Murhekar; Zhuowen Yuan; Bhaskar Ray Chaudhury; Bo Li; Ruta Mehta

**93. Scaling Riemannian Diffusion Models**
Highlight: Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications in high dimensions. In this work, we reexamine these approximations and propose several practical improvements.
Authors: Aaron Lou; Minkai Xu; Adam Farris; Stefano Ermon

**94. (Provable) Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More**
Highlight: For the first time, we propose a sound notion of adversarial robustness that accounts for task equivariance.
Authors: Jan Schuchardt; Yan Scholten; Stephan Günnemann

**95. PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning About Change**
Highlight: There is a strong need for systematic and extensible planning benchmarks with sufficient diversity to evaluate whether LLMs have innate planning capabilities. Motivated by this, we propose PlanBench, an extensible benchmark suite based on the kinds of domains used in the automated planning community, especially in the International Planning Competition, to test the capabilities of LLMs in planning or reasoning about actions and change.
Authors: Karthik Valmeekam; Matthew Marquez; Alberto Olmo; Sarath Sreedharan; Subbarao Kambhampati

**96. On The Planning Abilities of Large Language Models – A Critical Investigation**
Highlight: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities.
Authors: Karthik Valmeekam; Matthew Marquez; Sarath Sreedharan; Subbarao Kambhampati

**97. InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback**
Highlight: While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the generated code and its final execution environment. To address this gap, we introduce InterCode, a lightweight, flexible, and easy-to-use framework for constructing interactive code environments with multiple types of feedback signals.
Authors: John Yang; Akshara Prabhakar; Karthik Narasimhan; Shunyu Yao

**98. Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning**
Highlight: In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities.
Authors: Mitsuhiko Nakamoto; Yuexiang Zhai; Anikait Singh; Max Sobol Mark; Yi Ma; Chelsea Finn; Aviral Kumar; Sergey Levine

**99. Diffusion Self-Guidance for Controllable Image Generation**
Highlight: We introduce self-guidance, a method that provides precise control over properties of the generated image by guiding the internal representations of diffusion models.
Authors: Dave Epstein; Allan Jabri; Ben Poole; Alexei Efros; Aleksander Holynski

**100. Stable and Low-precision Training for Large-scale Vision-language Models**
Highlight: We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models.
Authors: Mitchell Wortsman; Tim Dettmers; Luke Zettlemoyer; Ari Morcos; Ali Farhadi; Ludwig Schmidt

**101. Collaborative Development of NLP Models**
Highlight: Moreover, the exhaustive delineation of a concept is challenging, and an improper approach can create shortcuts or interfere with original data or other concepts. To address these challenges, we introduce CoDev, a framework that enables multi-user interaction with the model, thereby mitigating individual limitations.
Authors: Fereshte Khani; Marco Tulio Ribeiro

**102. The Clock and The Pizza: Two Stories in Mechanistic Explanation of Neural Networks**
Highlight: Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex: small changes to model hyperparameters and initializations can induce discovery of qualitatively different algorithms from a fixed training set, and even learning of multiple different solutions in parallel.
Authors: Ziqian Zhong; Ziming Liu; Max Tegmark; Jacob Andreas

**103. Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise**
Highlight: We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact, an entire family of generative models can be constructed by varying this choice.
Authors: Arpit Bansal; Eitan Borgnia; Hong-Min Chu; Jie Li; Hamid Kazemi; Furong Huang; Micah Goldblum; Jonas Geiping; Tom Goldstein

**104. Where Are We in The Search for An Artificial Visual Cortex for Embodied Intelligence?**
Highlight: We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI.
Authors: Arjun Majumdar; Karmesh Yadav; Sergio Arnaud; Jason Yecheng Ma; Claire Chen; Sneha Silwal; Aryan Jain; Vincent-Pierre Berges; Tingfan Wu; Jay Vakil; Pieter Abbeel; Jitendra Malik; Dhruv Batra; Yixin Lin; Oleksandr Maksymets; Aravind Rajeswaran; Franziska Meier

**105. In-Context Impersonation Reveals Large Language Models' Strengths and Biases**
Highlight: In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on (that is, impersonate) different roles when they generate text in-context.
Authors: Leonard Salewski; Isabel Rio-Torto; Stephan Alaniz; Eric Schulz; Zeynep Akata

**106. Synthetic Pretraining for Few-shot Black-Box Optimization**
Highlight: In this work, we address the more challenging yet realistic setting of few-shot black-box optimization, where only a few labeled data points are available.
Authors: Tung Nguyen; Sudhanshu Agrawal; Aditya Grover

**107. ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling**
Highlight: We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science.
Authors: Tung Nguyen; Jason Jewik; Hritik Bansal; Prakhar Sharma; Aditya Grover

**108. SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality**
Highlight: This hackability is so dire that blind models with no access to the image outperform state-of-the-art vision-language models. To remedy this rampant vulnerability, we introduce *SugarCrepe*, a new benchmark for vision-language compositionality evaluation.
Authors: Cheng-Yu Hsieh; Jieyu Zhang; Zixian Ma; Aniruddha Kembhavi; Ranjay Krishna

109 | Scalable 3D Captioning with Pretrained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects. |
Tiange Luo; Chris Rockwell; Honglak Lee; Justin Johnson; |
110 | Timewarp: Transferable Acceleration of Molecular Dynamics By Learning Time-Coarsened Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *Timewarp*, an enhanced sampling method which uses a normalising flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution. |
Leon Klein; Andrew Foong; Tor Fjelde; Bruno Mlodozeniec; Marc Brockschmidt; Sebastian Nowozin; Frank Noe; Ryota Tomioka; |
111 | Inference-Time Intervention: Eliciting Truthful Answers from A Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs). (A minimal sketch of the activation shift follows below.) |
Kenneth Li; Oam Patel; Fernanda Viégas; Hanspeter Pfister; Martin Wattenberg; |
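A minimal sketch of the intervention, assuming the truthful direction has already been found: in the paper it comes from linear probes on truthfulness-labeled activations, whereas here it is a random stand-in, and `alpha` is an illustrative strength.

```python
import torch

def inference_time_intervention(head_output, probe_direction, alpha=5.0):
    # Shift one attention head's activations along a fixed direction
    # at decoding time; nothing about the model's weights changes.
    d = probe_direction / probe_direction.norm()
    return head_output + alpha * d

acts = torch.randn(8, 64)    # (tokens, head_dim) activations for one head
direction = torch.randn(64)  # hypothetical probe-derived direction
print(inference_time_intervention(acts, direction).shape)
```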
112 | DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. (The weight update is sketched below.) |
Sang Michael Xie; Hieu Pham; Xuanyi Dong; Nan Du; Hanxiao Liu; Yifeng Lu; Percy Liang; Quoc V Le; Tengyu Ma; Adams Wei Yu; |
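The core of DoReMi is an exponentiated-gradient update that upweights domains where the proxy model lags a reference model. The sketch below is a simplified, hypothetical rendering of that update; the step size and smoothing constant are illustrative.

```python
import numpy as np

def doremi_weight_update(weights, proxy_losses, reference_losses,
                         step_size=1.0, smoothing=1e-3):
    # Domains with large "excess loss" (proxy worse than reference)
    # get multiplicatively upweighted, then weights are renormalized
    # and smoothed toward uniform.
    excess = np.maximum(proxy_losses - reference_losses, 0.0)
    logits = np.log(weights) + step_size * excess
    w = np.exp(logits - logits.max())
    w /= w.sum()
    uniform = np.ones_like(w) / len(w)
    return (1 - smoothing) * w + smoothing * uniform

weights = np.ones(4) / 4                    # e.g., 4 pretraining domains
proxy = np.array([2.1, 3.0, 2.5, 4.2])      # per-domain proxy-model losses
reference = np.array([2.0, 2.4, 2.6, 3.1])  # per-domain reference losses
print(doremi_weight_update(weights, proxy, reference))
```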
113 | Neural Functional Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, constructing expressive and efficient neural functional architectures that can handle high-dimensional weight-space objects remains challenging. This paper uses the attention mechanism to define a novel set of permutation equivariant weight-space layers and composes them into deep equivariant models called neural functional Transformers (NFTs). |
Allan Zhou; Kaien Yang; Yiding Jiang; Kaylee Burns; Winnie Xu; Samuel Sokota; J. Zico Kolter; Chelsea Finn; |
114 | Permutation Equivariant Neural Functionals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We approach the design of neural functionals through the lens of symmetry, in particular by focusing on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order. We introduce a framework for building *permutation equivariant* neural functionals, whose architectures encode these symmetries as an inductive bias. |
Allan Zhou; Kaien Yang; Kaylee Burns; Adriano Cardace; Yiding Jiang; Samuel Sokota; J. Zico Kolter; Chelsea Finn; |
115 | Bridging Discrete and Backpropagation: Straight-Through and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables. (The classic straight-through baseline is sketched below.) |
Liyuan Liu; Chengyu Dong; Xiaodong Liu; Bin Yu; Jianfeng Gao; |
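For context, the classic straight-through estimator the paper builds on can be written in a few lines: the forward pass uses a hard sample, while the backward pass pretends the sample were the softmax probabilities. This is the baseline trick, not the paper's refined estimator.

```python
import torch

def straight_through_sample(logits):
    probs = torch.softmax(logits, dim=-1)
    index = torch.multinomial(probs, num_samples=1)
    hard = torch.zeros_like(probs).scatter_(-1, index, 1.0)
    # Forward value is the hard one-hot; gradients flow through probs.
    return hard + probs - probs.detach()

logits = torch.randn(2, 5, requires_grad=True)
sample = straight_through_sample(logits)
sample.sum().backward()
print(logits.grad.shape)  # gradients reach the logits despite discrete sampling
```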
116 | Symbolic Discovery of Optimization Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. |
Xiangning Chen; Chen Liang; Da Huang; Esteban Real; Kaiyuan Wang; Hieu Pham; Xuanyi Dong; Thang Luong; Cho-Jui Hsieh; Yifeng Lu; Quoc V Le; |
117 | Is Your Code Generated By ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such a limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus – a code synthesis benchmarking framework to rigorously evaluate the functional correctness of LLM-synthesized code. (A toy correctness check follows below.) |
Jiawei Liu; Chunqiu Steven Xia; Yuyao Wang; Lingming Zhang; |
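A toy rendering of the functional-correctness check, under the assumption that the generated program defines a function named `solution`; the real framework additionally auto-generates and mutates test inputs, which is elided here.

```python
def functionally_correct(candidate_src, reference, test_inputs):
    # Execute the LLM-generated source (real harnesses sandbox this!)
    # and compare its outputs against a trusted reference on many inputs.
    namespace = {}
    exec(candidate_src, namespace)
    candidate = namespace["solution"]
    return all(candidate(x) == reference(x) for x in test_inputs)

generated = "def solution(n):\n    return n * n"
print(functionally_correct(generated, lambda n: n ** 2, range(100)))  # True
```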
118 | Reward Imputation with Sketching for Contextual Batched Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient approach called Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching, which approximates the full-information feedbacks. |
Xiao Zhang; Ninglu Shao; Zihua Si; Jun Xu; Wenhan Wang; Hanjing Su; Ji-Rong Wen; |
119 | REASONER: An Explainable Recommendation Dataset with Comprehensive Labeling Ground Truths Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the past few years, while a lot of promising explainable recommender models have been proposed, the datasets used to evaluate them still suffer from several limitations: for example, the explanation ground truths are not labeled by real users, and the explanations are mostly single-modal and cover only one aspect. To bridge these gaps, in this paper, we build a new explainable recommendation dataset, which, to our knowledge, is the first contribution that provides a large amount of real-user-labeled, multi-modal and multi-aspect explanation ground truths. |
Xu Chen; Jingsen Zhang; Lei Wang; Quanyu Dai; Zhenhua Dong; Ruiming Tang; Rui Zhang; Li Chen; Xin Zhao; Ji-Rong Wen; |
120 | Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In CARP, we test four LLMs with CoT prompting, and find that they are all prone to making mistakes at the early steps of the solution, leading to incorrect answers. Based on this finding, we propose a new approach that can deliberate over the reasoning steps with tool interfaces, namely DELI. |
Beichen Zhang; Kun Zhou; Xilin Wei; Xin Zhao; Jing Sha; Shijin Wang; Ji-Rong Wen; |
121 | OpenAGI: When LLM Meets Domain Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce OpenAGI, an open-source AGI research platform designed for multi-step, real-world tasks. |
Yingqiang Ge; Wenyue Hua; Kai Mei; Jianchao Ji; Juntao Tan; Shuyuan Xu; Zelong Li; Yongfeng Zhang; |
122 | Scaling in Depth: Unlocking Robustness Certification on ImageNet Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates strategies for expanding certifiably robust training to larger, deeper models. |
Kai Hu; Andy Zou; Zifan Wang; Klas Leino; Matt Fredrikson; |
123 | Why Diffusion Models Memorize and How to Mitigate Copying Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. |
Gowthami Somepalli; Vasu Singla; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
124 | Big Little Transformer Decoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The inference latency is further exacerbated by autoregressive generative tasks, as models need to run iteratively to generate tokens sequentially without leveraging token-level parallelization. To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications. (A toy fallback loop follows below.) |
Sehoon Kim; Karttikeya Mangalam; Suhong Moon; Jitendra Malik; Michael Mahoney; Amir Gholami; Kurt Keutzer; |
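A toy version of the small-model/large-model handoff, with stand-in models and a confidence threshold as the fallback policy; the paper's rollback policy, which can undo low-quality small-model tokens, is omitted for brevity.

```python
import torch

def bild_generate(small_step, big_step, prompt_ids, max_len=32, threshold=0.9):
    # The small model decodes while confident; otherwise the big model
    # is consulted for that step.
    ids = list(prompt_ids)
    while len(ids) < max_len:
        probs = small_step(ids)
        conf, tok = probs.max(dim=-1)
        if conf.item() < threshold:
            tok = big_step(ids).argmax(dim=-1)
        ids.append(int(tok))
    return ids

vocab = 10  # stand-in models over a 10-token vocabulary
small = lambda ids: torch.softmax(torch.randn(vocab), dim=-1)
big = lambda ids: torch.softmax(torch.randn(vocab), dim=-1)
print(bild_generate(small, big, [0]))
```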
125 | Alexa Arena: A User-Centric Interactive Platform for Embodied AI Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. |
Qiaozi Gao; Govindarajan Thattai; Suhaila Shakiah; Xiaofeng Gao; Shreyas Pansare; Vasu Sharma; Gaurav Sukhatme; Hangjie Shi; Bofei Yang; Desheng Zhang; Lucy Hu; Karthika Arumugam; Shui Hu; Matthew Wen; Dinakar Guthy; Shunan Chung; Rohan Khanna; Osman Ipek; Leslie Ball; Kate Bland; Heather Rocker; Michael Johnston; Reza Ghanadan; Dilek Hakkani-Tur; Prem Natarajan; |
126 | Grounding Neural Inference with Satisfiability Modulo Theories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a set of techniques for integrating Satisfiability Modulo Theories (SMT) solvers into the forward and backward passes of a deep network layer, called SMTLayer. |
Matt Fredrikson; Kaiji Lu; Somesh Jha; Saranya Vijayakumar; Vijay Ganesh; Zifan Wang; |
127 | 3D-LLM: Injecting The 3D World Into Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to inject the 3D world into large language models, and introduce a whole new family of 3D-LLMs. |
Yining Hong; Haoyu Zhen; Peihao Chen; Shuhong Zheng; Yilun Du; Zhenfang Chen; Chuang Gan; |
128 | $k$-Means Clustering with Distance-Based Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we initiate the study of Euclidean clustering with distance-based privacy. |
Alessandro Epasto; Vahab Mirrokni; Shyam Narayanan; Peilin Zhong; |
129 | Multi-scale Diffusion Denoised Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the trade-off between accuracy and certified robustness of denoised smoothing: for example, we ask which representation of the diffusion model would maximize the certified robustness of denoised smoothing. (The basic smoothing pipeline is sketched below.) |
Jongheon Jeong; Jinwoo Shin; |
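For orientation, a bare-bones denoised-smoothing prediction loop with stand-in classifier and denoiser; the paper's contribution concerns which (multi-scale) diffusion representation to denoise from, which this sketch does not capture.

```python
import torch

def denoised_smoothing_predict(classifier, denoiser, x, sigma=0.25,
                               n_samples=100, n_classes=10):
    # Randomized smoothing: majority vote over noisy copies, each
    # denoised before classification.
    votes = torch.zeros(n_classes)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        votes[classifier(denoiser(noisy)).argmax()] += 1
    return int(votes.argmax())

classifier = lambda x: torch.randn(10)  # stand-in logits over 10 classes
denoiser = lambda x: x                  # identity stand-in denoiser
print(denoised_smoothing_predict(classifier, denoiser, torch.randn(3, 32, 32)))
```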
130 | Backprop-Free Dataset Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, different from the time-consuming forward-backward passes, we introduce a backprop-free fashion for dataset distillation with significantly improved efficiency. |
Songhua Liu; Xinchao Wang; |
131 | VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce VisIT-Bench, a robust benchmark for diverse real-life vision-language instructions across 70 tasks, from recognition to reasoning. |
Yonatan Bitton; Hritik Bansal; Jack Hessel; Rulin Shao; Wanrong Zhu; Anas Awadalla; Josh Gardner; Rohan Taori; Ludwig Schmidt; |
132 | Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study yet general family of compositional functions: multi-layer perceptrons (MLPs). |
Yingcong Li; Kartik Sreenivasan; Angeliki Giannou; Dimitris Papailiopoulos; Samet Oymak; |
133 | Jailbroken: How Does LLM Safety Training Fail? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models trained for safety and harmlessness remain susceptible to adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases of ChatGPT that elicit undesired behavior. Going beyond recognition of the issue, we investigate why such attacks succeed and how they can be created. |
Alexander Wei; Nika Haghtalab; Jacob Steinhardt; |
134 | Fair Graph Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the proposed coherence metric, we introduce a framework for fair graph distillation using a bi-level optimization algorithm. |
Qizhang Feng; Zhimeng Jiang; Ruiquan Li; Yicheng Wang; Na Zou; Jiang Bian; Xia Hu; |
135 | Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. (A single-embedding toy variant is sketched below.) |
Neel Guha; Mayee Chen; Kush Bhatia; Azalia Mirhoseini; Frederic Sala; Christopher Ré; |
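A simplified single-embedding variant of the idea, with binary labels: each example's LM prediction is compared against a majority vote of its nearest neighbors in embedding space. The real method combines several embedding functions; that aggregation is elided here.

```python
import numpy as np

def smooth_predictions(embeddings, lm_preds, k=5):
    smoothed = lm_preds.copy()
    for i, e in enumerate(embeddings):
        dists = np.linalg.norm(embeddings - e, axis=1)
        neighbors = np.argsort(dists)[1:k + 1]  # skip the point itself
        # Flip the prediction if the neighborhood disagrees.
        smoothed[i] = 1 if lm_preds[neighbors].mean() > 0.5 else 0
    return smoothed

X = np.random.randn(50, 8)                # hypothetical sentence embeddings
preds = np.random.randint(0, 2, size=50)  # hypothetical LM predictions
print(smooth_predictions(X, preds)[:10])
```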
136 | On Evaluating Adversarial Robustness of Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose evaluating the robustness of open-source large VLMs in the most realistic and high-risk setting, where adversaries have only black-box system access and seek to deceive the model into returning the targeted responses. |
Yunqing Zhao; Tianyu Pang; Chao Du; Xiao Yang; Chongxuan Li; Ngai-Man (Man) Cheung; Min Lin; |
137 | The Learnability of In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a first-of-its-kind PAC based framework for in-context learnability, and use it to provide the first finite sample complexity results for the in-context learning setup. |
Noam Wies; Yoav Levine; Amnon Shashua; |
138 | NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enable systematic research progress on 3D reconstruction from casual image captures, we propose ‘NAVI’: a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. |
Varun Jampani; Kevis-kokitsi Maninis; Andreas Engelhardt; Arjun Karpur; Karen Truong; Kyle Sargent; Stefan Popov; Andre Araujo; Ricardo Martin Brualla; Kaushal Patel; Daniel Vlasic; Vittorio Ferrari; Ameesh Makadia; Ce Liu; Yuanzhen Li; Howard Zhou; |
139 | Text Alignment Is An Efficient Unified Model for Massive NLP Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks involving text entailment, similarity, question answering (and answerability), factual consistency, and so forth. |
Yuheng Zha; Yichi Yang; Ruichen Li; Zhiting Hu; |
140 | Meta-in-context Learning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself. |
Julian Coda-Forno; Marcel Binz; Zeynep Akata; Matt Botvinick; Jane Wang; Eric Schulz; |
141 | Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, access to LLMs today is largely restricted to black-box text generation APIs; raw runtimes measured through this interface do not satisfy these desiderata: model providers can implement software and hardware optimizations orthogonal to the model, and shared infrastructure introduces performance contention. We propose a new metric for inference efficiency that puts models on equal footing as though they were served on uniform hardware and software and without performance contention. |
Deepak Narayanan; Keshav Santhanam; Peter Henderson; Rishi Bommasani; Tony Lee; Percy Liang; |
142 | Lexinvariant Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: First, we prove that we can construct a lexinvariant LM to converge to the true language model at a uniform rate that is polynomial in terms of the context length, with a constant factor that is sublinear in the vocabulary size. Second, to build a lexinvariant LM, we simply encode tokens using random Gaussian vectors, such that each token maps to the same representation within each sequence but different representations across sequences. (The random-embedding construction is sketched below.) |
Qian Huang; Eric Zelikman; Sarah Chen; Yuhuai Wu; Gregory Valiant; Percy Liang; |
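The per-sequence random embedding is simple enough to sketch directly; the dimension and scaling below are illustrative.

```python
import torch

def lexinvariant_embed(token_ids, dim=64):
    # Each distinct token id in *this* sequence gets a fresh Gaussian
    # vector: identity is stable within a sequence but carries no
    # information across sequences.
    table = {int(t): torch.randn(dim) / dim ** 0.5 for t in token_ids.unique()}
    return torch.stack([table[int(t)] for t in token_ids])

seq = torch.tensor([5, 3, 5, 7, 3])
emb = lexinvariant_embed(seq)
print(emb.shape)                       # torch.Size([5, 64])
print(torch.allclose(emb[0], emb[2]))  # True: repeated token, same vector
```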
143 | Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce *ecosystem-level analysis*: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. |
Connor Toups; Rishi Bommasani; Kathleen Creel; Sarah Bana; Dan Jurafsky; Percy Liang; |
144 | Holistic Evaluation of Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing evaluations primarily focus on image-text alignment and quality. To address this limitation, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). |
Tony Lee; Michihiro Yasunaga; Chenlin Meng; Yifan Mai; Joon Sung Park; Agrim Gupta; Yunzhi Zhang; Deepak Narayanan; Hannah Teufel; Marco Bellagente; Minguk Kang; Taesung Park; Jure Leskovec; Jun-Yan Zhu; Fei-Fei Li; Jiajun Wu; Stefano Ermon; Percy Liang; |
145 | INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. |
Shih-Cheng Huang; Zepeng Huo; Ethan Steinberg; Chia-Chun Chiang; Curtis Langlotz; Matthew Lungren; Serena Yeung; Nigam Shah; Jason Fries; |
146 | Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It remains a challenging task since (1) it is hard to distinguish concealed objects from the background due to the intrinsic similarity and (2) the sparsely-annotated training data only provide weak supervision for model learning. In this paper, we propose a new WSCOS method to address these two challenges. |
Chunming He; Kai Li; Yachao Zhang; Guoxia Xu; Longxiang Tang; Yulun Zhang; Zhenhua Guo; Xiu Li; |
147 | Data Portraits: Recording Foundation Model Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even while these models are now key in AI system building, it can be difficult to answer the straightforward question: has the model already encountered a given example during training? We therefore propose widespread adoption of Data Portraits: artifacts that record training data and allow for downstream inspection. |
Marc Marone; Benjamin Van Durme; |
148 | MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. |
Zeyuan Ma; Hongshu Guo; Jiacheng Chen; Zhenrui Li; Guojun Peng; Yue-Jiao Gong; Yining Ma; Zhiguang Cao; |
149 | Mirror Diffusion Models for Constrained and Watermarked Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. |
Guan-Horng Liu; Tianrong Chen; Evangelos Theodorou; Molei Tao; |
150 | Towards Label-free Scene Understanding By Vision Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the potential of vision foundation models in enabling networks to comprehend 2D and 3D worlds without labelled data. |
Runnan Chen; Youquan Liu; Lingdong Kong; Nenglun Chen; Xinge Zhu; Yuexin Ma; Tongliang Liu; Wenping Wang; |
151 | Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a systematic and efficient method to expedite the annotation process for organ segmentation. |
Chongyu Qu; Tiezheng Zhang; Hualin Qiao; Jie Liu; Yucheng Tang; Alan Yuille; Zongwei Zhou; |
152 | Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. |
Ibrahim Alabdulmohsin; Lucas Beyer; Alexander Kolesnikov; Xiaohua Zhai; |
153 | The Impact of Positional Encoding on Length Generalization in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches including Absolute Position Embedding (APE), T5’s Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE). |
Amirhossein Kazemnejad; Inkit Padhi; Karthikeyan Natesan Ramamurthy; Payel Das; Siva Reddy; |
154 | Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called *dual pseudo training* (DPT), built upon strong semi-supervised learners and diffusion models. |
Zebin You; Yong Zhong; Fan Bao; Jiacheng Sun; Chongxuan Li; Jun Zhu; |
155 | VPP: Efficient Universal 3D Generation Via Voxel-Point Progressive Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the characteristics of different representations, we propose VPP, a voxel-point progressive representation for both efficient and universal 3D generation. |
Zekun Qi; Muzhou Yu; Runpei Dong; Kaisheng Ma; |
156 | Large Language Models Implicitly Learn to Straighten Neural Sentence Trajectories to Construct A Predictive Representation of Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We quantify straightness using a 1-dimensional curvature metric, and provide support for the trajectory straightening hypothesis across four results: i) In trained models, the curvature progressively decreases from the first to the middle layers of the network. (One reading of the curvature metric is sketched below.) |
Eghbal Hosseini; Evelina Fedorenko; |
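One plausible reading of the curvature metric, sketched on random hidden states: the average turning angle between successive difference vectors of the trajectory, so straighter trajectories score closer to zero.

```python
import torch

def mean_curvature(trajectory):
    diffs = trajectory[1:] - trajectory[:-1]  # steps between hidden states
    cos = torch.nn.functional.cosine_similarity(diffs[:-1], diffs[1:], dim=-1)
    return torch.acos(cos.clamp(-1.0, 1.0)).mean()  # mean turning angle

states = torch.randn(10, 32)  # 10 token positions, hidden size 32
print(mean_curvature(states))
```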
157 | (Un)interpretability of Transformers: A Case Study with Dyck Grammars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, through a combination of theoretical results and carefully controlled experiments on synthetic data, we take a critical view of methods that exclusively focus on individual parts of the model, rather than consider the network as a whole. |
Kaiyue Wen; Yuchen Li; Bingbin Liu; Andrej Risteski; |
158 | Learning in The Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We consider the learning of a single-index target function $f_*: \mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $f_*(\boldsymbol{x}) = … |
Jimmy Ba; Murat Erdogdu; Taiji Suzuki; Zhichao Wang; Denny Wu; |
159 | Meet in The Middle: A New Pre-training Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce “Meet in the Middle” (MIM), a new pre-training paradigm that improves data efficiency by training in two directions, left-to-right and right-to-left, and encouraging the respective models to agree on their token distribution for each position. |
Anh Nguyen; Nikos Karampatziakis; Weizhu Chen; |
160 | Swap Agnostic Learning, or Characterizing Omniprediction Via Multicalibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce and study the notion of Swap Agnostic Learning. |
Parikshit Gopalan; Michael Kim; Omer Reingold; |
161 | DataComp: In Search of The Next Generation of Multimodal Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. |
Samir Yitzhak Gadre; Gabriel Ilharco; Alex Fang; Jonathan Hayase; Georgios Smyrnis; Thao Nguyen; Ryan Marten; Mitchell Wortsman; Dhruba Ghosh; Jieyu Zhang; Eyal Orgad; Rahim Entezari; Giannis Daras; Sarah Pratt; Vivek Ramanujan; Yonatan Bitton; Kalyani Marathe; Stephen Mussmann; Richard Vencu; Mehdi Cherti; Ranjay Krishna; Pang Wei Koh; Olga Saukh; Alexander Ratner; Shuran Song; Hannaneh Hajishirzi; Ali Farhadi; Romain Beaumont; Sewoong Oh; Alex Dimakis; Jenia Jitsev; Yair Carmon; Vaishaal Shankar; Ludwig Schmidt; |
162 | Benchmarking Distribution Shift in Tabular Data with TableShift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a consequence, the robustness of tabular models to distribution shift is poorly understood. To address this issue, we introduce TableShift, a distribution shift benchmark for tabular data. |
Josh Gardner; Zoran Popovic; Ludwig Schmidt; |
163 | GenEval: An Object-focused Framework for Evaluating Text-to-image Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce GenEval, an object-focused framework to evaluate compositional image properties such as object co-occurrence, position, count, and color. |
Dhruba Ghosh; Hannaneh Hajishirzi; Ludwig Schmidt; |
164 | AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). |
Tong Wu; Zhihao Fan; Xiao Liu; Yeyun Gong; Yelong Shen; Jian Jiao; Hai-Tao Zheng; Juntao Li; Zhongyu Wei; Jian Guo; Nan Duan; Weizhu Chen; |
165 | On The Connection Between Pre-training Data Diversity and Fine-tuning Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-training has been widely adopted in deep learning to improve model performance, especially when the training data for a target task is limited. In our work, we seek to understand the implications of this training strategy on the generalization properties of downstream models. |
Vivek Ramanujan; Thao Nguyen; Sewoong Oh; Ali Farhadi; Ludwig Schmidt; |
166 | Improving Multimodal Datasets with Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work focuses on caption quality as one major source of noise, and studies the effectiveness of generated captions in increasing the utility of web-scraped datapoints with nondescript text. |
Thao Nguyen; Samir Yitzhak Gadre; Gabriel Ilharco; Sewoong Oh; Ludwig Schmidt; |
167 | Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose ARCO, a semi-supervised contrastive learning (CL) framework with stratified group theory for medical image segmentation. |
Chenyu You; Weicheng Dai; Yifei Min; Fenglin Liu; David Clifton; S. Kevin Zhou; Lawrence Staib; James Duncan; |
168 | Focused Transformer: Contrastive Training for Context Scaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We pinpoint a key challenge, referred to as the distraction issue, where keys associated with distinct semantic values may overlap, making them challenging to differentiate. To address this issue, we propose the Focused Transformer (FoT), a method that utilizes a training process inspired by contrastive learning. |
Szymon Tworkowski; Konrad Staniszewski; Mikołaj Pacek; Yuhuai Wu; Henryk Michalewski; Piotr Miłoś; |
169 | EmbodiedGPT: Vision-Language Pre-Training Via Embodied Chain of Thought Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. |
Yao Mu; Qinglong Zhang; Mengkang Hu; Wenhai Wang; Mingyu Ding; Jun Jin; Bin Wang; Jifeng Dai; Yu Qiao; Ping Luo; |
170 | Any-to-Any Generation Via Composable Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities. |
Zineng Tang; Ziyi Yang; Chenguang Zhu; Michael Zeng; Mohit Bansal; |
171 | Generator Born from Classifier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make a bold attempt toward an ambitious task: given a pre-trained classifier, we aim to reconstruct an image generator, without relying on any data samples. |
Runpeng Yu; Xinchao Wang; |
172 | How2comm: Communication-Efficient and Collaboration-Pragmatic Multi-Agent Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the advancements in previous efforts, challenges remain due to various noises in the perception procedure, including communication redundancy, transmission delay, and collaboration heterogeneity. To tackle these issues, we propose How2comm, a collaborative perception framework that seeks a trade-off between perception performance and communication bandwidth. |
Dingkang Yang; Kun Yang; Yuzheng Wang; Jing Liu; Zhi Xu; Peng Zhai; Lihua Zhang; Rongbin Yin; |
173 | Improving CLIP Training with Language Rewrites Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Language augmented CLIP (LaCLIP), a simple yet highly effective approach to enhance CLIP training through language rewrites. |
Lijie Fan; Dilip Krishnan; Phillip Isola; Dina Katabi; Yonglong Tian; |
174 | From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Pre-trained Machine Reader (PMR), a novel method for retrofitting pre-trained masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data. |
Weiwen Xu; Xin Li; Wenxuan Zhang; Meng Zhou; Wai Lam; Luo Si; Lidong Bing; |
175 | PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present PrimDiffusion, the first diffusion-based framework for 3D human generation. |
Zhaoxi Chen; Fangzhou Hong; Haiyi Mei; Guangcong Wang; Lei Yang; Ziwei Liu; |
176 | FELM: Benchmarking Factuality Evaluation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as FELM. |
Shiqi Chen; Yiran Zhao; Jinghan Zhang; I-Chun Chern; Siyang Gao; Pengfei Liu; Junxian He; |
177 | Training-Free Composition of Parameter-Efficient Modules with Arithmetic Operation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module capabilities. |
Jinghan Zhang; Shiqi Chen; Junteng Liu; Junxian He; |
178 | DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These event markers may not be readily available or could be challenging to acquire during real-time inference, and the sequence of eye fixations may not align with the order of spoken words. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks. |
Yiqun Duan; Charles Chau; Zhen Wang; Yu-Kai Wang; Chin-teng Lin; |
179 | Textually Pretrained Speech Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose TWIST, a method for training SpeechLMs using a warm start from pretrained textual language models. |
Michael Hassid; Tal Remez; Tu Anh Nguyen; Itai Gat; Alexis CONNEAU; Felix Kreuk; Jade Copet; Alexandre Defossez; Gabriel Synnaeve; Emmanuel Dupoux; Roy Schwartz; Yossi Adi; |
180 | EvoPrompting: Language Models for Code-Level Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the recent impressive accomplishments of language models (LMs) for code generation, we explore the use of LMs as general adaptive mutation and crossover operators for an evolutionary neural architecture search (NAS) algorithm. (One LM-driven mutation step is sketched below.) |
Angelica Chen; David Dohan; David So; |
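One step of the loop might look like the following, with a hypothetical text-completion callable standing in for the LM; the prompt format and naming are illustrative, not the paper's.

```python
def evoprompt_child(llm, parent_programs, task_description):
    # Crossover/mutation via prompting: show the LM the parents and ask
    # for a child program; fitness evaluation and selection happen outside.
    prompt = task_description + "\n\n"
    for i, code in enumerate(parent_programs):
        prompt += f"# Parent program {i + 1}:\n{code}\n\n"
    prompt += "# Child program combining the strengths of the parents:\n"
    return llm(prompt)

fake_llm = lambda prompt: "def model(): ..."  # stand-in completion function
print(evoprompt_child(fake_llm,
                      ["def model_a(): ...", "def model_b(): ..."],
                      "Write a small image classifier."))
```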
181 | Mitigating Test-Time Bias for Fair Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: So motivated, we introduce a straightforward technique, Post-hoc Bias Mitigation (PBM), that post-processes the outputs from the pre-trained vision-language model. |
Fanjie Kong; Shuai Yuan; Weituo Hao; Ricardo Henao; |
182 | Red Teaming Deep Neural Networks with Feature Synthesis Tools Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key insight is that we can train models that respond to specific triggers (e.g., a specific patch inserted into an image) with specific outputs (i.e. a label) and then evaluate interpretability tools based on whether they help humans identify these triggers. |
Stephen Casper; Tong Bu; Yuxiao Li; Jiawei Li; Kevin Zhang; Kaivalya Hariharan; Dylan Hadfield-Menell; |
183 | UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct. |
Wenliang Zhao; Lujia Bai; Yongming Rao; Jie Zhou; Jiwen Lu; |
184 | (S)GD Over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2$-layer diagonal linear networks. (The diagonal reparametrization is sketched below.) |
Mathieu Even; Scott Pesme; Suriya Gunasekar; Nicolas Flammarion; |
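The model class itself is tiny and worth seeing: a 2-layer diagonal linear network parameterizes a linear predictor through an elementwise product, and it is this reparametrization that shapes the implicit bias the paper studies.

```python
import torch

d = 16
u = torch.randn(d, requires_grad=True)
v = torch.randn(d, requires_grad=True)
x = torch.randn(d)

f = torch.dot(u * v, x)  # effective linear predictor with weights u * v
f.backward()
print(torch.allclose(u.grad, v * x))  # True: u's gradient is gated by v
```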
185 | RAPHAEL: Text-to-Image Generation Via Large Mixture of Diffusion Paths Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images, which accurately portray the text prompts, encompassing multiple nouns, adjectives, and verbs. |
Zeyue Xue; Guanglu Song; Qiushan Guo; Boxiao Liu; Zhuofan Zong; Yu Liu; Ping Luo; |
186 | Controlling Text-to-Image Diffusion By Orthogonal Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. |
Zeju Qiu; Weiyang Liu; Haiwen Feng; Yuxuan Xue; Yao Feng; Zhen Liu; Dan Zhang; Adrian Weller; Bernhard Schölkopf; |
187 | ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild. |
Chun-Han Yao; Amit Raj; Wei-Chih Hung; Michael Rubinstein; Yuanzhen Li; Ming-Hsuan Yang; Varun Jampani; |
188 | Continuous-Time Functional Diffusion Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. |
Giulio Franzese; Giulio Corallo; Simone Rossi; Markus Heinonen; Maurizio Filippone; Pietro Michiardi; |
189 | Window-Based Distribution Shift Detection for Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we study the case of monitoring the healthy operation of a deep neural network (DNN) receiving a stream of data, with the aim of detecting input distributional deviations that potentially degrade the quality of the network’s predictions. |
Guy Bar Shalom; Yonatan Geifman; Ran El-Yaniv; |
190 | SEGA: Instructing Text-to-Image Models Using Semantic Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. |
Manuel Brack; Felix Friedrich; Dominik Hintersdorf; Lukas Struppek; Patrick Schramowski; Kristian Kersting; |
191 | Recommender Systems with Generative Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates directly. |
Shashank Rajput; Nikhil Mehta; Anima Singh; Raghunandan Hulikal Keshavan; Trung Vu; Lukasz Heldt; Lichan Hong; Yi Tay; Vinh Tran; Jonah Samost; Maciej Kula; Ed Chi; Mahesh Sathiamoorthy; |
192 | Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training By Diminishing Bias Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel framework named Unifying Cross-Lingual Medical Vision-Language Pre-Training (Med-UniC), designed to integrate multi-modal medical data from the two most prevalent languages, English and Spanish. |
Zhongwei Wan; Che Liu; Mi Zhang; Jie Fu; Benyou Wang; Sibo Cheng; Lei Ma; César Quilodrán-Casas; Rossella Arcucci; |
193 | VidChapters-7M: Video Chapters at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This important topic has been understudied due to the lack of publicly released datasets. To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total. |
Antoine Yang; Arsha Nagrani; Ivan Laptev; Josef Sivic; Cordelia Schmid; |
194 | Provable Convergence Guarantees for Black-box Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While black-box variational inference is widely used, there is no proof that its stochastic optimization succeeds. We suggest this is due to a theoretical gap in existing stochastic optimization proofs, namely the challenge of gradient estimators with unusual noise bounds, and a composite non-smooth objective. |
Justin Domke; Robert Gower; Guillaume Garrigos; |
195 | PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces PIXIU, a comprehensive framework including the first financial LLM based on fine-tuning LLaMA with instruction data, the first instruction data with 128K data samples to support the fine-tuning, and an evaluation benchmark with 8 tasks and 15 datasets. |
Qianqian Xie; Weiguang Han; Xiao Zhang; Yanzhao Lai; Min Peng; Alejandro Lopez-Lira; Jimin Huang; |
196 | Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of planning in Minecraft, a popular, democratized yet challenging open-ended environment for developing multi-task embodied agents. |
Zihao Wang; Shaofei Cai; Guanzhou Chen; Anji Liu; Xiaojian (Shawn) Ma; Yitao Liang; |
197 | Learning Generalizable Agents Via Saliency-guided Features Decorrelation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. |
Sili Huang; Yanchao Sun; Jifeng Hu; Siyuan Guo; Bo Yang; Hechang Chen; Yi Chang; Lichao Sun; |
198 | Decision Stacks: Flexible Reinforcement Learning Via Modular Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into 3 generative modules. |
Siyan Zhao; Aditya Grover; |
199 | Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we analyze the sufficient conditions to guarantee fairness (i.e., low demographic parity) for the target dataset, including fairness for the source dataset, and low prediction difference between the source and target dataset for each sensitive attribute group. Motivated by these sufficient conditions, we propose robust fairness regularization (RFR) by considering the worst case within the model weight perturbation ball for each sensitive attribute group. |
Zhimeng Jiang; Xiaotian Han; Hongye Jin; Guanchu Wang; Rui Chen; Na Zou; Xia Hu; |
200 | Are Diffusion Models Vision-And-Language Reasoners? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we make two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. |
Benno Krojer; Elinor Poole-Dayan; Vikram Voleti; Chris Pal; Siva Reddy; |
201 | Fast Optimal Locally Private Mean Estimation Via Random Projections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new algorithmic framework, namely ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a $1+o(1)$-factor. |
Hilal Asi; Vitaly Feldman; Jelani Nelson; Huy Nguyen; Kunal Talwar; |
202 | A Metadata-Driven Approach to Understand Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a *metadata-driven* approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. |
Ting Wei Li; Qiaozhu Mei; Jiaqi Ma; |
203 | An Information Theory Perspective on Variance-Invariance-Covariance Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an information-theoretic perspective on the VICReg objective. |
Ravid Shwartz-Ziv; Randall Balestriero; Kenji Kawaguchi; Tim G. J. Rudner; Yann LeCun; |
204 | A Generative Model of The Hippocampal Formation Trained with Theta Driven Local Learning Rules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce a biologically plausible model of the hippocampal formation tantamount to a Helmholtz machine that we apply to a temporal stream of inputs. |
Tom M George; Kimberly Stachenfeld; Caswell Barry; Claudia Clopath; Tomoki Fukai; |
205 | Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts. |
Shengqiong Wu; Hao Fei; Hanwang Zhang; Tat-Seng Chua; |
206 | DiffComplete: Diffusion-based Generative 3D Shape Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new diffusion-based approach for shape completion on 3D range scans. |
Ruihang Chu; Enze Xie; Shentong Mo; Zhenguo Li; Matthias Niessner; Chi-Wing Fu; Jiaya Jia; |
207 | Real-World Image Variation By Aligning Diffusion Inversion Chain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our investigation uncovers that this domain gap originates from a distribution gap between the latents of different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes diffusion models to generate image variations from a single image exemplar. |
Yuechen Zhang; Jinbo Xing; Eric Lo; Jiaya Jia; |
208 | Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The recent proliferation of large-scale text-to-image models has led to growing concerns that such models may be misused to generate harmful, misleading, and inappropriate content. Motivated by this issue, we derive a technique inspired by continual learning to selectively forget concepts in pretrained deep generative models. |
Alvin Heng; Harold Soh; |
209 | Object-Centric Slot Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach. |
Jindong Jiang; Fei Deng; Gautam Singh; Sungjin Ahn; |
210 | Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, in this paper, we propose a new framework to leverage neural methods to answer complex logical queries based on an EVKG, which can satisfy not only traditional first-order logic constraints but also implicit logical constraints over eventualities concerning their occurrences and orders. |
Jiaxin Bai; Xin Liu; Weiqi Wang; Chen Luo; Yangqiu Song; |
211 | 4D Panoptic Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve PSG-4D, we propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs via a relation component. |
Jingkang Yang; Jun Cen; Wenxuan Peng; Shuai Liu; Fangzhou Hong; Xiangtai Li; Kaiyang Zhou; Qifeng Chen; Ziwei Liu; |
212 | How Far Can Camels Go? Exploring The State of Instruction Tuning on Open Resources Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. |
Yizhong Wang; Hamish Ivison; Pradeep Dasigi; Jack Hessel; Tushar Khot; Khyathi Chandu; David Wadden; Kelsey MacMillan; Noah Smith; Iz Beltagy; Hannaneh Hajishirzi; |
213 | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This issue mainly relates to the fact that CLIP is trained with image-level supervision. To alleviate this issue, we propose a simple yet effective method, named Mask-aware Fine-tuning (MAFT). |
Siyu Jiao; Yunchao Wei; Yaowei Wang; Yao Zhao; Humphrey Shi; |
214 | Quantizable Transformers: Removing Outliers By Helping Attention Heads Do Nothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve the exact zeros needed in the attention matrix for a no-update, the input to the softmax is pushed to be larger and larger during training, causing outliers in other parts of the network. Based on these observations, we propose two simple (independent) modifications to the attention mechanism – _clipped softmax_ and _gated attention_. (The clipped softmax is sketched below.) |
Yelysei Bondarenko; Markus Nagel; Tijmen Blankevoort; |
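The clipped softmax admits a one-line sketch: stretch the softmax output past [0, 1] and clip, so exact zeros (a "no-update" head) become reachable without driving pre-softmax logits to extremes. The stretch constants below are illustrative, not the paper's tuned values.

```python
import torch

def clipped_softmax(x, zeta=1.1, gamma=-0.1, dim=-1):
    # (zeta - gamma) * softmax + gamma stretches outputs beyond [0, 1];
    # clamping then makes exact 0 and 1 attainable.
    return torch.clamp((zeta - gamma) * torch.softmax(x, dim=dim) + gamma, 0.0, 1.0)

scores = torch.randn(2, 4, 8, 8)  # (batch, heads, queries, keys)
attn = clipped_softmax(scores)
print(bool((attn == 0.0).any()))  # exact zeros become attainable
```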
215 | Sharpness-Aware Minimization Leads to Low-Rank Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While its generalization improvement is well-known and is the primary motivation, we uncover an additional intriguing effect of SAM: reduction of the feature rank which happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, vision transformers and for different objectives such as regression, classification, language-image contrastive training. |
Maksym Andriushchenko; Dara Bahri; Hossein Mobahi; Nicolas Flammarion; |
216 | Unified Segment-to-Segment Framework for Simultaneous Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner. |
Shaolei Zhang; Yang Feng; |
217 | A Hierarchical Spatial Transformer for Large Numbers of Point Samples in Continuous Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There are also works related to operator learning on numerical simulations in the continuous space, but these methods often do not address the hierarchical spatial representation on irregular points. To fill this gap, this paper proposes a new hierarchical spatial transformer model for a large number of irregular point samples in continuous space. |
Wenchong He; Zhe Jiang; Tingsong Xiao; |
218 | What Indeed Can GPT Models Do in Chemistry? A Comprehensive Benchmark on Eight Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, rather than pursuing state-of-the-art performance, we aim to evaluate the capabilities of LLMs in a wide range of tasks across the chemistry domain. |
Taicheng Guo; Kehan Guo; Bozhao Nan; Zhenwen Liang; Zhichun Guo; Nitesh Chawla; Olaf Wiest; Xiangliang Zhang; |
219 | WalkLM: A Uniform Language Model Fine-tuning Framework for Attributed Graph Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a fundamentally different approach than GNNs, to simultaneously achieve deep joint modeling of complex attributes and flexible structures of real-world graphs and obtain unsupervised generic graph representations that are not limited to specific downstream predictions. |
Yanchao Tan; Zihao Zhou; Hang Lv; Weiming Liu; Carl Yang; |
220 | Emergent Correspondence from Image Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that correspondence emerges in diffusion models without any explicit supervision. |
Luming Tang; Menglin Jia; Qianqian Wang; Cheng Perng Phoo; Bharath Hariharan; |
221 | MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To ease image generation, we propose MultiFusion that allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. |
Marco Bellagente; Hannah Teufel; Manuel Brack; Björn Deiseroth; Felix Friedrich; Constantin Eichenberg; Andrew Dai; Robert Baldock; Souradeep Nanda; Koen Oostermeijer; Andres Felipe Cruz-Salinas; Patrick Schramowski; Kristian Kersting; Samuel Weinbach; |
222 | Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes POMP, a prompt pre-training method for vision-language models. |
Shuhuai Ren; Aston Zhang; Yi Zhu; Shuai Zhang; Shuai Zheng; Mu Li; Alexander Smola; Xu Sun; |
223 | Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a semi-offline evaluation framework as an intermediate step between offline and online evaluation, where human users provide annotations of unobserved counterfactual trajectories. |
Shengpu Tang; Jenna Wiens; |
224 | PyNeRF: Pyramidal Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple modification to grid-based models by training model heads at different spatial grid resolutions. |
Haithem Turki; Michael Zollhöfer; Christian Richardt; Deva Ramanan; |
225 | Mixture Weight Estimation and Model Prediction in Multi-source Multi-target Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of learning a model from multiple sources with the goal of performing well on a new target distribution. |
Yuyang Deng; Ilja Kuzborskij; Mehrdad Mahdavi; |
226 | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Often, a large amount of transient state information, referred to as the $\mathsf{KV}$ $\mathsf{cache}$, is stored in GPU memory in addition to model parameters, scaling linearly with the sequence length and batch size. In this paper, we introduce a novel approach for implementing the $\mathsf{KV}$ $\mathsf{cache}$ which significantly reduces its memory footprint. |
Zhenyu Zhang; Ying Sheng; Tianyi Zhou; Tianlong Chen; Lianmin Zheng; Ruisi Cai; Zhao Song; Yuandong Tian; Christopher Ré; Clark Barrett; Zhangyang Atlas Wang; Beidi Chen; |
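For scale, the cache named in the highlight above is easy to size. A back-of-the-envelope sketch, assuming standard multi-head attention and fp16 storage; the configuration below is illustrative, not taken from the paper:

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, batch, bytes_per=2):
    """Memory held by cached keys and values across all layers.

    The leading factor 2 counts the K and V tensors; bytes_per=2
    assumes fp16. Note the linear scaling in seq_len and batch.
    """
    return 2 * n_layers * n_heads * head_dim * seq_len * batch * bytes_per

# A 7B-class decoder (32 layers, 32 heads, head_dim 128) at 2k context:
print(kv_cache_bytes(32, 32, 128, seq_len=2048, batch=8) / 2**30, "GiB")  # 8.0 GiB
```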
227 | Distributed Personalized Empirical Risk Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To learn personalized models at scale, we propose a distributed algorithm that replaces the standard model averaging with model shuffling to simultaneously optimize PERM objectives for all devices. |
Yuyang Deng; Mohammad Mahdi Kamani; Pouria Mahdavinia; Mehrdad Mahdavi; |
228 | RoboDepth: Robust Out-of-Distribution Depth Estimation Under Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Common corruptions, however, tend to occur in practical scenarios, especially for safety-critical applications like autonomous driving. To fill in this gap, we present a comprehensive robustness test suite dubbed RoboDepth consisting of 18 corruptions from three categories: i) weather and lighting conditions; ii) sensor failure and movement; and iii) data processing issues. |
Lingdong Kong; Shaoyuan Xie; Hanjiang Hu; Lai Xing Ng; Benoit Cottereau; Wei Tsang Ooi; |
229 | Real-World 3D Object Inverse Rendering Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new real-world 3D Object inverse Rendering Benchmark, dubbed 3D-ORB. |
Zhengfei Kuang; Yunzhi Zhang; Hong-Xing Yu; Samir Agarwala; Shangzhe Wu; Jiajun Wu; |
230 | What You See Is What You Read? Improving Text-Image Alignment Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study methods for automatic text-image alignment evaluation. |
Michal Yarom; Yonatan Bitton; Soravit Changpinyo; Roee Aharoni; Jonathan Herzig; Oran Lang; Eran Ofek; Idan Szpektor; |
231 | High-Fidelity Audio Compression with Improved RVQGAN Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 kHz audio into tokens at just 8 kbps bandwidth. |
Rithesh Kumar; Prem Seetharaman; Alejandro Luebs; Ishaan Kumar; Kundan Kumar; |
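The ~90x figure above is consistent with simple arithmetic against uncompressed PCM, assuming 16-bit mono audio (an assumption; the paper's reference format may differ):

$$44{,}100\ \tfrac{\text{samples}}{\text{s}} \times 16\ \text{bit} = 705.6\ \text{kbps}, \qquad \frac{705.6\ \text{kbps}}{8\ \text{kbps}} \approx 88\times.$$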
232 | AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works mainly focus on the self-supervised pre-training pipeline, performing pre-training and fine-tuning on the same benchmark, which makes it difficult to attain performance scalability and cross-dataset applicability for the pre-training checkpoint. In this paper, for the first time, we are committed to building a large-scale pre-training point-cloud dataset with diverse data distribution, while learning generalizable representations from such a diverse pre-training dataset. |
Jiakang Yuan; Bo Zhang; Xiangchao Yan; Botian Shi; Tao Chen; Yikang LI; Yu Qiao; |
233 | Benchmarking Large Language Models on CMExam – A Comprehensive Chinese Medical Exam Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. |
Junling Liu; Peilin Zhou; Yining Hua; Dading Chong; Zhongyu Tian; Andrew Liu; Helin Wang; Chenyu You; Zhenhua Guo; LEI ZHU; Michael Li; |
234 | LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To advance research in LLDM, we introduce LIBERO, a novel benchmark of lifelong learning for robot manipulation. |
Bo Liu; Yifeng Zhu; Chongkai Gao; Yihao Feng; Qiang Liu; Yuke Zhu; Peter Stone; |
235 | ELDEN: Exploration Via Local Dependencies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent’s actions may change the state of one factor that, in turn, may affect the state of another factor. |
Zizhao Wang; Jiaheng Hu; Roberto Martín-Martín; Peter Stone; |
236 | Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. |
Lifan Yuan; Yangyi Chen; Ganqu Cui; Hongcheng Gao; FangYuan Zou; Xingyi Cheng; Heng Ji; Zhiyuan Liu; Maosong Sun; |
237 | Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that in stochastic optimization, models can handle noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance, which only requires storing the sub-sampled activations for calculating the gradient. |
Zirui Liu; Guanchu Wang; Shaochen (Henry) Zhong; Zhaozhuo Xu; Daochen Zha; Ruixiang Tang; Zhimeng Jiang; Kaixiong Zhou; Vipin Chaudhary; Shuai Xu; Xia Hu; |
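Such estimators build on classical column-row sampling for matrix products. A minimal NumPy sketch of that classical estimator, not the paper's winner-take-all variant:

```python
import numpy as np

def sampled_matmul(A, B, k, rng):
    """Unbiased estimate of A @ B from k sampled column-row pairs.

    Index i is drawn with probability p_i proportional to
    ||A[:, i]|| * ||B[i, :]|| (the variance-minimizing choice for this
    family), and each outer product is rescaled by 1 / (k * p_i), which
    makes the estimator's expectation equal the exact product.
    """
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=k, p=p)
    return sum(np.outer(A[:, i], B[i, :]) / (k * p[i]) for i in idx)

rng = np.random.default_rng(0)
A, B = rng.standard_normal((64, 512)), rng.standard_normal((512, 32))
est = sampled_matmul(A, B, k=256, rng=rng)
print(np.linalg.norm(est - A @ B) / np.linalg.norm(A @ B))  # shrinks as k grows
```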
238 | TART: A Plug-and-play Transformer Module for Task-agnostic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and, as a proof of concept, propose TART which generically improves an LLM’s reasoning abilities using a synthetically trained reasoning module. |
Kush Bhatia; Avanika Narayan; Christopher De Sa; Christopher Ré; |
239 | Skill-it! A Data-driven Skills Framework for Understanding and Training Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for learning skills more quickly for both continual pre-training and fine-tuning regimes, where we aim to learn multiple skills in the former and an individual skill in the latter. |
Mayee Chen; Nicholas Roberts; Kush Bhatia; Jue WANG; Ce Zhang; Frederic Sala; Christopher Ré; |
240 | A Case for Reframing Automated Medical Image Classification As Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent work has drastically reduced the cost of training segmentation networks. In light of this recent work, we reexamine the choice of training classification vs. segmentation models. |
Sarah Hooper; Mayee Chen; Khaled Saab; Kush Bhatia; Curtis Langlotz; Christopher Ré; |
241 | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we explore Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension. |
Dan Fu; Jessica Grogan; Isys Johnson; Simran Arora; Evan Sabri Eyuboglu; Armin Thomas; Benjamin Spector; Michael Poli; Atri Rudra; Christopher Ré; |
242 | Have It Your Way: Individualized Privacy Assignment for DP-SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets. |
Franziska Boenisch; Christopher Mühl; Adam Dziedzic; Roy Rinberg; Nicolas Papernot; |
243 | On Efficient Training Algorithms For Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit three algorithms: layer stacking, layer dropping, and selective backpropagation. |
Jean Kaddour; Oscar Key; Piotr Nawrot; Pasquale Minervini; Matt Kusner; |
244 | Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we make the first attempt to evaluate the robustness of NR-VQA models against adversarial attacks, and propose a patch-based random search method for black-box attack. |
Aoxiang Zhang; Yu Ran; Weixuan Tang; Yuan-Gen Wang; |
245 | Deep Language Networks: Joint Prompt Training of Stacked LLMs Using Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By viewing large language models (LLMs) as stochastic layers in a deep network, where the tunable parameters are the prompts at each layer, we chain multiple LLMs, feeding the output of the one at layer $l$ to the one at layer $l+1$, jointly training them using variational inference. |
Alessandro Sordoni; Eric Yuan; Marc-Alexandre Côté; Matheus Pereira; Adam Trischler; Ziang Xiao; Arian Hosseini; Friederike Niedtner; Nicolas Le Roux; |
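The chaining itself is a few lines; what the paper actually contributes, variational training of the per-layer prompts, is omitted here, and `call_llm` is a hypothetical stand-in for any text-completion client:

```python
def call_llm(prompt: str) -> str:
    """Placeholder completion call; swap in a real client."""
    return f"<completion of: {prompt!r}>"

def deep_language_network(x: str, layer_prompts: list[str]) -> str:
    # The output of the LLM at layer l becomes the input at layer l+1;
    # the per-layer prompt templates are the tunable "parameters".
    h = x
    for template in layer_prompts:
        h = call_llm(template.format(input=h))
    return h

print(deep_language_network(
    "Is 17 prime?",
    ["Restate the question precisely: {input}",
     "Answer step by step: {input}"],
))
```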
246 | Benchmarking Robustness to Adversarial Image Obfuscations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reach this goal, these malicious actors may obfuscate policy-violating images (e.g., overlay harmful images with carefully selected benign images or visual patterns) to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. |
Florian Stimberg; Ayan Chakrabarti; Chun-Ta Lu; Hussein Hazimeh; Otilia Stretcu; Wei Qiao; Yintao Liu; Merve Kaya; Cyrus Rashtchian; Ariel Fuxman; Mehmet Tek; Sven Gowal; |
247 | M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. |
Wenxuan Zhang; Mahani Aljunied; Chang Gao; Yew Ken Chia; Lidong Bing; |
248 | Fine-grained Expressivity of Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, we provide a theoretical framework for graph and graphon similarity combining various topological variants of classical characterizations of the $1$-WL. |
Jan Böker; Ron Levie; Ningyuan Huang; Soledad Villar; Christopher Morris; |
249 | Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop MAE as a unified, modality-agnostic SSL framework. |
Huiwon Jang; Jihoon Tack; Daewon Choi; Jongheon Jeong; Jinwoo Shin; |
250 | Auditing Fairness By Betting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models. |
Ben Chugg; Santiago Cortes-Gomez; Bryan Wilder; Aaditya Ramdas; |
251 | Molecule Joint Auto-Encoding: Self-Supervised Learning of 2D and 3D Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a pretraining method for molecule joint auto-encoding (MoleculeJAE). |
weitao Du; Jiujiu Chen; Xuecang Zhang; Zhi-Ming Ma; Shengchao Liu; |
252 | Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel and affordable solution for the effective VL adaption of LLMs, called Mixture-of-Modality Adaptation (MMA). |
Gen Luo; Yiyi Zhou; Tianhe Ren; Shengxin Chen; Xiaoshuai Sun; Rongrong Ji; |
253 | LayoutGPT: Compositional Visual Planning and Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance visual planning skills of LLMs. |
Weixi Feng; Wanrong Zhu; Tsu-Jui Fu; Varun Jampani; Arjun Akula; Xuehai He; S Basu; Xin Eric Wang; William Yang Wang; |
254 | Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks. |
Grace Luo; Lisa Dunlap; Dong Huk Park; Aleksander Holynski; Trevor Darrell; |
255 | Reverse Engineering Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Understanding the learned representation and underlying mechanisms of Self-Supervised Learning (SSL) often poses a challenge. In this paper, we ‘reverse engineer’ SSL, conducting an in-depth empirical analysis of its learned internal representations, encompassing diverse models, architectures, and hyperparameters. |
Ido Ben-Shaul; Ravid Shwartz-Ziv; Tomer Galanti; Shai Dekel; Yann LeCun; |
256 | Enhancing Motion Deblurring in High-Speed Scenes with Spike Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach that integrates the two modalities from two branches, leveraging spike streams as auxiliary visual cues for guiding deblurring in high-speed motion scenes. |
Shiyan Chen; Jiyuan Zhang; Yajing Zheng; Zhaofei Yu; Tiejun Huang; |
257 | Adaptive Online Replanning with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how we may effectively replan with diffusion models. |
Siyuan Zhou; Yilun Du; Shun Zhang; Mengdi Xu; Yikang Shen; Wei Xiao; Dit-Yan Yeung; Chuang Gan; |
258 | DiffVL: Scaling Up Soft Body Manipulation Using Vision-Language Driven Differentiable Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks, using a combination of vision and natural language given in multiple stages, in a form that can be readily leveraged by a differentiable physics solver. |
Zhiao Huang; Feng Chen; Yewen Pu; Chunru Lin; Hao Su; Chuang Gan; |
259 | In-Context Learning Unlocked for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. |
Zhendong Wang; Yifan Jiang; Yadong Lu; yelong shen; Pengcheng He; Weizhu Chen; Zhangyang Atlas Wang; Mingyuan Zhou; |
260 | Beta Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce beta diffusion with multiplicative transitions over time as a novel method for generative modeling of range-bounded data supported over disjoint regions. |
Mingyuan Zhou; Tianqi Chen; Huangjie Zheng; Zhendong Wang; |
261 | Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. |
Zhendong Wang; Yifan Jiang; Huangjie Zheng; Peihao Wang; Pengcheng He; Zhangyang Atlas Wang; Weizhu Chen; Mingyuan Zhou; |
262 | When Do Neural Nets Outperform Boosted Trees on Tabular Data? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate. |
Duncan McElfresh; Sujay Khandagale; Jonathan Valverde; Vishak Prasad C; Ganesh Ramakrishnan; Micah Goldblum; Colin White; |
263 | Ambient Diffusion: Learning Clean Distributions from Corrupted Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples. |
Giannis Daras; Kulin Shah; Yuval Dagan; Aravind Gollakota; Alex Dimakis; Adam Klivans; |
264 | Neural Oscillators Are Universal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Coupled oscillators are being increasingly used as the basis of machine learning (ML) architectures, for instance in sequence modeling, graph representation learning and in physical neural networks that are used in analog ML devices. We introduce an abstract class of *neural oscillators* that encompasses these architectures and prove that neural oscillators are universal, i.e., they can approximate any continuous and causal operator mapping between time-varying functions, to desired accuracy. |
Samuel Lanthaler; T. Konstantin Rusch; Siddhartha Mishra; |
265 | Martingale Diffusion Models: Mitigating Sampling Drift By Learning to Be Consistent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, the standard training objective via Denoising Score Matching (DSM) is only designed to optimize over non-drifted data. To train on drifted data, we propose to enforce a *Martingale* property (MP) which states that predictions of the model on its own generated data follow a Martingale, thus being consistent with the outputs that it generates. |
Giannis Daras; Yuval Dagan; Alex Dimakis; Constantinos Daskalakis; |
266 | ForecastPFN: Synthetically-Trained Zero-Shot Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a different approach and devise ForecastPFN, a zero-shot forecasting model that is trained purely on a novel synthetic data distribution. |
Samuel Dooley; Gurnoor Singh Khurana; Chirag Mohapatra; Siddartha V Naidu; Colin White; |
267 | Res-Tuning: A Flexible and Efficient Tuning Paradigm Via Unbinding Tuner from Backbone Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally *unbinds* tuners from the backbone. |
Zeyinzi Jiang; Chaojie Mao; Ziyuan Huang; Ao Ma; Yiliang Lv; Yujun Shen; Deli Zhao; Jingren Zhou; |
268 | Recurrent Hypernetworks Are Surprisingly SOTA in Meta-RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct an extensive empirical investigation and suggest a method that works without the need for additional tuning. |
Jacob Beck; Risto Vuorio; Zheng Xiong; Shimon Whiteson; |
269 | Data Quality in Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift: a high quality dataset encourages the policy to stay in distribution at test time. |
Suneel Belkhale; Yuchen Cui; Dorsa Sadigh; |
270 | Inverse Preference Learning: Preference-based RL Without A Reward Function Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead of using highly complex architectures, we develop a new and parameter-efficient algorithm, Inverse Preference Learning (IPL), specifically designed for learning from offline preference data. Our key insight is that for a fixed policy, the $Q$-function encodes all information about the reward function, effectively making them interchangeable. |
Joey Hejna; Dorsa Sadigh; |
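The interchangeability insight can be stated with the textbook Bellman relation (the paper's operator may differ in details such as entropy regularization): for a fixed policy $\pi$ and discount $\gamma$,

$$r(s,a) \;=\; Q^{\pi}(s,a) \;-\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[V^{\pi}(s')\big],$$

so a $Q$-function learned directly from preference data implicitly determines a reward, and no separate reward model is needed.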
271 | Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods struggle to map points and pixels to a shared latent space robustly, since points and pixels have very different characteristics and their patterns are learned in different manners (MLP and CNN); they also fail to construct supervision directly on the transformation, since PnP is non-differentiable, which leads to unstable registration results. To address these problems, we propose to learn a structured cross-modality latent space to represent pixel features and 3D features via a differentiable probabilistic PnP solver. |
Junsheng Zhou; Baorui Ma; Wenyuan Zhang; Yi Fang; Yu-Shen Liu; Zhizhong Han; |
272 | Understanding The Latent Space of Diffusion Models Through The Lens of Riemannian Geometry Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through analysis, we show that 1) the model focuses on low-frequency components early in the generative process and attunes to high-frequency details later; 2) at early timesteps, different samples share similar tangent spaces; and 3) the simpler the dataset a DM is trained on, the more consistent the tangent space is across timesteps. |
Yong-Hyun Park; Mingi Kwon; Jaewoong Choi; Junghyo Jo; Youngjung Uh; |
273 | Tree-Rings Watermarks: Invisible Fingerprints for Diffusion Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. |
Yuxin Wen; John Kirchenbauer; Jonas Geiping; Tom Goldstein; |
274 | Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe an easy-to-use approach to automatically optimize hard text prompts through efficient gradient-based optimization. |
Yuxin Wen; Neel Jain; John Kirchenbauer; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
275 | Penalising The Biases in Norm Regularisation Enforces Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond simple intuitions, the relation between regularising the parameters’ norm and the obtained estimators remains poorly understood theoretically. For networks with one hidden ReLU layer and one-dimensional data, this work shows that the parameters’ norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor. |
Etienne Boursier; Nicolas Flammarion; |
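In symbols, and up to normalization conventions (a paraphrase of the highlight, not a quoted theorem), the minimal representation cost of a function $f$ on $\mathbb{R}$ reads

$$\min_{\theta:\, f_{\theta} = f} \|\theta\|_2^2 \;\asymp\; \int_{\mathbb{R}} \sqrt{1+x^2}\,\mathrm{d}\lvert f'' \rvert(x),$$

so curvature placed far from the origin is charged more, which is what pushes the learned network toward sparse solutions.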
276 | Towards Revealing The Mystery Behind Chain of Thought: A Theoretical Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. |
Guhao Feng; Yuntian Gu; Haotian Ye; Bohang Zhang; Di He; Liwei Wang; |
277 | Efficient Diffusion Policies For Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges. |
Bingyi Kang; Xiao Ma; Chao Du; Tianyu Pang; Shuicheng Yan; |
278 | Paxion: Patching Action Knowledge in Video-Language Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite recent video-language models’ (VidLM) impressive performance on various benchmark tasks, our diagnostic tasks reveal their surprising deficiency (near-random performance) in action knowledge, suggesting that current models rely on object recognition abilities as a shortcut for action understanding. To remedy this, we propose a novel framework, **Paxion**, along with a new **Discriminative Video Dynamics Modeling (DVDM)** objective. |
Zhenhailong Wang; Ansel Blume; Sha Li; Genglin Liu; Jaemin Cho; Zineng Tang; Mohit Bansal; Heng Ji; |
279 | Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we uncover two factors limiting the VL models’ compositional reasoning performance. |
Sivan Doveh; Assaf Arbelle; Sivan Harary; Roei Herzig; Donghyun Kim; Paola Cascante-Bonilla; Amit Alfassy; Rameswar Panda; Raja Giryes; Rogerio Feris; Shimon Ullman; Leonid Karlinsky; |
280 | Bounce: A Reliable Bayesian Optimization Algorithm for Combinatorial and Mixed Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill the need for a reliable algorithm for combinatorial and mixed spaces, this paper proposes Bounce that relies on a novel map of various variable types into nested embeddings of increasing dimensionality. |
Leonard Papenmeier; Luigi Nardi; Matthias Poloczek; |
281 | On The Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main obstruction is the inherent unidentifiability of the PPCA model, which results from the rotational symmetry of the parameterization. To resolve this ambiguity, we propose a novel approach using quotient topological spaces and, in particular, show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space. |
Arghya Datta; Sayak Chakrabarty; |
282 | Efficient Neural Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present **M**e**L**o**D**y (**M** for music; **L** for LM; **D** for diffusion), an LM-guided diffusion model that generates music audio of state-of-the-art quality while requiring 95.7% or 99.6% fewer forward passes than MusicLM when sampling 10s or 30s of music, respectively. |
Max W. Y. Lam; Qiao Tian; Tang Li; Zongyu Yin; Siyuan Feng; Ming Tu; Yuliang Ji; Rui Xia; Mingbo Ma; Xuchen Song; Jitong Chen; Wang Yuping; Yuxuan Wang; |
283 | Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the Frequent Attribute Pattern Augmented Transformer (FAPAT) that characterizes user intents by building attribute transition graphs and matching attribute patterns. |
Xin Liu; Zheng Li; Yifan Gao; Jingfeng Yang; Tianyu Cao; Zhengyang Wang; Bing Yin; Yangqiu Song; |
284 | Distribution-Free Statistical Dispersion Control for Societal Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We initiate the study of distribution-free control of statistical dispersion measures with societal implications and propose a simple yet flexible framework that allows us to handle a much richer class of statistical functionals beyond previous work. |
Zhun Deng; Thomas Zollo; Jake Snell; Toniann Pitassi; Richard Zemel; |
285 | Synthetic Experience Replay Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent’s collected experience. |
Cong Lu; Philip Ball; Yee Whye Teh; Jack Parker-Holder; |
286 | Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive benchmark suite consisting of two semiconductor material datasets and 10 MLFF models evaluated with 6 metrics. |
Geonu Kim; Byunggook Na; Gunhee Kim; Hyuntae Cho; Seungjin Kang; Hee Sun Lee; Saerom Choi; Heejae Kim; Seungwon Lee; Yongdeok Kim; |
287 | Provable Benefits of Annealing for Estimating Normalizing Constants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: First, we show that using NCE is more efficient than the importance sampling estimator, but in the limit of infinitesimal path steps, the difference vanishes. Second, we find that using the geometric path brings down the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal distributions. |
Omar Chehab; Aapo Hyvarinen; Andrej Risteski; |
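For reference, the geometric path between a proposal $p_0$ and a target $p_1$ is the standard annealing family

$$p_t(x) \;\propto\; p_0(x)^{1-t}\, p_1(x)^{t}, \qquad t \in [0,1],$$

along which the normalizing-constant ratio is estimated one step at a time.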
288 | MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from a panorama or multi-view images given depth/pose. |
Shitao Tang; Fuyang Zhang; Jiacheng Chen; Peng Wang; Yasutaka Furukawa; |
289 | Provable Benefits of Score Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While score matching and variants thereof are popular in practice, precise theoretical understanding of the benefits and tradeoffs with maximum likelihood—both computational and statistical—are not well understood. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, and has a comparable statistical efficiency to ML, while the ML loss is intractable to optimize using a gradient-based method. |
Chirag Pabbaraju; Dhruv Rohatgi; Anish Prasad Sevekari; Holden Lee; Ankur Moitra; Andrej Risteski; |
290 | DreamHuman: Animatable 3D Avatars from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *DreamHuman*, a method to generate realistic animatable 3D human avatar models entirely from textual descriptions. |
Nikos Kolotouros; Thiemo Alldieck; Andrei Zanfir; Eduard Bazavan; Mihai Fieraru; Cristian Sminchisescu; |
291 | Deep Equilibrium Based Neural Operators for Steady-State PDEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer using a black-box root solver and differentiates analytically through this fixed point resulting in $\mathcal{O}(1)$ training memory. |
Tanya Marwah; Ashwini Pokle; J. Zico Kolter; Zachary Lipton; Jianfeng Lu; Andrej Risteski; |
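The fixed-point view is easy to make concrete on a toy problem. A Jacobi iteration for the 1D Poisson equation $-u'' = g$ with zero boundary values, whose steady state is exactly a fixed point of the update operator (a toy illustration, not the paper's FNO-DEQ):

```python
import numpy as np

def jacobi_step(u, g, h):
    """One application of the fixed-point operator for -u'' = g with
    zero Dirichlet boundaries: u_i <- (u_{i-1} + u_{i+1} + h^2 g_i) / 2."""
    v = u.copy()
    v[1:-1] = 0.5 * (u[:-2] + u[2:] + h**2 * g[1:-1])
    return v

n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
g = np.pi**2 * np.sin(np.pi * x)   # chosen so the true solution is sin(pi x)
u = np.zeros(n)
for _ in range(20000):             # Picard iteration toward the fixed point
    u = jacobi_step(u, g, h)
print(np.abs(u - np.sin(np.pi * x)).max())  # ~1e-4: discretization-level error
```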
292 | Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the uncertain temporal asynchrony and limited communication conditions that are present in traffic environments can lead to fusion misalignment and constrain the exploitation of infrastructure data. To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel cooperative detection framework. |
Haibao Yu; Yingjuan Tang; Enze Xie; Jilei Mao; Ping Luo; Zaiqing Nie; |
293 | ProteinBench: Benchmarking Protein Design on Diverse Tasks, Models, and Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ProteinBench, a new benchmark for protein design, which comprises extended protein design tasks, integrated models, and diverse evaluation metrics. |
Zhangyang Gao; Cheng Tan; Yijie Zhang; Xingran Chen; Lirong Wu; Stan Z. Li; |
294 | Localized Symbolic Knowledge Distillation for Visual Commonsense Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build a Localized Visual Commonsense model which allows users to specify (multiple) regions as input. |
Jae Sung Park; Jack Hessel; Khyathi Chandu; Paul Pu Liang; Ximing Lu; Qiuyuan Huang; Peter West; Jianfeng Gao; Ali Farhadi; Yejin Choi; |
295 | H3T: Efficient Integration of Memory Optimization and Parallelism for Large-scale Transformer Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a framework to automatically find an efficient integration of memory optimization and parallelism for High-Throughput Transformer Training (named H3T), which is rarely considered by existing efforts for training big Transformer-based models. |
Yuzhong Wang; Xu Han; Weilin Zhao; Guoyang Zeng; Zhiyuan Liu; Maosong Sun; |
296 | One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we answer these questions by fairly and comprehensively investigating the adversarial performance of 10+ popular structured pruning methods. |
Shaochen (Henry) Zhong; Zaichuan You; Jiamu Zhang; Sebastian Zhao; Zachary LeClaire; Zirui Liu; Vipin Chaudhary; Shuai Xu; Xia Hu; |
297 | Assumption Violations in Causal Discovery and The Robustness of Score Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Because causal discovery without further assumptions is an ill-posed problem, each algorithm comes with its own set of usually untestable assumptions, some of which are hard to meet in real datasets. Motivated by these considerations, this paper extensively benchmarks the empirical performance of recent causal discovery methods on observational _iid_ data generated under different background conditions, allowing for violations of the critical assumptions required by each selected approach. |
Francesco Montagna; Atalanti Mastakouri; Elias Eulig; Nicoletta Noceti; Lorenzo Rosasco; Dominik Janzing; Bryon Aragam; Francesco Locatello; |
298 | LLMScore: Unveiling The Power of Large Language Models in Text-to-Image Synthesis Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose LLMScore, a new framework that offers evaluation scores with multi-granularity compositionality. |
Yujie Lu; Xianjun Yang; Xiujun Li; Xin Eric Wang; William Yang Wang; |
299 | SpokenWOZ: A Large-Scale Speech-Text Dataset for Spoken Task-Oriented Dialogue in Multiple Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audio from human-to-human spoken conversations. |
Shuzheng Si; Wentao Ma; Haoyu Gao; Yuchuan Wu; Ting-En Lin; Yinpei Dai; Hangyu Li; Rui Yan; Fei Huang; Yongbin Li; |
300 | CP-SLAM: Collaborative Neural Point-based SLAM System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a collaborative implicit neural simultaneous localization and mapping (SLAM) system with RGB-D image sequences, which consists of complete front-end and back-end modules including odometry, loop detection, sub-map fusion, and global refinement. |
Jiarui Hu; Mao Mao; Hujun Bao; Guofeng Zhang; Zhaopeng Cui; |
301 | The Adversarial Consistency of Surrogate Risks for Binary Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the consistency of surrogate risks for robust binary classification. |
Natalie Frank; Jonathan Niles-Weed; |
302 | Unified 3D Segmenter As Prototypical Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce ProtoSEG, a prototype-based model that unifies semantic, instance, and panoptic segmentation tasks. |
Zheyun Qin; Cheng Han; Lu Xiankai; Qifan Wang; Xiushan Nie; Yilong Yin; |
303 | Where Did I Come From? Origin Attribution of AI-Generated Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods only focus on specific types of generative models and require additional procedures during the training phase or generation phase. This makes them unsuitable for pre-trained models that lack these specific operations and may impair generation quality. To address this problem, we first develop an alteration-free and model-agnostic origin attribution method via reverse-engineering on image generation models, i.e., inverting the input of a particular model for a specific image. |
Zhenting Wang; Chen Chen; Yi Zeng; Lingjuan Lyu; Shiqing Ma; |
304 | Private Estimation Algorithms for Stochastic Block Models and Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce general tools for designing efficient private estimation algorithms, in the high-dimensional settings, whose statistical guarantees almost match those of the best known non-private algorithms. |
Hongjie Chen; Vincent Cohen-Addad; Tommaso d’Orsi; Alessandro Epasto; Jacob Imola; David Steurer; Stefan Tiegel; |
305 | Tailoring Self-Attention for Graph Via Rooted Subtrees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel multi-hop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues. |
Siyuan Huang; Yunchong Song; Jiayue Zhou; Zhouhan Lin; |
306 | Learning Transformer Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a procedure for training Transformers that are mechanistically interpretable by design. |
Dan Friedman; Alexander Wettig; Danqi Chen; |
307 | Binarized Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. |
Yichi Zhang; Ankush Garg; Yuan Cao; Lukasz Lew; Behrooz Ghorbani; Zhiru Zhang; Orhan Firat; |
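For orientation, the simplest form of weight binarization keeps one sign per weight plus a single scale per tensor; a generic sketch in the spirit of the BNN literature, not the paper's BMT scheme:

```python
import numpy as np

def binarize(W):
    """Quantize W to {-alpha, +alpha}: keep only signs, with the scalar
    alpha = mean(|W|), which minimizes the squared reconstruction error
    for this one-scale form."""
    alpha = np.abs(W).mean()
    return alpha * np.sign(W)

W = np.random.default_rng(0).standard_normal((4, 4))
print(binarize(W))  # 1 bit per weight plus one fp scale per tensor
```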
308 | MADLAD-400: Monolingual And Document-Level Large Audited Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MADLAD-400, a manually audited, general domain 3T token monolingual dataset based on CommonCrawl, spanning 419 languages. |
Sneha Kudugunta; Isaac Caswell; Biao Zhang; Xavier Garcia; Derrick Xin; Aditya Kusupati; Romi Stella; Ankur Bapna; Orhan Firat; |
309 | DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose DASpeech, a non-autoregressive direct S2ST model which realizes both fast and high-quality S2ST. |
Qingkai Fang; Yan Zhou; Yang Feng; |
310 | Mixed Samples As Probes for Unsupervised Model Selection in Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MixVal, a novel target-only method that employs *mixup* to synthesize in-between target samples for validation. |
Dapeng Hu; Jian Liang; Jun Hao Liew; Chuhui Xue; Song Bai; Xinchao Wang; |
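The probe construction itself is one line of mixup; a minimal sketch of synthesizing the in-between target samples (how MixVal then scores models on these probes is a separate step, not shown):

```python
import numpy as np

def mixup_probes(x, lam, rng):
    """Pair each target sample with a random partner and interpolate:
    x_mix = lam * x_i + (1 - lam) * x_j, the standard mixup recipe."""
    j = rng.permutation(len(x))
    return lam * x + (1.0 - lam) * x[j]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))            # unlabeled target features
probes = mixup_probes(x, lam=0.55, rng=rng)
```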
311 | GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph. |
Xin Li; Dongze Lian; Zhihe Lu; Jiawang Bai; Zhibo Chen; Xinchao Wang; |
312 | Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to conventional approaches, which primarily focus on the spatial domain exploration, we propose a paradigm shift toward the Fourier domain. |
Keji He; Chenyang Si; Zhihe Lu; Yan Huang; Liang Wang; Xinchao Wang; |
313 | Don’t Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we revisit the widely accepted notion in NLP that continued pre-training of LMs on task-related texts improves the performance of fine-tuning (FT) on downstream tasks. |
Zhengxiang Shi; Aldo Lipani; |
314 | Can LLM Already Serve As A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most of the prevalent benchmarks, i.e., Spider and WikiSQL, focus on database schemas with few rows of database content, leaving a gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a BIg benchmark for laRge-scale Database grounded in text-to-SQL tasks, containing 12,751 pairs of text-to-SQL data and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. |
Jinyang Li; Binyuan Hui; Ge Qu; Binhua Li; Jiaxi Yang; Bowen Li; Bailin Wang; Bowen Qin; Ruiying Geng; Nan Huo; Xuanhe Zhou; Ma Chenhao; Guoliang Li; Kevin Chang; Fei Huang; Reynold Cheng; Yongbin Li; |
315 | Pre-Training Protein Encoder Via Siamese Sequence-Structure Diffusion Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, inspired by the success of denoising diffusion models in generative tasks, we propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling. |
Zuobai Zhang; Minghao Xu; Aurelie Lozano; Vijil Chenthamarakshan; Payel Das; Jian Tang; |
316 | Hierarchical Randomized Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. |
Yan Scholten; Jan Schuchardt; Aleksandar Bojchevski; Stephan Günnemann; |
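The stated mechanism reduces to two sampling steps; a minimal sketch for row-structured objects such as node feature matrices, with the subset size k and noise scale sigma as free parameters (illustrative names, not from the paper):

```python
import numpy as np

def partially_smooth(X, k, sigma, rng):
    """Add isotropic Gaussian noise to a uniformly random subset of k
    entities (rows) and leave all other rows untouched."""
    idx = rng.choice(len(X), size=k, replace=False)
    X = X.copy()
    X[idx] += sigma * rng.standard_normal(X[idx].shape)
    return X

rng = np.random.default_rng(0)
X = np.zeros((10, 3))                      # 10 entities, 3 features each
print(partially_smooth(X, k=3, sigma=1.0, rng=rng))
```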
317 | A Comprehensive Study on Text-attributed Graphs: Benchmarking and Rethinking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CS-TAG, a comprehensive and diverse collection of challenging benchmark datasets for TAGs. |
Hao Yan; Chaozhuo Li; Ruosong Long; Chao Yan; Jianan Zhao; Wenwen Zhuang; Jun Yin; Peiyan Zhang; Weihao Han; Hao Sun; Weiwei Deng; Qi Zhang; Lichao Sun; Xing Xie; Senzhang Wang; |
318 | SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. |
Bill Yuchen Lin; Yicheng Fu; Karina Yang; Prithviraj (Raj) Ammanabrolu; Faeze Brahman; Shiyu Huang; Chandra Bhagavatula; Yejin Choi; Xiang Ren; |
319 | Simplicity Bias in 1-Hidden Layer Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we rigorously define as well as thoroughly establish SB for *one hidden layer* neural networks in the infinite width regime. |
Depen Morwani; Jatin Batra; Prateek Jain; Praneeth Netrapalli; |
320 | StyleDrop: Text-to-Image Synthesis of Any Style Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *StyleDrop*, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. |
Kihyuk Sohn; Lu Jiang; Jarred Barber; Kimin Lee; Nataniel Ruiz; Dilip Krishnan; Huiwen Chang; Yuanzhen Li; Irfan Essa; Michael Rubinstein; Yuan Hao; Glenn Entis; Irina Blok; Daniel Castro Chin; |
321 | Information Maximization Perspective of Orthogonal Matching Pursuit with Applications to Explainable AI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our first contribution is to establish a fundamental connection between IP and OMP, where we prove that IP with random, continuous and linear queries “almost” reduces to OMP, with the difference being that IP selects atoms in order of *normalized correlation gain*. |
Aditya Chattopadhyay; Ryan Pilgrim; Rene Vidal; |
322 | Language Model Tokenizers Introduce Unfairness Between Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. |
Aleksandar Petrov; Emanuele La Malfa; Philip Torr; Adel Bibi; |
323 | Direct Diffusion Bridge Using Data Consistency for Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Several recent works have tried to alleviate this problem by building a diffusion process, directly bridging the clean and the corrupted for specific inverse problems. In this paper, we first unify these existing works under the name Direct Diffusion Bridges (DDB), showing that while motivated by different theories, the resulting algorithms only differ in the choice of parameters. |
Hyungjin Chung; Jeongsol Kim; Jong Chul Ye; |
324 | Transformer-based Planning for Symbolic Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these models primarily rely on supervised pretraining goals borrowed from text generation and overlook equation-specific objectives like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates Monte Carlo Tree Search into the transformer decoding process. |
Parshin Shojaee; Kazem Meidani; Amir Barati Farimani; Chandan Reddy; |
325 | Simplifying Neural Network Training Under Class Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, architecture size, pre-training, optimizer, and label smoothing, can achieve state-of-the-art performance without any specialized loss functions or samplers. |
Ravid Shwartz-Ziv; Micah Goldblum; Yucen Li; C. Bayan Bruss; Andrew Wilson; |
326 | The Best of Both Worlds in Network Population Games: Reaching Consensus and Convergence to Equilibrium Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that smooth fictitious play, a well-known learning model in game theory, can achieve both consensus and convergence to equilibrium in diverse multi-agent settings. |
Shuyue Hu; Harold Soh; Georgios Piliouras; |
327 | High-dimensional Asymptotics of Denoising Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. |
Hugo Cui; Lenka Zdeborová; |
328 | Clifford Group Equivariant Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Clifford Group Equivariant Neural Networks: a novel approach for constructing $\mathrm{E}(n)$-equivariant networks. |
David Ruhe; Johannes Brandstetter; Patrick Forré; |
329 | Improving Few-Shot Generalization By Exploring and Exploiting Auxiliary Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization. |
Alon Albalak; Colin Raffel; William Yang Wang; |
330 | Few-shot Generation Via Recalling The Episodic-Semantic Memory Like Human Being Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the cognitive systems of human beings, in this work we carefully design a variational structured memory module (VSM), which can simultaneously store both episodic and semantic memories to assist existing generative models in efficiently recalling memories during generation. |
Zhibin Duan; Zhiyi Lv; Chaojie Wang; Bo Chen; Bo An; Mingyuan Zhou; |
331 | Training Private Models That Know What They Don’t Know Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selective classifiers—that can abstain when they are unsure—under a differential privacy constraint. |
Stephan Rabanser; Anvith Thudi; Abhradeep Guha Thakurta; Krishnamurthy Dvijotham; Nicolas Papernot; |
332 | A Unified Conditional Framework for Diffusion-based Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a unified conditional framework based on diffusion models for image restoration. |
Yi Zhang; Xiaoyu Shi; Dasong Li; Xiaogang Wang; Jian Wang; Hongsheng Li; |
333 | Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To bridge the gap, we delve into the reasons underpinning the success of contrastive loss in CF, and propose a principled Adversarial InfoNCE loss (AdvInfoNCE), which is a variant of InfoNCE, specially tailored for CF methods. |
An Zhang; Leheng Sheng; Zhibo Cai; Xiang Wang; Tat-Seng Chua; |
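For reference, the standard InfoNCE loss that AdvInfoNCE modifies, written for a user $u$, a positive item $i^{+}$, sampled negatives $\mathcal{N}$, similarity $s$, and temperature $\tau$ (the adversarial reweighting of negatives is the paper's addition and is not shown):

$$\mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp\!\big(s(u, i^{+})/\tau\big)}{\exp\!\big(s(u, i^{+})/\tau\big) + \sum_{j \in \mathcal{N}} \exp\!\big(s(u, j)/\tau\big)}.$$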
334 | Feature-Learning Networks Are Consistent Across Widths At Realistic Scales Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. |
Nikhil Vyas; Alexander Atanasov; Blake Bordelon; Depen Morwani; Sabarish Sainathan; Cengiz Pehlevan; |
335 | UniControl: A Unified Diffusion Model for Controllable Visual Generation In The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. |
Can Qin; Shu Zhang; Ning Yu; Yihao Feng; Xinyi Yang; Yingbo Zhou; Huan Wang; Juan Carlos Niebles; Caiming Xiong; Silvio Savarese; Stefano Ermon; Yun Fu; Ran Xu; |
336 | Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. |
Tao Lei; Junwen Bai; Siddhartha Brahma; Joshua Ainslie; Kenton Lee; Yanqi Zhou; Nan Du; Vincent Zhao; Yuexin Wu; Bo Li; Yu Zhang; Ming-Wei Chang; |
337 | Two-Stage Learning to Defer with Multiple Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a two-stage scenario for learning to defer, which we argue is crucial in practice for many applications. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
338 | Structured Prediction with Stronger Consistency Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These loss functions readily lead to new structured prediction algorithms with stronger theoretical guarantees, based on their minimization. We describe efficient algorithms for minimizing several of these surrogate losses, including a new *structured logistic loss*. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
339 | $H$-Consistency Bounds: Characterization and Extensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present new and tight $H$-consistency bounds for both the family of constrained losses and that of comp-sum losses, which covers the familiar cross-entropy, or logistic loss applied to the outputs of a neural network. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
340 | Segment Anything in High Quality Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM’s original promptable design, efficiency, and zero-shot generalizability. |
Lei Ke; Mingqiao Ye; Martin Danelljan; Yifan Liu; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu; |
341 | Derandomized Novelty Detection with FDR Control Via Conformal E-values Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to make conformal inferences more stable by leveraging suitable conformal e-values instead of p-values to quantify statistical significance. |
Meshi Bashari; Amir Epstein; Yaniv Romano; Matteo Sesia; |
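The highlight builds on e-values for multiple testing. Below is a sketch of the generic e-BH procedure (Wang and Ramdas, 2022) into which such conformal e-values can be plugged for FDR control; this is background machinery, not the paper's full derandomization method.

```python
import numpy as np

def e_benjamini_hochberg(e_values: np.ndarray, alpha: float = 0.1):
    """e-BH: reject the k largest e-values, where k is the largest index
    such that the k-th largest e-value is at least n / (alpha * k).
    Returns a boolean rejection mask over the n hypotheses.
    """
    n = len(e_values)
    order = np.argsort(e_values)[::-1]        # largest e-value first
    sorted_e = e_values[order]
    ks = np.arange(1, n + 1)
    passing = sorted_e >= n / (alpha * ks)
    if not passing.any():
        return np.zeros(n, dtype=bool)
    k_star = ks[passing].max()
    reject = np.zeros(n, dtype=bool)
    reject[order[:k_star]] = True
    return reject

# Toy usage: two strong signals among nulls (null e-values have mean <= 1).
e = np.array([0.5, 0.9, 40.0, 1.2, 120.0, 0.3])
print(e_benjamini_hochberg(e, alpha=0.2))
```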
342 | Pengi: An Audio Language Model for Audio Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks. |
Soham Deshmukh; Benjamin Elizalde; Rita Singh; Huaming Wang; |
343 | Module-wise Adaptive Distillation for Multimodality Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student’s performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module, and to choose the module with a greater contribution to distill more frequently. |
Chen Liang; Jiahui Yu; Ming-Hsuan Yang; Matthew Brown; Yin Cui; Tuo Zhao; Boqing Gong; Tianyi Zhou; |
344 | Contrastive Lift: 3D Object Instance Segmentation By Slow-Fast Contrastive Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by instead leveraging 2D pre-trained models for instance segmentation. |
Yash Bhalgat; Iro Laina; João Henriques; Andrea Vedaldi; Andrew Zisserman; |
345 | FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods try to solve this problem by learning a navigation policy, which captures semantic features of the goal image and observation image independently and finally fuses them for predicting a sequence of navigation actions. However, these methods suffer from two major limitations. 1) They may miss detailed information in the goal image, and thus fail to reason about the goal location. 2) More critically, it is hard to focus on the goal-relevant regions in the observation image, because they attempt to understand the observation without goal conditioning. In this paper, we aim to overcome these limitations by designing a Fine-grained Goal Prompting (FGPrompt) method for image-goal navigation. |
Xinyu Sun; Peihao Chen; Jugang Fan; Jian Chen; Thomas Li; Mingkui Tan; |
346 | Transfer Visual Prompt Generator Across LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate VPG transferability across LLMs for the first time, aiming to reduce the cost of VPG training. |
Ao Zhang; Hao Fei; Yuan Yao; Wei Ji; Li Li; Zhiyuan Liu; Tat-Seng Chua; |
347 | Diverse Conventions for Human-AI Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. |
Bidipta Sarkar; Andy Shih; Dorsa Sadigh; |
348 | Representational Strengths and Limitations of Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we establish both positive and negative results on the representation power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embedding dimension. |
Clayton Sanford; Daniel Hsu; Matus Telgarsky; |
349 | Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this comes at the expense of the practicality and efficiency offered by prompting. Therefore, we propose to privately learn to prompt. |
Haonan Duan; Adam Dziedzic; Nicolas Papernot; Franziska Boenisch; |
350 | JourneyDB: A Benchmark for Generative Image Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Synthetic images, in comparison to real data, encompass a higher level of diversity in terms of both content and style, thereby presenting significant challenges for the models to fully grasp. In light of this challenge, we introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images within the context of multi-modal visual understanding. |
Junting Pan; Keqiang Sun; Yuying Ge; Hao Li; Haodong Duan; Xiaoshi Wu; Renrui Zhang; Aojun Zhou; Zipeng Qin; Yi Wang; Jifeng Dai; Yu Qiao; Hongsheng Li; |
351 | InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters (or encoded representations). |
Junda Wu; Tong Yu; Rui Wang; Zhao Song; Ruiyi Zhang; Handong Zhao; Chaochao Lu; Shuai Li; Ricardo Henao; |
352 | Proximity-Informed Calibration for Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the empirical findings, we propose ProCal, a plug-and-play algorithm with a theoretical guarantee to adjust sample confidence based on proximity. |
Miao Xiong; Ailin Deng; Pang Wei Koh; Jiaying Wu; Shen Li; Jianqing Xu; Bryan Hooi; |
353 | Context-TAP: Tracking Any Point Demands Context Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel framework Context-TAP, which effectively improves point trajectory accuracy by aggregating spatial context features in videos. |
Weikang Bian; Zhaoyang Huang; Xiaoyu Shi; Yitong Dong; Yijin Li; Hongsheng Li; |
354 | Generate What You Prefer: Reshaping Sequential Recommendation Via Guided Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Sequential recommendation aims to recommend the next item that matches a user’s interest, based on a sequence of items they interacted with before. Scrutinizing previous studies, we can summarize a common learning-to-classify paradigm: given a positive item, a recommender model performs negative sampling to add negative items and learns to classify whether the user prefers them or not, based on the user’s historical item sequence. |
Zhengyi Yang; Jiancan Wu; Zhicai Wang; Xiang Wang; Yancheng Yuan; Xiangnan He; |
355 | VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Variational Optimization with Conservative Estimation algorithm (VOCE) to solve the problem of optimizing safety policies from an offline dataset. |
Jiayi Guan; Guang Chen; Jiaming Ji; Long Yang; Ao Zhou; Zhijun Li; Changjun Jiang; |
356 | Consensus and Subjectivity of Skin Tone Annotation for ML Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines the subjectivity of skin tone annotation through a series of annotation experiments using the Monk Skin Tone (MST) scale (Monk, 2022), a small pool of professional photographers, and a much larger pool of trained crowdsourced annotators. |
Candice Schumann; Gbolahan Olanubi; Auriel Wright; Ellis Monk; Courtney Heldreth; Susanna Ricco; |
357 | Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Mind-Video that learns spatiotemporal information from continuous fMRI data of the cerebral cortex progressively through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation. |
Zijiao Chen; Jiaxin Qing; Juan Helen Zhou; |
358 | The Quantization Model of Neural Scaling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. |
Eric Michaud; Ziming Liu; Uzay Girit; Max Tegmark; |
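To make the "power law dropoff" concrete, here is a toy illustration: synthesize losses L(N) = C·N^(-alpha) with noise and recover the exponent with a log-log fit. The constants are made up, and this shows only the empirical regularity the paper explains, not its quantization model itself.

```python
import numpy as np

# Toy scaling-law fit: L(N) = C * N**(-alpha), recovered from noisy samples.
rng = np.random.default_rng(0)
N = np.logspace(3, 9, 20)                  # model sizes
alpha_true, C = 0.35, 50.0
L = C * N ** (-alpha_true) * np.exp(rng.normal(0, 0.02, size=N.shape))

# On log-log axes a power law is a straight line with slope -alpha.
slope, intercept = np.polyfit(np.log(N), np.log(L), deg=1)
print(f"estimated alpha = {-slope:.3f} (true {alpha_true})")
```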
359 | Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Voicebox, the most versatile text-guided generative model for speech at scale. |
Matthew Le; Bowen Shi; Apoorv Vyas; Brian Karrer; Leda Sari; Yossi Adi; Vimal Manohar; Jay Mahadeokar; Wei-Ning Hsu; |
360 | CLadder: Assessing Causal Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Much of the existing work in NLP on causality focuses on understanding commonsense causal relationships, thus failing to assess whether the model’s reasoning abilities include *formal* causal inference. To address this, we propose a new NLP task, *causal inference in natural language*, inspired by the “causal inference engine” postulated by Judea Pearl. |
Zhijing Jin; Yuen Chen; Felix Leeb; Luigi Gresele; Ojasv Kamal; Zhiheng LYU; Kevin Blin; Fernando Gonzalez Adauto; Max Kleiman-Weiner; Mrinmaya Sachan; Bernhard Schölkopf; |
361 | OpenMask3D: Open-Vocabulary 3D Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While such a representation can be directly employed to perform semantic segmentation, existing methods have limitations in their ability to handle object instances. In this work, we address this limitation, and propose OpenMask3D, which is a zero-shot approach for open-vocabulary 3D instance segmentation. |
Ayca Takmaz; Elisabetta Fedele; Robert Sumner; Marc Pollefeys; Federico Tombari; Francis Engelmann; |
362 | Block-State Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a hybrid layer named Block-State Transformer (*BST*), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. |
Jonathan Pilault; Mahan Fathi; Pierre-Luc Bacon; Chris Pal; Orhan Firat; Ross Goroshin; |
363 | BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents That Solve Fuzzy Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These comparisons serve as a fixed, preliminary leaderboard for evaluating newly-developed algorithms. To enable this comparison, we present a streamlined codebase for benchmarking new algorithms against the leaderboard. |
Stephanie Milani; Anssi Kanervisto; Karolis Ramanauskas; Sander Schulhoff; Brandon Houghton; Rohin Shah; |
364 | Random-Access Infinite Context Length for Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context. |
Amirkeivan Mohtashami; Martin Jaggi; |
365 | Unsupervised Image Denoising with Score Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new approach which is more general and applicable to complicated noise models. |
Yutong Xie; Mingze Yuan; Bin Dong; Quanzheng Li; |
366 | A Meta Learning Model for Scalable Hyperbolic Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel method, Hyperbolic GRAph Meta Learner (H-GRAM), that learns transferable information from a set of support local subgraphs, in the form of hyperbolic meta gradients and label hyperbolic protonets, to enable faster learning over a query set of new tasks dealing with disjoint subgraphs. |
Nurendra Choudhary; Nikhil Rao; Chandan Reddy; |
367 | Learning to Augment Distributions for Out-of-distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we propose Distributional-Augmented OOD Learning (DAOL), alleviating the OOD distribution discrepancy by crafting an OOD distribution set that contains all distributions in a Wasserstein ball centered on the auxiliary OOD distribution. |
Qizhou Wang; Zhen Fang; Yonggang Zhang; Feng Liu; Yixuan Li; Bo Han; |
368 | MAViL: Masked Audio-Video Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Masked Audio-Video Learners (MAViL) to learn audio-visual representations with three complementary forms of self-supervision: (1) reconstructing masked raw audio and video inputs, (2) intra-modal and inter-modal contrastive learning with masking, and (3) self-training to predict aligned and contextualized audio-video representations learned from the first two objectives. |
Po-Yao Huang; Vasu Sharma; Hu Xu; Chaitanya Ryali; Haoqi Fan; Yanghao Li; Shang-Wen Li; Gargi Ghosh; Jitendra Malik; Christoph Feichtenhofer; |
369 | Energy-based Attention for Associative Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel architecture, called the Energy transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks. |
Benjamin Hoover; Yuchen Liang; Bao Pham; Rameswar Panda; Hendrik Strobelt; Duen Horng Chau; Mohammed Zaki; Dmitry Krotov; |
370 | On Quantum Backpropagation, Information Reuse, and Cheating Measurement Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that achieving backpropagation scaling is impossible without access to multiple copies of a state. With this added ability, we introduce an algorithm with foundations in shadow tomography that matches backpropagation scaling in quantum resources while reducing classical auxiliary computational costs to open problems in shadow tomography. |
Amira Abbas; Robbie King; Hsin-Yuan Huang; William J. Huggins; Ramis Movassagh; Dar Gilboa; Jarrod McClean; |
371 | Noise-Adaptive Thompson Sampling for Linear Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study linear contextual bandits with heteroscedastic noise and propose the first noise-adaptive Thompson sampling-style algorithm that achieves a variance-dependent regret upper bound of $\widetilde O\Big(d^{3/2} + d^{3/2} \sqrt{\sum_{t=1}^T \sigma_t^2}\Big)$, where $d$ is the dimension of the context vectors and $\sigma_t^2$ is the variance of the reward in round $t$. |
Ruitu Xu; Yifei Min; Tianhao Wang; |
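For context, the sketch below is vanilla Thompson sampling for linear contextual bandits, with a Gaussian posterior over the unknown parameter. The paper's algorithm additionally adapts its exploration to the unknown per-round noise level sigma_t^2; this fixed-scale sketch does not do that, and all hyperparameters are illustrative.

```python
import numpy as np

class LinTS:
    """Vanilla linear Thompson sampling (fixed exploration scale `v`)."""
    def __init__(self, d, lam=1.0, v=0.5, seed=0):
        self.V = lam * np.eye(d)           # regularized Gram matrix
        self.b = np.zeros(d)               # reward-weighted context sum
        self.v = v
        self.rng = np.random.default_rng(seed)

    def act(self, contexts):               # contexts: (n_arms, d)
        V_inv = np.linalg.inv(self.V)
        theta = self.rng.multivariate_normal(V_inv @ self.b,
                                             self.v ** 2 * V_inv)
        return int(np.argmax(contexts @ theta))

    def update(self, x, reward):
        self.V += np.outer(x, x)
        self.b += reward * x

# Toy run against a fixed hidden parameter theta_star.
rng = np.random.default_rng(1)
theta_star = rng.normal(size=4)
agent = LinTS(d=4)
for _ in range(200):
    ctx = rng.normal(size=(10, 4))
    arm = agent.act(ctx)
    agent.update(ctx[arm], ctx[arm] @ theta_star + rng.normal(0, 0.1))
```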
372 | Disentangled Wasserstein Autoencoder for Protein Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Identifying and modifying those functional sites is critical for protein engineering but computationally non-trivial, and requires significant domain knowledge. To automate this process from a data-driven perspective, we propose a disentangled Wasserstein autoencoder with an auxiliary classifier, which isolates the function-related patterns from the rest with theoretical guarantees. |
Tianxiao Li; Hongyu Guo; Filippo Grazioli; Mark Gerstein; Martin Renqiang Min; |
373 | DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives – including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. |
Boxin Wang; Weixin Chen; Hengzhi Pei; Chulin Xie; Mintong Kang; Chenhui Zhang; Chejian Xu; Zidi Xiong; Ritik Dutta; Rylan Schaeffer; Sang Truong; Simran Arora; Mantas Mazeika; Dan Hendrycks; Zinan Lin; Yu Cheng; Sanmi Koyejo; Dawn Song; Bo Li; |
374 | EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. |
Michael Wornow; Rahul Thapa; Ethan Steinberg; Jason Fries; Nigam Shah; |
375 | Multi-Head Adapter Routing for Cross-Task Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Polytropon [Ponti et al., 2023] (Poly) jointly learns an inventory of adapters and a *routing* function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. In this paper, we investigate the role that adapter routing plays in its success and design new variants based on our findings. |
Lucas Page-Caccia; Edoardo Maria Ponti; Zhan Su; Matheus Pereira; Nicolas Le Roux; Alessandro Sordoni; |
376 | Equivariant Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce equivariant flow matching, a new training objective for equivariant CNFs that is based on the recently proposed optimal transport flow matching. |
Leon Klein; Andreas Krämer; Frank Noe; |
377 | Interpretability at Scale: Identifying Causal Mechanisms in Alpaca Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the present paper, we scale DAS significantly by replacing the remaining brute-force search steps with learned parameters — an approach we call Boundless DAS. |
Zhengxuan Wu; Atticus Geiger; Christopher Potts; Noah Goodman; |
378 | How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in A Pre-trained Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. |
Michael Hanna; Ollie Liu; Alexandre Variengien; |
379 | Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose multidimensional backtracking, an extension of the backtracking line-search to find good diagonal preconditioners for smooth convex problems. |
Frederik Kunstner; Victor Sanches Portella; Mark Schmidt; Nicholas Harvey; |
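The method extends the classical backtracking line search, sketched below in its standard scalar (Armijo) form; the paper's contribution, searching over diagonal preconditioners rather than a single step size, is not shown.

```python
import numpy as np

def backtracking_step(f, grad_f, x, beta=0.5, c=1e-4, t0=1.0):
    """One gradient step with classical Armijo backtracking: shrink the
    step size until the sufficient-decrease condition holds."""
    g = grad_f(x)
    t = t0
    while f(x - t * g) > f(x) - c * t * (g @ g):
        t *= beta
    return x - t * g

# Toy usage on a badly scaled quadratic.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x = np.array([1.0, 1.0])
for _ in range(50):
    x = backtracking_step(f, grad_f, x)
print(f"f(x) after 50 steps: {f(x):.2e}")
```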
380 | Deep Reinforcement Learning with Plasticity Injection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces plasticity injection, a minimalistic intervention that increases the network plasticity without changing the number of trainable parameters or biasing the predictions. |
Evgenii Nikishin; Junhyuk Oh; Georg Ostrovski; Clare Lyle; Razvan Pascanu; Will Dabney; Andre Barreto; |
381 | LANCE: Stress-testing Visual Models By Generating Language-guided Counterfactual Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). |
Viraj Prabhu; Sriram Yenamandra; Prithvijit Chattopadhyay; Judy Hoffman; |
382 | Domain Watermark: Effective and Harmless Dataset Copyright Protection Is Closed at Hand Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit backdoor-based dataset ownership verification (DOV), which is currently the only feasible approach to protect the copyright of open-source datasets. |
Junfeng Guo; Yiming Li; Lixu Wang; Shu-Tao Xia; Heng Huang; Cong Liu; Bo Li; |
383 | The Probability Flow ODE Is Provably Fast Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide the first polynomial-time convergence guarantees for the probabilistic flow ODE implementation (together with a corrector step) of score-based generative modeling. |
Sitan Chen; Sinho Chewi; Holden Lee; Yuanzhi Li; Jianfeng Lu; Adil Salim; |
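As a self-contained illustration of the probability flow ODE being analyzed, the toy below integrates it backward for a one-dimensional Gaussian model where the score is known in closed form. A real sampler would substitute a learned score network, and no corrector step (which the paper's guarantee includes) is modeled here.

```python
import numpy as np

# Variance-preserving diffusion on Gaussian data ~ N(0, s2). Under the
# forward SDE dx = -0.5*beta*x dt + sqrt(beta) dW, the marginal at time t
# is N(0, s2*a + (1 - a)) with a = exp(-beta*t), so the exact score is
# grad log p_t(x) = -x / var(t). The probability flow ODE
#   dx/dt = -0.5*beta*x - 0.5*beta*score
# integrated backward from t=1 to t=0 deterministically transports prior
# samples to (approximately) the data distribution.
beta, s2 = 2.0, 4.0

def var(t):
    a = np.exp(-beta * t)
    return s2 * a + (1.0 - a)

rng = np.random.default_rng(0)
x = rng.normal(scale=np.sqrt(var(1.0)), size=10_000)   # samples at t = 1
dt = 1e-3
for t in np.arange(1.0, 0.0, -dt):                     # Euler, t: 1 -> 0
    score = -x / var(t)
    drift = -0.5 * beta * x - 0.5 * beta * score
    x = x - drift * dt                                 # step t -> t - dt
print(f"std of generated samples: {x.std():.3f} (target {np.sqrt(s2):.3f})")
```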
384 | Towards Robust and Expressive Whole-body Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework to enhance the robustness of whole-body pose and shape estimation. |
Hui En Pang; Zhongang Cai; Lei Yang; Tianwei Zhang; Qingyi Tao; Zhonghua Wu; Ziwei Liu; |
385 | FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This lack of fine controllability limits the usage of motion generation to a larger audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions, with spatio-temporal composition according to user instructions. |
Mingyuan Zhang; Huirong Li; Zhongang Cai; Jiawei Ren; Lei Yang; Ziwei Liu; |
386 | SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. |
Zhongang Cai; Wanqi Yin; Ailing Zeng; Chen Wei; Qingping Sun; Wang Yanjun; Hui En Pang; Haiyi Mei; Mingyuan Zhang; Lei Zhang; Chen Change Loy; Lei Yang; Ziwei Liu; |
387 | Language Models Are Visual Reasoning Coordinators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning. |
Liangyu Chen; Bo Li; Sheng Shen; Jingkang Yang; Chunyuan Li; Kurt Keutzer; Trevor Darrell; Ziwei Liu; |
388 | Segment Any Point Cloud Sequences By Distilling Vision Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce ***Seal***, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. |
Youquan Liu; Lingdong Kong; Jun CEN; Runnan Chen; Wenwei Zhang; Liang Pan; Kai Chen; Ziwei Liu; |
389 | InsActor: Instruction-driven Physics-based Characters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present InsActor, a principled generative framework that leverages recent advancements in diffusion-based human motion models to produce instruction-driven animations of physics-based characters. |
Jiawei Ren; Mingyuan Zhang; Cunjun Yu; Xiao Ma; Liang Pan; Ziwei Liu; |
390 | Towards The Difficulty for A Deep Neural Network to Learn Concepts of Different Complexities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Each interactive concept is encoded by the DNN to represent the collaboration between a set of input variables. Therefore, in this study, we aim to theoretically explain that interactive concepts involving more input variables (i.e., more complex concepts) are more difficult to learn. |
Dongrui Liu; Huiqi Deng; Xu Cheng; Qihan Ren; Kangrui Wang; Quanshi Zhang; |
391 | CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These highly-accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Pruner (CAP), a new unstructured pruning framework which significantly pushes the compressibility limits for state-of-the-art architectures. |
Denis Kuznedelev; Eldar Kurtić; Elias Frantar; Dan Alistarh; |
392 | ZipLM: Inference-Aware Structured Pruning of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The breakthrough performance of large language models (LLMs) comes with major computational footprints and high deployment costs. In this paper, we progress towards resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM. |
Eldar Kurtić; Elias Frantar; Dan Alistarh; |
393 | Knowledge Distillation Performs Partial Variance Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we shed new light on the inner workings of this method, by examining it from an optimization perspective. |
Mher Safaryan; Alexandra Peste; Dan Alistarh; |
394 | To Repeat or Not To Repeat: Insights from Scaling LLM Under Token-Crisis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we empirically investigate three key aspects under this approach. |
Fuzhao Xue; Yao Fu; Wangchunshu Zhou; Zangwei Zheng; Yang You; |
395 | Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs. |
Zangwei Zheng; Xiaozhe Ren; Fuzhao Xue; Yang Luo; Xin Jiang; Yang You; |
396 | Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, due to the field’s rapid evolution and the knowledge gap between the science (e.g., physics, chemistry, and biology) and machine learning communities, a benchmarking study on geometric representations for such data has not been conducted. To address this issue, in this paper we first provide a unified view of the current symmetry-informed geometric methods, classifying them into three main categories: invariance, equivariance with spherical frame basis, and equivariance with vector frame basis. |
Shengchao Liu; Weitao Du; Yanjing Li; Zhuoxinran Li; Zhiling Zheng; Chenru Duan; Zhi-Ming Ma; Omar Yaghi; Animashree Anandkumar; Christian Borgs; Jennifer Chayes; Hongyu Guo; Jian Tang; |
397 | Self-Chained Image-Language Model for Video Localization and Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although humans often find a video moment to focus on and rewind the moment to answer questions, training a query-aware video moment localizer often requires expensive annotations and high computational costs. To address this issue, we propose Self-Chained Video Localization-Answering (SeViLA), a novel framework that leverages a single image-language model (BLIP-2) to tackle both temporal keyframe localization and question answering on videos. |
Shoubin Yu; Jaemin Cho; Prateek Yadav; Mohit Bansal; |
398 | Can Language Models Teach? Teacher Explanations Improve Student Performance Via Theory of Mind Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Next, when the teacher is constrained by a budget, we decompose the teaching problem along two axes for better efficiency: (1) deciding when it is worth explaining a data point, and (2) understanding how the teacher should personalize explanations to better teach the student. We tackle both these problems by proposing a Theory of Mind approach, in which the teacher builds two few-shot mental models of the student. |
Swarnadeep Saha; Peter Hase; Mohit Bansal; |
399 | Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present OMIGA, a new offline multi-agent RL algorithm with implicit global-to-local value regularization. |
Xiangsen Wang; Haoran Xu; Yinan Zheng; Xianyuan Zhan; |
400 | Text-to-Image Diffusion Models Are Zero Shot Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, what knowledge their representations capture is not fully understood, and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers. |
Kevin Clark; Priyank Jaini; |
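One common recipe for using a text-to-image diffusion model as a zero-shot classifier is to score each class by its conditional denoising error and pick the class with the lowest error. The sketch below shows that recipe with a hypothetical `denoise_error` helper standing in for a model call; it is an assumption-laden outline, not necessarily this paper's exact evaluation procedure.

```python
import numpy as np

def diffusion_zero_shot_classify(image, class_prompts, denoise_error,
                                 n_trials=32):
    """Score each class by the model's average conditional denoising error.

    `denoise_error(image, prompt)` is a HYPOTHETICAL helper standing in for
    one Monte Carlo estimate of the conditional noise-prediction MSE; it is
    not an API of any particular library.
    """
    scores = [np.mean([denoise_error(image, p) for _ in range(n_trials)])
              for p in class_prompts]
    return int(np.argmin(scores))

# Mock usage with a stand-in error function (real use would call a model).
rng = np.random.default_rng(0)
mock = lambda image, prompt: rng.normal(loc=len(prompt), scale=0.5)
print(diffusion_zero_shot_classify(None, ["a cat", "a photo of a dog"], mock))
```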
401 | MomentDiff: Generative Video Moment Retrieval from Random to Real Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To evaluate the influence of the temporal location biases, we propose two “anti-bias” datasets with location distribution shifts, named Charades-STA-Len and Charades-STA-Mom. |
Pandeng Li; Chen-Wei Xie; Hongtao Xie; Liming Zhao; Lei Zhang; Yun Zheng; Deli Zhao; Yongdong Zhang; |
402 | MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a hardware-algorithm co-design method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory, where we jointly design transformer architectures and construct the inference compiler to fit the memory resource constraint. |
Yinan Liang; Ziwei Wang; Xiuwei Xu; Yansong Tang; Jie Zhou; Jiwen Lu; |
403 | How Does Adaptive Optimization Impact Local Neural Network Geometry? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For iterate trajectories produced by running a generic optimization algorithm OPT, we introduce $R^{\text{OPT}}_{\text{med}}$, a statistic that is analogous to the condition number of the loss Hessian evaluated at the iterates. |
Kaiqi Jiang; Dhruv Malik; Yuanzhi Li; |
404 | ResShift: Efficient Diffusion Model for Image Super-resolution By Residual Shifting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. |
Zongsheng Yue; Jianyi Wang; Chen Change Loy; |
405 | Rubik’s Cube: High-Order Channel Interactions with A Hierarchical Receptive Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most of these methods, e.g., convolution and the FFN architecture of transformers, only take implicit advantage of the first-order channel interaction and have yet to fully tap into its potential for high-order modeling. To address this, our study delves into modeling channel-dimension relationships, and proposes a simple yet effective and efficient high-order channel-wise operator for image restoration. |
Naishan Zheng; Man Zhou; Chong Zhou; Chen Change Loy; |
406 | Guiding Diffusion Models for Versatile Face Restoration Via Partial Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *partial guidance*, a fresh perspective that is more adaptable to real-world degradations compared to existing works. |
Peiqing Yang; Shangchen Zhou; Qingyi Tao; Chen Change Loy; |
407 | Mutual Information Regularized Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset by directly constraining the policy improvement direction. |
Xiao Ma; Bingyi Kang; Zhongwen Xu; Min Lin; Shuicheng Yan; |
408 | Towards Self-Interpretable Graph-Level Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate a new challenging problem, explainable GLAD, where the learning objective is to predict the abnormality of each graph sample with corresponding explanations, i.e., the vital subgraph that leads to the predictions. |
Yixin Liu; Kaize Ding; Qinghua Lu; Fuyi Li; Leo Yu Zhang; Shirui Pan; |
409 | Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we advocate a new Structure-Free Graph Condensation paradigm, named SFGC, to distill a large-scale graph into a small-scale graph node set without explicit graph structures, i.e., graph-free data. |
Xin Zheng; Miao Zhang; Chunyang Chen; Quoc Viet Hung Nguyen; Xingquan Zhu; Shirui Pan; |
410 | GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Evaluating the performance of graph neural networks (GNNs) is an essential task for practical GNN model deployment and serving, as deployed GNNs face significant performance uncertainty when inferring on unseen and unlabeled test graphs, due to mismatched training-test graph distributions. In this paper, we study a *new* problem, **GNN model evaluation**, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs, by precisely estimating its performance (e.g., node classification accuracy) on unseen graphs without labels. |
Xin Zheng; Miao Zhang; Chunyang Chen; Soheila Molaei; Chuan Zhou; Shirui Pan; |
411 | Open Visual Knowledge Extraction Via Relation-Oriented Multimodality Model Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. |
Hejie Cui; Xinyu Fang; Zihan Zhang; Ran Xu; Xuan Kan; Xin Liu; Manling Li; Yangqiu Song; Carl Yang; |
412 | Semantic Image Synthesis with Unconditional Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new approach for reflecting a user’s detailed guiding masks on a pretrained unconditional generator. |
JungWoo Chae; Hyunin Cho; Sooyeon Go; Kyungmook Choi; Youngjung Uh; |
413 | Autodecoding Latent 3D Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such data is scarce for 3D generation, prohibiting the learning of large-scale diffusion models for 3D synthesis. We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. |
Evangelos Ntavelis; Aliaksandr Siarohin; Kyle Olszewski; Chaoyang Wang; Luc V Gool; Sergey Tulyakov; |
414 | Your Representations Are in The Network: Composable and Parallel Adaptation for Large Scale Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a framework for transfer learning that efficiently adapts a large base-model by learning lightweight cross-attention modules attached to its intermediate activations. |
Yonatan Dukler; Alessandro Achille; Hao Yang; Varsha Vivek; Luca Zancato; Benjamin Bowman; Avinash Ravichandran; Charless Fowlkes; Ashwin Swaminathan; Stefano Soatto; |
415 | Optimal Cross-learning for Contextual Bandits with Unknown Context Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of designing contextual bandit algorithms in the “cross-learning” setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round. |
Jon Schneider; Julian Zimmert; |
416 | Calibration By Distribution Matching: Trainable Kernel Calibration Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing on the insight that calibration can be viewed as a distribution matching task, we introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. |
Charles Marx; Sofian Zalouk; Stefano Ermon; |
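For reference, the binned expected calibration error below is the kind of popular calibration summary that these kernel-based metrics unify and generalize; the paper's metrics are trainable and avoid binning, neither of which this classical baseline shows.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Classical binned ECE: per-bin gap between mean confidence and
    empirical accuracy, weighted by bin size."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.sum() / n * gap
    return ece

# Toy usage: a slightly overconfident model.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=2000)
correct = (rng.uniform(size=2000) < conf * 0.9).astype(float)
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```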
417 | Learning to Compress Prompts with Gist Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we present gisting, which trains an LM to compress prompts into smaller sets of gist tokens which can be cached and reused for compute efficiency. |
Jesse Mu; Xiang Li; Noah Goodman; |
418 | Thinker: Learning to Plan and Act Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. |
Stephen Chung; Ivan Anokhin; David Krueger; |
419 | Bypassing The Simulator: Near-Optimal Adversarial Linear Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the adversarial linear contextual bandit problem, where the loss vectors are selected fully adversarially and the per-round action set (i.e. the context) is drawn from a fixed distribution. |
Haolin Liu; Chen-Yu Wei; Julian Zimmert; |
420 | Robust Mean Estimation Without Moments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the guarantees that they achieve in the heavy-tailed setting are weaker than those for sub-Gaussian distributions with known covariance. In this work, we show that such a tradeoff, between error guarantees and heavy-tails, is not necessary for symmetric distributions. |
Gleb Novikov; David Steurer; Stefan Tiegel; |
421 | Hardware Resilience Properties of Text-Guided Image Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. |
Syed Talal Wasim; Kabila Haile Soboka; Abdulrahman Mahmoud; Salman Khan; David Brooks; Gu-Yeon Wei; |
422 | Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as implicit topic models. On this premise, we propose an algorithm to select optimal demonstrations from a set of annotated data with a small LLM, then directly generalize the selected demonstrations to larger LLMs. |
Xinyi Wang; Wanrong Zhu; Michael Saxon; Mark Steyvers; William Yang Wang; |
423 | KuaiSim: A Comprehensive Simulator for Recommender Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing simulators have shown promising results but also have limitations such as simplified user feedback, lacking consistency with real-world data, the challenge of simulator evaluation, and difficulties in migration and expansion across RSs. To address these challenges, we propose KuaiSim, a comprehensive user environment that provides user feedback with multi-behavior and cross-session responses. |
Kesen Zhao; Shuchang Liu; Qingpeng Cai; Xiangyu Zhao; Ziru Liu; Dong Zheng; Peng Jiang; Kun Gai; |
424 | Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we propose an effective approach for topic modeling under the low-resource regime, the core of which is the adaptive generation of semantic matching word embeddings by integrating the contextual information of each task. |
Yishi Xu; Jianqiao Sun; Yudi Su; Xinyang Liu; Zhibin Duan; Bo Chen; Mingyuan Zhou; |
425 | Dynamically Masked Discriminator for GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method for GANs from the viewpoint of online continual learning. |
Wentian Zhang; Haozhe Liu; Bing Li; Jinheng Xie; Yawen Huang; Yuexiang Li; Yefeng Zheng; Bernard Ghanem; |
426 | Preference-grounded Token-level Guidance for Language Model Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There is, therefore, a *granularity mismatch* between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance, and improving the LM with the learned guidance. |
Shentao Yang; Shujian Zhang; Congying Xia; Yihao Feng; Caiming Xiong; Mingyuan Zhou; |
427 | CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. |
Guohao Li; Hasan Hammoud; Hani Itani; Dmitrii Khizbullin; Bernard Ghanem; |
428 | Norm-guided Latent Space Exploration for Text-to-image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this metric and use it to further define centroids in the latent seed space. |
Dvir Samuel; Rami Ben-Ari; Nir Darshan; Haggai Maron; Gal Chechik; |
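A classic instance of norm-aware seed interpolation is spherical linear interpolation (slerp), sketched below: plain linear interpolation pulls high-dimensional Gaussian seeds off their typical norm shell, while slerp preserves the norm. The paper's metric and centroid construction go beyond this trick, so treat the sketch as background only.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two seed vectors."""
    cos_omega = z0 @ z1 / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    so = np.sin(omega)
    if so < 1e-8:                         # nearly parallel: fall back to lerp
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) / so) * z0 + (np.sin(t * omega) / so) * z1

# Midpoint norms: lerp shrinks the seed norm, slerp preserves it.
rng = np.random.default_rng(0)
a, b = rng.normal(size=512), rng.normal(size=512)
print(f"endpoint norm : {np.linalg.norm(a):.1f}")
print(f"lerp midpoint : {np.linalg.norm(0.5 * (a + b)):.1f}")
print(f"slerp midpoint: {np.linalg.norm(slerp(a, b, 0.5)):.1f}")
```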
429 | A Randomized Approach for Tight Privacy Accounting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new differential privacy paradigm called estimate-verify-release (EVR), which tackles the challenges of providing a strict upper bound for the privacy parameter in DP compositions by converting an *estimate* of privacy parameter into a formal guarantee. |
Jiachen T. Wang; Saeed Mahloujifar; Tong Wu; Ruoxi Jia; Prateek Mittal; |
430 | Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset’s domains and augment the training data via language-guided image editing. |
Lisa Dunlap; Alyssa Umino; Han Zhang; Jiezhi Yang; Joseph Gonzalez; Trevor Darrell; |
431 | A Privacy-Friendly Approach to Data Valuation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical challenges in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate DP guarantee (DP-TKNN-Shapley). |
Jiachen T. Wang; Yuqing Zhu; Yu-Xiang Wang; Ruoxi Jia; Prateek Mittal; |
432 | An Inverse Scaling Law for CLIP Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. |
Xianhang Li; Zeyu Wang; Cihang Xie; |
433 | Robust Multi-Agent Reinforcement Learning Via Adversarial Regularization: Theoretical Foundation and Stable Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we show that we can gain robustness by controlling a policy’s Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Motivated by these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization. |
Alexander Bukharin; Yan Li; Yue Yu; Qingru Zhang; Zhehui Chen; Simiao Zuo; Chao Zhang; Songan Zhang; Tuo Zhao; |
434 | Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls. |
Shenao Zhang; Boyi Liu; Zhaoran Wang; Tuo Zhao; |
435 | Subject-driven Text-to-Image Generation Via Apprenticeship Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning. |
Wenhu Chen; Hexiang Hu; Yandong Li; Nataniel Ruiz; Xuhui Jia; Ming-Wei Chang; William Cohen; |
436 | Why Deep Models Often Cannot Beat Non-deep Counterparts on Molecular Property Prediction? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we benchmark 12 representative models (3 non-deep models and 9 deep models) on 15 molecule datasets. |
Jun Xia; Lecheng Zhang; Xiao Zhu; Yue Liu; Zhangyang Gao; Bozhen Hu; Cheng Tan; Jiangbin Zheng; Siyuan Li; Stan Z. Li; |
437 | Foundation Model Is Efficient Multimodal Multitask Model Selector Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although recent approaches employ lightweight metrics to measure models’ transferability, they often depend heavily on the prior knowledge of a single task, making them inapplicable in a multi-modal multi-task scenario. To tackle this issue, we propose an efficient multitask model selector (EMMS), which employs large-scale foundation models to transform diverse label formats, such as categories, texts, and bounding boxes of different downstream tasks, into a unified noisy label embedding. |
Fanqing Meng; Wenqi Shao; Zhanglin Peng; Chonghe Jiang; Kaipeng Zhang; Yu Qiao; Ping Luo; |
438 | Video-Mined Task Graphs for Keystep Recognition in Instructional Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior work largely treats keystep recognition in isolation of this broader structure, or else rigidly confines keysteps to align with a particular sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, then leverage this graph to regularize keystep recognition in novel videos. |
Kumar Ashutosh; Santhosh Kumar Ramakrishnan; Triantafyllos Afouras; Kristen Grauman; |
439 | Augmentation-free Dense Contrastive Distillation for Efficient Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods heavily rely on data augmentation and memory buffers, which entail high computational resource demands when applied to semantic segmentation, where high-resolution feature maps must be preserved for dense pixel-wise predictions. In order to alleviate this problem, we present Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD), a new contrastive distillation learning paradigm to train compact and accurate deep neural networks for semantic segmentation applications. |
Jiawei Fan; Chao Li; Xiaolong Liu; Meina Song; Anbang Yao; |
440 | LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. |
Neel Guha; Julian Nyarko; Daniel Ho; Christopher Ré; Adam Chilton; Aditya K; Alex Chohlas-Wood; Austin Peters; Brandon Waldon; Daniel Rockmore; Diego Zambrano; Dmitry Talisman; Enam Hoque; Faiz Surani; Frank Fagan; Galit Sarfaty; Gregory Dickinson; Haggai Porat; Jason Hegland; Jessica Wu; Joe Nudell; Joel Niklaus; John Nay; Jonathan Choi; Kevin Tobia; Margaret Hagan; Megan Ma; Michael Livermore; Nikon Rasumov-Rahe; Nils Holzenberger; Noam Kolt; Peter Henderson; Sean Rehaag; Sharad Goel; Shang Gao; Spencer Williams; Sunny Gandhi; Tom Zur; Varun Iyer; Zehua Li; |
441 | Cola: A Benchmark for Compositional Text-to-image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Compositional reasoning is a hallmark of human visual intelligence; yet despite the size of large vision-language models, they struggle to represent simple compositions by combining objects with their attributes. To measure this lack of compositional capability, we design Cola, a text-to-image retrieval benchmark to Compose Objects Localized with Attributes. |
Arijit Ray; Filip Radenovic; Abhimanyu Dubey; Bryan Plummer; Ranjay Krishna; Kate Saenko; |
442 | Learning Threshold Neurons Via Edge of Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. |
Kwangjun Ahn; Sebastien Bubeck; Sinho Chewi; Yin Tat Lee; Felipe Suarez; Yi Zhang; |
443 | Improving Category Discovery When No Representation Rules Them All Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we tackle the problem of Generalized Category Discovery (GCD). |
Sagar Vaze; Andrea Vedaldi; Andrew Zisserman; |
444 | FaceComposer: A Unified Framework for Versatile Facial Content Creation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents FaceComposer, a unified generative model that accomplishes a variety of facial content creation tasks, including text-conditioned face synthesis, text-guided face editing, face animation, etc. |
Jiayu Wang; Kang Zhao; Yifeng Ma; Shiwei Zhang; Yingya Zhang; Yujun Shen; Deli Zhao; Jingren Zhou; |
445 | VideoComposer: Compositional Video Synthesis with Motion Controllability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the paradigm of compositional generation, this work presents VideoComposer that allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions. |
Xiang Wang; Hangjie Yuan; Shiwei Zhang; Dayou Chen; Jiuniu Wang; Yingya Zhang; Yujun Shen; Deli Zhao; Jingren Zhou; |
446 | Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new direction-oriented multi-objective problem by regularizing the common descent direction within a neighborhood of a direction that optimizes a linear combination of objectives such as the average loss in MTL. |
Peiyao Xiao; Hao Ban; Kaiyi Ji; |
447 | Fairness-guided Few-shot Prompting for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, the construction of an appropriate prompt is essential for improving the performance of in-context learning. In this paper, we revisit this problem from the view of predictive bias. |
Huan Ma; Changqing Zhang; Yatao Bian; Lemao Liu; Zhirui Zhang; Peilin Zhao; Shu Zhang; Huazhu Fu; Qinghua Hu; Bingzhe Wu; |
448 | DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DoWG (Distance over Weighted Gradients), a new parameter-free optimizer that combines adaptive gradient weighting with distance estimation. |
Ahmed Khaled; Konstantin Mishchenko; Chi Jin; |
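Based on the highlight's description, a rough reconstruction of the DoWG update might look like the following: track the running distance from the initial point and a distance-weighted sum of squared gradient norms, and use their ratio as the step size. The exact normalization here is our assumption; consult the paper for the authoritative algorithm.

```python
import numpy as np

def dowg(grad_f, x0, steps=500, r_eps=1e-4):
    """Illustrative reconstruction of a DoWG-style parameter-free update
    (distance estimate over weighted gradients). NOT a reference
    implementation; details are assumptions from the paper's summary."""
    x = x0.copy()
    r = r_eps                          # running distance-from-start estimate
    v = 0.0                            # distance-weighted squared-grad sum
    for _ in range(steps):
        g = grad_f(x)
        r = max(r, float(np.linalg.norm(x - x0)))
        v += r ** 2 * float(g @ g)
        x = x - (r ** 2 / np.sqrt(v)) * g
    return x

# Toy usage: a quadratic, with no tuned learning rate anywhere.
A = np.diag([1.0, 10.0])
print(dowg(lambda x: A @ x, x0=np.array([3.0, -2.0])))
```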
449 | Revisiting The Evaluation of Image Synthesis with GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. |
Mengping Yang; Ceyuan Yang; Yichi Zhang; Qingyan Bai; Yujun Shen; Bo Dai; |
450 | Language Models Can Improve Event Prediction By Few-Shot Abductive Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction accuracy of event sequence models. |
Xiaoming Shi; Siqiao Xue; Kangrui Wang; Fan Zhou; James Zhang; Jun Zhou; Chenhao Tan; Hongyuan Mei; |
451 | Can Language Models Solve Graph Problems in Natural Language? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose NLGraph (Natural Language Graph), a comprehensive benchmark of graph-based problem solving designed in natural language. |
Heng Wang; Shangbin Feng; Tianxing He; Zhaoxuan Tan; Xiaochuang Han; Yulia Tsvetkov; |
452 | NeuralGF: Unsupervised Point Normal Estimation By Learning Neural Gradient Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, normal orientation consistency across shapes remains difficult to achieve without a separate post-processing procedure. To resolve these issues, we propose a novel method for estimating oriented normals directly from point clouds without using ground truth normals as supervision. |
Qing Li; Huifang Feng; Kanle Shi; Yue Gao; Yi Fang; Yu-Shen Liu; Zhizhong Han; |
453 | Is Distance Matrix Enough for Geometric Deep Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we expand on the families of counterexamples that MPNNs are unable to distinguish from their distance matrices, by constructing families of novel and symmetric geometric graphs, to better understand the inherent limitations of MPNNs. |
Zian Li; Xiyuan Wang; Yinan Huang; Muhan Zhang; |
454 | Exploring Loss Functions for Time-based Training Strategy in Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through theoretical analysis, we infer that loss functions providing adequate positive overall gradients help training. Based on this, we propose an enhanced counting loss to replace the commonly used mean square counting loss. |
Yaoyu Zhu; Wei Fang; Xiaodong Xie; Tiejun Huang; Zhaofei Yu; |
455 | Learning Neural Implicit Through Volume Rendering with Attentive Depth Fusion Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, rendering a view each time suffers from incomplete depth at holes and from unawareness of occluded structures in the depth supervision, which severely affects the accuracy of geometry inference via volume rendering. To resolve this issue, we propose to learn neural implicit representations from multi-view RGBD images through volume rendering with an attentive depth fusion prior. |
Pengchong Hu; Zhizhong Han; |
456 | Diffusion Optimization Models with Trajectory Alignment for Constrained Design Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, engineering optimization methods based on physics still outperform generative models when dealing with constrained environments where data is scarce and precision is paramount. To address these challenges, we introduce Diffusion Optimization Models (DOM) and Trajectory Alignment (TA), a learning framework that demonstrates the efficacy of aligning the sampling trajectory of diffusion models with the optimization trajectory derived from traditional physics-based methods. |
Giorgio Giannone; Akash Srivastava; Ole Winther; Faez Ahmed; |
457 | Meta-Adapter: An Online Few-shot Learner for Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples, resulting in longer inference time and the risk of overfitting in certain domains. To tackle these challenges, we propose the Meta-Adapter, a lightweight residual-style adapter, to refine the CLIP features guided by the few-shot samples in an online manner. |
Cheng Cheng; Lin Song; Ruoyi Xue; Hang Wang; Hongbin Sun; Yixiao Ge; Ying Shan; |
458 | Resolving Interference When Merging Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter’s values across models. To address this, we propose our method, TrIm, Elect Sign & Merge (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. |
Prateek Yadav; Derek Tam; Leshem Choshen; Colin Raffel; Mohit Bansal; |
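The three TIES-Merging steps named in the highlight are concrete enough to sketch. The PyTorch snippet below is an illustrative trim / elect-sign / merge on a single parameter tensor; the `keep_frac` knob and the tie-breaking details are our simplifications, not the paper's interface.

```python
import torch

def ties_merge(base, finetuned, keep_frac=0.2):
    """Illustrative trim / elect-sign / merge for one parameter tensor."""
    # Task vectors: how far each fine-tuned model moved from the base.
    taus = [ft - base for ft in finetuned]

    # (1) Trim: reset entries that changed only a small amount, keeping
    # the largest-magnitude keep_frac of each task vector.
    trimmed = []
    for tau in taus:
        kth = max(1, int((1 - keep_frac) * tau.numel()))
        thresh = tau.abs().flatten().kthvalue(kth).values
        trimmed.append(torch.where(tau.abs() >= thresh, tau, torch.zeros_like(tau)))
    stacked = torch.stack(trimmed)

    # (2) Elect sign: per entry, the agreed-upon sign of the summed task vectors.
    elected = torch.sign(stacked.sum(dim=0))

    # (3) Merge: average only the entries that agree with the elected sign.
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged
```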
459 | Margin Maximization in Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we initiate the study of a softmax-attention model $f(X)=v^\top X^\top \text{softmax}(XW^\top p)$, where, $X$ is the tokenized input, $v$ is the value weights, $W$ is the key-query weights, and $p$ is a tunable token/prompt. |
Davoud Ataee Tarzanagh; Yingcong Li; Xuechen Zhang; Samet Oymak; |
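The model in this highlight is fully specified, so it can be written out directly. The sketch below implements $f(X)=v^\top X^\top \text{softmax}(XW^\top p)$ in NumPy, with shapes chosen only for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def f(X, v, W, p):
    """f(X) = v^T X^T softmax(X W^T p).

    X : (T, d) tokenized input (T tokens, d features)
    v : (d,)   value weights
    W : (d, d) key-query weights
    p : (d,)   tunable token/prompt
    """
    attn = softmax(X @ W.T @ p)   # (T,) attention weights over tokens
    return v @ (X.T @ attn)       # scalar output

# Tiny example with random data.
rng = np.random.default_rng(0)
T, d = 5, 3
X, v, p = rng.normal(size=(T, d)), rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d))
print(f(X, v, W, p))
```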
460 | Replicability in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). |
Amin Karbasi; Grigoris Velegkas; Lin Yang; Felix Zhou; |
461 | Online Performative Gradient Descent for Learning Nash Equilibria in Decision-Dependent Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, since agents are strategically coupled, traditional gradient-based methods are infeasible without the gradient oracle. To overcome this challenge, we model the strategic interactions by a general parametric model and propose a novel online algorithm, Online Performative Gradient Descent (OPGD), which leverages the ideas of online stochastic approximation and projected gradient descent to learn the Nash equilibrium in the context of function approximation for the unknown gradient. |
Zihan Zhu; Ethan Fang; Zhuoran Yang; |
462 | Towards Optimal Caching and Model Selection for Large Model Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model selector to choose from an ensemble of models for query processing. |
Banghua Zhu; Ying Sheng; Lianmin Zheng; Clark Barrett; Michael Jordan; Jiantao Jiao; |
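As a reading aid, here is a toy sketch of the two levers this paper studies: a query cache and a learned model selector. All names (`cached_select`, `selector`, the dict-as-cache) are illustrative; the paper's contribution is analyzing how to make these choices optimally, not this wrapper.

```python
def cached_select(query, cache, models, selector):
    """Serve repeated queries from a cache; otherwise route the query to
    one model of an ensemble via a (learned) selector."""
    if query in cache:
        return cache[query]          # cache hit: no inference cost
    model = models[selector(query)]  # model selection from the ensemble
    answer = model(query)
    cache[query] = answer            # populate the cache for next time
    return answer

# Stand-ins for a cheap and an expensive model, plus a trivial selector.
models = [lambda q: q.lower(), lambda q: q.upper()]
cache = {}
print(cached_select("Hello", cache, models, selector=lambda q: len(q) % 2))
print(cached_select("Hello", cache, models, selector=lambda q: len(q) % 2))  # cache hit
```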
463 | Doubly-Robust Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce doubly-robust self-training, an innovative semi-supervised algorithm that provably balances between two extremes. |
Banghua Zhu; Mingyu Ding; Philip Jacobson; Ming Wu; Wei Zhan; Michael Jordan; Jiantao Jiao; |
464 | Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel actor-critic algorithm for an environment with delayed feedback, which addresses the state-space explosion problem of conventional approaches. |
Jangwon Kim; Hangyeol Kim; Jiwook Kang; Jongchan Baek; Soohee Han; |
465 | Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we identify a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. |
Zhiyuan Liu; Yaorui Shi; An Zhang; Enzhi Zhang; Kenji Kawaguchi; Xiang Wang; Tat-Seng Chua; |
466 | Saddle-to-Saddle Dynamics in Diagonal Linear Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we fully describe the trajectory of gradient flow over $2$-layer diagonal linear networks for the regression setting in the limit of vanishing initialisation. |
Scott Pesme; Nicolas Flammarion; |
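For context, the 2-layer diagonal linear network referenced above is usually parametrized as below; we state this as standard background for the setting, not as a quote from the paper.

```latex
% Standard 2-layer diagonal linear network: the predictor is linear in x,
% but its weights factor entrywise through the trained parameters (u, v).
f_{u,v}(x) = \langle u \odot v,\, x \rangle = \sum_{i=1}^{d} u_i v_i x_i
```

The saddle-to-saddle picture of the title refers to the trajectory that gradient flow on $(u, v)$, started from vanishing initialisation, traces in the effective weights $\beta = u \odot v$.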
467 | Testing The General Deductive Reasoning Capacity of Large Language Models Using OOD Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization. |
Abulhair Saparov; Richard Yuanzhe Pang; Vishakh Padmakumar; Nitish Joshi; Mehran Kazemi; Najoung Kim; He He; |
468 | Fairly Recommending with Social Attributes: A Flexible and Controllable Optimization Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce social attribute-aware IGF metrics from the perspective of social utility, and propose a new IGF problem that considers both direct and social utilities. |
Jinqiu Jin; Haoxuan Li; Fuli Feng; Sihao Ding; Peng Wu; Xiangnan He; |
469 | Unleashing The Power of Graph Data Augmentation on Covariate Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most strategies, such as invariant learning or graph augmentation, typically struggle with limited training environments or perturbed stable features, thus exposing limitations in handling the covariate shift issue. To address this, we develop a simple yet effective data augmentation strategy, Adversarial Invariant Augmentation (AIA), to handle the graph covariate shift. |
Yongduo Sui; Qitian Wu; Jiancan Wu; Qing Cui; Longfei Li; Jun Zhou; Xiang Wang; Xiangnan He; |
470 | Understanding Contrastive Learning Via Distributionally Robust Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels). |
Junkang Wu; Jiawei Chen; Jiancan Wu; Wentao Shi; Xiang Wang; Xiangnan He; |
471 | Evaluating Post-hoc Explanations for Graph Neural Networks Via Robustness Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, in this work, we endeavor to confront the issue by introducing a novel evaluation metric, termed **O**OD-resistant **A**dversarial **R**obustness (OAR). |
Junfeng Fang; Wei Liu; Xiang Wang; Zemin Liu; An Zhang; Yuan Gao; Xiangnan He; |
472 | 3D Open-vocabulary Segmentation with Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We tackle the challenges in 3D open-vocabulary segmentation by exploiting the open-vocabulary multimodal knowledge and object reasoning capability of pre-trained foundation models CLIP and DINO, without necessitating any fine-tuning. |
Kunhao Liu; Fangneng Zhan; Jiahui Zhang; Muyu Xu; Yingchen Yu; Abdulmotaleb El Saddik; Christian Theobalt; Eric Xing; Shijian Lu; |
473 | MixFormerV2: Efficient Fully Transformer Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their efficiency remains an obstacle to practical deployment on both GPU and CPU platforms. In this paper, to overcome this issue, we propose a fully transformer tracking framework, coined *MixFormerV2*, without any dense convolutional operation or complex score prediction module. |
Yutao Cui; Tianhui Song; Gangshan Wu; Limin Wang; |
474 | Solving A Class of Non-Convex Minimax Optimization in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a class of federated nonconvex minimax optimization problems. |
Xidong Wu; Jianhui Sun; Zhengmian Hu; Aidong Zhang; Heng Huang; |
475 | Federated Conditional Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper considers the nonconvex conditional stochastic optimization in federated learning and proposes the first federated conditional stochastic optimization algorithm (FCSG) with a conditional stochastic gradient estimator. |
Xidong Wu; Jianhui Sun; Zhengmian Hu; Junyi Li; Aidong Zhang; Heng Huang; |
476 | Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, implicit bias in non-smooth neural networks trained by gradient descent remains an open question. In this paper, we aim to answer this question by studying the implicit bias of gradient descent for training two-layer fully connected (leaky) ReLU neural networks. |
Yiwen Kou; Zixiang Chen; Quanquan Gu; |
477 | ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ViCA-NeRF, a view-consistency-aware method for 3D editing with text instructions. |
Jiahua Dong; Yu-Xiong Wang; |
478 | EDGI: Equivariant Diffusion for Planning with Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group SE(3), the discrete-time translation group ℤ, and the object permutation group Sₙ. |
Johann Brehmer; Joey Bose; Pim de Haan; Taco Cohen; |
479 | Geometric Algebra Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. |
Johann Brehmer; Pim de Haan; Sönke Behrends; Taco Cohen; |
480 | Diversify & Conquer: Outcome-directed Curriculum RL Via Out-of-Distribution Disagreement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Reinforcement learning (RL) often faces uninformed search problems, where the agent must explore without access to domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called **D**iversify for **D**isagreement & **C**onquer (**D2C**). |
Daesol Cho; Seungjae Lee; H. Jin Kim; |
481 | Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we quantify the uncertainty of dynamics models using Conformal Prediction (CP), which is an effective technique for constructing prediction sets that achieve valid coverage. |
Jiankai Sun; Yiqi Jiang; Jianing Qiu; Parth Nobel; Mykel J Kochenderfer; Mac Schwager; |
482 | The Expressive Power of Pooling in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we derive sufficient conditions for a pooling operator to fully preserve the expressive power of the MP layers before it. |
Filippo Maria Bianchi; Veronica Lachi; |
483 | Learning to Reason and Memorize with Self-Notes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. |
Jack Lanchantin; Shubham Toshniwal; Jason Weston; arthur szlam; Sainbayar Sukhbaatar; |
484 | BiMatting: Efficient Video Matting Via Binarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, binarization of the video matting model is not a straightforward process, and our empirical analysis has revealed two primary bottlenecks: severe representation degradation of the encoder and massive redundant computations of the decoder. To address these issues, we propose BiMatting, an accurate and efficient video matting model using binarization. |
Haotong Qin; Lei Ke; Xudong Ma; Martin Danelljan; Yu-Wing Tai; Chi-Keung Tang; Xianglong Liu; Fisher Yu; |
485 | Adversarial Training for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the pursuit of fixing adversarial training (1) we show and overcome fundamental theoretical as well as practical limitations of the adopted graph learning setting in prior work; (2) we reveal that more flexible GNNs based on learnable graph diffusion are able to adjust to adversarial perturbations, while the learned message passing scheme is naturally interpretable; (3) we introduce the first attack for structure perturbations that, while targeting multiple nodes at once, is capable of handling global (graph-level) as well as local (node-level) constraints. |
Lukas Gosch; Simon Geisler; Daniel Sturm; Bertrand Charpentier; Daniel Zügner; Stephan Günnemann; |
486 | ProPILE: Probing Privacy Leakage in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. |
Siwon Kim; Sangdoo Yun; Hwaran Lee; Martin Gubri; Sungroh Yoon; Seong Joon Oh; |
487 | BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). |
Jiaming Ji; Mickel Liu; Josef Dai; Xuehai Pan; Chi Zhang; Ce Bian; Boyuan Chen; Ruiyang Sun; Yizhou Wang; Yaodong Yang; |
488 | Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. |
Jiaming Ji; Borong Zhang; Jiayi Zhou; Xuehai Pan; Weidong Huang; Ruiyang Sun; Yiran Geng; Josef Dai; Yaodong Yang; |
489 | IEBins: Iterative Elastic Bins for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel concept of iterative elastic bins (IEBins) for the classification-regression-based MDE. |
Shuwei Shao; Zhongcai Pei; Xingming Wu; Zhong Liu; Weihai Chen; Zhengguo Li; |
490 | A New Perspective on Building Efficient and Expressive 3D Equivariant Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and investigate the process of representing global geometric information from local patches. |
Weitao Du; Yuanqi Du; Limei Wang; Dieqiao Feng; Guifeng Wang; Shuiwang Ji; Carla Gomes; Zhi-Ming Ma; |
491 | OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the OBELICS dataset, an open web-scale filtered dataset of interleaved image-text documents comprising 141 million web pages extracted from Common Crawl, 353 million associated images, and 115 billion text tokens. |
Hugo Laurençon; Lucile Saulnier; Leo Tronchon; Stas Bekman; Amanpreet Singh; Anton Lozhkov; Thomas Wang; Siddharth Karamcheti; Alexander Rush; Douwe Kiela; Matthieu Cord; Victor Sanh; |
492 | M$^2$Hub: Unlocking The Potential of Machine Learning for Materials Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. |
Yuanqi Du; Yingheng Wang; Yining Huang; Jianan Canal Li; Yanqiao Zhu; Tian Xie; Chenru Duan; John Gregoire; Carla Gomes; |
493 | Greedy Poisson Rejection Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conclusively solve one-shot channel simulation for one-dimensional problems where the target-proposal density ratio is unimodal by describing an algorithm with optimal runtime. |
Gergely Flamich; |
494 | Rethinking The Role of Token Retrieval in Multi-Vector Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to simplify multi-vector retrieval by rethinking the role of token retrieval. |
Jinhyuk Lee; Zhuyun Dai; Sai Meher Karthik Duddu; Tao Lei; Iftekhar Naim; Ming-Wei Chang; Vincent Zhao; |
495 | Real-Time Motion Prediction Via Heterogeneous Polyline Transformer with Relative Pose Encoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they suffer from high computational overhead and poor scalability as the number of agents to be predicted increases. To address this problem, we introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers. |
Zhejun Zhang; Alexander Liniger; Christos Sakaridis; Fisher Yu; Luc V Gool; |
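The scalability claim rests on restricting attention to each agent's K nearest neighbors. The sketch below shows that sparsity pattern in plain NumPy; the relative pose encoding that defines KNARPE proper is omitted, so treat this as an illustration of the mechanism, not the paper's module.

```python
import numpy as np

def knn_attention(q, k, v, pos, num_nbrs=4):
    """Each agent attends only to its nearest neighbors by position,
    instead of to all N agents.

    q, k, v : (N, d) per-agent queries / keys / values
    pos     : (N, 2) agent positions
    """
    N, d = q.shape
    out = np.zeros_like(v)
    for i in range(N):
        dist = np.linalg.norm(pos - pos[i], axis=-1)
        nbrs = np.argsort(dist)[:num_nbrs]       # nearest agents (incl. self)
        logits = q[i] @ k[nbrs].T / np.sqrt(d)   # attend only within the neighborhood
        w = np.exp(logits - logits.max())
        w /= w.sum()
        out[i] = w @ v[nbrs]
    return out
```

Per-agent cost grows with `num_nbrs` rather than with N, which is the scaling advantage the highlight describes.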
496 | NICE: NoIse-modulated Consistency rEgularization for Data-Efficient GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The bottleneck imposed by limited data presents substantial obstacles for efficient training of GAN, including discriminator overfitting and training instability. In this paper, we present a novel approach called NoIse-modulated Consistency rEgularization (NICE) to overcome these challenges. |
Yao Ni; Piotr Koniusz; |
497 | A Bayesian Perspective On Training Data Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a Bayesian perspective on the TDA task, where the learned model is treated as a Bayesian posterior and the TDA estimates as random variables. |
Elisa Nguyen; Minjoon Seo; Seong Joon Oh; |
498 | Estimating The Rate-Distortion Function By Wasserstein Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method to compute $R(D)$ based on Wasserstein gradient descent. |
Yibo Yang; Stephan Eckstein; Marcel Nutz; Stephan Mandt; |
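For reference, the object being computed is the classical rate-distortion function (a textbook definition, not specific to this paper):

```latex
% Rate-distortion function of source X under distortion measure d:
% the fewest bits per symbol achievable at expected distortion at most D.
R(D) = \min_{p(\hat{x} \mid x)\,:\, \mathbb{E}[d(X, \hat{X})] \le D} I(X; \hat{X})
```

The minimization runs over all conditional distributions meeting the distortion budget, which is what makes estimating $R(D)$ from samples hard and motivates recasting it as descent in Wasserstein space, as the title suggests.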
499 | Riemannian Laplace Approximations for Bayesian Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple parametric approximate posterior that adapts to the shape of the true posterior through a Riemannian metric that is determined by the log-posterior gradient. |
Federico Bergamin; Pablo Moreno-Muñoz; Søren Hauberg; Georgios Arvanitidis; |
500 | Is This Loss Informative? Faster Text-to-Image Customization By Tracking Objective Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, many efficient methods of adaptation have a long training time, which limits their practical applications, slows down research experiments, and consumes excessive GPU resources. In this work, we study the training dynamics of popular text-to-image personalization methods (such as Textual Inversion or DreamBooth), aiming to speed them up. |
Anton Voronov; Mikhail Khoroshikh; Artem Babenko; Max Ryabinin; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~3,500 papers), please visit Paper Digest: NeurIPS-2023 (Full List).