Paper Digest: NeurIPS 2023 Highlights
1, Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick; Jane Dwivedi-Yu; Roberto Dessi; Roberta Raileanu; Maria Lomeli; Eric Hambro; Luke Zettlemoyer; Nicola Cancedda; Thomas Scialom;
Highlight: In this paper, we show that LMs can teach themselves to *use external tools* via simple APIs and achieve the best of both worlds.
2, Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan; Niket Tandon; Prakhar Gupta; Skyler Hallinan; Luyu Gao; Sarah Wiegreffe; Uri Alon; Nouha Dziri; Shrimai Prabhumoye; Yiming Yang; Shashank Gupta; Bodhisattwa Prasad Majumder; Katherine Hermann; Sean Welleck; Amir Yazdanbakhsh; Peter Clark;
Highlight: Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
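Below is a minimal sketch of the generate-critique-refine loop the highlight describes, assuming a hypothetical llm(prompt) completion helper and a crude stop condition; it illustrates the idea, not the authors' reference implementation.

```python
def self_refine(task_prompt, llm, max_iters=4):
    """Generate, critique, and refine an answer with the same model.

    `llm` is a hypothetical callable mapping a prompt string to a completion string.
    """
    output = llm(task_prompt)
    for _ in range(max_iters):
        feedback = llm(f"{task_prompt}\n\nDraft answer:\n{output}\n\n"
                       "Give concrete, actionable feedback on this draft.")
        if "no further changes" in feedback.lower():  # crude stop condition (assumption)
            break
        output = llm(f"{task_prompt}\n\nDraft answer:\n{output}\n\n"
                     f"Feedback:\n{feedback}\n\nRewrite the answer, addressing the feedback.")
    return output
```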
3, Vicuna Evaluation: Exploring LLM-as-a-Judge and Chatbot Arena
Lianmin Zheng; Wei-Lin Chiang; Ying Sheng; Siyuan Zhuang; Zhanghao Wu; Yonghao Zhuang; Zi Lin; Zhuohan Li; Dacheng Li; Eric Xing; Hao Zhang; Joseph Gonzalez; Ion Stoica;
Highlight: To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them.
4, Mathematical Capabilities of ChatGPT
Simon Frieder; Luca Pinchetti; Alexis Chevalier; Ryan-Rhys Griffiths; Tommaso Salvatori; Thomas Lukasiewicz; Philipp Petersen; Julius Berner;
Highlight: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology.
5, The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only
Guilherme Penedo; Quentin Malartic; Daniel Hesslow; Ruxandra Cojocaru; Hamza Alobeidli; Alessandro Cappelli; Baptiste Pannier; Ebtesam Almazrouei; Julien Launay;
Highlight: However, as larger models requiring pretraining on trillions of tokens are considered, it is unclear how scalable curation is, and whether we will run out of unique high-quality data soon. At variance with previous beliefs, we show that properly filtered and deduplicated web data alone can lead to powerful models, even significantly outperforming models trained on The Pile.
6, InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai; Junnan Li; DONGXU LI; Anthony Meng Huat Tiong; Junqi Zhao; Weisheng Wang; Boyang Li; Pascale N Fung; Steven Hoi;
Highlight: In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models.
7, QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers; Artidoro Pagnoni; Ari Holtzman; Luke Zettlemoyer;
Highlight: We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance.
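A rough sketch of the kind of setup QLoRA enables with the Hugging Face stack, pairing a 4-bit NF4-quantized frozen base model with trainable LoRA adapters; the checkpoint name and hyperparameters below are illustrative assumptions rather than the paper's exact 65B configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit NF4, computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",            # illustrative checkpoint, not the paper's 65B model
    quantization_config=bnb_config,
    device_map="auto",
)

# Only the low-rank adapter weights (kept in higher precision) are trained.
lora = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```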
8, Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang; Li Dong; Wenhui Wang; Yaru Hao; Saksham Singhal; Shuming Ma; Tengchao Lv; Lei Cui; Owais Khan Mohammed; Barun Patra; Qiang Liu; Kriti Aggarwal; Zewen Chi; Nils Bjorck; Vishrav Chaudhary; Subhojit Som; XIA SONG; Furu Wei;
Highlight: In this work, we introduce KOSMOS-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot).
9, Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn; Federico Cassano; Ashwin Gopinath; Karthik Narasimhan; Shunyu Yao;
Highlight: We propose Reflexion, a novel framework to reinforce language agents not by updating weights, but instead through linguistic feedback.
10, Scaling Data-Constrained Language Models
Niklas Muennighoff; Alexander Rush; Boaz Barak; Teven Le Scao; Nouamane Tazi; Aleksandra Piktus; Thomas Wolf; Colin Raffel; Sampo Pyysalo;
Highlight: We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters.
11, LIMA: Less Is More for Alignment
Chunting Zhou; Pengfei Liu; Puxin Xu; Srinivasan Iyer; Jiao Sun; Yuning Mao; Xuezhe Ma; Avia Efrat; Ping Yu; LILI YU; Susan Zhang; Gargi Ghosh; Mike Lewis; Luke Zettlemoyer; Omer Levy;
Highlight: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large-scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling.
12, Segment Everything Everywhere All at Once
Xueyan Zou; Jianwei Yang; Hao Zhang; Feng Li; Linjie Li; Jianfeng Wang; Lijuan Wang; Jianfeng Gao; Yong Jae Lee;
Highlight: In this work, we present SEEM, a promotable and interactive model for segmenting everything everywhere all at once in an image.
13, Language Models Can Solve Computer Tasks
Geunwoo Kim; Pierre Baldi; Stephen McAleer;
Highlight: However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent recursively criticizes and improves its output (RCI).
14, Stable Bias: Evaluating Societal Representations in Diffusion Models
Sasha Alexandra Luccioni; Christopher Akiki; Margaret Mitchell; Yacine Jernite;
Highlight: This evaluation, however, is made more difficult by the synthetic nature of these systems’ outputs: common definitions of diversity are grounded in social categories of people living in the world, whereas the artificial depictions of fictive humans created by these systems have no inherent gender or ethnicity. To address this need, we propose a new method for exploring the social biases in TTI systems.
15, HuggingGPT: Solving AI Tasks with ChatGPT and Its Friends in Hugging Face
Yongliang Shen; Kaitao Song; Xu Tan; Dongsheng Li; Weiming Lu; Yueting Zhuang;
Highlight: Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks.
16, Direct Preference Optimization: Your Language Model Is Secretly A Reward Model
Rafael Rafailov; Archit Sharma; Eric Mitchell; Christopher D Manning; Stefano Ermon; Chelsea Finn;
Highlight: However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper, we leverage a mapping between reward functions and optimal policies to show that this constrained reward maximization problem can be optimized exactly with a single stage of policy training, essentially solving a classification problem on the human preference data.
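A minimal PyTorch sketch of the resulting preference-classification loss, assuming sequence log-probabilities have already been summed over response tokens; beta controls how strongly the policy is kept near the reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is a tensor of sequence log-probabilities (summed over response
    tokens) under the policy being trained or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between the implicit rewards of chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```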
17, StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Yonglong Tian; Lijie Fan; Phillip Isola; Huiwen Chang; Dilip Krishnan;
Highlight: We show that (1) when the generative model is properly configured, training self-supervised methods on synthetic images can match or beat the real image counterpart;(2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep.
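A simplified multi-positive contrastive loss in this spirit, treating every image generated from the same prompt as a positive for the others; it assumes L2-normalized embeddings and at least two images per prompt, and the temperature is an illustrative choice rather than the paper's recipe.

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(embeddings, prompt_ids, temperature=0.1):
    """Contrastive loss in which every image generated from the same text prompt
    is a positive for every other image from that prompt.

    embeddings: (N, D) L2-normalized image embeddings
    prompt_ids: (N,)  integer id of the prompt each image was generated from
    """
    n = embeddings.shape[0]
    sim = embeddings @ embeddings.T / temperature                  # (N, N) similarity logits
    eye = torch.eye(n, dtype=torch.bool, device=embeddings.device)
    sim = sim.masked_fill(eye, float("-inf"))                      # exclude self-pairs
    positives = (prompt_ids[:, None] == prompt_ids[None, :]) & ~eye
    target = positives.float()
    target = target / target.sum(dim=1, keepdim=True)              # uniform over positives
    return -(target * F.log_softmax(sim, dim=1)).sum(dim=1).mean()
```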
18, Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin; Julian Michael; Ethan Perez; Samuel Bowman;
Highlight: We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs (e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always (A)), which models systematically fail to mention in their explanations.
19, ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu; Xiao Liu; Yuchen Wu; Yuxuan Tong; Qinkai Li; Ming Ding; Jie Tang; Yuxiao Dong;
Highlight: We present a comprehensive solution to learn and improve text-to-image models from human preference feedback.
20, Self-Supervised Learning with Lie Symmetries for Partial Differential Equations
Grégoire Mialon; Quentin Garrido; Hannah Lawrence; Danyal Rehman; Bobak Kiani; Yann LeCun;
Highlight: In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision.
21, Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain; Adam Polyak; Uriel Singer; Shahbuland Matiana; Joe Penna; Omer Levy;
Highlight: The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences.
22, Distributed Inference and Fine-tuning of Large Language Models Over The Internet
Alexander Borzunov; Dmitry Baranchuk; Tim Dettmers; Max Ryabinin; Younes Belkada; Artem Chumachenko; Pavel Samygin; Colin Raffel;
Highlight: In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies.
23, Visual Instruction Tuning
Haotian Liu; Chunyuan Li; Qingyang Wu; Yong Jae Lee;
Highlight: We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding.
24, Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from The Data Perspective
Huayang Li; Tian Lan; Zihao Fu; Deng Cai; Lemao Liu; Nigel Collier; Taro Watanabe; Yixuan Su;
Highlight: In this work, we aim to advance our understanding by presenting a straightforward and unified explanation from the data perspective.
25, AlpacaFarm: A Simulation Framework for Methods That Learn from Human Feedback
Yann Dubois; Xuechen Li; Rohan Taori; Tianyi Zhang; Ishaan Gulrajani; Jimmy Ba; Carlos Guestrin; Percy Liang; Tatsunori Hashimoto;
Highlight: Replicating and understanding this instruction-following process faces three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these bottlenecks with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost.
26, Dissecting Knowledge Distillation: An Exploration of Its Inner Workings and Applications
Utkarsh Ojha; Yuheng Li; Anirudh Sundara Rajan; Yingyu Liang; Yong Jae Lee;
Highlight: Do its data invariance properties become similar? Our work presents a comprehensive study that tries to answer these questions.
27, OpenProteinSet: Training Data for Structural Biology at Scale
Gustaf Ahdritz; Nazim Bouatta; Sachin Kadyan; Lukas Jarosch; Dan Berenberg; Ian Fisk; Andrew Watkins; Stephen Ra; Richard Bonneau; Mohammed AlQuraishi;
Highlight: Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions.
28, Does Localization Inform Editing? Surprising Differences in Causality-Based Localization Vs. Knowledge Editing in Language Models
Peter Hase; Mohit Bansal; Been Kim; Asma Ghandeharioun;
Highlight: In this paper, we find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored.
29, Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy; Augustine Mavor-Parker; Aengus Lynch; Stefan Heimersheim; Adrià Garriga-Alonso;
Highlight: This work proposes a novel algorithm, Automatic Circuit DisCovery (ACDC), to automate the identification of the important units in the network.
30, Visual Instruction Inversion: Image Editing Via Image Prompting
Thao Nguyen; Yuheng Li; Utkarsh Ojha; Yong Jae Lee;
Highlight: We present a method for image editing via visual prompting.
31, Paraphrasing Evades Detectors of AI-generated Text, But Retrieval Is An Effective Defense
Kalpesh Krishna; Yixiao Song; Marzena Karpinska; John Wieting; Mohit Iyyer;
Highlight: To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
32, SceneScape: Text-Driven Consistent Scene Generation
Rafail Fridman; Amit Abecasis; Yoni Kasten; Tali Dekel;
Highlight: We present a method for text-driven perpetual view generation -- synthesizing long-term videos of various scenes solely from an input text prompt describing the scene and camera poses.
33, Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation
Binhui Xie; Shuang Li; Qingju Guo; Chi Liu; Xinjing Cheng;
Highlight: This paper presents Annotator, a general and efficient active learning baseline, in which a plain voxel-centric online selection strategy is tailored to probe and annotate the salient and exemplar voxel grids within each LiDAR scan, broadening the potential of segmentation performance even under distribution shift.
34, Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond
Oleg Platonov; Denis Kuznedelev; Artem Babenko; Liudmila Prokhorenkova;
Highlight: In this work, we show that commonly used homophily measures have critical drawbacks preventing the comparison of homophily levels across different datasets.
35, Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao; Dian Yu; Jeffrey Zhao; Izhak Shafran; Tom Griffiths; Yuan Cao; Karthik Narasimhan;
Highlight: This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.
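A minimal breadth-first sketch of the idea, with hypothetical propose(state, k) and value(state) helpers that would be implemented as LLM prompts; real ToT setups also vary the search strategy and evaluation prompts.

```python
def tree_of_thoughts_bfs(initial_state, propose, value, depth=3, breadth=5, keep=2):
    """Breadth-first search over partial solutions ("thoughts").

    propose(state, k) -> list of k candidate next states (LLM-generated thoughts)
    value(state)      -> scalar score for a partial solution (LLM-evaluated)
    """
    frontier = [initial_state]
    for _ in range(depth):
        candidates = [nxt for state in frontier for nxt in propose(state, breadth)]
        # Keep only the most promising partial solutions for the next level.
        frontier = sorted(candidates, key=value, reverse=True)[:keep]
    return max(frontier, key=value)
```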
36, Guide Your Agent with Adaptive Multimodal Rewards
Changyeon Kim; Younggyo Seo; Hao Liu; Lisa Lee; Jinwoo Shin; Honglak Lee; Kimin Lee;
Highlight: In this paper, we instead propose to utilize the knowledge captured within large vision-language models for improving the generalization capability of control agents.
37, C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Yuzhen Huang; Yuzhuo Bai; Zhihao Zhu; Junlei Zhang; Jinghan Zhang; Tangjun Su; Junteng Liu; Chuancheng Lv; Yikai Zhang; jiayi lei; Yao Fu; Maosong Sun; Junxian He;
Highlight: We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context.
38, Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control
Wenlong Huang; Fei Xia; Dhruv Shah; Danny Driess; Andy Zeng; Yao Lu; Pete Florence; Igor Mordatch; Sergey Levine; Karol Hausman; brian ichter;
Highlight: Thus, if we want to make use of the semantic knowledge in a language model while still situating it in an embodied setting, we must construct an action sequence that is both likely according to the language model and also realizable according to grounded models of the environment. We frame this as a problem similar to probabilistic filtering: decode a sequence that both has high probability under the language model and high probability under a set of grounded model objectives.
39, BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning
Xuan Chen; Wenbo Guo; Guanhong Tao; Xiangyu Zhang; Dawn Song;
Highlight: Despite initial defenses proposed in recent studies, these methods have very limited generalizability and scalability. To address this issue, we propose BIRD, a technique to detect and remove backdoors from a pretrained DRL policy in a clean environment without requiring any knowledge about the attack specifications and accessing its training process.
40, Battle of The Backbones: A Large-Scale Comparison of Pretrained Models Across Computer Vision Tasks
Micah Goldblum; Hossein Souri; Renkun Ni; Manli Shu; Viraj Prabhu; Gowthami Somepalli; Prithvijit Chattopadhyay; Adrien Bardes; Mark Ibrahim; Judy Hoffman; Rama Chellappa; Andrew Wilson; Tom Goldstein;
Highlight: Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more.
41, A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning
Valeriia Cherepanova; Gowthami Somepalli; Jonas Geiping; C. Bayan Bruss; Andrew Wilson; Tom Goldstein; Micah Goldblum;
Highlight: We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features.
42, Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition
Samuel Dooley; Rhea Sukthanker; John Dickerson; Colin White; Frank Hutter; Micah Goldblum;
Highlight: Motivated by the belief that the inductive bias of a model architecture is more important than the bias mitigation strategy, we take a different approach to bias mitigation.
43, Are Aligned Neural Networks Adversarially Aligned?
Nicholas Carlini; Florian Tramer; Daphne Ippolito; Ludwig Schmidt; Milad Nasr; Matthew Jagielski; Pang Wei Koh; Irena Gao; Christopher A. Choquette-Choo;
Highlight: They respond helpfully to user questions, but when asked to perform some behavior that would cause harm, will politely refuse. We study to what extent these models are aligned even when interacting with an adversarial user who constructs worst-case adversarial example inputs.
44, Patch N’ Pack: NaViT, A Vision Transformer for Any Aspect Ratio and Resolution
Mostafa Dehghani; Basil Mustafa; Josip Djolonga; Jonathan Heek; Matthias Minderer; Mathilde Caron; Andreas Steiner; Joan Puigcerver; Robert Geirhos; Ibrahim Alabdulmohsin; Avital Oliver; Piotr Padlewski; Alexey Gritsenko; Mario Lucic; Neil Houlsby;
Highlight: However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT) which uses sequence packing during training to process inputs of arbitrary resolutions and aspect ratios.
45, Likelihood-Based Diffusion Language Models
Ishaan Gulrajani; Tatsunori Hashimoto;
Highlight: In this work, we take the first steps towards closing the perplexity gap between autoregressive and diffusion-based language models, with the goal of building and releasing a diffusion model which outperforms the smallest widely-adopted autoregressive model (GPT-2 124M).
46, Structural Pruning for Diffusion Models
Gongfan Fang; Xinyin Ma; Xinchao Wang;
Highlight: The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training.
47, Data Selection for Language Models Via Importance Resampling
Sang Michael Xie; Shibani Santurkar; Tengyu Ma; Percy Liang;
Highlight: Instead, we extend the classic importance resampling approach used in low-dimensions for LM data selection. We propose Data Selection with Importance Resampling (DSIR), an efficient and scalable framework that estimates importance weights in a reduced feature space for tractability and selects data with importance resampling according to these weights.
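A simplified sketch of importance resampling over a hashed bag-of-n-grams feature space: fit simple target and raw feature distributions, compute log importance weights, and sample without replacement via the Gumbel-top-k trick; the hashing and smoothing here are crude stand-ins for the paper's estimator.

```python
import numpy as np

def hashed_ngram_features(text, buckets=10_000, n=2):
    """Map a document to hashed n-gram counts (a crude proxy for DSIR's reduced features)."""
    counts = np.zeros(buckets)
    tokens = text.lower().split()
    for i in range(len(tokens) - n + 1):
        counts[hash(" ".join(tokens[i:i + n])) % buckets] += 1
    return counts

def select_with_importance_resampling(raw_docs, target_docs, k, buckets=10_000, seed=0):
    rng = np.random.default_rng(seed)
    # Fit smoothed multinomials over hashed features for the target and raw corpora.
    p_target = 1.0 + sum(hashed_ngram_features(d, buckets) for d in target_docs)
    p_raw = 1.0 + sum(hashed_ngram_features(d, buckets) for d in raw_docs)
    p_target, p_raw = p_target / p_target.sum(), p_raw / p_raw.sum()
    # Log importance weight of each raw document under the two multinomials.
    log_w = np.array([hashed_ngram_features(d, buckets) @ (np.log(p_target) - np.log(p_raw))
                      for d in raw_docs])
    # Gumbel-top-k: sample k documents without replacement, proportionally to the weights.
    chosen = np.argsort(-(log_w + rng.gumbel(size=len(raw_docs))))[:k]
    return [raw_docs[i] for i in chosen]
```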
48, Learning Universal Policies Via Text-Guided Video Generation
Yilun Du; Mengjiao (Sherry) Yang; Bo Dai; Hanjun Dai; Ofir Nachum; Josh Tenenbaum; Dale Schuurmans; Pieter Abbeel;
Highlight: Recent progress in text-guided image synthesis has yielded models with an impressive ability to generate complex novel images, exhibiting combinatorial generalization across domains. Motivated by this success, we investigate whether such tools can be used to construct more general-purpose agents.
49, Counterfactual Memorization in Neural Language Models
Chiyuan Zhang; Daphne Ippolito; Katherine Lee; Matthew Jagielski; Florian Tramer; Nicholas Carlini;
Highlight: We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training.
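Paraphrasing that definition in symbols, with f_S a model trained on a random subset S of the corpus and M a performance measure on example x (notation assumed, not the paper's exact formulation):

```latex
\mathrm{mem}(x) \;=\;
  \mathbb{E}_{S \ni x}\!\bigl[\, M(f_S, x) \,\bigr]
  \;-\;
  \mathbb{E}_{S \not\ni x}\!\bigl[\, M(f_S, x) \,\bigr]
```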
50, Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun; Yikang Shen; Qinhong Zhou; Hongxin Zhang; Zhenfang Chen; David Cox; Yiming Yang; Chuang Gan;
Highlight: However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
51, Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu; Yushi Hu; Weijia Shi; Nouha Dziri; Alane Suhr; Prithviraj (Raj) Ammanabrolu; Noah Smith; Mari Ostendorf; Hannaneh Hajishirzi;
Highlight: In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal.
52, VisoGender: A Dataset for Benchmarking Gender Bias in Image-text Pronoun Resolution
Siobhan Mackenzie Hall; Fernanda Gonçalves Abrantes; Hanwen Zhu; Grace Sodunke; Aleksandar Shtedritski; Hannah Rose Kirk;
Highlight: We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models.
53, TextDiffuser: Diffusion Models As Text Painters
Jingye Chen; Yupan Huang; Tengchao Lv; Lei Cui; Qifeng Chen; Furu Wei;
Highlight: Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
54, Isotropic Loss Design for Non-contrastive SSL
Manu Srinath Halvagal; Axel Laborieux; Friedemann Zenke;
Highlight: Here, we analytically study learning dynamics under cosine similarity in the eigenspace of the predictor network and show that collapse is avoided through implicit variance regularization similar to Euclidean loss but with fundamentally different dynamics.
55, ToolkenGPT: Augmenting Frozen Language Models with Massive Tools Via Tool Embeddings
Shibo Hao; Tianyang Liu; Zhen Wang; Zhiting Hu;
Highlight: Although the latter method offers adaptability to new tools, it struggles with the inherent context length constraint of LLMs when many new tools are presented, and mastering a new set of tools with few-shot examples remains challenging, resulting in suboptimal performance. To address these limitations, we propose a novel solution, named **ToolkenGPT**, wherein LLMs effectively learn to master tools as predicting tokens through **tool embeddings** for solving complex tasks.
56, Generating Images with Multimodal Language Models
Jing Yu Koh; Daniel Fried; Russ Salakhutdinov;
Highlight: We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces.
57, Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Ying Fan; Olivia Watkins; Yuqing Du; Hao Liu; Moonkyung Ryu; Craig Boutilier; Pieter Abbeel; Mohammad Ghavamzadeh; Kangwook Lee; Kimin Lee;
Highlight: In this work, we propose using reinforcement learning (RL) to fine-tune text-to-image models.
58, Solving Inverse Problems Provably Via Posterior Sampling with Latent Diffusion Models
Litu Rout; Negin Raoof; Giannis Daras; Constantine Caramanis; Alex Dimakis; Sanjay Shakkottai;
Highlight: We present the first framework to solve general inverse problems leveraging pre-trained *latent* diffusion models.
59, Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei; Bo Dai; Alekh Agarwal; Mohammad Ghavamzadeh; Csaba Szepesvari; Dale Schuurmans;
Highlight: We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on inter-related properties between the policy update and the representation.
60, Optimizing Prompts for Text-to-Image Generation
Yaru Hao; Zewen Chi; Li Dong; Furu Wei;
Highlight: Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts.
61, Language Models Augmented with Decoupled Memory
Weizhi Wang; Li Dong; Hao Cheng; Xiaodong Liu; Xifeng Yan; Jianfeng Gao; Furu Wei;
Highlight: Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Decoupled-Memory-Augmented LLMs (DeMA), which enables LLMs to memorize long history.
62, Extensible Prompts for Language Models on Zero-shot Language Style Customization
Tao Ge; Hu Jing; Li Dong; Shaoguang Mao; Yan Xia; Xun Wang; Si-Qing Chen; Furu Wei;
Highlight: We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL).
63, LLaVA-Med: Training A Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li; Cliff Wong; Sheng Zhang; Naoto Usuyama; Haotian Liu; Jianwei Yang; Tristan Naumann; Hoifung Poon; Jianfeng Gao;
Highlight: In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
64, Tracr: Compiled Transformers As A Laboratory for Interpretability
David Lindner; Janos Kramar; Sebastian Farquhar; Matthew Rahtz; Tom McGrath; Vladimir Mikulik;
Highlight: We show how to compile human-readable programs into standard decoder-only transformer models.
65, Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off
Zichen Zhang; Johannes Kirschner; Junxi Zhang; Francesco Zanini; Alex Ayoub; Masood Dehghan; Dale Schuurmans;
Highlight: The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation.
66, Multi-Objective Agency Requires Non-Markovian Rewards
Silviu Pitis;
Highlight: To this end, we propose a practical non-Markovian aggregation scheme that overcomes the impossibility with only one additional parameter for each objective.
67, Bypass Exponential Time Preprocessing: Fast Neural Network Training Via Weight-Data Correlation Preprocessing
Josh Alman; Jiehao Liang; Zhao Song; Ruizhe Zhang; Danyang Zhuo;
Highlight: In this work, we present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly, dynamically detect which neurons fire at each iteration.
68, LLM-Pruner: On The Structural Pruning of Large Language Models
Xinyin Ma; Gongfan Fang; Xinchao Wang;
Highlight: With LLM being a general-purpose task solver, we explore its compression in a task-agnostic manner, which aims to preserve the multi-task solving and language generation ability of the original LLM.
69, Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells
Rylan Schaeffer; Mikail Khona; Tzuhsuan Ma; Cristobal Eyzaguirre; Sanmi Koyejo; Ila Fiete;
Highlight: We define a novel self-supervised learning (SSL) framework for properly arranging representations in an abstract coding space, and show that it can produce grid codes when constrained to perform high-efficiency representation of space with recurrent neural networks.
70, Are Emergent Abilities of Large Language Models A Mirage?
Rylan Schaeffer; Brando Miranda; Sanmi Koyejo;
Highlight: We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities, (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks.
71, Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
Wanrong Zhu; Jack Hessel; Anas Awadalla; Samir Yitzhak Gadre; Jesse Dodge; Alex Fang; Youngjae Yu; Ludwig Schmidt; William Yang Wang; Yejin Choi;
Highlight: We use a linear assignment algorithm to place images into longer bodies of text using CLIP features, a process that we show outperforms alternatives.
72, DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization
Zhiqing Sun; Yiming Yang;
Highlight: This paper broadens the current scope of neural solvers for NPC problems by introducing a new graph-based diffusion framework, namely DIFUSCO.
73, Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu; Baolin Peng; Hao Cheng; Michel Galley; Kai-Wei Chang; Ying Nian Wu; Song-Chun Zhu; Jianfeng Gao;
Highlight: However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning.
74, Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch; Uri Alon; Graham Neubig; Matthew Gormley;
Highlight: In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single $k$-nearest-neighbor ($k$NN) index, while the returned $k$NN distances are the attention dot-product scores.
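A toy illustration of the core trick: restrict each cross-attention query to the top-k encoder states returned by a nearest-neighbor lookup, so input length no longer bounds the attention window; a real implementation would use an approximate index (e.g., FAISS) and the paper's reformulated queries.

```python
import torch

def knn_cross_attention(query, keys, values, k=16):
    """Cross-attention over only the top-k keys by inner product (toy Unlimiformer-style).

    query: (D,) decoder query; keys, values: (N, D) encoder states of a long input.
    """
    scores = keys @ query                       # exhaustive here; a kNN index replaces this at scale
    top_scores, idx = scores.topk(min(k, scores.numel()))
    attn = torch.softmax(top_scores / keys.shape[-1] ** 0.5, dim=0)
    return attn @ values[idx]
```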
75, Simple and Controllable Music Generation
Jade Copet; Felix Kreuk; Itai Gat; Tal Remez; Gabriel Synnaeve; Yossi Adi; Alexandre Defossez;
Highlight: We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
76, Setting The Trap: Capturing and Defeating Backdoor Threats in PLMs Through Honeypots
Ruixiang Tang; Jiayi Yuan; Yiming Li; Zirui Liu; Rui Chen; Xia Hu;
Highlight: In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, no matter whether the fine-tuning dataset contains poisoned samples.
77, Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri; Ximing Lu; Melanie Sclar; Xiang (Lorraine) Li; Liwei Jiang; Bill Yuchen Lin; Sean Welleck; Peter West; Chandra Bhagavatula; Ronan Le Bras; Jena Hwang; Soumya Sanyal; Xiang Ren; Allyson Ettinger; Zaid Harchaoui; Yejin Choi;
Highlight: As a measure of compositional complexity, we introduce computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures.
78, What Makes Good Examples for Visual In-Context Learning?
Yuanhan Zhang; Kaiyang Zhou; Ziwei Liu;
Highlight: To demystify in-context learning in computer vision, we conduct extensive research and identify a critical problem: downstream performance is highly sensitive to the choice of visual in-context examples. To address this problem, we propose a prompt retrieval framework specifically for large vision models, allowing the selection of in-context examples to be fully automated.
79, Fast Attention Requires Bounded Entries
Josh Alman; Zhao Song;
Highlight: In this paper, we investigate whether faster algorithms are possible by \emph{implicitly} making use of the matrix $A$.
80, Simplifying and Empowering Transformers for Large-Graph Representations
Qitian Wu; Wentao Zhao; Chenxiao Yang; Hengrui Zhang; Fan Nie; Haitian Jiang; Yatao Bian; Junchi Yan;
Highlight: In this paper, we critically demonstrate that even using a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks where node numbers range from thousand-level to billion-level.
81, RealTime QA: What's The Answer Right Now?
Jungo Kasai; Keisuke Sakaguchi; yoichi takahashi; Ronan Le Bras; Akari Asai; Xinyan Yu; Dragomir Radev; Noah Smith; Yejin Choi; Kentaro Inui;
Highlight: We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version).
82, Segment Anything in 3D with NeRFs
Jiazhong Cen; Zanwei Zhou; Jiemin Fang; chen yang; Wei Shen; Lingxi Xie; Dongsheng Jiang; XIAOPENG ZHANG; Qi Tian;
Highlight: This paper aims to generalize SAM to segment 3D objects.
83, LinkerNet: Fragment Poses and Linker Co-Design with 3D Equivariant Diffusion
Jiaqi Guan; Xingang Peng; PeiQi Jiang; Yunan Luo; Jian Peng; Jianzhu Ma;
Highlight: In this work, we address a more general problem where the poses of the fragments are *unknown* in 3D space.
84, ClusterFomer: Clustering As A Universal Visual Learner
James Liang; Yiming Cui; Qifan Wang; Tong Geng; Wenguan Wang; Dongfang Liu;
Highlight: This paper presents ClusterFormer, a universal vision model that is based on the Clustering paradigm with TransFormer.
85, Perfect Linear Concept Erasure in Closed Form
Nora Belrose; David Schneider-Joseph; Shauli Ravfogel; Ryan Cotterell; Edward Raff; Stella Biderman;
Highlight: We prove that a rank $k - 1$ orthogonal projection is sufficient to perfectly guard a $k$-class concept from all linear adversaries with convex loss functions, and provide the formula in closed form.
86, Language Models Meet World Models: Embodied Experiences Enhance Language Models
Jiannan Xiang; Tianhua Tao; Yi Gu; Tianmin Shu; Zirui Wang; Zichao Yang; Zhiting Hu;
Highlight: The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities.
87, Provably Bounding Neural Network Preimages
Christopher Brix; Suhas Kotha; Huan Zhang; J. Zico Kolter; Krishnamurthy Dvijotham;
Highlight: In this work, we present the INVPROP algorithm for verifying properties over the preimage of a linearly constrained output set of a neural network, which can be combined with branch-and-bound to increase precision.
88, Does Progress on ImageNet Transfer to Real-world Datasets?
Alex Fang; Simon Kornblith; Ludwig Schmidt;
Highlight: In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collected for comparing models.
89, What Is The Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models
Khashayar Gatmiry; Zhiyuan Li; Tengyu Ma; Sashank Reddi; Stefanie Jegelka; Ching-Yao Chuang;
Highlight: We show that with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization.
90, VisionLLM: Large Language Model Is Also An Open-Ended Decoder for Vision-Centric Tasks
Wenhai Wang; Zhe Chen; Xiaokang Chen; Jiannan Wu; Xizhou Zhu; Gang Zeng; Ping Luo; Tong Lu; Jie Zhou; Yu Qiao; Jifeng Dai;
Highlight: In this work, we present an LLM-based framework for vision-centric tasks, termed VisionLLM.
91, Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi; Tianyu Gao; Eshaan Nichani; Alex Damian; Jason Lee; Danqi Chen; Sanjeev Arora;
Highlight: In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), adapting the classical ZO-SGD method to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference.
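A minimal sketch of one such in-place zeroth-order step: perturb all parameters along a seeded random direction, take two forward passes, and form a scalar gradient estimate; regenerating the perturbation from the seed keeps memory at inference level. The toy hyperparameters and the loss_fn(model, batch) signature are assumptions.

```python
import torch

@torch.no_grad()
def mezo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6, seed=0):
    """One memory-efficient zeroth-order (SPSA-style) update in the spirit of MeZO.

    The shared perturbation z is never stored: it is regenerated from `seed` each
    time, so memory stays at the level of a forward pass.
    """
    def perturb(scale):
        torch.manual_seed(seed)
        for p in model.parameters():
            p.add_(scale * eps * torch.randn_like(p))

    perturb(+1)
    loss_plus = loss_fn(model, batch)            # f(theta + eps * z)
    perturb(-2)
    loss_minus = loss_fn(model, batch)           # f(theta - eps * z)
    perturb(+1)                                  # restore theta
    grad_est = (loss_plus - loss_minus) / (2 * eps)

    torch.manual_seed(seed)
    for p in model.parameters():
        p.add_(-lr * grad_est * torch.randn_like(p))
```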
92, Incentives in Federated Learning: Equilibria, Dynamics, and Mechanisms for Welfare Maximization
Aniket Murhekar; Zhuowen Yuan; Bhaskar Ray Chaudhury; Bo Li; Ruta Mehta;
Highlight: In this paper, we model a collaborative FL framework, where every agent attempts to achieve an optimal trade-off between her learning payoff and data sharing cost.
93, Scaling Riemannian Diffusion Models
Aaron Lou; Minkai Xu; Adam Farris; Stefano Ermon;
Highlight: Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications in high dimensions. In this work, we reexamine these approximations and propose several practical improvements.
94, (Provable) Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More
Jan Schuchardt; Yan Scholten; Stephan Günnemann;
Highlight: For the first time, we propose a sound notion of adversarial robustness that accounts for task equivariance.
95, PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning About Change
Karthik Valmeekam; Matthew Marquez; Alberto Olmo; Sarath Sreedharan; Subbarao Kambhampati;
Highlight: There is a strong need for systematic and extensible planning benchmarks with sufficient diversity to evaluate whether LLMs have innate planning capabilities. Motivated by this, we propose PlanBench, an extensible benchmark suite based on the kinds of domains used in the automated planning community, especially in the International Planning Competition, to test the capabilities of LLMs in planning or reasoning about actions and change.
96, On The Planning Abilities of Large Language Models - A Critical Investigation
Karthik Valmeekam; Matthew Marquez; Sarath Sreedharan; Subbarao Kambhampati;
Highlight: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities.
97, InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
John Yang; Akshara Prabhakar; Karthik Narasimhan; Shunyu Yao;
Highlight: While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the generated code and its final execution environment. To address this gap, we introduce InterCode, a lightweight, flexible, and easy-to-use framework for constructing interactive code environments with multiple types of feedback signals.
98, Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto; Yuexiang Zhai; Anikait Singh; Max Sobol Mark; Yi Ma; Chelsea Finn; Aviral Kumar; Sergey Levine;
Highlight: In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities.
99, Diffusion Self-Guidance for Controllable Image Generation
Dave Epstein; Allan Jabri; Ben Poole; Alexei Efros; Aleksander Holynski;
Highlight: We introduce self-guidance, a method that provides precise control over properties of the generated image by guiding the internal representations of diffusion models.
100, Stable and Low-precision Training for Large-scale Vision-language Models
Mitchell Wortsman; Tim Dettmers; Luke Zettlemoyer; Ari Morcos; Ali Farhadi; Ludwig Schmidt;
Highlight: We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models.
101, Collaborative Development of NLP Models
Fereshte Khani; Marco Tulio Ribeiro;
Highlight: Moreover, the exhaustive delineation of a concept is challenging, and an improper approach can create shortcuts or interfere with original data or other concepts. To address these challenges, we introduce CoDev, a framework that enables multi-user interaction with the model, thereby mitigating individual limitations.
102, The Clock and The Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Ziqian Zhong; Ziming Liu; Max Tegmark; Jacob Andreas;
Highlight: Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex: small changes to model hyperparameters and initializations can induce discovery of qualitatively different algorithms from a fixed training set, and even learning of multiple different solutions in parallel.
103, Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise
Arpit Bansal; Eitan Borgnia; Hong-Min Chu; Jie Li; Hamid Kazemi; Furong Huang; Micah Goldblum; Jonas Geiping; Tom Goldstein;
Highlight: We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact, an entire family of generative models can be constructed by varying this choice.
104, Where Are We in The Search for An Artificial Visual Cortex for Embodied Intelligence?
Arjun Majumdar; Karmesh Yadav; Sergio Arnaud; Jason Yecheng Ma; Claire Chen; Sneha Silwal; Aryan Jain; Vincent-Pierre Berges; Tingfan Wu; Jay Vakil; Pieter Abbeel; Jitendra Malik; Dhruv Batra; Yixin Lin; Oleksandr Maksymets; Aravind Rajeswaran; Franziska Meier;
Highlight: We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual ‘foundation models’ for Embodied AI.
105, In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Leonard Salewski; Isabel Rio-Torto; Stephan Alaniz; Eric Schulz; Zeynep Akata;
Highlight: In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is impersonate, different roles when they generate text in-context.
106, Synthetic Pretraining for Few-shot Black-Box Optimization
Tung Nguyen; Sudhanshu Agrawal; Aditya Grover;
Highlight: In this work, we address the more challenging yet realistic setting of few-shot black-box optimization, where only a few labeled data points are available.
107, ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling
Tung Nguyen; Jason Jewik; Hritik Bansal; Prakhar Sharma; Aditya Grover;
Highlight: We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science.
108, SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
Cheng-Yu Hsieh; Jieyu Zhang; Zixian Ma; Aniruddha Kembhavi; Ranjay Krishna;
Highlight: This hackability is so dire that blind models with no access to the image outperform state-of-the-art vision-language models. To remedy this rampant vulnerability, we introduce $\textit{SugarCrepe}$, a new benchmark for vision-language compositionality evaluation.
109, Scalable 3D Captioning with Pretrained Models
Tiange Luo; Chris Rockwell; Honglak Lee; Justin Johnson;
Highlight: We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects.
110, Timewarp: Transferable Acceleration of Molecular Dynamics By Learning Time-Coarsened Dynamics
Leon Klein; Andrew Foong; Tor Fjelde; Bruno Mlodozeniec; Marc Brockschmidt; Sebastian Nowozin; Frank Noe; Ryota Tomioka;
Highlight: We present *Timewarp*, an enhanced sampling method which uses a normalising flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution.
111, Inference-Time Intervention: Eliciting Truthful Answers from A Language Model
Kenneth Li; Oam Patel; Fernanda Viégas; Hanspeter Pfister; Martin Wattenberg;
Highlight: We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs).
112, DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie; Hieu Pham; Xuanyi Dong; Nan Du; Hanxiao Liu; Yifeng Lu; Percy Liang; Quoc V Le; Tengyu Ma; Adams Wei Yu;
Highlight: In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model.
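A minimal sketch of the reweight-then-resample idea above, assuming a toy setting where per-domain excess losses of the proxy model are already known; the multiplicative-weights update, step size, domain names, and averaging are illustrative stand-ins for the paper's Group DRO procedure, not its exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-domain excess losses of the small proxy model (e.g. proxy loss minus
# a reference model's loss); in the real method these evolve during training.
excess_loss = {"web": 0.35, "books": 0.10, "code": 0.55, "wiki": 0.05}
domains = list(excess_loss)
losses = np.array([excess_loss[d] for d in domains])

# Multiplicative-weights (exponentiated-gradient) update: domains where the
# proxy is doing worst get upweighted; weights stay on the probability simplex.
eta, steps = 0.5, 10
w = np.ones(len(domains)) / len(domains)
history = []
for _ in range(steps):
    w = w * np.exp(eta * losses)
    w /= w.sum()
    history.append(w)
domain_weights = np.mean(history, axis=0)      # average the weights over steps

print(dict(zip(domains, domain_weights.round(3))))

# Resample a (toy) corpus for the larger model using these mixture proportions.
print(rng.choice(domains, size=10, p=domain_weights))
```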
113, Neural Functional Transformers
Allan Zhou; Kaien Yang; Yiding Jiang; Kaylee Burns; Winnie Xu; Samuel Sokota; J. Zico Kolter; Chelsea Finn;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Nevertheless, constructing expressive and efficient neural functional architectures that can handle high-dimensional weight-space objects remains challenging. This paper uses the attention mechanism to define a novel set of permutation equivariant weight-space layers and composes them into deep equivariant models called neural functional Transformers (NFTs).
114, Permutation Equivariant Neural Functionals
Allan Zhou; Kaien Yang; Kaylee Burns; Adriano Cardace; Yiding Jiang; Samuel Sokota; J. Zico Kolter; Chelsea Finn;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We approach the design of neural functionals through the lens of symmetry, in particular by focusing on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order. We introduce a framework for building *permutation equivariant* neural functionals, whose architectures encode these symmetries as an inductive bias.
115, Bridging Discrete and Backpropagation: Straight-Through and Beyond
Liyuan Liu; Chengyu Dong; Xiaodong Liu; Bin Yu; Jianfeng Gao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables.
116, Symbolic Discovery of Optimization Algorithms
Xiangning Chen; Chen Liang; Da Huang; Esteban Real; Kaiyuan Wang; Hieu Pham; Xuanyi Dong; Thang Luong; Cho-Jui Hsieh; Yifeng Lu; Quoc V Le;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training.
117, Is Your Code Generated By ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu; Chunqiu Steven Xia; Yuyao Wang; LINGMING ZHANG;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Such limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus – a code synthesis benchmarking framework to rigorously evaluate the functional correctness of LLM-synthesized code.
118, Reward Imputation with Sketching for Contextual Batched Bandits
Xiao Zhang; Ninglu Shao; Zihua Si; Jun Xu; Wenhan Wang; Hanjing Su; Ji-Rong Wen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose an efficient approach called Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching, which approximates the full-information feedbacks.
119, REASONER: An Explainable Recommendation Dataset with Comprehensive Labeling Ground Truths
Xu Chen; Jingsen Zhang; Lei Wang; Quanyu Dai; Zhenhua Dong; Ruiming Tang; Rui Zhang; Li Chen; Xin Zhao; Ji-Rong Wen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In the past few years, while a lot of promising explainable recommender models have been proposed, the datasets used to evaluate them still suffer from several limitations: for example, the explanation ground truths are not labeled by real users, and the explanations are mostly single-modal and cover only one aspect. To bridge these gaps, in this paper, we build a new explainable recommendation dataset, which, to our knowledge, is the first contribution that provides a large amount of real-user-labeled multi-modal and multi-aspect explanation ground truths.
120, Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning
Beichen Zhang; Kun Zhou; Xilin Wei; Xin Zhao; Jing Sha; Shijin Wang; Ji-Rong Wen;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In CARP, we test four LLMs with CoT prompting, and find that they are all prone to make mistakes at the early steps of the solution, leading to incorrect answers. Based on this finding, we propose a new approach that can deliberate the reasoning steps with tool interfaces, namely \textbf{DELI}.
121, OpenAGI: When LLM Meets Domain Experts
Yingqiang Ge; Wenyue Hua; Kai Mei; jianchao ji; Juntao Tan; Shuyuan Xu; Zelong Li; Yongfeng Zhang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we introduce \textbf{OpenAGI}, an open-source AGI research platform designed for multi-step, real-world tasks.
122, Scaling in Depth: Unlocking Robustness Certification on ImageNet
Kai Hu; Andy Zou; Zifan Wang; Klas Leino; Matt Fredrikson;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper investigates strategies for expanding certifiably robust training to larger, deeper models.
123, Why Diffusion Models Memorize and How to Mitigate Copying
Gowthami Somepalli; Vasu Singla; Micah Goldblum; Jonas Geiping; Tom Goldstein;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models.
124, Big Little Transformer Decoder
Sehoon Kim; Karttikeya Mangalam; Suhong Moon; Jitendra Malik; Michael Mahoney; Amir Gholami; Kurt Keutzer;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: The inference latency is further exacerbated by autoregressive generative tasks, as models need to run iteratively to generate tokens sequentially without leveraging token-level parallelization. To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.
125, Alexa Arena: A User-Centric Interactive Platform for Embodied AI
Qiaozi Gao; Govindarajan Thattai; Suhaila Shakiah; Xiaofeng Gao; Shreyas Pansare; Vasu Sharma; Gaurav Sukhatme; Hangjie Shi; Bofei Yang; Desheng Zhang; Lucy Hu; Karthika Arumugam; Shui Hu; Matthew Wen; Dinakar Guthy; Shunan Chung; Rohan Khanna; Osman Ipek; Leslie Ball; Kate Bland; Heather Rocker; Michael Johnston; Reza Ghanadan; Dilek Hakkani-Tur; Prem Natarajan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research.
126, Grounding Neural Inference with Satisfiability Modulo Theories
Matt Fredrikson; Kaiji Lu; Somesh Jha; Saranya Vijayakumar; Vijay Ganesh; Zifan Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper we present a set of techniques for integrating Satisfiability Modulo Theories (SMT) solvers into the forward and backward passes of a deep network layer, called SMTLayer.
127, 3D-LLM: Injecting The 3D World Into Large Language Models
Yining Hong; Haoyu Zhen; Peihao Chen; Shuhong Zheng; Yilun Du; Zhenfang Chen; Chuang Gan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we propose to inject the 3D world into large language models, and introduce a whole new family of 3D-LLMs.
128, $k$-Means Clustering with Distance-Based Privacy
Alessandro Epasto; Vahab Mirrokni; Shyam Narayanan; Peilin Zhong;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we initiate the study of Euclidean clustering with Distance-based privacy.
129, Multi-scale Diffusion Denoised Smoothing
Jongheon Jeong; Jinwoo Shin;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we investigate the trade-off between accuracy and certified robustness of denoised smoothing: for example, we question on which representation of diffusion model would maximize the certified robustness of denoised smoothing.
130, Backprop-Free Dataset Distillation
Songhua Liu; Xinchao Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, different from the time-consuming forward-backward passes, we introduce a backprop-free fashion for dataset distillation with significantly improved efficiency.
131, VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models
Yonatan Bitton; Hritik Bansal; Jack Hessel; Rulin Shao; Wanrong Zhu; Anas Awadalla; Josh Gardner; Rohan Taori; Ludwig Schmidt;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce VisIT-Bench, a robust benchmark for diverse real-life vision-language instructions across 70 tasks, from recognition to reasoning.
132, Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs
Yingcong Li; Kartik Sreenivasan; Angeliki Giannou; Dimitris Papailiopoulos; Samet Oymak;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study yet general family of compositional functions: multi-layer perceptrons (MLPs).
133, Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei; Nika Haghtalab; Jacob Steinhardt;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Large language models trained for safety and harmlessness remain susceptible to adversarial misuse, as evidenced by the prevalence of "jailbreak" attacks on early releases of ChatGPT that elicit undesired behavior. Going beyond recognition of the issue, we investigate why such attacks succeed and how they can be created.
134, Fair Graph Distillation
Qizhang Feng; Zhimeng Jiang; Ruiquan Li; Yicheng Wang; Na Zou; Jiang Bian; Xia Hu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Based on the proposed coherence metric, we introduce a framework for fair graph distillation using a bi-level optimization algorithm.
135, Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
Neel Guha; Mayee Chen; Kush Bhatia; Azalia Mirhoseini; Frederic Sala; Christopher Ré;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions.
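A minimal sketch of the consistency idea above: under each of several embedding functions, a sample's LM prediction is compared with the majority vote of its nearest neighbors, and the prediction is flipped only when every embedding space disagrees with it. The toy data, the k value, and the flipping rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

n = 200
preds = rng.integers(0, 2, size=n)            # noisy LM predictions (0/1), toy data
emb_a = rng.normal(size=(n, 32))              # representation under embedding function A
emb_b = rng.normal(size=(n, 32))              # representation under embedding function B

def neighbor_vote(emb, preds, k=5):
    """Majority vote of each point's k nearest neighbors (excluding itself)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(emb)
    _, idx = nn.kneighbors(emb)
    return (preds[idx[:, 1:]].mean(axis=1) > 0.5).astype(int)

votes = np.stack([neighbor_vote(emb_a, preds), neighbor_vote(emb_b, preds)])

# Flip a prediction only when the neighborhoods in *all* embedding spaces disagree with it.
smoothed = np.where((votes != preds).all(axis=0), 1 - preds, preds)
print("changed predictions:", int((smoothed != preds).sum()))
```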
136, On Evaluating Adversarial Robustness of Large Vision-Language Models
Yunqing Zhao; Tianyu Pang; Chao Du; Xiao Yang; Chongxuan LI; Ngai-Man (Man) Cheung; Min Lin;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To this end, we propose evaluating the robustness of open-source large VLMs in the most realistic and high-risk setting, where adversaries have only black-box system access and seek to deceive the model into returning the targeted responses.
137, The Learnability of In-Context Learning
Noam Wies; Yoav Levine; Amnon Shashua;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a first-of-its-kind PAC based framework for in-context learnability, and use it to provide the first finite sample complexity results for the in-context learning setup.
138, NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
Varun Jampani; Kevis-kokitsi Maninis; Andreas Engelhardt; Arjun Karpur; Karen Truong; Kyle Sargent; Stefan Popov; Andre Araujo; Ricardo Martin Brualla; Kaushal Patel; Daniel Vlasic; Vittorio Ferrari; Ameesh Makadia; Ce Liu; Yuanzhen Li; Howard Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To enable systematic research progress on 3D reconstruction from casual image captures, we propose `NAVI': a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters.
139, Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Yuheng Zha; Yichi Yang; Ruichen Li; Zhiting Hu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks involving text entailment, similarity, question answering (and answerability), factual consistency, and so forth.
140, Meta-in-context Learning in Large Language Models
Julian Coda-Forno; Marcel Binz; Zeynep Akata; Matt Botvinick; Jane Wang; Eric Schulz;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself.
141, Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Deepak Narayanan; Keshav Santhanam; Peter Henderson; Rishi Bommasani; Tony Lee; Percy Liang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Unfortunately, access to LLMs today is largely restricted to black-box text generation APIs; raw runtimes measured through this interface do not satisfy these desiderata: model providers can implement software and hardware optimizations orthogonal to the model, and shared infrastructure introduces performance contention. We propose a new metric for inference efficiency that puts models on equal footing as though they were served on uniform hardware and software and without performance contention.
142, Lexinvariant Language Models
Qian Huang; Eric Zelikman; Sarah Chen; Yuhuai Wu; Gregory Valiant; Percy Liang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: First, we prove that we can construct a lexinvariant LM to converge to the true language model at a uniform rate that is polynomial in terms of the context length, with a constant factor that is sublinear in the vocabulary size. Second, to build a lexinvariant LM, we simply encode tokens using random Gaussian vectors, such that each token maps to the same representation within each sequence but different representations across sequences.
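A minimal sketch of the construction described above: every sequence gets a freshly sampled Gaussian embedding table, so a token id maps to one stable vector within a sequence but to a different vector in the next sequence. The vocabulary size, dimensionality, and helper name are assumptions.

```python
import torch

VOCAB_SIZE, D_MODEL = 100, 16

def lexinvariant_embed(token_ids: torch.Tensor) -> torch.Tensor:
    """Embed one sequence with a freshly sampled Gaussian table: repeated ids
    share a vector within the sequence, but the table changes across sequences."""
    table = torch.randn(VOCAB_SIZE, D_MODEL)   # resampled for every sequence
    return table[token_ids]

seq = torch.tensor([5, 7, 5, 3, 7])
x1 = lexinvariant_embed(seq)
x2 = lexinvariant_embed(seq)

print(torch.equal(x1[0], x1[2]))      # True: same id within the same sequence
print(torch.equal(x1[0], x2[0]))      # False (almost surely): new sequence, new table
```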
143, Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes
Connor Toups; Rishi Bommasani; Kathleen Creel; Sarah Bana; Dan Jurafsky; Percy Liang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce *ecosystem-level analysis*: rather than analyzing a single model, we consider the collection of models that are deployed in a given context.
144, Holistic Evaluation of Text-to-Image Models
Tony Lee; Michihiro Yasunaga; Chenlin Meng; Yifan Mai; Joon Sung Park; Agrim Gupta; Yunzhi Zhang; Deepak Narayanan; Hannah Teufel; Marco Bellagente; Minguk Kang; Taesung Park; Jure Leskovec; Jun-Yan Zhu; Fei-Fei Li; Jiajun Wu; Stefano Ermon; Percy Liang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, existing evaluations primarily focus on image-text alignment and quality. To address this limitation, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM).
145, INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms
Shih-Cheng Huang; Zepeng Huo; Ethan Steinberg; Chia-Chun Chiang; Curtis Langlotz; Matthew Lungren; Serena Yeung; Nigam Shah; Jason Fries;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes.
146, Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping
Chunming He; Kai Li; Yachao Zhang; Guoxia Xu; Longxiang Tang; Yulun Zhang; Zhenhua Guo; Xiu Li;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: It remains a challenging task since (1) it is hard to distinguish concealed objects from the background due to the intrinsic similarity and (2) the sparsely-annotated training data only provide weak supervision for model learning. In this paper, we propose a new WSCOS method to address these two challenges.
147, Data Portraits: Recording Foundation Model Training Data
Marc Marone; Benjamin Van Durme;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Even while these models are now key in AI system building, it can be difficult to answer the straightforward question: has the model already encountered a given example during training? We therefore propose a widespread adoption of Data Portraits: artifacts that record training data and allow for downstream inspection.
148, MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning
Zeyuan Ma; Hongshu Guo; Jiacheng Chen; Zhenrui Li; Guojun Peng; Yue-Jiao Gong; Yining Ma; Zhiguang Cao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods.
149, Mirror Diffusion Models for Constrained and Watermarked Generation
Guan-Horng Liu; Tianrong Chen; Evangelos Theodorou; Molei Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability.
150, Towards Label-free Scene Understanding By Vision Foundation Models
Runnan Chen; Youquan Liu; Lingdong Kong; Nenglun Chen; Xinge ZHU; Yuexin Ma; Tongliang Liu; Wenping Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we investigate the potential of vision foundation models in enabling networks to comprehend 2D and 3D worlds without labelled data.
151, Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks
Chongyu Qu; Tiezheng Zhang; Hualin Qiao; jie liu; Yucheng Tang; Alan Yuille; Zongwei Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper proposes a systematic and efficient method to expedite the annotation process for organ segmentation.
152, Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim Alabdulmohsin; Lucas Beyer; Alexander Kolesnikov; Xiaohua Zhai;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers.
153, The Impact of Positional Encoding on Length Generalization in Transformers
Amirhossein Kazemnejad; Inkit Padhi; Karthikeyan Natesan Ramamurthy; Payel Das; Siva Reddy;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches including Absolute Position Embedding (APE), T5's Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE).
154, Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels
Zebin You; Yong Zhong; Fan Bao; Jiacheng Sun; Chongxuan LI; Jun Zhu;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called *dual pseudo training* (DPT), built upon strong semi-supervised learners and diffusion models.
155, VPP: Efficient Universal 3D Generation Via Voxel-Point Progressive Representation
Zekun Qi; Muzhou Yu; Runpei Dong; Kaisheng Ma;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Motivated by the characteristics of different representations, we propose VPP, a voxel-point progressive representation for both efficient and universal 3D generation.
156, Large Language Models Implicitly Learn to Straighten Neural Sentence Trajectories to Construct A Predictive Representation of Natural Language
Eghbal Hosseini; Evelina Fedorenko;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We quantify straightness using a 1-dimensional curvature metric, and provide support for the trajectory straightening hypothesis across four results: i) In trained models, the curvature progressively decreases from the first to the middle layers of the network.
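A minimal sketch of one standard way to quantify the straightness of a representation trajectory: the average angle between consecutive difference vectors of the hidden states. This is an illustrative definition on toy data; the paper's exact metric may differ.

```python
import numpy as np

def mean_curvature(states: np.ndarray) -> float:
    """Average angle (radians) between consecutive steps of a trajectory.

    `states` has shape (T, d): one hidden-state vector per position/layer.
    A perfectly straight trajectory gives 0; larger values mean more curvature.
    """
    diffs = np.diff(states, axis=0)                          # (T-1, d) step vectors
    diffs = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    cos = np.clip((diffs[:-1] * diffs[1:]).sum(axis=1), -1.0, 1.0)
    return float(np.arccos(cos).mean())

straight = np.outer(np.arange(10), np.ones(4))               # points on a line
wiggly = straight + np.random.default_rng(0).normal(scale=0.5, size=straight.shape)
print(mean_curvature(straight), mean_curvature(wiggly))      # ~0 vs. clearly > 0
```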
157, (Un)interpretability of Transformers: A Case Study with Dyck Grammars
Kaiyue Wen; Yuchen Li; Bingbin Liu; Andrej Risteski;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, through a combination of theoretical results and carefully controlled experiments on synthetic data, we take a critical view of methods that exclusively focus on individual parts of the model, rather than consider the network as a whole.
158, Learning in The Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Jimmy Ba; Murat Erdogdu; Taiji Suzuki; Zhichao Wang; Denny Wu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Abstract: We consider the learning of a single-index target function $f_*: \mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $f_*(\boldsymbol{x}) = ...
159, Meet in The Middle: A New Pre-training Paradigm
Anh Nguyen; Nikos Karampatziakis; Weizhu Chen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce "Meet in the Middle" (MIM), a new pre-training paradigm that improves data efficiency by training in two directions, left-to-right and right-to-left, and encouraging the respective models to agree on their token distribution for each position.
160, Swap Agnostic Learning, or Characterizing Omniprediction Via Multicalibration
Parikshit Gopalan; Michael Kim; Omer Reingold;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce and study the notion of Swap Agnostic Learning.
161, DataComp: In Search of The Next Generation of Multimodal Datasets
Samir Yitzhak Gadre; Gabriel Ilharco; Alex Fang; Jonathan Hayase; Georgios Smyrnis; Thao Nguyen; Ryan Marten; Mitchell Wortsman; Dhruba Ghosh; Jieyu Zhang; Eyal Orgad; Rahim Entezari; Giannis Daras; Sarah Pratt; Vivek Ramanujan; Yonatan Bitton; Kalyani Marathe; Stephen Mussmann; Richard Vencu; Mehdi Cherti; Ranjay Krishna; Pang Wei Koh; Olga Saukh; Alexander Ratner; Shuran Song; Hannaneh Hajishirzi; Ali Farhadi; Romain Beaumont; Sewoong Oh; Alex Dimakis; Jenia Jitsev; Yair Carmon; Vaishaal Shankar; Ludwig Schmidt;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl.
162, Benchmarking Distribution Shift in Tabular Data with TableShift
Josh Gardner; Zoran Popovic; Ludwig Schmidt;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: As a consequence, the robustness of tabular models to distribution shift is poorly understood. To address this issue, we introduce TableShift, a distribution shift benchmark for tabular data.
163, GenEval: An Object-focused Framework for Evaluating Text-to-image Alignment
Dhruba Ghosh; Hannaneh Hajishirzi; Ludwig Schmidt;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we introduce GenEval, an object-focused framework to evaluate compositional image properties such as object co-occurrence, position, count, and color.
164, AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Tong Wu; Zhihao Fan; Xiao Liu; Yeyun Gong; yelong shen; Jian Jiao; Hai-Tao Zheng; Juntao Li; zhongyu wei; Jian Guo; Nan Duan; Weizhu Chen;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion).
165, On The Connection Between Pre-training Data Diversity and Fine-tuning Robustness
Vivek Ramanujan; Thao Nguyen; Sewoong Oh; Ali Farhadi; Ludwig Schmidt;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Pre-training has been widely adopted in deep learning to improve model performance, especially when the training data for a target task is limited. In our work, we seek to understand the implications of this training strategy on the generalization properties of downstream models.
166, Improving Multimodal Datasets with Image Captioning
Thao Nguyen; Samir Yitzhak Gadre; Gabriel Ilharco; Sewoong Oh; Ludwig Schmidt;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Our work focuses on caption quality as one major source of noise, and studies the effectiveness of generated captions in increasing the utility of web-scraped datapoints with nondescript text.
167, Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective
Chenyu You; Weicheng Dai; Yifei Min; Fenglin Liu; David Clifton; S. Kevin Zhou; Lawrence Staib; James Duncan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose $\texttt{ARCO}$, a semi-supervised contrastive learning (CL) framework with stratified group theory for medical image segmentation.
168, Focused Transformer: Contrastive Training for Context Scaling
Szymon Tworkowski; Konrad Staniszewski; Mikołaj Pacek; Yuhuai Wu; Henryk Michalewski; Piotr Miłoś;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We pinpoint a key challenge, referred to as the distraction issue, where keys associated with distinct semantic values may overlap, making them challenging to differentiate. To address this issue, we propose the Focused Transformer (FoT), a method that utilizes a training process inspired by contrastive learning.
169, EmbodiedGPT: Vision-Language Pre-Training Via Embodied Chain of Thought
Yao Mu; Qinglong Zhang; Mengkang Hu; Wenhai Wang; Mingyu Ding; Jun Jin; Bin Wang; Jifeng Dai; Yu Qiao; Ping Luo;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.
170, Any-to-Any Generation Via Composable Diffusion
Zineng Tang; Ziyi Yang; Chenguang Zhu; Michael Zeng; Mohit Bansal;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.
171, Generator Born from Classifier
Runpeng Yu; Xinchao Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we make a bold attempt toward an ambitious task: given a pre-trained classifier, we aim to reconstruct an image generator, without relying on any data samples.
172, How2comm: Communication-Efficient and Collaboration-Pragmatic Multi-Agent Perception
Dingkang Yang; Kun Yang; Yuzheng Wang; Jing Liu; Zhi Xu; Peng Zhai; Lihua Zhang; Rongbin Yin;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Despite the advancements in previous efforts, challenges remain due to various noises in the perception procedure, including communication redundancy, transmission delay, and collaboration heterogeneity. To tackle these issues, we propose How2comm, a collaborative perception framework that seeks a trade-off between perception performance and communication bandwidth.
173, Improving CLIP Training with Language Rewrites
Lijie Fan; Dilip Krishnan; Phillip Isola; Dina Katabi; Yonglong Tian;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we introduce Language augmented CLIP (LaCLIP), a simple yet highly effective approach to enhance CLIP training through language rewrites.
174, From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader
Weiwen Xu; Xin Li; Wenxuan Zhang; Meng Zhou; Wai Lam; Luo Si; Lidong Bing;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present Pre-trained Machine Reader (PMR), a novel method for retrofitting pre-trained masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
175, PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation
Zhaoxi Chen; Fangzhou Hong; Haiyi Mei; Guangcong Wang; Lei Yang; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present PrimDiffusion, the first diffusion-based framework for 3D human generation.
176, FELM: Benchmarking Factuality Evaluation of Large Language Models
shiqi chen; Yiran Zhao; Jinghan Zhang; I-Chun Chern; Siyang Gao; Pengfei Liu; Junxian He;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as FELM.
177, Training-Free Composition of Parameter-Efficient Modules with Arithmetic Operation
Jinghan Zhang; shiqi chen; Junteng Liu; Junxian He;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module capabilities.
178, DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation
Yiqun Duan; Charles Chau; Zhen Wang; Yu-Kai Wang; Chin-teng Lin;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: These event markers may not be readily available or could be challenging to acquire during real-time inference, and the sequence of eye fixations may not align with the order of spoken words. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks.
179, Textually Pretrained Speech Language Models
Michael Hassid; Tal Remez; Tu Anh Nguyen; Itai Gat; Alexis CONNEAU; Felix Kreuk; Jade Copet; Alexandre Defossez; Gabriel Synnaeve; Emmanuel Dupoux; Roy Schwartz; Yossi Adi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model.
180, EvoPrompting: Language Models for Code-Level Neural Architecture Search
Angelica Chen; David Dohan; David So;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Given the recent impressive accomplishments of language models (LMs) for code generation, we explore the use of LMs as general adaptive mutation and crossover operators for an evolutionary neural architecture search (NAS) algorithm.
181, Mitigating Test-Time Bias for Fair Image Retrieval
Fanjie Kong; Shuai Yuan; Weituo Hao; Ricardo Henao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: So motivated, we introduce a straightforward technique, Post-hoc Bias Mitigation (PBM), that post-processes the outputs from the pre-trained vision-language model.
182, Red Teaming Deep Neural Networks with Feature Synthesis Tools
Stephen Casper; Tong Bu; Yuxiao Li; Jiawei Li; Kevin Zhang; Kaivalya Hariharan; Dylan Hadfield-Menell;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Our key insight is that we can train models that respond to specific triggers (e.g., a specific patch inserted into an image) with specific outputs (i.e. a label) and then evaluate interpretability tools based on whether they help humans identify these triggers.
183, UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
Wenliang Zhao; Lujia Bai; Yongming Rao; Jie Zhou; Jiwen Lu;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct.
184, (S)GD Over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability
Mathieu Even; Scott Pesme; Suriya Gunasekar; Nicolas Flammarion;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2$-layer diagonal linear networks.
185, RAPHAEL: Text-to-Image Generation Via Large Mixture of Diffusion Paths
Zeyue Xue; Guanglu Song; Qiushan Guo; Boxiao Liu; Zhuofan Zong; Yu Liu; Ping Luo;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce a text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images, which accurately portray the text prompts, encompassing multiple nouns, adjectives, and verbs.
186, Controlling Text-to-Image Diffusion By Orthogonal Finetuning
Zeju Qiu; Weiyang Liu; Haiwen Feng; Yuxuan Xue; Yao Feng; Zhen Liu; Dan Zhang; Adrian Weller; Bernhard Schölkopf;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks.
187, ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
Chun-Han Yao; Amit Raj; Wei-Chih Hung; Michael Rubinstein; Yuanzhen Li; Ming-Hsuan Yang; Varun Jampani;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
188, Continuous-Time Functional Diffusion Processes
Giulio Franzese; Giulio Corallo; Simone Rossi; Markus Heinonen; Maurizio Filippone; Pietro Michiardi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces.
189, Window-Based Distribution Shift Detection for Deep Neural Networks
Guy Bar Shalom; Yonatan Geifman; Ran El-Yaniv;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Specifically, we study the case of monitoring the healthy operation of a deep neural network (DNN) receiving a stream of data, with the aim of detecting input distributional deviations over which the quality of the network's predictions is potentially damaged.
190, SEGA: Instructing Text-to-Image Models Using Semantic Guidance
Manuel Brack; Felix Friedrich; Dominik Hintersdorf; Lukas Struppek; Patrick Schramowski; Kristian Kersting;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions.
191, Recommender Systems with Generative Retrieval
Shashank Rajput; Nikhil Mehta; Anima Singh; Raghunandan Hulikal Keshavan; Trung Vu; Lukasz Heldt; Lichan Hong; Yi Tay; Vinh Tran; Jonah Samost; Maciej Kula; Ed Chi; Mahesh Sathiamoorthy;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates directly.
192, Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training By Diminishing Bias
Zhongwei Wan; Che Liu; Mi Zhang; Jie Fu; Benyou Wang; Sibo Cheng; Lei Ma; César Quilodrán-Casas; Rossella Arcucci;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This paper presents a novel framework named Unifying Cross-Lingual Medical Vision-Language Pre-Training (\textbf{Med-UniC}), designed to integrate multi-modal medical data from the two most prevalent languages, English and Spanish.
193, VidChapters-7M: Video Chapters at Scale
Antoine Yang; Arsha Nagrani; Ivan Laptev; Josef Sivic; Cordelia Schmid;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This important topic has been understudied due to the lack of publicly released datasets. To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total.
194, Provable Convergence Guarantees for Black-box Variational Inference
Justin Domke; Robert Gower; Guillaume Garrigos;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: While black-box variational inference is widely used, there is no proof that its stochastic optimization succeeds. We suggest this is due to a theoretical gap in existing stochastic optimization proofs, namely the challenge of gradient estimators with unusual noise bounds and a composite non-smooth objective.
195, PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance
Qianqian Xie; Weiguang Han; Xiao Zhang; Yanzhao Lai; Min Peng; Alejandro Lopez-Lira; Jimin Huang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper introduces PIXIU, a comprehensive framework including the first financial LLM based on fine-tuning LLaMA with instruction data, the first instruction data with 128K data samples to support the fine-tuning, and an evaluation benchmark with 8 tasks and 15 datasets.
196, Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
Zihao Wang; Shaofei Cai; Guanzhou Chen; Anji Liu; Xiaojian (Shawn) Ma; Yitao Liang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we study the problem of planning in Minecraft, a popular, democratized yet challenging open-ended environment for developing multi-task embodied agents.
197, Learning Generalizable Agents Via Saliency-guided Features Decorrelation
Sili Huang; Yanchao Sun; Jifeng Hu; Siyuan Guo; Bo Yang; Hechang Chen; Yi Chang; Lichao Sun;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting.
198, Decision Stacks: Flexible Reinforcement Learning Via Modular Generative Models
Siyan Zhao; Aditya Grover;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into 3 generative modules.
199, Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach
Zhimeng Jiang; Xiaotian Han; Hongye Jin; Guanchu Wang; Rui Chen; Na Zou; Xia Hu;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Subsequently, we analyze the sufficient conditions to guarantee fairness (i.e., low demographic parity) for the target dataset, including fairness for the source dataset, and low prediction difference between the source and target dataset for each sensitive attribute group. Motivated by these sufficient conditions, we propose robust fairness regularization (RFR) by considering the worst case within the model weight perturbation ball for each sensitive attribute group.
200, Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer; Elinor Poole-Dayan; Vikram Voleti; Chris Pal; Siva Reddy;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) benchmark with 7 complex vision-and-language tasks, bias evaluation and detailed analysis.
201, Fast Optimal Locally Private Mean Estimation Via Random Projections
Hilal Asi; Vitaly Feldman; Jelani Nelson; Huy Nguyen; Kunal Talwar;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose a new algorithmic framework, namely ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a $1+o(1)$-factor.
202, A Metadata-Driven Approach to Understand Graph Neural Networks
Ting Wei Li; Qiaozhu Mei; Jiaqi Ma;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose a \emph{metadata-driven} approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks.
203, An Information Theory Perspective on Variance-Invariance-Covariance Regularization
Ravid Shwartz-Ziv; Randall Balestriero; Kenji Kawaguchi; Tim G. J. Rudner; Yann LeCun;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present an information-theoretic perspective on the VICReg objective.
204, A Generative Model of The Hippocampal Formation Trained with Theta Driven Local Learning Rules
Tom M George; Kimberly Stachenfeld; Caswell Barry; Claudia Clopath; Tomoki Fukai;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Here we introduce a biologically plausible model of the hippocampal formation tantamount to a Helmholtz machine that we apply to a temporal stream of inputs.
205, Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion
Shengqiong Wu; Hao Fei; Hanwang Zhang; Tat-Seng Chua;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts.
206, DiffComplete: Diffusion-based Generative 3D Shape Completion
Ruihang Chu; Enze Xie; Shentong Mo; Zhenguo Li; Matthias Niessner; Chi-Wing Fu; Jiaya Jia;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce a new diffusion-based approach for shape completion on 3D range scans.
207, Real-World Image Variation By Aligning Diffusion Inversion Chain
Yuechen Zhang; Jinbo Xing; Eric Lo; Jiaya Jia;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Our investigation uncovers that this domain gap originates from a latents' distribution gap in different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes diffusion models to generate image variations from a single image exemplar.
208, Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
Alvin Heng; Harold Soh;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: The recent proliferation of large-scale text-to-image models has led to growing concerns that such models may be misused to generate harmful, misleading, and inappropriate content. Motivated by this issue, we derive a technique inspired by continual learning to selectively forget concepts in pretrained deep generative models.
209, Object-Centric Slot Diffusion
Jindong Jiang; Fei Deng; Gautam Singh; Sungjin Ahn;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach.
210, Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints
Jiaxin Bai; Xin Liu; Weiqi Wang; Chen Luo; Yangqiu Song;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Thus, in this paper, we propose a new framework to leverage neural methods to answer complex logical queries based on an EVKG, which can satisfy not only traditional first-order logic constraints but also implicit logical constraints over eventualities concerning their occurrences and orders.
211, 4D Panoptic Scene Graph Generation
Jingkang Yang; Jun CEN; WENXUAN PENG; Shuai Liu; Fangzhou Hong; Xiangtai Li; Kaiyang Zhou; Qifeng Chen; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To solve PSG-4D, we propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs via a relation component.
212, How Far Can Camels Go? Exploring The State of Instruction Tuning on Open Resources
Yizhong Wang; Hamish Ivison; Pradeep Dasigi; Jack Hessel; Tushar Khot; Khyathi Chandu; David Wadden; Kelsey MacMillan; Noah Smith; Iz Beltagy; Hannaneh Hajishirzi;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets.
213, Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
Siyu Jiao; Yunchao Wei; Yaowei Wang; Yao Zhao; Humphrey Shi;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This issue mainly relates to the fact that CLIP is trained with image-level supervision. To alleviate this issue, we propose a simple yet effective method, named Mask-aware Fine-tuning (MAFT).
214, Quantizable Transformers: Removing Outliers By Helping Attention Heads Do Nothing
Yelysei Bondarenko; Markus Nagel; Tijmen Blankevoort;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To achieve the exact zeros needed in the attention matrix for a no-update, the input to the softmax is pushed to be larger and larger during training, causing outliers in other parts of the network. Based on these observations, we propose two simple (independent) modifications to the attention mechanism - _clipped softmax_ and _gated attention_.
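A minimal sketch of a clipped softmax in the spirit of the modification described above: the softmax output is stretched slightly beyond [0, 1] and clipped back, so attention weights can hit exact zeros without the logits having to blow up. The stretch parameters and example values are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def clipped_softmax(logits: torch.Tensor, zeta: float = 1.03, gamma: float = -0.03) -> torch.Tensor:
    """Stretch the softmax output to [gamma, zeta] and clip back to [0, 1].

    Because the stretched range extends slightly below 0 and above 1, attention
    weights can reach exact zeros (a head can cleanly "do nothing") without
    extreme logit magnitudes, which is what produces activation outliers.
    """
    p = F.softmax(logits, dim=-1)
    return torch.clamp((zeta - gamma) * p + gamma, min=0.0, max=1.0)

logits = torch.tensor([[4.0, 0.0, -2.0, -6.0]])
print(F.softmax(logits, dim=-1))     # standard softmax: no exact zeros
print(clipped_softmax(logits))       # small probabilities snap to exactly 0
```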
215, Sharpness-Aware Minimization Leads to Low-Rank Features
Maksym Andriushchenko; Dara Bahri; Hossein Mobahi; Nicolas Flammarion;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: While its generalization improvement is well-known and is the primary motivation, we uncover an additional intriguing effect of SAM: reduction of the feature rank which happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, vision transformers and for different objectives such as regression, classification, language-image contrastive training.
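A minimal sketch of how such a feature-rank reduction can be measured in practice: count the singular values of a centered feature matrix needed to capture most of its energy. The energy threshold and toy matrices are assumptions; the paper's exact rank measure may differ.

```python
import numpy as np

def effective_rank(features: np.ndarray, energy: float = 0.99) -> int:
    """Smallest number of singular values capturing `energy` of the spectrum.

    `features` has shape (n_samples, d): one feature vector per input.
    A strong low-rank effect shows up as a small effective rank relative to d.
    """
    s = np.linalg.svd(features - features.mean(axis=0), compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

rng = np.random.default_rng(0)
full_rank = rng.normal(size=(512, 64))
low_rank = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 64)) + 0.01 * full_rank
print(effective_rank(full_rank), effective_rank(low_rank))   # close to 64 vs. far fewer
```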
216, Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
Shaolei Zhang; Yang Feng;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner.
217, A Hierarchical Spatial Transformer for Large Numbers of Point Samples in Continuous Space
Wenchong He; Zhe Jiang; Tingsong Xiao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: There are also works related to operator learning on numerical simulations in continuous space, but these methods often do not address the hierarchical spatial representation on irregular points. To fill this gap, this paper proposes a new hierarchical spatial transformer model for a large number of irregular point samples in continuous space.
218, What Indeed Can GPT Models Do in Chemistry? A Comprehensive Benchmark on Eight Tasks
Taicheng Guo; kehan Guo; Bozhao Nan; Zhenwen Liang; Zhichun Guo; Nitesh Chawla; Olaf Wiest; Xiangliang Zhang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, rather than pursuing state-of-the-art performance, we aim to evaluate capabilities of LLMs in a wide range of tasks across the chemistry domain.
219, WalkLM: A Uniform Language Model Fine-tuning Framework for Attributed Graph Embedding
Yanchao Tan; Zihao Zhou; Hang Lv; Weiming Liu; Carl Yang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we take a fundamentally different approach than GNNs, to simultaneously achieve deep joint modeling of complex attributes and flexible structures of real-world graphs and obtain unsupervised generic graph representations that are not limited to specific downstream predictions.
220, Emergent Correspondence from Image Diffusion
Luming Tang; Menglin Jia; Qianqian Wang; Cheng Perng Phoo; Bharath Hariharan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we find that correspondence emerges in diffusion models without any explicit supervision.
221, MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Marco Bellagente; Hannah Teufel; Manuel Brack; Björn Deiseroth; Felix Friedrich; Constantin Eichenberg; Andrew Dai; Robert Baldock; Souradeep Nanda; Koen Oostermeijer; Andres Felipe Cruz-Salinas; Patrick Schramowski; Kristian Kersting; Samuel Weinbach;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To ease image generation, we propose MultiFusion that allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages.
222, Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Shuhuai Ren; Aston Zhang; Yi Zhu; Shuai Zhang; Shuai Zheng; Mu Li; Alexander Smola; Xu Sun;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This work proposes POMP, a prompt pre-training method for vision-language models.
223, Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation
Shengpu Tang; Jenna Wiens;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we propose a semi-offline evaluation framework as an intermediate step between offline and online evaluation, where human users provide annotations of unobserved counterfactual trajectories.
224, PyNeRF: Pyramidal Neural Radiance Fields
Haithem Turki; Michael Zollhöfer; Christian Richardt; Deva Ramanan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose a simple modification to grid-based models by training model heads at different spatial grid resolutions.
225, Mixture Weight Estimation and Model Prediction in Multi-source Multi-target Domain Adaptation
Yuyang Deng; Ilja Kuzborskij; Mehrdad Mahdavi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We consider the problem of learning a model from multiple sources with the goal of performing well on a new target distribution.
226, H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang; Ying Sheng; Tianyi Zhou; Tianlong Chen; Lianmin Zheng; Ruisi Cai; Zhao Song; Yuandong Tian; Christopher Ré; Clark Barrett; Zhangyang Atlas Wang; Beidi Chen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Often, a large amount of transient state information, referred to as the $\mathsf{KV}$ $\mathsf{cache}$, is stored in GPU memory in addition to model parameters, scaling linearly with the sequence length and batch size. In this paper, we introduce a novel approach for implementing the $\mathsf{KV}$ $\mathsf{cache}$ which significantly reduces its memory footprint.
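A minimal sketch of a heavy-hitter-style KV-cache eviction policy in the spirit of the approach above: always keep the most recent entries, then fill the remaining budget with the tokens that have accumulated the most attention. The scoring rule, budgets, and toy attention matrix are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def evict_kv(attn_history: np.ndarray, budget: int, recent: int) -> np.ndarray:
    """Return indices of cache entries to keep.

    `attn_history[t, i]` is the attention weight that query step t placed on
    cached token i (a toy stand-in for running accumulated scores). We always
    keep the `recent` newest tokens and fill the remaining budget with the
    "heavy hitters" that have received the most attention so far.
    """
    n = attn_history.shape[1]
    scores = attn_history.sum(axis=0)                 # accumulated attention per token
    keep = set(range(max(0, n - recent), n))          # always keep the recent window
    for i in np.argsort(-scores):                     # then the heaviest hitters
        if len(keep) >= budget:
            break
        keep.add(int(i))
    return np.array(sorted(keep))

rng = np.random.default_rng(0)
attn = rng.dirichlet(alpha=np.full(20, 0.3), size=8)  # 8 query steps over 20 cached tokens
print(evict_kv(attn, budget=8, recent=4))
```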
227, Distributed Personalized Empirical Risk Minimization
Yuyang Deng; Mohammad Mahdi Kamani; Pouria Mahdavinia; Mehrdad Mahdavi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To learn personalized models at scale, we propose a distributed algorithm that replaces the standard model averaging with model shuffling to simultaneously optimize PERM objectives for all devices.
228, RoboDepth: Robust Out-of-Distribution Depth Estimation Under Corruptions
Lingdong Kong; Shaoyuan Xie; Hanjiang Hu; Lai Xing Ng; Benoit Cottereau; Wei Tsang Ooi;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Common corruptions, however, tend to occur in practical scenarios, especially for safety-critical applications like autonomous driving. To fill in this gap, we present a comprehensive robustness test suite dubbed RoboDepth consisting of 18 corruptions from three categories: i) weather and lighting conditions; ii) sensor failure and movement; and iii) data processing issues.
229, Real-World 3D Object Inverse Rendering Benchmark
Zhengfei Kuang; Yunzhi Zhang; Hong-Xing Yu; Samir Agarwala; Shangzhe Wu; Jiajun Wu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce a new real-world 3D Object inverse Rendering Benchmark, dubbed 3D-ORB.
230, What You See Is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom; Yonatan Bitton; Soravit Changpinyo; Roee Aharoni; Jonathan Herzig; Oran Lang; Eran Ofek; Idan Szpektor;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we study methods for automatic text-image alignment evaluation.
231, High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar; Prem Seetharaman; Alejandro Luebs; Ishaan Kumar; Kundan Kumar;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 kHz audio into tokens at just 8 kbps bandwidth.
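For a sense of scale (assuming 16-bit mono PCM as the uncompressed reference), 44.1 kHz audio occupies roughly 44,100 samples/s x 16 bits ≈ 705.6 kbps, so coding it at 8 kbps corresponds to about an 88x reduction, consistent with the ~90x figure quoted above.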
232, AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
Jiakang Yuan; Bo Zhang; Xiangchao Yan; Botian Shi; Tao Chen; Yikang LI; Yu Qiao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Previous works mainly focus on the self-supervised pre-training pipeline, meaning that they perform the pre-training and fine-tuning on the same benchmark, which makes it difficult to attain performance scalability and cross-dataset applicability for the pre-training checkpoint. In this paper, for the first time, we are committed to building a large-scale pre-training point-cloud dataset with diverse data distribution, and meanwhile learning generalizable representations from such a diverse pre-training dataset.
233, Benchmarking Large Language Models on CMExam - A Comprehensive Chinese Medical Exam Dataset
Junling Liu; Peilin Zhou; Yining Hua; Dading Chong; Zhongyu Tian; Andrew Liu; Helin Wang; Chenyu You; Zhenhua Guo; LEI ZHU; Michael Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination.
234, LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
Bo Liu; Yifeng Zhu; Chongkai Gao; Yihao Feng; Qiang Liu; Yuke Zhu; Peter Stone;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To advance research in LLDM, we introduce LIBERO, a novel benchmark of lifelong learning for robot manipulation.
235, ELDEN: Exploration Via Local Dependencies
Zizhao Wang; Jiaheng Hu; Roberto Martín-Martín; Peter Stone;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent's actions may change the state of one factor that, in turn, may affect the state of another factor.
236, Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations
Lifan Yuan; Yangyi Chen; Ganqu Cui; Hongcheng Gao; FangYuan Zou; Xingyi Cheng; Heng Ji; Zhiyuan Liu; Maosong Sun;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We identify that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
237, Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu; Guanchu Wang; Shaochen (Henry) Zhong; Zhaozhuo Xu; Daochen Zha; Ruixiang Tang; Zhimeng Jiang; Kaixiong Zhou; Vipin Chaudhary; Shuai Xu; Xia Hu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We argue that in stochastic optimization, models can handle noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators, Winner-Take-All Column Row Sampling (WTA-CRS), for matrix multiplication with reduced variance, which only requires storing the sub-sampled activations for calculating the gradient.
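For context, the generic unbiased column-row sampling estimator for a matrix product is sketched below: sample index pairs with probability proportional to the product of column and row norms, and rescale each sampled term so the expectation equals the exact product. This is only the textbook estimator the title alludes to; the paper's winner-take-all selection rule and its variance analysis are not reproduced here.

```python
import numpy as np

def cr_sampled_matmul(A, B, k, rng):
    """Unbiased estimate of A @ B from k sampled column-row pairs.
    Index i is drawn with probability p_i proportional to ||A[:, i]|| * ||B[i, :]||,
    and each sampled outer product is rescaled by 1 / (k * p_i), which makes
    the estimator unbiased in expectation."""
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=k, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for i in idx:
        est += np.outer(A[:, i], B[i, :]) / (k * p[i])
    return est

rng = np.random.default_rng(0)
A, B = rng.standard_normal((64, 256)), rng.standard_normal((256, 32))
# Relative error of the sampled estimate versus the exact product.
print(np.linalg.norm(cr_sampled_matmul(A, B, 64, rng) - A @ B) / np.linalg.norm(A @ B))
```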
238, TART: A Plug-and-play Transformer Module for Task-agnostic Reasoning
Kush Bhatia; Avanika Narayan; Christopher De Sa; Christopher Ré;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and, as a proof of concept, propose TART which generically improves an LLM's reasoning abilities using a synthetically trained reasoning module.
239, Skill-it! A Data-driven Skills Framework for Understanding and Training Language Models
Mayee Chen; Nicholas Roberts; Kush Bhatia; Jue WANG; Ce Zhang; Frederic Sala; Christopher Ré;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for learning skills more quickly for both continual pre-training and fine-tuning regimes, where we aim to learn multiple skills in the former and an individual skill in the latter.
240, A Case for Reframing Automated Medical Image Classification As Segmentation
Sarah Hooper; Mayee Chen; Khaled Saab; Kush Bhatia; Curtis Langlotz; Christopher Ré;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, recent work has drastically reduced the cost of training segmentation networks. In light of this recent work, we reexamine the choice of training classification vs. segmentation models.
241, Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
Dan Fu; Jessica Grogan; Isys Johnson; Simran Arora; Evan Sabri Eyuboglu; Armin Thomas; Benjamin Spector; Michael Poli; Atri Rudra; Christopher Ré;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Here, we explore Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension.
242, Have It Your Way: Individualized Privacy Assignment for DP-SGD
Franziska Boenisch; Christopher Mühl; Adam Dziedzic; Roy Rinberg; Nicolas Papernot;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets.
243, On Efficient Training Algorithms For Transformer Language Models
Jean Kaddour; Oscar Key; Piotr Nawrot; Pasquale Minervini; Matt Kusner;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we revisit three algorithms: layer stacking, layer dropping, and selective backpropagation.
244, Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks
Aoxiang Zhang; Yu Ran; Weixuan Tang; Yuan-Gen Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we make the first attempt to evaluate the robustness of NR-VQA models against adversarial attacks, and propose a patch-based random search method for black-box attack.
245, Deep Language Networks: Joint Prompt Training of Stacked LLMs Using Variational Inference
Alessandro Sordoni; Eric Yuan; Marc-Alexandre Côté; Matheus Pereira; Adam Trischler; Ziang Xiao; Arian Hosseini; Friederike Niedtner; Nicolas Le Roux;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: By viewing large language models (LLMs) as stochastic layers in a deep network, where the tunable parameters are the prompts at each layer, we chain multiple LLMs, feeding the output of the one at layer $l$ to the one at layer $l+1$, jointly training them using variational inference.
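Structurally, the "prompts as layer parameters" view can be sketched as below: each layer owns a prompt, and the text output of layer $l$ becomes the input of layer $l+1$. The `call_llm` stub is a hypothetical stand-in for a real model call, and the variational-inference training of the prompts described in the highlight is not shown.

```python
from dataclasses import dataclass, field
from typing import List

def call_llm(prompt: str, x: str) -> str:
    # Hypothetical stand-in: in practice this would query an actual LLM.
    return f"<LLM output for prompt={prompt!r}, input={x!r}>"

@dataclass
class PromptLayer:
    prompt: str                              # the tunable "parameter" of this layer

    def forward(self, x: str) -> str:
        return call_llm(self.prompt, x)

@dataclass
class DeepLanguageNetwork:
    layers: List[PromptLayer] = field(default_factory=list)

    def forward(self, x: str) -> str:
        for layer in self.layers:            # output of layer l feeds layer l+1
            x = layer.forward(x)
        return x

dln = DeepLanguageNetwork([PromptLayer("Summarize the question."),
                           PromptLayer("Answer the summary step by step.")])
print(dln.forward("Why is the sky blue?"))
```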
246, Benchmarking Robustness to Adversarial Image Obfuscations
Florian Stimberg; Ayan Chakrabarti; Chun-Ta Lu; Hussein Hazimeh; Otilia Stretcu; Wei Qiao; Yintao Liu; Merve Kaya; Cyrus Rashtchian; Ariel Fuxman; Mehmet Tek; Sven Gowal;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To reach this goal, these malicious actors may obfuscate policy violating images (e.g., overlay harmful images by carefully selected benign images or visual patterns) to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark.
247, M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang; Mahani Aljunied; Chang Gao; Yew Ken Chia; Lidong Bing;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context.
248, Fine-grained Expressivity of Graph Neural Networks
Jan Böker; Ron Levie; Ningyuan Huang; Soledad Villar; Christopher Morris;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Consequently, we provide a theoretical framework for graph and graphon similarity combining various topological variants of classical characterizations of the $1$-WL.
249, Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang; Jihoon Tack; Daewon Choi; Jongheon Jeong; Jinwoo Shin;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we develop MAE as a unified, modality-agnostic SSL framework.
250, Auditing Fairness By Betting
Ben Chugg; Santiago Cortes-Gomez; Bryan Wilder; Aaditya Ramdas;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models.
251, Molecule Joint Auto-Encoding: Self-Supervised Learning of 2D and 3D Trajectories
Weitao Du; Jiujiu Chen; Xuecang Zhang; Zhi-Ming Ma; Shengchao Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose a pretraining method for molecule joint auto-encoding (MoleculeJAE).
252, Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo; Yiyi Zhou; Tianhe Ren; Shengxin Chen; Xiaoshuai Sun; Rongrong Ji;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose a novel and affordable solution for the effective VL adaption of LLMs, called Mixture-of-Modality Adaptation (MMA).
253, LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Weixi Feng; Wanrong Zhu; Tsu-Jui Fu; Varun Jampani; Arjun Akula; Xuehai He; S Basu; Xin Eric Wang; William Yang Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance visual planning skills of LLMs.
254, Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
Grace Luo; Lisa Dunlap; Dong Huk Park; Aleksander Holynski; Trevor Darrell;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks.
255, Reverse Engineering Self-Supervised Learning
Ido Ben-Shaul; Ravid Shwartz-Ziv; Tomer Galanti; Shai Dekel; Yann LeCun;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Understanding the learned representation and underlying mechanisms of Self-Supervised Learning (SSL) often poses a challenge. In this paper, we ‘reverse engineer’ SSL, conducting an in-depth empirical analysis of its learned internal representations, encompassing diverse models, architectures, and hyperparameters.
256, Enhancing Motion Deblurring in High-Speed Scenes with Spike Streams
Shiyan Chen; Jiyuan Zhang; Yajing Zheng; Zhaofei Yu; Tiejun Huang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a novel approach that integrates the two modalities from two branches, leveraging spike streams as auxiliary visual cues for guiding deblurring in high-speed motion scenes.
257, Adaptive Online Replanning with Diffusion Models
Siyuan Zhou; Yilun Du; Shun Zhang; Mengdi Xu; Yikang Shen; Wei Xiao; Dit-Yan Yeung; Chuang Gan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we explore how we may effectively replan with diffusion models.
258, DiffVL: Scaling Up Soft Body Manipulation Using Vision-Language Driven Differentiable Physics
Zhiao Huang; Feng Chen; Yewen Pu; Chunru Lin; Hao Su; Chuang Gan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks -- a combination of vision and natural language, given in multiple stages -- that can be readily leveraged by a differentiable physics solver.
259, In-Context Learning Unlocked for Diffusion Models
Zhendong Wang; Yifan Jiang; Yadong Lu; yelong shen; Pengcheng He; Weizhu Chen; Zhangyang Atlas Wang; Mingyuan Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models.
260, Beta Diffusion
Mingyuan Zhou; Tianqi Chen; Huangjie Zheng; Zhendong Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce beta diffusion with multiplicative transitions over time as a novel method for generative modeling of range-bounded data supported over disjoint regions.
261, Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Zhendong Wang; Yifan Jiang; Huangjie Zheng; Peihao Wang; Pengcheng He; Zhangyang Atlas Wang; Weizhu Chen; Mingyuan Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users.
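One concrete way to read "patch-wise training": crop random patches and attach their location so a denoiser trained on patches still knows where each patch sits in the full image. The coordinate-channel encoding below is an assumption made for illustration, not the paper's exact conditioning scheme.

```python
import numpy as np

def sample_patch_with_coords(image, patch_size, rng):
    """Illustrative patch-wise sampling: crop a random patch and append its
    normalized (y, x) pixel coordinates as two extra channels."""
    C, H, W = image.shape
    top = rng.integers(0, H - patch_size + 1)
    left = rng.integers(0, W - patch_size + 1)
    patch = image[:, top:top + patch_size, left:left + patch_size]
    ys = (np.arange(top, top + patch_size) / (H - 1))[:, None].repeat(patch_size, axis=1)
    xs = (np.arange(left, left + patch_size) / (W - 1))[None, :].repeat(patch_size, axis=0)
    return np.concatenate([patch, ys[None], xs[None]], axis=0)   # (C + 2, p, p)

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 64, 64))
print(sample_patch_with_coords(img, 16, rng).shape)  # (5, 16, 16)
```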
262, When Do Neural Nets Outperform Boosted Trees on Tabular Data?
Duncan McElfresh; Sujay Khandagale; Jonathan Valverde; Vishak Prasad C; Ganesh Ramakrishnan; Micah Goldblum; Colin White;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate.
263, Ambient Diffusion: Learning Clean Distributions from Corrupted Data
Giannis Daras; Kulin Shah; Yuval Dagan; Aravind Gollakota; Alex Dimakis; Adam Klivans;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples.
264, Neural Oscillators Are Universal
Samuel Lanthaler; T. Konstantin Rusch; Siddhartha Mishra;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Coupled oscillators are being increasingly used as the basis of machine learning (ML) architectures, for instance in sequence modeling, graph representation learning and in physical neural networks that are used in analog ML devices. We introduce an abstract class of *neural oscillators* that encompasses these architectures and prove that neural oscillators are universal, i.e., they can approximate any continuous and causal operator mapping between time-varying functions, to desired accuracy.
265, Martingale Diffusion Models: Mitigating Sampling Drift By Learning to Be Consistent
Giannis Daras; Yuval Dagan; Alex Dimakis; Constantinos Daskalakis;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Yet, the standard training objective via Denoising Score Matching (DSM) is only designed to optimize over non-drifted data. To train on drifted data, we propose to enforce a \emph{Martingale} property (MP) which states that predictions of the model on its own generated data follow a Martingale, thus being consistent with the outputs that it generates.
266, ForecastPFN: Synthetically-Trained Zero-Shot Forecasting
Samuel Dooley; Gurnoor Singh Khurana; Chirag Mohapatra; Siddartha V Naidu; Colin White;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we take a different approach and devise ForecastPFN, a zero-shot forecasting model that is trained purely on a novel synthetic data distribution.
267, Res-Tuning: A Flexible and Efficient Tuning Paradigm Via Unbinding Tuner from Backbone
Zeyinzi Jiang; Chaojie Mao; Ziyuan Huang; Ao Ma; Yiliang Lv; Yujun Shen; Deli Zhao; Jingren Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally \textit{unbinds} tuners from the backbone.
268, Recurrent Hypernetworks Are Surprisingly SOTA in Meta-RL
Jacob Beck; Risto Vuorio; Zheng Xiong; Shimon Whiteson;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we conduct an extensive empirical investigation and suggest a method that works without the need for additional tuning.
269, Data Quality in Imitation Learning
Suneel Belkhale; Yuchen Cui; Dorsa Sadigh;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift: a high quality dataset encourages the policy to stay in distribution at test time.
270, Inverse Preference Learning: Preference-based RL Without A Reward Function
Joey Hejna; Dorsa Sadigh;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Instead of using highly complex architectures, we develop a new and parameter-efficient algorithm, Inverse Preference Learning (IPL), specifically designed for learning from offline preference data. Our key insight is that for a fixed policy, the $Q$-function encodes all information about the reward function, effectively making them interchangeable.
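The stated insight follows from the Bellman equation for a fixed policy $\pi$: since $Q^{\pi}(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'}[V^{\pi}(s')]$, the reward can be read back off as $r(s,a) = Q^{\pi}(s,a) - \gamma\,\mathbb{E}_{s'}[V^{\pi}(s')]$, which is the sense in which the two are interchangeable (a minimal restatement under standard discounted-MDP assumptions, not the paper's full derivation).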
271, Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching
Junsheng Zhou; Baorui Ma; Wenyuan Zhang; Yi Fang; Yu-Shen Liu; Zhizhong Han;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, these methods struggle to map points and pixels to a shared latent space robustly since points and pixels have very different characteristics with patterns learned in different manners (MLP and CNN), and they also fail to construct supervision directly on the transformation since the PnP is non-differentiable, which leads to unstable registration results. To address these problems, we propose to learn a structured cross-modality latent space to represent pixel features and 3D features via a differentiable probabilistic PnP solver.
272, Understanding The Latent Space of Diffusion Models Through The Lens of Riemannian Geometry
Yong-Hyun Park; Mingi Kwon; Jaewoong Choi; Junghyo Jo; Youngjung Uh;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Through analysis, we show that 1) the model focuses on low-frequency components early in the generative process and attunes to high-frequency details later; 2) at early timesteps, different samples share similar tangent spaces; and 3) the simpler the dataset a DM is trained on, the more consistent the tangent space is for each timestep.
273, Tree-Rings Watermarks: Invisible Fingerprints for Diffusion Images
Yuxin Wen; John Kirchenbauer; Jonas Geiping; Tom Goldstein;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs.
274, Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
Yuxin Wen; Neel Jain; John Kirchenbauer; Micah Goldblum; Jonas Geiping; Tom Goldstein;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We describe an easy-to-use approach to automatically optimize hard text prompts through efficient gradient-based optimization.
275, Penalising The Biases in Norm Regularisation Enforces Sparsity
Etienne Boursier; Nicolas Flammarion;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Beyond simple intuitions, the relation between regularising parameters' norm and obtained estimators remains theoretically misunderstood. For one hidden ReLU layer networks with unidimensional data, this work shows the parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor.
276, Towards Revealing The Mystery Behind Chain of Thought: A Theoretical Perspective
Guhao Feng; Yuntian Gu; Haotian Ye; Bohang Zhang; Di He; Liwei Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions.
277, Efficient Diffusion Policies For Offline Reinforcement Learning
Bingyi Kang; Xiao Ma; Chao Du; Tianyu Pang; Shuicheng Yan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges.
278, Paxion: Patching Action Knowledge in Video-Language Foundation Models
Zhenhailong Wang; Ansel Blume; Sha Li; Genglin Liu; Jaemin Cho; Zineng Tang; Mohit Bansal; Heng Ji;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Despite recent video-language models’ (VidLM) impressive performance on various benchmark tasks, our diagnostic tasks reveal their surprising deficiency (near-random performance) in action knowledge, suggesting that current models rely on object recognition abilities as a shortcut for action understanding. To remedy this, we propose a novel framework, **Paxion**, along with a new **Discriminative Video Dynamics Modeling (DVDM)** objective.
279, Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Sivan Doveh; Assaf Arbelle; Sivan Harary; Roei Herzig; Donghyun Kim; Paola Cascante-Bonilla; Amit Alfassy; Rameswar Panda; Raja Giryes; Rogerio Feris; Shimon Ullman; Leonid Karlinsky;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we uncover two factors limiting the VL models' compositional reasoning performance.
280, Bounce: A Reliable Bayesian Optimization Algorithm for Combinatorial and Mixed Spaces
Leonard Papenmeier; Luigi Nardi; Matthias Poloczek;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To fill the need for a reliable algorithm for combinatorial and mixed spaces, this paper proposes Bounce that relies on a novel map of various variable types into nested embeddings of increasing dimensionality.
281, On The Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis
Arghya Datta; Sayak Chakrabarty;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: The main obstruction is posed by the inherent unidentifiability of the PPCA model resulting from the rotational symmetry of the parameterization. To resolve this ambiguity, we propose a novel approach using quotient topological spaces and in particular, we show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space.
282, Efficient Neural Music Generation
Max W. Y. Lam; Qiao Tian; Tang Li; Zongyu Yin; Siyuan Feng; Ming Tu; Yuliang Ji; Rui Xia; Mingbo Ma; Xuchen Song; Jitong Chen; Wang Yuping; Yuxuan Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present **M**e**L**o**D**y (**M** for music; **L** for LM; **D** for diffusion), an LM-guided diffusion model that generates music audio of state-of-the-art quality while requiring 95.7% or 99.6% fewer forward passes than MusicLM for sampling 10s or 30s of music, respectively.
283, Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns
Xin Liu; Zheng Li; Yifan Gao; Jingfeng Yang; Tianyu Cao; Zhengyang Wang; Bing Yin; Yangqiu Song;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose the Frequent Attribute Pattern Augmented Transformer (FAPAT) that characterizes user intents by building attribute transition graphs and matching attribute patterns.
284, Distribution-Free Statistical Dispersion Control for Societal Applications
Zhun Deng; Thomas Zollo; Jake Snell; Toniann Pitassi; Richard Zemel;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We initiate the study of distribution-free control of statistical dispersion measures with societal implications and propose a simple yet flexible framework that allows us to handle a much richer class of statistical functionals beyond previous work.
285, Synthetic Experience Replay
Cong Lu; Philip Ball; Yee Whye Teh; Jack Parker-Holder;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience.
286, Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis
Geonu Kim; Byunggook Na; Gunhee Kim; Hyuntae Cho; Seungjin Kang; Hee Sun Lee; Saerom Choi; Heejae Kim; Seungwon Lee; Yongdeok Kim;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present a comprehensive benchmark suite that consists of two semiconductor material datasets and 10 MLFF models with 6 evaluation metrics.
287, Provable Benefits of Annealing for Estimating Normalizing Constants
Omar Chehab; Aapo Hyvarinen; Andrej Risteski;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: First, we show that using NCE is more efficient than the importance sampling estimator, but in the limit of infinitesimal path steps, the difference vanishes. Second, we find that using the geometric path brings down the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal distributions.
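As commonly defined, the geometric path mentioned here interpolates between a proposal $p_0$ and the target $p_1$ via $p_t(x) \propto p_0(x)^{1-t}\, p_1(x)^{t}$ for $t \in [0,1]$; the highlighted result concerns how the estimation error of annealed estimators behaves along such paths.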
288, MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion
Shitao Tang; Fuyang Zhang; Jiacheng Chen; Peng Wang; Yasutaka Furukawa;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from panorama or multi-view images given depth/pose.
289, Provable Benefits of Score Matching
Chirag Pabbaraju; Dhruv Rohatgi; Anish Prasad Sevekari; Holden Lee; Ankur Moitra; Andrej Risteski;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: While score matching and variants thereof are popular in practice, precise theoretical understanding of the benefits and tradeoffs with maximum likelihood---both computational and statistical---are not well understood. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, and has a comparable statistical efficiency to ML, while the ML loss is intractable to optimize using a gradient-based method.
290, DreamHuman: Animatable 3D Avatars from Text
Nikos Kolotouros; Thiemo Alldieck; Andrei Zanfir; Eduard Bazavan; Mihai Fieraru; Cristian Sminchisescu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present \emph{DreamHuman}, a method to generate realistic animatable 3D human avatar models entirely from textual descriptions.
291, Deep Equilibrium Based Neural Operators for Steady-State PDEs
Tanya Marwah; Ashwini Pokle; J. Zico Kolter; Zachary Lipton; Jianfeng Lu; Andrej Risteski;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer using a black-box root solver and differentiates analytically through this fixed point resulting in $\mathcal{O}(1)$ training memory.
292, Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
Haibao Yu; Yingjuan Tang; Enze Xie; Jilei Mao; Ping Luo; Zaiqing Nie;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, the uncertain temporal asynchrony and limited communication conditions that are present in traffic environments can lead to fusion misalignment and constrain the exploitation of infrastructure data. To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel cooperative detection framework.
293, ProteinBench: Benchmarking Protein Design on Diverse Tasks, Models, and Metrics
Zhangyang Gao; Cheng Tan; Yijie Zhang; Xingran Chen; Lirong Wu; Stan Z. Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose ProteinBench, a new benchmark for protein design, which comprises extended protein design tasks, integrated models, and diverse evaluation metrics.
294, Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Jae Sung Park; Jack Hessel; Khyathi Chandu; Paul Pu Liang; Ximing Lu; Qiuyuan Huang; Peter West; Jianfeng Gao; Ali Farhadi; Yejin Choi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We build a Localized Visual Commonsense model which allows users to specify (multiple) regions as input.
295, H3T: Efficient Integration of Memory Optimization and Parallelism for Large-scale Transformer Training
Yuzhong Wang; Xu Han; Weilin Zhao; Guoyang Zeng; Zhiyuan Liu; Maosong Sun;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a framework to automatically find an efficient integration of memory optimization and parallelism for High-Throughput Transformer Training (named H3T), which is rarely considered by existing efforts for training big Transformer-based models.
296, One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning
Shaochen (Henry) Zhong; Zaichuan You; Jiamu Zhang; Sebastian Zhao; Zachary LeClaire; Zirui Liu; Vipin Chaudhary; Shuai Xu; Xia Hu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we answer the questions by fairly and comprehensively investigating the adversarial performance of 10+ popular structured pruning methods.
297, Assumption Violations in Causal Discovery and The Robustness of Score Matching
Francesco Montagna; Atalanti Mastakouri; Elias Eulig; Nicoletta Noceti; Lorenzo Rosasco; Dominik Janzing; Bryon Aragam; Francesco Locatello;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Because causal discovery without further assumptions is an ill-posed problem, each algorithm comes with its own set of usually untestable assumptions, some of which are hard to meet in real datasets. Motivated by these considerations, this paper extensively benchmarks the empirical performance of recent causal discovery methods on observational _iid_ data generated under different background conditions, allowing for violations of the critical assumptions required by each selected approach.
298, LLMScore: Unveiling The Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu; Xianjun Yang; Xiujun Li; Xin Eric Wang; William Yang Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we propose LLMScore, a new framework that offers evaluation scores with multi-granularity compositionality.
299, SpokenWOZ: A Large-Scale Speech-Text Dataset for Spoken Task-Oriented Dialogue in Multiple Domains
Shuzheng Si; Wentao Ma; Haoyu Gao; Yuchuan Wu; Ting-En Lin; Yinpei Dai; Hangyu Li; Rui Yan; Fei Huang; Yongbin Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audio from human-to-human spoken conversations.
300, CP-SLAM: Collaborative Neural Point-based SLAM System
Jiarui Hu; Mao Mao; Hujun Bao; Guofeng Zhang; Zhaopeng Cui;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper presents a collaborative implicit neural simultaneous localization and mapping (SLAM) system with RGB-D image sequences, which consists of complete front-end and back-end modules including odometry, loop detection, sub-map fusion, and global refinement.
301, The Adversarial Consistency of Surrogate Risks for Binary Classification
Natalie Frank; Jonathan Niles-Weed;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We study the consistency of surrogate risks for robust binary classification.
302, Unified 3D Segmenter As Prototypical Classifiers
Zheyun Qin; Cheng Han; Lu Xiankai; Qifan Wang; Xiushan Nie; Yilong Yin;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce ProtoSEG, a prototype-based model that unifies semantic, instance, and panoptic segmentation tasks.
303, Where Did I Come From? Origin Attribution of AI-Generated Images
Zhenting Wang; Chen Chen; Yi Zeng; Lingjuan Lyu; Shiqing Ma;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Existing methods only focus on specific types of generative models and require additional procedures during the training phase or generation phase. This makes them unsuitable for pre-trained models that lack these specific operations and may impair generation quality. To address this problem, we first develop an alteration-free and model-agnostic origin attribution method via reverse-engineering on image generation models, i.e., inverting the input of a particular model for a specific image.
304, Private Estimation Algorithms for Stochastic Block Models and Mixture Models
Hongjie Chen; Vincent Cohen-Addad; Tommaso d'Orsi; Alessandro Epasto; Jacob Imola; David Steurer; Stefan Tiegel;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce general tools for designing efficient private estimation algorithms, in the high-dimensional settings, whose statistical guarantees almost match those of the best known non-private algorithms.
305, Tailoring Self-Attention for Graph Via Rooted Subtrees
Siyuan Huang; Yunchong Song; Jiayue Zhou; Zhouhan Lin;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose a novel multi-hop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues.
306, Learning Transformer Programs
Dan Friedman; Alexander Wettig; Danqi Chen;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we introduce a procedure for training Transformers that are mechanistically interpretable by design.
307, Binarized Neural Machine Translation
Yichi Zhang; Ankush Garg; Yuan Cao; Lukasz Lew; Behrooz Ghorbani; Zhiru Zhang; Orhan Firat;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind.
308, MADLAD-400: Monolingual And Document-Level Large Audited Dataset
Sneha Kudugunta; Isaac Caswell; Biao Zhang; Xavier Garcia; Derrick Xin; Aditya Kusupati; Romi Stella; Ankur Bapna; Orhan Firat;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce MADLAD-400, a manually audited, general domain 3T token monolingual dataset based on CommonCrawl, spanning 419 languages.
309, DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Qingkai Fang; Yan Zhou; Yang Feng;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose DASpeech, a non-autoregressive direct S2ST model which realizes both fast and high-quality S2ST.
310, Mixed Samples As Probes for Unsupervised Model Selection in Domain Adaptation
Dapeng Hu; Jian Liang; Jun Hao Liew; Chuhui Xue; Song Bai; Xinchao Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose MixVal, a novel target-only method that employs \textit{mixup} to synthesize in-between target samples for validation.
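The probe construction rests on plain mixup, sketched below; how the interpolated samples are then scored to rank candidate models is the paper's contribution and is not reproduced, and the names and default `lam` here are illustrative.

```python
import numpy as np

def mixup_probe(x_a, x_b, lam=0.7):
    """Synthesize an in-between target sample by linear interpolation; the
    mixed input carries a soft pseudo-label with weight `lam` toward x_a."""
    return lam * x_a + (1.0 - lam) * x_b

rng = np.random.default_rng(0)
x_a, x_b = rng.standard_normal(8), rng.standard_normal(8)
print(mixup_probe(x_a, x_b))
```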
311, GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Xin Li; Dongze Lian; Zhihe Lu; Jiawang Bai; Zhibo Chen; Xinchao Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph.
312, Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation
Keji He; Chenyang Si; Zhihe Lu; Yan Huang; Liang Wang; Xinchao Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In contrast to conventional approaches, which primarily focus on the spatial domain exploration, we propose a paradigm shift toward the Fourier domain.
313, Don’t Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner
Zhengxiang Shi; Aldo Lipani;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this study, we re-visit the widely accepted notion in NLP that continued pre-training LMs on task-related texts improves the performance of fine-tuning (FT) in downstream tasks.
314, Can LLM Already Serve As A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs
Jinyang Li; Binyuan Hui; Ge Qu; Binhua Li; Jiaxi Yang; Bowen Li; Bailin Wang; Bowen Qin; Ruiying Geng; Nan Huo; Xuanhe Zhou; Ma Chenhao; Guoliang Li; Kevin Chang; Fei Huang; Reynold Cheng; Yongbin Li;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, most of the prevalent benchmarks, i.e., Spider, and WikiSQL, focus on database schema with few rows of database contents leaving the gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a BIg benchmark for laRge-scale Database grounded in text-to-SQL tasks, containing 12,751 pairs of text-to-SQL data and 95 databases with a total size of 33.4 GB, spanning 37 professional domains.
315, Pre-Training Protein Encoder Via Siamese Sequence-Structure Diffusion Trajectory Prediction
Zuobai Zhang; Minghao Xu; Aurelie Lozano; Vijil Chenthamarakshan; Payel Das; Jian Tang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, inspired by the success of denoising diffusion models in generative tasks, we propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling.
316, Hierarchical Randomized Smoothing
Yan Scholten; Jan Schuchardt; Aleksandar Bojchevski; Stephan Günnemann;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities.
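The "noise on a random subset of entities" step can be sketched as below for a node-feature matrix; the certification argument built on top of it is the paper's contribution and is not shown, and `subset_frac` and `sigma` are illustrative parameters.

```python
import numpy as np

def partially_smooth(X, subset_frac, sigma, rng):
    """Illustrative partial smoothing: pick a random subset of entities
    (rows, e.g. nodes of a graph) and add Gaussian noise only to those rows,
    leaving the remaining rows untouched."""
    n = X.shape[0]
    mask = rng.random(n) < subset_frac              # which entities get smoothed
    noisy = X.copy()
    noisy[mask] += sigma * rng.standard_normal(X[mask].shape)
    return noisy, mask

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
noisy, mask = partially_smooth(X, subset_frac=0.4, sigma=1.0, rng=rng)
print(mask)
```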
317, A Comprehensive Study on Text-attributed Graphs: Benchmarking and Rethinking
Hao Yan; Chaozhuo Li; Ruosong Long; Chao Yan; Jianan Zhao; Wenwen Zhuang; Jun Yin; Peiyan Zhang; Weihao Han; Hao Sun; Weiwei Deng; Qi Zhang; Lichao Sun; Xing Xie; Senzhang Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose CS-TAG, a comprehensive and diverse collection of challenging benchmark datasets for TAGs.
318, SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Bill Yuchen Lin; Yicheng Fu; Karina Yang; Prithviraj (Raj) Ammanabrolu; Faeze Brahman; Shiyu Huang; Chandra Bhagavatula; Yejin Choi; Xiang Ren;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks.
319, Simplicity Bias in 1-Hidden Layer Neural Networks
Depen Morwani; Jatin Batra; Prateek Jain; Praneeth Netrapalli;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we rigorously define as well as thoroughly establish SB for *one hidden layer* neural networks in the infinite width regime.
320, StyleDrop: Text-to-Image Synthesis of Any Style
Kihyuk Sohn; Lu Jiang; Jarred Barber; Kimin Lee; Nataniel Ruiz; Dilip Krishnan; Huiwen Chang; Yuanzhen Li; Irfan Essa; Michael Rubinstein; Yuan Hao; Glenn Entis; Irina Blok; Daniel Castro Chin;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce *StyleDrop*, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model.
321, Information Maximization Perspective of Orthogonal Matching Pursuit with Applications to Explainable AI
Aditya Chattopadhyay; Ryan Pilgrim; Rene Vidal;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Our first contribution is to establish a fundamental connection between IP and OMP, where we prove that IP with random, continuous and linear queries ``almost'' reduces to OMP, with the difference being that IP selects atoms in order of \emph{normalized correlation gain}.
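For reference, plain OMP (the comparison point in the highlight) is sketched below: greedily pick the atom most correlated with the residual after normalization, then refit on the selected atoms by least squares. The IP variant described above differs in ranking atoms by normalized correlation gain, which is not reproduced here.

```python
import numpy as np

def omp(A, y, k):
    """Plain orthogonal matching pursuit with unnormalized atoms: select the
    column of A most correlated (after dividing by its norm) with the current
    residual, then refit y on the selected columns by least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        corr = np.abs(A.T @ residual) / np.linalg.norm(A, axis=0)
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 64))
x_true = np.zeros(64); x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
support, coef = omp(A, A @ x_true, k=3)
print(sorted(support))
```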
322, Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov; Emanuele La Malfa; Philip Torr; Adel Bibi;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked.
323, Direct Diffusion Bridge Using Data Consistency for Inverse Problems
Hyungjin Chung; Jeongsol Kim; Jong Chul Ye;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Several recent works have tried to alleviate this problem by building a diffusion process, directly bridging the clean and the corrupted for specific inverse problems. In this paper, we first unify these existing works under the name Direct Diffusion Bridges (DDB), showing that while motivated by different theories, the resulting algorithms only differ in the choice of parameters.
324, Transformer-based Planning for Symbolic Regression
Parshin Shojaee; Kazem Meidani; Amir Barati Farimani; Chandan Reddy;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, these models primarily rely on supervised pretraining goals borrowed from text generation and overlook equation-specific objectives like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates Monte Carlo Tree Search into the transformer decoding process.
325, Simplifying Neural Network Training Under Class Imbalance
Ravid Shwartz-Ziv; Micah Goldblum; Yucen Li; C. Bayan Bruss; Andrew Wilson;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, architecture size, pre-training, optimizer, and label smoothing, can achieve state-of-the-art performance without any specialized loss functions or samplers.
326, The Best of Both Worlds in Network Population Games: Reaching Consensus and Convergence to Equilibrium
Shuyue Hu; Harold Soh; Georgios Piliouras;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We show that smooth fictitious play, a well-known learning model in game theory, can achieve both consensus and convergence to equilibrium in diverse multi-agent settings.
327, High-dimensional Asymptotics of Denoising Autoencoders
Hugo Cui; Lenka Zdeborová;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection.
328, Clifford Group Equivariant Neural Networks
David Ruhe; Johannes Brandstetter; Patrick Forré;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce Clifford Group Equivariant Neural Networks: a novel approach for constructing $\mathrm{E}(n)$-equivariant networks.
329, Improving Few-Shot Generalization By Exploring and Exploiting Auxiliary Data
Alon Albalak; Colin Raffel; William Yang Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization.
330, Few-shot Generation Via Recalling The Episodic-Semantic Memory Like Human Being
Zhibin Duan; Zhiyi Lv; Chaojie Wang; Bo Chen; Bo An; Mingyuan Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Inspired by the cognitive systems of human beings, in this work we carefully design a variational structured memory module (VSM), which can simultaneously store both episodic and semantic memories to assist existing generative models in efficiently recalling memory during generation.
331, Training Private Models That Know What They Don’t Know
Stephan Rabanser; Anvith Thudi; Abhradeep Guha Thakurta; Krishnamurthy Dvijotham; Nicolas Papernot;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selective classifiers---that can abstain when they are unsure---under a differential privacy constraint.
332, A Unified Conditional Framework for Diffusion-based Image Restoration
Yi Zhang; Xiaoyu Shi; Dasong Li; Xiaogang Wang; Jian Wang; Hongsheng Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present a unified conditional framework based on diffusion models for image restoration.
333, Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss
An Zhang; Leheng Sheng; Zhibo Cai; Xiang Wang; Tat-Seng Chua;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To bridge the gap, we delve into the reasons underpinning the success of contrastive loss in CF, and propose a principled Adversarial InfoNCE loss (AdvInfoNCE), which is a variant of InfoNCE, specially tailored for CF methods.
334, Feature-Learning Networks Are Consistent Across Widths At Realistic Scales
Nikhil Vyas; Alexander Atanasov; Blake Bordelon; Depen Morwani; Sabarish Sainathan; Cengiz Pehlevan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
335, UniControl: A Unified Diffusion Model for Controllable Visual Generation In The Wild
Can Qin; Shu Zhang; Ning Yu; Yihao Feng; Xinyi Yang; Yingbo Zhou; Huan Wang; Juan Carlos Niebles; Caiming Xiong; Silvio Savarese; Stefano Ermon; Yun Fu; Ran Xu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts.
336, Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei; Junwen Bai; Siddhartha Brahma; Joshua Ainslie; Kenton Lee; Yanqi Zhou; Nan Du; Vincent Zhao; Yuexin Wu; Bo Li; Yu Zhang; Ming-Wei Chang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency.
337, Two-Stage Learning to Defer with Multiple Experts
Anqi Mao; Mehryar Mohri; Yutao Zhong;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We study a two-stage scenario for learning to defer, which we argue is crucial in practice for many applications.
338, Structured Prediction with Stronger Consistency Guarantees
Anqi Mao; Mehryar Mohri; Yutao Zhong;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: These loss functions readily lead to new structured prediction algorithms with stronger theoretical guarantees, based on their minimization. We describe efficient algorithms for minimizing several of these surrogate losses, including a new *structured logistic loss*.
339, $H$-Consistency Bounds: Characterization and Extensions
Anqi Mao; Mehryar Mohri; Yutao Zhong;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present new and tight $H$-consistency bounds for both the family of constrained losses and that of comp-sum losses, which covers the familiar cross-entropy, or logistic loss applied to the outputs of a neural network.
340, Segment Anything in High Quality
Lei Ke; Mingqiao Ye; Martin Danelljan; Yifan liu; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability.
341, Derandomized Novelty Detection with FDR Control Via Conformal E-values
Meshi Bashari; Amir Epstein; Yaniv Romano; Matteo Sesia;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose to make conformal inferences more stable by leveraging suitable conformal e-values instead of p-values to quantify statistical significance.
342, Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh; Benjamin Elizalde; Rita Singh; Huaming Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.
343, Module-wise Adaptive Distillation for Multimodality Foundation Models
Chen Liang; Jiahui Yu; Ming-Hsuan Yang; Matthew Brown; Yin Cui; Tuo Zhao; Boqing Gong; Tianyi Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module and choose the module with a greater contribution to distill more frequently.
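A toy version of the selection idea: record the loss decrement attributed to distilling each module, then sample the next module to distill with probability increasing in its contribution. The softmax sampling rule below is an illustrative stand-in, not the paper's policy.

```python
import numpy as np

def pick_module(loss_decrements, temperature=1.0, rng=None):
    """Toy module selection: modules whose distillation has produced larger
    loss decrements so far are sampled more often (softmax over decrements)."""
    rng = rng or np.random.default_rng()
    z = np.asarray(loss_decrements, dtype=float) / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Hypothetical decrements recorded for 4 modules of a multimodal student.
print(pick_module([0.02, 0.10, 0.01, 0.05], rng=np.random.default_rng(0)))
```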
344, Contrastive Lift: 3D Object Instance Segmentation By Slow-Fast Contrastive Fusion
Yash Bhalgat; Iro Laina; João Henriques; Andrea Vedaldi; Andrew Zisserman;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by leveraging instead 2D pre-trained models for instance segmentation.
345, FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation
Xinyu Sun; Peihao Chen; Jugang Fan; Jian Chen; Thomas Li; Mingkui Tan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Existing methods try to solve this problem by learning a navigation policy, which captures semantic features of the goal image and the observation image independently and then fuses them to predict a sequence of navigation actions. However, these methods suffer from two major limitations. 1) They may miss detailed information in the goal image, and thus fail to reason about the goal location. 2) More critically, it is hard to focus on the goal-relevant regions in the observation image, because they attempt to understand the observation without goal conditioning. In this paper, we aim to overcome these limitations by designing a Fine-grained Goal Prompting (FGPrompt) method for image-goal navigation.
346, Transfer Visual Prompt Generator Across LLMs
Ao Zhang; Hao Fei; Yuan Yao; Wei Ji; Li Li; Zhiyuan Liu; Tat-Seng Chua;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we investigate VPG transferability across LLMs for the first time, aiming to reduce the cost of VPG training.
347, Diverse Conventions for Human-AI Collaboration
Bidipta Sarkar; Andy Shih; Dorsa Sadigh;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different.
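The two-part objective in this highlight can be sketched in a few lines: maximize the self-play return while penalizing the return obtained when paired with previously discovered conventions (cross-play). The function names and the weighting factor `lam` below are our own placeholders, not the paper's implementation.

```python
def diversity_objective(policy, previous_conventions, selfplay_return, crossplay_return, lam=1.0):
    """Sketch of the objective: maximize self-play reward while minimizing
    reward when paired with previously discovered conventions."""
    self_play = selfplay_return(policy, policy)
    if previous_conventions:
        cross_play = sum(crossplay_return(policy, prev) for prev in previous_conventions)
        cross_play /= len(previous_conventions)
    else:
        cross_play = 0.0
    # Maximize this quantity with any standard policy-gradient method.
    return self_play - lam * cross_play
```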
348, Representational Strengths and Limitations of Transformers
Clayton Sanford; Daniel Hsu; Matus Telgarsky;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work we establish both positive and negative results on the representation power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embedding dimension.
349, Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
Haonan Duan; Adam Dziedzic; Nicolas Papernot; Franziska Boenisch;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, this comes at the expense of the practicality and efficiency offered by prompting. Therefore, we propose to privately learn to prompt.
350, JourneyDB: A Benchmark for Generative Image Understanding
Junting Pan; Keqiang Sun; Yuying Ge; Hao Li; Haodong Duan; Xiaoshi Wu; Renrui Zhang; Aojun Zhou; Zipeng Qin; Yi Wang; Jifeng Dai; Yu Qiao; Hongsheng Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Synthetic images, in comparison to real data, encompass a higher level of diversity in terms of both content and style, thereby presenting significant challenges for the models to fully grasp. In light of this challenge, we introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images within the context of multi-modal visual understanding.
351, InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding
Junda Wu; Tong Yu; Rui Wang; Zhao Song; Ruiyi Zhang; Handong Zhao; Chaochao Lu; Shuai Li; Ricardo Henao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters (or encoded representations).
352, Proximity-Informed Calibration for Deep Neural Networks
Miao Xiong; Ailin Deng; Pang Wei Koh; Jiaying Wu; Shen Li; Jianqing Xu; Bryan Hooi;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Motivated by the empirical findings, we propose ProCal, a plug-and-play algorithm with a theoretical guarantee to adjust sample confidence based on proximity.
353, Context-TAP: Tracking Any Point Demands Context Features
Weikang Bian; Zhaoyang Huang; Xiaoyu Shi; Yitong Dong; Yijin Li; Hongsheng Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To this end, we propose a novel framework Context-TAP, which effectively improves point trajectory accuracy by aggregating spatial context features in videos.
354, Generate What You Prefer: Reshaping Sequential Recommendation Via Guided Diffusion
Zhengyi Yang; Jiancan Wu; Zhicai Wang; Xiang Wang; Yancheng Yuan; Xiangnan He;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Sequential recommendation aims to recommend the next item that matches a user's interest, based on a sequence of items he/she interacted with before. Scrutinizing previous studies, we can summarize a common learning-to-classify paradigm --- given a positive item, a recommender model performs negative sampling to add negative items and learns to classify whether the user prefers them or not, based on his/her historical item sequence.
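As a point of reference, the conventional learning-to-classify paradigm summarized in this highlight (the baseline the paper moves away from) can be sketched as binary classification over sampled negatives. The scoring model and tensor names below are placeholders of ours, not the paper's code.

```python
import torch
import torch.nn.functional as F

def learning_to_classify_loss(model, history, positive_item, num_negatives, num_items):
    """Sketch of the conventional paradigm: sample negative items and classify
    whether the user prefers each candidate given the interaction history."""
    negatives = torch.randint(0, num_items, (num_negatives,))      # randomly sampled negatives
    candidates = torch.cat([positive_item.view(1), negatives])     # 1 positive + k negatives
    labels = torch.zeros(len(candidates))
    labels[0] = 1.0                                                # only the first candidate is positive

    scores = model(history, candidates)                            # preference score per candidate
    return F.binary_cross_entropy_with_logits(scores, labels)
```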
355, VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
Jiayi Guan; Guang Chen; Jiaming Ji; Long Yang; ao zhou; Zhijun Li; changjun jiang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose a Variational Optimization with Conservative Estimation algorithm (VOCE) to solve the problem of optimizing safety policies in the offline dataset.
356, Consensus and Subjectivity of Skin Tone Annotation for ML Fairness
Candice Schumann; Gbolahan Olanubi; Auriel Wright; Ellis Monk; Courtney Heldreth; Susanna Ricco;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper examines the subjectivity of skin tone annotation through a series of annotation experiments using the Monk Skin Tone (MST) scale (Monk, 2022), a small pool of professional photographers, and a much larger pool of trained crowdsourced annotators.
357, Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity
Zijiao Chen; Jiaxin Qing; Juan Helen Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose Mind-Video that learns spatiotemporal information from continuous fMRI data of the cerebral cortex progressively through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation.
358, The Quantization Model of Neural Scaling
Eric Michaud; Ziming Liu; Uzay Girit; Max Tegmark;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale.
359, Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matthew Le; Bowen Shi; Apoorv Vyas; Brian Karrer; Leda Sari; Yossi Adi; Vimal Manohar; Jay Mahadeokar; Wei-Ning Hsu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present Voicebox, the most versatile text-guided generative model for speech at scale.
360, CLadder: Assessing Causal Reasoning in Language Models
Zhijing Jin; Yuen Chen; Felix Leeb; Luigi Gresele; Ojasv Kamal; Zhiheng LYU; Kevin Blin; Fernando Gonzalez Adauto; Max Kleiman-Weiner; Mrinmaya Sachan; Bernhard Schölkopf;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Much of the existing work in NLP on causality focuses on understanding commonsense causal relationships, thus failing to assess whether the model's reasoning abilities include *formal* causal inference. To address this, we propose a new NLP task, *causal inference in natural language*, inspired by the "causal inference engine" postulated by Judea Pearl.
361, OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Ayca Takmaz; Elisabetta Fedele; Robert Sumner; Marc Pollefeys; Federico Tombari; Francis Engelmann;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: While such a representation can be directly employed to perform semantic segmentation, existing methods have limitations in their ability to handle object instances. In this work, we address this limitation, and propose OpenMask3D, which is a zero-shot approach for open-vocabulary 3D instance segmentation.
362, Block-State Transformers
Jonathan Pilault; Mahan Fathi; Pierre-Luc Bacon; Chris Pal; Orhan Firat; Ross Goroshin;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose a hybrid layer named Block-State Transformer (*BST*), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences.
363, BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents That Solve Fuzzy Tasks
Stephanie Milani; Anssi Kanervisto; Karolis Ramanauskas; Sander Schulhoff; Brandon Houghton; Rohin Shah;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: These comparisons serve as a fixed, preliminary leaderboard for evaluating newly-developed algorithms. To enable this comparison, we present a streamlined codebase for benchmarking new algorithms against the leaderboard.
364, Random-Access Infinite Context Length for Transformers
Amirkeivan Mohtashami; Martin Jaggi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context.
365, Unsupervised Image Denoising with Score Function
Yutong Xie; Mingze Yuan; Bin Dong; Quanzheng Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a new approach which is more general and applicable to complicated noise models.
366, A Meta Learning Model for Scalable Hyperbolic Graph Neural Networks
Nurendra Choudhary; Nikhil Rao; Chandan Reddy;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce a novel method, Hyperbolic GRAph Meta Learner (H-GRAM), that learns transferable information from a set of support local subgraphs, in the form of hyperbolic meta gradients and label hyperbolic protonets, to enable faster learning over a query set of new tasks dealing with disjoint subgraphs.
367, Learning to Augment Distributions for Out-of-distribution Detection
Qizhou Wang; Zhen Fang; Yonggang Zhang; Feng Liu; Yixuan Li; Bo Han;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Accordingly, we propose Distributional-Augmented OOD Learning (DAOL), alleviating the OOD distribution discrepancy by crafting an OOD distribution set that contains all distributions in a Wasserstein ball centered on the auxiliary OOD distribution.
368, MAViL: Masked Audio-Video Learners
Po-Yao Huang; Vasu Sharma; Hu Xu; Chaitanya Ryali; haoqi fan; Yanghao Li; Shang-Wen Li; Gargi Ghosh; Jitendra Malik; Christoph Feichtenhofer;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We present Masked Audio-Video Learners (MAViL) to learn audio-visual representations with three complementary forms of self-supervision: (1) reconstructing masked raw audio and video inputs, (2) intra-modal and inter-modal contrastive learning with masking, and (3) self-training to predict aligned and contextualized audio-video representations learned from the first two objectives.
369, Energy-based Attention for Associative Memory
Benjamin Hoover; Yuchen Liang; Bao Pham; Rameswar Panda; Hendrik Strobelt; Duen Horng Chau; Mohammed Zaki; Dmitry Krotov;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose a novel architecture, called the Energy transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks.
370, On Quantum Backpropagation, Information Reuse, and Cheating Measurement Collapse
Amira Abbas; Robbie King; Hsin-Yuan Huang; William J. Huggins; Ramis Movassagh; Dar Gilboa; Jarrod McClean;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We show that achieving backpropagation scaling is impossible without access to multiple copies of a state. With this added ability, we introduce an algorithm with foundations in shadow tomography that matches backpropagation scaling in quantum resources while reducing classical auxiliary computational costs to open problems in shadow tomography.
371, Noise-Adaptive Thompson Sampling for Linear Contextual Bandits
Ruitu Xu; Yifei Min; Tianhao Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we study linear contextual bandits with heteroscedastic noise and propose the first noise-adaptive Thompson sampling-style algorithm that achieves a variance-dependent regret upper bound of $\widetilde O\Big(d^{3/2} + d^{3/2} \sqrt{\sum_{t=1}^T \sigma_t^2}\Big)$, where $d$ is the dimension of the context vectors and $\sigma_t^2$ is the variance of the reward in round $t$.
372, Disentangled Wasserstein Autoencoder for Protein Engineering
Tianxiao Li; Hongyu Guo; Filippo Grazioli; Mark Gerstein; Martin Renqiang Min;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Identifying and modifying those functional sites is critical for protein engineering but computationally non-trivial, and requires significant domain knowledge. To automate this process from a data-driven perspective, we propose a disentangled Wasserstein autoencoder with an auxiliary classifier, which isolates the function-related patterns from the rest with theoretical guarantees.
373, DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Boxin Wang; Weixin Chen; Hengzhi Pei; Chulin Xie; Mintong Kang; Chenhui Zhang; Chejian Xu; Zidi Xiong; Ritik Dutta; Rylan Schaeffer; Sang Truong; Simran Arora; Mantas Mazeika; Dan Hendrycks; Zinan Lin; Yu Cheng; Sanmi Koyejo; Dawn Song; Bo Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives – including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness.
374, EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
Michael Wornow; Rahul Thapa; Ethan Steinberg; Jason Fries; Nigam Shah;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance.
375, Multi-Head Adapter Routing for Cross-Task Generalization
Lucas Page-Caccia; Edoardo Maria Ponti; Zhan Su; Matheus Pereira; Nicolas Le Roux; Alessandro Sordoni;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Polytropon [Ponti et al., 2023] ($\texttt{Poly}$) jointly learns an inventory of adapters and a *routing* function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. In this paper, we investigate the role that adapter routing plays in its success and design new variants based on our findings.
376, Equivariant Flow Matching
Leon Klein; Andreas Krämer; Frank Noe;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce equivariant flow matching, a new training objective for equivariant CNFs that is based on the recently proposed optimal transport flow matching.
377, Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Zhengxuan Wu; Atticus Geiger; Christopher Potts; Noah Goodman;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In the present paper, we scale DAS significantly by replacing the remaining brute-force search steps with learned parameters -- an approach we call Boundless DAS.
378, How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in A Pre-trained Language Model
Michael Hanna; Ollie Liu; Alexandre Variengien;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models.
379, Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
Frederik Kunstner; Victor Sanches Portella; Mark Schmidt; Nicholas Harvey;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose multidimensional backtracking, an extension of the backtracking line-search to find good diagonal preconditioners for smooth convex problems.
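For context, the classical single-step-size backtracking line search that this method generalizes looks as follows (a textbook sketch using the usual Armijo sufficient-decrease condition); the paper's multidimensional search over diagonal preconditioners is not reproduced here.

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4):
    """Classical backtracking line search (Armijo sufficient-decrease condition).
    The paper extends this single step-size search to a search over diagonal
    preconditioners; that extension is not sketched here."""
    g = grad_f(x)
    alpha, fx = alpha0, f(x)
    # Shrink the step size until sufficient decrease holds.
    while f(x - alpha * g) > fx - c * alpha * np.dot(g, g):
        alpha *= beta
    return alpha
```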
380, Deep Reinforcement Learning with Plasticity Injection
Evgenii Nikishin; Junhyuk Oh; Georg Ostrovski; Clare Lyle; Razvan Pascanu; Will Dabney; Andre Barreto;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper introduces plasticity injection, a minimalistic intervention that increases the network plasticity without changing the number of trainable parameters or biasing the predictions.
381, LANCE: Stress-testing Visual Models By Generating Language-guided Counterfactual Images
Viraj Prabhu; Sriram Yenamandra; Prithvijit Chattopadhyay; Judy Hoffman;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE).
382, Domain Watermark: Effective and Harmless Dataset Copyright Protection Is Closed at Hand
Junfeng Guo; Yiming Li; Lixu Wang; Shu-Tao Xia; Heng Huang; Cong Liu; Bo Li;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we revisit backdoor-based dataset ownership verification (DOV), which is currently the only feasible approach to protect the copyright of open-source datasets.
383, The Probability Flow ODE Is Provably Fast
Sitan Chen; Sinho Chewi; Holden Lee; Yuanzhi Li; Jianfeng Lu; Adil Salim;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We provide the first polynomial-time convergence guarantees for the probability flow ODE implementation (together with a corrector step) of score-based generative modeling.
384, Towards Robust and Expressive Whole-body Human Pose and Shape Estimation
Hui En Pang; Zhongang Cai; Lei Yang; Tianwei Zhang; Qingyi Tao; Zhonghua Wu; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a novel framework to enhance the robustness of whole-body pose and shape estimation.
385, FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
Mingyuan Zhang; Huirong Li; Zhongang Cai; Jiawei Ren; Lei Yang; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This lack of fine controllability limits the usage of motion generation to a larger audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions with spatio-temporal composition according to user instructions.
386, SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation
Zhongang Cai; Wanqi Yin; Ailing Zeng; CHEN WEI; Qingping SUN; Wang Yanjun; Hui En Pang; Haiyi Mei; Mingyuan Zhang; Lei Zhang; Chen Change Loy; Lei Yang; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources.
387, Language Models Are Visual Reasoning Coordinators
Liangyu Chen; Bo Li; Sheng Shen; Jingkang Yang; Chunyuan Li; Kurt Keutzer; Trevor Darrell; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning.
388, Segment Any Point Cloud Sequences By Distilling Vision Foundation Models
Youquan Liu; Lingdong Kong; Jun CEN; Runnan Chen; Wenwei Zhang; Liang Pan; Kai Chen; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we introduce ***Seal***, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences.
389, InsActor: Instruction-driven Physics-based Characters
Jiawei Ren; Mingyuan Zhang; Cunjun Yu; Xiao Ma; Liang Pan; Ziwei Liu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present InsActor, a principled generative framework that leverages recent advancements in diffusion-based human motion models to produce instruction-driven animations of physics-based characters.
390, Towards The Difficulty for A Deep Neural Network to Learn Concepts of Different Complexities
Dongrui Liu; Huiqi Deng; Xu Cheng; Qihan Ren; Kangrui Wang; Quanshi Zhang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Each interactive concept is encoded by the DNN to represent the collaboration between a set of input variables. Therefore, in this study, we aim to theoretically explain that interactive concepts involving more input variables (i.e., more complex concepts) are more difficult to learn.
391, CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Denis Kuznedelev; Eldar Kurtić; Elias Frantar; Dan Alistarh;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: These highly-accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Pruner (CAP), a new unstructured pruning framework which significantly pushes the compressibility limits for state-of-the-art architectures.
392, ZipLM: Inference-Aware Structured Pruning of Language Models
Eldar Kurtić; Elias Frantar; Dan Alistarh;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: The breakthrough performance of large language models (LLMs) comes with major computational footprints and high deployment costs. In this paper, we progress towards resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM.
393, Knowledge Distillation Performs Partial Variance Reduction
Mher Safaryan; Alexandra Peste; Dan Alistarh;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we shed new light on the inner workings of this method, by examining it from an optimization perspective.
394, To Repeat or Not To Repeat: Insights from Scaling LLM Under Token-Crisis
Fuzhao Xue; Yao Fu; Wangchunshu Zhou; Zangwei Zheng; Yang You;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this study, we empirically investigate three key aspects under this approach.
395, Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Zangwei Zheng; Xiaozhe Ren; Fuzhao Xue; Yang Luo; Xin Jiang; Yang You;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
396, Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials
Shengchao Liu; weitao Du; Yanjing Li; Zhuoxinran Li; Zhiling Zheng; Chenru Duan; Zhi-Ming Ma; Omar Yaghi; Animashree Anandkumar; Christian Borgs; Jennifer Chayes; Hongyu Guo; Jian Tang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Nevertheless, due to the rapidly evolving process of the field and the knowledge gap between science (e.g., physics, chemistry, & biology) and machine learning communities, a benchmarking study on geometrical representation for such data has not been conducted. To address such an issue, in this paper, we first provide a unified view of the current symmetry-informed geometric methods, classifying them into three main categories: invariance, equivariance with spherical frame basis, and equivariance with vector frame basis.
397, Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu; Jaemin Cho; Prateek Yadav; Mohit Bansal;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Although humans often find a video moment to focus on and rewind the moment to answer questions, training a query-aware video moment localizer often requires expensive annotations and high computational costs. To address this issue, we propose Self-Chained Video Localization-Answering (SeViLA), a novel framework that leverages a single image-language model (BLIP-2) to tackle both temporal keyframe localization and question answering on videos.
398, Can Language Models Teach? Teacher Explanations Improve Student Performance Via Theory of Mind
Swarnadeep Saha; Peter Hase; Mohit Bansal;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Next, when the teacher is constrained by a budget, we decompose the teaching problem along two axes for better efficiency: (1) deciding when it is worth explaining a data point, and (2) understanding how the teacher should personalize explanations to better teach the student. We tackle both these problems by proposing a Theory of Mind approach, in which the teacher builds two few-shot mental models of the student.
399, Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization
Xiangsen Wang; Haoran Xu; Yinan Zheng; Xianyuan Zhan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we present OMIGA, a new offline multi-agent RL algorithm with implicit global-to-local value regularization.
400, Text-to-Image Diffusion Models Are Zero Shot Classifiers
Kevin Clark; Priyank Jaini;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, what knowledge their representations capture is not fully understood, and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers.
401, MomentDiff: Generative Video Moment Retrieval from Random to Real
Pandeng Li; Chen-Wei Xie; Hongtao Xie; Liming Zhao; Lei Zhang; Yun Zheng; Deli Zhao; Yongdong Zhang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To evaluate the influence of the temporal location biases, we propose two "anti-bias" datasets with location distribution shifts, named Charades-STA-Len and Charades-STA-Mom.
402, MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
Yinan Liang; Ziwei Wang; Xiuwei Xu; Yansong Tang; Jie Zhou; Jiwen Lu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a hardware-algorithm co-design method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory, where we jointly design transformer architectures and construct the inference compiler to fit the memory resource constraint.
403, How Does Adaptive Optimization Impact Local Neural Network Geometry?
Kaiqi Jiang; Dhruv Malik; Yuanzhi Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: For iterate trajectories produced by running a generic optimization algorithm OPT, we introduce $R^{\text{OPT}}_{\text{med}}$, a statistic that is analogous to the condition number of the loss Hessian evaluated at the iterates.
404, ResShift: Efficient Diffusion Model for Image Super-resolution By Residual Shifting
Zongsheng Yue; Jianyi Wang; Chen Change Loy;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration.
405, Rubik's Cube: High-Order Channel Interactions with A Hierarchical Receptive Field
Naishan Zheng; man zhou; Chong Zhou; Chen Change Loy;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, most of these methods, e.g., convolution and the FFN architecture of transformers, only take implicit advantage of the first-order channel interaction and have yet to fully tap into its potential for high-order modeling. To address this, our study delves into modeling channel-dimension relationships, and proposes a simple yet effective and efficient high-order channel-wise operator for image restoration.
406, Guiding Diffusion Models for Versatile Face Restoration Via Partial Guidance
Peiqing Yang; Shangchen Zhou; Qingyi Tao; Chen Change Loy;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce *partial guidance*, a fresh perspective that is more adaptable to real-world degradations compared to existing works.
407, Mutual Information Regularized Offline Reinforcement Learning
Xiao Ma; Bingyi Kang; Zhongwen Xu; Min Lin; Shuicheng Yan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset by directly constraining the policy improvement direction.
408, Towards Self-Interpretable Graph-Level Anomaly Detection
Yixin Liu; Kaize Ding; Qinghua Lu; Fuyi Li; Leo Yu Zhang; Shirui Pan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we investigate a new challenging problem, explainable GLAD, where the learning objective is to predict the abnormality of each graph sample with corresponding explanations, i.e., the vital subgraph that leads to the predictions.
409, Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data
Xin Zheng; Miao Zhang; Chunyang Chen; Quoc Viet Hung Nguyen; Xingquan Zhu; Shirui Pan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we advocate a new Structure-Free Graph Condensation paradigm, named SFGC, to distill a large-scale graph into a small-scale graph node set without explicit graph structures, i.e., graph-free data.
410, GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels
Xin Zheng; Miao Zhang; Chunyang Chen; Soheila Molaei; Chuan Zhou; Shirui Pan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Evaluating the performance of graph neural networks (GNNs) is an essential task for practical GNN model deployment and serving, as deployed GNNs face significant performance uncertainty when inferring on unseen and unlabeled test graphs, due to mismatched training-test graph distributions. In this paper, we study a *new* problem, **GNN model evaluation**, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs, by precisely estimating its performance (e.g., node classification accuracy) on unseen graphs without labels.
411, Open Visual Knowledge Extraction Via Relation-Oriented Multimodality Model Prompting
Hejie Cui; Xinyu Fang; Zihan Zhang; Ran Xu; Xuan Kan; Xin Liu; Manling Li; Yangqiu Song; Carl Yang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we take a first step toward a new paradigm of open visual knowledge extraction.
412, Semantic Image Synthesis with Unconditional Generator
JungWoo Chae; Hyunin Cho; Sooyeon Go; Kyungmook Choi; Youngjung Uh;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce a new approach for reflecting a user's detailed guiding masks on a pretrained unconditional generator.
413, Autodecoding Latent 3D Diffusion Models
Evangelos Ntavelis; Aliaksandr Siarohin; Kyle Olszewski; Chaoyang Wang; Luc V Gool; Sergey Tulyakov;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Such data is scarce for 3D generation, prohibiting the learning of large-scale diffusion models for 3D synthesis. We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
414, Your Representations Are in The Network: Composable and Parallel Adaptation for Large Scale Models
Yonatan Dukler; Alessandro Achille; Hao Yang; Varsha Vivek; Luca Zancato; Benjamin Bowman; Avinash Ravichandran; Charless Fowlkes; Ashwin Swaminathan; Stefano Soatto;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present a framework for transfer learning that efficiently adapts a large base-model by learning lightweight cross-attention modules attached to its intermediate activations.
415, Optimal Cross-learning for Contextual Bandits with Unknown Context Distributions
Jon Schneider; Julian Zimmert;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round.
416, Calibration By Distribution Matching: Trainable Kernel Calibration Metrics
Charles Marx; Sofian Zalouk; Stefano Ermon;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Drawing on the insight that calibration can be viewed as a distribution matching task, we introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
417, Learning to Compress Prompts with Gist Tokens
Jesse Mu; Xiang Li; Noah Goodman;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we present gisting, which trains an LM to compress prompts into smaller sets of gist tokens which can be cached and reused for compute efficiency.
418, Thinker: Learning to Plan and Act
Stephen Chung; Ivan Anokhin; David Krueger;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model.
419, Bypassing The Simulator: Near-Optimal Adversarial Linear Contextual Bandits
Haolin Liu; Chen-Yu Wei; Julian Zimmert;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We consider the adversarial linear contextual bandit problem, where the loss vectors are selected fully adversarially and the per-round action set (i.e. the context) is drawn from a fixed distribution.
420, Robust Mean Estimation Without Moments
Gleb Novikov; David Steurer; Stefan Tiegel;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Moreover, the guarantees that they achieve in the heavy-tailed setting are weaker than those for sub-Gaussian distributions with known covariance. In this work, we show that such a tradeoff, between error guarantees and heavy-tails, is not necessary for symmetric distributions.
421, Hardware Resilience Properties of Text-Guided Image Classifiers
Syed Talal Wasim; Kabila Haile Soboka; Abdulrahman Mahmoud; Salman Khan; David Brooks; Gu-Yeon Wei;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This research paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors.
422, Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
Xinyi Wang; Wanrong Zhu; Michael Saxon; Mark Steyvers; William Yang Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as implicit topic models. On this premise, we propose an algorithm to select optimal demonstrations from a set of annotated data with a small LLM, then directly generalize the selected demonstrations to larger LLMs.
423, KuaiSim: A Comprehensive Simulator for Recommender Systems
Kesen Zhao; Shuchang Liu; Qingpeng Cai; Xiangyu Zhao; Ziru Liu; Dong Zheng; Peng Jiang; Kun Gai;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Existing simulators have shown promising results but also have limitations such as simplified user feedback, lacking consistency with real-world data, the challenge of simulator evaluation, and difficulties in migration and expansion across RSs. To address these challenges, we propose KuaiSim, a comprehensive user environment that provides user feedback with multi-behavior and cross-session responses.
424, Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes
Yishi Xu; Jianqiao Sun; Yudi Su; Xinyang Liu; Zhibin Duan; Bo Chen; Mingyuan Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To address the issue, we propose an effective approach for topic modeling under the low-resource regime, the core of which is the adaptive generation of semantic matching word embeddings by integrating the contextual information of each task.
425, Dynamically Masked Discriminator for GANs
Wentian Zhang; Haozhe Liu; Bing Li; Jinheng Xie; Yawen Huang; Yuexiang Li; Yefeng Zheng; Bernard Ghanem;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a novel method for GANs from the viewpoint of online continual learning.
426, Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang; Shujian Zhang; Congying Xia; Yihao Feng; Caiming Xiong; Mingyuan Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: There is, therefore, a *granularity mismatch* between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance, and improving the LM with the learned guidance.
427, CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society
Guohao Li; Hasan Hammoud; Hani Itani; Dmitrii Khizbullin; Bernard Ghanem;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing.
428, Norm-guided Latent Space Exploration for Text-to-image Generation
Dvir Samuel; Rami Ben-Ari; Nir Darshan; Haggai Maron; Gal Chechik;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this metric and use it to further define centroids in the latent seed space.
429, A Randomized Approach for Tight Privacy Accounting
Jiachen T. Wang; Saeed Mahloujifar; Tong Wu; Ruoxi Jia; Prateek Mittal;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a new differential privacy paradigm called estimate-verify-release (EVR), which tackles the challenges of providing a strict upper bound for the privacy parameter in DP compositions by converting an *estimate* of privacy parameter into a formal guarantee.
430, Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation
Lisa Dunlap; Alyssa Umino; Han Zhang; Jiezhi Yang; Joseph Gonzalez; Trevor Darrell;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing.
431, A Privacy-Friendly Approach to Data Valuation
Jiachen T. Wang; Yuqing Zhu; Yu-Xiang Wang; Ruoxi Jia; Prateek Mittal;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical challenges in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate DP guarantee (DP-TKNN-Shapley).
432, An Inverse Scaling Law for CLIP Training
Xianhang Li; Zeyu Wang; Cihang Xie;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training.
433, Robust Multi-Agent Reinforcement Learning Via Adversarial Regularization: Theoretical Foundation and Stable Algorithms
Alexander Bukharin; Yan Li; Yue Yu; Qingru Zhang; Zhehui Chen; Simiao Zuo; Chao Zhang; Songan Zhang; Tuo Zhao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work we show that we can gain robustness by controlling a policy’s Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Motivated by these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization.
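One common way to realize an adversarial smoothness regularizer of the kind described here is to penalize how much the policy output can change under a small, bounded perturbation of the state found by a few gradient-ascent steps. The sketch below is our own reading under that assumption; the hyperparameters and names are not the paper's.

```python
import torch

def adversarial_smoothness_penalty(policy, state, epsilon=0.05, steps=3, step_size=0.01):
    """Sketch: encourage Lipschitz-like behavior by penalizing the change in
    the policy output under a worst-case bounded perturbation of the state."""
    clean_out = policy(state).detach()
    delta = torch.zeros_like(state, requires_grad=True)
    for _ in range(steps):
        loss = (policy(state + delta) - clean_out).pow(2).sum()
        grad_delta = torch.autograd.grad(loss, delta)[0]    # gradient w.r.t. the perturbation only
        with torch.no_grad():
            delta += step_size * grad_delta.sign()          # ascent step on the perturbation
            delta.clamp_(-epsilon, epsilon)                 # keep it inside the epsilon-ball
    # Add this penalty to the usual RL objective during training.
    return (policy(state + delta) - clean_out).pow(2).sum()
```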
434, Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Shenao Zhang; Boyi Liu; Zhaoran Wang; Tuo Zhao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
435, Subject-driven Text-to-Image Generation Via Apprenticeship Learning
wenhu chen; Hexiang Hu; Yandong Li; Nataniel Ruiz; Xuhui Jia; Ming-Wei Chang; William Cohen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning.
436, Why Deep Models Often Cannot Beat Non-deep Counterparts on Molecular Property Prediction?
Jun Xia; Lecheng Zhang; Xiao Zhu; Yue Liu; Zhangyang Gao; Bozhen Hu; Cheng Tan; Jiangbin Zheng; Siyuan Li; Stan Z. Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this study, we benchmark 12 representative models (3 non-deep models and 9 deep models) on 15 molecule datasets.
437, Foundation Model Is Efficient Multimodal Multitask Model Selector
fanqing meng; Wenqi Shao; zhanglin peng; Chonghe Jiang; Kaipeng Zhang; Yu Qiao; Ping Luo;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Although recent approaches have employed lightweight metrics to measure models' transferability, they often depend heavily on prior knowledge of a single task, making them inapplicable in a multi-modal multi-task scenario. To tackle this issue, we propose an efficient multitask model selector (EMMS), which employs large-scale foundation models to transform diverse label formats such as categories, texts, and bounding boxes of different downstream tasks into a unified noisy label embedding.
438, Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh; Santhosh Kumar Ramakrishnan; Triantafyllos Afouras; Kristen Grauman;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Prior work largely treats keystep recognition in isolation of this broader structure, or else rigidly confines keysteps to align with a particular sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, then leverage this graph to regularize keystep recognition in novel videos.
439, Augmentation-free Dense Contrastive Distillation for Efficient Semantic Segmentation
Jiawei Fan; Chao Li; Xiaolong Liu; Meina Song; Anbang Yao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Existing methods rely heavily on data augmentation and memory buffers, which entail high computational resource demands when applied to semantic segmentation, a task that requires preserving high-resolution feature maps for making dense pixel-wise predictions. In order to alleviate this problem, we present Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD), a new contrastive distillation learning paradigm to train compact and accurate deep neural networks for semantic segmentation applications.
440, LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Neel Guha; Julian Nyarko; Daniel Ho; Christopher Ré; Adam Chilton; Aditya K; Alex Chohlas-Wood; Austin Peters; Brandon Waldon; Daniel Rockmore; Diego Zambrano; Dmitry Talisman; Enam Hoque; Faiz Surani; Frank Fagan; Galit Sarfaty; Gregory Dickinson; Haggai Porat; Jason Hegland; Jessica Wu; Joe Nudell; Joel Niklaus; John Nay; Jonathan Choi; Kevin Tobia; Margaret Hagan; Megan Ma; Michael Livermore; Nikon Rasumov-Rahe; Nils Holzenberger; Noam Kolt; Peter Henderson; Sean Rehaag; Sharad Goel; Shang Gao; Spencer Williams; Sunny Gandhi; Tom Zur; Varun Iyer; Zehua Li;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning.
441, Cola: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray; Filip Radenovic; Abhimanyu Dubey; Bryan Plummer; Ranjay Krishna; Kate Saenko;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Compositional reasoning is a hallmark of human visual intelligence; yet despite the size of large vision-language models, they struggle to represent simple compositions by combining objects with their attributes. To measure this lack of compositional capability, we design Cola, a text-to-image retrieval benchmark to Compose Objects Localized with Attributes.
442, Learning Threshold Neurons Via Edge of Stability
Kwangjun Ahn; Sebastien Bubeck; Sinho Chewi; Yin Tat Lee; Felipe Suarez; Yi Zhang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks.
443, Improving Category Discovery When No Representation Rules Them All
Sagar Vaze; Andrea Vedaldi; Andrew Zisserman;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper we tackle the problem of Generalized Category Discovery (GCD).
444, FaceComposer: A Unified Framework for Versatile Facial Content Creation
Jiayu Wang; Kang Zhao; Yifeng Ma; Shiwei Zhang; Yingya Zhang; Yujun Shen; Deli Zhao; Jingren Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This work presents FaceComposer, a unified generative model that accomplishes a variety of facial content creation tasks, including text-conditioned face synthesis, text-guided face editing, face animation, etc.
445, VideoComposer: Compositional Video Synthesis with Motion Controllability
Xiang Wang; Hangjie Yuan; Shiwei Zhang; Dayou Chen; Jiuniu Wang; Yingya Zhang; Yujun Shen; Deli Zhao; Jingren Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Based on the paradigm of compositional generation, this work presents VideoComposer that allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions.
446, Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms
Peiyao Xiao; Hao Ban; Kaiyi Ji;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose a new direction-oriented multi-objective problem by regularizing the common descent direction within a neighborhood of a direction that optimizes a linear combination of objectives such as the average loss in MTL.
447, Fairness-guided Few-shot Prompting for Large Language Models
Huan Ma; Changqing Zhang; Yatao Bian; Lemao Liu; Zhirui Zhang; Peilin Zhao; Shu Zhang; Huazhu Fu; Qinghua Hu; Bingzhe Wu;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Therefore, the construction of an appropriate prompt is essential for improving the performance of in-context learning. In this paper, we revisit this problem from the view of predictive bias.
448, DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method
Ahmed Khaled; Konstantin Mishchenko; Chi Jin;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce DoWG (Distance over Weighted Gradients), a new parameter-free optimizer that combines adaptive gradient weighting with distance estimation.
449, Revisiting The Evaluation of Image Synthesis with GANs
mengping yang; Ceyuan Yang; Yichi Zhang; Qingyan Bai; Yujun Shen; Bo Dai;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
450, Language Models Can Improve Event Prediction By Few-Shot Abductive Reasoning
Xiaoming Shi; Siqiao Xue; Kangrui Wang; Fan Zhou; James Zhang; Jun Zhou; Chenhao Tan; Hongyuan Mei;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction accuracy of event sequence models.
451, Can Language Models Solve Graph Problems in Natural Language?
Heng Wang; Shangbin Feng; Tianxing He; Zhaoxuan Tan; Xiaochuang Han; Yulia Tsvetkov;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To this end, we propose NLGraph (Natural Language Graph), a comprehensive benchmark of graph-based problem solving designed in natural language.
452, NeuralGF: Unsupervised Point Normal Estimation By Learning Neural Gradient Function
Qing Li; Huifang Feng; Kanle Shi; Yue Gao; Yi Fang; Yu-Shen Liu; Zhizhong Han;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In addition, normal orientation consistency across shapes remains difficult to achieve without a separate post-processing procedure. To resolve these issues, we propose a novel method for estimating oriented normals directly from point clouds without using ground truth normals as supervision.
453, Is Distance Matrix Enough for Geometric Deep Learning?
Zian Li; Xiyuan Wang; Yinan Huang; Muhan Zhang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we expand on the families of counterexamples that MPNNs are unable to distinguish from their distance matrices, by constructing families of novel and symmetric geometric graphs, to better understand the inherent limitations of MPNNs.
454, Exploring Loss Functions for Time-based Training Strategy in Spiking Neural Networks
Yaoyu Zhu; Wei Fang; Xiaodong Xie; Tiejun Huang; Zhaofei Yu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: After that, we infer that loss functions providing adequate positive overall gradients help training by theoretical analysis. Based on this, we propose the enhanced counting loss to replace the commonly used mean square counting loss.
455, Learning Neural Implicit Through Volume Rendering with Attentive Depth Fusion Priors
Pengchong Hu; Zhizhong Han;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, rendering a view each time suffers from incomplete depth at holes and unawareness of occluded structures from the depth supervision, which severely affects the accuracy of geometry inference via volume rendering. To resolve this issue, we propose to learn neural implicit representations from multi-view RGBD images through volume rendering with an attentive depth fusion prior.
456, Diffusion Optimization Models with Trajectory Alignment for Constrained Design Generation
Giorgio Giannone; Akash Srivastava; Ole Winther; Faez Ahmed;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Specifically, engineering optimization methods based on physics still outperform generative models when dealing with constrained environments where data is scarce and precision is paramount. To address these challenges, we introduce Diffusion Optimization Models (DOM) and Trajectory Alignment (TA), a learning framework that demonstrates the efficacy of aligning the sampling trajectory of diffusion models with the optimization trajectory derived from traditional physics-based methods.
457, Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
Cheng Cheng; Lin Song; Ruoyi Xue; Hang Wang; Hongbin Sun; Yixiao Ge; Ying Shan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Nevertheless, few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples, resulting in longer inference time and the risk of overfitting in certain domains. To tackle these challenges, we propose the Meta-Adapter, a lightweight residual-style adapter, to refine the CLIP features guided by the few-shot samples in an online manner.
458, Resolving Interference When Merging Models
Prateek Yadav; Derek Tam; Leshem Choshen; Colin Raffel; Mohit Bansal;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter’s values across models. To address this, we propose our method, TrIm, Elect Sign & Merge (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign.
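The three merging steps above translate naturally into array operations. The following is a minimal NumPy sketch of trim / elect-sign / merge on flattened parameter vectors; the trim fraction, function names, and toy weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ties_merge(pretrained, finetuned_models, trim_frac=0.2):
    """Sketch of the trim / elect-sign / merge steps on flat parameter vectors."""
    # Task vectors: how much each fine-tuned model moved away from the pretrained weights.
    deltas = [ft - pretrained for ft in finetuned_models]

    # (1) Trim: reset parameters that changed only a small amount (keep top fraction by magnitude).
    trimmed = []
    for d in deltas:
        k = max(1, int(trim_frac * d.size))
        threshold = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= threshold, d, 0.0))

    # (2) Elect sign: per-parameter majority sign across the trimmed task vectors.
    stacked = np.stack(trimmed)                      # (num_models, num_params)
    elected_sign = np.sign(stacked.sum(axis=0))

    # (3) Merge: average only the entries that agree with the elected sign.
    agree = (np.sign(stacked) == elected_sign) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_delta = (stacked * agree).sum(axis=0) / counts

    return pretrained + merged_delta

# Toy usage with random vectors standing in for flattened model weights.
rng = np.random.default_rng(0)
base = rng.normal(size=10)
models = [base + rng.normal(scale=0.1, size=10) for _ in range(3)]
merged = ties_merge(base, models)
```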
459, Margin Maximization in Attention Mechanism
Davoud Ataee Tarzanagh; Yingcong Li; Xuechen Zhang; Samet Oymak;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we initiate the study of a softmax-attention model $f(X)=v^\top X^\top \text{softmax}(XW^\top p)$, where $X$ is the tokenized input, $v$ is the value weights, $W$ is the key-query weights, and $p$ is a tunable token/prompt.
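The model in the highlight is small enough to evaluate directly. Below is a minimal NumPy sketch of $f(X)=v^\top X^\top \text{softmax}(XW^\top p)$ with illustrative dimensions; the symbol names follow the highlight, everything else is an assumption for demonstration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_model(X, v, W, p):
    """f(X) = v^T X^T softmax(X W^T p): a scalar score from softmax-weighted token features."""
    scores = X @ W.T @ p          # (T,) one attention logit per token
    weights = softmax(scores)     # (T,) softmax over tokens
    pooled = X.T @ weights        # (d,) attention-weighted combination of token embeddings
    return v @ pooled             # scalar output

rng = np.random.default_rng(0)
T, d = 5, 4                       # T tokens, embedding dimension d (illustrative)
X = rng.normal(size=(T, d))       # tokenized input
v = rng.normal(size=d)            # value weights
W = rng.normal(size=(d, d))       # key-query weights
p = rng.normal(size=d)            # tunable token/prompt
print(attention_model(X, v, W, p))
```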
460, Replicability in Reinforcement Learning
Amin Karbasi; Grigoris Velegkas; Lin Yang; Felix Zhou;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL).
461, Online Performative Gradient Descent for Learning Nash Equilibria in Decision-Dependent Games
Zihan Zhu; Ethan Fang; Zhuoran Yang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, since agents are strategically coupled, traditional gradient-based methods are infeasible without the gradient oracle. To overcome this challenge, we model the strategic interactions by a general parametric model and propose a novel online algorithm, Online Performative Gradient Descent (OPGD), which leverages the ideas of online stochastic approximation and projected gradient descent to learn the Nash equilibrium in the context of function approximation for the unknown gradient.
462, Towards Optimal Caching and Model Selection for Large Model Inference
Banghua Zhu; Ying Sheng; Lianmin Zheng; Clark Barrett; Michael Jordan; Jiantao Jiao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model selector to choose from an ensemble of models for query processing.
463, Doubly-Robust Self-Training
Banghua Zhu; Mingyu Ding; Philip Jacobson; Ming Wu; Wei Zhan; Michael Jordan; Jiantao Jiao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce doubly-robust self-training, an innovative semi-supervised algorithm that provably balances between two extremes.
464, Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback
Jangwon Kim; Hangyeol Kim; Jiwook Kang; Jongchan Baek; Soohee Han;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present a novel actor-critic algorithm for an environment with delayed feedback, which addresses the state-space explosion problem of conventional approaches.
465, Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
Zhiyuan Liu; Yaorui Shi; An Zhang; Enzhi Zhang; Kenji Kawaguchi; Xiang Wang; Tat-Seng Chua;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph.
466, Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Scott Pesme; Nicolas Flammarion;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper we fully describe the trajectory of gradient flow over $2$-layer diagonal linear networks for the regression setting in the limit of vanishing initialisation.
467, Testing The General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Abulhair Saparov; Richard Yuanzhe Pang; Vishakh Padmakumar; Nitish Joshi; Mehran Kazemi; Najoung Kim; He He;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization.
468, Fairly Recommending with Social Attributes: A Flexible and Controllable Optimization Approach
Jinqiu Jin; Haoxuan Li; Fuli Feng; Sihao Ding; Peng Wu; Xiangnan He;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce social attribute-aware IGF metrics from the perspective of social utility, and propose a new IGF problem that considers both direct and social utilities.
469, Unleashing The Power of Graph Data Augmentation on Covariate Shift
Yongduo Sui; Qitian Wu; Jiancan Wu; Qing Cui; Longfei Li; Jun Zhou; Xiang Wang; Xiangnan He;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, most strategies, such as invariant learning or graph augmentation, typically struggle with limited training environments or perturbed stable features, thus exposing limitations in handling the covariate shift issue. To address this, we develop a simple yet effective data augmentation strategy, Adversarial Invariant Augmentation (AIA), to handle the graph covariate shift.
470, Understanding Contrastive Learning Via Distributionally Robust Optimization
Junkang Wu; Jiawei Chen; Jiancan Wu; Wentao Shi; Xiang Wang; Xiangnan He;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels).
471, Evaluating Post-hoc Explanations for Graph Neural Networks Via Robustness Analysis
Junfeng Fang; Wei Liu; Xiang Wang; Zemin Liu; An Zhang; Yuan Gao; Xiangnan He;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Conversely, in this work, we endeavor to confront the issue via introducing a novel evaluation metric, termed **O**OD-resistant **A**dversarial **R**obustness (OAR).
472, 3D Open-vocabulary Segmentation with Foundation Models
Kunhao Liu; Fangneng Zhan; Jiahui Zhang; Muyu Xu; Yingchen Yu; Abdulmotaleb El Saddik; Christian Theobalt; Eric Xing; Shijian Lu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We tackle the challenges in 3D open-vocabulary segmentation by exploiting the open-vocabulary multimodal knowledge and object reasoning capability of pre-trained foundation models CLIP and DINO, without necessitating any fine-tuning.
473, MixFormerV2: Efficient Fully Transformer Tracking
Yutao Cui; Tianhui Song; Gangshan Wu; Limin Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, their efficiency remains an obstacle to practical deployment on both GPU and CPU platforms. In this paper, to overcome this issue, we propose a fully transformer tracking framework, coined as \emph{MixFormerV2}, without any dense convolutional operation and complex score prediction module.
474, Solving A Class of Non-Convex Minimax Optimization in Federated Learning
Xidong Wu; Jianhui Sun; Zhengmian Hu; Aidong Zhang; Heng Huang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we study a class of federated nonconvex minimax optimization problems.
475, Federated Conditional Stochastic Optimization
Xidong Wu; Jianhui Sun; Zhengmian Hu; Junyi Li; Aidong Zhang; Heng Huang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper considers the nonconvex conditional stochastic optimization in federated learning and proposes the first federated conditional stochastic optimization algorithm (FCSG) with a conditional stochastic gradient estimator.
476, Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks
Yiwen Kou; Zixiang Chen; Quanquan Gu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Therefore, implicit bias in non-smooth neural networks trained by gradient descent remains an open question. In this paper, we aim to answer this question by studying the implicit bias of gradient descent for training two-layer fully connected (leaky) ReLU neural networks.
477, ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields
Jiahua Dong; Yu-Xiong Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce ViCA-NeRF, a view-consistency-aware method for 3D editing with text instructions.
478, EDGI: Equivariant Diffusion for Planning with Embodied Agents
Johann Brehmer; Joey Bose; Pim de Haan; Taco Cohen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group SE(3), the discrete-time translation group ℤ, and the object permutation group Sₙ.
479, Geometric Algebra Transformers
Johann Brehmer; Pim de Haan; Sönke Behrends; Taco Cohen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data.
480, Diversify & Conquer: Outcome-directed Curriculum RL Via Out-of-Distribution Disagreement
Daesol Cho; Seungjae Lee; H. Jin Kim;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called **D**iversify for **D**isagreement & **C**onquer (**D2C**).
481, Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model
Jiankai Sun; Yiqi Jiang; Jianing Qiu; Parth Nobel; Mykel J Kochenderfer; Mac Schwager;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we quantify the uncertainty of dynamics models using Conformal Prediction (CP), which is an effective technique for constructing prediction sets that achieve valid coverage.
482, The Expressive Power of Pooling in Graph Neural Networks
Filippo Maria Bianchi; Veronica Lachi;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this work, we derive sufficient conditions for a pooling operator to fully preserve the expressive power of the MP layers before it.
483, Learning to Reason and Memorize with Self-Notes
Jack Lanchantin; Shubham Toshniwal; Jason Weston; Arthur Szlam; Sainbayar Sukhbaatar;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes.
484, BiMatting: Efficient Video Matting Via Binarization
Haotong Qin; Lei Ke; Xudong Ma; Martin Danelljan; Yu-Wing Tai; Chi-Keung Tang; Xianglong Liu; Fisher Yu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, binarization of the video matting model is not a straightforward process, and our empirical analysis has revealed two primary bottlenecks: severe representation degradation of the encoder and massive redundant computations of the decoder. To address these issues, we propose BiMatting, an accurate and efficient video matting model using binarization.
485, Adversarial Training for Graph Neural Networks
Lukas Gosch; Simon Geisler; Daniel Sturm; Bertrand Charpentier; Daniel Zügner; Stephan Günnemann;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In the pursuit of fixing adversarial training (1) we show and overcome fundamental theoretical as well as practical limitations of the adopted graph learning setting in prior work; (2) we reveal that more flexible GNNs based on learnable graph diffusion are able to adjust to adversarial perturbations, while the learned message passing scheme is naturally interpretable; (3) we introduce the first attack for structure perturbations that, while targeting multiple nodes at once, is capable of handling global (graph-level) as well as local (node-level) constraints.
486, ProPILE: Probing Privacy Leakage in Large Language Models
Siwon Kim; Sangdoo Yun; Hwaran Lee; Martin Gubri; Sungroh Yoon; Seong Joon Oh;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services.
487, BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment
Jiaming Ji; Mickel Liu; Josef Dai; Xuehai Pan; Chi Zhang; Ce Bian; Boyuan Chen; Ruiyang Sun; Yizhou Wang; Yaodong Yang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs).
488, Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark
Jiaming Ji; Borong Zhang; Jiayi Zhou; Xuehai Pan; Weidong Huang; Ruiyang Sun; Yiran Geng; Josef Dai; Yaodong Yang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input.
489, IEBins: Iterative Elastic Bins for Monocular Depth Estimation
Shuwei Shao; Zhongcai Pei; Xingming Wu; Zhong Liu; Weihai Chen; Zhengguo Li;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose a novel concept of iterative elastic bins (IEBins) for the classification-regression-based MDE.
490, A New Perspective on Building Efficient and Expressive 3D Equivariant Graph Neural Networks
Weitao Du; Yuanqi Du; Limei Wang; Dieqiao Feng; Guifeng Wang; Shuiwang Ji; Carla Gomes; Zhi-Ming Ma;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and investigate the process of representing global geometric information from local patches.
491, OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurençon; Lucile Saulnier; Leo Tronchon; Stas Bekman; Amanpreet Singh; Anton Lozhkov; Thomas Wang; Siddharth Karamcheti; Alexander Rush; Douwe Kiela; Matthieu Cord; Victor Sanh;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce the OBELICS dataset, an open web-scale filtered dataset of interleaved image-text documents comprising 141 million web pages extracted from Common Crawl, 353 million associated images, and 115 billion text tokens.
492, M$^2$Hub: Unlocking The Potential of Machine Learning for Materials Discovery
Yuanqi Du; Yingheng Wang; Yining Huang; Jianan Canal Li; Yanqiao Zhu; Tian Xie; Chenru Duan; John Gregoire; Carla Gomes;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery.
493, Greedy Poisson Rejection Sampling
Gergely Flamich;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we conclusively solve one-shot channel simulation for one-dimensional problems where the target-proposal density ratio is unimodal by describing an algorithm with optimal runtime.
494, Rethinking The Role of Token Retrieval in Multi-Vector Retrieval
Jinhyuk Lee; Zhuyun Dai; Sai Meher Karthik Duddu; Tao Lei; Iftekhar Naim; Ming-Wei Chang; Vincent Zhao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we aim to simplify the multi-vector retrieval by rethinking the role of token retrieval.
495, Real-Time Motion Prediction Via Heterogeneous Polyline Transformer with Relative Pose Encoding
Zhejun Zhang; Alexander Liniger; Christos Sakaridis; Fisher Yu; Luc V Gool;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, they suffer from high computational overhead and poor scalability as the number of agents to be predicted increases. To address this problem, we introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
496, NICE: NoIse-modulated Consistency REgularization for Data-Efficient GANs
Yao Ni; Piotr Koniusz;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: The bottleneck imposed by limited data presents substantial obstacles for efficient training of GAN, including discriminator overfitting and training instability. In this paper, we present a novel approach called NoIse-modulated Consistency rEgularization (NICE) to overcome these challenges.
497, A Bayesian Perspective On Training Data Attribution
Elisa Nguyen; Minjoon Seo; Seong Joon Oh;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we introduce a Bayesian perspective on the TDA task, where the learned model is treated as a Bayesian posterior and the TDA estimates as random variables.
498, Estimating The Rate-Distortion Function By Wasserstein Gradient Descent
Yibo Yang; Stephan Eckstein; Marcel Nutz; Stephan Mandt;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose a method to compute $R(D)$ based on Wasserstein gradient descent.
499, Riemannian Laplace Approximations for Bayesian Neural Networks
Federico Bergamin; Pablo Moreno-Muñoz; Søren Hauberg; Georgios Arvanitidis;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose a simple parametric approximate posterior that adapts to the shape of the true posterior through a Riemannian metric that is determined by the log-posterior gradient.
500, Is This Loss Informative? Faster Text-to-Image Customization By Tracking Objective Dynamics
Anton Voronov; Mikhail Khoroshikh; Artem Babenko; Max Ryabinin;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, many efficient methods of adaptation have a long training time, which limits their practical applications, slows down research experiments, and spends excessive GPU resources. In this work, we study the training dynamics of popular text-to-image personalization methods (such as Textual Inversion or DreamBooth), aiming to speed them up.
501, Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
Haoran Chen; Xintong Han; Zuxuan Wu; Yu-Gang Jiang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Inspired by recent advances in prompt learning that adapts high-capacity models for downstream tasks in a computationally economic way, we introduce Multi-Prompt Alignment (MPA), a simple yet efficient framework for multi-source UDA.
502, Embedding Space Interpolation Beyond Mini-Batch, Beyond Pairs and Beyond Examples
Shashanka Venkataramanan; Ewa Kijak; Laurent Amsaleg; Yannis Avrithis;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In most methods, the number of generated examples is limited to the mini-batch size and the number of examples being interpolated is limited to two (pairs), in the input space. We make progress in this direction by introducing MultiMix, which generates an arbitrarily large number of interpolated examples beyond the mini-batch size and interpolates the entire mini-batch in the embedding space.
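One way to picture interpolating an entire mini-batch in embedding space is to draw convex combination weights over all batch items at once. The sketch below does this with Dirichlet-distributed weights; the Dirichlet parameter and the way labels are mixed are illustrative assumptions, not the MultiMix implementation.

```python
import numpy as np

def multibatch_interpolate(embeddings, labels_onehot, num_generated, alpha=1.0, rng=None):
    """Generate many interpolated examples as convex combinations over a whole mini-batch."""
    rng = rng or np.random.default_rng()
    batch_size = embeddings.shape[0]
    # Each generated example mixes *all* batch items, not just a pair,
    # and the number of generated examples is decoupled from the batch size.
    lam = rng.dirichlet(alpha * np.ones(batch_size), size=num_generated)  # (M, B)
    mixed_embeddings = lam @ embeddings                                   # (M, D)
    mixed_labels = lam @ labels_onehot                                    # (M, C)
    return mixed_embeddings, mixed_labels

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))                 # a mini-batch of 8 embeddings
lab = np.eye(3)[rng.integers(0, 3, size=8)]    # one-hot labels for 3 classes
mix_e, mix_y = multibatch_interpolate(emb, lab, num_generated=64, rng=rng)
```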
503, Approximation-Generalization Trade-offs Under (Approximate) Group Equivariance
Mircea Petrache; Shubhendu Trivedi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Moreover, it is posited that when the data and/or the model exhibits only approximate or partial symmetry, the optimal or best-performing model is one where the model symmetry aligns with the data symmetry. In this paper, we conduct a formal unified investigation of these intuitions.
504, Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen; Tengyu Ma; Zhiyuan Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most strikingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize.
505, On The Generalization Error of Stochastic Mirror Descent for Quadratically-Bounded Losses: An Improved Analysis
Ta Duy Nguyen; Alina Ene; Huy Nguyen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we revisit the generalization error of stochastic mirror descent for quadratically bounded losses studied in Telgarsky (2022).
506, DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
Weijia Wu; Yuzhong Zhao; Hao Chen; Yuchao Gu; Rui Zhao; Yefei He; Hong Zhou; Mike Zheng Shou; Chunhua Shen;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth).
507, Adversarial Examples Might Be Avoidable: The Role of Data Concentration in Adversarial Robustness
Ambar Pal; Jeremias Sulam; Rene Vidal;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists.
508, Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration
Dongyoung Kim; Jinwoo Shin; Pieter Abbeel; Younggyo Seo;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we present a novel exploration technique that maximizes the value-conditional state entropy, which separately estimates the state entropies that are conditioned on the value estimates of each state, then maximizes their average.
509, SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models
Ziyi Wu; Jingyu Hu; Wuyue Lu; Igor Gilitschenski; Animesh Garg;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we focus on improving slot-to-image decoding, a crucial aspect for high-quality visual generation.
510, Many-body Approximation for Non-negative Tensors
Kazu Ghalamkari; Mahito Sugiyama; Yoshinobu Kawahara;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present an alternative approach to decompose non-negative tensors, called many-body approximation.
511, BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization
Darko Drakulic; Sofia Michel; Florian Mai; Arnaud Sors; Jean-Marc Andreoli;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we present a novel formulation of Combinatorial Optimization Problems (COPs) as Markov Decision Processes (MDPs) that effectively leverages common symmetries of COPs to improve out-of-distribution robustness.
512, Exploring Why Object Recognition Performance Degrades Across Income Levels and Geographies with Factor Annotations
Laura Gustafson; Megan Richards; Melissa Hall; Caner Hazirbas; Diane Bouchacourt; Mark Ibrahim;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We study a range of modern vision models, finding that performance disparities are most associated with differences in _texture, occlusion_, and images with _darker lighting_.
513, Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks
Maxime Chevalier-Boisvert; Bolun Dai; Mark Towers; Rodrigo Perez-Vicente; Lucas Willems; Salem Lahlou; Suman Pal; Pablo Samuel Castro; J Terry;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We present the Minigrid and Miniworld libraries, which provide a suite of modular and highly customizable simulation environments to facilitate the development of reinforcement learning (RL) algorithms for solving goal-oriented tasks.
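A minimal usage sketch follows, assuming the libraries expose the standard Gymnasium-style API and register their environments on import; the environment id and random-action loop are only illustrative.

```python
import gymnasium as gym
import minigrid  # importing the package registers the MiniGrid-* environments

env = gym.make("MiniGrid-Empty-5x5-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()          # random policy just to exercise the API
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```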
514, ADGym: Design Choices for Deep Anomaly Detection
Minqi Jiang; Chaochuan Hou; Ao Zheng; Songqiao Han; Hailiang Huang; Qingsong Wen; Xiyang Hu; Yue Zhao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Meanwhile, we may neglect the contribution of other meaningful prerequisite steps like preprocessing by giving all credits to newly designed loss functions and/or architectures. In this paper, we address the above gaps by answering: (i) which components (i.e., design choices) of deep AD methods play crucial roles in detecting anomalies?
515, QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution
Haotong Qin; Yulun Zhang; Yifu Ding; Yifan liu; Xianglong Liu; Martin Danelljan; Fisher Yu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, many quantized SR models suffer from accuracy degradation compared to their full-precision counterparts, especially at ultra-low bit widths (2-4 bits), limiting their practical applications. To address this issue, we propose a novel quantized image SR network, called QuantSR, which achieves accurate and efficient SR processing under low-bit quantization.
516, Graph Contrastive Learning with Stable and Scalable Spectral Encoding
Deyu Bo; Yuan Fang; Yang Liu; Chuan Shi;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, existing spectral-based graph views either ignore the eigenvectors that encode valuable positional information or suffer from high complexity when trying to address the instability of spectral features. To tackle these challenges, we first design an informative, stable, and scalable spectral encoder, termed EigenMLP, to learn effective representations from the spectral features.
517, Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity
Weichao Mao; Haoran Qiu; Chen Wang; Hubertus Franke; Zbigniew Kalbarczyk; Ravishankar Iyer; Tamer Basar;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we investigate the benefits of meta-learning in solving multiple MARL tasks collectively.
518, Post Hoc Explanations of Language Models Can Improve Language Models
Satyapriya Krishna; Jiaqi Ma; Dylan Slack; Asma Ghandeharioun; Sameer Singh; Himabindu Lakkaraju;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses the aforementioned challenges by automating the process of rationale generation.
519, Star-Shaped Denoising Diffusion Probabilistic Models
Andrey Okhotin; Dmitry Molchanov; Arkhipkin Vladimir; Grigory Bartosh; Viktor Ohanesian; Aibek Alanov; Dmitry Vetrov;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we introduce Star-Shaped DDPM (SS-DDPM).
520, Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning
Casey Meehan; Florian Bordes; Pascal Vincent; Kamalika Chaudhuri; Chuan Guo;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we perform a systematic study of the unintended memorization of image-specific information in SSL models -- which we refer to as déjà vu memorization.
521, Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
Lingchen Meng; Xiyang Dai; Jianwei Yang; Dongdong Chen; Yinpeng Chen; Mengchen Liu; Yi-Ling Chen; Zuxuan Wu; Lu Yuan; Yu-Gang Jiang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: One popular strategy is to explore extra data with image-level labels, yet it produces limited results due to (1) semantic ambiguity---an image-level label only captures a salient part of the image, ignoring the remaining rich semantics within the image; and (2) location sensitivity---the label highly depends on the locations and crops of the original image, which may change after data transformations like random cropping. To remedy this, we propose RichSem, a simple but effective method, which is robust to learn rich semantics from coarse locations without the need of accurate bounding boxes.
522, Understanding Expertise Through Demonstrations: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning
Siliang Zeng; Chenliang Li; Alfredo Garcia; Mingyi Hong;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose a new algorithmic framework to solve the bi-level optimization problem formulation and provide statistical and computational guarantees of performance for the associated optimal reward estimator.
523, DaTaSeg: Taming A Universal Multi-Dataset Multi-Task Segmentation Model
Xiuye Gu; Yin Cui; Jonathan Huang; Abdullah Rashwan; Xuan Yang; Xingyi Zhou; Golnaz Ghiasi; Weicheng Kuo; Huizhong Chen; Liang-Chieh Chen; David Ross;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Observing the close relationship among panoptic, semantic and instance segmentation tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg.
524, Statistical Guarantees for Variational Autoencoders Using PAC-Bayesian Theory
Diarra Mbacke; Florence Clerc; Pascal Germain;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Using PAC-Bayesian theory, this work develops statistical guarantees for VAEs.
525, Improved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise
Ta Duy Nguyen; Thien H Nguyen; Alina Ene; Huy Nguyen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we study the convergence in high probability of clipped gradient methods when the noise distribution has heavy tails, i.e., with bounded $p$th moments, for some $1 < p \leq 2$.
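For context, a clipped stochastic gradient step simply rescales the gradient whenever its norm exceeds a threshold. The sketch below shows this generic update under Student-t (heavy-tailed) gradient noise; the learning rate, clipping threshold, and toy objective are assumptions for illustration, not the paper's analysis setup.

```python
import numpy as np

def clipped_sgd_step(w, grad, lr=0.1, clip=1.0):
    """One clipped stochastic gradient step: rescale the gradient when its norm exceeds `clip`."""
    norm = np.linalg.norm(grad)
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return w - lr * scale * grad

# Toy quadratic objective with heavy-tailed noise: Student-t noise with low degrees of freedom
# has unbounded variance but bounded p-th moments for suitable p.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
for _ in range(100):
    noisy_grad = (w - target) + rng.standard_t(df=1.5, size=3)
    w = clipped_sgd_step(w, noisy_grad, lr=0.05, clip=2.0)
print(w)
```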
526, NeRF Revisited: Fixing Quadrature Instability in Volume Rendering
Mikaela Angelina Uy; Guandao Yang; Kiyohiro Nakayama; Leonidas Guibas; Ke Li;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We propose a mathematically principled solution by reformulating the sample-based rendering equation so that it corresponds to the exact integral under piecewise linear volume density.
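As a toy illustration of the quadrature point, compare the per-segment optical depth under the usual piecewise-constant density assumption with the exact integral when the density is piecewise linear between samples (the trapezoid rule). The sample locations and densities below are made up, and this only illustrates the quadrature difference, not the paper's full reformulated rendering equation.

```python
import numpy as np

t = np.array([0.0, 0.5, 1.0, 1.5])          # sample locations along the ray (illustrative)
sigma = np.array([0.2, 1.5, 0.3, 0.8])      # volume density at the samples (illustrative)
delta = np.diff(t)

# Optical depth of each segment under the two assumptions.
tau_constant = sigma[:-1] * delta                       # standard piecewise-constant quadrature
tau_linear = 0.5 * (sigma[:-1] + sigma[1:]) * delta     # exact when density is piecewise linear

# Per-segment opacity alpha = 1 - exp(-tau); the two quadratures disagree wherever density varies.
alpha_constant = 1.0 - np.exp(-tau_constant)
alpha_linear = 1.0 - np.exp(-tau_linear)
print(alpha_constant, alpha_linear)
```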
527, Neural Approximation of Wasserstein Distance Via A Universal Architecture for Symmetric and Factorwise Group Invariant Functions
Samantha Chen; Yusu Wang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper combines this general NN with a sketching idea in order to develop a specific and efficient neural network which can approximate the $p$-th Wasserstein distance between point sets.
528, Knowledge Distillation for High Dimensional Search Index
Zepu Lu; Jin Chen; Defu Lian; ZAIXI ZHANG; Yong Ge; Enhong Chen;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we propose a novel Knowledge Distillation for high dimensional search index framework (KDindex), with the aim of efficiently learning lightweight indexes by distilling knowledge from high-precision ANNS and MIPS models such as graph-based indexes.
529, Markovian Sliced Wasserstein Distances: Beyond Independent Projections
Khai Nguyen; Tongzheng Ren; Nhat Ho;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To address the problem, we introduce a new family of SW distances, named Markovian sliced Wasserstein (MSW) distance, which imposes a first-order Markov structure on projecting directions.
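The idea of imposing a first-order Markov structure on projection directions can be illustrated by sampling each direction as a perturbation of the previous one rather than independently. The sketch below does exactly that; the random-walk transition and step size are illustrative assumptions, not the specific transition kernels proposed in the paper.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1D empirical samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def markov_sliced_wasserstein(X, Y, num_projections=50, step=0.5, rng=None):
    """Sliced Wasserstein where each projection direction depends on the previous one."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)
    total = 0.0
    for _ in range(num_projections):
        total += wasserstein_1d(X @ theta, Y @ theta)
        # First-order Markov move: perturb the previous direction and renormalize to the sphere.
        theta = theta + step * rng.normal(size=d)
        theta /= np.linalg.norm(theta)
    return total / num_projections

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = rng.normal(loc=0.5, size=(200, 5))
print(markov_sliced_wasserstein(X, Y, rng=rng))
```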
530, Energy-Based Sliced Wasserstein Distance
Khai Nguyen; Nhat Ho;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To address the issues, we propose to design the slicing distribution as an energy-based distribution that is parameter-free and has the density proportional to an energy function of the projected one-dimensional Wasserstein distance.
531, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Yuchao Gu; Xintao Wang; Jay Zhangjie Wu; Yujun Shi; Yunpeng Chen; Zihan Fan; Wuyou Xiao; Rui Zhao; Shuning Chang; Weijia Wu; Yixiao Ge; Ying Shan; Mike Zheng Shou;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion.
532, BIOT: Biosignal Transformer for Cross-data Learning in The Wild
Chaoqi Yang; M Westover; Jimeng Sun;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To overcome the unique challenges associated with biosignals of various formats, such as mismatched channels, variable sample lengths, and prevalent missing values, we propose a Biosignal Transformer (BIOT).
533, Concept Algebra for Score-based Conditional Model
Zihao Wang; Lin Gui; Jeffrey Negrea; Victor Veitch;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: This paper concerns the structure of learned representations in text-guided generative models, focusing on score-based models.
534, Restart Sampling for Improving Generative Processes
Yilun Xu; Mingyang Deng; Xiang Cheng; Yonglong Tian; Ziming Liu; Tommi Jaakkola;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We attribute this difference to sampling errors: ODE-samplers involve smaller discretization errors while stochasticity in SDE contracts accumulated errors. Based on these findings, we propose a novel sampling algorithm called *Restart* in order to better balance discretization errors and contraction.
535, Towards Better Dynamic Graph Learning: New Architecture and Unified Library
Le Yu; Leilei Sun; Bowen Du; Weifeng Lv;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We propose DyGFormer, a new Transformer-based architecture for dynamic graph learning.
536, Stability and Generalization of The Decentralized Stochastic Gradient Descent Ascent Algorithm
Miaoxi Zhu; Li Shen; Bo Du; Dacheng Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we investigate the primal-dual generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm using the approach of algorithmic stability under both convex-concave and nonconvex-nonconcave settings.
537, Domain Re-Modulation for Few-Shot Generative Domain Adaptation
Yi Wu; Ziqiang Li; Chaoyue Wang; Heliang Zheng; Shanshan Zhao; Bin Li; Dacheng Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this study, we investigate the task of few-shot Generative Domain Adaptation (GDA), which involves transferring a pre-trained generator from one domain to a new domain using a few reference images.
538, VanillaNet: The Power of Minimalism in Deep Learning
Hanting Chen; Yunhe Wang; Jianyuan Guo; Dacheng Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: In this study, we introduce VanillaNet, a neural network architecture that embraces elegance in design.
539, ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding
Lunhao Duan; Shanshan Zhao; Nan Xue; Mingming Gong; Gui-Song Xia; Dacheng Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we attempt to reduce the costs and model the local geometry prior by developing a new transformer block, named ConDaFormer.
540, Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning
Guozheng Ma; Linrui Zhang; Haoyu Wang; Lu Li; Zilin Wang; Zhen Wang; Li Shen; Xueqian Wang; Dacheng Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To investigate this issue and further explore the potential of DA, this work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy and provides the following insights and improvements: (1) For individual DA operations, we reveal that both ample spatial diversity and slight hardness are indispensable. Building on this finding, we introduce Random PadResize (Rand PR), a new DA operation that offers abundant spatial diversity with minimal hardness.
541, Understanding How Consistency Works in Federated Learning Via Stage-wise Relaxed Initialization
Yan Sun; Li Shen; Dacheng Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: To alleviate the negative impact of the client drift and explore its substance in FL, in this paper, we first design an efficient FL algorithm FedInit, which allows employing the personalized relaxed initialization state at the beginning of each local training stage.
542, All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation
Liyao Tang; Zhe Chen; Shanshan Zhao; Chaoyue Wang; Dacheng Tao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: The noise in pseudo-labels may result in significant discrepancies between pseudo-labels and model predictions, thus confusing and affecting the model training greatly. To address this issue, we propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
543, Evaluating Neuron Interpretation Methods of NLP Models
Yimin Fan; Fahim Dalvi; Nadir Durrani; Hassan Sajjad;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: The commonly used evaluation metric is suboptimal for several reasons and creating ground truth annotation of neurons is infeasible. In this work, we tackle these challenges and propose an evaluation framework based on the voting theory.
544, Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information
Arman Zharmagambetov; Brandon Amos; Aaron Ferber; Taoan Huang; Bistra Dilkina; Yuandong Tian;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: The training is further challenged by sparse gradients of $\mathbf{g}$, especially for combinatorial solvers. To address these challenges, we propose using a smooth and learnable **Landscape Surrogate** $\mathcal{M}$ as a replacement for $f\circ \mathbf{g}$.
545, The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
Saurabh Saxena; Charles Herrmann; Junhwa Hur; Abhishek Kar; Mohammad Norouzi; Deqing Sun; David Fleet;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly without task-specific architectures and loss functions that are predominant for these tasks.
546, Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian; Yiping Wang; Beidi Chen; Simon Du;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In particular, with a simple predictive loss, how the representation emerges from the gradient \emph{training dynamics} remains a mystery. In this paper, for 1-layer transformer with one self-attention layer plus one decoder layer, we analyze its SGD training dynamics for the task of next token prediction in a mathematically rigorous manner.
547, Alleviating The Semantic Gap for Generalized FMRI-to-Image Reconstruction
Tao Fang; Qian Zheng; Gang Pan;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Specifically, we leverage the pre-trained CLIP model to map the training data to a compact feature representation, which essentially extends the sparse semantics of training data to dense ones, thus alleviating the semantic gap for instances within known semantic space (i.e., inside the expanded semantic subspace).
548, Robust Learning with Progressive Data Expansion Against Spurious Correlation
Yihe Deng; Yu Yang; Baharan Mirzasoleiman; Quanquan Gu;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. In light of this, we propose a new training algorithm called **PDE** that efficiently enhances the model's robustness for a better worst-group performance.
549, Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
Xiaoyu Tian; Tao Jiang; Longfei Yun; Yucheng Mao; Huitong Yang; Yue Wang; Yilun Wang; Hang Zhao;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: To support 3D occupancy prediction, we develop a label generation pipeline that produces dense, visibility-aware labels for any given scene.
550, Benchmarking and Analyzing 3D-aware Image Synthesis with A Modularized Codebase
Qiuyu Wang; Zifan Shi; Kecheng Zheng; Yinghao Xu; Sida Peng; Yujun Shen;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: Following the most popular and effective paradigm in this field, which incorporates a neural radiance field (NeRF) into the generator of a generative adversarial network (GAN), we build a well-structured codebase through modularizing the generation process. Such a design allows researchers to develop and replace each module independently, and hence offers an opportunity to fairly compare various approaches and recognize their contributions from the module perspective.
551, Online Learning in Multi-unit Auctions with Uniform Pricing
Simina Branzei; Mahsa Derakhshan; Negin Golrezaei; Yanjun Han;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Both of these rules represent the Walrasian mechanism, where the $K$-th highest bid represents the maximum Walrasian price, while the $(K+1)$-st highest bid represents the minimum Walrasian price. Our contribution is to analyze the bidding strategies and properties of these auctions in both the offline and online settings.
552, Brain Dissection: FMRI-trained Networks Reveal Spatial Selectivity in The Processing of Natural Images
Gabriel Sarch; Michael Tarr; Leila Wehbe; Katerina Fragkiadaki;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this paper, we train networks to directly predict, from scratch, brain responses to images from a large-scale dataset of natural scenes.
553, Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment
Yutong Xia; Yuxuan Liang; Haomin Wen; Xu Liu; Kun Wang; Zhengyang Zhou; Roger Zimmermann;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Spatio-Temporal Graph Neural Networks have emerged as the most popular method for STG forecasting, but they often struggle with temporal out-of-distribution (OoD) issues and dynamic spatial causation. In this paper, we propose a novel framework called CaST to tackle these two challenges via causal treatments.
554, Weakly-Supervised Audio-Visual Segmentation
Shentong Mo; Bhiksha Raj;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that can learn multi-scale audio-visual alignment with multi-scale multiple-instance contrastive learning for audio-visual segmentation.
555, DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
Shentong Mo; Enze Xie; Ruihang Chu; Lanqing Hong; Matthias Niessner; Zhenguo Li;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, it is unclear how the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape generation, named DiT-3D, which can directly operate the denoising process on voxelized point clouds using plain Transformers.
556, Efficiently Incorporating Quintuple Interactions Into Geometric Deep Learning Force Fields
Zun Wang; Guoqing Liu; Yichi Zhou; Tong Wang; Bin Shao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose the quintuple network (QuinNet), an end-to-end graph neural network that efficiently expresses many-body interactions up to five-body interactions with ab initio accuracy.
557, Grammar Prompting for Domain-Specific Language Generation with Large Language Models
Bailin Wang; Zi Wang; Xuezhi Wang; Yuan Cao; Rif A. Saurous; Yoon Kim;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: We explore *grammar prompting* as a simple approach for enabling LLMs to use external knowledge and domain-specific constraints, expressed through a grammar in Backus–Naur Form (BNF), during in-context learning.
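In spirit, grammar prompting places a specialized BNF grammar in the prompt so that the model's output is constrained to the target domain-specific language. The toy example below builds such a prompt; the grammar, task, and wording are illustrative assumptions, not the paper's prompts.

```python
# Toy illustration of grammar prompting: the prompt carries a BNF grammar together with the
# request, and the model is asked to generate a program derivable from that grammar.
BNF_GRAMMAR = """
<query>  ::= "find" <field> "where" <cond>
<field>  ::= "name" | "email" | "age"
<cond>   ::= <field> <op> <value>
<op>     ::= "=" | ">" | "<"
<value>  ::= STRING | NUMBER
"""

def build_prompt(nl_request: str) -> str:
    return (
        "You must answer with a program derivable from this BNF grammar:\n"
        f"{BNF_GRAMMAR}\n"
        f"Request: {nl_request}\n"
        "Program:"
    )

print(build_prompt("list the emails of users older than 30"))
```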
558, SituatedGen: Incorporating Geographical and Temporal Contexts Into Generative Commonsense Reasoning
Yunxiang Zhang; Xiaojun Wan;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: We introduce a corresponding English dataset consisting of 8,268 contrastive sentence pairs, which are built upon several existing commonsense reasoning benchmarks with minimal manual labor.
559, Feature Likelihood Score: Evaluating The Generalization of Generative Models Using Samples
Marco Jiralerspong; Joey Bose; Ian Gemp; Chongli Qin; Yoram Bachrach; Gauthier Gidel;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: However, current methods for evaluating such models remain incomplete: standard likelihood-based metrics do not always apply and rarely correlate with perceptual fidelity, while sample-based metrics, such as FID, are insensitive to overfitting, i.e., inability to generalize beyond the training set. To address these limitations, we propose a new metric called the Feature Likelihood Score (FLS), a parametric sample-based score that uses density estimation to provide a comprehensive trichotomic evaluation accounting for novelty (i.e., different from the training samples), fidelity, and diversity of generated samples.
560, Learning Regularized Monotone Graphon Mean-Field Games
Fengzhuo Zhang; Vincent Tan; Zhaoran Wang; Zhuoran Yang;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Previous literature either only analyzed continuous-time algorithms or required extra conditions to analyze discrete-time algorithms. In contrast, we design a discrete-time algorithm and derive its convergence rate solely under weakly monotone conditions.
561, RRHF: Rank Responses to Align Language Models with Human Feedback Without Tears
Zheng Yuan; Hongyi Yuan; Chuanqi Tan; Wei Wang; Songfang Huang; Fei Huang;
Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View
Highlight: However, PPO is sensitive to hyperparameters and requires a minimum of four models in its standard implementation, making it hard to train and scale up to larger parameter counts. In contrast, we propose a novel learning paradigm called RRHF, which scores sampled responses from different sources via logarithm of conditional probabilities and learns to align these probabilities with human preferences through ranking loss.
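The ranking part of the objective can be sketched directly from the highlight: score each sampled response by its (length-normalized) log conditional probability and penalize pairs whose ordering disagrees with the preference scores. The NumPy sketch below is a paraphrase of that idea, not the authors' code; the full method also fine-tunes on the best-scoring response, which is omitted here.

```python
import numpy as np

def ranking_loss(log_probs, rewards):
    """Pairwise ranking loss: penalize when a lower-reward response out-scores a higher-reward one.

    log_probs : length-normalized log conditional probabilities of each sampled response
    rewards   : human-preference (or reward-model) scores for the same responses
    """
    loss = 0.0
    n = len(rewards)
    for i in range(n):
        for j in range(n):
            if rewards[i] < rewards[j]:
                # If the preferred response j is not scored above response i, incur the gap.
                loss += max(0.0, log_probs[i] - log_probs[j])
    return loss

# Toy scores for three sampled responses to one prompt.
log_probs = np.array([-1.2, -0.8, -2.0])   # model scores
rewards = np.array([0.1, 0.9, 0.4])        # preference scores
print(ranking_loss(log_probs, rewards))
```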
562, Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning
Chengliang Liu; Jie Wen; Yabo Liu; Chao Huang; Zhihao Wu; Xiaoling Luo; Yong Xu;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networks to solve this problem.
563, LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer
Haoyu Chen; Hao Tang; Radu Timofte; Luc V Gool; Guoying Zhao;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we propose a novel 3D Transformer framework called LART for 3D motion transfer.
564, Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
Lin Guan; Karthik Valmeekam; Sarath Sreedharan; Subbarao Kambhampati;
Related Papers Related Patents Related Grants Related Venues Related Experts View
Highlight: In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners.
565, Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes
Aran Nayebi; Rishi Rajalingham; Mehrdad Jazayeri; Guangyu Robert Yang;
Related Papers Related Patents Related Grants