Paper Digest: ICLR 2023 Highlights
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. Such models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICLR 2023 Highlights
# | Paper | Author(s) |
---|---|---|
1 | Understanding Embodied Reference with Touch-Line Transformer. Highlight: We study embodied reference understanding: locating referents using embodied gestural cues and language references. |
Yang Li; Xiaoxue Chen; Hao Zhao; Jiangtao Gong; Guyue Zhou; Federico Rossano; Yixin Zhu; |
2 | ISS: Image As Stepping Stone for Text-Guided 3D Shape Generation. Highlight: This paper presents a new framework called Image as Stepping Stone (ISS) for the task by introducing a 2D image as a stepping stone to connect the two modalities and to eliminate the need for paired text-shape data. |
Zhengzhe Liu; Peng Dai; Ruihui Li; Xiaojuan Qi; Chi-Wing Fu; |
3 | Structured World Representations Via Block-Slot Attention. Highlight: In this paper, we propose a novel object-centric representation, called Block-Slot Representation which, unlike the conventional slot representation, provides concept-level disentanglement within a slot. |
Gautam Singh; Yeongbin Kim; Sungjin Ahn; |
4 | Learning to Estimate Single-View Volumetric Flow Motions Without 3D Supervision. Highlight: We address the challenging problem of jointly inferring the 3D flow and volumetric densities moving in a fluid from a monocular input video with a deep neural network. |
Erik Franz; Barbara Solenthaler; Nils Thuerey; |
5 | DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection. Highlight: Different from them, we propose a Dynamic Ball Query (DBQ) network to adaptively select a subset of input points according to the input features, and assign the feature transform with a suitable receptive field for each selected point. |
Jinrong Yang; Lin Song; Songtao Liu; Weixin Mao; Zeming Li; Xiaoping Li; Hongbin Sun; Jian Sun; Nanning Zheng; |
6 | Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks. Highlight: In this paper, we propose Edgeformers, a framework built upon graph-enhanced Transformers, to perform edge and node representation learning by modeling texts on edges in a contextualized way. |
Bowen Jin; Yu Zhang; Yu Meng; Jiawei Han; |
7 | Protein Representation Learning By Geometric Structure Pretraining. Highlight: In this paper, we propose to pretrain protein representations according to their 3D structures. |
Zuobai Zhang; Minghao Xu; Arian Rokkum Jamasb; Vijil Chenthamarakshan; Aurelie Lozano; Payel Das; Jian Tang; |
8 | Any-scale Balanced Samplers for Discrete Space. Highlight: Instead, we propose any-scale balanced samplers to repair the gap in non-local proposals. |
Haoran Sun; Bo Dai; Charles Sutton; Dale Schuurmans; Hanjun Dai; |
9 | Truthful Self-Play. Highlight: We present a general framework for evolutionary learning to emergent unbiased state representation without any supervision. |
Shohei Ohsawa; |
10 | Rethinking Symbolic Regression: Morphology and Adaptability in The Context of Evolutionary Algorithms. Highlight: In this paper, we rethink SR from two perspectives: morphology and adaptability. For researchers interested in equation-recovery problems, we also propose a set of conventions that can be used to promote fairness in comparison across SR methods and to reduce unintentional bias. |
Kei Sen Fong; Shelvia Wongso; Mehul Motani; |
11 | Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design. Highlight: We introduce a new multimodal 3D generative model that enables shape-conditioned 3D molecular design by equivariantly encoding molecular shape and variationally encoding chemical identity. |
Keir Adams; Connor W. Coley; |
12 | TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. Highlight: In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation. |
Rongjie Huang; Jinglin Liu; Huadai Liu; Yi Ren; Lichao Zhang; Jinzheng He; Zhou Zhao; |
13 | A Law of Adversarial Risk, Interpolation, and Label Noise. Highlight: We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the relationship between label noise and adversarial risk for any data distribution. |
Daniel Paleka; Amartya Sanyal; |
14 | Short-Term Memory Convolutions. Highlight: Although they often have superior quality compared to standard DSP methods, this advantage is diminished by higher latency. In this work we propose a method for minimization of latency and memory consumption, called Short-Term Memory Convolution (STMC), and its transposed counterpart. |
Grzegorz Stefański; Krzysztof Arendt; Paweł Daniluk; Bartłomiej Jasik; Artur Szumaczuk; |
15 | StyleMorph: Disentangling Shape, Pose and Appearance Through 3D Morphable Image and Geometry Generation. Highlight: We introduce StyleMorph, a 3D generative model that relies on the 3D morphable model paradigm to disentangle shape, pose, object and scene texture for high quality image synthesis. |
Eric-Tuan Le; Edward Bartrum; Iasonas Kokkinos; |
16 | SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models. Highlight: In this paper, we successfully apply SlotFormer to perform video prediction on datasets with complex object interactions. |
Ziyi Wu; Nikita Dvornik; Klaus Greff; Thomas Kipf; Animesh Garg; |
17 | Deconstructing Distributions: A Pointwise Framework of Learning. Highlight: In this work, we propose a new approach: we measure the performance of a collection of models when evaluated at a *single input point*. |
Gal Kaplun; Nikhil Ghosh; Saurabh Garg; Boaz Barak; Preetum Nakkiran; |
18 | Trading Information Between Latents in Hierarchical Variational Autoencoders. Highlight: The proposal of $\beta$-VAEs breaks this interpretation and generalizes VAEs to application domains beyond generative modeling (e.g., representation learning, clustering, or lossy data compression) by introducing an objective function that allows practitioners to trade off between the information content (“bit rate”) of the latent representation and the distortion of reconstructed data. In this paper, we reconsider this rate/distortion trade-off in the context of hierarchical VAEs, i.e., VAEs with more than one layer of latent variables. |
Tim Z. Xiao; Robert Bamler; |
19 | FairGBM: Gradient Boosting with Fairness Constraints. Highlight: We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. |
André Cruz; Catarina G Belém; João Bravo; Pedro Saleiro; Pedro Bizarro; |
20 | DySR: Adaptive Super-Resolution Via Algorithm and System Co-design. Highlight: Nevertheless, there is no SR model or machine learning system that supports adaptive SR, and enabling an adaptive SR model on mobile devices is challenging because adapting the model can cause a significant framerate drop or even service interruption. To address this challenge, we take an algorithm and system co-design approach and propose DySR, which maintains QoS while maximizing the model performance. |
Syed Zawad; Cheng Li; Zhewei Yao; Elton Zheng; Yuxiong He; Feng Yan; |
21 | A CMDP-within-online Framework for Meta-Safe Reinforcement Learning. Highlight: In this paper, we study the problem of meta-safe reinforcement learning (meta-SRL) through the CMDP-within-online framework. |
Vanshaj Khattar; Yuhao Ding; Bilgehan Sel; Javad Lavaei; Ming Jin; |
22 | Mastering The Game of No-Press Diplomacy Via Human-Regularized Reinforcement Learning and Planning. Highlight: We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function. |
Anton Bakhtin; David J Wu; Adam Lerer; Jonathan Gray; Athul Paul Jacob; Gabriele Farina; Alexander H Miller; Noam Brown; |
23 | Pruning Deep Neural Networks from A Sparsity Perspective. Highlight: However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. |
Enmao Diao; Ganghua Wang; Jiawei Zhang; Yuhong Yang; Jie Ding; Vahid Tarokh; |
24 | Everybody Needs Good Neighbours: An Unsupervised Locality-based Method for Bias Mitigation. Highlight: To this end, we propose a new meta-algorithm for debiasing representation learning models, which combines the notions of data locality and accuracy of model fit, such that a supervised debiasing method can optimise fairness between neighbourhoods of poorly vs. well modelled instances as identified by our method. |
Xudong Han; Timothy Baldwin; Trevor Cohn; |
25 | Spacetime Representation Learning. Highlight: In this work, we introduce a general family of representations for directed graphs through connected time-oriented Lorentz manifolds, called spacetimes in general relativity. |
Marc T. Law; James Lucas; |
26 | Quasi-optimal Learning with Continuous Treatments. Highlight: Hence, it is important to induce a policy class whose support only contains near-optimal actions, and shrink the action-searching area for effectiveness and reliability. To achieve this, we develop a novel *quasi-optimal learning algorithm*, which can be easily optimized in off-policy settings with guaranteed convergence under general function approximations. |
Yuhan Li; Wenzhuo Zhou; Ruoqing Zhu; |
27 | Learning to Extrapolate: A Transductive Approach. Highlight: In this work, we tackle the problem of developing machine learning systems that retain the power of overparametrized function approximators, while enabling extrapolation to out-of-support testing points when possible. |
Aviv Netanyahu; Abhishek Gupta; Max Simchowitz; Kaiqing Zhang; Pulkit Agrawal; |
28 | Label-free Concept Bottleneck Models. Highlight: This poor performance creates a barrier for adoption in practical real-world applications. Motivated by these challenges, we propose *Label-free* CBM, a framework that transforms any neural network into an interpretable CBM without labeled concept data, while retaining high accuracy. |
Tuomas Oikarinen; Subhro Das; Lam M. Nguyen; Tsui-Wei Weng; |
29 | CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks. Highlight: In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. |
Tuomas Oikarinen; Tsui-Wei Weng; |
30 | Predicting Cellular Responses with Variational Causal Inference and Refined Relational Information. Highlight: Predicting the responses of a cell under perturbations may bring important benefits to drug discovery and personalized therapeutics. In this work, we propose a novel graph variational Bayesian causal inference framework to predict a cell’s gene expressions under counterfactual perturbations (perturbations that this cell did not factually receive), leveraging information representing biological knowledge in the form of gene regulatory networks (GRNs) to aid individualized cellular response predictions. |
Yulun Wu; Rob Barton; Zichen Wang; Vassilis N. Ioannidis; Carlo De Donno; Layne C Price; Luis F. Voloch; George Karypis; |
31 | Hard-Meta-Dataset++: Towards Understanding Few-Shot Performance on Difficult Tasks. Highlight: This phenomenon has real-world consequences for deployed few-shot systems where safety and reliability are paramount, yet little has been done to understand these failure cases. In this paper, we study these difficult tasks to gain a more nuanced understanding of the limitations of current methods. |
Samyadeep Basu; Megan Stanley; John F Bronskill; Soheil Feizi; Daniela Massiceti; |
32 | Data Continuity Matters: Improving Sequence Modeling with Lipschitz Regularizer. Highlight: However, few attempts have been made to understand the inherent data property of sequence data, neglecting the critical factor that may significantly affect the performance of sequence modeling. In this paper, we theoretically and empirically analyze a generic property of sequence data, i.e., continuity, and connect this property with the performance of deep models. |
Eric Qu; Xufang Luo; Dongsheng Li; |
33 | Symbolic Physics Learner: Discovering Governing Equations Via Monte Carlo Tree Search. Highlight: Distilling analytical expressions that govern nonlinear dynamics from limited data remains vital but challenging. To tackle this fundamental issue, we propose a novel Symbolic Physics Learner (SPL) machine to discover the mathematical structure of nonlinear dynamics. |
Fangzheng Sun; Yang Liu; Jian-Xun Wang; Hao Sun; |
34 | Neural Implicit Shape Editing Using Boundary Sensitivity. Highlight: Compared to classic geometry representations, however, neural representations do not allow the user to exert intuitive control over the shape. Motivated by this, we leverage *boundary sensitivity* to express how perturbations in parameters move the shape boundary. |
Arturs Berzins; Moritz Ibing; Leif Kobbelt; |
35 | Understanding The Role of Nonlinearity in Training Dynamics of Contrastive Learning. Highlight: In this paper, we study the role of nonlinearity in the training dynamics of contrastive learning (CL) on one- and two-layer nonlinear networks with homogeneous activation $h(x) = h'(x)x$. |
Yuandong Tian; |
36 | Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning. Highlight: In addition, we theoretically analyze the gradient flow dynamics to shed light on how data heterogeneity results in dimensional collapse for local models. To remedy this problem caused by the data heterogeneity, we propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning. |
Yujun Shi; Jian Liang; Wenqing Zhang; Vincent Tan; Song Bai; |
37 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. Highlight: The prevalence of large language models advances the state-of-the-art for program synthesis, though limited training resources and data impede open access to such models. To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER. |
Erik Nijkamp; Bo Pang; Hiroaki Hayashi; Lifu Tu; Huan Wang; Yingbo Zhou; Silvio Savarese; Caiming Xiong; |
38 | On The Complexity of Nonsmooth Automatic Differentiation. Highlight: Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. |
Jerome Bolte; Ryan Boustany; Edouard Pauwels; Béatrice Pesquet-Popescu; |
39 | M-L2O: Towards Generalizable Learning-to-Optimize By Test-Time Fast Self-Adaptation. Highlight: However, such learned optimizers often struggle when new test problems deviate substantially from the training task distribution. This paper investigates a potential solution to this open challenge by meta-training an L2O optimizer that can perform fast test-time self-adaptation to an out-of-distribution task in only a few steps. |
Junjie Yang; Xuxi Chen; Tianlong Chen; Zhangyang Wang; Yingbin Liang; |
40 | Benchmarking Deformable Object Manipulation with Differentiable Physics. Highlight: In this work, we present DaXBench, a differentiable DOM benchmark with a wide object and task coverage. |
Siwei Chen; Cunjun Yu; Yiqing Xu; Linfeng Li; Xiao Ma; Zhongwen Xu; David Hsu; |
41 | DiffMimic: Efficient Motion Mimicking with Differentiable Physics. Highlight: In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed **DiffMimic**. |
Jiawei Ren; Cunjun Yu; Siwei Chen; Xiao Ma; Liang Pan; Ziwei Liu; |
42 | Thalamus: A Brain-inspired Algorithm for Biologically-plausible Continual Learning and Disentangled Representations. Highlight: Inspired by the brain thalamocortical circuit, we introduce a simple algorithm that uses optimization at inference time to generate internal representations of the current task dynamically. |
Ali Hummos; |
43 | Adversarial Diversity in Hanabi. Highlight: In this work, we propose a novel approach to diverse policy generation for turn-based Dec-POMDPs with public actions, which relies on off-belief learning to encourage reasonableness and skill, and on “repulsive” fictitious transitions to encourage diversity. |
Brandon Cui; Andrei Lupu; Samuel Sokota; Hengyuan Hu; David J Wu; Jakob Nicolaus Foerster; |
44 | CogVideo: Large-scale Pretraining for Text-to-Video Generation Via Transformers. Highlight: In this work, we present CogVideo, a 9B-parameter transformer for text-to-video generation. |
Wenyi Hong; Ming Ding; Wendi Zheng; Xinghan Liu; Jie Tang; |
45 | Interpretability in The Wild: A Circuit for Indirect Object Identification in GPT-2 Small. Highlight: However, most previous work either focuses on simple behaviors in small models, or describes complicated behaviors in larger models with broad strokes. In this work, we bridge this gap by presenting an explanation for how GPT-2 small performs a natural language task that requires logical reasoning: indirect object identification (IOI). |
Kevin Ro Wang; Alexandre Variengien; Arthur Conmy; Buck Shlegeris; Jacob Steinhardt; |
46 | Causal Reasoning in The Presence of Latent Confounders Via Neural ADMG Learning. Highlight: We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with nonlinear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows. |
Matthew Ashman; Chao Ma; Agrin Hilmkil; Joel Jennings; Cheng Zhang; |
47 | Offline RL for Natural Language Generation with Implicit Language Q Learning. Highlight: This issue can be addressed by finetuning such models via supervised learning on curated datasets, or via reinforcement learning. In this work, we propose a novel offline RL method, implicit language Q-learning (ILQL), designed for use on language models, that combines the flexible utility maximization framework of RL algorithms with the ability of supervised learning to leverage previously collected data, as well as its simplicity and stability. |
Charlie Victor Snell; Ilya Kostrikov; Yi Su; Sherry Yang; Sergey Levine; |
48 | Mid-Vision Feedback for Convolutional Neural Networks. Highlight: We introduce a novel mechanism which modulates perception in Convolutional Neural Networks (CNNs) based on high-level categorical expectations: Mid-Vision Feedback (MVF). |
Michael Maynord; Eadom T Dessalene; Cornelia Fermuller; Yiannis Aloimonos; |
49 | HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer. Highlight: In this paper, we delve deep into the comparison between ViT and Swin, revealing that (i) the performance gain of Swin is mainly brought by a deepened backbone and relative positional encoding, (ii) the hierarchical design of Swin can be simplified into hierarchical patch embedding (proposed in this work), and (iii) other designs such as shifted-window attentions can be removed. |
Xiaosong Zhang; Yunjie Tian; Lingxi Xie; Wei Huang; Qi Dai; Qixiang Ye; Qi Tian; |
50 | Generalizing and Decoupling Neural Collapse Via Hyperspherical Uniformity Gap. Highlight: Inspired by how NC characterizes the training target of neural networks, we decouple NC into two objectives: minimal intra-class variability and maximal inter-class separability. We then introduce the concept of hyperspherical uniformity (which characterizes the degree of uniformity on the unit hypersphere) as a unified framework to quantify these two objectives. |
Weiyang Liu; Longhui Yu; Adrian Weller; Bernhard Schölkopf; |
51 | Score-based Generative 3D Mesh Modeling. Highlight: Existing scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly-smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the regular graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. |
Zhen Liu; Yao Feng; Michael J. Black; Derek Nowrouzezahrai; Liam Paull; Weiyang Liu; |
52 | ISAAC Newton: Input-based Approximate Curvature for Newton’s Method. Highlight: We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. |
Felix Petersen; Tobias Sutter; Christian Borgelt; Dongsung Huh; Hilde Kuehne; Yuekai Sun; Oliver Deussen; |
53 | Language Models Can Teach Themselves to Program Better. Highlight: We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. |
Patrick Haluptzok; Matthew Bowers; Adam Tauman Kalai; |
54 | Latent Bottlenecked Attentive Neural Processes. Highlight: Conversely, existing sub-quadratic NP variants perform significantly worse than TNPs. Tackling this issue, we propose Latent Bottlenecked Attentive Neural Processes (LBANPs), a new computationally efficient sub-quadratic NP variant that has a querying computational complexity independent of the number of context datapoints. |
Leo Feng; Hossein Hajimirsadeghi; Yoshua Bengio; Mohamed Osama Ahmed; |
55 | Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency. Highlight: To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. |
Lingxiao Wang; Qi Cai; Zhuoran Yang; Zhaoran Wang; |
56 | Learning Kernelized Contextual Bandits in A Distributed and Asynchronous Environment. Highlight: In this paper, in order to improve the robustness against delays and unavailability of clients that are common in practice, we propose the first asynchronous solution based on approximated kernel regression for distributed kernel bandit learning. |
Chuanhao Li; Huazheng Wang; Mengdi Wang; Hongning Wang; |
57 | GReTo: Remedying Dynamic Graph Topology-task Discordance Via Target Homophily. Highlight: In this work, we revisit node-wise relationships and explore novel homophily measurements on dynamic graphs with both signs and distances, capturing multiple node-level spatial relations and temporal evolutions. |
Zhengyang Zhou; Qihe Huang; Gengyu Lin; Kuo Yang; Lei Bai; Yang Wang; |
58 | DocPrompting: Generating Code By Retrieving The Docs. Highlight: In contrast, when human programmers use functions and libraries for the first time, they frequently refer to textual resources such as code manuals and documentation, to explore and understand the available functionality. Inspired by this observation, we introduce DocPrompting: a natural-language-to-code generation approach that explicitly leverages documentation by (1) retrieving the relevant documentation pieces given an NL intent, and (2) generating code based on the NL intent and the retrieved documentation. |
Shuyan Zhou; Uri Alon; Frank F. Xu; Zhengbao Jiang; Graham Neubig; |
59 | SWIFT: Rapid Decentralized Federated Learning Via Wait-Free Model Communication. Highlight: In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. |
Marco Bornstein; Tahseen Rabbani; Evan Z Wang; Amrit Bedi; Furong Huang; |
60 | RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data. Highlight: We propose RoPAWS, a robust extension of PAWS that can work with real-world unlabeled data. |
Sangwoo Mo; Jong-Chyi Su; Chih-Yao Ma; Mido Assran; Ishan Misra; Licheng Yu; Sean Bell; |
61 | Interpretable Geometric Deep Learning Via Learnable Randomness Injection. Highlight: This work proposes a general mechanism based on *learnable randomness injection* (LRI) that allows building inherently interpretable models with general GDL backbones. We also propose four scientific datasets in the domains of high energy physics and biochemistry to evaluate LRI. |
Siqi Miao; Yunan Luo; Mia Liu; Pan Li; |
62 | Machine Unlearning of Federated Clusters. Highlight: This work proposes the first known unlearning mechanism for federated clustering with privacy criteria that support simple, provable, and efficient data removal at the client and server level. |
Chao Pan; Jin Sima; Saurav Prakash; Vishal Rana; Olgica Milenkovic; |
63 | PerFedMask: Personalized Federated Learning with Optimized Masking Vectors. Highlight: To this end, we propose a personalized FL algorithm with optimized masking vectors called PerFedMask. |
Mehdi Setayesh; Xiaoxiao Li; Vincent W.S. Wong; |
64 | A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics. Highlight: Inspired by humans’ remarkable ability to master arithmetic and generalize to unseen problems, we present a new dataset, HINT, to study machines’ capability of learning generalizable concepts at three levels: perception, syntax, and semantics. |
Qing Li; Siyuan Huang; Yining Hong; Yixin Zhu; Ying Nian Wu; Song-Chun Zhu; |
65 | Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), that trades off model-free and model-based estimates during the policy evaluation step according to their epistemic uncertainties, and facilitates conservatism by taking a lower bound on the Bayesian posterior value estimate. |
Jihwan Jeong; Xiaoyu Wang; Michael Gimelfarb; Hyunwoo Kim; Baher abdulhai; Scott Sanner; |
66 | SAM As An Optimal Relaxation of Bayes Highlight: Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. |
Thomas Möllenhoff; Mohammad Emtiyaz Khan; |
67 | Masked Vision and Language Modeling for Multi-modal Representation Learning Highlight: In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. |
Gukyeong Kwon; Zhaowei Cai; Avinash Ravichandran; Erhan Bas; Rahul Bhotika; Stefano Soatto; |
68 | Extreme Q-Learning: MaxEnt RL Without Entropy Highlight: In this work, we introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT), inspired by economics. |
Divyansh Garg; Joey Hejna; Matthieu Geist; Stefano Ermon; |
69 | Direct Embedding of Temporal Network Edges Via Time-Decayed Line Graphs Highlight: First, time is assumed to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations can only be calculated indirectly from the nodes, which may be suboptimal for tasks like edge classification. We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weight the edges of this graph based on the difference in time between interactions. |
Sudhanshu Chanpuriya; Ryan A. Rossi; Sungchul Kim; Tong Yu; Jane Hoffswell; Nedim Lipka; Shunan Guo; Cameron N Musco; |
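The construction in this highlight is concrete enough to sketch: one node per timestamped interaction, an edge between two interactions that share an endpoint, weighted by a decreasing function of the time gap. The exponential decay and the `decay` parameter below are illustrative assumptions, not necessarily the paper's exact weighting.

```python
import math
from itertools import combinations

def time_decayed_line_graph(interactions, decay=1.0):
    """Line graph of a temporal network: one node per interaction
    (u, v, t); interactions sharing an endpoint are linked with
    weight exp(-decay * |t_i - t_j|) (assumed decay form)."""
    nodes = list(range(len(interactions)))
    edges = {}
    for i, j in combinations(nodes, 2):
        u1, v1, t1 = interactions[i]
        u2, v2, t2 = interactions[j]
        if {u1, v1} & {u2, v2}:  # shared endpoint
            edges[(i, j)] = math.exp(-decay * abs(t1 - t2))
    return nodes, edges

# Three interactions; only the first two share an endpoint ("a").
nodes, edges = time_decayed_line_graph(
    [("a", "b", 0.0), ("a", "c", 1.0), ("d", "e", 2.0)])
```

Edge representations can then be learned directly on this weighted graph, which is how the method sidesteps both shortcomings above.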
70 | Scaling Forward Gradient With Local Losses Highlight: In this paper, we propose to scale forward gradient by adding a large number of local greedy loss functions. |
Mengye Ren; Simon Kornblith; Renjie Liao; Geoffrey Hinton; |
71 | Latent Variable Representation for Reinforcement Learning Highlight: In this paper, we provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and effective implementation of the optimism/pessimism principle in the face of uncertainty for exploration. |
Tongzheng Ren; Chenjun Xiao; Tianjun Zhang; Na Li; Zhaoran Wang; Sujay Sanghavi; Dale Schuurmans; Bo Dai; |
72 | Learning in Temporally Structured Environments Highlight: This paper advances a multiscale learning model in which each weight in a neural network is decomposed into a sum of subweights learning independently with different learning and decay rates. |
Matt Jones; Tyler R. Scott; Mengye Ren; Gamaleldin Fathy Elsayed; Katherine Hermann; David Mayo; Michael Curtis Mozer; |
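As a toy illustration of that decomposition, a scalar weight can be kept as a sum of two subweights, one fast but leaky and one slow but persistent. The specific learning and decay rates here are assumed for illustration; the paper's exact parameterization may differ.

```python
def multiscale_update(subweights, grad, lrs, decays):
    """One gradient step where each subweight learns with its own
    learning rate and decays toward zero at its own rate; the
    effective weight is the sum of subweights (toy sketch)."""
    subweights = [(1.0 - d) * w - lr * grad
                  for w, lr, d in zip(subweights, lrs, decays)]
    return subweights, sum(subweights)

subs = [0.0, 0.0]
lrs, decays = [0.1, 0.001], [0.5, 0.0]  # fast/leaky vs. slow/persistent
subs, w = multiscale_update(subs, grad=1.0, lrs=lrs, decays=decays)
```

Repeating the update lets the fast subweight track recent structure while the slow one accumulates long-timescale regularities, which is the intuition behind learning at multiple timescales.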
73 | Learning QUBO Forms in Quantum Annealing Highlight: Moreover, such explicit formulations impose tangible constraints on solution encodings. In stark contrast to prior work, this paper proposes to learn QUBO forms from data through gradient backpropagation instead of deriving them. |
Marcel Seelbach Benkner; Maximilian Krahn; Edith Tretschk; Zorah Lähner; Michael Moeller; Vladislav Golyanik; |
74 | The Generalized Eigenvalue Problem As A Nash Equilibrium Highlight: In this work, we develop a game-theoretic formulation of the top-$k$ SGEP whose Nash equilibrium is the set of generalized eigenvectors. |
Ian Gemp; Charlie Chen; Brian McWilliams; |
75 | $O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games Highlight: We prove that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with smooth value updates, finds an $O(T^{-1})$ approximate Nash equilibrium in $T$ iterations for two-player zero-sum Markov games with full information. |
Yuepeng Yang; Cong Ma; |
76 | Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds Using Deep Networks Highlight: We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. |
Xiang Ji; Minshuo Chen; Mengdi Wang; Tuo Zhao; |
77 | Critic Sequential Monte Carlo Highlight: We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. |
Vasileios Lioutas; Jonathan Wilder Lavington; Justice Sefas; Matthew Niedoba; Yunpeng Liu; Berend Zwartsenberg; Setareh Dabiri; Frank Wood; Adam Scibior; |
78 | Basic Binary Convolution Unit for Binarized Image Restoration Network Highlight: In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks. |
Bin Xia; Yulun Zhang; Yitong Wang; Yapeng Tian; Wenming Yang; Radu Timofte; Luc Van Gool; |
79 | Knowledge Distillation Based Degradation Estimation for Blind Super-Resolution Highlight: In this paper, we propose a Knowledge Distillation based Blind-SR network (KDSR). |
Bin Xia; Yulun Zhang; Yitong Wang; Yapeng Tian; Wenming Yang; Radu Timofte; Luc Van Gool; |
80 | Spectral Decomposition Representation for Reinforcement Learning Highlight: However, current spectral methods suffer from limited applicability because they are constructed for state-only aggregation and are derived from a policy-dependent transition kernel, without considering the issue of exploration. To address these issues, we propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy, while also balancing the exploration-versus-exploitation trade-off during learning. |
Tongzheng Ren; Tianjun Zhang; Lisa Lee; Joseph E. Gonzalez; Dale Schuurmans; Bo Dai; |
81 | Fake It Until You Make It: Towards Accurate Near-Distribution Novelty Detection Highlight: We first demonstrate that existing methods could experience up to a 20% decrease in their AUCs in the near-distribution setting. Next, we propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data. Our model is then fine-tuned to distinguish such data from the normal samples. We perform quantitative as well as qualitative evaluations of this strategy, and compare the results with a variety of GAN-based models. |
Hossein Mirzaei; Mohammadreza Salehi; Sajjad Shahabi; Efstratios Gavves; Cees G. M. Snoek; Mohammad Sabokrou; Mohammad Hossein Rohban; |
82 | Contextual Image Masking Modeling Via Synergized Contrasting Without View Augmentation for Faster and Better Visual Pretraining Highlight: We propose a new contextual masked image modeling (MIM) approach called contrasting-aided contextual MIM (ccMIM), under the MIM paradigm for visual pretraining. |
Shaofeng Zhang; Feng Zhu; Rui Zhao; Junchi Yan; |
83 | Patch-Level Contrasting Without Patch Correspondence for Accurate and Dense Contrastive Representation Learning Highlight: We propose ADCLR: Accurate and Dense Contrastive Representation Learning, a novel self-supervised learning framework for learning accurate and dense vision representations. |
Shaofeng Zhang; Feng Zhu; Rui Zhao; Junchi Yan; |
84 | A Learning Based Hypothesis Test for Harmful Covariate Shift Highlight: In this work, we define harmful covariate shift (HCS) as a change in distribution that may weaken the generalization of a predictive model. |
Tom Ginsberg; Zhongyuan Liang; Rahul G Krishnan; |
85 | Backpropagation at The Infinitesimal Inference Limit of Energy-Based Models: Unifying Predictive Coding, Equilibrium Propagation, and Contrastive Hebbian Learning Highlight: Here, we provide a comprehensive theory of the conditions under which EBMs can approximate BP, which lets us unify many of the BP approximation results in the literature (namely, predictive coding, equilibrium propagation, and contrastive Hebbian learning) and demonstrate that their approximation to BP arises from a simple and general mathematical property of EBMs at free-phase equilibrium. |
Beren Millidge; Yuhang Song; Tommaso Salvatori; Thomas Lukasiewicz; Rafal Bogacz; |
86 | Re-Imagen: Retrieval-Augmented Text-to-Image Generator Highlight: Though state-of-the-art models can generate high-quality images of common entities, they often have difficulty generating images of uncommon entities, such as 'Chortai (dog)' or 'Picarones (food)'. To tackle this issue, we present the Retrieval-Augmented Text-to-Image Generator (Re-Imagen), a generative model that uses retrieved information to produce high-fidelity and faithful images, even for rare or unseen entities. |
Wenhu Chen; Hexiang Hu; Chitwan Saharia; William W. Cohen; |
87 | Task-customized Masked Autoencoder Via Mixture of Cluster-conditional Experts Highlight: However, when the various downstream tasks have data distributions different from the pre-training data, the semantically irrelevant pre-training information might result in negative transfer, impeding MAE’s scalability. To address this issue, we propose a novel MAE-based pre-training paradigm, named Mixture of Cluster-conditional Experts (MoCE), which can be trained once but provide customized pre-training models for diverse downstream tasks. |
Zhili Liu; Kai Chen; Jianhua Han; Lanqing Hong; Hang Xu; Zhenguo Li; James Kwok; |
88 | A Theoretical Framework for Inference and Learning in Predictive Coding Networks Highlight: In this paper, we provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration. |
Beren Millidge; Yuhang Song; Tommaso Salvatori; Thomas Lukasiewicz; Rafal Bogacz; |
89 | Learning to Grow Pretrained Models for Efficient Transformer Training Highlight: This paper describes an approach for accelerating transformer training by learning to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model. |
Peihao Wang; Rameswar Panda; Lucas Torroba Hennigen; Philip Greengard; Leonid Karlinsky; Rogerio Feris; David Daniel Cox; Zhangyang Wang; Yoon Kim; |
90 | A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification Highlight: Curiously, while these approaches all claim to address the same eventual goal of detecting failures of a classifier upon real-life application, they currently constitute largely separated research fields with individual evaluation protocols, which either exclude a substantial part of relevant methods or ignore large parts of relevant failure sources. In this work, we systematically reveal current pitfalls caused by these inconsistencies and derive requirements for a holistic and realistic evaluation of failure detection. |
Paul F Jaeger; Carsten Tim Lüth; Lukas Klein; Till J. Bungert; |
91 | Generating Intuitive Fairness Specifications for Natural Language Processing Highlight: While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully align with human intuition (e.g., in cases of asymmetric counterfactuals). This work proposes novel methods for bridging this gap by discovering expressive and intuitive individual fairness specifications. |
Florian E. Dorner; Momchil Peychev; Nikola Konstantinov; Naman Goel; Elliott Ash; Martin Vechev; |
92 | PiFold: Toward Effective and Efficient Protein Inverse Folding Highlight: Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and an autoregressive sequence decoder. To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. |
Zhangyang Gao; Cheng Tan; Stan Z. Li; |
93 | Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions Highlight: We give generalization bounds for downstream linear prediction using our kernel PCA representation, and show empirically on a set of synthetic tasks that applying kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength. |
Daniel D. Johnson; Ayoub El Hanchi; Chris J. Maddison; |
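The kernel PCA step this highlight refers to is the standard construction: double-center the Gram matrix of the learned representation and keep the top eigencomponents. A minimal sketch follows, where a linear kernel on random data stands in for a contrastive model's kernel; the data and `k` are illustrative.

```python
import numpy as np

def kernel_pca(K, k):
    """Kernel PCA on a precomputed Gram matrix K: double-center,
    eigendecompose, return top-k component scores."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                    # center in feature space
    vals, vecs = np.linalg.eigh(Kc)   # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]  # indices of the top-k
    # scores scaled by sqrt of eigenvalues (clipped at zero)
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Z = kernel_pca(X @ X.T, k=2)          # linear kernel -> ordinary PCA
```

In the paper's setting, `K` would come from the similarity structure of a trained contrastive model, and the recovered components approximate the Markov chain eigenfunctions.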
94 | Provably Auditing Ordinary Least Squares in Low Dimensions Highlight: We develop provable and efficient algorithms for estimating the stability of OLS to dropping samples in the low-dimensional regime. |
Ankur Moitra; Dhruv Rohatgi; |
95 | Learning Sparse Group Models Through Boolean Relaxation Highlight: We introduce an efficient algorithmic framework for learning sparse group models formulated as the natural convex relaxation of a cardinality-constrained program with Boolean variables. |
Yijie Wang; Yuan Zhou; Xiaoqing Huang; Kun Huang; Jie Zhang; Jianzhu Ma; |
96 | QAID: Question Answering Inspired Few-shot Intent Detection Highlight: Our method achieves SOTA results on few-shot intent detection by combining a question-answering architecture, contrastive learning techniques, and the use of the intent name as the answer. |
Asaf Yehudai; Matan Vetzler; Yosi Mass; Koren Lazar; Doron Cohen; Boaz Carmeli; |
97 | Out-of-distribution Representation Learning for Time Series Classification Highlight: In this paper, we propose to view time series classification from the distribution perspective. |
Wang Lu; Jindong Wang; Xinwei Sun; Yiqiang Chen; Xing Xie; |
98 | Neural DAG Scheduling Via One-Shot Priority Sampling Highlight: We consider the problem of scheduling operations/nodes whose dependencies are characterized by a Directed Acyclic Graph (DAG). |
Wonseok Jeon; Mukul Gagrani; Burak Bartan; Weiliang Will Zeng; Harris Teague; Piero Zappi; Christopher Lott; |
99 | Efficiently Computing Nash Equilibria in Adversarial Team Markov Games Highlight: In this work, we depart from those prior results by investigating infinite-horizon adversarial team Markov games, a natural and well-motivated class of games in which a team of identically-interested players, in the absence of any explicit coordination or communication, is competing against an adversarial player. |
Fivos Kalogiannis; Ioannis Anagnostides; Ioannis Panageas; Emmanouil-Vasileios Vlatakis-Gkaragkounis; Vaggos Chatziafratis; Stelios Andrew Stavroulakis; |
100 | Graph Neural Network-Inspired Kernels for Gaussian Processes in Semi-Supervised Learning Highlight: Their competitive performance is often attributed to a proper capturing of the graph inductive bias. In this work, we introduce this inductive bias into GPs to improve their predictive performance for graph-structured data. |
Zehao Niu; Mihai Anitescu; Jie Chen; |
101 | SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation Highlight: In this paper, we introduce a new method, the squeeze-enhanced Axial Transformer (SeaFormer), for mobile semantic segmentation. |
Qiang Wan; Jiachen Lu; Zilong Huang; Gang Yu; Li Zhang; |
102 | Differentiable Gaussianization Layers for Inverse Problems Regularized By Deep Generative Models Highlight: In such cases, deep generative models are ineffective in attaining high-fidelity solutions. To address this issue, we propose to reparameterize and Gaussianize the latent tensors using novel differentiable data-dependent layers, wherein custom operators are defined by solving optimization problems. |
Dongzhuo Li; |
103 | Approximate Vanishing Ideal Computations at Scale Highlight: In this paper, we scale up the Oracle Approximate Vanishing Ideal algorithm (OAVI), the only generator-constructing algorithm with known learning guarantees. |
Elias Samuel Wirth; Hiroshi Kera; Sebastian Pokutta; |
104 | SoftMatch: Addressing The Quantity-Quality Tradeoff in Semi-supervised Learning Highlight: To this end, we propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training, effectively exploiting the unlabeled data. |
Hao Chen; Ran Tao; Yue Fan; Yidong Wang; Marios Savvides; Jindong Wang; Bhiksha Raj; Xing Xie; Bernt Schiele; |
105 | Learning Uncertainty for Unknown Domains with Zero-Target-Assumption Highlight: We introduce our Maximum-Entropy Rewarded Reinforcement Learning (MERRL) framework that selects training data for more accurate Natural Language Processing (NLP). |
Yu Yu; Hassan Sajjad; Jia Xu; |
106 | Scalable and Equivariant Spherical CNNs By Discrete-Continuous (DISCO) Convolutions Highlight: We develop a hybrid discrete-continuous (DISCO) group convolution that is simultaneously equivariant and computationally scalable to high resolution. |
Jeremy Ocampo; Matthew Alexander Price; Jason McEwen; |
107 | FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning Highlight: Based on this analysis, we propose FreeMatch to define and adjust the confidence threshold in a self-adaptive manner according to the model’s learning status. |
Yidong Wang; Hao Chen; Qiang Heng; Wenxin Hou; Yue Fan; Zhen Wu; Jindong Wang; Marios Savvides; Takahiro Shinozaki; Bhiksha Raj; Bernt Schiele; Xing Xie; |
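One simple flavor of tying the confidence threshold to the model's learning status is to track an exponential moving average of its average confidence and keep only pseudo-labels above it. This global version is a simplification for illustration; FreeMatch itself also modulates the threshold per class, and the momentum value here is an assumption.

```python
def adaptive_threshold_mask(max_probs, tau, momentum=0.999):
    """Update an EMA threshold from the batch's mean confidence,
    then mask pseudo-labels below it (simplified, global sketch)."""
    mean_conf = sum(max_probs) / len(max_probs)
    tau = momentum * tau + (1.0 - momentum) * mean_conf
    mask = [p >= tau for p in max_probs]
    return tau, mask

# Batch of max softmax probabilities; prior threshold 0.7.
tau, mask = adaptive_threshold_mask([0.95, 0.40, 0.99], tau=0.7)
```

As training progresses and confidence rises, the threshold drifts upward, admitting more pseudo-labels early and becoming stricter later.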
108 | Can Discrete Information Extraction Prompts Generalize Across Language Models? Highlight: We introduce a way to induce prompts by mixing language models at training time that results in prompts that generalize well across models. |
Nathanaël Carraz Rakotonirina; Roberto Dessi; Fabio Petroni; Sebastian Riedel; Marco Baroni; |
109 | Disentangling The Mechanisms Behind Implicit Regularization in SGD Highlight: In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. |
Zachary Novack; Simran Kaur; Tanya Marwah; Saurabh Garg; Zachary Chase Lipton; |
110 | Transformer-based World Models Are Happy With 100k Interactions Highlight: To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the taken actions but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. |
Jan Robine; Marc Höftmann; Tobias Uelwer; Stefan Harmeling; |
111 | Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation Highlight: In this work, we take a different route: we explicitly enhance input-output connections by maximizing their mutual information. |
Ye Zhu; Yu Wu; Kyle Olszewski; Jian Ren; Sergey Tulyakov; Yan Yan; |
112 | Confidential-PROFITT: Confidential PROof of FaIr Training of Trees Highlight: We introduce a method to provide a confidential proof of fairness for training, in the context of widely used decision trees, which we term Confidential-PROFITT. |
Ali Shahin Shamsabadi; Sierra Calanda Wyllie; Nicholas Franzese; Natalie Dullerud; Sébastien Gambs; Nicolas Papernot; Xiao Wang; Adrian Weller; |
113 | DCI-ES: An Extended Disentanglement Framework with Connections to Identifiability Highlight: Our main idea is that the functional capacity required to use a representation is an important but thus-far neglected aspect of representation quality, which we quantify using explicitness or ease-of-use (E). |
Cian Eastwood; Andrei Liviu Nicolicioiu; Julius Von Kügelgen; Armin Kekić; Frederik Träuble; Andrea Dittadi; Bernhard Schölkopf; |
114 | Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint Highlight: However, existing methods rely on intuitive assumptions and lack mathematical guarantees. To bridge this gap, we introduce Bort, an optimizer for improving model explainability with boundedness and orthogonality constraints on model parameters, derived from the sufficient conditions of model comprehensibility and transparency. |
Borui Zhang; Wenzhao Zheng; Jie Zhou; Jiwen Lu; |
115 | Faster Federated Optimization Under Second-order Similarity Highlight: We consider finite-sum federated optimization under a second-order function similarity condition and strong convexity, and propose two new algorithms: SVRP and Catalyzed SVRP. |
Ahmed Khaled; Chi Jin; |
116 | The Augmented Image Prior: Distilling 1000 Classes By Extrapolating from A Single Image Highlight: While any image obviously cannot contain the multitudes of all existing objects, scenes, and lighting conditions, within the space of all $256^{3\cdot224\cdot224}$ possible $224$-sized square images it might still provide a strong prior for natural images. To analyze this "augmented image prior" hypothesis, we develop a simple framework for training neural networks from scratch using a single image and augmentations, using knowledge distillation from a supervised pretrained teacher. |
Yuki M Asano; Aaqib Saeed; |
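The distillation ingredient in this framework boils down to a soft-label matching loss between teacher and student outputs on augmented crops of the single source image. A minimal sketch of that loss follows; the temperature value is an assumed hyperparameter, not taken from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable temperature-scaled softmax."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs, the
    standard knowledge-distillation objective; in the framework it
    would be averaged over many augmented crops of the one image."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

loss_same = distill_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = distill_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

A student matching the teacher exactly incurs zero loss, and the loss grows as their softened predictions diverge, which is what drives the student toward the teacher's 1000-class behavior.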
117 | Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance Highlight: Our key idea is to factorize canonical shapes and articulated object poses from input articulated shapes through part-level equivariant shape analysis. |
Xueyi Liu; Ji Zhang; Ruizhen Hu; Haibin Huang; He Wang; Li Yi; |
118 | Schema Inference for Interpretable Image Classification Highlight: In this paper, we study a novel inference paradigm, termed schema inference, that learns to deductively infer explainable predictions by rebuilding the prior deep neural network (DNN) forwarding scheme, guided by the prevalent philosophical cognitive concept of schema. |
Haofei Zhang; Xiaokang Liu; Mengqi Xue; Kaixuan Chen; Jie Song; Mingli Song; |
119 | Autoencoders As Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning? Highlight: In this paper, we revisit masked modeling in a unified fashion of knowledge distillation, and we show that foundational Transformers pretrained with 2D images or natural languages can help self-supervised 3D representation learning through training Autoencoders as Cross-Modal Teachers (ACT). |
Runpei Dong; Zekun Qi; Linfeng Zhang; Junbo Zhang; Jianjian Sun; Zheng Ge; Li Yi; Kaisheng Ma; |
120 | Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms Highlight: However, this line of research is still in its infancy, where (1) unified structural conditions enabling sample-efficient learning are lacking; (2) existing sample complexities for known tractable subclasses are far from sharp; and (3) fewer sample-efficient algorithms are available than in fully observable RL. This paper advances all three aspects above for Partially Observable RL in the general setting of Predictive State Representations (PSRs). |
Fan Chen; Yu Bai; Song Mei; |
121 | Towards Lightweight, Model-Agnostic and Diversity-Aware Active Anomaly Detection Highlight: Besides, most existing AAD approaches are specially tailored for a certain unsupervised detector, making it difficult to extend to other detection models. To tackle these problems, we propose a lightweight, model-agnostic and diversity-aware AAD method, named LMADA. |
Xu Zhang; Yuan Zhao; Ziang Cui; Liqun Li; Shilin He; Qingwei Lin; Yingnong Dang; Saravan Rajmohan; Dongmei Zhang; |
122 | Complexity-Based Prompting for Multi-step Reasoning Highlight: In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. |
Yao Fu; Hao Peng; Ashish Sabharwal; Peter Clark; Tushar Khot; |
123 | Gromov-Wasserstein Autoencoders Highlight: In this paper, we propose a novel representation learning method, Gromov-Wasserstein Autoencoders (GWAE), which directly matches the latent and data distributions using the variational autoencoding scheme. |
Nao Nakagawa; Ren Togo; Takahiro Ogawa; Miki Haseyama; |
124 | Moving Forward By Moving Backward: Embedding Action Impact Over Action Semantics Highlight: Instead of assuming that the impact of an action stably reflects its pre-defined semantic meaning, we propose to model the impact of actions on-the-fly using latent embeddings. |
Kuo-Hao Zeng; Luca Weihs; Roozbeh Mottaghi; Ali Farhadi; |
125 | Decomposed Prompting: A Modular Approach for Solving Complex Tasks Highlight: However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solving complex tasks by decomposing them (via prompting) into simpler sub-tasks that can be delegated to a library of prompting-based LLMs dedicated to these sub-tasks. |
Tushar Khot; Harsh Trivedi; Matthew Finlayson; Yao Fu; Kyle Richardson; Peter Clark; Ashish Sabharwal; |
126 | UNICORN: A Unified Backdoor Trigger Inversion Framework Highlight: This work formally defines and analyzes the trigger and the inversion problem. |
Zhenting Wang; Kai Mei; Juan Zhai; Shiqing Ma; |
127 | How Gradient Estimator Variance and Bias Impact Learning in Neural Networks Highlight: Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. |
Arna Ghosh; Yuhan Helena Liu; Guillaume Lajoie; Konrad Kording; Blake Aaron Richards; |
128 | Sampling Is As Easy As Learning The Score: Theory for Diffusion Models with Minimal Data Assumptions Highlight: We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL·E 2. |
Sitan Chen; Sinho Chewi; Jerry Li; Yuanzhi Li; Adil Salim; Anru Zhang; |
129 | Post-hoc Concept Bottleneck Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck models (PCBMs). |
Mert Yuksekgonul; Maggie Wang; James Zou; |
130 | Is A Caption Worth A Thousand Images? A Study on Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by our findings, we devise simple data and algorithmic interventions to improve the transfer performance of CLIP-style models. |
Shibani Santurkar; Yann Dubois; Rohan Taori; Percy Liang; Tatsunori Hashimoto; |
131 | Continual Post-Training of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing research has shown that post-training or adapting an LM using an unlabeled topical/domain corpus can improve the end-task performance in the domain. This paper proposes a novel method to continually post-train an LM with a sequence of unlabeled domain corpora to adapt the LM to these domains to improve their end-task performances. |
Zixuan Ke; Haowei Lin; Yijia Shao; Tatsuya Konishi; Gyuhak Kim; Bing Liu; |
132 | Learning to Generate Columns with Application to Vertex Coloring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new column generation approach based on Machine Learning (ML) for solving combinatorial optimization problems. |
Yuan Sun; Andreas T Ernst; Xiaodong Li; Jake Weiner; |
133 | Constraining Representations Yields Models That Know What They Don’t Know Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such unsafe behaviour is particularly frequent when the use case slightly differs from the training context, and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model’s internal activation patterns. |
Joao Monteiro; Pau Rodriguez; Pierre-Andre Noel; Issam H. Laradji; David Vazquez; |
134 | Temporal Domain Generalization with Drift-Aware Dynamic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address them, we propose a Temporal Domain Generalization with Drift-Aware Dynamic Neural Network (DRAIN) framework. |
Guangji Bai; Chen Ling; Liang Zhao; |
135 | Causal Estimation for Text Data with (Apparent) Overlap Violations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The purpose of this paper is to show how to handle causal identification and obtain robust causal estimation in the presence of apparent overlap violations. |
Lin Gui; Victor Veitch; |
136 | A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Visual room rearrangement evaluates an agent’s ability to rearrange objects in a room to a desired goal based solely on visual input. We propose a simple yet effective method for this problem: (1) search for and map which objects need to be rearranged, and (2) rearrange each object until the task is complete. |
Brandon Trabucco; Gunnar A Sigurdsson; Robinson Piramuthu; Gaurav S. Sukhatme; Ruslan Salakhutdinov; |
137 | Improved Training of Physics-Informed Neural Networks Using Energy-Based Priors: A Study on Electrical Impedance Tomography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Bayesian approach through a data-driven energy-based model (EBM) as a prior, to improve the overall accuracy and quality of tomographic reconstruction. |
Akarsh Pokkunuru; Pedram Rooshenas; Thilo Strauss; Anuj Abhishek; Taufiquar Khan; |
138 | ESD: Expected Squared Difference As A Tuning-Free Trainable Calibration Measure Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a tuning-free calibration objective loss, Expected Squared Difference (ESD), where we view the calibration error from the perspective of the squared difference between two expectations. |
Hee Suk Yoon; Joshua Tian Jin Tee; Eunseop Yoon; Sunjae Yoon; Gwangsu Kim; Yingzhen Li; Chang D. Yoo; |
139 | An Extensible Multi-modal Multi-task Object Dataset with Materials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present EMMa, an Extensible, Multimodal dataset of Amazon product listings that contains rich Material annotations. |
Trevor Scott Standley; Ruohan Gao; Dawn Chen; Jiajun Wu; Silvio Savarese; |
140 | Does Zero-Shot Reinforcement Learning Exist? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Strategies for approximate zero-shot RL have been suggested using successor features (SFs) (Borsa et al., 2018) or forward-backward (FB) representations (Touati & Ollivier, 2021), but testing has been limited. After clarifying the relationships between these schemes, we introduce improved losses and new SF models, and test the viability of zero-shot RL schemes systematically on tasks from the Unsupervised RL benchmark (Laskin et al., 2021). |
Ahmed Touati; Jérémy Rapin; Yann Ollivier; |
141 | Self-Stabilization: The Implicit Bias of Gradient Descent at The Edge of Stability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that, far from being chaotic, the dynamics of gradient descent at the edge of stability can be captured by a cubic Taylor expansion: as the iterates diverge in the direction of the top eigenvector of the Hessian due to instability, the cubic term in the local Taylor expansion of the loss function causes the curvature to decrease until stability is restored. |
Alex Damian; Eshaan Nichani; Jason D. Lee; |
142 | Interactive Portrait Harmonization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable flexible interaction between user and harmonization, we introduce interactive harmonization, a new setting where the harmonization is performed with respect to a selected region in the reference image instead of the entire background. Furthermore, we also introduce a new dataset carefully curated for validating portrait harmonization. |
Jeya Maria Jose Valanarasu; HE Zhang; Jianming Zhang; Yilin Wang; Zhe Lin; Jose Echevarria; Yinglan Ma; Zijun Wei; Kalyan Sunkavalli; Vishal Patel; |
143 | STREET: A Multi-Task Structured Reasoning and Explanation Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce STREET, a unified multi-task and multi-domain natural language reasoning and explanation benchmark. |
Danilo Neves Ribeiro; Shen Wang; Xiaofei Ma; Henghui Zhu; Rui Dong; Deguang Kong; Juliette Burger; Anjelica Ramos; zhiheng huang; William Yang Wang; George Karypis; Bing Xiang; Dan Roth; |
144 | Hierarchical Sliced Wasserstein Distance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, for applications where the number of supports is relatively small compared with the dimension, e.g., several deep learning applications where the mini-batch approaches are utilized, the complexities from matrix multiplication of Radon Transform become the main computational bottleneck. To address this issue, we propose to derive projections by linearly and randomly combining a smaller number of projections which are named bottleneck projections. |
Khai Nguyen; Tongzheng Ren; Huy Nguyen; Litu Rout; Tan Minh Nguyen; Nhat Ho; |
145 | Restricted Strong Convexity of Deep Learning Models with Smooth Activations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of optimization of deep learning models with smooth activation functions. |
Arindam Banerjee; Pedro Cisneros; Libin Zhu; Misha Belkin; |
146 | Koopman Neural Operator Forecaster for Time-series with Temporal Distributional Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel deep sequence model based on the Koopman theory for time series forecasting: Koopman Neural Forecaster (KNF) that leverages DNNs to learn the linear Koopman space and the coefficients of chosen measurement functions. |
Rui Wang; Yihe Dong; Sercan O Arik; Rose Yu; |
147 | Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent. |
Raj Ghugare; Homanga Bharadhwaj; Benjamin Eysenbach; Sergey Levine; Russ Salakhutdinov; |
148 | Minimum Description Length Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle. |
Ted Moskovitz; Ta-Chu Kao; Maneesh Sahani; Matthew Botvinick; |
149 | Decoupled Training for Long-Tailed Classification With Stochastic Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel classifier re-training algorithm for long-tailed classification. |
Giung Nam; Sunguk Jang; Juho Lee; |
150 | Where to Begin? Exploring The Impact of Pre-Training and Initialization in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We empirically study the impact of starting from a pre-trained model in federated learning using four standard federated learning benchmark datasets. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models (up to 40%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. |
John Nguyen; Jianyu Wang; Kshitiz Malik; Maziar Sanjabi; Michael Rabbat; |
151 | Martingale Posterior Neural Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a different approach based on the martingale posterior, a recently developed alternative to Bayesian inference. |
Hyungi Lee; Eunggu Yun; Giung Nam; Edwin Fong; Juho Lee; |
152 | BigVGAN: A Universal Neural Vocoder with Large-Scale Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present BigVGAN, a universal vocoder that generalizes well for various out-of-distribution (OOD) scenarios without fine-tuning. |
Sang-gil Lee; Wei Ping; Boris Ginsburg; Bryan Catanzaro; Sungroh Yoon; |
153 | Progressive Voronoi Diagram Subdivision Enables Accurate Data-free Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present iVoro, a novel framework derived from computational geometry. |
Chunwei Ma; Zhanghexuan Ji; Ziyun Huang; Yan Shen; Mingchen Gao; Jinhui Xu; |
154 | Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper discusses how to leverage trending vision-language models to transfer to the medical domain, showing exciting performance on zero-shot and few-shot learning tasks. |
Ziyuan Qin; Hua Hui Yi; Qicheng Lao; Kang Li; |
155 | Approximate Bayesian Inference with Stein Functional Variational Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a general-purpose variational algorithm that forms a natural analogue of Stein variational gradient descent (SVGD) in function space. |
Tobias Pielok; Bernd Bischl; David Rügamer; |
156 | When and Why Vision-Language Models Behave Like Bags-of-Words, and What to Do About It? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the settings where state-of-the-art VLMs behave like bags-of-words, i.e., when they have poor relational understanding, can blunder when linking objects to their attributes, and demonstrate a severe lack of order sensitivity. Here, we create the Attribution, Relation, and Order (ARO) benchmark to systematically evaluate the ability of VLMs to understand different types of relationships, attributes, and order information. |
Mert Yuksekgonul; Federico Bianchi; Pratyusha Kalluri; Dan Jurafsky; James Zou; |
157 | Causal Imitation Learning Via Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies imitation learning through causal lenses and extends the analysis and tools developed for behavior cloning (Zhang, Kumor, Bareinboim, 2020) to inverse reinforcement learning. |
Kangrui Ruan; Junzhe Zhang; Xuan Di; Elias Bareinboim; |
158 | The Surprising Computational Power of Nondeterministic Stack RNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nondeterminism is needed for recognizing all CFLs (not just deterministic CFLs), but in this paper, we show that nondeterminism and the neural controller interact to produce two more unexpected abilities. |
Brian DuSell; David Chiang; |
159 | Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop ORCHID, a flexible framework generalizing Ollivier-Ricci curvature to hypergraphs, and prove that the resulting curvatures have favorable theoretical properties. |
Corinna Coupette; Sebastian Dalleiger; Bastian Rieck; |
160 | Hyperbolic Self-paced Learning for Self-supervised Skeleton-based Action Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel HYperbolic Self-Paced model (HYSP) for learning skeleton-based action representations. |
Luca Franco; Paolo Mandica; Bharti Munjal; Fabio Galasso; |
161 | Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and show a pessimism-type algorithm that can recover an approximate NE. |
Haozhe Jiang; Qiwen Cui; Zhihan Xiong; Maryam Fazel; Simon Shaolei Du; |
162 | Auto-Encoding Goodness of Fit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop the Goodness of Fit Autoencoder (GoFAE), which incorporates hypothesis tests at two levels. |
Aaron Palmer; Zhiyi Chi; Derek Aguiar; Jinbo Bi; |
163 | Sparse Tree-based Initialization for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new sparse initialization technique for (potentially deep) multilayer perceptrons (MLP): we first train a tree-based procedure to detect feature interactions and use the resulting information to initialize the network, which is subsequently trained via standard stochastic gradient strategies. |
Patrick Lutz; Ludovic Arnould; Claire Boyer; Erwan Scornet; |
164 | Efficient Conditionally Invariant Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. |
Roman Pogodin; Namrata Deka; Yazhe Li; Danica J. Sutherland; Victor Veitch; Arthur Gretton; |
165 | Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the models on two typical kinds of NLP tasks, text classification and extractive question answering. |
Mingxu Tao; Yansong Feng; Dongyan Zhao; |
166 | Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we start by making the empirical observation that a naive joint optimization of CL and MIM losses leads to conflicting gradient directions, which become more severe as the layers go deeper. |
Ziyu Jiang; Yinpeng Chen; Mengchen Liu; Dongdong Chen; Xiyang Dai; Lu Yuan; Zicheng Liu; Zhangyang Wang; |
167 | DreamFusion: Text-to-3D Using 2D Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D or multiview data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. |
Ben Poole; Ajay Jain; Jonathan T. Barron; Ben Mildenhall; |
168 | Learning Input-agnostic Manipulation Directions in StyleGAN with Text Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Indeed, we show that it fails to discover a large portion of the manipulation directions that can be found by existing methods, which manipulate the latent space manually without text. To alleviate this issue, we propose a novel method that learns a Dictionary, whose entries correspond to the representations of single channels, by taking into account the manipulation effect coming from the interaction with multiple other channels. |
Yoonjeon Kim; Hyunsu Kim; Junho Kim; Yunjey Choi; Eunho Yang; |
169 | Effective Passive Membership Inference Attacks in Federated Learning Against Overparameterized Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work considers the challenge of performing membership inference attacks in a federated learning setting (for image classification) where an adversary can only observe the communication between the central node and a single client (a passive white-box attack). |
Jiacheng Li; Ninghui Li; Bruno Ribeiro; |
170 | Joint Edge-Model Sparse Learning Is Provably Efficient for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Encouraged by the empirical success of sparse learners in accelerating GNN training, this paper characterizes the impact of graph sampling and neuron pruning on the sample complexity and convergence rate for a desirable test accuracy quantitatively. |
Shuai Zhang; Meng Wang; Pin-Yu Chen; Sijia Liu; Songtao Lu; Miao Liu; |
171 | Tier Balancing: Towards Dynamic Fairness Over Underlying Causal Factors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, through causal modeling with a directed acyclic graph (DAG) on the decision-distribution interplay, we investigate the possibility of achieving long-term fairness from a dynamic perspective. |
Zeyu Tang; Yatong Chen; Yang Liu; Kun Zhang; |
172 | CoRTX: Contrastive Framework for Real-time Explanation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a COntrastive Real-Time eXplanation (CoRTX) framework to learn the explanation-oriented representation and relieve the intensive dependence of explainer training on explanation labels. |
Yu-Neng Chuang; Guanchu Wang; Fan Yang; Quan Zhou; Pushkar Tripathi; Xuanting Cai; Xia Hu; |
173 | Anamnesic Neural Differential Equations with Orthogonal Polynomial Projections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose PolyODE, a Neural ODE that models the latent continuous-time process as a projection onto a basis of orthogonal polynomials. |
Edward De Brouwer; Rahul G Krishnan; |
174 | Large Language Models Are Human-Level Prompt Engineers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. |
Yongchao Zhou; Andrei Ioan Muresanu; Ziwen Han; Keiran Paster; Silviu Pitis; Harris Chan; Jimmy Ba; |
175 | AutoTransfer: AutoML with Knowledge Transfer – An Application to Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose AutoTransfer, an AutoML solution that improves search efficiency by transferring the prior architectural design knowledge to the novel task of interest. |
Kaidi Cao; Jiaxuan You; Jiaju Liu; Jure Leskovec; |
176 | Explaining RL Decisions with Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the literature, the explanation is often provided by saliency attribution to the features of the RL agent’s state. In this work, we propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training. |
Shripad Vilasrao Deshmukh; Arpan Dasgupta; Chirag Agarwal; Nan Jiang; Balaji Krishnamurthy; Georgios Theocharous; Jayakumar Subramanian; |
177 | On Representing Linear Programs By Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While the literature has reported encouraging numerical results, this paper establishes the theoretical foundation of applying GNNs to solving LPs. |
Ziang Chen; Jialin Liu; Xinshang Wang; Wotao Yin; |
178 | On Representing Mixed-Integer Linear Programs By Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work discovers a fundamental limitation: there exist feasible and infeasible MILPs that all GNNs will, however, treat equally, indicating GNN’s lacking power to express general MILPs. Then, we show that, by restricting the MILPs to unfoldable ones or by adding random features, there exist GNNs that can reliably predict MILP feasibility, optimal objective values, and optimal solutions up to prescribed precision. |
Ziang Chen; Jialin Liu; Xinshang Wang; Wotao Yin; |
179 | Efficient Discrete Multi Marginal Optimal Transport Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage multi-marginal optimal transport (MMOT), where we take advantage of a procedure that computes a generalized earth mover’s distance as a sub-routine. |
Ronak Mehta; Jeffery Kline; Vishnu Suresh Lokhande; Glenn Fung; Vikas Singh; |
180 | Graph Signal Sampling for Inductive One-Bit Matrix Completion: A Closed-form Solution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing. |
Chao Chen; Haoyu Geng; Gang Zeng; Zhaobing Han; Hua Chai; Xiaokang Yang; Junchi Yan; |
181 | A New Hierarchy of Expressivity for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we show that, contrary to the widely accepted view, the $k$-WL hierarchy is not well-suited for measuring expressive GNNs. |
Qing Wang; Dillon Ze Chen; Asiri Wijesinghe; Shouheng Li; Muhammad Farhan; |
182 | On Achieving Optimal Adversarial Test Error Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first elucidate various fundamental properties of optimal adversarial predictors: the structure of optimal adversarial convex predictors in terms of optimal adversarial zero-one predictors, bounds relating the adversarial convex loss to the adversarial zero-one loss, and the fact that continuous predictors can get arbitrarily close to the optimal adversarial error for both convex and zero-one losses. Applying these results along with new Rademacher complexity bounds for adversarial training near initialization, we prove that for general data distributions and perturbation sets, adversarial training on shallow networks with early stopping and an idealized optimal adversary is able to achieve optimal adversarial test error. |
Justin D. Li; Matus Telgarsky; |
183 | Powderworld: A Platform for Understanding Generalization Via Rich Task Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To take a step towards addressing this research bottleneck, this work presents Powderworld, a lightweight yet expressive simulation environment running directly on the GPU. |
Kevin Frans; Phillip Isola; |
184 | 3D Segmenter: 3D Transformer Based Semantic Segmentation Via 2D Panoramic Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, in this work, we propose the first 2D-to-3D knowledge distillation strategy to enhance 3D semantic segmentation models with knowledge embedded in the latent space of powerful 2D models. To facilitate our research, we create a large-scale, fine-annotated 3D semantic segmentation benchmark, containing voxel-wise semantic labels and aligned panoramas of 5175 scenes. |
ZHENNAN WU; YANG LI; Yifei Huang; Lin Gu; Tatsuya Harada; Hiroyuki Sato; |
185 | Fairness and Accuracy Under Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the transfer of both fairness and accuracy under domain generalization where the data at test time may be sampled from never-before-seen domains. |
Thai-Hoang Pham; Xueru Zhang; Ping Zhang; |
186 | Text Summarization with Oracle Expectation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we identify two flaws with the widely used greedy labeling approach: it delivers suboptimal and deterministic oracles. |
Yumo Xu; Mirella Lapata; |
187 | Efficient Attention Via Control Variates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel analysis of random feature attention based on control variates, which characterizes its gap to full softmax attention and induces a novel efficient variant that significantly improves the approximation while remaining efficient. |
Lin Zheng; Jianbo Yuan; Chong Wang; Lingpeng Kong; |
188 | Pitfalls of Gaussians As A Noise Distribution in NCE Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, a common choice for $q$ is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss – even for very simple data distributions. |
Holden Lee; Chirag Pabbaraju; Anish Prasad Sevekari; Andrej Risteski; |
189 | HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, such hierarchical aggregation significantly improves the cross-modal alignment. To demonstrate the advantages of HiCLIP, we conduct qualitative analysis on its unsupervised hierarchy induction during inference, as well as extensive quantitative experiments on both visual recognition and vision-language downstream tasks. |
Shijie Geng; Jianbo Yuan; Yu Tian; Yuxiao Chen; Yongfeng Zhang; |
190 | Sparse MoE with Random Routing As The New Dropout: Training Bigger and Self-Scalable Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a new plug-and-play training framework, SMoE-Dropout, to enable scaling transformers to better accuracy in the full capacity setting without collapse. |
Tianlong Chen; Zhenyu Zhang; AJAY KUMAR JAISWAL; Shiwei Liu; Zhangyang Wang; |
191 | MIMT: Masked Image Modeling Transformer for Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus introduce an entropy model based on a masked image modeling transformer (MIMT) to learn the spatial-temporal dependencies. |
Jinxi Xiang; Kuan Tian; Jun Zhang; |
192 | Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In pursuit of a more general evaluation and unveiling the true potential of sparse algorithms, we introduce the Sparsity May Cry Benchmark (SMC-Bench), a collection of 4 carefully curated, diverse tasks with 12 datasets, which captures a wide range of domain-specific knowledge. |
Shiwei Liu; Tianlong Chen; Zhenyu Zhang; Xuxi Chen; Tianjin Huang; AJAY KUMAR JAISWAL; Zhangyang Wang; |
193 | Interpretable Debiasing of Vectorized Language Representations with Iterative Orthogonalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new mechanism to augment a word vector embedding representation that offers improved bias removal while retaining the key information—resulting in improved interpretability of the representation. |
Prince Osei Aboagye; Yan Zheng; Jack Shunn; Chin-Chia Michael Yeh; Junpeng Wang; Zhongfang Zhuang; Huiyuan Chen; Liang Wang; Wei Zhang; Jeff Phillips; |
194 | The Surprising Effectiveness of Equivariant Models in Domains with Latent Symmetry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper discovers that equivariant models are surprisingly effective in domains with latent or partial symmetries. |
Dian Wang; Jung Yeon Park; Neel Sortur; Lawson L.S. Wong; Robin Walters; Robert Platt; |
195 | Unsupervised Model Selection for Time Series Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper answers the question: given an unlabeled dataset and a set of candidate time series anomaly detectors, how can we select the most accurate model? |
Mononito Goswami; Cristian Ignacio Challu; Laurent Callot; Lenon Minorics; Andrey Kan; |
196 | GoBigger: A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even though this has greatly promoted the development of multi-agent reinforcement learning (MARL), it is still not enough to support further exploration of the behavior of swarm intelligence between multiple teams, and of cooperation between multiple agents, due to their limited scalability. To alleviate this, we introduce GoBigger, a scalable platform for cooperative-competitive multi-agent interactive simulation. |
Ming Zhang; Shenghan Zhang; Zhenjie Yang; Lekai Chen; Jinliang Zheng; Chao Yang; Chuming Li; Hang Zhou; Yazhe Niu; Yu Liu; |
197 | Representation Learning for Low-rank General-sum Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage representation learning and present a model-based and a model-free approach to construct an effective representation from collected data. |
Chengzhuo Ni; Yuda Song; Xuezhou Zhang; Zihan Ding; Chi Jin; Mengdi Wang; |
198 | Exploring Low-Rank Property in Multiple Instance Learning for Whole Slide Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage the properties of the apparent similarity in high-resolution WSIs, which essentially exhibit \textit{low-rank} structures in the data manifold, to develop a novel MIL with a boost in both feature embedding and feature aggregation. |
Jinxi Xiang; Jun Zhang; |
199 | Win: Weight-Decay-Integrated Nesterov Acceleration for Adaptive Gradient Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the problem of “\textit{how to accelerate the convergence of adaptive gradient algorithms in a general manner}”, and aim at providing practical insights to boost the training efficiency. |
Pan Zhou; Xingyu Xie; Shuicheng YAN; |
200 | Efficient Edge Inference By Selective Query Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel end-to-end hybrid learning framework that allows the edge to selectively query only those hard examples that the cloud can classify correctly. |
Anil Kag; Igor Fedorov; Aditya Gangrade; Paul Whatmough; Venkatesh Saligrama; |
201 | Learning Topology-preserving Data Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method for learning topology-preserving data representations (dimensionality reduction). |
Ilya Trofimov; Daniil Cherniavskii; Eduard Tulchinskii; Nikita Balabin; Serguei Barannikov; Evgeny Burnaev; |
202 | Towards Understanding Why Mask Reconstruction Pretraining Helps in Downstream Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve these problems, we first theoretically show that on an auto-encoder with a two/one-layered convolution encoder/decoder, MRP can capture all discriminative semantics of each potential semantic class in the pretraining dataset. Then, considering that the pretraining dataset is of huge size and high diversity and thus covers most semantics in the downstream dataset, in the fine-tuning phase, the pretrained encoder can capture as many semantics as it can in the downstream dataset, and would not lose these semantics, with theoretical guarantees. |
Jiachun Pan; Pan Zhou; Shuicheng YAN; |
203 | Leveraging Incompatibility to Defend Against Backdoor Poisoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify an incompatibility property of the interaction of clean and poisoned data with the training algorithm, specifically that including poisoned data in the training dataset does not improve model accuracy on clean data and vice-versa. Leveraging this property, we develop an algorithm that iteratively refines subsets of the poisoned dataset to obtain subsets that concentrate around either clean or poisoned data. |
Charles Jin; Melinda Sun; Martin Rinard; |
204 | Statistical Guarantees for Consensus Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the statistical performance of aggregation algorithms under a stochastic label perturbation model, and show that a $K$-means type algorithm followed by a local refinement step can achieve near optimal performance, with a rate that decays exponentially in $N$. |
Zhixin Zhou; Gautam Dudeja; Arash A Amini; |
205 | More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose multi-agent conditional policy factorization (MACPF), which takes more centralized training but still enables decentralized execution. |
Jiangxing Wang; Deheng Ye; Zongqing Lu; |
206 | Calibrating Transformers Via Sparse Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Extending Transformers’ success to safety-critical domains requires calibrated uncertainty estimation, which remains under-explored. To address this, we propose Sparse Gaussian Process attention (SGPA), which performs Bayesian inference directly in the output space of multi-head attention blocks (MHAs) in Transformers to calibrate their uncertainty. |
Wenlong Chen; Yingzhen Li; |
207 | Red PANDA: Disambiguating Anomaly Detection By Removing Nuisance Factors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Breaking from previous research, we present a new anomaly detection method that allows operators to exclude an attribute when detecting anomalies. |
Niv Cohen; Jonathan Kahana; Yedid Hoshen; |
208 | STOCHASTIC NO-REGRET LEARNING FOR GENERAL GAMES WITH VARIANCE REDUCTION Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that a stochastic version of optimistic mirror descent (OMD), a variant of mirror descent with recency bias, converges fast in general games. |
Yichi Zhou; Fang Kong; Shuai Li; |
209 | The Dark Side of AutoML: Towards Architectural Backdoor Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper asks the intriguing question: is it possible to exploit neural architecture search (NAS) as a new attack vector to launch previously improbable attacks? Specifically, we present EVAS, a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits such vulnerability using input-aware triggers. |
Ren Pang; Changjiang Li; Zhaohan Xi; Shouling Ji; Ting Wang; |
210 | Alternating Differentiation for Optimization Layers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we developed a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically in the form of convex optimization problems with polyhedral constraints) in a fast and recursive way. |
Haixiang Sun; Ye Shi; Jingya Wang; Hoang Duong Tuan; H. Vincent Poor; Dacheng Tao; |
211 | On The Relative Error of Random Fourier Features for Preserving Kernel Distance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions. |
Kuan Cheng; Shaofeng H.-C. Jiang; Luojian Wei; Zhide Wei; |
212 | Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of a recent hardness result (Liu et al., 2022), we focus on the setting where the opponent’s previous policies are revealed to the agent for decision making. With such an information structure, we propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS), which achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes. |
Wenhao Zhan; Jason D. Lee; Zhuoran Yang; |
213 | PAC Reinforcement Learning for Predictive State Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. |
Wenhao Zhan; Masatoshi Uehara; Wen Sun; Jason D. Lee; |
214 | Make-A-Video: Text-to-Video Generation Without Text-Video Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Make-A-Video — an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). |
Uriel Singer; Adam Polyak; Thomas Hayes; Xi Yin; Jie An; Songyang Zhang; Qiyuan Hu; Harry Yang; Oron Ashual; Oran Gafni; Devi Parikh; Sonal Gupta; Yaniv Taigman; |
215 | Static Prediction of Runtime Errors By Learning to Execute Programs with External Resource Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce a competitive programming dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. We approach this task by developing an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and learns to execute descriptions of external resources. |
David Bieber; Rishab Goel; Dan Zheng; Hugo Larochelle; Daniel Tarlow; |
216 | MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current detection of cache timing attacks relies heavily on heuristics and expert knowledge, which can lead to brittleness and inability to adapt to new attacks. To mitigate these problems, we develop a two-player environment for cache-timing attacks and detection, and leverage the idea of population-based multi-agent reinforcement learning (MARL) to train both attackers and detectors. |
Jiaxun Cui; Xiaomeng Yang; Geunbae Lee; Mulong Luo; Peter Stone; Hsien-Hsin S. Lee; Benjamin Lee; G. Edward Suh; Wenjie Xiong; Yuandong Tian; |
217 | Quantized Compressed Sensing with Score-Based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the power of score-based generative models (SGM, also known as diffusion models) in capturing the rich structure of natural signals beyond simple sparsity, we propose an unsupervised data-driven approach called quantized compressed sensing with SGM (QCS-SGM), where the prior distribution is modeled by a pre-trained SGM. |
Xiangming Meng; Yoshiyuki Kabashima; |
218 | SGDA with Shuffling: Faster Convergence for Nonconvex-PŁ Minimax Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. |
Hanseul Cho; Chulhee Yun; |
219 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention. |
Chenglin Yang; Siyuan Qiao; Qihang Yu; Xiaoding Yuan; Yukun Zhu; Alan Yuille; Hartwig Adam; Liang-Chieh Chen; |
220 | View Synthesis with Sculpted Neural Points Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new approach that performs view synthesis using point clouds. |
Yiming Zuo; Jia Deng; |
221 | Extremely Simple Activation Shaping for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, in this work, we propose an extremely simple, post-hoc, on-the-fly activation shaping method, ASH, where a large portion (e.g. 90%) of a sample’s activation at a late layer is removed, and the rest (e.g. 10%) simplified or lightly adjusted. |
Andrija Djurisic; Nebojsa Bozanic; Arjun Ashok; Rosanne Liu; |
222 | Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. |
Mhairi Dunion; Trevor McInroe; Kevin Sebastian Luck; Josiah P. Hanna; Stefano V Albrecht; |
223 | Dr.Spider: A Diagnostic Evaluation Benchmark Towards Text-to-SQL Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose the model robustness. |
Shuaichen Chang; Jun Wang; Mingwen Dong; Lin Pan; Henghui Zhu; Alexander Hanbo Li; Wuwei Lan; Sheng Zhang; Jiarong Jiang; Joseph Lilien; Steve Ash; William Yang Wang; Zhiguo Wang; Vittorio Castelli; Patrick Ng; Bing Xiang; |
224 | HiT-MDP: Learning The SMDP Option Framework on MDPs with Hidden Temporal Variables Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel Markov Decision Process (MDP), the Hidden Temporal MDP (HiT-MDP), and prove that the option-induced HiT-MDP is homomorphic equivalent to the option-induced SMDP. |
Chang Li; Dongjin Song; Dacheng Tao; |
225 | Expressive Monotonic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence in any subset of the inputs. |
Niklas Nolte; Ouail Kitouni; Mike Williams; |
226 | Information-Theoretic Analysis of Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While our bounds for the first kind of error are in line with the traditional analysis and give similar insights, our bounds on the second kind of error are algorithm-dependent, which also provide insights into algorithm designs. Specifically, we present two simple techniques for improving generalization in UDA and validate them experimentally. |
Ziqiao Wang; Yongyi Mao; |
227 | Provably Efficient Lifelong Reinforcement Learning with Linear Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We frame the problem as a linearly parameterized contextual Markov decision process (MDP), where each task is specified by a context and the transition dynamics is context-independent, and we introduce a new completeness-style assumption on the representation which is sufficient to ensure the optimal multi-task policy is realizable under the linear representation. Under this assumption, we propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks while using only sublinear planning calls. |
Sanae Amani; Lin Yang; Ching-An Cheng; |
228 | Valid P-Value for Deep Learning-driven Salient Region Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a method to quantify the reliability of a saliency region in the form of p-values. |
Miwa Daiki; Vo Nguyen Le Duy; Ichiro Takeuchi; |
229 | A Theoretical Understanding of Vision Transformers: Learning, Generalization, and Sample Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a three-layer ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task. |
Hongkang Li; Meng Wang; Sijia Liu; Pin-Yu Chen; |
230 | Disentanglement of Correlated Factors Via Hausdorff Factorized Support Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a method that allows for disentangled representation learning not only under the assumption of independent factors of variation but instead fundamentally allows for much more realistic correlations during training. |
Karsten Roth; Mark Ibrahim; Zeynep Akata; Pascal Vincent; Diane Bouchacourt; |
231 | SCALE-UP: An Efficient Black-box Input-level Backdoor Detection Via Analyzing Scaled Prediction Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, almost all of them cannot be adopted in MLaaS scenarios since they require getting access to or even modifying the suspicious models. In this paper, we propose a simple yet effective black-box input-level backdoor detection, called SCALE-UP, which requires only the predicted labels to alleviate this problem. |
Junfeng Guo; Yiming Li; Xun Chen; Hanqing Guo; Lichao Sun; Cong Liu; |
232 | Pink Noise Is All You Need: Colored Noise Exploration in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we perform a comprehensive experimental evaluation on MPO and SAC to explore the effectiveness of other colors of noise as action noise. |
Onno Eberhard; Jakob Hollenstein; Cristina Pinneri; Georg Martius; |
233 | Revisiting The Assumption of Latent Separability for Backdoor Defenses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This question is central to understanding whether the assumption of latent separability provides a reliable foundation for defending against backdoor poisoning attacks. In this paper, we design adaptive backdoor poisoning attacks to present counter-examples against this assumption. |
Xiangyu Qi; Tinghao Xie; Yiming Li; Saeed Mahloujifar; Prateek Mittal; |
234 | Optimal Transport for Offline Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Optimal Transport Relabeling (OTR), an imitation learning algorithm that can automatically relabel offline data of mixed and unknown quality with rewards from a few good demonstrations. |
Yicheng Luo; zhengyao jiang; Samuel Cohen; Edward Grefenstette; Marc Peter Deisenroth; |
235 | Mitigating Dataset Bias By Using Per-Sample Gradient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a debiasing algorithm leveraging gradient called PGD (Per-sample Gradient-based Debiasing). |
Sumyeong Ahn; Seongyoon Kim; Se-Young Yun; |
236 | Efficient Model Updates for Approximate Unlearning of Graph-Structured Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces the first known approach for \emph{approximate graph unlearning} with provable theoretical guarantees. |
Eli Chien; Chao Pan; Olgica Milenkovic; |
237 | MAST: Masked Augmentation Subspace Training for Generalizable Self-Supervised Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to learn self-supervised features that generalize well across a variety of downstream tasks (e.g., object classification, detection and instance segmentation) without knowing any task information beforehand. |
Chen Huang; Hanlin Goh; Jiatao Gu; Joshua M. Susskind; |
238 | SmartFRZ: An Efficient Training Framework Using Attention-Based Layer Freezing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a generic and efficient training framework (SmartFRZ). |
Sheng Li; Geng Yuan; Yue Dai; Youtao Zhang; Yanzhi Wang; Xulong Tang; |
239 | Sparse Random Networks for Communication-Efficient Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an FL framework, where clients find a sparse random network using a stochastic strategy; and provide (1) lower communication cost, (2) higher accuracy, (3) faster convergence, and (4) at the end of the training, a compressed final model. |
Berivan Isik; Francesco Pase; Deniz Gunduz; Tsachy Weissman; Zorzi Michele; |
240 | PV3D: A 3D Generative Model for Portrait Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos. |
Eric Zhongcong Xu; Jianfeng Zhang; Jun Hao Liew; Wenqing Zhang; Song Bai; Jiashi Feng; Mike Zheng Shou; |
241 | S-NeRF: Neural Radiance Fields for Street Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new street-view NeRF (S-NeRF) that considers novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly. |
Ziyang Xie; Junge Zhang; Wenye Li; Feihu Zhang; Li Zhang; |
242 | The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To gain insight into the relationship between learned features, function approximation, and the learning rule, we analyze infinite-width deep networks trained with gradient descent (GD) and biologically-plausible alternatives including feedback alignment (FA), direct feedback alignment (DFA), and error modulated Hebbian learning (Hebb), as well as gated linear networks (GLN). |
Blake Bordelon; Cengiz Pehlevan; |
243 | On The Data-Efficiency with Contrastive Image Transformation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the end, we propose a contrastive invariant transformation (CoIT), a simple yet promising learnable data augmentation combined with standard model-free algorithms to improve sample-efficiency. |
Sicong Liu; Xi Sheryl Zhang; Yushuo Li; Yifan Zhang; Jian Cheng; |
244 | Dataless Knowledge Fusion By Merging Weights of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that performs well both across all data set domains and can generalize on out-of-domain data. |
Xisen Jin; Pengxiang Cheng; Daniel Preotiuc-Pietro; Xiang Ren; |
245 | Long Range Language Modeling Via Gated State Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we focus on autoregressive sequence modeling over English books, Github source code and ArXiv mathematics articles. |
Harsh Mehta; Ankit Gupta; Ashok Cutkosky; Behnam Neyshabur; |
246 | Generalization Bounds for Federated Learning: Fast Rates, Unparticipating Clients and Unbounded Losses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides a theoretical analysis of generalization error of {federated learning}, which captures both heterogeneity and relatedness of the distributions. |
Xiaolin Hu; Shaojie Li; Yong Liu; |
247 | More ConvNets in The 2020s: Scaling Up Kernels Beyond 51×51 Using Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the possibility of training extreme convolutions larger than 31×31 and test whether the performance gap can be eliminated by strategically enlarging convolutions. |
Shiwei Liu; Tianlong Chen; Xiaohan Chen; Xuxi Chen; Qiao Xiao; Boqian Wu; Tommi Kärkkäinen; Mykola Pechenizkiy; Decebal Constantin Mocanu; Zhangyang Wang; |
248 | Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-Free RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It remains unclear how such safe exploration requirement would affect the corresponding sample complexity in order to achieve the desired optimality of the obtained policy in planning. In this work, we make a first attempt to answer this question. |
Ruiquan Huang; Jing Yang; Yingbin Liang; |
249 | CUTS: Neural Causal Discovery from Unstructured Time-Series Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods assume structured input data and degenerate greatly when encountering data with randomly missing entries or non-uniform sampling frequencies, which hampers their applications in real scenarios. To address this issue, here we present CUTS, a neural Granger causal discovery algorithm to jointly impute unobserved data points and build causal graphs, via plugging in two mutually boosting modules in an iterative framework: (i) Latent data prediction stage: designs a Delayed Supervision Graph Neural Network (DSGNN) to hallucinate and register unstructured data which might be of high dimension and with complex distribution; (ii) Causal graph fitting stage: builds a causal adjacency matrix with imputed data under sparse penalty. |
Cheng Yuxiao; Runzhao Yang; Tingxiong Xiao; Zongren Li; Jinli Suo; Kunlun He; Qionghai Dai; |
250 | A Kernel Perspective of Skip Connections in Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we study their properties through their Gaussian Process and Neural Tangent kernels. |
Daniel Barzilai; Amnon Geifman; Meirav Galun; Ronen Basri; |
251 | RLx2: Training A Sparse Deep Reinforcement Learning Model from Scratch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel sparse DRL training framework, “the Rigged Reinforcement Learning Lottery” (RLx2), which builds upon gradient-based topology evolution and is capable of training a sparse DRL model based entirely on a sparse network. |
Yiqin Tan; Pihe Hu; Ling Pan; Jiatai Huang; Longbo Huang; |
252 | NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a unified framework of synthesizing and manipulating voice signals from analysis features, dubbed NANSY++. |
Hyeong-Seok Choi; Jinhyeok Yang; Juheon Lee; Hyeongju Kim; |
253 | Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for End-to-End Visual Robotic Manipulation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present SE(3)-equivariant models for visual robotic manipulation from point clouds that can be trained fully end-to-end. |
Hyunwoo Ryu; Hong-in Lee; Jeong-Hoon Lee; Jongeun Choi; |
254 | Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent work has achieved remarkable zero-shot performance with multi-task prompted pretraining, but little has been understood. For the first time, we show that training on a small number of key tasks beats using all the training tasks, while removing these key tasks substantially hurts performance. |
Jing Zhou; Zongyu Lin; Yanan Zheng; Jian Li; Zhilin Yang; |
255 | Characterizing The Spectrum of The NTK Via A Power Series Expansion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Under mild conditions on the network initialization we derive a power series expansion for the Neural Tangent Kernel (NTK) of arbitrarily deep feedforward networks in the infinite width limit. |
Michael Murray; Hui Jin; Benjamin Bowman; Guido Montufar; |
256 | Corrupted Image Modeling for Self-Supervised Visual Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training. |
Yuxin Fang; Li Dong; Hangbo Bao; Xinggang Wang; Furu Wei; |
257 | Compositional Task Representations for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel prompt-free approach, Compositional Task Representations (CTR), that employs multi-task training to learn a discrete, compositional codebook. |
NAN SHAO; Zefan Cai; Hanwei xu; Chonghua Liao; Yanan Zheng; Zhilin Yang; |
258 | Equivariant Hypergraph Diffusion Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by hypergraph diffusion algorithms, this work proposes a new HNN architecture named ED-HNN, which provably approximates any continuous equivariant hypergraph diffusion operators that can model a wide range of higher-order relations. |
Peihao Wang; Shenghao Yang; Yunyu Liu; Zhangyang Wang; Pan Li; |
259 | TextShield: Beyond Successfully Detecting Adversarial Sentences in NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the core limitation of previous detection methods is being incapable of giving correct predictions on adversarial sentences, unlike defense methods from other paradigms. To solve this issue, this paper proposes TextShield: (1) we discover a link between text attack and saliency information, and then we propose a saliency-based detector, which can effectively detect whether an input sentence is adversarial or not; (2) we design a saliency-based corrector, which converts the detected adversarial sentences to benign ones. |
Lingfeng Shen; Ze Zhang; Haiyun Jiang; Ying Chen; |
260 | REVISITING PRUNING AT INITIALIZATION THROUGH THE LENS OF RAMANUJAN GRAPH Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better understand the underlying mechanism of PaI, we propose to interpret it through the lens of the Ramanujan Graph – a class of expander graphs that are sparse while being highly connected. |
Duc N.M Hoang; Shiwei Liu; Radu Marculescu; Zhangyang Wang; |
261 | Discovering Latent Knowledge in Language Models Without Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. |
Collin Burns; Haotian Ye; Dan Klein; Jacob Steinhardt; |
262 | How to Exploit Hyperspherical Embeddings for Out-of-Distribution Detection? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose CIDER, a novel representation learning framework that exploits hyperspherical embeddings for OOD detection. |
Yifei Ming; Yiyou Sun; Ousmane Dia; Yixuan Li; |
263 | Automatic Chain of Thought Prompting in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the effect of such mistakes, we investigate various principles for automatically constructing demonstrations and find that diversity matters. Inspired by these findings, we propose an automatic CoT prompting method called Auto-CoT. |
Zhuosheng Zhang; Aston Zhang; Mu Li; Alex Smola; |
264 | Parameter-Efficient Fine-Tuning Design Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we present a parameter-efficient fine-tuning design paradigm and discover design patterns that are applicable to different experimental settings. |
Jiaao Chen; Aston Zhang; Xingjian Shi; Mu Li; Alex Smola; Diyi Yang; |
265 | Learning Multimodal Data Augmentation in Feature Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. |
Zichang Liu; Zhiqiang Tang; Xingjian Shi; Aston Zhang; Mu Li; Anshumali Shrivastava; Andrew Gordon Wilson; |
266 | AIM: Adapting Image Models for Efficient Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to Adapt pre-trained Image Models (AIM) for efficient video understanding. |
Taojiannan Yang; Yi Zhu; Yusheng Xie; Aston Zhang; Chen Chen; Mu Li; |
267 | Factorized Fourier Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Factorized Fourier Neural Operator (F-FNO), a learning-based approach for simulating partial differential equations (PDEs). |
Alasdair Tran; Alexander Mathews; Lexing Xie; Cheng Soon Ong; |
268 | FaiREE: Fair Classification with Finite-sample and Distribution-free Guarantee Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FaiREE, a fair classification algorithm which can satisfy group fairness constraints with finite-sample and distribution-free theoretical guarantees. |
Puheng Li; James Zou; Linjun Zhang; |
269 | Exponential Generalization Bounds with Near-Optimal Rates for $L_q$-Stable Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, a natural question we would like to address in this paper is whether it is possible to derive near-optimal exponential generalization bounds for $L_q$-stable learning algorithms. As the core contribution of the present work, we give an affirmative answer to this question by developing strict analogues of the near-optimal generalization and risk bounds of uniformly stable algorithms for $L_q$-stable algorithms. |
Xiaotong Yuan; Ping Li; |
270 | Equal Improvability: A New Fairness Notion Considering The Long-term Impact Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To find a classifier that satisfies the EI requirement, we propose and study three different approaches that solve EI regularized optimization problems. |
Ozgur Guldogan; Yuchen Zeng; Jy-yong Sohn; Ramtin Pedarsani; Kangwook Lee; |
271 | Riemannian Metric Learning Via Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an optimal transport-based model for learning a metric tensor from cross-sectional samples of evolving probability measures on a common Riemannian manifold. |
Christopher Scarvelis; Justin Solomon; |
272 | MaskViT: Masked Visual Pre-Training for Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling. |
Agrim Gupta; Stephen Tian; Yunzhi Zhang; Jiajun Wu; Roberto Martín-Martín; Li Fei-Fei; |
273 | Prompting GPT-3 To Be Reliable Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We establish simple and effective prompting methods to make GPT-3 reliable in terms of robustness, fairness, calibration, and factuality. |
Chenglei Si; Zhe Gan; Zhengyuan Yang; Shuohang Wang; Jianfeng Wang; Jordan Lee Boyd-Graber; Lijuan Wang; |
274 | Teacher Guided Training: An Efficient Framework for Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data. |
Manzil Zaheer; Ankit Singh Rawat; Seungyeon Kim; Chong You; Himanshu Jain; Andreas Veit; Rob Fergus; Sanjiv Kumar; |
275 | Sparsity-Constrained Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose in this paper a new approach for OT with explicit cardinality constraints on the transportation plan. |
Tianlin Liu; Joan Puigcerver; Mathieu Blondel; |
276 | Turning The Curse of Heterogeneity in Federated Learning Into A Blessing for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, a notorious challenge in FL is data heterogeneity, where each client collects data that are not independently and identically distributed (non-iid). We propose to take advantage of such heterogeneity and turn the curse into a blessing that facilitates OoD detection in FL. |
Shuyang Yu; Junyuan Hong; Haotao Wang; Zhangyang Wang; Jiayu Zhou; |
277 | Unbiased Stochastic Proximal Solver for Graph Neural Networks with Equilibrium States Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such weakness limits the scalability of implicit graph models. To tackle these limitations, we propose two unbiased stochastic proximal solvers, USP and USP-VR, inspired by the stochastic proximal gradient descent method and its variance-reduction variant. |
Mingjie Li; Yifei Wang; Yisen Wang; Zhouchen Lin; |
278 | Asynchronous Distributed Bilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a remedy, we propose Asynchronous Distributed Bilevel Optimization (ADBO) algorithm. |
Yang Jiao; Kai Yang; Tiancheng Wu; Dongjin Song; Chengtao Jian; |
279 | Relative Behavioral Attributes: Filling The Gap Between Symbolic Goal Specification and Reward Learning from Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two different parametric methods that can potentially encode any kind of behavioral attributes from ordered behavior clips. |
Lin Guan; Karthik Valmeekam; Subbarao Kambhampati; |
280 | Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One of the main difficulties in analyzing population dynamics is that we can only obtain observation data at coarse time intervals from fixed-point observations, due to experimental costs or measurement constraints. To satisfy these requirements on the sample trajectories, we formulate the Lagrangian Schrödinger bridge (LSB) problem and propose to solve it approximately by modeling the advection-diffusion process with a regularized neural SDE. |
Takeshi Koshizuka; Issei Sato; |
281 | How to Prepare Your Task Head for Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze how the choice of task head controls feature adaptation and hence influences the downstream performance. |
Yi Ren; Shangmin Guo; Wonho Bae; Danica J. Sutherland; |
282 | Learning Fast and Slow for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, inspired by the Complementary Learning Systems (CLS) theory, we propose Fast and Slow learning Network (FSNet) as a novel framework to address the challenges of online forecasting. |
Quang Pham; Chenghao Liu; Doyen Sahoo; Steven Hoi; |
283 | Gradient-based Optimization Is Not Necessary for Generalization in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is commonly believed that the implicit regularization of optimizers is needed for neural networks to generalize in the overparameterized regime. In this paper, we observe experimentally that this implicit regularization behavior is generic, i.e. it does not depend strongly on the choice of optimizer. |
Ping-yeh Chiang; Renkun Ni; David Yu Miller; Arpit Bansal; Jonas Geiping; Micah Goldblum; Tom Goldstein; |
284 | Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a structural equation model, called Rhino, which combines vector auto-regression, deep learning and variational inference to model non-linear relationships with instantaneous effects while allowing the noise distribution to be modulated by history observations. |
Wenbo Gong; Joel Jennings; Cheng Zhang; Nick Pawlowski; |
285 | Mitigating Memorization of Noisy Labels Via Regularization Between Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is non-trivial to design a neural network with the best capacity given an arbitrary task. To circumvent this dilemma, instead of changing the model architecture, we decouple DNNs into an encoder followed by a linear classifier and propose to restrict the function space of a DNN by a representation regularizer. |
Hao Cheng; Zhaowei Zhu; Xing Sun; Yang Liu; |
286 | Backpropagation Through Combinatorial Algorithms: Identity with Projection Works Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a principled approach to exploit the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass and further provide a theoretical justification. |
Subham Sekhar Sahoo; Anselm Paulus; Marin Vlastelica; Vít Musil; Volodymyr Kuleshov; Georg Martius; |
287 | BSTT: A Bayesian Spatial-Temporal Transformer for Sleep Staging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Bayesian spatial-temporal relation inference neural network, named Bayesian spatial-temporal transformer (BSTT), for sleep staging. |
Yuchen Liu; Ziyu Jia; |
288 | Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel noise-robust re-weighting framework SunGen to automatically construct high-quality data for zero-shot classification problems. |
Jiahui Gao; Renjie Pi; LIN Yong; Hang Xu; Jiacheng Ye; Zhiyong Wu; WEIZHONG ZHANG; Xiaodan Liang; Zhenguo Li; Lingpeng Kong; |
289 | H2RBox: Horizontal Box Annotation Is All You Need for Oriented Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Oriented object detection emerges in many applications, from aerial images to autonomous driving, yet many existing detection benchmarks are annotated only with horizontal bounding boxes, which are also less costly than fine-grained rotated boxes, leading to a gap between the readily available training corpus and the rising demand for oriented object detection. This paper proposes a simple yet effective oriented object detection approach called H2RBox that merely uses horizontal box annotations for weakly-supervised training, which closes the above gap and shows competitive performance even against detectors trained with rotated boxes. |
Xue Yang; Gefan Zhang; Wentong Li; Yue Zhou; Xuehui Wang; Junchi Yan; |
290 | IDEAL: Query-Efficient Data-Free Learning from Black-Box Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, these works require a large number of queries to the teacher model, which incurs significant monetary and computational costs. To address these problems, we propose a novel method called query-effIcient Data-free lEarning blAck-box modeLs (IDEAL), which aims to query-efficiently learn from black-box model APIs to train a good student without any real data. |
Jie Zhang; Chen Chen; Lingjuan Lyu; |
291 | Scaling Laws in Mean-Field Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we attempt to bridge the two largely independently evolving fields of finite-agent and infinite-agent games, by studying the scaling laws in mean-field games. |
Pengdeng Li; Xinrun Wang; Shuxin Li; Hau Chan; Bo An; |
292 | Towards Addressing Label Skews in One-Shot Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the limited number of classes in each party, the local models misclassify the data from unseen classes into seen classes, which leads to very ineffective global models from voting. To address the label skew issue in one-shot FL, we propose a novel approach named FedOV which generates diverse outliers and introduces them as an additional unknown class in local training to improve the voting performance. |
Yiqun Diao; Qinbin Li; Bingsheng He; |
293 | Sequential Attention for Feature Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. |
Taisuke Yasuda; Mohammadhossein Bateni; Lin Chen; Matthew Fahrbach; Gang Fu; Vahab Mirrokni; |
294 | Deep Transformers Without Shortcuts: Modifying Self-attention for Faithful Signal Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: And so the question remains: is it possible to train deep vanilla transformers? We answer this question in the affirmative by designing several approaches that use combinations of parameter initialisations, bias matrices and location-dependent rescaling to achieve faithful signal propagation in vanilla transformers. |
Bobby He; James Martens; Guodong Zhang; Aleksandar Botev; Andrew Brock; Samuel L Smith; Yee Whye Teh; |
295 | Approximation and Non-parametric Estimation of Functions Over High-dimensional Spheres Via Deep ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a new approximation and estimation analysis of deep feed-forward neural networks (FNNs) with the Rectified Linear Unit (ReLU) activation. |
Namjoon Suh; Tian-Yi Zhou; Xiaoming Huo; |
296 | Specformer: Spectral Graph Neural Networks Meet Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, these filters are often constructed based on some fixed-order polynomials, which have limited expressiveness and flexibility. To tackle these issues, we introduce Specformer, which effectively encodes the set of all eigenvalues and performs self-attention in the spectral domain, leading to a learnable set-to-set spectral filter. |
Deyu Bo; Chuan Shi; Lele Wang; Renjie Liao; |
297 | MLPInit: Embarrassingly Simple GNN Training Acceleration with MLP Initialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an embarrassingly simple, yet hugely effective initialization method for GNN training acceleration, called MLPInit. |
Xiaotian Han; Tong Zhao; Yozen Liu; Xia Hu; Neil Shah; |
298 | Empowering Graph Representation Learning with Test-Time Graph Transformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent efforts have been made on tackling these issues from a modeling perspective which requires additional cost of changing model architectures or re-training model parameters. In this work, we provide a data-centric view to tackle these issues and propose a graph transformation framework named GTrans which adapts and refines graph data at test time to achieve better performance. |
Wei Jin; Tong Zhao; Jiayuan Ding; Yozen Liu; Jiliang Tang; Neil Shah; |
299 | Improving The Calibration of Fine-tuned Language Models Via Denoising Variational Auto-Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the problem of calibrating fine-tuned language models. |
Guande He; Jianfei Chen; Jun Zhu; |
300 | Softened Symbol Grounding for Neuro-symbolic Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel, softened symbol grounding process, enabling the interactions of the two worlds in a mutually beneficial manner. |
Zenan Li; Yuan Yao; Taolue Chen; Jingwei Xu; Chun Cao; Xiaoxing Ma; Jian Lü; |
301 | Multi-task Self-supervised Graph Neural Networks Enable Stronger Task Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When applied to various downstream tasks, these frameworks rarely perform equally well for every task, because one philosophy may not span the extensive knowledge required for all tasks. In light of this, we introduce ParetoGNN, a multi-task SSL framework for node representation learning over graphs. |
Mingxuan Ju; Tong Zhao; Qianlong Wen; Wenhao Yu; Neil Shah; Yanfang Ye; Chuxu Zhang; |
302 | Learning with Logical Constraints But Without Shortcut Satisfaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new framework for learning with logical constraints. |
Zenan Li; Zehua Liu; Yuan Yao; Jingwei Xu; Taolue Chen; Xiaoxing Ma; Jian Lü; |
303 | Link Prediction with Non-Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extensively evaluate the performance of existing non-contrastive methods for link prediction in both transductive and inductive settings. |
William Shiao; Zhichun Guo; Tong Zhao; Evangelos E. Papalexakis; Yozen Liu; Neil Shah; |
304 | A Neural Mean Embedding Approach for Back-door and Front-door Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment. |
Liyuan Xu; Arthur Gretton; |
305 | Can We Find Nash Equilibria at A Linear Rate in Markov Games? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study decentralized learning in two-player zero-sum discounted Markov games where the goal is to design a policy optimization algorithm for either agent satisfying two properties. |
Zhuoqing Song; Jason D. Lee; Zhuoran Yang; |
306 | Weighted Ensemble Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. |
Yangjun Ruan; Saurabh Singh; Warren Richard Morningstar; Alexander A Alemi; Sergey Ioffe; Ian Fischer; Joshua V. Dillon; |
307 | $k$NN Prompting: Learning Beyond The Context with Nearest Neighbor Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In-Context Learning, which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing and standard utilization of large language models. In this paper, we disclose an actual predicament of this typical usage: it cannot scale up with training data due to context length restrictions. |
Benfeng Xu; Quan Wang; Zhendong Mao; Yajuan Lyu; Qiaoqiao She; Yongdong Zhang; |
308 | Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables x, y and z. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. |
Chi Han; Qizheng He; Charles Yu; Xinya Du; Hanghang Tong; Heng Ji; |
309 | Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing DABP approaches mostly rely on knowledge distillation (KD) from the black-box predictor, i.e., training the model with its noisy target-domain predictions, which, however, inevitably introduces confirmation bias accumulated from the prediction noise and degrades performance. To mitigate such bias, we propose a new strategy, divide-to-adapt, that purifies cross-domain knowledge distillation through proper domain division. |
Jianfei Yang; Xiangyu Peng; Kai Wang; Zheng Zhu; Jiashi Feng; Lihua Xie; Yang You; |
310 | A Simple Yet Powerful Deep Active Learning With Snapshots Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we highlight the effectiveness of snapshot ensembles for deep active learning. |
Seohyeon Jung; Sanghyun Kim; Juho Lee; |
311 | Domain-Indexing Variational Bayes for Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such domain indices are not always available. To address this challenge, we first provide a formal definition of domain index from the probabilistic perspective, and then propose an adversarial variational Bayesian framework that infers domain indices from multi-domain data, thereby providing additional insight on domain relations and improving domain adaptation performance. |
Zihao Xu; Hao He; Guang-Yuan Hao; Hao Wang; |
312 | Over-parameterized Model Optimization with Polyak-Łojasiewicz Condition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a new regularized risk minimization for over-parameterized models with a novel PL regularization and implements it via network pruning guided by a PL-based condition number. |
Yixuan Chen; Yubin Shi; Mingzhi Dong; Xiaochen Yang; Dongsheng Li; Yujiang Wang; Robert Dick; Qin Lv; Yingying Zhao; Fan Yang; Ning Gu; Li Shang; |
313 | RGI: Robust GAN-inversion for Mask-free Image Inpainting and Unsupervised Pixel-wise Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown gross corruptions, where a small fraction of pixels are completely corrupted. |
Shancong Mou; Xiaoyi Gu; Meng Cao; Haoping Bai; Ping Huang; Jiulong Shan; Jianjun Shi; |
314 | Encoding Recurrence Into Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper breaks down an RNN layer, with negligible loss, into a sequence of simple RNNs, each of which can be further rewritten into a lightweight positional encoding … |
Feiqing Huang; Kexin Lu; Yuxi CAI; Zhen Qin; Yanwen Fang; Guangjian Tian; Guodong Li; |
315 | GAIN: On The Generalization of Instructional Action Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a benchmark, named GAIN, to analyze the GeneralizAbility of INstructional action understanding models. |
Junlong Li; Guangyi Chen; Yansong Tang; Jinan Bao; Kun Zhang; Jie Zhou; Jiwen Lu; |
316 | Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method to measure uncertainty in large language models. |
Lorenz Kuhn; Yarin Gal; Sebastian Farquhar; |
317 | Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in reality, observation coexistence is limited due to the high cost of acquiring expert observations. In this work, we study this challenging problem of limited observation coexistence under heterogeneous observations: Heterogeneously Observable Imitation Learning (HOIL). |
Xin-Qiang Cai; Yao-Xiang Ding; Zixuan Chen; Yuan Jiang; Masashi Sugiyama; Zhi-Hua Zhou; |
318 | Pre-training Via Denoising for Molecular Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. |
Sheheryar Zaidi; Michael Schaarschmidt; James Martens; Hyunjik Kim; Yee Whye Teh; Alvaro Sanchez-Gonzalez; Peter Battaglia; Razvan Pascanu; Jonathan Godwin; |
319 | A New Characterization of The Edge of Stability Based on A Sharpness Measure Aware of Batch Gradient Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new sharpness measure (interaction-aware-sharpness) aware of the interaction between the batch gradient distribution and the loss landscape geometry. |
Sungyoon Lee; Cheongjae Jang; |
320 | Equivariant Energy-Guided SDE for Inverse Molecular Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose equivariant energy-guided stochastic differential equations (EEGSDE), a flexible framework for controllable 3D molecule generation under the guidance of an energy function in diffusion models. |
Fan Bao; Min Zhao; Zhongkai Hao; Peiyao Li; Chongxuan Li; Jun Zhu; |
321 | Mutual Partial Label Learning with Competitive Label Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a more realistic PLL scenario with competitive noise labels that are more difficult to distinguish from the true label than the random noise labels. |
Yan Yan; Yuhong Guo; |
322 | ImaginaryNet: Learning Object Detectors Without Real Images and Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we define a novel paradigm, Imaginary-Supervised Object Detection (ISOD), where no real images or manual annotations are used for training object detectors. To resolve this challenge, we propose ImaginaryNet, a framework that learns object detectors by combining pretrained language models with text-to-image synthesis models. |
Minheng Ni; Zitong Huang; Kailai Feng; Wangmeng Zuo; |
323 | Delving Into Semantic Scale Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we define and quantify the semantic scale of classes, which is equivalent to the feature diversity of classes. |
Yanbiao Ma; Licheng Jiao; Fang Liu; Yuxin Li; Shuyuan Yang; Xu Liu; |
324 | Agnostic Learning of General ReLU Activation Using Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. |
Pranjal Awasthi; Alex Tang; Aravindan Vijayaraghavan; |
325 | Neural-based Classification Rule Learning for Sequential Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel differentiable fully interpretable method to discover both local and global patterns (i.e. catching a relative or absolute temporal dependency) for rule-based binary classification. |
Marine Collery; Philippe Bonnard; François Fages; Remy Kusters; |
326 | Max-Margin Works While Large Margin Fails: Generalization Without Uniform Convergence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails. Our main contribution is proving novel generalization bounds in two such settings, one linear, and one non-linear. |
Margalit Glasgow; Colin Wei; Mary Wootters; Tengyu Ma; |
327 | Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. |
Michał Zawalski; Michał Tyrolski; Konrad Czechowski; Damian Stachura; Piotr Piękos; Tomasz Odrzygóźdź; Yuhuai Wu; Łukasz Kuciński; Piotr Miłoś; |
328 | Exploring The Limits of Differentially Private Deep Learning with Group-wise Clipping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of group-wise clipping. |
Jiyan He; Xuechen Li; Da Yu; Huishuai Zhang; Janardhan Kulkarni; Yin Tat Lee; Arturs Backurs; Nenghai Yu; Jiang Bian; |
329 | Towards Minimax Optimal Reward-free Reinforcement Learning in Linear MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel algorithm LSVI-RFE under the linear MDP setting, where the transition probability and reward functions are linear in a feature mapping. |
Pihe Hu; Yu Chen; Longbo Huang; |
330 | Localized Randomized Smoothing for Collective Robustness Certification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a more general collective robustness certificate for all types of models and further show that this approach is beneficial for the larger class of softly local models, where each output is dependent on the entire input but assigns different levels of importance to different input regions (e.g. based on their proximity in the image). |
Jan Schuchardt; Tom Wollschläger; Aleksandar Bojchevski; Stephan Günnemann; |
331 | Towards Open Temporal Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general and principled learning approach for open temporal graphs, called OTGNet, with the goal of addressing the above two challenges. |
Kaituo Feng; Changsheng Li; Xiaolu Zhang; JUN ZHOU; |
332 | Efficiently Controlling Multiple Risks with Pareto Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on recent results in distribution-free, finite-sample risk control for general losses, we propose Pareto Testing: a two-stage process which combines multi-objective optimization with multiple hypothesis testing. |
Bracha Laufer-Goldshtein; Adam Fisch; Regina Barzilay; Tommi S. Jaakkola; |
333 | Bridge The Inference Gaps of Neural Processes Via Expectation Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The topic of inference suboptimality and an analysis of the NP from the optimization objective perspective have hardly been studied in earlier work. To fix this issue, we propose a surrogate objective of the target log-likelihood of the meta dataset within the expectation maximization framework. |
Qi Wang; Marco Federici; Herke van Hoof; |
334 | Discovering Policies with DOMiNO Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose a Reinforcement Learning (RL) agent that can discover complex behaviours in a rich environment with a simple reward function. |
Tom Zahavy; Yannick Schroecker; Feryal Behbahani; Kate Baumli; Sebastian Flennerhag; Shaobo Hou; Satinder Singh; |
335 | Neural Architecture Design and Robustness: A Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, the aim of this paper is to facilitate better streamlined research on architectural design choices with respect to their impact on robustness as well as, for example, the evaluation of surrogate measures for robustness. We evaluate all these networks on a range of common adversarial attacks and corruption types and introduce a database on neural architecture design and robustness evaluations. |
Steffen Jung; Jovita Lukasik; Margret Keuper; |
336 | A Unified Framework of Soft Threshold Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reformulate soft threshold pruning as an implicit optimization problem solved using the *Iterative Shrinkage-Thresholding* Algorithm (ISTA), a classic method from the fields of sparse recovery and compressed sensing. |
Yanqi Chen; Zhaofei Yu; Wei Fang; Zhengyu Ma; Xiawu Zheng; Yonghong Tian; |
337 | Improving Out-of-distribution Generalization with Indirection Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a generic module named Indirection Layer (InLay), which leverages indirection and data internal relationships to effectively construct symbolic indirect representations to improve out-of-distribution generalization capabilities of various neural architectures. |
Kha Pham; Hung Le; Man Ngo; Truyen Tran; |
338 | Accelerating Guided Diffusion Sampling with Splitting Numerical Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the contrary, we discover that the same techniques do not work for guided sampling, and little has been explored about its acceleration. This paper explores the culprit of this problem and provides a solution based on operator splitting methods, motivated by our key finding that high-order numerical methods are unsuitable for the conditional function. |
Suttisak Wizadwongsa; Supasorn Suwajanakorn; |
339 | Batch Multivalid Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop fast distribution-free conformal prediction algorithms for obtaining multivalid coverage on exchangeable data in the batch setting. |
Christopher Jung; Georgy Noarov; Ramya Ramalingam; Aaron Roth; |
340 | Long-Tailed Learning Requires Feature Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple data model inspired from natural data such as text or images, and use it to study the importance of learning features in order to achieve good generalization. |
Thomas Laurent; James von Brecht; Xavier Bresson; |
341 | The Onset of Variance-Limited Behavior for Networks in The Lazy and Rich Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, at a critical sample size $P^*$, the finite-width network generalization begins to worsen compared to the infinite width performance. In this work, we empirically study the transition from the infinite width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$. |
Alexander Atanasov; Blake Bordelon; Sabarish Sainathan; Cengiz Pehlevan; |
342 | On Accelerated Perceptrons and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There have been several recent works that managed to improve this rate by a quadratic factor, to $\Omega(\sqrt{\log n}/\gamma)$, with more sophisticated algorithms. In this paper, we unify these existing results under one framework by showing that they can all be described through the lens of solving min-max problems using modern acceleration techniques, mainly through \emph{optimistic} online learning. |
Guanghui Wang; Rafael Hanashiro; Etash Kumar Guha; Jacob Abernethy; |
343 | Sequential Latent Variable Models for Few-Shot High-Dimensional Time-Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the very first step toward few-shot high-dimensional sequence forecasting via a Bayesian meta-learning model that learns how to learn latent dynamics that change with the small number of available observations. |
Xiajun Jiang; Ryan Missel; Zhiyuan Li; Linwei Wang; |
344 | Continual Unsupervised Disentangling of Self-Organizing Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that this is because existing approaches treat continually-arrived data independently, without considering how they are related based on the underlying semantic factors. We address this with a new generative model describing a topologically-connected mixture of spike-and-slab distributions in the latent space, learned end-to-end in a continual fashion via principled variational inference. |
Zhiyuan Li; Xiajun Jiang; Ryan Missel; Prashnna Kumar Gyawali; Nilesh Kumar; Linwei Wang; |
345 | Learning to Decompose Visual Features with Latent Textual Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address them, we propose Decomposed Feature Prompting (DeFo). |
Feng Wang; Manling Li; Xudong Lin; Hairong Lv; Alex Schwing; Heng Ji; |
346 | Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. |
Spencer Frei; Gal Vardi; Peter Bartlett; Nathan Srebro; Wei Hu; |
347 | Is Attention All That NeRF Needs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Generalizable NeRF Transformer (GNT), a transformer-based architecture that reconstructs Neural Radiance Fields (NeRFs) and learns to render novel views on the fly from source views. |
Mukund Varma T; Peihao Wang; Xuxi Chen; Tianlong Chen; Subhashini Venugopalan; Zhangyang Wang; |
348 | Squeeze Training for Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we highlight that some collaborative examples, which are nearly perceptually indistinguishable from both adversarial and benign examples yet show extremely low prediction loss, can be utilized to enhance adversarial training. |
Qizhang Li; Yiwen Guo; Wangmeng Zuo; Hao Chen; |
349 | Domain Generalization Via Heckman-type Selection Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate DG as a sample selection problem where each domain is sampled from a common underlying population through non-random sampling probabilities that correlate with both the features and the outcome. |
Hyungu Kahng; Hyungrok Do; Judy Zhong; |
350 | Context-enriched Molecule Representations Improve Few-shot Drug Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new method for few-shot drug discovery. |
Johannes Schimunek; Philipp Seidl; Lukas Friedrich; Daniel Kuhn; Friedrich Rippmann; Sepp Hochreiter; Günter Klambauer; |
351 | Do We Really Need Complicated Model Architectures For Temporal Networks? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a conceptually and technically simple architecture that consists of three components: (1) a link-encoder that is only based on multi-layer perceptrons (MLPs) to summarize the information from temporal links, (2) a node-encoder that is only based on neighbor mean-pooling to summarize node information, and (3) an MLP-based link classifier that performs link prediction based on the outputs of the encoders. |
Weilin Cong; Si Zhang; Jian Kang; Baichuan Yuan; Hao Wu; Xin Zhou; Hanghang Tong; Mehrdad Mahdavi; |
352 | Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we examine the importance of having unbiased quantization in quantized neural network training, where to maintain it, and how to combine it with logarithmic quantization. |
Brian Chmiel; Ron Banner; Elad Hoffer; Hilla Ben-Yaacov; Daniel Soudry; |
353 | Learning with Auxiliary Activation for Memory-Efficient Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new learning rule which significantly reduces memory requirements while closely matching the performance of backpropagation. |
Sunghyeon Woo; Dongsuk Jeon; |
354 | TRANSFORMER-PATCHER: ONE MISTAKE WORTH ONE NEURON Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, a preferable solution is to rectify mistakes continually, as soon as they appear. Therefore, we extend existing Model Editing (ME) into Sequential Model Editing (SME) to help develop more practical editing methods. |
Zeyu Huang; Yikang Shen; Xiaofeng Zhang; Jie Zhou; Wenge Rong; Zhang Xiong; |
355 | An Additive Instance-Wise Approach to Multi-class Model Interpretation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work exploits the strengths of both methods and proposes a framework for learning local explanations simultaneously for multiple target classes. |
Vy Vo; Van Nguyen; Trung Le; Quan Hung Tran; Reza Haf; Seyit Camtepe; Dinh Phung; |
356 | Guiding Continuous Operator Learning Through Physics-based Boundary Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Boundary enforcing Operator Network (BOON) that enables the BC satisfaction of neural operators by making structural changes to the operator kernel. |
Nadim Saad; Gaurav Gupta; Shima Alizadeh; Danielle C. Maddix; |
357 | Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, by contrast, we opt for the diversity in substitute models and advocate to attack a Bayesian model for achieving desirable transferability. |
Qizhang Li; Yiwen Guo; Wangmeng Zuo; Hao Chen; |
358 | Sublinear Algorithms for Kernel Matrices Via Kernel Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give a framework for using recently developed tools for kernel density estimation to solve downstream kernel problems in sub-quadratic time. |
Ainesh Bakshi; Piotr Indyk; Praneeth Kacham; Sandeep Silwal; Samson Zhou; |
359 | Choreographer: Learning and Adapting Skills in Imagination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, it is unclear how to leverage the learned skill behaviors for adapting to downstream tasks in a data-efficient manner. We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. |
Pietro Mazzaglia; Tim Verbelen; Bart Dhoedt; Alexandre Lacoste; Sai Rajeswar; |
360 | DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint – Dropping Intermediate Tensors (DropIT). |
Joya Chen; Kai Xu; Yuhui Wang; Yifei Cheng; Angela Yao; |
361 | Exploring Temporally Dynamic Data Augmentation for Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These variations should be as diverse as possible while using fewer additional hyper-parameters during training. Motivated by this, we propose a simple yet effective video data augmentation framework, DynaAugment. |
Taeoh Kim; Jinhyung Kim; Minho Shim; Sangdoo Yun; Myunggu Kang; Dongyoon Wee; Sangyoun Lee; |
362 | Computational Language Acquisition with Theory of Mind Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing inspiration from the modern operationalized versions of ToM implemented in Rabinowitz et al. (2018) and Zhu et al. (2021), we build language-learning agents equipped with ToM, and measure its effects on the learning process. |
Andy Liu; Hao Zhu; Emmy Liu; Yonatan Bisk; Graham Neubig; |
363 | Mind’s Eye: Grounded Language Model Reasoning Through Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Mind’s Eye, a paradigm to ground language model reasoning in the physical world. |
Ruibo Liu; Jason Wei; Shixiang Shane Gu; Te-Yen Wu; Soroush Vosoughi; Claire Cui; Denny Zhou; Andrew M. Dai; |
364 | Language Models Are Realistic Tabular Data Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose GReaT (Generation of Realistic Tabular data), which exploits an auto-regressive generative LLM to sample synthetic and yet highly realistic tabular data. |
Vadim Borisov; Kathrin Sessler; Tobias Leemann; Martin Pawelczyk; Gjergji Kasneci; |
365 | Aligning Model and Macaque Inferior Temporal Cortex Representations Improves Model-to-Human Behavioral Alignment and Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conducted chronic, large-scale multi-electrode recordings across the IT cortex in six non-human primates (rhesus macaques). |
Joel Dapello; Kohitij Kar; Martin Schrimpf; Robert Baldwin Geary; Michael Ferguson; David Daniel Cox; James J. DiCarlo; |
366 | SimpleKT: A Simple But Tough-to-Beat Baseline for Knowledge Tracing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, due to the lack of a standardized evaluation protocol (Liu et al., 2022), there are no widely agreed KT baselines, and published experimental comparisons have become inconsistent and self-contradictory, e.g., the reported AUC scores of DKT on ASSISTments2009 range from 0.721 to 0.821 (Minn et al., 2018; Yeung et al., 2018). Therefore, in this paper, we provide a strong but simple baseline method for the KT task, named simpleKT. |
Zitao Liu; Qiongqiong Liu; Jiahao Chen; Shuyan Huang; Weiqi Luo; |
367 | Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on our investigation, we propose to generate a cost volume from a long history of image observations, compensating for the coarse but efficient matching resolution with a more optimal multi-view matching setup. |
Jinhyung Park; Chenfeng Xu; Shijia Yang; Kurt Keutzer; Kris M. Kitani; Masayoshi Tomizuka; Wei Zhan; |
368 | Massively Scaling Heteroscedastic Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose HET-XL, a heteroscedastic classifier whose parameter count when compared to a standard classifier scales independently of the number of classes. |
Mark Collier; Rodolphe Jenatton; Basil Mustafa; Neil Houlsby; Jesse Berent; Effrosyni Kokiopoulou; |
369 | Interpretable Single/Multi-label Text Classification with Unsupervised Constituent-label Alignments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, symbolic probabilistic models function with clear interpretability, but how to combine them with neural networks to enhance their performance remains to be explored. In this paper, we try to marry these two systems for text classification via structured language models. |
Xiang Hu; XinYu KONG; Kewei Tu; |
370 | Transformer Meets Boundary Value Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A Transformer-based deep direct sampling method is proposed for solving a class of boundary value inverse problems. |
Ruchi Guo; Shuhao Cao; Long Chen; |
371 | DAG Learning Via Sparse Relaxations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. |
Valentina Zantedeschi; Luca Franceschi; Jean Kaddour; Matt Kusner; Vlad Niculae; |
372 | Soft Neighbors Are Positive Supporters in Contrastive Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we rethink the instance discrimination framework and find the binary instance labeling insufficient to measure correlations between different samples. |
Chongjian GE; Jiangliu Wang; Zhan Tong; Shoufa Chen; Yibing Song; Ping Luo; |
373 | Finding The Global Semantic Representation in GAN Through Fréchet Mean Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In other words, in this disentangled space, there exists the global semantic basis as a vector space where each basis component describes one attribute of generated images. In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space in GANs. |
Jaewoong Choi; Geonho Hwang; Hyunsoo Cho; Myungjoo Kang; |
374 | MARS: Meta-learning As Score Matching in The Function Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, existing approaches resort to meta-learning restrictive diagonal Gaussian priors, severely limiting their expressiveness and performance. To circumvent these issues, we approach meta-learning through the lens of functional Bayesian neural network inference which views the prior as a stochastic process and performs inference in the function space. |
Krunoslav Lehman Pavasovic; Jonas Rothfuss; Andreas Krause; |
375 | On The Effectiveness of Out-of-distribution Data in Self-supervised Long-tail Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an alternative but easy-to-use and effective solution, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which can effectively exploit OOD data to dynamically re-balance the feature space. |
Jianhong Bai; Zuozhu Liu; Hualiang Wang; Jin Hao; YANG FENG; Huanpeng Chu; Haoji Hu; |
376 | Faster Gradient-Free Methods for Escaping Saddle Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the case when calculations of explicit gradients are expensive or even infeasible, and only function values are accessible. |
Hualin Zhang; Bin Gu; |
377 | A View From Somewhere: Human-Centric Face Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to implicitly learn a set of continuous face-varying dimensions, without ever asking an annotator to explicitly categorize a person. |
Jerone Theodore Alexander Andrews; Przemyslaw Joniak; Alice Xiang; |
378 | Dynamical Systems Embedding with A Physics-informed Convolutional Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose phase2vec, an embedding method that learns high-quality, physically-meaningful representations of dynamical systems without supervision. |
Matt Ricci; Noa Moriel; Zoe Piran; Mor Nitzan; |
379 | Mind The Pool: Convolutional Neural Networks Can Overfit Input Size Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This issue is inherent to pooling arithmetic, with standard downsampling layers playing a major role in favoring certain input sizes and skewing the weights accordingly. We present a solution to this problem by depriving these layers from the arithmetic cues they use to overfit the input size. |
Bilal Alsallakh; David Yan; Narine Kokhlikyan; Vivek Miglani; Orion Reblitz-Richardson; Pamela Bhattacharya; |
380 | Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we strive to improve the compositional skills of existing large-scale T2I models, specifically more accurate attribute binding and better image compositions. |
Weixi Feng; Xuehai He; Tsu-Jui Fu; Varun Jampani; Arjun Reddy Akula; Pradyumna Narayana; Sugato Basu; Xin Eric Wang; William Yang Wang; |
381 | A Theory of Dynamic Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to an extensive theoretical and empirical study of the static setting, the dynamic setting lags behind due to limited empirical studies and no apparent theoretical foundation to date. Responding to this deficit, we initiate a theoretical study of dynamic benchmarking. |
Ali Shirali; Rediet Abebe; Moritz Hardt; |
382 | LAVA: Data Valuation Without Pre-Specified Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work leapfrogs over the current limits of data valuation methods by introducing a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. |
Hoang Anh Just; Feiyang Kang; Tianhao Wang; Yi Zeng; Myeongseob Ko; Ming Jin; Ruoxi Jia; |
383 | Multi-level Protein Structure Pre-training Via Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering protein sequences can determine multi-level structures, in this paper, we aim to realize the comprehensive potential of protein sequences for function prediction. |
Zeyuan Wang; Qiang Zhang; Shuang-Wei HU; Haoran Yu; Xurui Jin; Zhichen Gong; Huajun Chen; |
384 | Towards Robustness Certification Against Universal Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the problem of certifying neural network robustness against universal perturbations (UPs), which have been widely used in universal adversarial attacks and backdoor attacks. |
Yi Zeng; Zhouxing Shi; Ming Jin; Feiyang Kang; Lingjuan Lyu; Cho-Jui Hsieh; Ruoxi Jia; |
385 | AutoGT: Automated Graph Transformer Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of automated graph Transformer, for the first time. |
Zizhao Zhang; Xin Wang; Chaoyu Guan; Ziwei Zhang; Haoyang Li; Wenwu Zhu; |
386 | Blurring Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise. |
Emiel Hoogeboom; Tim Salimans; |
387 | Adversarial Imitation Learning with Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a novel method for policy learning that incorporates two different feedback types, namely \emph{demonstrations} and \emph{preferences}. |
Aleksandar Taranovic; Andras Gabor Kupcsik; Niklas Freymuth; Gerhard Neumann; |
388 | Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give the first constant-factor approximate sketches for $\ell_1$ and logistic regression in a turnstile stream with almost linear sketching dimension that result in an efficient optimization problem in the sketch space. |
Alexander Munteanu; Simon Omlor; David Woodruff; |
389 | On The Soft-Subnetwork for Few-Shot Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by Regularized Lottery Ticket Hypothesis, which states that competitive smooth (non-binary) subnetworks exist within a dense network, we propose a few-shot class-incremental learning method referred to as Soft-SubNetworks (SoftNet). |
Haeyong Kang; Jaehong Yoon; Sultan Rizky Hikmawan Madjid; Sung Ju Hwang; Chang D. Yoo; |
390 | Efficient Offline Policy Optimization with A Learned Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a regularized one-step model-based method that outperforms MuZero Unplugged on Atari benchmark. |
Zichen Liu; Siyi Li; Wee Sun Lee; Shuicheng YAN; Zhongwen Xu; |
391 | Bispectral Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural network architecture, Bispectral Neural Networks (BNNs) for learning representations that are invariant to the actions of compact commutative groups on the space over which a signal is defined. |
Sophia Sanborn; Christian A Shewmake; Bruno Olshausen; Christopher J. Hillar; |
392 | Learning Group Importance Using The Differentiable Hypergeometric Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the differentiable hypergeometric distribution. |
Thomas M. Sutter; Laura Manduchi; Alain Ryser; Julia E Vogt; |
393 | TTN: A Domain-Shift Aware Batch Normalization in Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify that CBN and TBN are in a trade-off relationship and present a new test-time normalization (TTN) method that interpolates the statistics by adjusting the importance between CBN and TBN according to the domain-shift sensitivity of each BN layer. |
Hyesu Lim; Byeonggeun Kim; Jaegul Choo; Sungha Choi; |
394 | ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present ManiSkill2, the next generation of the SAPIEN ManiSkill benchmark, to address critical pain points often encountered by researchers when using benchmarks for generalizable manipulation skills. |
Jiayuan Gu; Fanbo Xiang; Zhan Ling; Xinyue Wei; Xiqiang Liu; Xuanlin Li; Rui Chen; Stone Tao; Tongzhou Mu; Pengwei Xie; Yunchao Yao; Yihe Tang; Xiaodi Yuan; Zhiao Huang; Hao Su; |
395 | MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the aforementioned issues, we propose MaskMix and Progressive Attention Labeling (PAL) in image and label space, respectively. |
Qihao Zhao; Yangyu Huang; Wei Hu; Fan Zhang; Jun Liu; |
396 | Flow Matching for Generative Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. |
Yaron Lipman; Ricky T. Q. Chen; Heli Ben-Hamu; Maximilian Nickel; Matthew Le; |
397 | Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose compositional prompt tuning with motion cues: an extended prompt tuning paradigm for compositional predictions of video data. |
Kaifeng Gao; Long Chen; Hanwang Zhang; Jun Xiao; Qianru Sun; |
398 | Out-of-Distribution Detection and Selective Generation for Conditional Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the space of potential low-quality outputs is larger, as arbitrary text can be generated, and it is important to know when to trust the generated output. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. |
Jie Ren; Jiaming Luo; Yao Zhao; Kundan Krishna; Mohammad Saleh; Balaji Lakshminarayanan; Peter J Liu; |
399 | Budgeted Training for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem by proposing a framework that enables the training process under any training budget, while achieving competitive model performances. |
zhuofan xia; Xuran Pan; Xuan Jin; Yuan He; Hui Xue’; Shiji Song; Gao Huang; |
400 | ODAM: Gradient-based Instance-Specific Visual Explanations for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Gradient-weighted Object Detector Activation Mapping (Grad-ODAM), a visualized explanation technique for interpreting the predictions of object detectors. |
Chenyang ZHAO; Antoni B. Chan; |
401 | Regression with Label Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the optimal mechanism takes the form of a randomized response on bins, and propose an efficient algorithm for finding the optimal bin values. |
Badih Ghazi; Pritish Kamath; Ravi Kumar; Ethan Leeman; Pasin Manurangsi; Avinash Varadarajan; Chiyuan Zhang; |
402 | Boosting Adversarial Transferability Using Dynamic Cues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we induce dynamic cues within the image models without sacrificing their original performance on images. |
Muzammal Naseer; Ahmad Mahmood; Salman Khan; Fahad Khan; |
403 | Towards Inferential Reproducibility of Machine Learning Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to shift from the goal of duplicating a SOTA training result without any changes to a new type of reproducibility called inferential reproducibility that treats performance variation depending on data characteristics, meta-parameter settings, and their interactions as an inherent and interesting feature of non-deterministic deep learning, not as a bug that needs to be resolved. |
Michael Hagmann; Philipp Meier; Stefan Riezler; |
404 | Mole-BERT: Rethinking Pre-training Graph Neural Networks for Molecules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explain the negative transfer in molecular graph pre-training and develop two novel pre-training strategies to alleviate this issue. |
Jun Xia; Chengshuai Zhao; Bozhen Hu; Zhangyang Gao; Cheng Tan; Yue Liu; Siyuan Li; Stan Z. Li; |
405 | The KFIoU Loss for Rotated Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an effective approximate SkewIoU loss based on Gaussian modeling and Kalman filter, which mainly consists of two items. |
Xue Yang; Yue Zhou; Gefan Zhang; Jirui Yang; Wentao Wang; Junchi Yan; XIAOPENG ZHANG; Qi Tian; |
406 | PowerQuant: Automorphism Search for Non-Uniform Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify the uniformity of the quantization operator as a limitation of existing approaches, and propose a data-free non-uniform method. |
Edouard YVINEC; Arnaud Dapogny; Matthieu Cord; Kevin Bailly; |
407 | Rethinking Skip Connection Model As A Learnable Markov Chain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take a deep dive into the behaviors of models with skip connections, which can be formulated as a learnable Markov chain. |
Chen Dengsheng; Jie Hu; Wenwen Qiang; Xiaoming Wei; Enhua Wu; |
408 | Effects of Graph Convolutions in Multi-layer Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. |
Aseem Baranwal; Kimon Fountoulakis; Aukosh Jagannath; |
409 | Addressing Parameter Choice Issues in Unsupervised Domain Adaptation By Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While several heuristics exist that follow this strategy, methods are still missing that rely on thorough theories for bounding the target error. In turn, we propose a method that extends weighted least squares to vector-valued functions, e.g., deep neural networks. |
Marius-Constantin Dinu; Markus Holzleitner; Maximilian Beck; Hoan Duc Nguyen; Andrea Huber; Hamid Eghbal-zadeh; Bernhard A. Moser; Sergei Pereverzyev; Sepp Hochreiter; Werner Zellinger; |
410 | Bayesian Semi-supervised Learning with A Principled Likelihood from A Generative Model of Data Curation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are thus able to introduce Bayesian SSL, which gives considerable improvements over standard SSL in the setting of 40 labelled points on CIFAR-10, with performance of $92.2\pm 0.3\%$ vs $88.6\%$ in the original FixMatch paper. |
Stoil Krasimirov Ganev; Laurence Aitchison; |
411 | Spherical Sliced-Wasserstein Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a SW discrepancy on the sphere using only tools intrinsic to the manifold. |
Clément Bonet; Paul Berg; Nicolas Courty; François Septier; Lucas Drumetz; Minh Tan Pham; |
412 | Scenario-based Question Answering with Interacting Contextual Properties Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although understanding the relationship between conditions is crucial for solving this challenging QA task, limited work has been done so far in modeling this. In this paper, we propose the T-Reasoner model, which solves this problem with three jointly learned modules: an entailment module which checks whether a condition has been satisfied by the scenario, a decoding module which locates eligible answers from documents, and a reasoning module which infers the relationship between conditions and performs a reasoning step to determine the logically consistent answers and identify missing conditions. |
Haitian Sun; William W. Cohen; Ruslan Salakhutdinov; |
413 | CktGNN: Circuit Graph Neural Network for Electronic Design Automation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. |
Zehao Dong; Weidong Cao; Muhan Zhang; Dacheng Tao; Yixin Chen; Xuan Zhang; |
414 | Prompt-to-Prompt Image Editing with Cross-Attention Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. |
Amir Hertz; Ron Mokady; Jay Tenenbaum; Kfir Aberman; Yael Pritch; Daniel Cohen-or; |
415 | Efficient Out-of-Distribution Detection Based on In-Distribution Data Patterns Memorization with Modern Hopfield Energy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing OOD methods refining the confidence estimation procedure from output logits with handpicked hyperparameters, we propose a new store-then-compare paradigm. |
Jinsong Zhang; Qiang Fu; Xu Chen; Lun Du; Zelin Li; Gang Wang; xiaoguang Liu; Shi Han; Dongmei Zhang; |
416 | CodeT: Code Generation with Generated Tests Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel method, CodeT, that leverages the same pre-trained language models to automatically generate test cases for the code samples, thus reducing the human effort and increasing the coverage of the test scenarios. |
Bei Chen; Fengji Zhang; Anh Nguyen; Daoguang Zan; Zeqi Lin; Jian-Guang Lou; Weizhu Chen; |
417 | Does Deep Learning Learn to Abstract? A Systematic Probing Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective. |
Shengnan An; Zeqi Lin; Bei Chen; Qiang Fu; Nanning Zheng; Jian-Guang Lou; |
418 | Exact Group Fairness Regularization Via Classwise Robust Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To that end, we propose a principled method that indeed can incorporate an $\textit{exact}$ form of a well-justified group fairness metric, Difference of Conditional Accuracy (DCA), as a regularizer using a $\textit{classwise}$ distributionally robust optimization (DRO) framework. |
Sangwon Jung; Taeeon Park; Sanghyuk Chun; Taesup Moon; |
419 | Weighted Clock Logic Point Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel framework for modeling temporal point processes called clock logic neural networks (CLNN) which learn weighted clock logic (wCL) formulas as interpretable temporal rules by which some events promote or inhibit other events. |
Ruixuan Yan; Yunshi Wen; Debarun Bhattacharjya; Ronny Luss; Tengfei Ma; Achille Fokoue; Anak Agung Julius; |
420 | DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we propose DiffEdit, a method to take advantage of text-conditioned diffusion models for the task of semantic image editing, where the goal is to edit an image based on a text query. |
Guillaume Couairon; Jakob Verbeek; Holger Schwenk; Matthieu Cord; |
421 | Human Alignment of Neural Network Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the factors that affect alignment between the representations learned by neural networks and human concept representations. |
Lukas Muttenthaler; Jonas Dippel; Lorenz Linhardt; Robert A. Vandermeulen; Simon Kornblith; |
422 | SMART: Self-supervised Multi-task PretrAining with ContRol Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The challenge becomes combinatorially more complex if we want to pretrain representations amenable to a large variety of tasks. To tackle this problem, in this work, we formulate a general pretraining-finetuning pipeline for sequential decision making, under which we propose a generic pretraining framework \textit{Self-supervised Multi-task pretrAining with contRol Transformer (SMART)}. |
Yanchao Sun; Shuang Ma; Ratnesh Madaan; Rogerio Bonatti; Furong Huang; Ashish Kapoor; |
423 | Are More Layers Beneficial to Graph Transformers? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation, and applies local attention on related nodes to obtain substructure based attention encoding. |
Haiteng Zhao; Shuming Ma; Dongdong Zhang; Zhi-Hong Deng; Furu Wei; |
424 | A Universal 3D Molecular Representation Learning Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a universal 3D MRL framework that significantly enlarges the representation ability and application scope of MRL schemes. |
Gengmo Zhou; Zhifeng Gao; Qiankun Ding; Hang Zheng; Hongteng Xu; Zhewei Wei; Linfeng Zhang; Guolin Ke; |
425 | Accurate Bayesian Meta-Learning By Accurate Task Posterior Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior work studies a range of architectural modifications to boost performance, such as attentive computation paths or improved context aggregation schemes, while the influence of the VI scheme remains under-explored. We aim to bridge this gap by introducing GMM-NP, a novel BML model, which builds on recent work that enables highly accurate, full-covariance Gaussian mixture (GMM) TP approximations by combining VI with natural gradients and trust regions. |
Michael Volpp; Philipp Dahlinger; Philipp Becker; Christian Daniel; Gerhard Neumann; |
426 | Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that $Q$-Pensieve can be naturally integrated with soft policy iteration with convergence guarantee. To substantiate this concept, we propose the technique of $Q$ replay buffer, which stores the learned $Q$-networks from the past iterations, and arrive at a practical actor-critic implementation. |
Wei Hung; Bo Kai Huang; Ping-Chun Hsieh; Xi Liu; |
427 | Single-shot General Hyper-parameter Optimization for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Federated Loss SuRface Aggregation (FLoRA), a general FL-HPO solution framework that can address use cases of tabular data and any Machine Learning (ML) model including gradient boosting training algorithms, SVMs, neural networks, among others and thereby further expands the scope of FL-HPO. |
Yi Zhou; Parikshit Ram; Theodoros Salonidis; Nathalie Baracaldo; Horst Samulowitz; Heiko Ludwig; |
428 | AE-FLOW: Autoencoders with Normalizing Flows for Medical Images Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, we propose a normalizing flow based autoencoder for an efficient and tractable representation of normal medical images. |
Yuzhong Zhao; Qiaoqiao Ding; Xiaoqun Zhang; |
429 | What Is Missing in IRM Training and Evaluation? Challenges and Solutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, a series of advanced IRM algorithms have been developed that show practical improvement over IRMV1. In this work, we revisit these recent IRM advancements and identify and resolve three practical limitations in IRM training and evaluation. |
Yihua Zhang; Pranay Sharma; Parikshit Ram; Mingyi Hong; Kush R. Varshney; Sijia Liu; |
430 | Distributional Meta-Gradient Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: All the existing algorithms adhere to the same reward learning regime, where the adaptive return is simply formulated in the form of expected cumulative rewards, upon which the policy and critic update rules are specified under well adopted distance metrics. In this paper, we present a novel algorithm which builds on the success of meta-gradient RL algorithms and effectively improves such algorithms by following a simple recipe, i.e., going beyond the expected return to formulate and learn the return in a more expressive form, value distributions. |
Haiyan Yin; Shuicheng YAN; Zhongwen Xu; |
431 | Min-Max Multi-objective Bilevel Optimization with Applications in Robust Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. |
Alex Gu; Songtao Lu; Parikshit Ram; Tsui-Wei Weng; |
432 | Linearly Mapping from Image to Text Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we show that the image representations from vision models can be transferred as continuous prompts to frozen LMs by training only a single linear projection. |
Jack Merullo; Louis Castricato; Carsten Eickhoff; Ellie Pavlick; |
433 | Evidential Uncertainty and Diversity Guided Active Learning for Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, directly porting current AL methods to the SGG task poses the following challenges: 1) unreliable uncertainty estimates, and 2) data bias problems. To deal with these challenges, we propose EDAL (\textbf{E}vidential Uncertainty and \textbf{D}iversity Guided Deep \textbf{A}ctive \textbf{L}earning), a novel AL framework tailored for the SGG task. |
Shuzhou Sun; Shuaifeng Zhi; Janne Heikkilä; Li Liu; |
434 | StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. |
Haoxuan Li; Chunyuan Zheng; Peng Wu; |
435 | Variational Latent Branching Model for Off-Policy Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the variational latent branching model (VLBM) to learn the transition function of MDPs by formulating the environmental dynamics as a compact latent space, from which the next states and rewards are then sampled. |
Qitong Gao; Ge Gao; Min Chi; Miroslav Pajic; |
436 | TDR-CL: Targeted Doubly Robust Collaborative Learning for Debiased Recommendations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a principled approach that can effectively reduce the bias and variance simultaneously for existing DR estimators when the error-imputation model is misspecified. |
Haoxuan Li; Yan Lyu; Chunyuan Zheng; Peng Wu; |
437 | Improving Deep Policy Gradients with Value Function Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on improving value approximation and analyzing the effects on Deep Policy Gradient primitives such as value prediction, variance reduction, and correlation of gradient estimates with the true gradient. |
Enrico Marchesini; Christopher Amato; |
438 | LMSeg: Language-guided Multi-dataset Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the multi-dataset segmentation and propose a scalable Language-guided Multi-dataset Segmentation framework, dubbed LMSeg, which supports both semantic and panoptic segmentation. |
Qiang Zhou; Yuang Liu; Chaohui Yu; Jingliang Li; Zhibin Wang; Fan Wang; |
439 | Graph Neural Networks for Link Prediction with Subgraph Sketching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. |
Benjamin Paul Chamberlain; Sergey Shirobokov; Emanuele Rossi; Fabrizio Frasca; Thomas Markovich; Nils Yannick Hammerla; Michael M. Bronstein; Max Hansmire; |
440 | Extracting Robust Models with Uncertain Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, how to extract a robust model with similar resilience against adversarial attacks has never been investigated. This paper presents the first study toward this goal. |
Guanlin Li; Guowen Xu; Shangwei Guo; Han Qiu; Jiwei Li; Tianwei Zhang; |
441 | Combinatorial Pure Exploration of Causal Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide the first gap-dependent and fully adaptive pure exploration algorithms on two types of causal models — the binary generalized linear model (BGLM) and general graphs. |
Nuoya Xiong; Wei Chen; |
442 | Continuous-time Identification of Dynamic State-space Models By Deep Subspace Encoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, even with numerous recent developments, the CT nonlinear state-space (NL-SS) model identification problem remains to be solved in full, considering common experimental aspects such as the presence of external inputs, measurement noise, latent states, and general robustness. This paper presents a novel estimation method that addresses all these aspects and that can obtain state-of-the-art results on multiple benchmarks with compact fully connected neural networks capturing the CT dynamics. |
Gerben I. Beintema; Maarten Schoukens; Roland Tóth; |
443 | Better Generative Replay for Continual Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By analyzing the behaviors of clients during training, we find the unstable training process caused by distributed training on non-IID data leads to a notable performance degradation. To address this problem, we propose our FedCIL model with two simple but effective solutions: 1. |
Daiqing Qi; Handong Zhao; Sheng Li; |
444 | Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, several works have demonstrated high gains by taking a straightforward approach for incorporating intermediate supervision in compounded natural language problems: the sequence-to-sequence LM is fed with an augmented input, in which the decomposed tasks’ labels are simply concatenated to the original input. In this paper, we prove a positive learning result that motivates these recent efforts. |
Noam Wies; Yoav Levine; Amnon Shashua; |
445 | On The Word Boundaries of Emergent Languages Based on Harris’s Articulation Scheme Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is not obvious whether such a simulated language would have the same properties as natural language. In this paper, we test if they satisfy HAS. |
Ryo Ueda; Taiga Ishii; Yusuke Miyao; |
446 | Generative Modelling with Inverse Heat Dissipation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new model that generates images through iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. |
Severi Rissanen; Markus Heinonen; Arno Solin; |
447 | Self-supervision Through Random Segments with Autoregressive Coding (RandSAC) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the success of self-supervised autoregressive representation learning in natural language (GPT and its variants), and advances in recent visual architecture design with Vision Transformers (ViTs), in this paper, we explore the effects various design choices have on the success of applying such training strategies for visual feature learning. |
Tianyu Hua; Yonglong Tian; Sucheng Ren; Michalis Raptis; Hang Zhao; Leonid Sigal; |
448 | Ask Me Anything: A Simple Strategy for Prompting Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a prompting strategy based on aggregating the predictions of multiple prompts, which enables a 6B parameter model to exceed the few-shot performance of GPT3-175B on 15/20 popular benchmarks. |
Simran Arora; Avanika Narayan; Mayee F Chen; Laurel Orr; Neel Guha; Kush Bhatia; Ines Chami; Christopher Re; |
449 | DAVA: Disentangling Adversarial Variational Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the issue, we introduce DAVA, a novel training procedure for variational auto-encoders that alleviates the issue of hyperparameter selection at the cost of a comparatively small overhead. |
Benjamin Estermann; Roger Wattenhofer; |
450 | Temperature Schedules for Self-supervised Contrastive Methods on Long-tail Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on imbalanced data. |
Anna Kukleva; Moritz Böhle; Bernt Schiele; Hilde Kuehne; Christian Rupprecht; |
451 | From $t$-SNE to UMAP with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we uncover their conceptual connection via a new insight into contrastive learning methods. |
Sebastian Damrich; Niklas Böhm; Fred A Hamprecht; Dmitry Kobak; |
452 | Sharper Bounds for Uniformly Stable Algorithms with Stationary $\varphi$-mixing Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use algorithmic stability to study the generalization performance of learning algorithms with $\varphi$-mixing data, where the dependency between observations weakens over time. |
Shi Fu; Yunwen Lei; Qiong Cao; Xinmei Tian; Dacheng Tao; |
453 | Pareto-Efficient Decision Agents for Offline Multi-Objective Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. |
Baiting Zhu; Meihua Dang; Aditya Grover; |
454 | Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF). |
Yibo Yang; Haobo Yuan; Xiangtai Li; Zhouchen Lin; Philip Torr; Dacheng Tao; |
455 | Efficient Recurrent Architectures Through Activity Sparsity and Sparse Back-propagation Through Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a solution inspired by biological neuron dynamics that makes the communication between RNN units sparse and discrete. |
Anand Subramoney; Khaleelulla Khan Nazeer; Mark Schöne; Christian Mayr; David Kappel; |
456 | Neural EPDOs: Spatially Adaptive Equivariant Partial Differential Operator Based Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel nonlinear PDOs scheme that is both spatially adaptive and translation equivariant. |
Lingshen He; Yuxuan Chen; Zhengyang Shen; Yibo Yang; Zhouchen Lin; |
457 | Learning to Segment from Noisy Annotations: A Spatial Correction Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel noise model for segmentation problems that encodes spatial correlation and bias, which are prominent in segmentation annotations. |
Jiachen Yao; Yikai Zhang; Songzhu Zheng; Mayank Goswami; Prateek Prasanna; Chao Chen; |
458 | Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel \underline{e}dge guided \underline{g}enerative \underline{a}dversarial \underline{n}etwork with \underline{c}ontrastive learning (ECGAN) for the challenging semantic image synthesis task. |
Hao Tang; XIAOJUAN QI; Guolei Sun; Dan Xu; Nicu Sebe; Radu Timofte; Luc Van Gool; |
459 | Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. |
Florent Delgrange; Ann Nowe; Guillermo Perez; |
460 | STaSy: Score-based Tabular Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new model named $\textbf{S}$core-based $\textbf{Ta}$bular data $\textbf{Sy}$nthesis ($\texttt{STaSy}$) and its training strategy based on the paradigm of score-based generative modeling. |
Jayoung Kim; Chaejeong Lee; Noseong Park; |
461 | De Novo Molecular Generation Via Connection-aware Motif Mining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose MiCaM to generate molecules based on mined connection-aware motifs. |
Zijie Geng; Shufang Xie; Yingce Xia; Lijun Wu; Tao Qin; Jie Wang; Yongdong Zhang; Feng Wu; Tie-Yan Liu; |
462 | When Source-Free Domain Adaptation Meets Learning with Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study SFDA from the perspective of learning with label noise (LLN). |
Li Yi; Gezheng Xu; Pengcheng Xu; Jiaqi Li; Ruizhi Pu; Charles Ling; Ian McLeod; Boyu Wang; |
463 | Test-Time Adaptation Via Self-Training with Nearest Neighbor Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, under test-time domain shift, accuracy of the pseudo labels cannot be guaranteed, and thus the TTA methods often encounter performance degradation at the adapted classifier. To overcome this limitation, we propose a novel test-time adaptation method, called Test-time Adaptation via Self-Training with nearest neighbor information (TAST), which is composed of the following procedures: (1) adds trainable adaptation modules on top of the trained feature extractor; (2) newly defines a pseudo-label distribution for the test data by using the nearest neighbor information; (3) trains these modules only a few times during test time to match the nearest neighbor-based pseudo label distribution and a prototype-based class distribution for the test data; and (4) predicts the label of test data using the average predicted class distribution from these modules. |
Minguk Jang; Sae-Young Chung; Hye Won Chung; |
464 | Federated Neural Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: So, this paper introduces the federated neural-upper confidence bound (FN-UCB) algorithm. |
Zhongxiang Dai; Yao Shu; Arun Verma; Flint Xiaofeng Fan; Bryan Kian Hsiang Low; Patrick Jaillet; |
465 | Measuring Axiomatic Identifiability of Counterfactual Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general framework for evaluating image counterfactuals. |
Miguel Monteiro; Fabio De Sousa Ribeiro; Nick Pawlowski; Daniel C. Castro; Ben Glocker; |
466 | In-sample Actor Critic for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose In-sample Actor Critic (IAC) which utilizes sampling-importance resampling to execute in-sample policy evaluation. |
Hongchang Zhang; Yixiu Mao; Boyuan Wang; Shuncheng He; Yi Xu; Xiangyang Ji; |
467 | Planning Goals for Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose planning exploratory goals (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. |
Edward S. Hu; Richard Chang; Oleh Rybkin; Dinesh Jayaraman; |
468 | Kernel Neural Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the Neural Optimal Transport (NOT) algorithm which uses the general optimal transport formulation and learns stochastic transport plans. |
Alexander Korotin; Daniil Selikhanovych; Evgeny Burnaev; |
469 | Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to do targeted hyperparameter optimization with lexicographic preference over multiple objectives, motivated by various practical applications. |
Shaokun Zhang; Feiran Jia; Chi Wang; Qingyun Wu; |
470 | A Non-monotonic Self-terminating Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm. |
Cheolhyoung Lee; Eugene Choi; Kyunghyun Cho; |
471 | Noise-Robust De-Duplication at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study uses the unique timeliness of historical news wires to create a 27,210 document dataset, with 122,876 positive duplicate pairs, for studying noise-robust de-duplication. |
Emily Silcock; Luca D’Amico-Wong; Jinglin Yang; Melissa Dell; |
472 | Simplicial Embeddings in Self-Supervised Learning and Downstream Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We use softmax to embed representations in a collection of simplices in SSL models, which offers improved generalization properties for downstream classification. |
Samuel Lavoie; Christos Tsirigotis; Max Schwarzer; Ankit Vani; Michael Noukhovitch; Kenji Kawaguchi; Aaron Courville; |
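The softmax-over-chunks idea in the highlight above can be sketched in a few lines; this is an illustrative reconstruction (the chunk dimensions and temperature are assumptions, not the paper's settings):

```python
import numpy as np

def simplicial_embedding(z, n_simplices, temperature=1.0):
    """Map a flat representation z into a product of simplices by
    applying a numerically stable softmax within each equal-size chunk."""
    chunks = z.reshape(n_simplices, -1) / temperature
    e = np.exp(chunks - chunks.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).reshape(-1)

z = np.random.randn(12)                      # e.g. 4 simplices of dimension 3
s = simplicial_embedding(z, n_simplices=4)
chunk_sums = s.reshape(4, 3).sum(axis=1)     # each chunk sums to 1
```

After this mapping, every chunk lies on a probability simplex, which is the constrained form of representation the paper associates with better downstream generalization.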
473 | Policy Pre-training for Autonomous Driving Via Self-supervised Geometric Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pre-training in visuomotor driving. |
Penghao Wu; Li Chen; Hongyang Li; Xiaosong Jia; Junchi Yan; Yu Qiao; |
474 | Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose MARVEL, which adaptively allocates the parameter budget among weight matrices according to their importance score. |
Qingru Zhang; Minshuo Chen; Alexander Bukharin; Pengcheng He; Yu Cheng; Weizhu Chen; Tuo Zhao; |
475 | Treeformer: Dense Gradient Trees for Efficient Attention Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we view attention computation as that of nearest neighbor retrieval, and use decision tree based hierarchical navigation to reduce the retrieval cost per query token from linear in sequence length to nearly logarithmic. |
Lovish Madaan; Srinadh Bhojanapalli; Himanshu Jain; Prateek Jain; |
476 | GPTQ: Accurate Quantization for Generative Pre-trained Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models. In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient. |
Elias Frantar; Saleh Ashkboos; Torsten Hoefler; Dan Alistarh; |
477 | Neural Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel neural-networks-based algorithm to compute optimal transport maps and plans for strong and weak transport costs. |
Alexander Korotin; Daniil Selikhanovych; Evgeny Burnaev; |
478 | DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space covering exploration within seconds of interaction. |
Pierre Schumacher; Daniel Haeufle; Dieter Büchler; Syn Schmitt; Georg Martius; |
479 | Optimal Activation Functions for The Random Features Regression Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. |
Jianxin Wang; José Bento; |
480 | Unsupervised Object-Centric Learning with Bi-level Optimized Query Slot Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods, however, have been exceedingly difficult to train without supervision and are ambiguous in the notion of object, especially for complex natural scenes. In this paper, we propose to address these issues by (1) initializing Slot-Attention modules with learnable queries and (2) optimizing the model with bi-level optimization. |
Baoxiong Jia; Yu Liu; Siyuan Huang; |
481 | Augmentation with Projection: Towards An Effective and Efficient Data Augmentation Paradigm for Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose AugPro (Augmentation with Projection), an effective and efficient data augmentation method for distillation. |
Ziqi Wang; Yuexin Wu; Frederick Liu; Daogao Liu; Le Hou; Hongkun Yu; Jing Li; Heng Ji; |
482 | Learning An Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This phenomenon, we term \emph{Feature Replication Hypothesis}, coupled with the \emph{Implicit Bias} of SGD to converge to maximum margin solutions in the feature space, leads the models to rely mostly on the simple features for classification. To mitigate this bias, we propose \emph{Feature Reconstruction Regularizer (FRR)} to ensure that the learned features can be reconstructed back from the logits. |
Sravanti Addepalli; Anshul Nasery; Venkatesh Babu Radhakrishnan; Praneeth Netrapalli; Prateek Jain; |
483 | Imbalanced Semi-supervised Learning with Bias Adaptive Classifier Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such an assumption is far from realistic scenarios and thus severely limits the performance of current pseudo-labeling methods under the context of class-imbalance. To alleviate this problem, we design a bias adaptive classifier that targets the imbalanced SSL setups. |
Renzhen Wang; Xixi Jia; Quanziang Wang; Yichen Wu; Deyu Meng; |
484 | On Compositional Uncertainty Quantification for Seq2seq Graph Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to quantify and evaluate compositional uncertainty for seq2seq graph parsing by proposing a simple probabilistic framework and rigorous evaluation metrics. |
Zi Lin; Jeremiah Zhe Liu; Du Phan; Panupong Pasupat; Jingbo Shang; |
485 | EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an Efficient Unsupervised Reinforcement Learning Framework with Multi-choice Dynamics model (EUCLID), which introduces a novel model-fused paradigm to jointly pre-train the dynamics model and unsupervised exploration policy in the pre-training phase, thus better leveraging the environmental samples and improving the downstream task sampling efficiency. |
Yifu Yuan; Jianye HAO; Fei Ni; Yao Mu; YAN ZHENG; Yujing Hu; Jinyi Liu; Yingfeng Chen; Changjie Fan; |
486 | A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a general framework that unifies model-based and model-free RL, and an Admissible Bellman Characterization (ABC) class that subsumes nearly all Markov decision process (MDP) models in the literature for tractable RL. |
Zixiang Chen; Chris Junchi Li; Huizhuo Yuan; Quanquan Gu; Michael Jordan; |
487 | Sequence to Sequence Text Generation with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is difficult due to the discrete nature of text. We tackle this challenge by proposing DiffuSeq: a diffusion model designed for sequence-to-sequence (Seq2Seq) text generation tasks. |
Shansan Gong; Mukai Li; Jiangtao Feng; Zhiyong Wu; Lingpeng Kong; |
488 | Measure The Predictive Heterogeneity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the predictive heterogeneity can be reliably estimated from finite data with PAC bounds, even in high dimensions. |
Jiashuo Liu; Jiayun Wu; Renjie Pi; Renzhe Xu; Xingxuan Zhang; Bo Li; Peng Cui; |
489 | InPL: Pseudo-labeling The Inliers First for Imbalanced Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new perspective of pseudo-labeling for imbalanced SSL. |
Zhuoran Yu; Yin Li; Yong Jae Lee; |
490 | PandA: Unsupervised Learning of Parts and Appearances in The Feature Maps of GANs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not facilitate localized control, or require some form of supervision through manually provided regions or segmentation masks. In this light, we present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. |
James Oldfield; Christos Tzelepis; Yannis Panagakis; Mihalis Nicolaou; Ioannis Patras; |
491 | Unsupervised Visualization of Image Datasets Using Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we present a new method, called t-SimCNE, for unsupervised visualization of image data. |
Niklas Böhm; Philipp Berens; Dmitry Kobak; |
492 | Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While the underlying neural mechanisms are not fully understood, various evidence supports that synaptic plasticity plays a critical role in memory formation and fast learning. Inspired by these results, we equip Recurrent Neural Networks (RNNs) with plasticity rules to enable them to adapt their parameters according to ongoing experiences. In addition to the traditional local Hebbian plasticity, we propose a global, gradient-based plasticity rule, which allows the model to evolve towards its self-determined target. |
Yu Duan; Zhongfan Jia; Qian Li; Yi Zhong; Kaisheng Ma; |
493 | Learned Index with Dynamic $\epsilon$ Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a mathematically-grounded learned index framework with dynamic $\epsilon$, which is efficient and pluggable to existing learned index methods. |
Daoyuan Chen; Wuchao Li; Yaliang Li; Bolin Ding; Kai Zeng; Defu Lian; Jingren Zhou; |
494 | ViT-Adapter: Exploring Plain Vision Transformer for Accurate Dense Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike recently advanced variants that incorporate vision-specific inductive biases into their architectures, the plain ViT suffers from inferior performance on dense predictions due to weak prior assumptions. To address this issue, we propose the ViT-Adapter, which allows plain ViT to achieve comparable performance to vision-specific transformers. |
Zhe Chen; Yuchen Duan; Wenhai Wang; Junjun He; Tong Lu; Jifeng Dai; Yu Qiao; |
495 | Pareto Invariant Risk Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, these compromises could easily lead to suboptimal performance of either the ERM or OOD objective. To address these issues, we introduce a multi-objective optimization (MOO) perspective to understand the OOD optimization process, and propose a new optimization scheme called PAreto Invariant Risk Minimization (PAIR). |
Yongqiang Chen; Kaiwen Zhou; Yatao Bian; Binghui Xie; Bingzhe Wu; Yonggang Zhang; MA KAILI; Han Yang; Peilin Zhao; Bo Han; James Cheng; |
496 | ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics. |
Olga Golovneva; Moya Peng Chen; Spencer Poff; Martin Corredor; Luke Zettlemoyer; Maryam Fazel-Zarandi; Asli Celikyilmaz; |
497 | Variational Information Pursuit for Interpretable Predictions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP which bypasses the need for learning generative models. |
Aditya Chattopadhyay; Kwan Ho Ryan Chan; Benjamin David Haeffele; Donald Geman; Rene Vidal; |
498 | Deep Learning on Implicit Neural Representations of Shapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we put forward this research problem and propose inr2vec, a framework that can compute a compact latent representation for an input INR in a single inference pass. |
Luca De Luigi; Adriano Cardace; Riccardo Spezialetti; Pierluigi Zama Ramirez; Samuele Salti; Luigi di Stefano; |
499 | Image to Sphere: Learning Equivariant Features for Efficient Pose Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we propose a novel mapping of features from the image domain to the 3D rotation manifold. |
David Klee; Ondrej Biza; Robert Platt; Robin Walters; |
500 | Generalization and Estimation Error Bounds for Model-based Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks, and derive practical design rules that allow one to construct model-based networks with guaranteed high generalization. |
Avner Shultzman; Eyar Azar; Miguel R. D. Rodrigues; Yonina C. Eldar; |
501 | Consolidator: Mergable Adapter with Group Connections for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, some of them incur heavy inference costs despite their storage savings. To tackle these problems, we propose Consolidator to achieve efficient transfer learning for vision transformers. |
Tianxiang Hao; Hui Chen; Yuchen Guo; Guiguang Ding; |
502 | Multivariate Time-series Imputation with Disentangled Temporal Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from existing approaches, we propose TIDER, a novel matrix factorization-based method with disentangled temporal representations that account for multiple factors, namely trend, seasonality, and local bias, to model complex dynamics. |
SHUAI LIU; Xiucheng Li; Gao Cong; Yile Chen; YUE JIANG; |
503 | Characterizing The Influence of Graph Elements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since the nodes/edges in a graph are interdependent in GCNs, it is challenging to derive influence functions for GCNs. To fill this gap, we started with the simple graph convolution (SGC) model that operates on an attributed graph, and formulated an influence function to approximate the changes of model parameters when a node or an edge is removed from an attributed graph. |
Zizhang Chen; Peizhao Li; Hongfu Liu; Pengyu Hong; |
504 | LipsFormer: Introducing Lipschitz Continuity to Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a Lipschitz continuous Transformer, called LipsFormer, to pursue training stability both theoretically and empirically for Transformer-based models. |
Xianbiao Qi; Jianan Wang; Yihao Chen; Yukai Shi; Lei Zhang; |
505 | Neuro-Symbolic Procedural Planning with Commonsense Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper proposes a neuro-symbolic procedural PLANner (PLAN) that elicits procedural planning knowledge from the LLMs with commonsense-infused prompting. |
Yujie Lu; Weixi Feng; Wanrong Zhu; Wenda Xu; Xin Eric Wang; Miguel Eckstein; William Yang Wang; |
506 | Robust Scheduling with GFlowNets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new approach to scheduling by sampling proportionally to the proxy metric using a novel GFlowNet method. |
David W Zhang; Corrado Rainone; Markus Peschl; Roberto Bondesan; |
507 | On The Performance of Temporal Difference Learning With Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we provide a convergence analysis of Neural TD Learning with a projection onto $B(\theta_0, \omega)$, a ball of fixed radius $\omega$ around the initial point $\theta_0$. |
HAOXING TIAN; Ioannis Paschalidis; Alex Olshevsky; |
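The projection onto the ball $B(\theta_0, \omega)$ mentioned in the highlight is a standard Euclidean projection; a minimal sketch of that single step (independent of the paper's TD update itself):

```python
import numpy as np

def project_to_ball(theta, theta0, omega):
    """Euclidean projection of theta onto the closed ball of radius omega
    centered at theta0: rescale the offset whenever it is too long."""
    d = theta - theta0
    n = np.linalg.norm(d)
    if n <= omega:
        return theta
    return theta0 + (omega / n) * d

theta0 = np.zeros(3)
far = np.array([3.0, 4.0, 0.0])                 # distance 5 from theta0
proj = project_to_ball(far, theta0, omega=1.0)  # pulled back onto the sphere
```

Points already inside the ball are returned unchanged; points outside are mapped to the nearest boundary point, so the iterates of projected TD learning stay within a fixed radius of the initialization.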
508 | WikiWhy: Answering and Explaining Cause-and-Effect Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. |
Matthew Ho; Aditya Sharma; Justin Chang; Michael Saxon; Sharon Levy; Yujie Lu; William Yang Wang; |
509 | Data Augmentation Alone Can Improve Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proves that, contrary to previous findings, data augmentation alone can significantly boost accuracy and robustness in adversarial training. |
Lin Li; Michael W. Spratling; |
510 | Spikformer: When Spiking Neural Network Meets Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider leveraging both self-attention capability and biological properties of SNNs, and propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer). |
Zhaokun Zhou; Yuesheng Zhu; Chao He; Yaowei Wang; Shuicheng YAN; Yonghong Tian; Li Yuan; |
511 | NERDS: A General Framework to Train Camera Denoisers from Single Noisy Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To utilize this property, we can adopt noisy/clean image synthesis at low resolution to train camera denoisers. On this basis, we propose a new solution pipeline, NERDS, which estimates camera noise and synthesizes noisy-clean image pairs from only noisy images. |
Heewon Kim; Kyoung Mu Lee; |
512 | Modeling The Data-Generating Process Is Necessary for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the relationship between spurious attributes and the classification label, we obtain realizations of the canonical causal graph that characterize common distribution shifts and show that each shift entails different independence constraints over observed variables. |
Jivat Neet Kaur; Emre Kiciman; Amit Sharma; |
513 | Strong Inductive Biases Provably Prevent Harmless Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator’s inductive bias, i.e., how heavily the estimator favors solutions with a certain structure: while strong inductive biases prevent harmless interpolation, weak inductive biases can even require fitting noise to generalize well. |
Michael Aerni; Marco Milanta; Konstantin Donhauser; Fanny Yang; |
514 | Certified Training: Small Boxes Are All You Need Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the novel certified training method, SABR, which outperforms existing methods across perturbation magnitudes on MNIST, CIFAR-10, and TinyImageNet, in terms of both standard and certifiable accuracies. |
Mark Niklas Mueller; Franziska Eckert; Marc Fischer; Martin Vechev; |
515 | Efficient Certified Training and Robustness Verification of Neural ODEs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite significant progress in robustness verification for standard feed-forward architectures, the verification of high dimensional NODEs remains an open problem. In this work we address this challenge and propose GAINS, an analysis framework for NODEs combining three key ideas: (i) a novel class of ODE solvers, based on variable but discrete time steps, (ii) an efficient graph representation of solver trajectories, and (iii) a novel abstraction algorithm operating on this graph representation. |
Mustafa Zeqiri; Mark Niklas Mueller; Marc Fischer; Martin Vechev; |
516 | Confidence Estimation Using Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first confidence estimation method for a semi-supervised setting, when most training labels are unavailable. |
Chen Li; Xiaoling Hu; Chao Chen; |
517 | Neural Episodic Control with State Abstraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Neural Episodic Control with State Abstraction (NECSA), a simple but effective state abstraction-based episodic control containing a more comprehensive episodic memory, a novel state evaluation, and a multi-step state analysis. |
Zhuo Li; Derui Zhu; Yujing Hu; Xiaofei Xie; Lei Ma; YAN ZHENG; Yan Song; Yingfeng Chen; Jianjun Zhao; |
518 | Leveraging Large Language Models for Multiple Choice Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse tasks and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated. |
Joshua Robinson; David Wingate; |
519 | Relative Representations Enable Zero-shot Latent Space Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to adopt pairwise similarities as an alternative data representation that can be used to enforce the desired invariance without any additional training. |
Luca Moschella; Valentino Maiorca; Marco Fumero; Antonio Norelli; Francesco Locatello; Emanuele Rodolà; |
520 | ILA-DA: Improving Transferability of Intermediate Level Attack with Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, it has been shown that simple image transformations can also enhance attack transferability. Based on these two observations, we propose ILA-DA, which employs three novel augmentation techniques to enhance ILA. |
Chiu Wai Yan; Tsz-Him Cheung; Dit-Yan Yeung; |
521 | Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe a series of experiments: we show that existing knowledge is conserved more strongly in parameter-efficient training and that parameter-efficient scaling scales with model and dataset size. |
Zaid Khan; Yun Fu; |
522 | Real-time Variational Method for Learning Neural Trajectory and Its Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite the potential of real-time alternatives to give immediate feedback to experimentalists, and enhance experimental design, they have received markedly less attention. In this work, we introduce the exponential family variational Kalman filter (eVKF), an online recursive Bayesian method aimed at inferring latent trajectories while simultaneously learning the dynamical system generating them. |
Matthew Dowling; Yuan Zhao; Il Memming Park; |
523 | Minimalistic Unsupervised Learning with The Sparse Manifold Transform Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a minimalistic and interpretable method for unsupervised learning, without resorting to data augmentation, hyperparameter tuning, or other engineering designs, that achieves performance close to the SOTA SSL methods. |
Yubei Chen; Zeyu Yun; Yi Ma; Bruno Olshausen; Yann LeCun; |
524 | Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a self-ensemble protection (SEP) method to take advantage of intermediate checkpoints in a single training process for data protection. |
Sizhe Chen; Geng Yuan; Xinwen Cheng; Yifan Gong; Minghai Qin; Yanzhi Wang; Xiaolin Huang; |
525 | On The Duality Between Contrastive and Non-contrastive Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that contrastive and non-contrastive self-supervised methods can be shown to be closely related, and then study how implementation details impact performance. We validate empirically our findings and significantly improve known behaviours. |
Quentin Garrido; Yubei Chen; Adrien Bardes; Laurent Najman; Yann LeCun; |
526 | AGRO: Adversarial Discovery of Error-prone Groups for Robust Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose AGRO—Adversarial Group discovery for Distributionally Robust Optimization—an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them. |
Bhargavi Paranjape; Pradeep Dasigi; Vivek Srikumar; Luke Zettlemoyer; Hannaneh Hajishirzi; |
527 | Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that in mixed datasets consisting mostly of low-return trajectories with only a few high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by the low-return trajectories and fail to fully exploit the high-performing ones. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. |
Zhang-Wei Hong; Remi Tachet des Combes; Pulkit Agrawal; Romain Laroche; |
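The re-weighting idea can be illustrated with a softmax over trajectory returns; this is a hedged sketch of the general principle (the softmax form and temperature are illustrative choices, not necessarily the paper's exact weighting scheme):

```python
import math

def return_weights(returns, temperature=1.0):
    """Softmax sampling weights over trajectory returns, so that the
    induced artificial dataset is biased toward high-return behavior."""
    m = max(returns)                                   # subtract max for stability
    w = [math.exp((r - m) / temperature) for r in returns]
    s = sum(w)
    return [x / s for x in w]

returns = [1.0, 0.2, 3.5]
w = return_weights(returns)
best = w.index(max(w))     # index of the highest-return trajectory
```

Sampling training batches with these weights (e.g. via `random.choices`) yields an effective behavior policy with a higher return than uniform sampling over the mixed dataset.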
528 | Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent’s network are simultaneously increased. |
Jesse Farebrother; Joshua Greaves; Rishabh Agarwal; Charline Le Lan; Ross Goroshin; Pablo Samuel Castro; Marc G Bellemare; |
529 | Self-Consistency Improves Chain of Thought Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. |
Xuezhi Wang; Jason Wei; Dale Schuurmans; Quoc V Le; Ed H. Chi; Sharan Narang; Aakanksha Chowdhery; Denny Zhou; |
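Self-consistency replaces greedy decoding with sampling several reasoning paths and taking a majority vote over the final answers; a minimal sketch with toy answer strings (the chain-of-thought sampler itself is assumed, not shown):

```python
from collections import Counter

def self_consistency(final_answers):
    """Marginalize over sampled reasoning paths by returning the most
    common final answer (ties broken by first occurrence)."""
    return Counter(final_answers).most_common(1)[0][0]

# final answers parsed from five sampled chain-of-thought completions
answers = ["18", "26", "18", "18", "9"]
consensus = self_consistency(answers)   # -> "18"
```

The vote is over the parsed final answers only, so reasoning paths that differ in wording but agree on the answer reinforce each other.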
530 | Investigating Multi-task Pretraining and Generalization in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose to investigate the generalization capabilities of a popular actor-critic method, IMPALA. |
Adrien Ali Taiga; Rishabh Agarwal; Jesse Farebrother; Aaron Courville; Marc G Bellemare; |
531 | ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a simple neural network building block called ChordMixer which can model the attention for long sequences with variable lengths. |
Ruslan Khalitov; Tong Yu; Lei Cheng; Zhirong Yang; |
532 | Personalized Federated Learning with Feature Alignment and Classifier Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct explicit local-global feature alignment by leveraging global semantic knowledge for learning a better representation. |
Jian Xu; Xinyi Tong; Shao-Lun Huang; |
533 | EA-HAS-Bench: Energy-aware Hyperparameter and Architecture Search Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we present the first large-scale energy-aware benchmark that allows studying AutoML methods to achieve better trade-offs between performance and search energy consumption, named EA-HAS-Bench. |
Shuguang Dou; XINYANG JIANG; Cai Rong Zhao; Dongsheng Li; |
534 | Distributionally Robust Recourse Action Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this assumption does not always hold in practice because of data distribution shifts, and in this case, the recourse action may become invalid. To redress this shortcoming, we propose the Distributionally Robust Recourse Action (DiRRAc) framework, which generates a recourse action that has high probability of being valid under a mixture of model shifts. |
Duy Nguyen; Ngoc Bui; Viet Anh Nguyen; |
535 | A Probabilistic Framework for Task-aligned Intra- and Inter-area Neural Manifold Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we propose a novel probabilistic framework that allows for interpretable partitioning of population variability within and across areas in the context of naturalistic behavior. |
Edoardo Balzani; Jean-Paul G Noel; Pedro Herrero-Vidal; Dora E Angelaki; Cristina Savin; |
536 | Block and Subword-Scaling Floating-Point (BSFP): An Efficient Non-Uniform Quantization For Low Precision Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Block and Subword-Scaling Floating-Point (BSFP), a non-uniform quantization scheme for the skewed and non-uniform distribution of weight vectors in neural networks. |
Yun-Chen Lo; Tse-Kuang Lee; Ren-Shuo Liu; |
537 | Simple Yet Effective Graph Contrastive Learning for Recommendation Highlight: In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues that negatively impact the generality and robustness of CL-based recommenders. |
Xuheng Cai; Chao Huang; Lianghao Xia; Xubin Ren; |
538 | Unbiased Supervised Contrastive Learning Highlight: In this work, we tackle the problem of learning representations that are robust to biases. |
Carlo Alberto Barbano; Benoit Dufumier; Enzo Tartaglione; Marco Grangetto; Pietro Gori; |
539 | SQA3D: Situated Question Answering in 3D Scenes Highlight: We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). |
Xiaojian Ma; Silong Yong; Zilong Zheng; Qing Li; Yitao Liang; Song-Chun Zhu; Siyuan Huang; |
540 | Data Valuation Without Training of A Model Highlight: In this paper, we provide a training-free data valuation score, called complexity-gap score, which is a data-centric score to quantify the influence of individual instances in generalization of two-layer overparameterized neural networks. |
Ki Nohyun; Hoyong Choi; Hye Won Chung; |
541 | HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing Highlight: We present $\texttt{HotProtein}$, a large-scale protein dataset with \textit{growth temperature} annotations of thermostability, containing $182$K amino acid sequences and $3$K folded structures from $230$ different species with a wide temperature range $-20^{\circ}\texttt{C}\sim 120^{\circ}\texttt{C}$. |
Tianlong Chen; Chengyue Gong; Daniel Jesus Diaz; Xuxi Chen; Jordan Tyler Wells; Qiang Liu; Zhangyang Wang; Andrew Ellington; Alex Dimakis; Adam Klivans; |
542 | Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields Highlight: To tackle these issues, in this paper, we propose Switch-NeRF, a novel end-to-end large-scale NeRF with learning-based scene decomposition. |
Zhenxing Mi; Dan Xu; |
543 | Measuring Forgetting of Memorized Training Examples Highlight: When models are trained on large datasets, we show that privacy attacks become less effective on examples seen early in training, and investigate why. |
Matthew Jagielski; Om Thakkar; Florian Tramer; Daphne Ippolito; Katherine Lee; Nicholas Carlini; Eric Wallace; Shuang Song; Abhradeep Guha Thakurta; Nicolas Papernot; Chiyuan Zhang; |
544 | Fundamental Limits in Formal Verification of Message-Passing Neural Networks Highlight: We show that in the context of Message Passing Neural Networks (MPNN), a common Graph Neural Network (GNN) model, formal verification is impossible. |
Marco Sälzer; Martin Lange; |
545 | Part-Based Models Improve Adversarial Robustness Highlight: We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. |
Chawin Sitawarin; Kornrapat Pongmala; Yizheng Chen; Nicholas Carlini; David Wagner; |
546 | GLM-130B: An Open Bilingual Pre-trained Model