# Paper Digest: ICML 2022 Highlights

Download ICML-2022-Paper-Digests.pdf: highlights of all ICML 2022 papers. Readers can also browse these highlights on our console, which lets users filter papers by keyword or author and find related papers, patents, grants, experts, organizations, and code. To browse papers by author and review each author's research work, see our list of top ICML 2022 authors, sorted by the number of accepted papers. You may also like to explore our "Best Paper" Digest (ICML), which lists the most influential ICML papers since 2004.

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2022, it was held in Baltimore, Maryland, USA.

To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly grasp the main idea of each paper.

Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and are behind our services, including "search engine", "summarization", "question answering", and "literature review".

If you do not want to miss interesting academic papers, you are welcome to **sign up for our daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.

Paper Digest Team

New York City, New York, 10017

team@paperdigest.org

#### TABLE 1: Paper Digest: ICML 2022 Highlights


**1. PAC-Bayesian Bounds on Rate-Efficient Classifiers**
Highlight: We derive analytic bounds on the noise invariance of majority vote classifiers operating on compressed inputs.
Authors: Alhabib Abbas; Yiannis Andreopoulos

**2. Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning**
Highlight: Specifically, the loss landscape of MAML is much more complex, with possibly many more saddle points and local minima than its empirical risk minimization counterpart. To address this challenge, we leverage the recently invented sharpness-aware minimization and develop a sharpness-aware MAML approach that we term Sharp-MAML.
Authors: Momin Abbas; Quan Xiao; Lisha Chen; Pin-Yu Chen; Tianyi Chen

**3. An Initial Alignment Between Neural Network and Target Is Needed for Gradient Descent to Learn**
Highlight: This paper introduces the notion of "Initial Alignment" (INAL) between a neural network at initialization and a target function.
Authors: Emmanuel Abbe; Elisabetta Cornacchia; Jan Hazla; Christopher Marquis

**4. Active Sampling for Min-Max Fairness**
Highlight: We propose simple active sampling and reweighting strategies for optimizing min-max fairness that can be applied to any classification or regression model learned via loss minimization.
Authors: Jacob D Abernethy; Pranjal Awasthi; Matthäus Kleindessner; Jamie Morgenstern; Chris Russell; Jie Zhang

**5. Meaningfully Debugging Model Mistakes Using Conceptual Counterfactual Explanations**
Highlight: In this paper, we propose a systematic approach, conceptual counterfactual explanations (CCE), that explains why a classifier makes a mistake on a particular test sample(s) in terms of human-understandable concepts (e.g. this zebra is misclassified as a dog because of faint stripes).
Authors: Abubakar Abid; Mert Yuksekgonul; James Zou

**6. Batched Dueling Bandits**
Highlight: We study the batched K-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality.
Authors: Arpit Agarwal; Rohan Ghuge; Viswanath Nagarajan

**7. Hierarchical Shrinkage: Improving The Accuracy and Interpretability of Tree-based Models**
Highlight: We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm which regularizes the tree not by altering its structure, but by shrinking the prediction over each leaf toward the sample means over each of its ancestors, with weights depending on a single regularization parameter and the number of samples in each ancestor.
Authors: Abhineet Agarwal; Yan Shuo Tan; Omer Ronen; Chandan Singh; Bin Yu
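
The highlight spells out the shrinkage rule precisely enough to sketch. Below is a minimal Python illustration of that rule; the function name and the encoding of a query's root-to-leaf path as (node mean, sample count) pairs are our own illustrative choices, not the authors' code.

```python
def hs_predict(path, lam):
    """Hierarchical-Shrinkage-style prediction for one query point.

    `path` is the root-to-leaf list of (node_mean, n_samples) pairs the
    query traverses; `lam` is the single regularization parameter.  Each
    parent-to-child change in mean is damped more strongly when the
    parent node holds few samples.
    """
    pred = path[0][0]                 # start from the root (global) mean
    for (parent_mean, parent_n), (child_mean, _) in zip(path, path[1:]):
        pred += (child_mean - parent_mean) / (1.0 + lam / parent_n)
    return pred
```

With `lam = 0` the contributions telescope and the plain leaf prediction is recovered; as `lam` grows, the prediction is pulled toward the root mean.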

**8. Deep Equilibrium Networks Are Sensitive to Initialization Statistics**
Highlight: We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized.
Authors: Atish Agarwala; Samuel S Schoenholz

**9. Learning of Cluster-based Feature Importance for Electronic Health Record Time-series**
Highlight: We propose a supervised deep learning model to cluster EHR data based on the identification of clinically understandable phenotypes with regard to both outcome prediction and patient trajectory.
Authors: Henrique Aguiar; Mauro Santos; Peter Watkinson; Tingting Zhu

**10. On The Convergence of The Shapley Value in Parametric Bayesian Learning Games**
Highlight: In this paper, we establish the convergence property of the Shapley value in parametric Bayesian learning games where players perform a Bayesian inference using their combined data, and the posterior-prior KL divergence is used as the characteristic function.
Authors: Lucas Agussurja; Xinyi Xu; Bryan Kian Hsiang Low

**11. Individual Preference Stability for Clustering**
Highlight: In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster.
Authors: Saba Ahmadi; Pranjal Awasthi; Samir Khuller; Matthäus Kleindessner; Jamie Morgenstern; Pattara Sukprasert; Ali Vakilian
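
The IP-stability notion in this highlight is directly checkable on a finite clustering. A small illustrative checker (our own sketch, not the authors' code), using Euclidean distance:

```python
import math

def is_ip_stable(points, labels):
    """Check individual-preference (IP) stability: each point's average
    distance to its own cluster (excluding itself) must not exceed its
    average distance to every other cluster."""
    idx_by_label = {}
    for i, c in enumerate(labels):
        idx_by_label.setdefault(c, []).append(i)
    for i, p in enumerate(points):
        own = [j for j in idx_by_label[labels[i]] if j != i]
        if not own:                    # singleton clusters are trivially stable
            continue
        own_avg = sum(math.dist(p, points[j]) for j in own) / len(own)
        for c, members in idx_by_label.items():
            if c == labels[i]:
                continue
            other_avg = sum(math.dist(p, points[j]) for j in members) / len(members)
            if own_avg > other_avg:
                return False
    return True
```

For example, two well-separated pairs clustered together are IP-stable, while splitting three collinear points badly is not.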

**12. Understanding The Unstable Convergence of Gradient Descent**
Highlight: However, many works have observed that in machine learning applications step sizes often do not fulfill this condition, yet (stochastic) gradient descent still converges, albeit in an unstable manner. We investigate this unstable convergence phenomenon from first principles, and discuss key causes behind it.
Authors: Kwangjun Ahn; Jingzhao Zhang; Suvrit Sra

**13. Minimum Cost Intervention Design for Causal Effect Identification**
Highlight: In this work, we consider the problem of designing the collection of interventions with the minimum cost to identify the desired effect.
Authors: Sina Akbari; Jalal Etesami; Negar Kiyavash

**14. How Faithful Is Your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models**
Highlight: In this paper, we introduce a 3-dimensional evaluation metric, ($\alpha$-Precision, $\beta$-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Authors: Ahmed Alaa; Boris Van Breugel; Evgeny S. Saveliev; Mihaela van der Schaar

**15. A Natural Actor-Critic Framework for Zero-Sum Markov Games**
Highlight: We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two player zero-sum Markov games in the tabular case.
Authors: Ahmet Alacaoglu; Luca Viano; Niao He; Volkan Cevher

**16. Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations**
Highlight: By leveraging Holographic Reduced Representations (HRRs), we create a neural network with a pseudo-encryption style defense that empirically shows robustness to attack, even under threat models that unrealistically favor the adversary.
Authors: Mohammad Mahmudul Alam; Edward Raff; Tim Oates; James Holt

**17. Optimistic Linear Support and Successor Features As A Basis for Optimal Policy Transfer**
Highlight: However, the identified solutions are not guaranteed to be optimal. We introduce a novel algorithm that addresses this limitation.
Authors: Lucas Nunes Alegre; Ana Bazzan; Bruno C. Da Silva

**18. Structured Stochastic Gradient MCMC**
Highlight: Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. To relax these assumptions, this work proposes a new non-parametric variational inference scheme that combines ideas from both SGMCMC and coordinate-ascent VI.
Authors: Antonios Alexos; Alex J Boyd; Stephan Mandt

**19. XAI for Transformers: Better Explanations Through Conservative Propagation**
Highlight: We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify Attention Heads and LayerNorm as main reasons for such unreliable explanations and propose a more stable way for propagation through these layers.
Authors: Ameen Ali; Thomas Schnake; Oliver Eberle; Grégoire Montavon; Klaus-Robert Müller; Lior Wolf

**20. RUMs from Head-to-Head Contests**
Highlight: In this paper, we focus on slates of size two representing head-to-head contests.
Authors: Matteo Almanza; Flavio Chierichetti; Ravi Kumar; Alessandro Panconesi; Andrew Tomkins

**21. Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval**
Highlight: In this paper, we present RetoMaton – retrieval automaton – which approximates the datastore search, based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into "states".
Authors: Uri Alon; Frank Xu; Junxian He; Sudipta Sengupta; Dan Roth; Graham Neubig

**22. Minimax Classification Under Concept Drift with Multidimensional Adaptation and Performance Guarantees**
Highlight: This paper presents adaptive minimax risk classifiers (AMRCs) that account for multidimensional time changes by means of a multivariate and high-order tracking of the time-varying underlying distribution.
Authors: Verónica Álvarez; Santiago Mazuelas; Jose A Lozano

**23. Scalable First-Order Bayesian Optimization Via Structured Automatic Differentiation**
Highlight: Here, we observe that a wide range of kernels gives rise to structured matrices, enabling an exact $O(n^2d)$ matrix-vector multiply for gradient observations and $O(n^2d^2)$ for Hessian observations. Beyond canonical kernel classes, we derive a programmatic approach to leveraging this type of structure for transformations and combinations of the discussed kernel classes, which constitutes a structure-aware automatic differentiation algorithm.
Authors: Sebastian E Ament; Carla P Gomes

**24. Public Data-Assisted Mirror Descent for Private Model Training**
Highlight: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training.
Authors: Ehsan Amid; Arun Ganesh; Rajiv Mathews; Swaroop Ramaswamy; Shuang Song; Thomas Steinke; Vinith M Suriyakumar; Om Thakkar; Abhradeep Thakurta

**25. On Last-Iterate Convergence Beyond Zero-Sum Games**
Highlight: In this paper we provide new results and techniques that apply to broader families of games and learning dynamics.
Authors: Ioannis Anagnostides; Ioannis Panageas; Gabriele Farina; Tuomas Sandholm

**26. Online Algorithms with Multiple Predictions**
Highlight: We give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the performance of the best solution obtained from the predictions.
Authors: Keerti Anand; Rong Ge; Amit Kumar; Debmalya Panigrahi

**27. Learning to Hash Robustly, Guaranteed**
Highlight: In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms, while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance on the minimum-performing query.
Authors: Alexandr Andoni; Daniel Beaglehole

**28. Set Based Stochastic Subsampling**
Highlight: Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. classifier).
Authors: Bruno Andreis; Seanie Lee; A. Tuan Nguyen; Juho Lee; Eunho Yang; Sung Ju Hwang

**29. Towards Understanding Sharpness-Aware Minimization**
Highlight: We argue that the existing justifications for the success of SAM which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima are incomplete.
Authors: Maksym Andriushchenko; Nicolas Flammarion

**30. Fair and Fast K-Center Clustering for Data Summarization**
Highlight: We consider two key issues faced by many clustering methods when used for data summarization, namely (a) an unfair representation of "demographic groups" and (b) distorted summarizations, where data points in the summary represent subsets of the original data of vastly different sizes.
Authors: Haris Angelidakis; Adam Kurpisz; Leon Sering; Rico Zenklusen

**31. Interactive Correlation Clustering with Existential Cluster Constraints**
Highlight: In this paper, we introduce existential cluster constraints: a new form of feedback where users indicate the features of desired clusters.
Authors: Rico Angell; Nicholas Monath; Nishant Yadav; Andrew McCallum

**32. Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging**
Highlight: Current algorithms, however, do not generally offer statistical guarantees that protect against a model's mistakes and hallucinations. To address this, we develop uncertainty quantification techniques with rigorous statistical guarantees for image-to-image regression problems.
Authors: Anastasios N Angelopoulos; Amit Pal Kohli; Stephen Bates; Michael Jordan; Jitendra Malik; Thayer Alshaabi; Srigokul Upadhyayula; Yaniv Romano

**33. AdaGrad Avoids Saddle Points**
Highlight: In this paper, we focus on the AdaGrad family of algorithms – from scalar to full-matrix preconditioning – and we examine the question of whether the method's trajectories avoid saddle points.
Authors: Kimon Antonakopoulos; Panayotis Mertikopoulos; Georgios Piliouras; Xiao Wang

**34. UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees**
Highlight: Our paper aims to bridge this gap by providing a scalable universal method – dubbed UnDERGrad – which enjoys an almost dimension-free oracle complexity in problems with a favorable geometry (like the simplex, $\ell_1$-ball or trace-constraints), while retaining the order-optimal dependence on T described above.
Authors: Kimon Antonakopoulos; Dong Quan Vu; Volkan Cevher; Kfir Levy; Panayotis Mertikopoulos

**35. Adapting The Linearised Laplace Model Evidence for Modern Deep Learning**
Highlight: In this work, we examine the assumptions behind this method, particularly in conjunction with model selection.
Authors: Javier Antoran; David Janz; James U Allingham; Erik Daxberger; Riccardo Rb Barbano; Eric Nalisnick; Jose Miguel Hernandez-Lobato

**36. EAT-C: Environment-Adversarial Sub-Task Curriculum for Efficient Reinforcement Learning**
Highlight: Reinforcement learning (RL) is inefficient on long-horizon tasks due to sparse rewards and its policy can be fragile to slightly perturbed environments. We address these challenges via a curriculum of tasks with coupled environments, generated by two policies trained jointly with RL: (1) a co-operative planning policy recursively decomposing a hard task into a coarse-to-fine sub-task tree; and (2) an adversarial policy modifying the environment in each sub-task.
Authors: Shuang Ao; Tianyi Zhou; Jing Jiang; Guodong Long; Xuan Song; Chengqi Zhang

**37. Online Balanced Experimental Design**
Highlight: In this work, we present algorithms that build on recent advances in online discrepancy minimization which accommodate both arbitrary treatment probabilities and multiple treatments.
Authors: David Arbour; Drew Dimmery; Tung Mai; Anup Rao

**38. VariGrow: Variational Architecture Growing for Task-Agnostic Continual Learning Based on Bayesian Novelty**
Highlight: This paper proposes a variational architecture growing framework dubbed VariGrow.
Authors: Randy Ardywibowo; Zepeng Huo; Zhangyang Wang; Bobak J Mortazavi; Shuai Huang; Xiaoning Qian

**39. Thresholded Lasso Bandit**
Highlight: In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0\ll d$, of these features only.
Authors: Kaito Ariu; Kenshi Abe; Alexandre Proutiere

**40. Gradient Based Clustering**
Highlight: We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality with respect to cluster assignments and cluster center positions.
Authors: Aleksandar Armacki; Dragana Bajovic; Dusan Jakovetic; Soummya Kar
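
To make the general idea concrete, here is a toy sketch (our own illustration, not the paper's algorithm) of gradient descent on the quadratic clustering cost: assign each point to its nearest center, then move each center along the negative gradient of its cost, using $\nabla_c \lVert p - c \rVert^2 = -2(p - c)$.

```python
def gradient_cluster(points, k, lr=0.05, steps=200):
    """Toy distance-based clustering by gradient descent on the quadratic
    clustering cost.  `points` is a list of equal-length tuples."""
    centers = [list(p) for p in points[:k]]     # simple deterministic init
    assign = [0] * len(points)
    for _ in range(steps):
        # hard assignment: each point to its nearest current center
        assign = [min(range(k),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(p, centers[j])))
                  for p in points]
        # gradient step on each center over its assigned points
        for j in range(k):
            for p, a in zip(points, assign):
                if a == j:
                    for d in range(len(p)):
                        centers[j][d] += lr * 2 * (p[d] - centers[j][d])
    return centers, assign
```

On well-separated data this behaves like a gradient-based variant of k-means, with centers drifting toward their clusters' means.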

**41. Understanding Gradient Descent on The Edge of Stability in Deep Learning**
Highlight: The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss.
Authors: Sanjeev Arora; Zhiyuan Li; Abhishek Panigrahi

**42. Private Optimization in The Interpolation Regime: Faster Rates and Hardness Results**
Highlight: In this paper, we investigate differentially private stochastic optimization in the interpolation regime.
Authors: Hilal Asi; Karan Chadha; Gary Cheng; John Duchi

**43. Optimal Algorithms for Mean Estimation Under Local Differential Privacy**
Highlight: In this work, we investigate the question of designing the randomizer with the smallest variance.
Authors: Hilal Asi; Vitaly Feldman; Kunal Talwar
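
For orientation, the classical baseline this line of work improves on is the Laplace randomizer: each user perturbs a bounded value with Laplace noise before reporting it, and the server averages the noisy reports. A minimal sketch (our own, not the paper's optimal randomizer), assuming values lie in [0, 1] so the sensitivity is 1; it uses the fact that the difference of two i.i.d. Exponential(eps) draws is Laplace(0, 1/eps):

```python
import random

def ldp_mean(values, eps, seed=0):
    """Unbiased mean estimate of values in [0, 1] under eps-local DP,
    using the classical Laplace randomizer (noise scale 1/eps per user)."""
    rng = random.Random(seed)
    reports = [x + rng.expovariate(eps) - rng.expovariate(eps)
               for x in values]
    return sum(reports) / len(reports)
```

The per-user noise has variance $2/\varepsilon^2$, so the error of the averaged estimate shrinks as $O(1/(\varepsilon\sqrt{n}))$; the paper asks which randomizer minimizes this variance.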

**44. Asymptotically-Optimal Gaussian Bandits with Side Observations**
Highlight: The LP optimizes the cost (regret) required to reliably estimate the suboptimality gap of each arm. This LP lower bound motivates our main contribution: the first known asymptotically optimal algorithm for this general setting.
Authors: Alexia Atsidakou; Orestis Papadigenopoulos; Constantine Caramanis; Sujay Sanghavi; Sanjay Shakkottai

**45. Congested Bandits: Optimal Routing Via Short-term Resets**
Highlight: Motivated by this, we introduce the problem of Congested Bandits, where each arm's reward is allowed to depend on the number of times it was played in the past $\Delta$ timesteps. For the multi-armed setup, we propose a UCB-style algorithm and show that its policy regret scales as $\tilde{O}(\sqrt{K \Delta T})$.
Authors: Pranjal Awasthi; Kush Bhatia; Sreenivas Gollapudi; Kostas Kollias
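
For readers unfamiliar with the "UCB-style" template, here is the standard stationary UCB1 baseline that such congestion-aware variants build on (a generic sketch, not the paper's algorithm): play each arm once, then repeatedly pick the arm maximizing empirical mean plus an exploration bonus.

```python
import math
import random

def ucb1(reward_fn, n_arms, horizon, seed=0):
    """Standard UCB1 for stationary stochastic bandits.  `reward_fn(arm, rng)`
    returns a reward in [0, 1].  Returns the per-arm pull counts."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                       # initialization: each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += reward_fn(arm, rng)
    return counts
```

The congested setting changes the reward model (an arm's reward depends on its recent play count), which is what forces the paper's modified index and the $\Delta$-dependent regret bound.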

**46. Do More Negative Samples Necessarily Hurt In Contrastive Learning?**
Highlight: We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples.
Authors: Pranjal Awasthi; Nishanth Dikkala; Pritish Kamath

**47. H-Consistency Bounds for Surrogate Loss Minimizers**
Highlight: We present a detailed study of estimation errors in terms of surrogate loss estimation errors.
Authors: Pranjal Awasthi; Anqi Mao; Mehryar Mohri; Yutao Zhong

**48. Iterative Hard Thresholding with Adaptive Regularization: Sparser Solutions Without Sacrificing Runtime**
Highlight: We propose a simple modification to the iterative hard thresholding (IHT) algorithm, which recovers asymptotically sparser solutions as a function of the condition number.
Authors: Kyriakos Axiotis; Maxim Sviridenko
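
Plain IHT, the base algorithm the paper modifies, alternates a gradient step on the least-squares loss with hard thresholding to the $s$ largest entries: $x \leftarrow H_s\big(x - \eta A^\top(Ax - b)\big)$. A minimal pure-Python sketch (illustrative, without the paper's adaptive regularization):

```python
def iht(A, b, s, lr=0.1, steps=500):
    """Iterative hard thresholding for the sparse least-squares problem
    min ||Ax - b||^2 subject to x having at most s nonzeros.
    A is a list of rows; b is a list; returns the s-sparse iterate."""
    n = len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        # gradient of 0.5 * ||Ax - b||^2 is A^T (Ax - b)
        resid = [sum(A[i][j] * x[j] for j in range(n)) - b[i]
                 for i in range(len(A))]
        grad = [sum(A[i][j] * resid[i] for i in range(len(A)))
                for j in range(n)]
        x = [x[j] - lr * grad[j] for j in range(n)]
        # H_s: keep only the s largest-magnitude entries
        keep = set(sorted(range(n), key=lambda j: -abs(x[j]))[:s])
        x = [x[j] if j in keep else 0.0 for j in range(n)]
    return x
```

On a well-conditioned toy instance this recovers the sparse solution; the paper's contribution concerns how the recovered sparsity degrades with the condition number, and how adaptive regularization improves it.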

**49. Proving Theorems Using Incremental Learning and Hindsight Experience Replay**
Highlight: In this paper, we adapt the idea of hindsight experience replay from reinforcement learning to the automated theorem proving domain, so as to use the intermediate data generated during unsuccessful proof attempts.
Authors: Eser Aygün; Ankit Anand; Laurent Orseau; Xavier Glorot; Stephen M McAleer; Vlad Firoiu; Lei M Zhang; Doina Precup; Shibl Mourad

**50. Near-optimal Rate of Consistency for Linear Models with Missing Values**
Highlight: In this paper, we focus on the extensively-studied linear models, but in presence of missing values, which turns out to be quite a challenging task.
Authors: Alexis Ayme; Claire Boyer; Aymeric Dieuleveut; Erwan Scornet

**51. How Tempering Fixes Data Augmentation in Bayesian Neural Networks**
Highlight: In this work we identify two interlaced factors concurrently influencing the strength of the cold posterior effect, namely the correlated nature of augmentations and the degree of invariance of the employed model to such transformations.
Authors: Gregor Bachmann; Lorenzo Noci; Thomas Hofmann

**52. ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD**
Highlight: We introduce (i) $\mathtt{ASAP.SGD}$, an analytical framework capturing necessary and desired properties of staleness-adaptive step size functions and (ii) \textsc{tail}-$\tau$, a method for utilizing key properties of the execution instance, generating a tailored strategy that not only dampens the impact of stale updates, but also leverages fresh ones.
Authors: Karl Bäckström; Marina Papatriantafilou; Philippas Tsigas

**53. From Noisy Prediction to True Label: Noisy Prediction Calibration Via Generative Model**
Highlight: We suggest a new branch of methods, Noisy Prediction Calibration (NPC), for learning with noisy labels.
Authors: Heesun Bae; Seungjae Shin; Byeonghu Na; Joonho Jang; Kyungwoo Song; Il-Chul Moon

**54. Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language**
Highlight: To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision.
Authors: Alexei Baevski; Wei-Ning Hsu; Qiantong Xu; Arun Babu; Jiatao Gu; Michael Auli

**55. End-to-End Balancing for Causal Continuous Treatment-Effect Estimation**
Highlight: We propose a new theory for consistency of entropy balancing for continuous treatments.
Authors: Taha Bahadori; Eric Tchetgen Tchetgen; David Heckerman

**56. A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs**
Highlight: In this paper, we develop a new graph kernel, namely the Hierarchical Transitive-Aligned Kernel, by transitively aligning the vertices between graphs through a family of hierarchical prototype graphs.
Authors: Lu Bai; Lixin Cui; Edwin Hancock

**57. Near-Optimal Learning of Extensive-Form Games with Imperfect Information**
Highlight: We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players.
Authors: Yu Bai; Chi Jin; Song Mei; Tiancheng Yu

**58. Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification**
Highlight: We propose a novel contrastive learning boosted multi-label prediction model based on a Gaussian mixture variational autoencoder (C-GMVAE), which learns a multimodal prior space and employs a contrastive loss.
Authors: Junwen Bai; Shufeng Kong; Carla P Gomes

**59. A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing**
Highlight: However, all the above tasks are in the direction of speech understanding, but for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-quality speech. To address this problem, we propose our framework, Alignment-Aware Acoustic-Text Pretraining (A$^3$T), which reconstructs masked acoustic signals with text input and acoustic-text alignment during training.
Authors: He Bai; Renjie Zheng; Junkun Chen; Mingbo Ma; Xintong Li; Liang Huang

**60. Stability Based Generalization Bounds for Exponential Family Langevin Dynamics**
Highlight: In this paper, we unify and substantially generalize stability based generalization bounds and make three technical contributions.
Authors: Arindam Banerjee; Tiancong Chen; Xinyan Li; Yingxue Zhou

**61. Certified Neural Network Watermarks with Randomized Smoothing**
Highlight: In this paper, we propose the first certifiable watermarking method.
Authors: Arpit Bansal; Ping-Yeh Chiang; Michael J Curry; Rajiv Jain; Curtis Wigington; Varun Manjunatha; John P Dickerson; Tom Goldstein

**62. Data Scaling Laws in NMT: The Effect of Noise and Architecture**
Highlight: In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).
Authors: Yamini Bansal; Behrooz Ghorbani; Ankush Garg; Biao Zhang; Colin Cherry; Behnam Neyshabur; Orhan Firat

**63. Learning Stable Classifiers By Transferring Unstable Features**
Highlight: In this work, we explicitly inform the target classifier about unstable features in the source tasks.
Authors: Yujia Bao; Shiyu Chang; Regina Barzilay

**64. Fast Composite Optimization and Statistical Recovery in Federated Learning**
Highlight: On the optimization front, we propose a new algorithm named Fast Federated Dual Averaging for strongly convex and smooth loss and establish state-of-the-art iteration and communication complexity in the composite setting.
Authors: Yajie Bao; Michael Crawshaw; Shan Luo; Mingrui Liu

**65. Generative Modeling for Multi-task Visual Learning**
Highlight: In this paper, motivated by multi-task learning of shareable feature representations, we consider a novel problem of learning a shared generative model that is useful across various visual perception tasks.
Authors: Zhipeng Bao; Martial Hebert; Yu-Xiong Wang

**66. Estimating The Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models**
Highlight: In this work, we consider diagonal and full covariances to improve the expressive power of DPMs.
Authors: Fan Bao; Chongxuan Li; Jiacheng Sun; Jun Zhu; Bo Zhang

**67. On The Surrogate Gap Between Contrastive and Supervised Losses**
Highlight: Following the simplified setting where positive pairs are drawn from the true distribution (not generated by data augmentation; as supposed in previous studies), this study establishes surrogate upper and lower bounds for the downstream classification loss for all negative sample sizes that best explain the empirical observations on the negative sample size in the earlier studies.
Authors: Han Bao; Yoshihiro Nagano; Kento Nozawa

68 | Representation Topology Divergence: A Method for Comparing Neural Network RepresentationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a method for comparing two data representations. |
Serguei Barannikov; Ilya Trofimov; Nikita Balabin; Evgeny Burnaev; |

69 | Sparse Mixed Linear Regression with Guarantees: Taming An Intractable Problem with Invex RelaxationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we study the problem of sparse mixed linear regression on an unlabeled dataset that is generated from linear measurements from two different regression parameter vectors. |
Adarsh Barik; Jean Honorio; |

70 | Neural Fisher Discriminant Analysis: Optimal Neural Network Embeddings in Polynomial TimeRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce a natural extension of FLDA that employs neural networks, called Neural Fisher Discriminant Analysis (NFDA). |
Burak Bartan; Mert Pilanci; |

71 | Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic GamesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper proposes an extension of a popular decentralized discrete-time learning procedure when repeating a static game called fictitious play (FP) (Brown, 1951; Robinson, 1951) to a dynamic model called discounted stochastic game (Shapley, 1953). |
Lucas Baudin; Rida Laraki; |

72 | Information Discrepancy in Strategic LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We initiate the study of the effects of non-transparency in decision rules on individuals’ ability to improve in strategic learning settings. |
Yahav Bechavod; Chara Podimata; Steven Wu; Juba Ziani; |

73 | On The Hidden Biases of Policy Mirror Ascent in Continuous Action SpacesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. |
Amrit Singh Bedi; Souradip Chakraborty; Anjaly Parayil; Brian M Sadler; Pratap Tokekar; Alec Koppel; |

74 | Imitation Learning By Estimating Expertise of DemonstratorsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms. |
Mark Beliaev; Andy Shih; Stefano Ermon; Dorsa Sadigh; Ramtin Pedarsani; |

75 | Matching Normalizing Flows and Probability Paths on ManifoldsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose to train CNFs on manifolds by minimizing probability path divergence (PPD), a novel family of divergences between the probability density path generated by the CNF and a target probability density path. |
Heli Ben-Hamu; Samuel Cohen; Joey Bose; Brandon Amos; Maximillian Nickel; Aditya Grover; Ricky T. Q. Chen; Yaron Lipman; |

76 | Stochastic Contextual Dueling Bandits Under Linear Stochastic Transitivity ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a computationally efficient algorithm, CoLSTIM, which makes its choice based on imitating the feedback process using perturbed context-dependent utility estimates of the underlying CoLST model. |
Viktor Bengs; Aadirupa Saha; Eyke Hüllermeier; |

77 | Neural Inverse KinematicsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose a neural IK method that employs the hierarchical structure of the problem to sequentially sample valid joint angles conditioned on the desired position and on the preceding joints along the chain. |
Raphael Bensadoun; Shir Gur; Nitsan Blau; Lior Wolf; |

78 | Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian ProcessesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To address this fundamental limitation, we show how to re-cast a class of stochastic volatility models as a hierarchical Gaussian process (GP) model with specialized covariance functions. |
Gregory Benton; Wesley Maddox; Andrew Gordon Wilson; |

79 | Gradient Descent on Neurons and Its Link to Approximate Second-order OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This challenges widely held beliefs and immediately raises the question of why KFAC performs so well. Towards answering this question we present evidence strongly suggesting that KFAC approximates a first-order algorithm, which performs gradient descent on neurons rather than weights. |
Frederik Benzing; |

80 | Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft ConstraintsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the hard-threshold problem of achieving sublinear regret while guaranteeing that the threshold constraint is satisfied at every iteration with high probability. |
Martino Bernasconi; Federico Cacciamani; Matteo Castiglioni; Alberto Marchesi; Nicola Gatti; Francesco Trovò; |

81 | Skin Deep Unlearning: Artefact and Instrument Debiasing in The Context of Melanoma ClassificationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we robustly remove bias and spurious variation from an automated melanoma classification pipeline using two leading bias unlearning techniques. |
Peter Bevan; Amir Atapour-Abarghouei; |

82 | Approximate Bayesian Computation with Domain Expert in The LoopRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we introduce an active learning method for ABC statistics selection which reduces the domain expert’s work considerably. |
Ayush Bharti; Louis Filstroff; Samuel Kaski; |

83 | Minimax M-estimation Under Adversarial ContaminationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To illustrate the usefulness of the derived robust M-estimator in an online setting, we present a bandit algorithm for the partially identifiable best arm identification problem that improves upon the sample complexity of the state of the art algorithms. |
Sujay Bhatt; Guanhua Fang; Ping Li; Gennady Samorodnitsky; |

84 | Nearly Optimal Catoni’s M-estimator for Infinite VarianceRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we extend the remarkable M-estimator of Catoni (2012) to situations where the variance is infinite. |
Sujay Bhatt; Guanhua Fang; Ping Li; Gennady Samorodnitsky; |

85 | Personalization Improves Privacy-Accuracy Tradeoffs in Federated LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint) differential privacy. |
Alberto Bietti; Chen-Yu Wei; Miroslav Dudik; John Langford; Steven Wu; |

86 | Non-Vacuous Generalisation Bounds for Shallow Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. |
Felix Biggs; Benjamin Guedj; |

87 | Structure-preserving GANsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences. |
Jeremiah Birrell; Markos Katsoulakis; Luc Rey-Bellet; Wei Zhu; |

88 | Scalable Spike-and-SlabRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this article, we propose Scalable Spike-and-Slab ($S^3$), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George & McCulloch (1993). |
Niloy Biswas; Lester Mackey; Xiao-Li Meng; |

89 | Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate A Combination of The Same Core QuantitiesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: The goal of this paper is to recognize common objectives as well as to identify the implicit scoring functions of different OOD detection methods. |
Julian Bitterwolf; Alexander Meinke; Maximilian Augustin; Matthias Hein; |

90 | A Query-optimal Algorithm for Finding CounterfactualsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. |
Guy Blanc; Caleb Koch; Jane Lange; Li-Yang Tan; |

91 | Popular Decision Tree Algorithms Are Provably Noise TolerantRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. |
Guy Blanc; Jane Lange; Ali Malik; Li-Yang Tan; |

92 | Optimizing Sequential Experimental Design with Deep Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, these methods may not sufficiently explore the design space, require access to a differentiable probabilistic model and can only optimize over continuous design spaces. Here, we address these limitations by showing that the problem of optimizing policies can be reduced to solving a Markov decision process (MDP). |
Tom Blau; Edwin V. Bonilla; Iadine Chades; Amir Dezfouli; |

93 | Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)Related Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. |
Huang Bojun; |

94 | Generalized Results for The Existence and Consistency of The MLE in The Bradley-Terry-Luce ModelRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we study the performance of the Bradley-Terry-Luce model for ranking from pairwise comparison data under more realistic settings than those considered in the literature so far. |
Heejong Bong; Alessandro Rinaldo; |

95 | How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment PerspectiveRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Leveraging NTK theory, we show theoretically that gradient descent drives layerwise weight updates that are aligned with their input activity correlations weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. |
Akhilan Boopathy; Ila Fiete; |

96 | Improving Language Models By Retrieving from Trillions of TokensRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. |
Sebastian Borgeaud; Arthur Mensch; Jordan Hoffmann; Trevor Cai; Eliza Rutherford; Katie Millican; George Bm Van Den Driessche; Jean-Baptiste Lespiau; Bogdan Damoc; Aidan Clark; Diego De Las Casas; Aurelia Guy; Jacob Menick; Roman Ring; Tom Hennigan; Saffron Huang; Loren Maggiore; Chris Jones; Albin Cassirer; Andy Brock; Michela Paganini; Geoffrey Irving; Oriol Vinyals; Simon Osindero; Karen Simonyan; Jack Rae; Erich Elsen; Laurent Sifre; |

97 | Lie Point Symmetry Data Augmentation for Neural PDE SolversRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Thus, we are presented with a proverbial chicken-and-egg problem. In this paper, we present a method, which can partially alleviate this problem, by improving neural PDE solver sample complexity—Lie point symmetry data augmentation (LPSDA). |
Johannes Brandstetter; Max Welling; Daniel E Worrall; |

98 | An Iterative Clustering Algorithm for The Contextual Stochastic Block Model with Optimality GuaranteesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a new iterative algorithm to cluster networks with side information for nodes (in the form of covariates) and show that our algorithm is optimal under the Contextual Symmetric Stochastic Block Model. |
Guillaume Braun; Hemant Tyagi; Christophe Biernacki; |

99 | Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical SystemsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Motivated by the emerging principles of dendritic computation, we augment a dynamically interpretable and mathematically tractable piecewise-linear (PL) recurrent neural network (RNN) by a linear spline basis expansion. |
Manuel Brenner; Florian Hess; Jonas M Mikhaeil; Leonard F Bereska; Zahra Monfared; Po-Chen Kuo; Daniel Durstewitz; |

100 | Learning to Predict Graphs with Fused Gromov-Wasserstein BarycentersRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This paper introduces a novel and generic framework to solve the flagship task of supervised labeled graph prediction by leveraging Optimal Transport tools. |
Luc Brogat-Motte; Rémi Flamary; Celine Brouard; Juho Rousu; Florence d'Alché-Buc; |

101 | Efficient Learning of CNNs Using Patch Based FeaturesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Recent work has demonstrated the effectiveness of using patch based representations when learning from image data. Here we provide theoretical support for this observation, by showing that a simple semi-supervised algorithm that uses patch statistics can efficiently learn labels produced by a one-hidden-layer Convolutional Neural Network (CNN). |
Alon Brutzkus; Amir Globerson; Eran Malach; Alon Regev Netser; Shai Shalev-Schwartz; |

102 | Causal Structure-based Root Cause Analysis of OutliersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present a formal method to identify "root causes" of outliers among variables. |
Kailash Budhathoki; Lenon Minorics; Patrick Bloebaum; Dominik Janzing; |

103 | IGLUE: A Benchmark for Transfer Learning Across Modalities, Tasks, and LanguagesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. |
Emanuele Bugliarello; Fangyu Liu; Jonas Pfeiffer; Siva Reddy; Desmond Elliott; Edoardo Maria Ponti; Ivan Vulic; |

104 | Interactive Inverse Reinforcement Learning for Cooperative GamesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the problem of designing autonomous agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. |
Thomas Kleine Büning; Anne-Marie George; Christos Dimitrakakis; |

105 | Convolutional and Residual Networks Provably Contain Lottery TicketsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We prove that also modern architectures consisting of convolutional and residual layers that can be equipped with almost arbitrary activation functions can contain lottery tickets with high probability. |
Rebekka Burkholz; |

106 | Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest PathRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce a new algorithm with stronger sample complexity bounds than existing ones. |
Haoyuan Cai; Tengyu Ma; Simon Du; |

107 | Convergence of Invariant Graph NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we investigate the convergence of one powerful GNN, Invariant Graph Network (IGN) over graphs sampled from graphons. |
Chen Cai; Yusu Wang; |

108 | Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample EfficiencyRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an $\epsilon$-optimal policy within $O(1/\epsilon^2)$ episodes. |
Qi Cai; Zhuoran Yang; Zhaoran Wang; |

109 | Scaling Gaussian Process Optimization By Evaluating A Few Unique Candidates Multiple TimesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We show that sequential black-box optimization based on GPs (GP-Opt) can be made efficient by sticking to a candidate solution for multiple evaluation steps and switch only when necessary. |
Daniele Calandriello; Luigi Carratino; Alessandro Lazaric; Michal Valko; Lorenzo Rosasco; |

110 | Adaptive Gaussian Process Change Point DetectionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Detecting change points in time series, i.e., points in time at which some observed process suddenly changes, is a fundamental task that arises in many real-world applications, with consequences for safety and reliability. In this work, we propose ADAGA, a novel Gaussian process-based solution to this problem that leverages a powerful heuristic we developed based on statistical hypothesis testing. |
Edoardo Caldarelli; Philippe Wenk; Stefan Bauer; Andreas Krause; |

111 | Measuring Dissimilarity with Diffeomorphism InvarianceRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data’s internal structure to be invariant to diffeomorphisms. |
Théophile Cantelobre; Carlo Ciliberto; Benjamin Guedj; Alessandro Rudi; |

112 | A Model-Agnostic Randomized Learning Framework Based on Random Hypothesis Subspace SamplingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a model-agnostic randomized learning framework based on Random Hypothesis Subspace Sampling (RHSS). |
Yiting Cao; Chao Lan; |

113 | Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical ApplicationsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not apply in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. |
Alexandre Capone; Armin Lederer; Sandra Hirche; |

114 | Burst-Dependent Plasticity and Dendritic Amplification Support Target-Based Learning and Hierarchical Imitation LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a multi-compartment model of a pyramidal neuron, in which bursts and dendritic input segregation make it possible to plausibly support biological target-based learning. |
Cristiano Capone; Cosimo Lupo; Paolo Muratore; Pier Stanislao Paolucci; |

115 | A Marriage Between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame SolvingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In particular, we propose a new, suitable game representation that we call team-public-information, in which a team is represented as a single coordinator who only knows information common to the whole team and prescribes to each member an action for any possible private state. |
Luca Carminati; Federico Cacciamani; Marco Ciccone; Nicola Gatti; |

116 | RECAPP: Crafting A More Efficient Catalyst for Convex OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high accuracy subproblem solutions. |
Yair Carmon; Arun Jambulapati; Yujia Jin; Aaron Sidford; |

117 | Estimating and Penalizing Induced Preference Shifts in Recommender SystemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We focus on induced preference shifts in users. |
Micah D Carroll; Anca Dragan; Stuart Russell; Dylan Hadfield-Menell; |

118 | YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for EveryoneRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. |
Edresson Casanova; Julian Weber; Christopher D Shulby; Arnaldo Candido Junior; Eren Gölge; Moacir A Ponti; |

119 | The Infinite Contextual Graph Markov ModelRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: As with most Deep Graph Networks, an inherent limitation is the need to perform an extensive model selection to choose the proper size of each layer’s latent representation. In this paper, we address this problem by introducing the Infinite Contextual Graph Markov Model (iCGMM), the first deep Bayesian nonparametric model for graph learning. |
Daniele Castellana; Federico Errica; Davide Bacciu; Alessio Micheli; |

120 | Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned DataRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. |
Timothy J Castiglia; Anirban Das; Shiqiang Wang; Stacy Patterson; |

121 | Online Learning with Knapsacks: The Best of Both WorldsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints. |
Matteo Castiglioni; Andrea Celli; Christian Kroer; |

122 | Stabilizing Off-Policy Deep Reinforcement Learning from PixelsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards. |
Edoardo Cetin; Philip J Ball; Stephen Roberts; Oya Celiktutan; |

123 | Accelerated, Optimal and Parallel: Some Results on Model-based Stochastic OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose an acceleration scheme for the APROX family and provide non-asymptotic convergence guarantees, which are order-optimal in all problem-dependent constants and provide even larger minibatching speedups. |
Karan Chadha; Gary Cheng; John Duchi; |

124 | Robust Imitation Learning Against Variations in Environment DynamicsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a robust imitation learning (IL) framework that improves the robustness of IL when environment dynamics are perturbed. |
Jongseong Chae; Seungyul Han; Whiyoung Jung; Myungsik Cho; Sungho Choi; Youngchul Sung; |

125 | Fairness with Adaptive WeightsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose a novel adaptive reweighing method to address representation bias. |
Junyi Chai; Xiaoqian Wang; |

126 | UNIREX: A Unified Learning Framework for Language Model Rationale ExtractionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Although attribution algorithms and select-predict pipelines are commonly used in rationale extraction, they both rely on certain heuristics that hinder them from satisfying all three desiderata. In light of this, we propose UNIREX, a flexible learning framework which generalizes rationale extractor optimization as follows: (1) specify architecture for a learned rationale extractor; (2) select explainability objectives (i.e., faithfulness and plausibility criteria); and (3) jointly train the task model and rationale extractor on the task using selected objectives. |
Aaron Chan; Maziar Sanjabi; Lambert Mathias; Liang Tan; Shaoliang Nie; Xiaochang Peng; Xiang Ren; Hamed Firooz; |

127 | Revisiting Label Smoothing and Knowledge Distillation Compatibility: What Was Missing?Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: The main contributions of our work are the discovery, analysis and validation of systematic diffusion as the missing concept which is instrumental in understanding and resolving these contradictory findings. |
Keshigeyan Chandrasegaran; Ngoc-Trung Tran; Yunqing Zhao; Ngai-Man Cheung; |

128 | Style Equalization: Unsupervised Learning of Controllable Generative Sequence ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we tackle the training-inference mismatch encountered during unsupervised learning of controllable generative sequence models. |
Jen-Hao Rick Chang; Ashish Shrivastava; Hema Koppula; Xiaoshuai Zhang; Oncel Tuzel; |

129 | Learning Bellman Complete Representations for Offline Policy EvaluationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. |
Jonathan Chang; Kaiwen Wang; Nathan Kallus; Wen Sun; |

130 | Sample Efficient Learning of Predictors That Complement HumansRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we provide the first theoretical analysis of the benefit of learning complementary predictors in expert deferral. |
Mohammad-Amin Charusaie; Hussein Mozannar; David Sontag; Samira Samadi; |

131 | Nyström Kernel Mean EmbeddingsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose an efficient approximation procedure based on the Nyström method, which exploits a small random subset of the dataset. |
Antoine Chatalic; Nicolas Schreuder; Lorenzo Rosasco; Alessandro Rudi; |

132 | Coarsening The Granularity: Towards Structurally Sparse Lottery TicketsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general. |
Tianlong Chen; Xuxi Chen; Xiaolong Ma; Yanzhi Wang; Zhangyang Wang; |

133 | Learning Domain Adaptive Object Detection with Probabilistic TeacherRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we present a simple yet effective framework, termed as Probabilistic Teacher (PT), which aims to capture the uncertainty of unlabeled target data from a gradually evolving teacher and guides the learning of a student in a mutually beneficial manner. |
Meilin Chen; Weijie Chen; Shicai Yang; Jie Song; Xinchao Wang; Lei Zhang; Yunfeng Yan; Donglian Qi; Yueting Zhuang; Di Xie; Shiliang Pu; |

134 | The Fundamental Price of Secure Aggregation in Differentially Private Federated LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider the problem of training a $d$-dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round. |
Wei-Ning Chen; Christopher A Choquette Choo; Peter Kairouz; Ananda Theertha Suresh; |

135 | Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. |
Mayee Chen; Daniel Y Fu; Avanika Narayan; Michael Zhang; Zhao Song; Kayvon Fatahalian; Christopher Re; |

136 | Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and RiskRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. |
Tianrui Chen; Aditya Gangrade; Venkatesh Saligrama; |

137 | On The Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we extend the uniform-PAC sample complexity from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an $\tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2)+1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity, where $d$ is the dimension of the feature mapping, $\gamma \in (0,1)$ is the discount factor of the MDP and $\epsilon$ is the accuracy parameter. |
Yuanzhou Chen; Jiafan He; Quanquan Gu; |

138 | Streaming Algorithms for Support-Aware HistogramsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: As a result, even relatively simple distributions cannot be approximated by succinct histograms without incurring large error. In this paper, we address this issue by adapting the definition of approximation so that only the errors of the items that belong to the support of the distribution are considered. |
Justin Chen; Piotr Indyk; Tal Wagner; |

139 | Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDPRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). |
Liyu Chen; Rahul Jain; Haipeng Luo; |

140 | Learning Infinite-horizon Average-reward Markov Decision Process with ConstraintsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. |
Liyu Chen; Rahul Jain; Haipeng Luo; |

141 | Active Multi-Task Representation LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance. |
Yifang Chen; Kevin Jamieson; Simon Du; |

142 | On Collective Robustness of Bagging Against Data PoisoningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Based on this analysis, we propose hash bagging to improve the robustness of vanilla bagging almost for free. |
Ruoxin Chen; Zenan Li; Jie Li; Junchi Yan; Chentao Wu; |

143 | Online Active RegressionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. |
Cheng Chen; Yi Li; Yiming Sun; |

144 | Selling Data To A Machine Learner: Pricing Via Costly SignalingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider a new problem of selling data to a machine learner who looks to purchase data to train his machine learning model. |
Junjie Chen; Minming Li; Haifeng Xu; |

145 | ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart DiseasesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose a novel disease-aware generative adversarial network for multi-view ECG synthesis called ME-GAN, which attains panoptic electrocardio representations conditioned on heart diseases and projects the representations onto multiple standard views to yield ECG signals. |
Jintai Chen; Kuanlun Liao; Kun Wei; Haochao Ying; Danny Z Chen; Jian Wu; |

146 | Weisfeiler-Lehman Meets Gromov-WassersteinRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose the Weisfeiler-Lehman (WL) distance, a notion of distance between labeled measure Markov chains (LMMCs), of which labeled graphs are special cases. |
Samantha Chen; Sunhyuk Lim; Facundo Memoli; Zhengchao Wan; Yusu Wang; |
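As background for the entry above: the paper's WL distance builds on classic Weisfeiler-Lehman relabeling. A minimal generic sketch of 1-WL color refinement (an illustration of the standard procedure, not the paper's labeled-Markov-chain construction):

```python
from collections import Counter

def wl_refine(adj, labels, rounds=3):
    """1-WL color refinement: each node's new label hashes its old
    label together with the sorted multiset of its neighbors' labels."""
    labels = dict(labels)
    for _ in range(rounds):
        labels = {
            v: hash((labels[v], tuple(sorted(labels[u] for u in adj[v]))))
            for v in adj
        }
    return labels

def wl_signature(adj, labels, rounds=3):
    """Graph-level summary: the multiset of refined node labels.
    Graphs distinguished by 1-WL get different signatures."""
    return Counter(wl_refine(adj, labels, rounds).values())
```

For example, a triangle and a 3-node path (all nodes initially labeled 0) receive different signatures after refinement.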

147 | On Non-local Convergence Analysis of Deep Linear NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we study the non-local convergence properties of deep linear networks. |
Kun Chen; Dachao Lin; Zhihua Zhang; |

148 | Flow-based Recurrent Belief State Learning for POMDPsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we introduce the \textbf{F}l\textbf{O}w-based \textbf{R}ecurrent \textbf{BE}lief \textbf{S}tate model (FORBES), which incorporates normalizing flows into the variational inference to learn general continuous belief states for POMDPs. |
Xiaoyu Chen; Yao Mark Mu; Ping Luo; Shengbo Li; Jianyu Chen; |

149 | Structure-Aware Transformer for Graph Representation LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose several methods for automatically generating the subgraph representation and show theoretically that the resulting representations are at least as expressive as the subgraph representations. |
Dexiong Chen; Leslie O'Bray; Karsten Borgwardt; |

150 | The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure AggregationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce the Poisson Binomial mechanism (PBM), a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics. |
Wei-Ning Chen; Ayfer Ozgur; Peter Kairouz; |

151 | Learning Mixtures of Linear Dynamical SystemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the problem of learning a mixture of multiple linear dynamical systems (LDSs) from unlabeled short sample trajectories, each generated by one of the LDS models. |
Yanxi Chen; H. Vincent Poor; |

152 | On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy EvaluationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. |
Xiaohong Chen; Zhengling Qi; |

153 | Faster Fundamental Graph Algorithms Via Learned PredictionsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider the question of speeding up classic graph algorithms with machine-learned predictions. |
Justin Chen; Sandeep Silwal; Ali Vakilian; Fred Zhang; |

154 | Improve Single-Point Zeroth-Order Optimization Using High-Pass and Low-Pass FiltersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we borrow the idea of high-pass and low-pass filters from extremum seeking control (continuous-time version of SZO) and develop a novel SZO method called HLF-SZO by integrating these filters. |
Xin Chen; Yujie Tang; Na Li; |

155 | Deep Variational Graph Convolutional Recurrent Network for Multivariate Time Series Anomaly DetectionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we model sensor dependency and stochasticity within MTS by developing an embedding-guided probabilistic generative network. |
Wenchao Chen; Long Tian; Bo Chen; Liang Dai; Zhibin Duan; Mingyuan Zhou; |

156 | Auxiliary Learning with Joint Task and Data SchedulingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose to learn a joint task and data schedule for auxiliary learning, which captures the importance of different data samples in each auxiliary task to the target task. |
Hong Chen; Xin Wang; Chaoyu Guan; Yue Liu; Wenwu Zhu; |

157 | Optimization-Induced Graph Implicit Nonlinear DiffusionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Due to the over-smoothing issue, most existing graph neural networks can only capture limited dependencies with their inherently finite aggregation layers. To overcome this limitation, we propose a new kind of graph convolution, called Graph Implicit Nonlinear Diffusion (GIND), which implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing. |
Qi Chen; Yifei Wang; Yisen Wang; Jiansheng Yang; Zhouchen Lin; |

158 | Robust Meta-learning with Sampling Noise and Label Noise Via Eigen-ReptileRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Besides, when handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise on a corrupted dataset. To address these two challenges, we present Eigen-Reptile (ER) that updates the meta-parameters with the main direction of historical task-specific parameters. |
Dong Chen; Lingfei Wu; Siliang Tang; Xiao Yun; Bo Long; Yueting Zhuang; |

159 | Adaptive Model Design for Markov Decision ProcessRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Hence, appropriate regulations are often required, if we hope to take the external costs/benefits of its actions into consideration. In this paper, we study how to regulate such an agent by redesigning model parameters that can affect the rewards and/or the transition kernels. |
Siyu Chen; Donglin Yang; Jiayang Li; Senmiao Wang; Zhuoran Yang; Zhaoran Wang; |

160 | State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Inspired by the state transition of dendritic spines in the filopodial model of spinogenesis, we model different states of SNN weights, facilitating weight optimization for pruning. |
Yanqi Chen; Zhaofei Yu; Wei Fang; Zhengyu Ma; Tiejun Huang; Yonghong Tian; |

161 | Efficient Online ML API Selection for Multi-Label Classification TasksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting the user’s budget. |
Lingjiao Chen; Matei Zaharia; James Zou; |

162 | Data-Efficient Double-Win Lottery Tickets from Robust Pre-trainingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in which a located subnetwork from a pre-trained model can be independently transferred on diverse downstream tasks, to reach BOTH the same standard and robust generalization, under BOTH standard and adversarial training regimes, as the full pre-trained model can do. |
Tianlong Chen; Zhenyu Zhang; Sijia Liu; Yang Zhang; Shiyu Chang; Zhangyang Wang; |

163 | Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable RobustnessRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To trade off the DNN expressiveness (which calls for more non-linearity) and robustness certification scalability (which prefers more linearity), we propose a novel solution to strategically manipulate neurons, by "grafting" appropriate levels of linearity. |
Tianlong Chen; Huan Zhang; Zhenyu Zhang; Shiyu Chang; Sijia Liu; Pin-Yu Chen; Zhangyang Wang; |

164 | Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function ApproximationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose the first optimistic model-based algorithm for PbRL with general function approximation, which estimates the model using value-targeted regression and calculates the exploratory policies by solving an optimistic planning problem. |
Xiaoyu Chen; Han Zhong; Zhuoran Yang; Zhaoran Wang; Liwei Wang; |

165 | Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time AnalysisRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. |
Ziyi Chen; Yi Zhou; Rong-Rong Chen; Shaofeng Zou; |

166 | Task-aware Privacy Preservation for Multi-dimensional DataRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we address how to significantly improve the ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. |
Jiangnan Cheng; Ao Tang; Sandeep Chinchali; |

167 | Adversarially Trained Actor Critic for Offline Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. |
Ching-An Cheng; Tengyang Xie; Nan Jiang; Alekh Agarwal; |

168 | Quantum-Inspired Algorithms from Randomized Numerical Linear AlgebraRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. |
Nadiia Chepurko; Kenneth Clarkson; Lior Horesh; Honghao Lin; David Woodruff; |

169 | RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random ForestsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. |
Victor Chernozhukov; Whitney Newey; Víctor M Quintas-Martínez; Vasilis Syrgkanis; |

170 | Self-supervised Learning with Random-projection Quantizer for Speech RecognitionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We present a simple and effective self-supervised learning approach for speech recognition. |
Chung-Cheng Chiu; James Qin; Yu Zhang; Jiahui Yu; Yonghui Wu; |

171 | Discrete Probabilistic Inverse Optimal TransportRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We formalize and systematically analyze the properties of IOT using tools from the study of entropy-regularized OT. |
Wei-Ting Chiu; Pei Wang; Patrick Shafto; |

172 | Selective Network Linearization for Efficient Private InferenceRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. |
Minsu Cho; Ameya Joshi; Brandon Reagen; Siddharth Garg; Chinmay Hegde; |

173 | From Block-Toeplitz Matrices to Differential Equations on Graphs: Towards A General Theory for Scalable Masked TransformersRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformers architectures in a scalable way. |
Krzysztof Choromanski; Han Lin; Haoxian Chen; Tianyi Zhang; Arijit Sehanobish; Valerii Likhosherstov; Jack Parker-Holder; Tamas Sarlos; Adrian Weller; Thomas Weingarten; |

174 | Shuffle Private Linear Contextual BanditsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a general algorithmic framework for linear contextual bandits under the shuffle trust model, where there exists a trusted shuffler – in between users and the central server– that randomly permutes a batch of users data before sending those to the server. |
Sayak Ray Chowdhury; Xingyu Zhou; |
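The shuffle trust model mentioned in the entry above is easy to picture: a trusted intermediary permutes a batch of (already locally randomized) user reports before forwarding them, severing the link between identity and report. A generic sketch of just the shuffler, not the paper's bandit algorithm:

```python
import random

def shuffle_batch(reports, rng=random):
    """Trusted shuffler: return a uniformly random permutation of a
    batch of user reports before they are sent to the central server."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled
```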

175 | DNA: Domain Generalization with Diversified Neural AveragingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Methodologically, we propose a diversified neural averaging (DNA) method for DG, which optimizes the proposed PAC-Bayes bound approximately. |
Xu Chu; Yujie Jin; Wenwu Zhu; Yasha Wang; Xin Wang; Shanghang Zhang; Hong Mei; |

176 | TPC: Transformation-Specific Smoothing for Point Cloud ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a transformation-specific smoothing framework TPC, which provides tight and scalable robustness guarantees for point cloud models against semantic transformation attacks. |
Wenda Chu; Linyi Li; Bo Li; |

177 | Unified Scaling Laws for Routed Language ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. |
Aidan Clark; Diego De Las Casas; Aurelia Guy; Arthur Mensch; Michela Paganini; Jordan Hoffmann; Bogdan Damoc; Blake Hechtman; Trevor Cai; Sebastian Borgeaud; George Bm Van Den Driessche; Eliza Rutherford; Tom Hennigan; Matthew J Johnson; Albin Cassirer; Chris Jones; Elena Buchatskaya; David Budden; Laurent Sifre; Simon Osindero; Oriol Vinyals; Marc'Aurelio Ranzato; Jack Rae; Erich Elsen; Koray Kavukcuoglu; Karen Simonyan; |

178 | Context-Aware Drift DetectionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Instead we may wish to test for differences in the distributions conditional on context that is permitted to change. To facilitate this we borrow machinery from the causal inference domain to develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects. |
Oliver Cobb; Arnaud Van Looveren; |

179 | On The Robustness of CountSketch to Adaptive InputsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior "blackbox" approaches. |
Edith Cohen; Xin Lyu; Jelani Nelson; Tamas Sarlos; Moshe Shechner; Uri Stemmer; |

180 | Diffusion Bridges Vector Quantized Variational AutoencodersRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a new model to train the prior and the encoder/decoder networks simultaneously. |
Max Cohen; Guillaume Quispe; Sylvain Le Corff; Charles Ollion; Eric Moulines; |

181 | Online and Consistent Correlation ClusteringRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we study the problem in the classic online setting with recourse; The vertices of the graphs arrive in an online manner and the goal is to maintain an approximate clustering while minimizing the number of times each vertex changes cluster. |
Vincent Cohen-Addad; Silvio Lattanzi; Andreas Maggiori; Nikos Parotsidis; |

182 | Massively Parallel $k$-Means Clustering for Perturbation Resilient InstancesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider $k$-means clustering of $n$ data points in Euclidean space in the Massively Parallel Computation (MPC) model, a computational model which is an abstraction of modern massively parallel computing system such as MapReduce. |
Vincent Cohen-Addad; Vahab Mirrokni; Peilin Zhong; |

183 | One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence StreamsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose an efficient sampling routine that uses an online representation of the data distribution as a prefilter to retain elements from rare groups. |
Benjamin Coleman; Benito Geordie; Li Chou; R. A. Leo Elworth; Todd Treangen; Anshumali Shrivastava; |

184 | Transfer and Marginalize: Explaining Away Label Noise with Privileged InformationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We develop a simple and efficient method for supervised learning with neural networks: it transfers via weight sharing the knowledge learned with privileged information and approximately marginalizes over privileged information at test time. |
Mark Collier; Rodolphe Jenatton; Effrosyni Kokiopoulou; Jesse Berent; |

185 | MAML and ANIL Provably Learn RepresentationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we prove that two well-known GBML methods, MAML and ANIL, as well as their first-order approximations, are capable of learning common representation among a set of given tasks. |
Liam Collins; Aryan Mokhtari; Sewoong Oh; Sanjay Shakkottai; |

186 | Entropic Causal Inference: Graph IdentifiabilityRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In our work, we first extend the causal graph identifiability result in the two-variable setting under relaxed assumptions. We then show the first identifiability result using the entropic approach for learning causal graphs with more than two nodes. |
Spencer Compton; Kristjan Greenewald; Dmitriy A Katz; Murat Kocaoglu; |

187 | Mitigating Gender Bias in Face Recognition Using The Von Mises-Fisher Mixture ModelRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we investigate the gender bias of deep Face Recognition networks. |
Jean-Rémy Conti; Nathan Noiry; Stephan Clemencon; Vincent Despiegel; Stéphane Gentric; |

188 | Counterfactual Transportability: A Formal ApproachRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we investigate the transportability of counterfactuals from an arbitrary combination of observational and experimental distributions coming from disparate domains. |
Juan D Correa; Sanghack Lee; Elias Bareinboim; |

189 | Label-Free Explainability for Unsupervised ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem. To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time. |
Jonathan Crabbé; Mihaela van der Schaar; |

190 | Evaluating The Adversarial Robustness of Adaptive Test-time DefensesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: While these results are disappointing, we still believe that adaptive test-time defenses are a promising avenue of research and, as such, we provide recommendations for their thorough evaluation. |
Francesco Croce; Sven Gowal; Thomas Brunner; Evan Shelhamer; Matthias Hein; Taylan Cemgil; |

191 | Adversarial Robustness Against Multiple and Single $l_p$-Threat Models Via Quick Fine-Tuning of Robust ClassifiersRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper we propose Extreme norm Adversarial Training (E-AT) for multiple-norm robustness which is based on geometric properties of $l_p$-balls. |
Francesco Croce; Matthias Hein; |

192 | Self-conditioning Pre-Trained Language ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). |
Xavier Suau Cuadros; Luca Zappella; Nicholas Apostoloff; |

193 | Only Tails Matter: Average-Case Universality and Robustness in The Convex RegimeRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This work shows that the concentration of eigenvalues near the edges of the ESD determines a problem’s asymptotic average complexity. |
Leonardo Cunha; Gauthier Gidel; Fabian Pedregosa; Damien Scieur; Courtney Paquette; |

194 | Principal Component FlowsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we characterize the geometric structure of flows using principal manifolds and understand the relationship between latent variables and samples using contours. |
Edmond Cunningham; Adam D Cobb; Susmit Jha; |

195 | Deep Symbolic Regression for Recurrence PredictionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. |
Stéphane D'Ascoli; Pierre-Alexandre Kamienny; Guillaume Lample; Francois Charton; |

196 | Continuous Control with Action Quantization from DemonstrationsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). |
Robert Dadashi; Léonard Hussenot; Damien Vincent; Sertan Girgin; Anton Raichuk; Matthieu Geist; Olivier Pietquin; |

197 | Dialog Inpainting: Turning Documents Into DialogsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. |
Zhuyun Dai; Arun Tejasvi Chaganty; Vincent Y Zhao; Aida Amini; Qazi Mamunur Rashid; Mike Green; Kelvin Guu; |

198 | DisPFL: Towards Communication-Efficient Personalized Federated Learning Via Decentralized Sparse TrainingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named DisPFL, which employs personalized sparse masks to customize sparse local models on the edge. |
Rong Dai; Li Shen; Fengxiang He; Xinmei Tian; Dacheng Tao; |

199 | Marginal Distribution Adaptation for Discrete Sets Via Module-Oriented Divergence MinimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We develop a general framework to adapt a generative model subject to a (possibly counterfactual) target data distribution with both sampling and computation efficiency. |
Hanjun Dai; Mengjiao Yang; Yuan Xue; Dale Schuurmans; Bo Dai; |

200 | Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a novel formulation for the Inverse Reinforcement Learning (IRL) problem, which jointly accounts for the compatibility with the expert behavior of the identified reward and its effectiveness for the subsequent forward learning phase. |
Angelo Damiani; Giorgio Manganini; Alberto Maria Metelli; Marcello Restelli; |

201 | Understanding Robust Generalization in Learning Regular LanguagesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We hypothesize that standard end-to-end modeling strategies cannot generalize well to systematic distribution shifts and propose a compositional strategy to address this. |
Soham Dan; Osbert Bastani; Dan Roth; |

202 | Unsupervised Image Representation Learning with Deep Latent ParticlesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a new representation of visual data that disentangles object position from appearance. |
Tal Daniel; Aviv Tamar; |

203 | Guarantees for Epsilon-Greedy Reinforcement Learning with Function ApproximationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. |
Chris Dann; Yishay Mansour; Mehryar Mohri; Ayush Sekhari; Karthik Sridharan; |
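For background on the entry above: epsilon-greedy, the myopic exploration policy being analyzed, is simple to state. A minimal generic sketch of the policy itself (an illustration of the standard rule, not the paper's regret analysis):

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a uniformly random action;
    otherwise pick the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0` this reduces to purely greedy action selection.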

204 | Monarch: Expressive Structured Matrices for Efficient and Accurate TrainingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: These methods have not seen widespread adoption (1) in end-to-end training due to unfavorable efficiency–quality tradeoffs, and (2) in dense-to-sparse fine-tuning due to lack of tractable algorithms to approximate a given dense weight matrix. To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms). |
Tri Dao; Beidi Chen; Nimit S Sohoni; Arjun Desai; Michael Poli; Jessica Grogan; Alexander Liu; Aniruddh Rao; Atri Rudra; Christopher Re; |

205 | Score-Guided Intermediate Level Optimization: Fast Langevin Mixing for Inverse ProblemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In practice, to allow for increased expressivity, we propose to do posterior sampling in the latent space of a pre-trained generative model. |
Giannis Daras; Yuval Dagan; Alex Dimakis; Constantinos Daskalakis; |

206 | Test-Time Training Can Close The Natural Distribution Shift Performance Gap in Deep Learning Based Compressed SensingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a domain adaptation method for deep learning based compressive sensing that relies on self-supervision during training paired with test-time training at inference. |
Mohammad Zalbagi Darestani; Jiayu Liu; Reinhard Heckel; |

207 | Knowledge Base Question Answering By Case-based Reasoning Over SubgraphsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Leveraging this structural similarity between local neighborhoods of different subgraphs, we introduce a semiparametric model (CBR-SUBG) with (i) a nonparametric component that for each query, dynamically retrieves other similar $k$-nearest neighbor (KNN) training queries along with query-specific subgraphs and (ii) a parametric component that is trained to identify the (latent) reasoning patterns from the subgraphs of KNN queries and then apply them to the subgraph of the target query. |
Rajarshi Das; Ameya Godbole; Ankita Naik; Elliot Tower; Manzil Zaheer; Hannaneh Hajishirzi; Robin Jia; Andrew Mccallum; |

208 | Framework for Evaluating Faithfulness of Local ExplanationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the faithfulness of an explanation system to the underlying prediction model. |
Sanjoy Dasgupta; Nave Frost; Michal Moshkovitz; |

209 | Distinguishing Rule and Exemplar-based Generalization in Learning SystemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: The trade-off between exemplar- and rule-based generalization has been studied extensively in cognitive psychology; in this work, we present a protocol inspired by these experimental approaches to probe the inductive biases that control this trade-off in category-learning systems such as artificial neural networks. |
Ishita Dasgupta; Erin Grant; Tom Griffiths; |

210 | Robust Multi-Objective Bayesian Optimization Under Input NoiseRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Since directly optimizing MVaR is computationally infeasible in many settings, we propose a scalable, theoretically-grounded approach for optimizing MVaR using random scalarizations. |
Samuel Daulton; Sait Cakmak; Maximilian Balandat; Michael A. Osborne; Enlu Zhou; Eytan Bakshy; |

211 | Attentional Meta-learners for Few-shot Polythetic ClassificationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, we find that in the presence of task-irrelevant features, inherent to meta-learning problems, attentional models are susceptible to misclassification. To address this challenge, we propose a self-attention feature-selection mechanism that adaptively dilutes non-discriminative features. |
Ben J Day; Ramon Viñas Torné; Nikola Simidjievski; Pietro Liò; |

212 | Adversarial Vulnerability of Randomized EnsemblesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, this impressive performance raises the question: Are these robustness gains provided by randomized ensembles real? In this work we address this question both theoretically and empirically. |
Hassan Dbouk; Naresh Shanbhag; |

213 | Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We introduce a novel framework for optimization based on energy-conserving Hamiltonian dynamics in a strongly mixing (chaotic) regime and establish its key properties analytically and numerically. |
Giuseppe Bruno De Luca; Eva Silverstein; |

214 | Error-driven Input Modulation: Solving The Credit Assignment Problem Without A Backward PassRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Here, we propose to replace the backward pass with a second forward pass in which the input signal is modulated based on the error of the network. |
Giorgia Dellaferrera; Gabriel Kreiman; |

215 | DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical RepresentationsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a reconstruction-free MBRL agent, called DreamerPro, that can enhance robustness to distractions. |
Fei Deng; Ingook Jang; Sungjin Ahn; |

216 | NeuralEF: Deconstructing Kernels By Deep Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, the existing method relies on an expensive orthogonalization step and is difficult to implement. We show that these problems can be fixed by using a new series of objective functions that generalizes the EigenGame to function space. |
Zhijie Deng; Jiaxin Shi; Jun Zhu; |

217 | Deep Causal Metric LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, this can lead the model to recklessly learn all the correlated distances found in training data including the spurious distance (e.g., background differences) that is not the distance of interest and can harm the generalization of the learned metric. To address this issue, we study metric learning from a causality perspective and accordingly propose deep causal metric learning (DCML) that pursues the true causality of the distance between samples. |
Xiang Deng; Zhongfei Zhang; |

218 | On The Convergence of Inexact Predictor-Corrector Methods for Linear ProgrammingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To remedy this, we theoretically and empirically analyze (slightly modified) predictor-corrector IPMs when using approximate linear solvers: our approach guarantees that, when certain conditions are satisfied, the number of IPM iterations does not increase and that the final solution remains feasible. |
Gregory Dexter; Agniva Chowdhury; Haim Avron; Petros Drineas; |

219 | Analysis of Stochastic Processes Through Replay BuffersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we analyze a system where a stochastic process X is pushed into a replay buffer and then randomly sampled to generate a stochastic process Y from the replay buffer. |
Shirli Di-Castro; Shie Mannor; Dotan Di Castro; |

220 | Streaming Algorithms for High-Dimensional Robust StatisticsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). |
Ilias Diakonikolas; Daniel M. Kane; Ankit Pensia; Thanasis Pittas; |

221 | Learning General Halfspaces with Adversarial Label Noise Via Online Gradient DescentRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we show that the problem can be solved directly via online gradient descent applied to a sequence of natural non-convex surrogates. |
Ilias Diakonikolas; Vasilis Kontonis; Christos Tzamos; Nikos Zarifis; |

222 | Variational Feature Pyramid NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we opt to learn a dataset-specific architecture for Feature Pyramid Networks. |
Panagiotis Dimitrakopoulos; Giorgos Sfikas; Christophoros Nikou; |

223 | Understanding Doubly Stochastic ClusteringRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, the analysis of why this projection improves clustering has been limited. In this paper we present theoretical conditions on the given affinity matrix under which its doubly stochastic projection is an ideal affinity matrix (i.e., it has no false connections between clusters, and is well-connected within each cluster). |
Tianjiao Ding; Derek Lim; Rene Vidal; Benjamin D Haeffele; |

224 | Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic ConvergenceRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To learn a Nash equilibrium of an MPG in which the size of state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. |
Dongsheng Ding; Chen-Yu Wei; Kaiqing Zhang; Mihailo Jovanovic; |

225 | Generalization and Robustness Implications in Object-Centric LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation metrics and downstream object property prediction. |
Andrea Dittadi; Samuele S Papa; Michele De Vita; Bernhard Schölkopf; Ole Winther; Francesco Locatello; |

226 | Fair Generalized Linear Models with A Convex PenaltyRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. |
Hyungrok Do; Preston Putzel; Axel S Martin; Padhraic Smyth; Judy Zhong; |

227 | Bayesian Learning with Information Gain Provably Bounds Risk for A Robust Adversarial DefenseRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present a new algorithm to learn a deep neural network model robust against adversarial attacks. |
Bao Gia Doan; Ehsan M Abbasnejad; Javen Qinfeng Shi; Damith Ranashinghe; |

228 | On The Adversarial Robustness of Causal Algorithmic RecourseRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we formulate the adversarially robust recourse problem and show that recourse methods that offer minimally costly recourse fail to be robust. |
Ricardo Dominguez-Olmedo; Amir H Karimi; Bernhard Schölkopf; |

229 | Finding The Task-Optimal Low-Bit Sub-Distribution in Deep Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models and smoothly approximated with a concrete Gaussian Mixture (GM). |
Runpei Dong; Zhanhong Tan; Mengdi Wu; Linfeng Zhang; Kaisheng Ma; |

230 | PACE: A Parallelizable Computation Encoder for Directed Acyclic GraphsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a Parallelizable Attention-based Computation structure Encoder (PACE) that processes nodes simultaneously and encodes DAGs in parallel. |
Zehao Dong; Muhan Zhang; Fuhai Li; Yixin Chen; |

231 | Privacy for Free: How Does Dataset Condensation Help Privacy?Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we for the first time identify that dataset condensation (DC) which is originally designed for improving training efficiency is also a better solution to replace the traditional data generators for private data generation, thus providing privacy for free. |
Tian Dong; Bo Zhao; Lingjuan Lyu; |

232 | Fast Rates for Noisy Interpolation Require Rethinking The Effect of Inductive BiasRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Even though this intuition is valid for regularized models, in this paper we caution against a strong inductive bias for interpolation in the presence of noise: While a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise. |
Konstantin Donhauser; Nicolò Ruggeri; Stefan Stojanovic; Fanny Yang; |

233 | Adapting to Mixing Time in Stochastic Optimization with Markovian DataRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. |
Ron Dorfman; Kfir Yehuda Levy; |

234 | TACTiS: Transformer-Attentional Copulas for Time SeriesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we address the problem of estimating the joint predictive distribution of high-dimensional multivariate time series. |
Alexandre Drouin; Étienne Marcotte; Nicolas Chapados; |

235 | Branching Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model. |
Yihan Du; Wei Chen; |

236 | Bayesian Imitation Learning for End-to-End Mobile ManipulationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work we investigate and demonstrate benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator. |
Yuqing Du; Daniel Ho; Alex Alemi; Eric Jang; Mohi Khansari; |

237 | GLaM: Efficient Scaling of Language Models with Mixture-of-ExpertsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. |
Nan Du; Yanping Huang; Andrew M Dai; Simon Tong; Dmitry Lepikhin; Yuanzhong Xu; Maxim Krikun; Yanqi Zhou; Adams Wei Yu; Orhan Firat; Barret Zoph; Liam Fedus; Maarten P Bosma; Zongwei Zhou; Tao Wang; Emma Wang; Kellie Webster; Marie Pellat; Kevin Robinson; Kathleen Meier-Hellstern; Toju Duke; Lucas Dixon; Kun Zhang; Quoc Le; Yonghui Wu; Zhifeng Chen; Claire Cui; |

238 | Learning Iterative Reasoning Through Energy MinimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we present a new framework for iterative reasoning with neural networks. |
Yilun Du; Shuang Li; Joshua Tenenbaum; Igor Mordatch; |

239 | SE(3) Equivariant Graph Neural Networks with Complete Local FramesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently. |
Weitao Du; He Zhang; Yuanqi Du; Qi Meng; Wei Chen; Nanning Zheng; Bin Shao; Tie-Yan Liu; |

240 | A Context-Integrated Transformer-Based Neural Network for Auction DesignRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, these works either focus on a fixed set of bidders and items, or restrict the auction to be symmetric. In this work, we overcome such limitations by factoring public contextual information of bidders and items into the auction learning framework. |
Zhijian Duan; Jingwu Tang; Yutong Yin; Zhe Feng; Xiang Yan; Manzil Zaheer; Xiaotie Deng; |

241 | Augment with Care: Contrastive Learning for Combinatorial ProblemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We find that label-preserving augmentations are critical for the success of contrastive pre-training. |
Haonan Duan; Pashootan Vaezipoor; Max B Paulus; Yangjun Ruan; Chris Maddison; |

242 | Parametric Visual Program Induction with Function ModularizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose the concept of parametric visual program induction. |
Xuguang Duan; Xin Wang; Ziwei Zhang; Wenwu Zhu; |

243 | Bayesian Deep Embedding Topic Meta-LearnerRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose a novel framework that efficiently solves the problem of topic modeling under the small data regime. |
Zhibin Duan; Yishi Xu; Jianqiao Sun; Bo Chen; Wenchao Chen; Chaojie Wang; Mingyuan Zhou; |

244 | Deletion Robust Submodular Maximization Over MatroidsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we study the deletion robust version of the problem under the classic matroids constraint. |
Paul Duetting; Federico Fusco; Silvio Lattanzi; Ashkan Norouzi-Fard; Morteza Zadimoghaddam; |

245 | From Data to Functa: Your Data Point Is A Function and You Can Treat It Like OneRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output the appropriate measurement value for any input spatial location. In this paper, we take this idea to its next level: what would it take to perform deep learning on these functions instead, treating them as data? |
Emilien Dupont; Hyunjik Kim; S. M. Ali Eslami; Danilo Jimenez Rezende; Dan Rosenbaum; |

246 | Efficient Low Rank Convex Bounds for Pairwise Discrete Graphical ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we extend a Burer-Monteiro style method to compute low rank Semi-Definite Programming (SDP) bounds for the MAP problem on discrete graphical models with an arbitrary number of states and arbitrary pairwise potentials. |
Valentin Durante; George Katsirelos; Thomas Schiex; |

247 | Robust Counterfactual Explanations for Tree-Based EnsemblesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose a novel strategy – that we call RobX – to generate robust counterfactuals for tree-based ensembles, e.g., XGBoost. |
Sanghamitra Dutta; Jason Long; Saumitra Mishra; Cecilia Tilli; Daniele Magazzeni; |

248 | On The Difficulty of Defending Self-Supervised Learning Against Model ExtractionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We thus explore model stealing attacks against SSL. |
Adam Dziedzic; Nikita Dhawan; Muhammad Ahmad Kaleem; Jonas Guan; Nicolas Papernot; |

249 | LIMO: Latent Inceptionism for Targeted Molecule GenerationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We present Latent Inceptionism on Molecules (LIMO), which significantly accelerates molecule generation with an inceptionism-like technique. |
Peter Eckmann; Kunyang Sun; Bo Zhao; Mudong Feng; Michael Gilson; Rose Yu; |

250 | Inductive Biases and Variable Creation in Self-Attention MechanismsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To support our analysis, we present synthetic experiments to probe the sample complexity of learning sparse Boolean functions with Transformers. |
Benjamin L Edelman; Surbhi Goel; Sham Kakade; Cyril Zhang; |

251 | Provable Reinforcement Learning with A Short-Term MemoryRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. |
Yonathan Efroni; Chi Jin; Akshay Krishnamurthy; Sobhan Miryoosefi; |

252 | Sparsity in Partially Controllable Linear SystemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In particular, our structural results characterize those state variables which are irrelevant for optimal control, an analysis which departs from classical control techniques. |
Yonathan Efroni; Sham Kakade; Akshay Krishnamurthy; Cyril Zhang; |

253 | FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we introduced a novel framework called FedNew in which there is no need to transmit Hessian information from clients to PS, hence resolving the bottleneck to improve communication efficiency. |
Anis Elgabli; Chaouki Ben Issaid; Amrit Singh Bedi; Ketan Rajawat; Mehdi Bennis; Vaneet Aggarwal; |

254 | PathGCN: Learning General Graph Spatial Operators from PathsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we propose pathGCN, a novel approach to learn the spatial operator from random paths on the graph. |
Moshe Eliasof; Eldad Haber; Eran Treister; |

255 | Discrete Tree Flows Via Tree-Structured PermutationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Our approach seeks to reduce computational burden and remove the need for pseudo-gradients by developing a discrete flow based on decision trees—building upon the success of efficient tree-based methods for classification and regression for discrete data. |
Mai Elkady; Hyung Zin Lim; David I Inouye; |

256 | For Learning in Symmetric Teams, Local Optima Are Global Nash EquilibriaRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. |
Scott Emmons; Caspar Oesterheld; Andrew Critch; Vincent Conitzer; Stuart Russell; |

257 | Streaming Algorithm for Monotone K-Submodular Maximization with Cardinality ConstraintsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we develop a new streaming algorithm for maximizing a monotone k-submodular function subject to a per-coordinate cardinality constraint attaining an approximation guarantee close to the state of the art guarantee in the offline setting. |
Alina Ene; Huy Nguyen; |

258 | Towards Scaling Difference Target Propagation By Learning Backprop TargetsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. |
Maxence M Ernoult; Fabrice Normandin; Abhinav Moudgil; Sean Spinney; Eugene Belilovsky; Irina Rish; Blake Richards; Yoshua Bengio; |

259 | Understanding Dataset Difficulty with $\mathcal{V}$-Usable InformationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To address these questions, we frame dataset difficulty, with respect to a model $\mathcal{V}$, as the lack of $\mathcal{V}$-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for $\mathcal{V}$. |
Kawin Ethayarajh; Yejin Choi; Swabha Swayamdipta; |

260 | Head2Toe: Utilizing Intermediate Representations for Better Transfer LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target-domain. |
Utku Evci; Vincent Dumoulin; Hugo Larochelle; Michael C Mozer; |

261 | Variational Sparse Coding with Learned ThresholdingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples, avoiding the use of problematic relaxations. |
Kion Fallah; Christopher J Rozell; |

262 | Training Discrete Deep Generative Models Via Gapped Straight-Through EstimatorRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead. |
Ting-Han Fan; Ta-Chung Chi; Alexander I. Rudnicky; Peter J Ramadge; |

263 | DRIBO: Robust Deep Reinforcement Learning Via Multi-View Information BottleneckRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Specifically, we introduce a novel contrastive version of the Multi-View Information Bottleneck (MIB) objective for temporal data. |
Jiameng Fan; Wenchao Li; |

264 | Generalized Data Distribution IterationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To obtain higher sample efficiency and superior final performance simultaneously has been one of the major challenges for deep reinforcement learning (DRL). Previous work could handle one of these challenges but typically failed to address them concurrently. In this paper, we try to tackle these two challenges simultaneously. |
Jiajun Fan; Changnan Xiao; |

265 | Variational Wasserstein Gradient FlowRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This paper builds on the recent works with a slight but crucial difference: we propose to utilize a variational formulation of the objective function formulated as maximization over a parametric class of functions. |
Jiaojiao Fan; Qinsheng Zhang; Amirhossein Taghvaei; Yongxin Chen; |

266 | Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. |
Alex Fang; Gabriel Ilharco; Mitchell Wortsman; Yuhao Wan; Vaishaal Shankar; Achal Dave; Ludwig Schmidt; |

267 | Bayesian Continuous-Time Tucker DecompositionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: They either drop the timestamps or bin them into crude steps and hence ignore the temporal dynamics within each step or use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition. |
Shikai Fang; Akil Narayan; Robert Kirby; Shandian Zhe; |

268 | Byzantine Machine Learning Made Easy By Resilient Averaging of MomentumsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present RESAM (RESilient Averaging of Momentums), a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. |
Sadegh Farhadkhani; Rachid Guerraoui; Nirupam Gupta; Rafael Pinot; John Stephan; |

269 | An Equivalence Between Data Poisoning and Byzantine Gradient AttacksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. |
Sadegh Farhadkhani; Rachid Guerraoui; Lê Nguyên Hoang; Oscar Villemaud; |

270 | Investigating Generalization By Controlling Normalized MarginRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: The paper finds that yes: in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior. |
Alexander R Farhang; Jeremy D Bernstein; Kushal Tirumala; Yang Liu; Yisong Yue; |

271 | Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging The Gap Between Learning in Extensive-Form and Normal-Form GamesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we show that the Optimistic Multiplicative Weights Update (OMWU) algorithm—the premier learning algorithm for NFGs—can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick. |
Gabriele Farina; Chung-Wei Lee; Haipeng Luo; Christian Kroer; |

272 | Local Linear Convergence of Douglas-Rachford for Linear Programming: A Probabilistic AnalysisRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we analyze the local linear convergence rate $r$ of the DRS method for random linear programs, and give explicit and tight bounds on $r$. |
Oisin Faust; Hamza Fawzi; |

273 | Matching Structure for Dual LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose to further enhance dual learning with structure matching that explicitly builds structural connections in between. |
Hao Fei; Shengqiong Wu; Yafeng Ren; Meishan Zhang; |

274 | Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. |
Yingjie Fei; Ruitu Xu; |

275 | Private Frequency Estimation Via Projective GeometryRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. |
Vitaly Feldman; Jelani Nelson; Huy Nguyen; Kunal Talwar; |

276 | An Intriguing Property of Geophysics InversionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To alleviate those issues, recent studies leverage deep neural networks to learn the inversion mappings from measurements to the property directly. In this paper, we show that such a mapping can be well modeled by a very shallow (but not wide) network with only five layers. |
Yinan Feng; Yinpeng Chen; Shihang Feng; Peng Jin; Zicheng Liu; Youzuo Lin; |

277 | Principled Knowledge Extrapolation with GANsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose to study counterfactual synthesis from a new perspective of knowledge extrapolation, where a given knowledge dimension of the data distribution is extrapolated, but the remaining knowledge is kept indistinguishable from the original distribution. |
Ruili Feng; Jie Xiao; Kecheng Zheng; Deli Zhao; Jingren Zhou; Qibin Sun; Zheng-Jun Zha; |

278 | A Resilient Distributed Boosting AlgorithmRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present a distributed boosting algorithm which is resilient to a limited amount of noise. |
Yuval Filmus; Idan Mehalel; Shay Moran; |

279 | Model-Value Inconsistency As A Signal for Epistemic UncertaintyRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Using a model of the environment and a value function, an agent can construct many estimates of a state’s value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). |
Angelos Filos; Eszter Vértes; Zita Marinho; Gregory Farquhar; Diana Borsa; Abram Friesen; Feryal Behbahani; Tom Schaul; Andre Barreto; Simon Osindero; |

280 | Coordinated Double Machine LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: While this methodology is flexible and can accommodate arbitrary predictive models, typically trained independently of one another, this paper argues that a carefully coordinated learning algorithm for deep neural networks may reduce the estimation bias. |
Nitai Fingerhut; Matteo Sesia; Yaniv Romano; |

281 | Conformal Prediction Sets with Limited False PositivesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We develop a new approach to multi-label conformal prediction in which we aim to output a precise set of promising prediction candidates with a bounded number of incorrect answers. |
Adam Fisch; Tal Schuster; Tommi Jaakkola; Regina Barzilay; |

282 | Fast Population-Based Reinforcement Learning on A Single MachineRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we compare implementations and revisit previous studies to show that the judicious use of compilation and vectorization allows population-based training to be performed on a single machine with one accelerator with minimal overhead compared to training a single agent. |
Arthur Flajolet; Claire Bizon Monroc; Karim Beguir; Thomas Pierrot; |

283 | Fast Relative Entropy Coding with A* CodingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce AS* and AD* coding, two REC algorithms based on A* sampling. |
Gergely Flamich; Stratis Markou; Jose Miguel Hernandez-Lobato; |

284 | Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and FairnessRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. |
Adam Foster; Arpi Vezer; Craig A. Glastonbury; Paidi Creed; Samer Abujudeh; Aaron Sim; |

285 | Label Ranking Through Nonparametric RegressionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce a generative model for Label Ranking, in noiseless and noisy nonparametric regression settings, and provide sample complexity bounds for learning algorithms in both cases. |
Dimitris Fotakis; Alkis Kalavasis; Eleni Psaroudaki; |

286 | A Neural Tangent Kernel Perspective of GANsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a novel theoretical framework of analysis for Generative Adversarial Networks (GANs). |
Jean-Yves Franceschi; Emmanuel De Bézenac; Ibrahim Ayed; Mickael Chen; Sylvain Lamprier; Patrick Gallinari; |

287 | Extracting Latent State Representations with Linear Dynamics from Rich ObservationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider a setting where there is a hidden linear subspace of the high-dimensional feature space in which the dynamics are linear. |
Abraham Frandsen; Rong Ge; Holden Lee; |

288 | SPDY: Accurate Pruning with Speedup GuaranteesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Yet, most existing pruning methods minimize just the number of remaining weights, i.e. the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. |
Elias Frantar; Dan Alistarh; |

289 | Revisiting The Effects of Stochasticity for Hamiltonian SamplersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. |
Giulio Franzese; Dimitrios Milios; Maurizio Filippone; Pietro Michiardi; |

290 | Bregman Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present a framework based on bilevel optimization for learning multilayer, deep data representations. |
Jordan Frecon; Gilles Gasso; Massimiliano Pontil; Saverio Salzo; |

291 | (Non-)Convergence Results for Predictive Coding NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: One major open problem around PCNs is their convergence behavior. In this paper, we use dynamical systems theory to formally investigate the convergence of PCNs as they are used in machine learning. |
Simon Frieder; Thomas Lukasiewicz; |

292 | Scaling Structured Inference with RandomizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. |
Yao Fu; John Cunningham; Mirella Lapata; |

293 | Greedy When Sure and Conservative When Uncertain About The OpponentsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We develop a new approach, named Greedy when Sure and Conservative when Uncertain (GSCU), to competing online against unknown and nonstationary opponents. |
Haobo Fu; Ye Tian; Hongxiang Yu; Weiming Liu; Shuang Wu; Jiechao Xiong; Ying Wen; Kai Li; Junliang Xing; Qiang Fu; Wei Yang; |

294 | DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we open up a new compression paradigm for developing real-hardware efficient DNNs, leading to boosted hardware efficiency while maintaining model accuracy. |
Yonggan Fu; Haichuan Yang; Jiayi Yuan; Meng Li; Cheng Wan; Raghuraman Krishnamoorthi; Vikas Chandra; Yingyan Lin; |

295 | Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. |
Wei Fu; Chao Yu; Zelai Xu; Jiaqi Yang; Yi Wu; |

296 | $p$-Laplacian Based Graph Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Moreover, when the topology is non-informative for label prediction, ordinary GNNs may work significantly worse than simply applying multi-layer perceptrons (MLPs) on each node. To tackle the above problem, we propose a new $p$-Laplacian based GNN model, termed as $^p$GNN, whose message passing mechanism is derived from a discrete regularization framework and could be theoretically explained as an approximation of a polynomial graph filter defined on the spectral domain of $p$-Laplacians. |
Guoji Fu; Peilin Zhao; Yatao Bian; |

297 | Why Should I Trust You, Bellman? The Bellman Error Is A Poor Replacement for Value ErrorRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. |
Scott Fujimoto; David Meger; Doina Precup; Ofir Nachum; Shixiang Shane Gu; |

298 | Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic DataRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We analyze the impact of DP on these models vis-a-vis underrepresented classes/subgroups of data, specifically, studying: 1) the size of classes/subgroups in the synthetic data, and 2) the accuracy of classification tasks run on them. |
Georgi Ganev; Bristena Oprisanu; Emiliano De Cristofaro; |

299 | The Complexity of K-Means Clustering When Little Is KnownRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Here, we study the complexity of k-means clustering in settings where most of the data is not known or simply irrelevant. |
Robert Ganian; Thekla Hamm; Viktoriia Korchemna; Karolina Okrasa; Kirill Simonov; |

300 | IDYNO: Learning Nonparametric DAGs from Interventional Dynamic DataRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a new algorithm, IDYNO, to learn the DAG structure from potentially nonlinear times series data by using a continuous optimization framework that includes a recent formulation for continuous acyclicity constraint. |
Tian Gao; Debarun Bhattacharjya; Elliot Nelson; Miao Liu; Yue Yu; |

301 | Loss Function Learning for Domain Generalization By Implicit GradientRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In particular, we introduce a novel meta-learning approach to loss function search based on implicit gradient. |
Boyan Gao; Henry Gouk; Yongxin Yang; Timothy Hospedales; |

302 | On The Convergence of Local Stochastic Compositional Gradient Descent with MomentumRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we develop a novel local stochastic compositional gradient descent with momentum method, which facilitates Federated Learning for the stochastic compositional problem. |
Hongchang Gao; Junyi Li; Heng Huang; |

303 | Deep Reference Priors: What Is The Best Way to Pretrain A Model?Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. |
Yansong Gao; Rahul Ramesh; Pratik Chaudhari; |

304 | On The Equivalence Between Temporal and Static Equivariant Graph RepresentationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This work formalizes the associational task of predicting node attribute evolution in temporal graphs from the perspective of learning equivariant representations. |
Jianfei Gao; Bruno Ribeiro; |

305 | Generalizing Gaussian Smoothing for Random SearchRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Based on an analysis of DFO for non-convex functions, we propose to choose a distribution for perturbations that minimizes the mean squared error (MSE) of the gradient estimate. |
Katelyn Gao; Ozan Sener; |
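Entry 305 generalizes the classical Gaussian-smoothing gradient estimator used in random search to other perturbation distributions. As background, a minimal version of that classical estimator (hypothetical name `smoothed_grad`; a sketch of the baseline, not the paper's generalized method) looks like:

```python
import numpy as np

def smoothed_grad(f, x, sigma=0.1, n_samples=1000, seed=0):
    """Two-point Gaussian-smoothing gradient estimator for derivative-free optimization.

    Estimates the gradient of the smoothed surrogate
    f_sigma(x) = E_u[f(x + sigma * u)] with u ~ N(0, I),
    using the antithetic (central-difference) form to reduce variance.
    """
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + sigma * u) - f(x - sigma * u)) / (2.0 * sigma) * u
    return g / n_samples
```

The paper's proposal, roughly, is to replace the Gaussian perturbation distribution with one chosen to minimize the mean squared error of this kind of estimate.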

306 | Rethinking Image-Scaling Attacks: The Interplay Between Vulnerabilities in Machine Learning SystemsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we investigate the interplay between vulnerabilities of the image scaling procedure and machine learning models in the decision-based black-box setting. |
Yue Gao; Ilia Shumailov; Kassem Fawaz; |

307 | Lazy Estimation of Variable Importance for Large Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a fast and flexible method for approximating the reduced model with important inferential guarantees. |
Yue Gao; Abby Stevens; Garvesh Raskutti; Rebecca Willett; |

308 | Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin AttackRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a novel method, the minimum-margin (MM) attack, to evaluate adversarial robustness quickly and reliably. |
Ruize Gao; Jiongxiao Wang; Kaiwen Zhou; Feng Liu; Binghui Xie; Gang Niu; Bo Han; James Cheng; |

309 | Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection ProblemsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). |
Lucy L Gao; Jane Ye; Haian Yin; Shangzhi Zeng; Jin Zhang; |

310 | Learning to Incorporate Texture Saliency Adaptive Attention to Image CartoonizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To this end, a novel cartoon-texture-saliency-sampler (CTSS) module is proposed to adaptively sample cartoon-texture-salient patches from training data. |
Xiang Gao; Yuqi Zhang; Yingjie Tian; |

311 | Stochastic Smoothing of The Top-K Calibrated Hinge Loss for Deep Imbalanced ClassificationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper we introduce a stochastic top-K hinge loss inspired by recent developments on top-K calibrated losses. |
Camille Garcin; Maximilien Servajean; Alexis Joly; Joseph Salmon; |

312 | PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient EstimationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: After a compact survey on some of the main variance-reduced REINFORCE-type methods, we propose ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG), a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of update. |
Matilde Gargiani; Andrea Zanelli; Andrea Martinelli; Tyler Summers; John Lygeros; |

313 | The Power of First-order Smooth Optimization for Black-box Non-smooth ProblemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, besides the oracle complexity, we focus also on iteration complexity, and propose a generic approach that, based on optimal first-order methods, allows us to obtain, in a black-box fashion, new zeroth-order algorithms for non-smooth convex optimization problems. |
Alexander Gasnikov; Anton Novitskii; Vasilii Novitskii; Farshed Abdukhakimov; Dmitry Kamzolov; Aleksandr Beznosikov; Martin Takac; Pavel Dvurechensky; Bin Gu; |

314 | A Functional Information Perspective on Model InterpretationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input. |
Itai Gat; Nitay Calderon; Roi Reichart; Tamir Hazan; |

315 | UniRank: Unimodal Bandit Algorithms for Online RankingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. |
Camille-Sovanneary Gauthier; Romaric Gaudel; Elisa Fromont; |

316 | Variational Inference with Locally Enhanced Bounds for Hierarchical ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a new family of variational bounds for hierarchical models, based on the application of tightening methods (e.g. importance weighting) separately for each group of local random variables. |
Tomas Geffner; Justin Domke; |
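The "tightening methods (e.g. importance weighting)" mentioned in entry 316 are, in their generic IWAE-style single-group form (the paper applies such bounds separately per group of local variables; the bound below is only the standard reference point):

```latex
\mathcal{L}_K(x) \;=\; \mathbb{E}_{z_1,\dots,z_K \sim q(z \mid x)}
\left[\log \frac{1}{K}\sum_{k=1}^{K}\frac{p(x, z_k)}{q(z_k \mid x)}\right]
\;\le\; \log p(x),
```

with $\mathcal{L}_1$ the usual ELBO and $\mathcal{L}_K$ nondecreasing (hence tightening) in $K$.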

317 | Inducing Causal Structure for Interpretable Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchange intervention training (IIT). |
Atticus Geiger; Zhengxuan Wu; Hanson Lu; Josh Rozner; Elisa Kreiss; Thomas Icard; Noah Goodman; Christopher Potts; |

318 | Achieving Minimax Rates in Pool-Based Batch Active LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity. |
Claudio Gentile; Zhilei Wang; Tong Zhang; |

319 | Near-Exact Recovery for Tomographic Inverse Problems Via Deep LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This work is concerned with the following fundamental question in scientific machine learning: Can deep-learning-based methods solve noise-free inverse problems to near-perfect accuracy? |
Martin Genzel; Ingo Gühring; Jan Macdonald; Maximilian März; |

320 | Online Learning for Min Sum Set Cover and Pandora’s BoxRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present a computationally efficient algorithm that is constant-competitive against the cost of the optimal search order. |
Evangelia Gergatsouli; Christos Tzamos; |

321 | Equivariance Versus Augmentation for Spherical ImagesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. |
Jan Gerken; Oscar Carlsson; Hampus Linander; Fredrik Ohlsson; Christoffer Petersson; Daniel Persson; |

322 | A Regret Minimization Approach to Multi-Agent ControlRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. |
Udaya Ghai; Udari Madhushani; Naomi Leonard; Elad Hazan; |

323 | Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce a naturalistic physics-based environment with a set of connectable magnet blocks inspired by children’s toy kits. |
Seyed Kamyar Seyed Ghasemipour; Satoshi Kataoka; Byron David; Daniel Freeman; Shixiang Shane Gu; Igor Mordatch; |

324 | Faster Privacy Accounting Via Evolving DiscretizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce a new algorithm for numerical composition of privacy random variables, useful for computing the accurate differential privacy parameters for compositions of mechanisms. |
Badih Ghazi; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; |

325 | Plug-In Inversion: Model-Agnostic Inversion for Vision with Data AugmentationsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning. |
Amin Ghiasi; Hamid Kazemi; Steven Reich; Chen Zhu; Micah Goldblum; Tom Goldstein; |

326 | Offline RL Policies Should Be Trained to Be AdaptiveRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose that offline RL methods should instead be adaptive in the presence of uncertainty. |
Dibya Ghosh; Anurag Ajay; Pulkit Agrawal; Sergey Levine; |

327 | Breaking The $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear BanditsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we show that stochastic contexts indeed help to reduce the regret from $\sqrt{T}$ to $\mathrm{polylog}(T)$. |
Avishek Ghosh; Abishek Sankararaman; |

328 | SCHA-VAE: Hierarchical Context Aggregation for Few-Shot GenerationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We extend current latent variable models for sets to a fully hierarchical approach with an attention-based point to set-level aggregation and call our method SCHA-VAE for Set-Context-Hierarchical-Aggregation Variational Autoencoder. |
Giorgio Giannone; Ole Winther; |

329 | A Joint Exponential Mechanism For Differentially Private Top-$k$Related Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present a differentially private algorithm for releasing the sequence of $k$ elements with the highest counts from a data domain of $d$ elements. |
Jennifer Gillenwater; Matthew Joseph; Andres Munoz; Monica Ribero Diaz; |
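Entry 329's joint mechanism improves on the standard "peeling" baseline for differentially private top-$k$, which runs the exponential mechanism $k$ times with a split privacy budget. That baseline (shown only as the reference point the paper improves on; hypothetical name `peeling_top_k`) can be sketched as:

```python
import numpy as np

def peeling_top_k(counts, k, eps, rng=None):
    """Baseline 'peeling' DP top-k: run the exponential mechanism k times
    with budget eps/k each, removing the winner after every round.

    Utility is the raw count (sensitivity 1); higher counts get
    exponentially larger selection probability.
    """
    rng = np.random.default_rng(rng)
    counts = np.asarray(counts, dtype=float)
    alive = list(range(len(counts)))
    out = []
    for _ in range(k):
        u = counts[alive]
        # Exponential mechanism weights, shifted by max for numerical stability.
        w = np.exp((eps / k) * (u - u.max()) / 2.0)
        idx = rng.choice(len(alive), p=w / w.sum())
        out.append(alive.pop(idx))
    return out
```

With a large budget the mechanism returns the true top-$k$ almost surely; the interesting regime is small `eps`, where the per-round budget split is exactly the inefficiency the joint mechanism targets.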

330 | Neuro-Symbolic Hierarchical Rule InductionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose Neuro-Symbolic Hierarchical Rule Induction, an efficient interpretable neuro-symbolic model, to solve Inductive Logic Programming (ILP) problems. |
Claire Glanois; Zhaohui Jiang; Xuening Feng; Paul Weng; Matthieu Zimmer; Dong Li; Wulong Liu; Jianye Hao; |

331 | It’s Raw! Audio Generation with State-Space ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. |
Karan Goel; Albert Gu; Chris Donahue; Christopher Re; |

332 | RankSim: Ranking Similarity Regularization for Deep Imbalanced RegressionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This paper presents the RankSim (ranking similarity) regularizer for deep imbalanced regression, which encodes an inductive bias that samples that are closer in label space should also be closer in feature space. |
Yu Gong; Greg Mori; Fred Tung; |

333 | How to Fill The Optimum Set? Population Gradient Descent with Harmless DiversityRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Therefore, it is useful to consider the problem of finding a set of diverse points in the optimum set of an objective function. In this work, we frame this problem as a bi-level optimization problem of maximizing a diversity score inside the optimum set of the main loss function, and solve it with a simple population gradient descent framework that iteratively updates the points to maximize the diversity score in a fashion that does not hurt the optimization of the main loss. |
Chengyue Gong; Lemeng Wu; Qiang Liu; |

334 | Partial Label Learning Via Label Influence FunctionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, inspired by influence function, we develop a novel PLL framework called Partial Label Learning via Label Influence Function (PLL-IF). |
Xiuwen Gong; Dong Yuan; Wei Bao; |

335 | Secure Distributed Training at ScaleRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency. |
Eduard Gorbunov; Alexander Borzunov; Michael Diskin; Max Ryabinin; |

336 | Retrieval-Augmented Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. |
Anirudh Goyal; Abram Friesen; Andrea Banino; Theophane Weber; Nan Rosemary Ke; Adrià Puigdomènech Badia; Arthur Guez; Mehdi Mirza; Peter C Humphreys; Ksenia Konyushova; Michal Valko; Simon Osindero; Timothy Lillicrap; Nicolas Heess; Charles Blundell; |

337 | The State of Sparse Training in Deep Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. |
Laura Graesser; Utku Evci; Erich Elsen; Pablo Samuel Castro; |

338 | Causal Inference Through The Structural Causal Marginal ProblemRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We introduce an approach to counterfactual inference based on merging information from multiple datasets. |
Luigi Gresele; Julius von Kügelgen; Jonas Kübler; Elke Kirschbaum; Bernhard Schölkopf; Dominik Janzing; |

339 | Mirror Learning: A Unifying Framework of Policy OptimisationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In contrast, in this paper, we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO. |
Jakub Grudzien; Christian A Schroeder De Witt; Jakob Foerster; |

340 | Adapting K-means Algorithms for OutliersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we build on their ideas and show how to adapt several sequential and distributed k-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output (1 + $\epsilon$)z outliers while achieving an O(1/$\epsilon$)-approximation to the objective function. |
Christoph Grunau; Václav Rozhon; |
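The k-means-with-outliers setting of entry 340 is easy to illustrate with a generic Lloyd-style heuristic that simply excludes the $z$ farthest points from the center updates. This sketch is not the paper's algorithm and carries none of its approximation guarantees (it also uses a naive first-$k$ initialization for brevity):

```python
import numpy as np

def kmeans_with_outliers(X, k, z, n_iter=25):
    """Lloyd-style k-means that treats the z farthest points as outliers.

    Each iteration: assign points to nearest center, drop the z points with
    the largest distance to their center, and update centers from the rest.
    """
    X = np.asarray(X, dtype=float)
    centers = X[:k].copy()  # naive init: first k points
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        order = np.argsort(dists.min(axis=1))
        inliers = order[: len(X) - z]  # exclude the z farthest points
        for j in range(k):
            pts = X[inliers][assign[inliers] == j]
            if len(pts) > 0:
                centers[j] = pts.mean(axis=0)
    outliers = order[len(X) - z:]
    return centers, assign, set(outliers.tolist())
```

Note the contrast with the paper's result, which outputs $(1+\epsilon)z$ outliers while guaranteeing an $O(1/\epsilon)$-approximation; the heuristic above outputs exactly $z$ outliers but can get stuck in poor local optima.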

341 | Variational Mixtures of ODEs for Inferring Cellular Gene Expression DynamicsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Additionally, a single progenitor cell type often bifurcates into multiple child cell types, further complicating the problem of modeling the dynamics. To address this problem, we developed an approach called variational mixtures of ordinary differential equations. |
Yichen Gu; David T Blaauw; Joshua Welch; |

342 | Learning Pseudometric-based Action Representations for Offline Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper proposes an action representation learning framework for offline RL based on a pseudometric, which measures both the behavioral relation and the data-distributional relation between actions. |
Pengjie Gu; Mengchen Zhao; Chen Chen; Dong Li; Jianye Hao; Bo An; |

343 | NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance FieldsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we consider a partially observable scenario known as fluid dynamics grounding, that is, inferring the state transitions and interactions within the fluid particle systems from sequential visual observations of the fluid surface. |
Shanyan Guan; Huayu Deng; Yunbo Wang; Xiaokang Yang; |

344 | Fast-Rate PAC-Bayesian Generalization Bounds for Meta-LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose a general PAC-Bayesian framework to cope with single-task learning and meta-learning uniformly. |
Jiechao Guan; Zhiwu Lu; |

345 | Leveraging Approximate Symbolic Models for Reinforcement Learning Via Skill DiversityRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Symbolic models of real world tasks are however often incomplete. To this end, we introduce Approximate Symbolic-Model Guided Reinforcement Learning, wherein we will formalize the relationship between the symbolic model and the underlying MDP that will allow us to characterize the incompleteness of the symbolic model. |
Lin Guan; Sarath Sreedharan; Subbarao Kambhampati; |

346 | Large-Scale Graph Neural Architecture SearchRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, existing approaches fail to handle large-scale graphs because current performance estimation strategies in GNAS are computationally expensive for large-scale graphs and suffer from consistency collapse issues. To tackle these problems, we propose the Graph ArchitectUre Search at Scale (GAUSS) method that can handle large-scale graphs by designing an efficient light-weight supernet and the joint architecture-graph sampling. |
Chaoyu Guan; Xin Wang; Hong Chen; Ziwei Zhang; Wenwu Zhu; |

347 | Identifiability Conditions for Domain AdaptationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Unfortunately, it is unclear under what conditions this identifiability assumption holds, even when restricting ourselves to the case where a correct bijective map between domains exists. We study this bijective domain mapping problem and provide several new sufficient conditions for the identifiability of linear domain maps. |
Ishaan Gulrajani; Tatsunori Hashimoto; |

348 | A Parametric Class of Approximate Gradient Updates for Policy OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To better capture the commonalities and identify key differences between policy optimization methods, we develop a unified perspective that re-expresses the underlying updates in terms of a limited choice of gradient form and scaling function. |
Ramki Gummadi; Saurabh Kumar; Junfeng Wen; Dale Schuurmans; |

349 | Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision ProcessesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In the offline setting, estimating these operators directly is challenging due to (i) the large observation space and (ii) insufficient coverage of the offline dataset. To tackle these challenges, we propose a novel algorithm that constructs confidence regions for these Bellman operators via offline estimation of their RKHS embeddings, and returns the final policy via pessimistic planning within the confidence regions. |
Hongyi Guo; Qi Cai; Yufeng Zhang; Zhuoran Yang; Zhaoran Wang; |

350 | No-Regret Learning in Partially-Informed AuctionsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Auctions with partially-revealed information about items are broadly employed in real-world applications, but the underlying mechanisms have limited theoretical support. In this work, we study a machine learning formulation of these types of mechanisms, presenting algorithms that are no-regret from the buyer’s perspective. |
Wenshuo Guo; Michael Jordan; Ellen Vitercik; |

351 | Bounding Training Data Reconstruction in Private (Deep) LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we derive the first semantic guarantees for DP mechanisms against training data reconstruction attacks under a formal threat model. |
Chuan Guo; Brian Karrer; Kamalika Chaudhuri; Laurens van der Maaten; |

352 | Adversarially Trained Neural Representations Are Already As Robust As Biological Neural RepresentationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. |
Chong Guo; Michael Lee; Guillaume Leclerc; Joel Dapello; Yug Rao; Aleksander Madry; James Dicarlo; |

353 | Class-Imbalanced Semi-Supervised Learning with Adaptive ThresholdingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we develop a simple yet effective framework, which only involves adaptive thresholding for different classes in SSL algorithms, and achieves remarkable performance improvement on more than twenty imbalance ratios. |
Lan-Zhe Guo; Yu-Feng Li; |
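A minimal interpretation of the per-class adaptive thresholding in entry 353: scale a base confidence threshold for pseudo-labeling by how often each class is currently predicted, so that minority classes face a lower bar. This is a hypothetical rule in the spirit of FlexMatch-style methods; the paper's exact thresholding rule may differ:

```python
import numpy as np

def pseudo_label_adaptive(probs, base_threshold=0.95):
    """Select pseudo-labels with per-class adaptive confidence thresholds.

    probs: (n, K) model probabilities on unlabeled data.
    Returns (indices, labels) of the accepted pseudo-labeled examples.
    """
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    counts = np.bincount(preds, minlength=probs.shape[1]).astype(float)
    # Rarely-predicted (minority) classes get a proportionally lower threshold.
    thresholds = base_threshold * counts / max(counts.max(), 1.0)
    accept = conf >= thresholds[preds]
    return np.nonzero(accept)[0], preds[accept]
```

The intuition: a fixed high threshold starves minority classes of pseudo-labels, reinforcing the imbalance; lowering their threshold lets more of their examples into training.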

354 | Deep Squared Euclidean Approximation to The Levenshtein Distance for DNA StorageRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose a novel deep squared Euclidean embedding for DNA sequences using Siamese neural network, squared Euclidean embedding, and chi-squared regression. |
Alan J.X. Guo; Cong Liang; Qing-Hu Hou; |

355 | Online Continual Learning Through Mutual Information MaximizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper proposes a new online continual learning approach called OCMM based on mutual information (MI) maximization. |
Yiduo Guo; Bing Liu; Dongyan Zhao; |

356 | Fast Provably Robust Decision Trees and BoostingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This work proposes the Fast Provably Robust Decision Tree (FPRDT) with the smallest computational complexity O(n log n), a tradeoff between global and local optimizations over the adversarial 0/1 loss. |
Jun-Qi Guo; Ming-Zhuo Teng; Wei Gao; Zhi-Hua Zhou; |

357 | Understanding and Improving Knowledge Graph Embedding for Entity AlignmentRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To fill the research gap, we define a typical paradigm abstracted from existing EEA methods and analyze how the embedding discrepancy between two potentially aligned entities is implicitly bounded by a predefined margin in the score function. |
Lingbing Guo; Qiang Zhang; Zequn Sun; Mingyang Chen; Wei Hu; Huajun Chen; |

358 | NISPA: Neuro-Inspired Stability-Plasticity Adaptation for Continual Learning in Sparse NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: The main desiderata associated with CL are to maintain performance on older tasks, leverage the latter to improve learning of future tasks, and to introduce minimal overhead in the training process (for instance, to not require a growing model or retraining). We propose the Neuro-Inspired Stability-Plasticity Adaptation (NISPA) architecture that addresses these desiderata through a sparse neural network with fixed density. |
Mustafa B Gurbuz; Constantine Dovrolis; |

359 | Active Learning on A Budget: Opposite Strategies Suit High and Low BudgetsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Accordingly, we propose TypiClust – a deep active learning strategy suited for low budgets. |
Guy Hacohen; Avihu Dekel; Daphna Weinshall; |

360 | You Only Cut Once: Boosting Data Augmentation with A Single CutRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We present You Only Cut Once (YOCO) for performing data augmentations. |
Junlin Han; Pengfei Fang; Weihao Li; Jie Hong; Mohammad Ali Armin; Ian Reid; Lars Petersson; Hongdong Li; |
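
The highlight's core idea, cutting an image once and augmenting the pieces independently, is simple to sketch. Below is a minimal NumPy illustration assuming a generic `augment` callable; the function name and the cut-in-half policy are illustrative, not the authors' exact implementation:

```python
import numpy as np

def yoco(image, augment, rng=None):
    # Cut the image once into two halves along a random axis,
    # augment each half independently, then stitch them back.
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if rng.integers(2) == 0:                                  # horizontal cut
        top, bottom = image[: h // 2], image[h // 2 :]
        return np.concatenate([augment(top), augment(bottom)], axis=0)
    left, right = image[:, : w // 2], image[:, w // 2 :]      # vertical cut
    return np.concatenate([augment(left), augment(right)], axis=1)

# Example: apply a horizontal flip piece-wise to a toy 4x4 "image".
img = np.arange(16).reshape(4, 4)
out = yoco(img, lambda piece: piece[:, ::-1], rng=np.random.default_rng(0))
```

Because each half is augmented with independent randomness, one image yields more diverse augmentation outcomes than a single whole-image transform.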

361 | Scalable MCMC Sampling for Nonsymmetric Determinantal Point ProcessesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we develop a scalable MCMC sampling algorithm for $k$-NDPPs with low-rank kernels, thus enabling runtime that is sublinear in $n$. |
Insu Han; Mike Gartrell; Elvis Dohmatob; Amin Karbasi; |

362 | G-Mixup: Graph Data Augmentation for Graph ClassificationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, it is challenging to directly adopt Mixup to augment graph data because different graphs typically: 1) have different numbers of nodes; 2) are not readily aligned; and 3) have unique topologies in non-Euclidean space. To this end, we propose G-Mixup to augment graphs for graph classification by interpolating the generator (i.e., graphon) of different classes of graphs. |
Xiaotian Han; Zhimeng Jiang; Ninghao Liu; Xia Hu; |

363 | Private Streaming SCO in $\ell_p$ Geometry with Applications in High Dimensional Online Decision MakingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a private variant of the Frank-Wolfe algorithm with recursive gradients for variance reduction to update and reveal the parameters upon each data. |
Yuxuan Han; Zhicong Liang; Zhipeng Liang; Yang Wang; Yuan Yao; Jiheng Zhang; |

364 | Off-Policy Reinforcement Learning with Delayed RewardsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study deep reinforcement learning (RL) algorithms with delayed rewards. |
Beining Han; Zhizhou Ren; Zuofan Wu; Yuan Zhou; Jian Peng; |

365 | Adversarial Attacks on Gaussian Process BanditsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Our goal is to understand adversarial attacks on GP bandits from theoretical and practical perspectives. |
Eric Han; Jonathan Scarlett; |

366 | Random Gegenbauer Features for Scalable Kernel MethodsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose efficient random features for approximating a new and rich class of kernel functions that we refer to as Generalized Zonal Kernels (GZK). |
Insu Han; Amir Zandieh; Haim Avron; |

367 | Stochastic Reweighted Gradient DescentRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose stochastic reweighted gradient descent (SRG), a stochastic gradient method based solely on importance sampling that can reduce the variance of the gradient estimator and improve on the asymptotic error of stochastic gradient descent (SGD) in the strongly convex and smooth case. |
Ayoub El Hanchi; David Stephens; Chris Maddison; |
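
The generic importance-sampling gradient estimator underlying methods of this kind is easy to illustrate: sample example $i$ with probability $p_i$ and reweight its gradient by $1/(np_i)$ so the update stays unbiased. The sketch below is a plain demonstration on noiseless least squares, not the paper's variance-reduced SRG algorithm:

```python
import numpy as np

def importance_sampled_sgd(X, y, probs, lr=0.05, steps=3000, seed=0):
    # Draw example i with probability probs[i]; dividing its gradient
    # by (n * probs[i]) keeps the step unbiased for the average loss
    # (1/n) * sum_i 0.5 * (X[i] @ w - y[i])**2.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=probs)
        grad_i = (X[i] @ w - y[i]) * X[i]   # per-example gradient
        w -= lr * grad_i / (n * probs[i])   # reweighted step
    return w

# Noiseless least-squares problem: uniform sampling recovers w_true.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
w_true = np.array([1.0, -2.0])
w_hat = importance_sampled_sgd(X, X @ w_true, np.full(50, 1 / 50))
```

Non-uniform `probs` (e.g., proportional to per-example gradient norms) is where variance reduction over plain SGD comes from.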

368 | Dual Perspective of Label-Specific Feature Learning for Multi-Label ClassificationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose a dual perspective for label-specific feature learning, where label-specific discriminative properties are considered by identifying each label’s own non-informative features and making the discrimination process immutable to variations of these features. |
Jun-Yi Hang; Min-Ling Zhang; |

369 | Temporal Difference Learning for Model Predictive ControlRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we combine the strengths of model-free and model-based methods. |
Nicklas A Hansen; Hao Su; Xiaolong Wang; |

370 | Bisimulation Makes Analogies in Goal-Conditioned Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a new form of state abstraction called goal-conditioned bisimulation that captures functional equivariance, allowing for the reuse of skills to achieve new goals. |
Philippe Hansen-Estruch; Amy Zhang; Ashvin Nair; Patrick Yin; Sergey Levine; |

371 | TURF: Two-Factor, Universal, Robust, Fast Distribution Learning AlgorithmRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d) \neq (1,0)$. |
Yi Hao; Ayush Jain; Alon Orlitsky; Vaishakh Ravindrakumar; |

372 | Contextual Information-Directed SamplingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits. |
Botao Hao; Tor Lattimore; Chao Qin; |

373 | GSmooth: Certified Robustness Against Semantic Transformations Via Generalized Randomized SmoothingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, existing methods are insufficient or unable to provably defend against semantic transformations, especially those without closed-form expressions (such as defocus blur and pixelate), which are more common in practice and often unrestricted. To fill up this gap, we propose generalized randomized smoothing (GSmooth), a unified theoretical framework for certifying robustness against general semantic transformations via a novel dimension augmentation strategy. |
Zhongkai Hao; Chengyang Ying; Yinpeng Dong; Hang Su; Jian Song; Jun Zhu; |

374 | Implicit Regularization with Polynomial Growth in Deep Tensor FactorizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the implicit regularization effects of deep learning in tensor factorization. |
Kais Hariz; Hachem Kadri; Stephane Ayache; Maher Moakher; Thierry Artieres; |

375 | Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic ResponsesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Specifically, our work establishes a novel connection between strategic responses to ML models and instrumental variable (IV) regression by observing that the sequence of deployed models can be viewed as an instrument that affects agents’ observable features but does not directly influence their outcomes. |
Keegan Harris; Dung Daniel T Ngo; Logan Stapleton; Hoda Heidari; Steven Wu; |

376 | C*-algebra Net: A New Approach Generalizing Neural Network Parameters to C*-algebraRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a new framework that generalizes the parameters of neural network models to $C^*$-algebra-valued ones. |
Yuka Hashimoto; Zhao Wang; Tomoko Matsui; |

377 | General-purpose, Long-context Autoregressive Modeling with Perceiver ARRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. |
Curtis Hawthorne; Andrew Jaegle; Catalina Cangea; Sebastian Borgeaud; Charlie Nash; Mateusz Malinowski; Sander Dieleman; Oriol Vinyals; Matthew Botvinick; Ian Simon; Hannah Sheahan; Neil Zeghidour; Jean-Baptiste Alayrac; Joao Carreira; Jesse Engel; |

378 | On Distribution Shift in Learning-based Bug DetectorsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we argue that this massive performance difference is caused by a distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. |
Jingxuan He; Luca Beurer-Kellner; Martin Vechev; |

379 | GNNRank: Learning Global Rankings from Pairwise Comparisons Via Directed Graph Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we introduce neural networks into the ranking recovery problem by proposing the so-called GNNRank, a trainable GNN-based framework with digraph embedding. |
Yixuan He; Quan Gan; David Wipf; Gesine D Reinert; Junchi Yan; Mihai Cucuringu; |

380 | Exploring The Gap Between Collapsed & Whitened Features in Self-Supervised LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We identify power law behaviour in eigenvalue decay, parameterised by exponent $\beta \geq 0$, as a spectrum that bridges between the collapsed & whitened feature extremes. |
Bobby He; Mete Ozay; |

381 | Sparse Double Descent: Where Network Pruning Aggravates OverfittingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we have three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. |
Zheng He; Zeke Xie; Quanzhi Zhu; Zengchang Qin; |

382 | A Reduction from Linear Contextual Bandit Lower Bounds to Estimation Lower BoundsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we complete the reverse direction by establishing the necessity. |
Jiahao He; Jiheng Zhang; Rachel Zhang; |

383 | HyperPrompt: Prompt-based Task-Conditioning of TransformersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. |
Yun He; Steven Zheng; Yi Tay; Jai Gupta; Yu Du; Vamsi Aribandi; Zhe Zhao; Yaguang Li; Zhao Chen; Donald Metzler; Heng-Tze Cheng; Ed H. Chi; |

384 | Label-Descriptive Patterns and Their Application to Characterizing Classification ErrorsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose to discover those feature-value combinations (i.e., patterns) that strongly correlate with correct resp. erroneous predictions. |
Michael A. Hedderich; Jonas Fischer; Dietrich Klakow; Jilles Vreeken; |

385 | NOMU: Neural Optimization-based Model UncertaintyRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, we find that established benchmarks often fail to reliably capture some of these desiderata, even those that are required by Bayesian theory. To address this, we introduce a new approach for capturing model uncertainty for NNs, which we call Neural Optimization-based Model Uncertainty (NOMU). |
Jakob M Heiss; Jakob Weissteiner; Hanna S Wutte; Sven Seuken; Josef Teichmann; |

386 | Scaling Out-of-Distribution Detection for Real-World SettingsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To set the stage for more realistic out-of-distribution detection, we depart from small-scale settings and explore large-scale multiclass and multi-label settings with high-resolution images and thousands of classes. |
Dan Hendrycks; Steven Basart; Mantas Mazeika; Andy Zou; Joseph Kwon; Mohammadreza Mostajabi; Jacob Steinhardt; Dawn Song; |

387 | Generalization Bounds Using Lower Tail Exponents in Stochastic OptimizersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, they mainly relied on continuous-time approximations; and a rigorous treatment for the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. |
Liam Hodgkinson; Umut Simsekli; Rajiv Khanna; Michael Mahoney; |

388 | Unsupervised Detection of Contextualized Embedding Bias with Application to IdeologyRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a fully unsupervised method to detect bias in contextualized embeddings. |
Valentin Hofmann; Janet Pierrehumbert; Hinrich Schütze; |

389 | Neural Laplace: Learning Diverse Classes of Differential Equations in The Laplace DomainRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose Neural Laplace, a unifying framework for learning diverse classes of DEs including all the aforementioned ones. |
Samuel I Holt; Zhaozhi Qian; Mihaela van der Schaar; |

390 | Deep Hierarchy in BanditsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Since the hierarchy can have multiple layers, we call it deep. We propose a hierarchical Thompson sampling algorithm (HierTS) for this problem and show how to implement it efficiently for Gaussian hierarchies. |
Joey Hong; Branislav Kveton; Sumeet Katariya; Manzil Zaheer; Mohammad Ghavamzadeh; |
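
For context, flat (non-hierarchical) Thompson sampling for Gaussian rewards, the baseline that a hierarchical variant like HierTS builds on, can be sketched as follows. All names are illustrative, and this uses a flat prior per arm rather than the paper's multi-layer hierarchy:

```python
import numpy as np

def gaussian_thompson(true_means, sigma=1.0, horizon=2000, seed=0):
    # Keep a Gaussian posterior per arm (flat prior, known noise sigma),
    # sample one value from each posterior, and pull the argmax.
    rng = np.random.default_rng(seed)
    k = len(true_means)
    pulls = np.zeros(k)
    reward_sum = np.zeros(k)
    for _ in range(horizon):
        post_mean = reward_sum / np.maximum(pulls, 1)
        post_std = sigma / np.sqrt(np.maximum(pulls, 1))
        arm = int(np.argmax(rng.normal(post_mean, post_std)))
        reward_sum[arm] += rng.normal(true_means[arm], sigma)
        pulls[arm] += 1
    return pulls

# The best arm (mean 1.0) should attract the vast majority of pulls.
pulls = gaussian_thompson(np.array([0.0, 0.5, 1.0]))
```

A hierarchy adds shared hyper-parameters across arms, so evidence about one arm informs the posteriors of related arms.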

391 | DAdaQuant: Doubly-adaptive Quantization for Communication-efficient Federated LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce DAdaQuant as a doubly-adaptive quantization algorithm that dynamically changes the quantization level across time and different clients. |
Robert Hönig; Yiren Zhao; Robert Mullins; |

392 | Equivariant Diffusion for Molecule Generation in 3DRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. |
Emiel Hoogeboom; Víctor Garcia Satorras; Clément Vignac; Max Welling; |

393 | Conditional GANs with Auxiliary Discriminative ClassifierRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: The fundamental reason pointed out in this paper is that the classifier of AC-GAN is generator-agnostic, which therefore cannot provide informative guidance for the generator to approach the joint distribution, resulting in a minimization of the conditional entropy that decreases the intra-class diversity. |
Liang Hou; Qi Cao; Huawei Shen; Siyuan Pan; Xiaoshuang Li; Xueqi Cheng; |

394 | AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail ProblemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Under such a scenario, AUC is a much more reasonable metric than accuracy since it is insensitive toward class distribution. Motivated by this, we present an early trial to explore adversarial training methods to optimize AUC. |
Wenzheng Hou; Qianqian Xu; Zhiyong Yang; Shilong Bao; Yuan He; Qingming Huang; |

395 | Wide Bayesian Neural Networks Have A Simple Weight Posterior: Theory and Accelerated SamplingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. |
Jiri Hron; Roman Novak; Jeffrey Pennington; Jascha Sohl-Dickstein; |

396 | Learning Inverse Folding from Millions of Predicted StructuresRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider the problem of predicting a protein sequence from its backbone atom coordinates. |
Chloe Hsu; Robert Verkuil; Jason Liu; Zeming Lin; Brian Hie; Tom Sercu; Adam Lerer; Alexander Rives; |

397 | Nearly Minimax Optimal Reinforcement Learning with Linear Function ApproximationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP), and propose a novel computation-efficient algorithm, LSVI-UCB$^+$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps. |
Pihe Hu; Yu Chen; Longbo Huang; |

398 | Neuron Dependency Graphs: A Causal Abstraction of Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We discover that neural networks exhibit approximate logical dependencies among neurons, and we introduce Neuron Dependency Graphs (NDG) that extract and present them as directed graphs. |
Yaojie Hu; Jin Tian; |

399 | Policy Diagnosis Via Measuring Role Diversity in Cooperative Multi-agent RLRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this study, we quantify the agent’s behavior difference and build its relationship with the policy performance via **Role Diversity**, a metric to measure the characteristics of MARL tasks. |
Siyi Hu; Chuanlong Xie; Xiaodan Liang; Xiaojun Chang; |

400 | On The Role of Discount Factor in Offline Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper examines two distinct effects of $\gamma$ in offline RL with theoretical analysis, namely the regularization effect and the pessimism effect. |
Hao Hu; Yiqin Yang; Qianchuan Zhao; Chongjie Zhang; |

401 | Transformer Quality in Linear TimeRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences. |
Weizhe Hua; Zihang Dai; Hanxiao Liu; Quoc Le; |

402 | Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied AgentsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. “make breakfast”), to a chosen set of actionable steps (e.g. “open fridge”). |
Wenlong Huang; Pieter Abbeel; Deepak Pathak; Igor Mordatch; |

403 | Forward Operator Estimation in Generative Models with Kernel Transfer OperatorsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a substantially cheaper (and simpler) forward operator estimation strategy based on adapting known results on kernel transfer operators. |
Zhichun Huang; Rudrasis Chakraborty; Vikas Singh; |

404 | Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed BanditsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th ($1<\alpha\le 2$) moments bounded by $\sigma^\alpha$, while the variances may not exist. |
Jiatai Huang; Yan Dai; Longbo Huang; |

405 | Frustratingly Easy Transferability EstimationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. To this end, we propose a simple, efficient, and effective transferability measure named TransRate. |
Long-Kai Huang; Junzhou Huang; Yu Rong; Qiang Yang; Ying Wei; |

406 | Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)Related Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network across different combinations of modalities on various tasks, which is counter-intuitive since multiple signals would bring more information (Wang et al., 2020). This work provides a theoretical explanation for the emergence of such performance gap in neural networks for the prevalent joint training framework. |
Yu Huang; Junyang Lin; Chang Zhou; Hongxia Yang; Longbo Huang; |

407 | Action-Sufficient State Representation Learning for Control with Structural ConstraintsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). |
Biwei Huang; Chaochao Lu; Liu Leqi; Jose Miguel Hernandez-Lobato; Clark Glymour; Bernhard Schölkopf; Kun Zhang; |

408 | 3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker DesignRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we focus on a new type of drug design problem — generating a small “linker” to physically attach two independent molecules with their distinct functions. |
Yinan Huang; Xingang Peng; Jianzhu Ma; Muhan Zhang; |

409 | SDQ: Stochastic Differentiable Quantization with Mixed PrecisionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy in a more flexible and globally-optimized space with a smoother gradient approximation. |
Xijie Huang; Zhiqiang Shen; Shichao Li; Zechun Liu; Hu Xianghong; Jeffry Wicaksana; Eric Xing; Kwang-Ting Cheng; |

410 | Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced TopologyRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We develop a general framework unifying several gradient-based stochastic optimization methods for empirical risk minimization problems both in centralized and distributed scenarios. |
Yan Huang; Ying Sun; Zehan Zhu; Changzhi Yan; Jinming Xu; |

411 | Efficient Representation Learning Via Adaptive Context PoolingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. |
Chen Huang; Walter Talbott; Navdeep Jaitly; Joshua M Susskind; |

412 | On The Learning of Non-Autoregressive TransformersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we present theoretical and empirical analyses to reveal the challenges of NAT learning and propose a unified perspective to understand existing successes. |
Fei Huang; Tianhua Tao; Hao Zhou; Lei Li; Minlie Huang; |

413 | Going Deeper Into Permutation-Sensitive Graph Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we devise an efficient permutation-sensitive aggregation mechanism via permutation groups, capturing pairwise correlations between neighboring nodes. |
Zhongyu Huang; Yingheng Wang; Chaozhuo Li; Huiguang He; |

414 | Directed Acyclic Transformer for Non-Autoregressive Machine TranslationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. |
Fei Huang; Hao Zhou; Yang Liu; Hang Li; Minlie Huang; |

415 | Unsupervised Ground Metric Learning Using Wasserstein Singular VectorsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose for the first time a canonical answer by simultaneously computing an OT distance between samples and between features of a dataset. |
Geert-Jan Huizing; Laura Cantini; Gabriel Peyré; |

416 | Robust Kernel Density Estimation with Median-of-Means PrincipleRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we introduce a robust non-parametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). |
Pierre Humbert; Batiste Le Bars; Ludovic Minvielle; |
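
The Median-of-Means principle named in the highlight is straightforward to sketch: split the sample into blocks, run a kernel density estimate per block, and take the pointwise median so that a few corrupted points can spoil only a minority of blocks. The following is an illustrative reading of that recipe, not the paper's exact MoM-KDE estimator:

```python
import numpy as np

def mom_kde(x_eval, data, n_blocks=5, bandwidth=0.5, seed=0):
    # Shuffle, split into blocks, run a Gaussian KDE on each block,
    # and return the pointwise median of the block estimates.
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(data), n_blocks)
    norm = bandwidth * np.sqrt(2 * np.pi)
    estimates = [
        np.exp(-0.5 * ((x_eval[:, None] - b[None, :]) / bandwidth) ** 2).sum(1)
        / (len(b) * norm)
        for b in blocks
    ]
    return np.median(estimates, axis=0)

# 500 points from N(0, 1) plus 10 gross outliers far to the right.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(size=500), np.full(10, 100.0)])
dens = mom_kde(np.linspace(-3, 3, 7), data)
```

The median step is what buys robustness: an outlier influences only the one block it lands in, and the median discards extreme block estimates.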

417 | A Data-driven Approach for Learning to Control ComputersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. |
Peter C Humphreys; David Raposo; Tobias Pohlen; Gregory Thornton; Rachita Chhaparia; Alistair Muldal; Josh Abramson; Petko Georgiev; Adam Santoro; Timothy Lillicrap; |

418 | Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex RegularizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Using such a denoiser guarantees the convergence of the PnP version of the Half-Quadratic-Splitting (PnP-HQS) iterative algorithm. In this paper, we show that this gradient denoiser can actually correspond to the proximal operator of another scalar function. |
Samuel Hurault; Arthur Leclaire; Nicolas Papadakis; |

419 | Inverse Contextual Bandits: Learning How Behavior Evolves Over TimeRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner. |
Alihan Hüyük; Daniel Jarrett; Mihaela van der Schaar; |

420 | Datamodels: Understanding Predictions with Data and Data with PredictionsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data. |
Andrew Ilyas; Sung Min Park; Logan Engstrom; Guillaume Leclerc; Aleksander Madry; |

421 | Parsimonious Learning-Augmented CachingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we introduce and study the setting in which the learning-augmented algorithm can utilize the predictions parsimoniously. |
Sungjin Im; Ravi Kumar; Aditya Petety; Manish Purohit; |

422 | Bayesian Optimization for Distributionally Robust Chance-constrained ProblemRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this study, we consider distributionally robust CC (DRCC) problem and propose a novel DRCC Bayesian optimization method for the case where the distribution of the environmental variables cannot be precisely specified. |
Yu Inatsu; Shion Takeno; Masayuki Karasuyama; Ichiro Takeuchi; |

423 | LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial OptimisationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a low-complexity approach for identifying a (possibly much smaller) subgraph of the original graph where the heuristics can be run in reasonable time and with a high likelihood of finding a global near-optimal solution. |
David Ireland; Giovanni Montana; |

424 | The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns Via Spotlights of AttentionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, this dual formulation offers a possibility of directly visualising how an NN makes use of training patterns at test time, by examining the corresponding attention weights. We conduct experiments on small scale supervised image classification tasks in single-task, multi-task, and continual learning settings, as well as language modelling, and discuss potentials and limits of this view for better understanding and interpreting how NNs exploit training patterns. |
Kazuki Irie; Róbert Csordás; Jürgen Schmidhuber; |

425 | A Modern Self-Referential Weight Matrix That Learns to Modify ItselfRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a scalable self-referential WM (SRWM) that learns to use outer products and the delta update rule to modify itself. |
Kazuki Irie; Imanol Schlag; Róbert Csordás; Jürgen Schmidhuber; |

426 | Revisiting Online Submodular Minimization: Gap-Dependent Regret Bounds, Best of Both Worlds and Adversarial RobustnessRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we consider online decision problems with submodular loss functions. |
Shinji Ito; |

427 | Modeling Strong and Human-Like Gameplay with KL-Regularized SearchRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider the task of accurately modeling strong human policies in multi-agent decision-making problems, given examples of human behavior. |
Athul Paul Jacob; David J Wu; Gabriele Farina; Adam Lerer; Hengyuan Hu; Anton Bakhtin; Jacob Andreas; Noam Brown; |

428 | A Deep Convolutional Neural Network That Is Invariant to Time RescalingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This paper presents a deep CNN (SITHCon) that uses a logarithmically compressed temporal representation at each level. |
Brandon G Jacques; Zoran Tiganj; Aakash Sarkar; Marc Howard; Per Sederberg; |

429 | Input Dependent Sparse Gaussian ProcessesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: A limitation is, however, that in some tasks a large number of inducing points may be required to obtain good results. To alleviate this, we propose here to amortize the computation of the inducing points locations, as well as the parameters of $q$. |
Bahram Jafrasteh; Carlos Villacampa-Calvo; Daniel Hernandez-Lobato; |

430 | Regret Minimization with Performative FeedbackRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Our main contribution is regret bounds that scale only with the complexity of the distribution shifts and not that of the reward function. |
Meena Jagadeesan; Tijana Zrnic; Celestine Mendler-Dünner; |

431 | Biological Sequence Design with GFlowNetsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. |
Moksh Jain; Emmanuel Bengio; Alex Hernandez-Garcia; Jarrid Rector-Brooks; Bonaventure F. P. Dossou; Chanakya Ajit Ekbote; Jie Fu; Tianyu Zhang; Michael Kilgour; Dinghuai Zhang; Lena Simine; Payel Das; Yoshua Bengio; |

432 | Combining Diverse Feature PriorsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly. In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. |
Saachi Jain; Dimitris Tsipras; Aleksander Madry; |

433 | Training Your Sparse Neural Network Better with Any MaskRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Apart from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, consisting of introducing “ghost” neurons and skip connections at the early stage of training, and strategically modifying the initialization as well as labels. |
Ajay Kumar Jaiswal; Haoyu Ma; Tianlong Chen; Ying Ding; Zhangyang Wang; |

434 | Sequential Covariate Shift Detection Using Classifier Two-Sample TestsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider the problem of detecting covariate shift, where the covariate distribution shifts but the conditional distribution of labels given covariates remains the same. |
Sooyong Jang; Sangdon Park; Insup Lee; Osbert Bastani; |

435 | Surrogate Likelihoods for Variational Annealed Importance SamplingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, supporting data subsampling in these hybrid methods can be a challenge, a shortcoming that we address by introducing a surrogate likelihood that can be learned jointly with other variational parameters. |
Martin Jankowiak; Du Phan; |

436 | Planning with Diffusion for Flexible Behavior SynthesisRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. |
Michael Janner; Yilun Du; Joshua Tenenbaum; Sergey Levine; |

437 | HyperImpute: Generalized Iterative Imputation with Automatic Model SelectionRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we study an approach that marries the advantages of both: We propose *HyperImpute*, a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters. |
Daniel Jarrett; Bogdan C Cebere; Tennison Liu; Alicia Curth; Mihaela van der Schaar; |

438 | Mitigating Modality Collapse in Multimodal VAEs Via Impartial OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We refer to this limitation as modality collapse. In this work, we argue that this effect is a consequence of conflicting gradients during multimodal VAE training. |
Adrian Javaloy; Maryam Meghdadi; Isabel Valera; |

439 | Towards Understanding How Momentum Improves Generalization in Deep LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we adopt another perspective and first empirically show that gradient descent with momentum (GD+M) significantly improves generalization compared to gradient descent (GD) in some deep learning problems. From this observation, we formally study how momentum improves generalization. |
Samy Jelassi; Yuanzhi Li; |

440 | MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay BufferRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we consider cooperative multi-agent reinforcement learning (MARL) with sparse reward. |
Jeewon Jeon; Woojun Kim; Whiyoung Jung; Youngchul Sung; |

441 | An Exact Symbolic Reduction of Linear Smart Predict+Optimize to Mixed Integer Linear ProgrammingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we cast the SPO problem as a bi-level program and apply Symbolic Variable Elimination (SVE) to analytically solve the lower optimization. |
Jihwan Jeong; Parth Jaggi; Andrew Butler; Scott Sanner; |

442 | Agnostic Learnability of Halfspaces Via Logistic LossRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Previously, for a certain broad class of “well-behaved” distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(\mathrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\mathrm{OPT}})$ upper bound, where $\mathrm{OPT}$ denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\mathrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021). |
Ziwei Ji; Kwangjun Ahn; Pranjal Awasthi; Satyen Kale; Stefani Karp; |

443 | Improving Policy Optimization with Generalist-Specialist LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To have the best of both worlds, we propose a novel generalist-specialist training framework. |
Zhiwei Jia; Xuanlin Li; Zhan Ling; Shuang Liu; Yiran Wu; Hao Su; |

444 | Translatotron 2: High-quality Direct Speech-to-speech Translation with Voice PreservationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. |
Ye Jia; Michelle Tadmor Ramanovich; Tal Remez; Roi Pomerantz; |

445 | Online Learning and Pricing with Reusable Resources: Linear Bandits with Sub-Exponential RewardsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a rate-optimal online learning and pricing algorithm, termed Batch Linear Confidence Bound (BLinUCB), and prove that the cumulative regret is $\tilde{O}( d_f \sqrt{T } )$. |
Huiwen Jia; Cong Shi; Siqian Shen; |

446 | The Role of Deconfounding in Meta-learningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we offer a novel causal perspective of meta-learning. |
Yinjie Jiang; Zhengyu Chen; Kun Kuang; Luotian Yuan; Xinhai Ye; Zhihua Wang; Fei Wu; Ying Wei; |

447 | Subspace Learning for Effective Meta-LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose an algorithm to learn the meta-parameters (i.e., subspace bases). |
Weisen Jiang; James Kwok; Yu Zhang; |

448 | Optimal Algorithms for Stochastic Multi-Level Compositional OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we investigate the problem of stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth but possibly non-convex functions. |
Wei Jiang; Bokun Wang; Yibo Wang; Lijun Zhang; Tianbao Yang; |

449 | Antibody-Antigen Docking and Design Via Hierarchical Structure RefinementRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose a new model called Hierarchical Structure Refinement Network (HSRN) for paratope docking and design. |
Wengong Jin; Regina Barzilay; Tommi Jaakkola; 

450 | Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence NeighborhoodRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This is due to the fact that in Greedy-BFGS the Hessian is directly approximated and the Newton direction approximation may not be as accurate as the one for BFGS. In this paper, we close this gap and present a novel BFGS method that has the best of two worlds. |
Qiujiang Jin; Alec Koppel; Ketan Rajawat; Aryan Mokhtari; |

451 | The Power of Exploiter: Provable Multi-Agent RL in Large State SpacesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper considers two-player zero-sum Markov Games (MGs). We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension—a new complexity measure adapted from its single-agent version (Jin et al., 2021). |
Chi Jin; Qinghua Liu; Tiancheng Yu; |

452 | Domain Adaptation for Time Series Forecasting Via Attention SharingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This poses a challenge for typical forecasting problems in practice, where there is a limited number of time series or observations per time series, or both. To cope with this data scarcity issue, we propose a novel domain adaptation framework, Domain Adaptation Forecaster (DAF). |
Xiaoyong Jin; Youngsuk Park; Danielle Maddix; Hao Wang; Yuyang Wang; |

453 | Accelerated Federated Learning with Decoupled Adaptive OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs). |
Jiayin Jin; Jiaxiang Ren; Yang Zhou; Lingjuan Lyu; Ji Liu; Dejing Dou; |

454 | Supervised Off-Policy RankingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We propose a method to solve SOPR, which learns a policy scoring model by minimizing a ranking loss of the training policies rather than estimating the precise policy performance. |
Yue Jin; Yue Zhang; Tao Qin; Xudong Zhang; Jian Yuan; Houqiang Li; Tie-Yan Liu; |

455 | Input-agnostic Certified Group Fairness Via Gaussian Parameter SmoothingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper proposes an input-agnostic certified group fairness algorithm, FairSmooth, for improving the fairness of classification models while maintaining the remarkable prediction accuracy. |
Jiayin Jin; Zeru Zhang; Yang Zhou; Lingfei Wu; |

456 | Score-based Generative Modeling of Graphs Via The System of Stochastic Differential EquationsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Yet, this is a challenging problem, and the previous graph generative methods either fail to capture the permutation-invariance property of graphs or cannot sufficiently model the complex dependency between nodes and edges, which is crucial for generating real-world graphs such as molecules. To overcome such limitations, we propose a novel score-based generative model for graphs with a continuous-time framework. |
Jaehyeong Jo; Seul Lee; Sung Ju Hwang; |

457 | Choosing Answers in Epsilon-Best-Answer Identification for Linear BanditsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We demonstrate that picking the answer with highest mean does not allow an algorithm to reach asymptotic optimality in terms of expected sample complexity. Instead, a furthest answer should be identified. |
Marc Jourdan; Rémy Degenne; 

458 | Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization GuaranteesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning. |
Haotian Ju; Dongyue Li; Hongyang R Zhang; |

459 | Robust Alignment of Cross-session Recordings of Neural Population Activity By Behaviour Via Unsupervised Domain AdaptationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: On the other hand, evidence suggests that the latent dynamics underlying behaviour may be stable even over months and years. Based on this idea, we introduce a model capable of inferring behaviourally relevant latent dynamics from previously unseen data recorded from the same animal, without any need for decoder recalibration. |
Justin Jude; Matthew Perich; Lee Miller; Matthias Hennig; |

460 | On Measuring Causal Contributions Via Do-interventionsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we develop a principled method for quantifying causal contributions. First, we provide desiderata (axioms) that causal contribution measures should satisfy and propose the do-Shapley values (inspired by do-interventions [Pearl, 2000]) as a unique method satisfying these properties. |
Yonghan Jung; Shiva Kasiviswanathan; Jin Tian; Dominik Janzing; Patrick Bloebaum; Elias Bareinboim; |

461 | Efficient Approximate Inference for Stationary Kernel on Frequency DomainRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, despite its expressive power, training this kernel is typically difficult because scalability and overfitting issues often arise due to a large number of training parameters. To resolve these issues, we propose an approximate inference method for estimating the spectral mixture kernel hyperparameters. |
Yohan Jung; Kyungwoo Song; Jinkyoo Park; |

462 | Sketching Algorithms and Lower Bounds for Ridge RegressionRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We give a sketching-based iterative algorithm that computes a $1+\varepsilon$ approximate solution for the ridge regression problem $\min_x \|Ax-b\|_2^2 +\lambda\|x\|_2^2$ where $A \in \mathbb{R}^{n \times d}$ with $d \ge n$. |
Praneeth Kacham; David Woodruff; |

463 | Flashlight: Enabling Innovation in Tools for Machine LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems by prioritizing open, modular, customizable internals and state-of-the-art, research-ready models and training setups across a variety of domains. |
Jacob D Kahn; Vineel Pratap; Tatiana Likhomanenko; Qiantong Xu; Awni Hannun; Jeff Cai; Paden Tomasello; Ann Lee; Edouard Grave; Gilad Avidov; Benoit Steiner; Vitaliy Liptchinsky; Gabriel Synnaeve; Ronan Collobert; |

464 | Learning-based Optimisation of Particle Accelerators Under Partial Observability Without Real-World TrainingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this contribution, we demonstrate how to successfully apply RL to the optimisation of a highly complex real-world machine, specifically a linear particle accelerator, in an only partially observable setting and without requiring training on the real machine. |
Jan Kaiser; Oliver Stein; Annika Eichler; |

465 | Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewAbstract: This work addresses meta-learning (ML) by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network units results in sparse … |
Konstantinos Kalais; Sotirios Chatzis; |

466 | Doubly Robust Distributionally Robust Off-Policy Evaluation and LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. |
Nathan Kallus; Xiaojie Mao; Kaiwen Wang; Zhengyuan Zhou; |

467 | Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed DataRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). |
Gautam Kamath; Xingtu Liu; Huanyu Zhang; |

468 | Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Currently, empirical hyperparameter tuning addresses this problem at the cost of computational time. To solve this problem, we theoretically analyzed NS loss to assist hyperparameter tuning and understand the better use of the NS loss in KGE learning. |
Hidetaka Kamigaito; Katsuhiko Hayashi; |

469 | Matching Learned Causal Effects of Neural Networks with Domain PriorsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Therefore, we propose a regularization method that aligns the learned causal effects of a neural network with domain priors, including both direct and total causal effects. |
Sai Srinivas Kancheti; Abbavaram Gowtham Reddy; Vineeth N Balasubramanian; Amit Sharma; |

470 | Deduplicating Training Data Mitigates Privacy Risks in Language ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. |
Nikhil Kandpal; Eric Wallace; Colin Raffel; |

471 | Lyapunov Density Models: Constraining Distribution Shift in Learning-Based ControlRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Can we combine these two concepts, producing learning-based control algorithms that constrain the system to in-distribution states using only in-distribution actions? In this paper, we propose to do this by combining concepts from Lyapunov stability and density estimation, introducing Lyapunov density models: a generalization of control Lyapunov functions and density models that provides guarantees about an agent’s ability to stay in-distribution over its entire trajectory. |
Katie Kang; Paula Gradu; Jason J Choi; Michael Janner; Claire Tomlin; Sergey Levine; |

472 | Forget-free Continual Learning with Winning SubnetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Inspired by Lottery Ticket Hypothesis that competitive subnetworks exist within a dense network, we propose a continual learning method referred to as Winning SubNetworks (WSN), which sequentially learns and selects an optimal subnetwork for each task. |
Haeyong Kang; Rusty John Lloyd Mina; Sultan Rizky Hikmawan Madjid; Jaehong Yoon; Mark Hasegawa-Johnson; Sung Ju Hwang; Chang D. Yoo; |

473 | Differentially Private Approximate QuantilesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work we study the problem of differentially private (DP) quantiles, in which given dataset $X$ and quantiles $q_1, \ldots, q_m \in [0,1]$, we want to output $m$ quantile estimations which are as close as possible to the true quantiles and preserve DP. |
Haim Kaplan; Shachar Schnapp; Uri Stemmer; |

474 | Simultaneous Graph Signal Clustering and Graph LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we address the problem of learning multiple graphs from heterogeneous data by formulating an optimization problem for joint graph signal clustering and graph topology inference. |
Abdullah Karaaslanli; Selin Aviyente; |

475 | Composing Partial Differential Equations with Physics-Aware Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We introduce a compositional physics-aware FInite volume Neural Network (FINN) for learning spatiotemporal advection-diffusion processes. |
Matthias Karlbauer; Timothy Praditia; Sebastian Otte; Sergey Oladyshkin; Wolfgang Nowak; Martin V. Butz; |

476 | Meta-Learning Hypothesis Spaces for Sequential Decision-makingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose to meta-learn a kernel from offline data (Meta-KeL). |
Parnian Kassraie; Jonas Rothfuss; Andreas Krause; |

477 | FOCUS: Familiar Objects in Common and Uncommon SettingsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we introduce FOCUS (Familiar Objects in Common and Uncommon Settings), a dataset for stress-testing the generalization power of deep image classifiers. |
Priyatham Kattakinda; Soheil Feizi; |

478 | Training OOD Detectors in Their Natural HabitatsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a novel framework that leverages wild mixture data—that naturally consists of both ID and OOD samples. |
Julian Katz-Samuels; Julia B Nakhleh; Robert Nowak; Yixuan Li; |

479 | Robustness Implies Generalization Via Data-Dependent Generalization BoundsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. |
Kenji Kawaguchi; Zhun Deng; Kyle Luh; Jiaoyang Huang; |

480 | Generating Distributional Adversarial Examples to Evade Statistical DetectorsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Due to the difficulties in designing adaptive attacks, however, recent work suggests that most detectors have incomplete evaluation. We aim to fill this gap by designing a generic adaptive attack against detectors: the “statistical indistinguishability attack” (SIA). |
Yigitcan Kaya; Muhammad Bilal Zafar; Sergul Aydore; Nathalie Rauschmayr; Krishnaram Kenthapadi; |

481 | Secure Quantized Training for Deep LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We implement training of neural networks in secure multi-party computation (MPC) using quantization commonly used in said setting. |
Marcel Keller; Ke Sun; |

482 | A Convergent and Dimension-Independent Min-Max Optimization AlgorithmRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We study a variant of a recently introduced min-max optimization framework where the max-player is constrained to update its parameters in a greedy manner until it reaches a first-order stationary point. |
Vijay Keswani; Oren Mangoubi; Sushant Sachdeva; Nisheeth K. Vishnoi; |

483 | Neural Network Poisson Models for Behavioural and Neural Spike Train DataRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Predominant modeling methods apply rather disjoint techniques to these scales; by contrast, we suggest an end-to-end model which exploits recent developments of flexible, but tractable, neural network point-process models to characterize dependencies between stimuli, actions, and neural data. |
Moein Khajehnejad; Forough Habibollahi; Richard Nock; Ehsan Arabzadeh; Peter Dayan; Amir Dezfouli; |

484 | Federated Reinforcement Learning: Linear Speedup Under Markovian SamplingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we consider a federated reinforcement learning framework where multiple agents collaboratively learn a global model, without sharing their individual data and policies. |
Sajad Khodadadian; Pranay Sharma; Gauri Joshi; Siva Theja Maguluri; |

485 | Multi-Level Branched Regularization for Federated LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To alleviate the limitations, we propose a novel architectural regularization technique that constructs multiple auxiliary branches in each local model by grafting local and global subnetworks at several different levels and that learns the representations of the main pathway in the local model congruent to the auxiliary hybrid pathways via online knowledge distillation. |
Jinkyu Kim; Geeho Kim; Bohyung Han; |

486 | Learning Fair Representation with A Parametric Integral Probability MetricRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose a new adversarial training scheme for LFR, where the integral probability metric (IPM) with a specific parametric family of discriminators is used. |
Dongha Kim; Kunwoong Kim; Insung Kong; Ilsang Ohn; Yongdai Kim; |

487 | Dataset Condensation Via Efficient Synthetic-Data ParameterizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To this end, we propose a novel condensation framework that generates multiple synthetic data with a limited storage budget via efficient parameterization considering data regularity. |
Jang-Hyun Kim; Jinuk Kim; Seong Joon Oh; Sangdoo Yun; Hwanjun Song; Joonhyun Jeong; Jung-Woo Ha; Hyun Oh Song; |

488 | Guided-TTS: A Diffusion Model for Text-to-Speech Via Classifier GuidanceRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose Guided-TTS, a high-quality text-to-speech (TTS) model that does not require any transcript of the target speaker, using classifier guidance. |
Heeseung Kim; Sungwon Kim; Sungroh Yoon; |

489 | Variational On-the-Fly PersonalizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose a novel personalization method, Variational On-the-Fly Personalization. |
Jangho Kim; Jun-Tae Lee; Simyung Chang; Nojun Kwak; |

490 | Fisher SAM: Information Geometry and Sharpness Aware MinimisationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper we consider the information geometry of the model parameter space when defining the neighborhood, namely replacing SAM’s Euclidean balls with ellipsoids induced by the Fisher information. |
Minyoung Kim; Da Li; Shell X Hu; Timothy Hospedales; |

491 | ViT-NeT: Interpretable Vision Transformers with Neural Tree DecoderRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this study, we propose a new ViT neural tree decoder (ViT-NeT). |
Sangwon Kim; Jaeyeal Nam; Byoung Chul Ko; |

492 | Sanity Simulations for Saliency MethodsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we design a synthetic benchmarking framework, SMERF, that allows us to perform ground-truth-based evaluation while controlling the complexity of the model’s reasoning. |
Joon Sik Kim; Gregory Plumb; Ameet Talwalkar; |

493 | Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score EstimationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: For successful training, therefore, we introduce Soft Truncation, a universally applicable training technique for diffusion models, that softens the fixed and static truncation hyperparameter into a random variable. |
Dongjun Kim; Seungjae Shin; Kyungwoo Song; Wanmo Kang; Il-Chul Moon; |

494 | Rotting Infinitely Many-Armed BanditsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho = o(1)$. |
Jung-Hun Kim; Milan Vojnovic; Se-Young Yun; |

495 | Accelerated Gradient Methods for Geodesically Convex Optimization: Tractable Algorithms and Convergence AnalysisRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose computationally tractable accelerated first-order methods for Riemannian optimization, extending the Nesterov accelerated gradient (NAG) method. |
Jungbin Kim; Insoon Yang; |

496 | Generalizing to New Physical Systems Via Context-Informed Dynamics ModelRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Data-driven approaches to modeling physical systems fail to generalize to unseen systems that share the same general dynamics with the learning domain, but correspond to different physical contexts. We propose a new framework for this key problem, context-informed dynamics adaptation (CoDA), which takes into account the distributional shift across systems for fast and efficient adaptation to new dynamics. |
Matthieu Kirchmeyer; Yuan Yin; Jeremie Dona; Nicolas Baskiotis; Alain Rakotomamonjy; Patrick Gallinari; |

497 | SoQal: Selective Oracle Questioning for Consistency Based Active Learning of Cardiac SignalsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: One way to mitigate this burden is via active learning (AL) which involves the (a) acquisition and (b) annotation of informative unlabelled instances. Whereas previous work addresses either one of these elements independently, we propose an AL framework that addresses both. |
Dani Kiyasseh; Tingting Zhu; David A Clifton; |

498 | Curriculum Reinforcement Learning Via Constrained Optimal TransportRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we focus on the idea of framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. |
Pascal Klink; Haoyi Yang; Carlo D'Eramo; Jan Peters; Joni Pajarinen; 

499 | Exploiting Redundancy: Separable Group Convolutional Networks on Lie GroupsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we investigate the properties of representations learned by regular G-CNNs, and show considerable parameter redundancy in group convolution kernels. |
David M. Knigge; David W Romero; Erik J Bekkers; |

500 | Revisiting Contrastive Learning Through The Lens of Neighborhood Component Analysis: An Integrated FrameworkRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: By investigating the connection between contrastive learning and neighborhood component analysis (NCA), we provide a novel stochastic nearest neighbor viewpoint of contrastive learning and subsequently propose a series of contrastive losses that outperform the existing ones. |
Ching-Yun Ko; Jeet Mohapatra; Sijia Liu; Pin-Yu Chen; Luca Daniel; Lily Weng; |

501 | Transfer Learning In Differential Privacy’s Hybrid-ModelRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Here we study the problem of machine learning in the hybrid-model where the $n$ individuals in the curator’s dataset are drawn from a different distribution than the one of the general population (the local-agents). |
Refael Kohen; Or Sheffet; |

502 | Markov Chain Monte Carlo for Continuous-Time Switching Dynamical SystemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we propose a novel inference algorithm utilizing a Markov Chain Monte Carlo approach. |
Lukas Köhs; Bastian Alt; Heinz Koeppl; |

503 | Partial Disentanglement for Domain AdaptationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Given the theoretical insights, we propose a practical domain adaptation framework, called iMSDA. |
Lingjing Kong; Shaoan Xie; Weiran Yao; Yujia Zheng; Guangyi Chen; Petar Stojanov; Victor Akinwande; Kun Zhang; |

504 | Simultaneously Learning Stochastic and Adversarial Bandits with General Graph FeedbackRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments. In this work, we overcome this difficulty by a new trade-off mechanism with a carefully-designed proportion for exploration and exploitation. |
Fang Kong; Yichi Zhou; Shuai Li; |

505 | Adaptive Data Analysis with Correlated ObservationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. |
Aryeh Kontorovich; Menachem Sadigurschi; Uri Stemmer; |

506 | Controlling Conditional Language Models Without Catastrophic ForgettingRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we extend DPG to conditional tasks by proposing Conditional DPG (CDPG). |
Tomasz Korbak; Hady Elsahar; German Kruszewski; Marc Dymetman; |

507 | Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration ComplexityRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work we propose a batch multimarginal version of the Greenkhorn algorithm for the entropic-regularized optimal transport problem. |
Vladimir R. Kostic; Saverio Salzo; Massimiliano Pontil; |

508 | Certified Adversarial Robustness Under The Bounded Support SetRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we generalize the $f$-divergence-based framework to a Wasserstein-distance-based and total-variation-distance-based framework that is first able to analyze robustness properties of bounded support set smoothing measures both theoretically and experimentally. |
Yiwen Kou; Qinyuan Zheng; Yisen Wang; |

509 | Exact Learning of Preference Structure: Single-peaked Preferences and BeyondRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider the setting where the members of a society (voters) have preferences over candidates, and the candidates can be ordered on an axis so that the voters’ preferences are single-peaked on this axis. |
Sonja Kraiczy; Edith Elkind; |

510 | Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time SeriesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Here we propose a general framework for multi-modal data integration for the purpose of nonlinear DS reconstruction and the analysis of cross-modal relations. |
Daniel Kramer; Philine L Bommer; Daniel Durstewitz; Carlo Tombolini; Georgia Koppe; |

511 | Probabilistic ODE Solutions in Millions of DimensionsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we explain the mathematical assumptions and detailed implementation schemes behind solving high-dimensional ODEs with a probabilistic numerical algorithm. |
Nicholas Krämer; Nathanael Bosch; Jonathan Schmidt; Philipp Hennig; |

512 | Active Nearest Neighbor Regression Through Delaunay RefinementRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce an algorithm for active function approximation based on nearest neighbor regression. |
Alexander Kravberg; Giovanni Luca Marchetti; Vladislav Polianskii; Anastasiia Varava; Florian T. Pokorny; Danica Kragic; |
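The nearest neighbor regression primitive underlying this entry can be illustrated with a minimal 1-NN sketch. This is a generic baseline in plain Python (the `nn_regress` helper is hypothetical), not the authors' active Delaunay-refinement algorithm:

```python
def nn_regress(x_query, xs, ys):
    """1-nearest-neighbor regression on 1-D data: predict the label of
    the training point closest to the query (generic baseline sketch)."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x_query))
    return ys[i]

# Usage: with training data xs=[0,1,2], ys=[10,20,30],
# a query at 0.9 returns the label of its nearest neighbor (x=1).
```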

513 | Functional Generalized Empirical Likelihood Estimation for Conditional Moment RestrictionsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To benefit from recent developments in machine learning, we provide a functional reformulation of GEL in which arbitrary models can be leveraged. Motivated by a dual formulation of the resulting infinite dimensional optimization problem, we devise a practical method and explore its asymptotic properties. |
Heiner Kremer; Jia-Jie Zhu; Krikamol Muandet; Bernhard Schölkopf; |

514 | Calibrated and Sharp Uncertainties in Deep Learning Via Density EstimationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We introduce a simple training procedure based on recalibration that yields calibrated models without sacrificing overall performance; unlike previous approaches, ours ensures the most general property of distribution calibration and applies to any model, including neural networks. |
Volodymyr Kuleshov; Shachi Deshpande; |

515 | ActiveHedge: Hedge Meets Active LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We consider the classical problem of multi-class prediction with expert advice, but with an active learning twist. |
Bhuvesh Kumar; Jacob D Abernethy; Venkatesh Saligrama; |

516 | Balancing Discriminability and Transferability for Source-Free Domain AdaptationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Upon analyzing the hurdles from both theoretical and empirical standpoints, we derive novel insights to show that a mixup between original and corresponding translated generic samples enhances the discriminability-transferability trade-off while duly respecting the privacy-oriented source-free setting. |
Jogendra Nath Kundu; Akshay R Kulkarni; Suvaansh Bhambri; Deepesh Mehta; Shreyas Anand Kulkarni; Varun Jampani; Venkatesh Babu Radhakrishnan; |

517 | Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget MattersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we argue for the importance of an online evaluation budget for a reliable comparison of deep offline RL algorithms. |
Vladislav Kurenkov; Sergey Kolesnikov; |

518 | Equivariant Priors for Compressed Sensing with Unknown OrientationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Additionally, in many scenarios, the signal has an unknown orientation prior to measurements. To address such recovery problems, we propose using equivariant generative models as a prior, which encapsulate orientation information in their latent space. |
Anna Kuzina; Kumar Pratik; Fabio Valerio Massoli; Arash Behboodi; |

519 | Coordinated Attacks Against Contextual Bandits: Fundamental Limits and Defense MechanismsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction $\alpha < 1/2$ of tasks (users) are arbitrary and adversarial. |
Jeongyeol Kwon; Yonathan Efroni; Constantine Caramanis; Shie Mannor; |

520 | Large Batch Experience ReplayRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we cast the replay buffer sampling problem as an importance sampling one for estimating the gradient. |
Thibault Lahire; Matthieu Geist; Emmanuel Rachelson; |

521 | FedScale: Benchmarking Model and System Performance of Federated Learning at ScaleRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We present FedScale, a federated learning (FL) benchmarking suite with realistic datasets and a scalable runtime to enable reproducible FL research. |
Fan Lai; Yinwei Dai; Sanjay Singapuram; Jiachen Liu; Xiangfeng Zhu; Harsha Madhyastha; Mosharaf Chowdhury; |

522 | Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution DataRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Based on this study, we propose a self-adaptive algorithm, named Smoothed Adaptive Weighting (SAW). |
Zhengfeng Lai; Chao Wang; Henrry Gunawan; Sen-Ching S Cheung; Chen-Nee Chuah; |

523 | Functional Output Regression with Infimal Convolution: Exploring The Huber and $\epsilon$-insensitive LossesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We derive computationally tractable algorithms relying on duality to tackle the resulting tasks in the context of vector-valued reproducing kernel Hilbert spaces. |
Alex Lambert; Dimitri Bouche; Zoltan Szabo; Florence d'Alché-Buc; |

524 | Tell Me Why! Explanations Support Learning Relational and Causal StructureRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Here, we show that language can play a similar role for deep RL agents in complex environments. |
Andrew K Lampinen; Nicholas Roy; Ishita Dasgupta; Stephanie Cy Chan; Allison Tam; James Mcclelland; Chen Yan; Adam Santoro; Neil C Rabinowitz; Jane Wang; Felix Hill; |

525 | Generative Cooperative Networks for Natural Language GenerationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we introduce Generative Cooperative Networks, in which the discriminator architecture is cooperatively used along with the generation policy to output samples of realistic texts for the task at hand. |
Sylvain Lamprier; Thomas Scialom; Antoine Chaffin; Vincent Claveau; Ewa Kijak; Jacopo Staiano; Benjamin Piwowarski; |

526 | DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow ForecastingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper proposes a novel Dynamic Spatial-Temporal Aware Graph Neural Network (DSTAGNN) to model the complex spatial-temporal interaction in road networks. |
Shiyong Lan; Yitong Ma; Weikang Huang; Wenwu Wang; Hongyu Yang; Pyang Li; |

527 | Cooperative Online Learning in Stochastic and Adversarial MDPsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study cooperative online learning in stochastic and adversarial Markov decision process (MDP). |
Tal Lancewicki; Aviv Rosenberg; Yishay Mansour; |

528 | PINs: Progressive Implicit Networks for Multi-Scale Neural RepresentationsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, scenes with a wide frequency spectrum remain a challenge: choosing high frequencies for positional encoding introduces noise in low-structure areas, while low frequencies result in poor fitting of detailed regions. To address this, we propose a progressive positional encoding, exposing a hierarchical MLP structure to incremental sets of frequency encodings. |
Zoe Landgraf; Alexander Sorkine Hornung; Ricardo S Cabral; |

529 | Co-training Improves Prompt-based Learning for Large Language ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data. |
Hunter Lang; Monica N Agrawal; Yoon Kim; David Sontag; |

530 | Goal Misgeneralization in Deep Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We study goal misgeneralization, a type of out-of-distribution robustness failure in reinforcement learning (RL). |
Lauro Langosco Di Langosco; Jack Koch; Lee D Sharkey; Jacob Pfau; David Krueger; |

531 | Marginal Tail-Adaptive Normalizing FlowsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we focus on improving the ability of normalizing flows to correctly capture the tail behavior and, thus, form more accurate models. |
Mike Laszkiewicz; Johannes Lederer; Asja Fischer; |

532 | Bregman Proximal Langevin Monte Carlo Via Bregman-Moreau EnvelopesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose efficient Langevin Monte Carlo algorithms for sampling distributions with nonsmooth convex composite potentials, which is the sum of a continuously differentiable function and a possibly nonsmooth function. |
Tim Tsz-Kit Lau; Han Liu; |

533 | Scalable Deep Reinforcement Learning Algorithms for Mean Field GamesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This is far from being trivial in the case of non-linear function approximators that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. |
Mathieu Lauriere; Sarah Perrin; Sertan Girgin; Paul Muller; Ayush Jain; Theophile Cabannes; Georgios Piliouras; Julien Perolat; Romuald Elie; Olivier Pietquin; Matthieu Geist; |

534 | Implicit Bias of Linear Equivariant NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this context, we show that L-layer full-width linear G-CNNs trained via gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the 2/L-Schatten matrix norm. |
Hannah Lawrence; Bobak Kiani; Kristian G Georgiev; Andrew K Dienes; |

535 | Differentially Private Maximal Information CoefficientsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: As a solution, we present algorithms to approximate MIC in a way that provides differential privacy. |
John Lazarsfeld; Aaron Johnson; Emmanuel Adeniran; |

536 | Entropic Gromov-Wasserstein Between Gaussian DistributionsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. |
Khang Le; Dung Q Le; Huy Nguyen; Dat Do; Tung Pham; Nhat Ho; |

537 | Neurocoder: General-Purpose Computation Using Stored Neural ProgramsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Here we design Neurocoder, a new class of general-purpose neural networks in which the neural network “codes” itself in a data-responsive way by composing relevant programs from a set of shareable, modular programs stored in external memory. |
Hung Le; Svetha Venkatesh; |

538 | Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in The Mean-Field RegimeRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, and entropy-regularized Markov decision processes (MDPs). |
James-Michael Leahy; Bekzhan Kerimkulov; David Siska; Lukasz Szpruch; |

539 | A Random Matrix Analysis of Data Stream Clustering: Coping With Limited Memory ResourcesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This article introduces a random matrix framework for the analysis of clustering on high-dimensional data streams, a particularly relevant setting for a more sober processing of large amounts of data with limited memory and energy resources. |
Hugo Lebeau; Romain Couillet; Florent Chatelain; |

540 | Neural Tangent Kernel Analysis of Deep Narrow Neural NetworksRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. |
Jongmin Lee; Joo Young Choi; Ernest K Ryu; Albert No; |

541 | Dataset Condensation with Contrastive SignalsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We attribute this to the lack of participation of the contrastive signals between the classes resulting from the class-wise gradient matching strategy. To address this problem, we propose Dataset Condensation with Contrastive signals (DCC) by modifying the loss function to enable the DC methods to effectively capture the differences between classes. |
Saehyung Lee; Sanghyuk Chun; Sangwon Jung; Sangdoo Yun; Sungroh Yoon; |

542 | Confidence Score for Source-Free Unsupervised Domain AdaptationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: To differentiate between sample importance, in this study, we propose a novel sample-wise confidence score, the Joint Model-Data Structure (JMDS) score for SFUDA. |
Jonghyun Lee; Dahuin Jung; Junho Yim; Sungroh Yoon; |

543 | A Statistical Manifold Framework for Point Cloud DataRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: A growing number of applications require a means of measuring not only distances between point clouds, but also angles, volumes, derivatives, and other more advanced concepts. To formulate and quantify these concepts in a coordinate-invariant way, we develop a Riemannian geometric framework for point cloud data. |
Yonghyeon Lee; Seungyeon Kim; Jinwon Choi; Frank Park; |

544 | Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel ConvolutionsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To improve the performance, we first minimize total bootstrapping runtime using multiplexed parallel convolution that collects sparse output data for multiple channels compactly. We also propose the imaginary-removing bootstrapping to prevent the deep neural networks from catastrophic divergence during approximate ReLU operations. |
Eunsang Lee; Joon-Woo Lee; Junghyun Lee; Young-Sik Kim; Yongjune Kim; Jong-Seon No; Woosuk Choi; |

545 | Statistical Inference with Implicit SGD: Proximal Robbins-Monro Vs. Polyak-RuppertRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. |
Yoonhyung Lee; Sungdong Lee; Joong-Ho Won; |

546 | Maslow’s Hammer in Catastrophic Forgetting: Node Re-Use Vs. Node ActivationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon that we name Maslow’s Hammer hypothesis. |
Sebastian Lee; Stefano Sarao Mannelli; Claudia Clopath; Sebastian Goldt; Andrew Saxe; |

547 | Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data Via Bayesian OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: We introduce block decomposition and history subsampling techniques to improve the scalability of Bayesian optimization when an input sequence becomes long. |
Deokjae Lee; Seungyong Moon; Junhyeok Lee; Hyun Oh Song; |

548 | Least Squares Estimation Using Sketched Data with Heteroskedastic ErrorsRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: This paper considers the case when the regression errors do not have constant variance and heteroskedasticity robust standard errors would normally be needed for test statistics to provide accurate inference. |
Sokbae Lee; Serena Ng; |

549 | Why The Rich Get Richer? On The Balancedness of Random Partition ModelsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a principled way to compare the balancedness of random partition models, which gives a better understanding of what model works better and what doesn’t for different applications. |
Changwoo J Lee; Huiyan Sang; |

550 | Model Selection in Batch Policy OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and M model classes, learn a policy with performance that is competitive with the policy derived from the best model class. |
Jonathan Lee; George Tucker; Ofir Nachum; Bo Dai; |

551 | Supervised Learning with General Risk FunctionalsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We establish the first uniform convergence results for estimating the CDF of the loss distribution, which yield uniform convergence guarantees that hold simultaneously both over a class of Hölder risk functionals and over a hypothesis class. |
Liu Leqi; Audrey Huang; Zachary Lipton; Kamyar Azizzadenesheli; |

552 | Generalized Strategic Classification and The Case of Aligned IncentivesRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work we argue for a broader perspective on what accounts for strategic user behavior, and propose and study a flexible model of generalized strategic classification. |
Sagi Levanon; Nir Rosenfeld; |

553 | A Simple Unified Framework for High Dimensional Bandit ProblemsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Stochastic high dimensional bandit problems with low dimensional structures are useful in different applications such as online advertising and drug discovery. In this work, we propose a simple unified algorithm for such problems and present a general analysis framework for the regret upper bound of our algorithm. |
Wenjie Li; Adarsh Barik; Jean Honorio; |

554 | Robust Training of Neural Networks Using Scale Invariant ArchitecturesRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture and make it scale invariant, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm proportional to weight norm multiplied by $\sqrt{\frac{2\lambda}{\eta}}$, where $\eta$ is learning rate and $\lambda$ is weight decay. |
Zhiyuan Li; Srinadh Bhojanapalli; Manzil Zaheer; Sashank Reddi; Sanjiv Kumar; |
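Step (3) of the recipe quoted in this highlight states an explicit clipping rule: cap the global gradient norm at $\|w\| \cdot \sqrt{2\lambda/\eta}$. The snippet below is an illustrative stand-alone version of that rule on flat gradient/parameter lists (the `clip_global_grad` helper is hypothetical), not the authors' code:

```python
import math

def clip_global_grad(grads, params, lr, weight_decay):
    """Clip the global gradient norm to ||w|| * sqrt(2*lambda/eta),
    where eta is the learning rate and lambda the weight decay,
    as in step (3) of the quoted recipe."""
    gnorm = math.sqrt(sum(g * g for g in grads))
    wnorm = math.sqrt(sum(p * p for p in params))
    threshold = wnorm * math.sqrt(2.0 * weight_decay / lr)
    if gnorm > threshold and gnorm > 0.0:
        scale = threshold / gnorm
        return [g * scale for g in grads]
    return grads

# Example: grads of norm 5, weights of norm 1, eta=0.5, lambda=0.25
# give a threshold of 1.0, so the gradient is rescaled by 0.2.
```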

555 | Spatial-Channel Token Distillation for Vision MLPsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This work solves the problem from a novel knowledge distillation perspective. We propose a novel Spatial-channel Token Distillation (STD) method, which improves the information mixing in the two dimensions by introducing distillation tokens to each of them. |
Yanxi Li; Xinghao Chen; Minjing Dong; Yehui Tang; Yunhe Wang; Chang Xu; |

556 | An Analytical Update Rule for General Policy OptimizationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We present an analytical policy update rule that is independent of parametric function approximators. |
Hepeng Li; Nicholas Clavette; Haibo He; |

557 | On Convergence of Gradient Descent Ascent: A Tight Local AnalysisRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the local convergence of general nonconvex-nonconcave minimax problems. |
Haochuan Li; Farzan Farnia; Subhro Das; Ali Jadbabaie; |

558 | On The Finite-Time Performance of The Knowledge Gradient AlgorithmRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this research, we present new theoretical results about the finite-time performance of the KG algorithm. |
Yanwen Li; Siyang Gao; |

559 | Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: We propose a novel phasic solution by alternating online RL and offline SL for tackling sparse-reward goal-conditioned problems. In the online phase, we perform RL training and collect rollout data, while in the offline phase, we perform SL on those successful trajectories from the dataset. |
Yunfei Li; Tian Gao; Jiaqi Yang; Huazhe Xu; Yi Wu; |

560 | G$^2$CN: Graph Gaussian Convolution Networks with Concentrated Graph FiltersRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: Nevertheless, we notice that existing spectral analysis fails to explain why existing graph propagations with the same global tendency, such as low-pass or high-pass, still yield very different results. Motivated by this situation, we develop a new framework for spectral analysis in this paper called concentration analysis. |
Mingjie Li; Xiaojun Guo; Yifei Wang; Yisen Wang; Zhouchen Lin; |

561 | Decomposing Temporal High-Order Interactions Via Latent ODEsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: As a result, these methods might not be capable enough of capturing complex, fine-grained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). |
Shibo Li; Robert Kirby; Shandian Zhe; |

562 | Neural Inverse Transform SamplerRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we show that when modeling one-dimensional conditional densities with a neural network, $Z$ can be exactly and efficiently computed by letting the network represent the cumulative distribution function of a target density, and applying a generalized fundamental theorem of calculus. |
Henry Li; Yuval Kluger; |
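For context, the classical inverse transform sampling idea this entry builds on (draw $u \sim \mathrm{Uniform}(0,1)$, then solve $F(x) = u$) can be sketched with bisection against any monotone CDF. This generic example uses the standard normal CDF via `math.erf` and is not the paper's neural method:

```python
import math
import random

def inverse_transform_sample(cdf, n, lo=-10.0, hi=10.0, tol=1e-8, seed=0):
    """Draw n samples from a 1-D distribution given its CDF, by
    inverting u = F(x) with bisection on [lo, hi]."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()
        a, b = lo, hi
        while b - a > tol:
            m = 0.5 * (a + b)
            if cdf(m) < u:
                a = m
            else:
                b = m
        samples.append(0.5 * (a + b))
    return samples

def normal_cdf(x):
    """Standard normal CDF expressed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Sampling 2000 points with `inverse_transform_sample(normal_cdf, 2000)` yields an empirical mean close to 0, as expected for the standard normal.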

563 | PLATINUM: Semi-Supervised Model Agnostic Meta-Learning Using Submodular Mutual InformationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this work, we propose PLATINUM (semi-suPervised modeL Agnostic meTa learnIng usiNg sUbmodular Mutual information), a novel semi-supervised model agnostic meta learning framework that uses the submodular mutual information (SMI) functions to boost the performance of FSC. |
Changbin Li; Suraj Kothawade; Feng Chen; Rishabh Iyer; |

564 | Deconfounded Value Decomposition for Multi-Agent Reinforcement LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we investigate VD from a novel perspective of causal inference. |
Jiahui Li; Kun Kuang; Baoxiang Wang; Furui Liu; Long Chen; Changjie Fan; Fei Wu; Jun Xiao; |

565 | C-MinHash: Improving Minwise Hashing with Circulant PermutationRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we propose Circulant MinHash (C-MinHash) and provide the surprising theoretical results that using only two independent random permutations in a circulant manner leads to uniformly smaller Jaccard estimation variance than that of the classical MinHash with K independent permutations. |
Xiaoyun Li; Ping Li; |
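For context, the classical K-permutation MinHash estimator that C-MinHash improves upon can be sketched as follows. Permutations are simulated here with `random.sample`; this shows only the K-independent-permutation baseline, not the authors' circulant construction:

```python
import random

def minhash_signature(set_items, num_hashes=64, universe=1000, seed=0):
    """Classical MinHash: draw num_hashes independent permutations of
    the universe; signature entry k is the minimum permuted index of
    any item in the set under permutation k."""
    rng = random.Random(seed)
    perms = [rng.sample(range(universe), universe) for _ in range(num_hashes)]
    return [min(perm[x] for x in set_items) for perm in perms]

def jaccard_estimate(sig_a, sig_b):
    """Estimated Jaccard similarity: fraction of matching entries."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

C-MinHash's point, per the highlight, is that two permutations used circulantly give uniformly smaller estimation variance than the K independent permutations drawn here.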

566 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and GenerationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. |
Junnan Li; Dongxu Li; Caiming Xiong; Steven Hoi; |

567 | Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in The $O(\epsilon^{-7/4})$ ComplexityRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: This paper studies the accelerated gradient descent for general nonconvex problems under the gradient Lipschitz and Hessian Lipschitz assumptions. |
Huan Li; Zhouchen Lin; |

568 | Achieving Fairness at No Utility Cost Via Data Reweighing with InfluenceRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In this paper, we focus on the pre-processing aspect for achieving fairness, and propose a data reweighing approach that only adjusts the weight for samples in the training phase. |
Peizhao Li; Hongfu Liu; |

569 | High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy TailsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we develop high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance. |
Shaojie Li; Yong Liu; |

570 | MetAug: Contrastive Learning Via Meta Feature AugmentationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: In response, we propose to directly augment the features in latent space, thereby learning discriminative representations without a large amount of input data. |
Jiangmeng Li; Wenwen Qiang; Changwen Zheng; Bing Su; Hui Xiong; |

571 | PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information CollaborationRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: However, we reveal sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder the learning towards better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. |
Pengyi Li; Hongyao Tang; Tianpei Yang; Xiaotian Hao; Tong Sang; Yan Zheng; Jianye Hao; Matthew E. Taylor; Wenyuan Tao; Zhen Wang; |

572 | CerDEQ: Certifiable Deep Equilibrium ModelRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this work, we aim to tackle the problem of DEQ’s certified training. |
Mingjie Li; Yisen Wang; Zhouchen Lin; |

573 | Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology SamplingRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: To the best of our knowledge, this paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs for semi-supervised node classification. |
Hongkang Li; Meng Wang; Sijia Liu; Pin-Yu Chen; Jinjun Xiong; |

574 | Let Invariant Rationale Discovery Inspire Graph Contrastive LearningRelated Papers Related Patents Related Grants Related Orgs Related Experts Related Code ViewHighlight: Taking an invariance look at GCL, we argue that a high-performing augmentation should preserve the salient semantics of anchor graphs regarding instance-discrimination. To this end, we relate GCL with invariant rationale discovery, and propose a new framework, Rationale-aware Graph Contrastive Learning (RGCL). |
Sihang Li; Xiang Wang; An Zhang; Yingxin Wu; Xiangnan He; Tat-Seng Chua; |

575 | Difference Advantage Estimation for Multi-Agent Policy GradientsRelated Papers Related Patents Related Grants Related Orgs Related Experts ViewHighlight: In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment a |