# Paper Digest: ICML 2023 Highlights

To search or review ICML 2023 papers on a specific topic, please use the "search by venue" and "review by venue" services. To browse papers by author, see the list of top authors (ICML 2023). You may also like to explore our "Best Paper" Digest (ICML), which lists the most influential ICML papers since 2004.

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2023, it is being held in Hawaii, USA.

To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.

Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. Such models power this website and are behind our services, including "search engine", "summarization", "question answering", and "literature review".

If you do not want to miss interesting academic papers, you are welcome to **sign up for our daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn for new conference digests.

Paper Digest Team

New York City, New York, 10017

team@paperdigest.org

#### TABLE 1: Paper Digest: ICML 2023 Highlights

Paper | Author(s) | |
---|---|---|

1 | Bayesian Estimation of Differential PrivacyRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We propose a novel, more efficient Bayesian approach that brings privacy estimates within the reach of practitioners. |
Santiago Zanella-Beguelin; Lukas Wutschitz; Shruti Tople; Ahmed Salem; Victor Rühle; Andrew Paverd; Mohammad Naseri; Boris Köpf; Daniel Jones; |

2 | Adaptive Estimation of Graphical Models Under Total PositivityRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose an adaptive multiple-stage estimation method, which refines the estimate by solving a weighted $\ell_1$-regularized problem in each stage. |
Jiaxi Ying; José Vinícius De Miranda Cardoso; Daniel P. Palomar; |

3 | GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). |
Hanjing Wang; Man-Kit Sit; Congjie He; Ying Wen; Weinan Zhang; Jun Wang; Yaodong Yang; Luo Mai; |

4 | Disentangled Multi-Fidelity Deep Bayesian Active LearningRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We propose a novel framework called Disentangled Multi-fidelity Deep Bayesian Active Learning (D-MFDAL), which learns the surrogate models conditioned on the distribution of functions at multiple fidelities. |
Dongxia Wu; Ruijia Niu; Matteo Chinazzi; Yian Ma; Rose Yu; |

5 | Understand and Modularize Generator Optimization in ELECTRA-style PretrainingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Despite the effectiveness of ELECTRA-style pre-training, their performance is dependent on the careful selection of the model size for the auxiliary generator, leading to high trial-and-error costs. In this paper, we present the first systematic study of this problem. |
Chengyu Dong; Liyuan Liu; Hao Cheng; Jingbo Shang; Jianfeng Gao; Xiaodong Liu; |

6 | NeRFool: Uncovering The Vulnerability of Generalizable Neural Radiance Fields Against Adversarial PerturbationsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: To this end, we present NeRFool, which to the best of our knowledge is the first work that sets out to understand the adversarial robustness of GNeRF. |
Yonggan Fu; Ye Yuan; Souvik Kundu; Shang Wu; Shunyao Zhang; Celine Lin; |

7 | Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDPRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning. |
Jiacheng Guo; Zihao Li; Huazheng Wang; Mengdi Wang; Zhuoran Yang; Xuezhou Zhang; |

8 | Block Subsampled Randomized Hadamard Transform for Nyström Approximation on Distributed ArchitecturesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). |
Oleg Balabanov; Matthias Beaupère; Laura Grigori; Victor Lederer; |

9 | Unconstrained Online Learning with Unbounded LossesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we develop a new setting for online learning with unbounded domains and non-Lipschitz losses. |
Andrew Jacobsen; Ashok Cutkosky; |

10 | Optimistic Planning By Regularized Dynamic ProgrammingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. |
Antoine Moulin; Gergely Neu; |

11 | Autoregressive Diffusion Model for Graph GenerationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose an *autoregressive diffusion* model for graph generation. |
Lingkai Kong; Jiaming Cui; Haotian Sun; Yuchen Zhuang; B. Aditya Prakash; Chao Zhang; |

12 | Differentiable Tree Operations Promote Compositional GeneralizationRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors. |
Paul Soulos; Edward J Hu; Kate McCurdy; Yunmo Chen; Roland Fernandez; Paul Smolensky; Jianfeng Gao; |

13 | Can Neural Network Memorization Be Localized?Related Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we show that rather than being confined to individual layers, memorization is a phenomenon confined to a small set of neurons in various layers of the model. |
Pratyush Maini; Michael Curtis Mozer; Hanie Sedghi; Zachary Chase Lipton; J Zico Kolter; Chiyuan Zhang; |

14 | Domain Adaptation for Time Series Under Feature and Label ShiftsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We present RAINCOAT, the first model for both closed-set and universal domain adaptation on complex time series. |
Huan He; Owen Queen; Teddy Koker; Consuelo Cuevas; Theodoros Tsiligkaridis; Marinka Zitnik; |

15 | Towards Sustainable Learning: Coresets for Data-efficient Deep LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training non-convex models, particularly deep networks. |
Yu Yang; Kang Hao; Baharan Mirzasoleiman; |

16 | On Enhancing Expressive Power Via Compositions of Single Fixed-Size ReLU NetworkRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This paper explores the expressive power of deep neural networks through the framework of function compositions. |
Shijun Zhang; Jianfeng Lu; Hongkai Zhao; |

17 | Contextual Reliability: When Different Features Matter in Different ContextsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We formalize a new setting called contextual reliability which accounts for the fact that the right features to use may vary depending on the context.Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability. |
Gaurav Rohit Ghosal; Amrith Setlur; Daniel S. Brown; Anca Dragan; Aditi Raghunathan; |

18 | On Data Manifolds Entailed By Structural Causal ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we characterize the data manifolds entailed by structural causal models. |
Ricardo Dominguez-Olmedo; Amir-Hossein Karimi; Georgios Arvanitidis; Bernhard Schölkopf; |

19 | Are Neurons Actually Collapsed? On The Fine-Grained Structure in Neural RepresentationsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Recent work has observed an intriguing Neural Collapse” phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. |
Yongyi Yang; Jacob Steinhardt; Wei Hu; |

20 | Fast Sampling of Diffusion Models Via Operator LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. |
Hongkai Zheng; Weili Nie; Arash Vahdat; Kamyar Azizzadenesheli; Anima Anandkumar; |

21 | Unsupervised Out-of-Distribution Detection with Diffusion InpaintingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Unsupervised out-of-distribution detection (OOD) seeks to identify out-of-domain data by learning only from unlabeled in-domain data. We present a novel approach for this task — Lift, Map, Detect (LMD) — that leverages recent advancement in diffusion models. |
Zhenzhen Liu; Jin Peng Zhou; Yufan Wang; Kilian Q Weinberger; |

22 | Sequence Modeling with Multiresolution Convolutional MemoryRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. |
Jiaxin Shi; Ke Alexander Wang; Emily Fox; |

23 | The Hessian Perspective Into The Nature of Convolutional Neural NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We prove tight upper bounds (with linear activations), which closely follow the empirical trend of the Hessian rank and in practice also hold for more general settings. |
Sidak Pal Singh; Thomas Hofmann; Bernhard Schölkopf; |

24 | Towards Learning Geometric Eigen-Lengths Crucial for Fitting TasksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: However, it remains obscure and underexplored if learning systems can be equipped with similar capabilities of automatically discovering such key geometric quantities from doing tasks. In this work, we therefore for the first time formulate and propose a novel learning problem on this question and set up a benchmark suite including tasks, data, and evaluation metrics for studying the problem. |
Yijia Weng; Kaichun Mo; Ruoxi Shi; Yanchao Yang; Leonidas Guibas; |

25 | Improved Learning-Augmented Algorithms for The Multi-Option Ski Rental Problem Via Best-Possible Competitive AnalysisRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we present improved learning-augmented algorithms for the multi-option ski rental problem. |
Yongho Shin; Changyeol Lee; Gukryeol Lee; Hyung-Chan An; |

26 | On Regularization and Inference with Label ConstraintsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, *regularization with constraints* and *constrained inference*, by quantifying their impact on model performance. |
Kaifu Wang; Hangfeng He; Tin D. Nguyen; Piyush Kumar; Dan Roth; |

27 | Simple Disentanglement of Style and Content in Visual RepresentationsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this work, we propose a simple post-processing framework to disentangle content and style in learned representations from pre-trained vision models. |
Lilian Ngweta; Subha Maity; Alex Gittens; Yuekai Sun; Mikhail Yurochkin; |

28 | Beyond The Edge of Stability Via Two-step Gradient UpdatesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: The incipient theoretical analysis of this phenomena has mainly focused in the overparametrised regime, where the effect of choosing a large learning rate may be associated to a `Sharpness-Minimisation’ implicit regularisation within the manifold of minimisers, under appropriate asymptotic limits. In contrast, in this work we directly examine the conditions for such unstable convergence, focusing on simple, yet representative, learning problems, via analysis of two-step gradient updates. |
Lei Chen; Joan Bruna; |

29 | On The Role of Attention in Prompt-tuningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture-models where each input token belongs to a context-relevant or -irrelevant set. |
Samet Oymak; Ankit Singh Rawat; Mahdi Soltanolkotabi; Christos Thrampoulidis; |

30 | Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. |
Zhongzhi Yu; Yang Zhang; Kaizhi Qian; Cheng Wan; Yonggan Fu; Yongan Zhang; Celine Lin; |

31 | Hyperbolic Representation Learning: Revisiting and AdvancingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we first introduce a position-tracking mechanism to scrutinize existing prevalent hyperbolic models, revealing that the learned representations are sub-optimal and unsatisfactory. To address this, we propose a simple yet effective method, hyperbolic informed embedding (HIE), by incorporating cost-free hierarchical information deduced from the hyperbolic distance of the node to the origin (i.e., induced hyperbolic norm) to advance existing hyperbolic models. |
Menglin YANG; min zhou; Zhitao Ying; yankai Chen; Irwin King; |

32 | Learning Belief Representations for Partially Observable Deep RLRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a novel deep, partially observable RL algorithm based on modelling belief states — a technique typically used when solving tabular POMDPs, but that has traditionally been difficult to apply to more complex environments. |
Andrew Wang; Andrew C Li; Toryn Q. Klassen; Rodrigo Toro Icarte; Sheila A. McIlraith; |

33 | MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of BehaviorRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We introduce MABe22, a large-scale, multi-agent video and trajectory benchmark to assess the quality of learned behavior representations. |
Jennifer J. Sun; Markus Marks; Andrew Wesley Ulmer; Dipam Chakraborty; Brian Geuther; Edward Hayes; Heng Jia; Vivek Kumar; Sebastian Oleszko; Zachary Partridge; Milan Peelman; Alice Robie; Catherine E Schretter; Keith Sheppard; Chao Sun; Param Uttarwar; Julian Morgan Wagner; Erik Werner; Joseph Parker; Pietro Perona; Yisong Yue; Kristin Branson; Ann Kennedy; |

34 | Scaling Up Dataset Distillation to ImageNet-1K with Constant MemoryRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Can we scale this SOTA method to ImageNet-1K and does its effectiveness on CIFAR transfer to ImageNet-1K? To answer these questions, we first propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with $\sim 6$x reduction in memory footprint. We further discover that it is challenging for MTT to handle datasets with a large number of classes, and propose a novel soft label assignment that drastically improves its convergence. |
Justin Cui; Ruochen Wang; Si Si; Cho-Jui Hsieh; |

35 | MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems. |
Jiachen Yao; Chang Su; Zhongkai Hao; Songming Liu; Hang Su; Jun Zhu; |

36 | Polynomial Preconditioning for Gradient MethodsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a new family of preconditioners generated by the symmetric polynomials. |
Nikita Doikov; Anton Rodomanov; |

37 | Internally Rewarded Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we formally formulate IRRL and present a class of problems that belong to IRRL. |
Mengdi Li; Xufeng Zhao; Jae Hee Lee; Cornelius Weber; Stefan Wermter; |

38 | Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free TrainabilityRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper we establish the existence of highly sparse trainable initializations for evolution strategies (ES) and characterize qualitative differences compared to gradient descent (GD)-based sparse training. |
Robert Tjarko Lange; Henning Sprekeler; |

39 | Slot-VAE: Object-Centric Scene Generation with Slot AttentionRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose the Slot-VAE, a generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation. |
Yanbo Wang; Letao Liu; Justin Dauwels; |

40 | Model-based Reinforcement Learning with Scalable Composite Policy Gradient EstimatorsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a scalable method, Total Propagation X (TPX) that improves over TP by changing the node used for IVW, and employing coordinate wise weighting. |
Paavo Parmas; Takuma Seno; Yuma Aoki; |

41 | Predicting Rare Events By Shrinking Towards Proportional OddsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We present PRESTO, a relaxation of the proportional odds model for ordinal regression. |
Gregory Faletto; Jacob Bien; |

42 | The Optimal Approximation Factors in Misspecified Off-Policy Value Function EstimationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Yet, the nature of such *approximation factors*—especially their optimal form in a given learning problem—is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. |
Philip Amortila; Nan Jiang; Csaba Szepesvari; |

43 | Internet Explorer: Targeted Representation Learning on The Open WebRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on a target dataset. |
Alexander Cong Li; Ellis Langham Brown; Alexei A Efros; Deepak Pathak; |

44 | Robust and Private Stochastic Linear BanditsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we study the stochastic linear bandit problem under the additional requirements of *differential privacy*, *robustness* and *batched observations*. |
Vasileios Charisopoulos; Hossein Esfandiari; Vahab Mirrokni; |

45 | High-dimensional Location Estimation Via Norm Concentration for Subgamma VectorsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We build on the theory using *smoothed* estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution. |
Shivam Gupta; Jasper C.H. Lee; Eric Price; |

46 | Action Matching: Learning Stochastic Dynamics from SamplesRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In order to better understand the systems under observation, we would like to learn a model of the underlying process that allows us to propagate samples in time and thereby simulate entire individual trajectories. In this work, we propose Action Matching, a method for learning a rich family of dynamics using only independent samples from its time evolution. |
Kirill Neklyudov; Rob Brekelmans; Daniel Severo; Alireza Makhzani; |

47 | Short-lived High-volume BanditsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose an $\ell$-Layered Sieve Policy that recursively refines the action space for $\ell\leq w$ times. |
Su Jia; Nishant Oli; Ian Anderson; Paul Duff; Andrew A Li; Ramamoorthi Ravi; |

48 | Learning Temporally AbstractWorld Models Without Online ExperimentationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we present an approach for simultaneously learning sets of skills and temporally abstract, skill-conditioned world models purely from offline data, enabling agents to perform zero-shot online planning of skill sequences for new tasks. |
Benjamin Freed; Siddarth Venkatraman; Guillaume Adrien Sartoretti; Jeff Schneider; Howie Choset; |

49 | Active Policy Improvement from Multiple Black-box OraclesRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. |
Xuefeng Liu; Takuma Yoneda; Chaoqi Wang; Matthew Walter; Yuxin Chen; |

50 | Stochastic Gradient Descent-Induced Drift of Representation in A Two-Layer Neural NetworkRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Despite being observed in the brain and in artificial networks, the mechanisms of drift and its implications are not fully understood. Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network. |
Farhad Pashakhanloo; Alexei Koulakov; |

51 | IRNeXt: Rethinking Convolutional Network Design for Image RestorationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we excavate the potential of the convolutional neural network (CNN) and show that our CNN-based model can receive comparable or better performance than Transformer models with low computation overhead on several image restoration tasks. |
Yuning Cui; Wenqi Ren; Sining Yang; Xiaochun Cao; Alois Knoll; |

52 | TabLeak: Tabular Data Leakage in Federated LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. |
Mark Vero; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev; |

53 | Can Large Language Models Reason About Program Invariants?Related Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We study the application of large language models to invariant prediction, finding that models trained on source code and fine-tuned for invariant generation can perform invariant prediction as static rather than dynamic analysis. |
Kexin Pei; David Bieber; Kensen Shi; Charles Sutton; Pengcheng Yin; |

54 | Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversion. |
Alexander Lin; Bahareh Tolooshams; Yves Atchade; Demba E. Ba; |

55 | Stable and Consistent Prediction of 3D Characteristic Orientation Via Invariant Residual LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we introduce a novel method to decouple the shape geometry and semantics of the input point cloud to achieve both stability and consistency. |
Seungwook Kim; Chunghyun Park; Yoonwoo Jeong; Jaesik Park; Minsu Cho; |

56 | Deep Generative Symbolic Regression with Monte-Carlo-Tree-SearchRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure using a context-aware neural mutation model, which is initially pre-trained to learn promising mutations, and further refined from successful experiences in an online fashion. |
Pierre-Alexandre Kamienny; Guillaume Lample; sylvain lamprier; Marco Virgolin; |

57 | Mixing Predictions for Online Metric AlgorithmsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. |
Antonios Antoniadis; Christian Coester; Marek Elias; Adam Polak; Bertrand Simon; |

58 | Approximate Causal Effect Identification Under Weak ConfoundingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we analyze the effect of weak confounding’ on causal estimands. |
Ziwei Jiang; Lai Wei; Murat Kocaoglu; |

59 | Bootstrap in High Dimension with Low ComputationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We study the use of bootstraps in high-dimensional environments with a small number of resamples. |
Henry Lam; Zhenyuan Liu; |

60 | Image Shortcut Squeezing: Countering Perturbative Availability Poisons with CompressionRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Current research adopts the belief that practical and effective approaches to countering such poisons do not exist. In this paper, we argue that it is time to abandon this belief. |
Zhuoran Liu; Zhengyu Zhao; Martha Larson; |

61 | Proper Losses for Discrete Generative ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We initiate the study of proper losses for evaluating generative models in the discrete setting. |
Dhamma Kimpara; Rafael Frongillo; Bo Waggoner; |

62 | Private Federated Learning with Autotuned CompressionRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. |
Enayat Ullah; Christopher A. Choquette-Choo; Peter Kairouz; Sewoong Oh; |

63 | SpotEM: Efficient Video Search for Episodic MemoryRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy. |
Santhosh Kumar Ramakrishnan; Ziad Al-Halah; Kristen Grauman; |

64 | DRCFS: Doubly Robust Causal Feature Selection. Highlight: We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high-dimensional settings. |
Francesco Quinzan; Ashkan Soleymani; Patrick Jaillet; Cristian R. Rojas; Stefan Bauer; |

65 | Open-Vocabulary Universal Image Segmentation with MaskCLIP. Highlight: In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions at inference time. |
Zheng Ding; Jieke Wang; Zhuowen Tu; |

66 | Universal Physics-Informed Neural Networks: Symbolic Differential Operator Discovery with Sparse Data. Highlight: In this work, we perform symbolic discovery of differential operators in a situation where there is sparse experimental data. |
Lena Podina; Brydon Eastman; Mohammad Kohandel; |

67 | Partial Optimality in Cubic Correlation Clustering. Highlight: Here, we focus on establishing partial optimality conditions for the special case of complete graphs and cubic objective functions. |
David Stein; Silvia Di Gregorio; Bjoern Andres; |

68 | InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models. Highlight: Here, we propose InfoDiffusion, an algorithm that augments diffusion models with low-dimensional latent variables that capture high-level factors of variation in the data. |
Yingheng Wang; Yair Schiff; Aaron Gokaslan; Weishen Pan; Fei Wang; Christopher De Sa; Volodymyr Kuleshov; |

69 | A Large-Scale Study of Probabilistic Calibration in Neural Network Regression. Highlight: While neural network miscalibration has been studied primarily in classification, we investigate this in the less-explored domain of regression. |
Victor Dheur; Souhaib Ben Taieb; |

70 | Global Optimality of Elman-type RNNs in The Mean-field Regime. Highlight: We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. |
Andrea Agazzi; Jianfeng Lu; Sayan Mukherjee; |

71 | Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows. Highlight: This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules. |
Phillip Si; Zeyi Chen; Subham Sekhar Sahoo; Yair Schiff; Volodymyr Kuleshov; |

72 | When Do Minimax-fair Learning and Empirical Risk Minimization Coincide? Highlight: We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group. |
Harvineet Singh; Matthäus Kleindessner; Volkan Cevher; Rumi Chunara; Chris Russell; |

73 | Theoretical Behavior of XAI Methods in The Presence of Suppressor Variables. Highlight: We show that the majority of the studied approaches will attribute non-zero importance to a non-class-related suppressor feature in the presence of correlated noise. |
Rick Wilming; Leo Kieslich; Benedict Clark; Stefan Haufe; |

74 | Probabilistic Imputation for Time-series Classification with Missing Data. Highlight: In this paper, we propose a novel probabilistic framework for the classification of multivariate time series data with missing values. |
SeungHyun Kim; Hyunsu Kim; Eunggu Yun; Hwangrae Lee; Jaehun Lee; Juho Lee; |

75 | MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations. Highlight: In this work, we present a generic approach to offline PLfO, called Modality-agnostic Adversarial Hypothesis Adaptation for Learning from Observations (MAHALO). |
Anqi Li; Byron Boots; Ching-An Cheng; |

76 | Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees. Highlight: However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed *naturally-occurring* model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. |
Faisal Hamman; Erfaun Noorani; Saumitra Mishra; Daniele Magazzeni; Sanghamitra Dutta; |

77 | Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning. Highlight: We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. |
Matthias Gerstgrasser; David C. Parkes; |

78 | Multi-Agent Learning from Learners. Highlight: Nevertheless, some recent works move away from the optimality assumption to study the Learning from a Learner (LfL) problem, where the challenge is inferring the reward function of a learning agent from a sequence of demonstrations produced by progressively improving policies. In this work, we take one of the initial steps in addressing the multi-agent version of this problem and propose a new algorithm, MA-LfL (Multiagent Learning from a Learner). |
Mine Melodi Caliskan; Francesco Chini; Setareh Maghsudi; |

79 | Cut Your Losses with Squentropy. Highlight: In this paper, we propose the squentropy loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy. |
Like Hui; Mikhail Belkin; Stephen Wright; |
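The squentropy highlight fully specifies the loss: cross-entropy plus the average squared logit over the incorrect classes. A minimal NumPy sketch of that two-term sum (our own illustration built from the one-sentence description; applying the square term to raw logits is an assumption, not the authors' released code):

```python
import numpy as np

def squentropy(logits, labels):
    """Squentropy sketch: cross-entropy plus the average squared logit
    over the incorrect classes (exact form in the paper may differ)."""
    n, c = logits.shape
    # numerically stable softmax cross-entropy on the true class
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(n), labels].mean()
    # mean square of the c - 1 incorrect-class logits, averaged over samples
    sq = (logits**2).sum(axis=1) - logits[np.arange(n), labels] ** 2
    sq = (sq / (c - 1)).mean()
    return float(ce + sq)
```

When the incorrect-class logits are exactly zero, the square term vanishes and the loss reduces to plain cross-entropy, which makes the extra term easy to reason about as a regularizer.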

80 | The Statistical Scope of Multicalibration. Highlight: We make a connection between multicalibration and property elicitation and show that (under mild technical conditions) it is possible to produce a multicalibrated predictor for a continuous scalar property $\Gamma$ if and only if $\Gamma$ is *elicitable*. |
Georgy Noarov; Aaron Roth; |

81 | Sparse Learning of Dynamical Systems in RKHS: An Operator-Theoretic Approach. Highlight: In this work, we present a method for sparse learning of transfer operators from $\beta$-mixing stochastic processes, in both discrete and continuous time, and provide sample complexity analysis extending existing theoretical guarantees for learning from non-sparse, i.i.d. data. |
Boya Hou; Sina Sanjari; Nathan Dahlin; Subhonmesh Bose; Umesh Vaidya; |

82 | Conformal Prediction with Missing Values. Highlight: We study conformal prediction with missing values in the covariates — a setting that brings new challenges to uncertainty quantification. |
Margaux Zaffran; Aymeric Dieuleveut; Julie Josse; Yaniv Romano; |

83 | Weakly Supervised Regression with Interval Targets. Highlight: In this paper, we provide a thorough study on RIT (regression with interval targets). |
Xin Cheng; Yuzhou Cao; Ximing Li; Bo An; Lei Feng; |

84 | Controllable Neural Symbolic Regression. Highlight: However, these methods, in their current form, lack the capability to incorporate user-defined prior knowledge, which is often required in natural sciences and engineering fields. To overcome this limitation, we propose a novel neural symbolic regression method, named Neural Symbolic Regression with Hypothesis (NSRwH), that enables the explicit incorporation of assumptions about the expected structure of the ground-truth expression into the prediction process. |
Tommaso Bendinelli; Luca Biggio; Pierre-Alexandre Kamienny; |

85 | DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm. Highlight: The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and better scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. |
Lisang Ding; Kexin Jin; Bicheng Ying; Kun Yuan; Wotao Yin; |

86 | How Do Transformers Learn Topic Structure: Towards A Mechanistic Understanding. Highlight: In our paper, we provide fine-grained mechanistic understanding of how transformers learn “semantic structure”, understood as capturing co-occurrence structure of words. |
Yuchen Li; Yuanzhi Li; Andrej Risteski; |

87 | Subset-Based Instance Optimality in Private Estimation. Highlight: We propose a new definition of instance optimality for differentially private estimation algorithms. |
Travis Dick; Alex Kulesza; Ziteng Sun; Ananda Theertha Suresh; |

88 | A Statistical Perspective on Retrieval-Based Models. Highlight: In this paper, we present a formal treatment of retrieval-based models to characterize their performance via a novel statistical perspective. |
Soumya Basu; Ankit Singh Rawat; Manzil Zaheer; |

89 | Adaptive IMLE for Few-shot Pretraining-free Generative Modelling. Highlight: In this paper, we present a more generalized formulation of IMLE which includes the original formulation as a special case, and we prove that the theoretical guarantees hold under weaker conditions. |
Mehran Aghabozorgi; Shichong Peng; Ke Li; |

90 | Polyhedral Complex Extraction from ReLU Networks Using Edge Subdivision. Highlight: Instead of regions, we propose to subdivide edges, leading to a novel method for polyhedral complex extraction. |
Arturs Berzins; |

91 | Task-Specific Skill Localization in Fine-tuned Language Models. Highlight: Thus fine-tuning allows the model to quickly pick up task-specific skills, but there has been limited study of *where* these newly-learnt skills reside inside the massive model. This paper introduces the term *skill localization* for this problem and proposes a solution. |
Abhishek Panigrahi; Nikunj Saunshi; Haoyu Zhao; Sanjeev Arora; |

92 | Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator. Highlight: In this work, we learn unforeseen robustness by harnessing the variations in the abundant out-of-distribution data. |
Sicheng Zhu; Bang An; Furong Huang; Sanghyun Hong; |

93 | Fast Federated Machine Unlearning with Nonlinear Functional Theory. Highlight: This paper proposes a fast FMU algorithm, FFMU, for improving the FMU efficiency while maintaining the unlearning quality. |
Tianshi Che; Yang Zhou; Zijie Zhang; Lingjuan Lyu; Ji Liu; Da Yan; Dejing Dou; Jun Huan; |

94 | FARE: Provably Fair Representation Learning with Practical Certificates. Highlight: In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates. |
Nikola Jovanović; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev; |

95 | Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories. Highlight: This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation — effective Minkowski dimension. |
Zixuan Zhang; Minshuo Chen; Mengdi Wang; Wenjing Liao; Tuo Zhao; |

96 | BiRT: Bio-inspired Replay in Vision Transformers for Continual Learning. Highlight: However, representation rehearsal in vision transformers lacks diversity, resulting in overfitting and consequently, performance drops significantly compared to raw image rehearsal. Therefore, we propose BiRT, a novel representation rehearsal-based continual learning approach using vision transformers. |
Kishaan Jeeveswaran; Prashant Shivaram Bhat; Bahram Zonooz; Elahe Arani; |

97 | On The Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning. Highlight: However, an important challenge of this approach is the representational collapse, where the subspace of the latent representations collapses into a low-dimensional manifold. To address this issue, we propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold by decorrelating the features in the latent space. |
Hojoon Lee; Koanho Lee; Dongyoon Hwang; Hyunho Lee; Byungkun Lee; Jaegul Choo; |

98 | Trainability, Expressivity and Interpretability in Gated Neural ODEs. Highlight: We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. |
Timothy Doyeon Kim; Tankut Can; Kamesh Krishnamurthy; |

99 | Feature Learning in Deep Classifiers Through Intermediate Neural Collapse. Highlight: In this paper, we conduct an empirical study of the feature learning process in deep classifiers. |
Akshay Rangamani; Marius Lindegaard; Tomer Galanti; Tomaso A. Poggio; |

100 | Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows. Highlight: Thus, we model the Q function seamlessly with *off-the-shelf* deep generative models such as normalizing flows. |
Owen M Dugan; Peter Y. Lu; Rumen Dangovski; Di Luo; Marin Soljacic; |

101 | Unearthing InSights Into Mars: Unsupervised Source Separation with Limited Data. Highlight: While data-driven methods have shown great promise in source separation, they often require large amounts of data, which rarely exist in planetary space missions. To address this challenge, we propose an unsupervised source separation scheme for domains with limited data access that involves solving an optimization problem in the wavelet scattering covariance representation space — an interpretable, low-dimensional representation of stationary processes. |
Ali Siahkoohi; Rudy Morel; Maarten V. de Hoop; Erwan Allys; Grégory Sainton; Taichi Kawamura; |

102 | Doubly Optimal No-Regret Learning in Monotone Games. Highlight: We propose the *accelerated optimistic gradient* (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. |
Yang Cai; Weiqiang Zheng; |

103 | The Test of Tests: A Framework for Differentially Private Hypothesis Testing. Highlight: We present a generic framework for creating differentially private versions of any hypothesis test in a black-box way. |
Zeki Kazan; Kaiyan Shi; Adam Groce; Andrew Bray; |

104 | Multi-Symmetry Ensembles: Improving Diversity and Generalization Via Opposing Symmetries. Highlight: In this work, we present Multi-Symmetry Ensembles (MSE), a framework for constructing diverse ensembles by capturing the multiplicity of hypotheses along symmetry axes, which explore the hypothesis space beyond stochastic perturbations of model weights and hyperparameters. |
Charlotte Loh; Seungwook Han; Shivchander Sudalairaj; Rumen Dangovski; Kai Xu; Florian Wenzel; Marin Soljacic; Akash Srivastava; |

105 | When Is Realizability Sufficient for Off-Policy Reinforcement Learning? Highlight: However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. |
Andrea Zanette; |

106 | Hidden Symmetries of ReLU Networks. Highlight: In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. |
Elisenda Grigsby; Kathryn Lindsey; David Rolnick; |

107 | Bootstrapped Representations in Reinforcement Learning. Highlight: Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). |
Charline Le Lan; Stephen Tu; Mark Rowland; Anna Harutyunyan; Rishabh Agarwal; Marc G Bellemare; Will Dabney; |

108 | New Metrics and Search Algorithms for Weighted Causal DAGs. Highlight: In this work, under some standard assumptions, we study causal graph discovery via *adaptive interventions with node-dependent interventional costs*. |
Davin Choo; Kirankumar Shiragur; |

109 | Exact Inference in High-order Structured Prediction. Highlight: In this paper, we study the problem of inference in high-order structured prediction tasks. |
Chuyang Ke; Jean Honorio; |

110 | Path Neural Networks: Expressive and Accurate Graph Neural Networks. Highlight: In this paper, we propose Path Neural Networks (PathNNs), a model that updates node representations by aggregating paths emanating from nodes. |
Gaspard Michel; Giannis Nikolentzos; Johannes F. Lutzeyer; Michalis Vazirgiannis; |
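The core idea of aggregating over paths emanating from each node can be sketched in a few lines. The toy graph, one-hot features, and sum-then-mean pooling below are our own illustrative assumptions; the actual PathNN layer uses learned neural aggregators:

```python
import numpy as np

# Toy undirected graph as adjacency lists (hypothetical example).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
x = {v: np.eye(4)[v] for v in adj}  # one-hot node features

def paths_from(v, k):
    """All simple paths of exactly k edges starting at node v."""
    paths = [[v]]
    for _ in range(k):
        paths = [p + [u] for p in paths for u in adj[p[-1]] if u not in p]
    return paths

def pathnn_layer(k):
    """Update each node by mean-pooling summed features along its k-paths
    (a stand-in for the learned path aggregation in PathNNs)."""
    out = {}
    for v in adj:
        ps = paths_from(v, k)
        out[v] = np.mean([sum(x[u] for u in p) for p in ps], axis=0) if ps else x[v]
    return out

h = pathnn_layer(2)
```

Because simple paths distinguish neighborhoods that plain message passing can conflate, this style of aggregation is what gives the model its extra expressive power.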

111 | The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation. Highlight: We study the problem of temporal-difference-based policy evaluation in reinforcement learning. |
Mark Rowland; Yunhao Tang; Clare Lyle; Remi Munos; Marc G Bellemare; Will Dabney; |

112 | Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-type Samplers. Highlight: We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling. |
Sitan Chen; Giannis Daras; Alex Dimakis; |

113 | On Bridging The Gap Between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization. Highlight: In this paper, we analyze the role of depth in the concentration of mean-field predictions for Gram matrices of hidden representations in deep multilayer perceptron (MLP) with batch normalization (BN) at initialization. |
Amir Joudaki; Hadi Daneshmand; Francis Bach; |

114 | Hyperparameters in Reinforcement Learning and How To Tune Them. Highlight: In this paper, we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed which may lead to overfitting. |
Theresa Eimer; Marius Lindauer; Roberta Raileanu; |

115 | Towards Constituting Mathematical Structures for Learning to Optimize. Highlight: In this paper, we derive the basic mathematical conditions that successful update rules commonly satisfy. |
Jialin Liu; Xiaohan Chen; Zhangyang Wang; Wotao Yin; HanQin Cai; |

116 | Identifiability and Generalizability in Constrained Inverse Reinforcement Learning. Highlight: Two main challenges in Reinforcement Learning (RL) are designing appropriate reward functions and ensuring the safety of the learned policy. To address these challenges, we present a theoretical framework for Inverse Reinforcement Learning (IRL) in constrained Markov decision processes. |
Andreas Schlaginhaufen; Maryam Kamgarpour; |

117 | Adaptive Whitening in Neural Populations with Gain-modulating Interneurons. Highlight: Existing neural circuit models of adaptive whitening operate by modifying synaptic interactions; however, such modifications would seem both too slow and insufficiently reversible. Motivated by the extensive neuroscience literature on gain modulation, we propose an alternative model that adaptively whitens its responses by modulating the gains of individual neurons. |
Lyndon Duong; David Lipshutz; David Heeger; Dmitri Chklovskii; Eero P Simoncelli; |

118 | Optimizing The Collaboration Structure in Cross-Silo Federated Learning. Highlight: In this paper, we propose FedCollab, a novel FL framework that alleviates negative transfer by clustering clients into non-overlapping coalitions based on their distribution distances and data quantities. |
Wenxuan Bao; Haohan Wang; Jun Wu; Jingrui He; |

119 | Smart Initial Basis Selection for Linear Programs. Highlight: In this paper, we propose a learning-based approach for initial basis selection. |
Zhenan Fan; Xinglu Wang; Oleksandr Yakovenko; Abdullah Ali Sivas; Owen Ren; Yong Zhang; Zirui Zhou; |

120 | From Relational Pooling to Subgraph GNNs: A Universal Framework for More Expressive Graph Neural Networks. Highlight: Starting from RP, we propose to explicitly assign labels to nodes as additional features to improve the graph isomorphism distinguishing power of message passing neural networks. |
Cai Zhou; Xiyuan Wang; Muhan Zhang; |

121 | The Unreasonable Effectiveness of Few-shot Learning for Machine Translation. Highlight: We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. |
Xavier Garcia; Yamini Bansal; Colin Cherry; George Foster; Maxim Krikun; Melvin Johnson; Orhan Firat; |

122 | Auxiliary Learning As An Asymmetric Bargaining Game. Highlight: In this work, we propose a novel approach, named AuxiNash, for balancing tasks in auxiliary learning by formalizing the problem as a generalized bargaining game with asymmetric task bargaining power. |
Aviv Shamsian; Aviv Navon; Neta Glazer; Kenji Kawaguchi; Gal Chechik; Ethan Fetaya; |

123 | Additive Causal Bandits with Unknown Graph. Highlight: To remedy this, we adopt an additional additive assumption on the outcome which allows us to solve the problem by casting it as an additive combinatorial linear bandit problem with full-bandit feedback. We propose a novel action-elimination algorithm for this setting, show how to apply this algorithm to the causal bandit problem, provide sample complexity bounds, and empirically validate our findings on a suite of randomly generated causal models, effectively showing that one does not need to explicitly learn the parents of the outcome to identify the best intervention. |
Alan Malek; Virginia Aglietti; Silvia Chiappa; |

124 | Mitigating Spurious Correlations in Multi-modal Models During Fine-tuning. Highlight: This paper proposes a novel approach to address spurious correlations during fine-tuning for a given domain of interest. |
Yu Yang; Besmira Nushi; Hamid Palangi; Baharan Mirzasoleiman; |

125 | Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions. Highlight: However, this increase in complexity, together with the growth in observation dimensionality, came at the cost of volatility that can be exploited via adversarial attacks (i.e., moving along worst-case directions in the observation space). To solve this policy-instability problem, we propose a novel method to detect the presence of these non-robust directions via local quadratic approximation of the deep neural policy loss. |
Ezgi Korkmaz; Jonah Brown-Cohen; |

126 | Efficient Exploration Via Epistemic-Risk-Seeking Policy Optimization. Highlight: In this paper, we propose a new, differentiable optimistic objective that, when optimized, yields a policy that provably explores efficiently, with guarantees even under function approximation. |
Brendan O’Donoghue; |

127 | Variational Sparse Inverse Cholesky Approximation for Latent Gaussian Processes Via Double Kullback-Leibler Minimization. Highlight: To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. |
Jian Cao; Myeongjong Kang; Felix Jimenez; Huiyan Sang; Florian Tobias Schaefer; Matthias Katzfuss; |

128 | Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation. Highlight: Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. |
Aditya Mate; Bryan Wilder; Aparna Taneja; Milind Tambe; |

129 | Efficient Quantum Algorithms for Quantum Optimal Control. Highlight: In this paper, we present efficient quantum algorithms that are exponentially faster than classical algorithms for solving the quantum optimal control problem. |
Xiantao Li; Chunhao Wang; |

130 | SLAMB: Accelerated Large Batch Training with Sparse Communication. Highlight: In this paper, we combine sparsification-based gradient compression with the layer-wise adaptive moments optimizer for large batch training (LAMB). |
Hang Xu; Wenxuan Zhang; Jiawei Fei; Yuzhe Wu; TingWen Xie; Jun Huang; Yuchen Xie; Mohamed Elhoseiny; Panos Kalnis; |

131 | Cooperation in The Latent Space: The Benefits of Adding Mixture Components in Variational Autoencoders. Highlight: In this paper, we show how the mixture components cooperate when they jointly adapt to maximize the ELBO. |
Oskar Kviman; Ricky Molén; Alexandra Hotti; Semih Kurt; Víctor Elvira; Jens Lagergren; |

132 | Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model. Highlight: We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. |
Siyu Chen; Jibang Wu; Yifan Wu; Zhuoran Yang; |

133 | High Probability Convergence of Stochastic Gradient Methods. Highlight: In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and non-convex optimization with sub-Gaussian noise. |
Zijian Liu; Ta Duy Nguyen; Thien Hang Nguyen; Alina Ene; Huy Nguyen; |

134 | Towards Understanding and Reducing Graph Structural Noise for GNNs. Highlight: In this work, we focus on understanding and alleviating the effect of graph structural noise on GNN performance. |
Mingze Dong; Yuval Kluger; |

135 | On The Convergence of Gradient Flow on Multi-layer Linear Models. Highlight: In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form $f(W_1W_2\cdots W_L)$. |
Hancheng Min; Rene Vidal; Enrique Mallada; |
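To make the object of study concrete, gradient flow on $f(W_1W_2\cdots W_L)$ can be simulated by forward-Euler discretization; below we take $f$ to be the squared Frobenius distance to a target $A$. Everything here (the target, the near-identity initialization, the step size) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

L, d = 3, 4  # three linear factors, 4x4 matrices (illustrative sizes)
A = np.eye(d) + 0.2 * rng.standard_normal((d, d))  # well-conditioned target
W = [np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(L)]

def product(Ws):
    P = Ws[0]
    for Wi in Ws[1:]:
        P = P @ Wi
    return P

def loss(Ws):
    R = product(Ws) - A
    return float((R * R).sum())  # squared Frobenius norm

def grads(Ws):
    # d loss / d W_i = (W_1..W_{i-1})^T (2R) (W_{i+1}..W_L)^T
    R = product(Ws) - A
    gs = []
    for i in range(len(Ws)):
        prefix = product(Ws[:i]) if i > 0 else np.eye(d)
        suffix = product(Ws[i + 1:]) if i < len(Ws) - 1 else np.eye(d)
        gs.append(prefix.T @ (2 * R) @ suffix.T)
    return gs

# forward-Euler discretization of gradient flow dW/dt = -grad loss
eta, history = 0.03, [loss(W)]
for _ in range(600):
    W = [Wi - eta * gi for Wi, gi in zip(W, grads(W))]
    history.append(loss(W))
```

With a near-identity, balanced initialization the residual contracts steadily, which is exactly the kind of convergence behavior the analysis characterizes in general.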

136 | Single Point-Based Distributed Zeroth-Order Optimization with A Non-Convex Stochastic Objective Function. Highlight: In this work, we introduce a zero-order distributed optimization method based on a one-point estimate of the gradient tracking technique. |
Elissa Mhanna; Mohamad Assaad; |

137 | Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning Via Multi-Level Monte Carlo Actor-Critic. Highlight: In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm. |
Wesley Suttle; Amrit Bedi; Bhrij Patel; Brian M. Sadler; Alec Koppel; Dinesh Manocha; |

138 | Neural Network Accelerated Implicit Filtering: Integrating Neural Network Surrogates With Provably Convergent Derivative Free Optimization Methods. Highlight: In this paper, we introduce neural network accelerated implicit filtering (NNAIF), a novel family of methods for solving noisy derivative free (i.e. black box, zeroth order) optimization problems. |
Brian Irwin; Eldad Haber; Raviv Gal; Avi Ziv; |

139 | Stochastic Marginal Likelihood Gradients Using Neural Tangent Kernels. Highlight: However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. |
Alexander Immer; Tycho F.A. van der Ouderaa; Mark van der Wilk; Gunnar Ratsch; Bernhard Schölkopf; |

140 | Constrained Optimization Via Exact Augmented Lagrangian and Randomized Iterative Sketching | Highlight: This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks, to optimal control, to PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class. |
Ilgee Hong; Sen Na; Michael W. Mahoney; Mladen Kolar; |

141 | FAENet: Frame Averaging Equivariant GNN for Materials Modeling | Highlight: In this paper, we introduce (1) a flexible, model-agnostic framework based on stochastic frame averaging that enforces E(3) equivariance or invariance, without any architectural constraints; (2) FAENet: a simple, fast and expressive GNN that leverages stochastic frame averaging to process geometric information without constraints. |
Alexandre AGM Duval; Victor Schmidt; Alex Hernández-García; Santiago Miret; Fragkiskos D. Malliaros; Yoshua Bengio; David Rolnick; |

142 | Maximal Initial Learning Rates in Deep ReLU Networks | Highlight: In this work, we introduce the maximal initial learning rate $\eta^{\ast}$ – the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. |
Gaurav Iyer; Boris Hanin; David Rolnick; |

143 | Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization | Highlight: Motivated by a recent success in analyzing an extrapolated cyclic scheme for generalized variational inequalities, we propose an *Accelerated Cyclic Coordinate Dual Averaging with Extrapolation* (A-CODER) method for composite convex optimization, where the objective function can be expressed as the sum of a smooth convex function accessible via a gradient oracle and a convex, possibly nonsmooth, function accessible via a proximal oracle. |
Cheuk Yin Lin; Chaobing Song; Jelena Diakonikolas; |

144 | A Robust Test for The Stationarity Assumption in Sequential Decision Making | Highlight: In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. |
Jitao Wang; Chengchun Shi; Zhenke Wu; |

145 | Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data | Highlight: This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace. |
Minshuo Chen; Kaixuan Huang; Tuo Zhao; Mengdi Wang; |

146 | Why Target Networks Stabilise Temporal Difference Methods | Highlight: Yet a complete theoretical explanation for the effectiveness of target networks remains elusive. In this work, we provide an analysis of this popular class of algorithms, to finally answer the question: “why do target networks stabilise TD learning”? |
Mattie Fellows; Matthew J. A. Smith; Shimon Whiteson; |

147 | Simplified Temporal Consistency Reinforcement Learning | Highlight: In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. |
Yi Zhao; Wenshuai Zhao; Rinu Boney; Juho Kannala; Joni Pajarinen; |

148 | Generalized Implicit Follow-The-Regularized-Leader | Highlight: We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of the FTRL framework. |
Keyi Chen; Francesco Orabona; |

149 | On Pitfalls of Test-Time Adaptation | Highlight: Despite the sheer number of existing methods, the inconsistent experimental conditions and lack of standardization in prior literature make it difficult to measure their actual efficacies and progress. To address this issue, we present a large-scale open-sourced Test-Time Adaptation Benchmark, dubbed TTAB, which includes nine state-of-the-art algorithms, a diverse array of distribution shifts, and two comprehensive evaluation protocols. |
Hao Zhao; Yuejiang Liu; Alexandre Alahi; Tao Lin; |

150 | Algorithms for Bounding Contribution for Histogram Estimation Under User-level Privacy | Highlight: In this work, we propose algorithms to choose the best user contribution bound for histogram estimation under both bounded and unbounded domain settings. |
Yuhan Liu; Ananda Theertha Suresh; Wennan Zhu; Peter Kairouz; Marco Gruteser; |

151 | Are Gaussian Data All You Need? The Extents and Limits of Universality in High-Dimensional Generalized Linear Estimation | Highlight: In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. |
Luca Pesce; Florent Krzakala; Bruno Loureiro; Ludovic Stephan; |

152 | Model-based Offline Reinforcement Learning with Count-based Conservatism | Highlight: In this paper, we present a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$. |
Byeongchan Kim; Min-hwan Oh; |

153 | One-shot Imitation in A Non-Stationary Environment Via Multi-Modal Skill | Highlight: To tackle the problem, we explore the compositionality of complex tasks, and present a novel skill-based imitation learning framework enabling one-shot imitation and zero-shot adaptation; from a single demonstration for a complex unseen task, a semantic skill sequence is inferred and then each skill in the sequence is converted into an action sequence optimized for environmental hidden dynamics that can vary over time. |
Sangwoo Shin; Daehee Lee; Minjong Yoo; Woo Kyung Kim; Honguk Woo; |

154 | Structural Re-weighting Improves Graph Domain Adaptation | Highlight: This work examines different impacts of distribution shifts caused by either graph structure or node attributes and identifies a new type of shift, named conditional structure shift (CSS), which current GDA approaches are provably sub-optimal to deal with. A novel approach, called structural reweighting (StruRW), is proposed to address this issue and is tested on synthetic graphs, four benchmark datasets, and a new application in HEP. |
Shikun Liu; Tianchun Li; Yongbin Feng; Nhan Tran; Han Zhao; Qiang Qiu; Pan Li; |

155 | Hypervolume Knowledge Gradient: A Lookahead Approach for Multi-Objective Bayesian Optimization with Partial Information | Highlight: In this work, we propose a general one-step lookahead acquisition function based on the Knowledge Gradient that addresses the complex question of what to evaluate when and at which design points in a principled Bayesian decision-theoretic fashion. |
Sam Daulton; Maximilian Balandat; Eytan Bakshy; |

156 | Machine Learning Force Fields with Data Cost Aware Training | Highlight: Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$, with $n$ proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework — ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data. |
Alexander Bukharin; Tianyi Liu; Shengjie Wang; Simiao Zuo; Weihao Gao; Wen Yan; Tuo Zhao; |

157 | Continuous Spatiotemporal Transformer | Highlight: However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture that is designed for modeling of continuous systems. |
Antonio Henrique de Oliveira Fonseca; Emanuele Zappala; Josue Ortega Caro; David van Dijk; |

158 | ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks Via Learned Finite State Machines | Highlight: In this paper, we provide an approach for batching dynamic DNNs based on finite state machines, which enables the automatic discovery of batching policies specialized for each DNN via reinforcement learning. |
Siyuan Chen; Pratik Pramod Fegade; Tianqi Chen; Phillip Gibbons; Todd Mowry; |

159 | Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions | Highlight: Given their central role in comparing and improving generative models, understanding their limitations is crucially important. To that end, in this work, we identify a critical flaw in the common approximation of these metrics using k-nearest-neighbors, namely, that the very interpretations of fidelity and diversity that are assigned to Precision and Recall can fail in high dimensions, resulting in very misleading conclusions. |
Mahyar Khayatkhoei; Wael AbdAlmageed; |

160 | An SDE for Modeling SAM: Theory and Insights | Highlight: We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. |
Enea Monzio Compagnoni; Luca Biggio; Antonio Orvieto; Frank Norbert Proske; Hans Kersting; Aurelien Lucchi; |

161 | Simple and Fast Group Robustness By Automatic Feature Reweighting | Highlight: In this paper, we propose Automatic Feature Reweighting (AFR), an extremely simple and fast method for updating the model to reduce the reliance on spurious features. |
Shikai Qiu; Andres Potapczynski; Pavel Izmailov; Andrew Gordon Wilson; |

162 | MG-GNN: Multigrid Graph Neural Networks for Learning Multilevel Domain Decomposition Methods | Highlight: In this paper, we propose multigrid graph neural networks (MG-GNN), a novel GNN architecture for learning optimized parameters in two-level DDMs. |
Ali Taghibakhshi; Nicolas Nytko; Tareq Uz Zaman; Scott MacLachlan; Luke Olson; Matthew West; |

163 | Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach | Highlight: In contrast, in this work, we develop an implicit gradient-based approach, which is easy to implement, and is suitable for machine learning applications. |
Prashant Khanduri; Ioannis Tsaknakis; Yihua Zhang; Jia Liu; Sijia Liu; Jiawei Zhang; Mingyi Hong; |

164 | Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing | Highlight: To circumvent the known hardness results and the use of computationally intractable oracles, we propose to leverage the potential *information-sharing* among agents, a standard practice in empirical MARL and a common model for multi-agent control systems with communications. |
Xiangyu Liu; Kaiqing Zhang; |

165 | Revisiting Structured Variational Autoencoders | Highlight: Here, we revisit SVAEs using modern machine learning tools and demonstrate their advantages over more general alternatives in terms of both accuracy and efficiency. |
Yixiu Zhao; Scott Linderman; |

166 | Tied-Augment: Controlling Representation Similarity Improves Data Augmentation | Highlight: We propose a general framework called Tied-Augment, which improves the efficacy of data augmentation in a wide range of applications by adding a simple term to the loss that can control the similarity of representations under distortions. |
Emirhan Kurtuluş; Zichao Li; Yann Dauphin; Ekin Dogus Cubuk; |

167 | Equivariance with Learned Canonicalization Functions | Highlight: Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. |
Sékou-Oumar Kaba; Arnab Kumar Mondal; Yan Zhang; Yoshua Bengio; Siamak Ravanbakhsh; |

168 | Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation | Highlight: We propose a human-in-the-loop framework to leverage feedback from the end user to quickly identify and augment task-irrelevant visual state concepts. |
Andi Peng; Aviv Netanyahu; Mark K Ho; Tianmin Shu; Andreea Bobu; Julie Shah; Pulkit Agrawal; |

169 | Applied Online Algorithms with Heterogeneous Predictors | Highlight: We demonstrate how to more effectively utilize historical datasets and application domain knowledge by intentionally using predictors of *different* quantities. |
Jessica Maghakian; Russell Lee; Mohammad Hajiesmaili; Jian Li; Ramesh Sitaraman; Zhenhua Liu; |

170 | Ewald-based Long-Range Message Passing for Molecular Graphs | Highlight: While this focus on locality is a useful inductive bias, it also impedes the learning of long-range interactions such as electrostatics and van der Waals forces. To address this drawback, we propose Ewald message passing: a nonlocal Fourier space scheme which limits interactions via a cutoff on frequency instead of distance, and is theoretically well-founded in the Ewald summation method. |
Arthur Kosmala; Johannes Gasteiger; Nicholas Gao; Stephan Günnemann; |

171 | Future-conditioned Unsupervised Pretraining for Decision Transformer | Highlight: In this work, we aim to utilize generalized future conditioning to enable efficient unsupervised pretraining from reward-free and sub-optimal offline data. |
Zhihui Xie; Zichuan Lin; Deheng Ye; Qiang Fu; Yang Wei; Shuai Li; |

172 | A Model-Based Method for Minimizing CVaR and Beyond | Highlight: We develop a variant of the stochastic prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective. |
Si Yi Meng; Robert M. Gower; |

173 | Extrapolative Controlled Sequence Generation Via Iterative Refinement | Highlight: Instead, in this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation. |
Vishakh Padmakumar; Richard Yuanzhe Pang; He He; Ankur P Parikh; |

174 | General Sequential Episodic Memory Model | Highlight: In this paper, we introduce a class of General Sequential Episodic Memory Models (GSEMM) that, in the adiabatic limit, exhibit a dynamic energy surface, leading to a series of meta-stable states capable of encoding memory sequences. |
Arjun Karuvally; Terrence Sejnowski; Hava T Siegelmann; |

175 | Low Complexity Homeomorphic Projection to Ensure Neural-Network Solution Feasibility for Optimization Over (Non-)Convex Set | Highlight: In this paper, we propose homeomorphic projection as a low-complexity scheme to guarantee NN solution feasibility for optimization over a general set homeomorphic to a unit ball, covering all compact convex sets and certain classes of nonconvex sets. |
Enming Liang; Minghua Chen; Steven Low; |

176 | LEVER: Learning to Verify Language-to-Code Generation with Execution | Highlight: In this work, we propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. |
Ansong Ni; Srini Iyer; Dragomir Radev; Veselin Stoyanov; Wen-tau Yih; Sida Wang; Xi Victoria Lin; |

177 | End-to-end Training of Deep Boltzmann Machines By Unbiased Contrastive Divergence with Local Mode Initialization | Highlight: In this study, we propose to use a coupling based on the Metropolis-Hastings (MH) and to initialize the state around a local mode of the target distribution. |
Shohei Taniguchi; Masahiro Suzuki; Yusuke Iwasawa; Yutaka Matsuo; |

178 | MPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Highlight: In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement. |
Haiyang Xu; Qinghao Ye; Ming Yan; Yaya Shi; Jiabo Ye; Yuanhong Xu; Chenliang Li; Bin Bi; Qi Qian; Wei Wang; Guohai Xu; Ji Zhang; Songfang Huang; Fei Huang; Jingren Zhou; |

179 | Kernel QuantTree | Highlight: We present Kernel QuantTree (KQT), a non-parametric change detection algorithm that monitors multivariate data through a histogram. |
Diego Stucchi; Paolo Rizzo; Nicolò Folloni; Giacomo Boracchi; |

180 | Rethinking Backdoor Attacks | Highlight: In this work, we present a different approach to the backdoor attack problem. |
Alaa Khaddaj; Guillaume Leclerc; Aleksandar Makelov; Kristian Georgiev; Hadi Salman; Andrew Ilyas; Aleksander Madry; |

181 | Adversarial Robustness of Amortized Bayesian Inference | Highlight: We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator, and show how it improves the adversarial robustness of amortized Bayesian inference. |
Manuel Gloeckler; Michael Deistler; Jakob H. Macke; |

182 | Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | Highlight: Moreover, sequential decision making problems are often carried out in a batched manner, either due to the inherent nature of the problem or to serve the purpose of reducing communication and computation costs. In this work, we jointly study these problems in two popular settings, namely, stochastic multi-armed bandits (MABs) and infinite-horizon reinforcement learning (RL), where TS is used to learn the unknown reward distributions and transition dynamics, respectively. |
Amin Karbasi; Nikki Lijing Kuang; Yian Ma; Siddharth Mitra; |

183 | Adversarially Robust PAC Learnability of Real-Valued Functions | Highlight: Along the way, we introduce a novel agnostic sample compression scheme for real-valued functions, which may be of independent interest. |
Idan Attias; Steve Hanneke; |

184 | Reprogramming Pretrained Language Models for Antibody Sequence Infilling | Highlight: Specifically, we introduce ReprogBert in which a pretrained English language model is repurposed for protein sequence infilling – thus considering cross-language adaptation using less data. |
Igor Melnyk; Vijil Chenthamarakshan; Pin-Yu Chen; Payel Das; Amit Dhurandhar; Inkit Padhi; Devleena Das; |

185 | Text-To-Concept (and Back) Via Cross-Model Alignment | Highlight: We observe that the mapping from an image’s representation in one model to its representation in another can be learned surprisingly well with just a linear layer, even across diverse models. Building on this observation, we propose *text-to-concept*, where features from a fixed pretrained model are aligned linearly to the CLIP space, so that text embeddings from CLIP’s text encoder become directly comparable to the aligned features. |
Mazda Moayeri; Keivan Rezaei; Maziar Sanjabi; Soheil Feizi; |

186 | Multi-channel Autobidding with Budget and ROI Constraints | Highlight: Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. |
Yuan Deng; Negin Golrezaei; Patrick Jaillet; Jason Cheuk Nam Liang; Vahab Mirrokni; |

187 | Statistical Indistinguishability of Learning Algorithms | Highlight: In this paper, we study the similarity of outcomes of learning rules through the lens of the Total Variation (TV) distance of distributions. |
Alkis Kalavasis; Amin Karbasi; Shay Moran; Grigoris Velegkas; |

188 | Online Learning with Feedback Graphs: The True Shape of Regret | Highlight: In this paper, we define a new quantity $R^*$, called the *problem complexity*, and prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$. |
Tomáš Kocák; Alexandra Carpentier; |

189 | $H$-Consistency Bounds for Pairwise Misranking Loss Surrogates | Highlight: We present a detailed study of *$H$-consistency bounds* for score-based ranking. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |

190 | From Perception to Programs: Regularize, Overparameterize, and Amortize | Abstract: We develop techniques for synthesizing neurosymbolic programs. Such programs mix discrete symbolic processing with continuous neural computation. We relax this mixed … |
Hao Tang; Kevin Ellis; |

191 | Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling | Highlight: To this end, we propose learning to jump as a general recipe for generative modeling of various types of data. |
Tianqi Chen; Mingyuan Zhou; |

192 | Identifying Interpretable Subspaces in Image Representations | Highlight: We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. |
Neha Kalibhat; Shweta Bhardwaj; C. Bayan Bruss; Hamed Firooz; Maziar Sanjabi; Soheil Feizi; |

193 | Neural Markov Jump Processes | Highlight: In this work we introduce an alternative, variational inference algorithm for Markov jump processes which relies on neural ordinary differential equations, and is trainable via back-propagation. |
Patrick Seifner; Ramses J Sanchez; |

194 | PFGM++: Unlocking The Potential of Physics-Inspired Generative Models | Highlight: We introduce a new family of physics-inspired generative models termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). |
Yilun Xu; Ziming Liu; Yonglong Tian; Shangyuan Tong; Max Tegmark; Tommi S. Jaakkola; |

195 | Parallel Neurosymbolic Integration with Concordia | Highlight: We present Concordia, a framework overcoming the limitations of prior art. |
Jonathan Feldstein; Modestas Jurčius; Efthymia Tsamoura; |

196 | Conditional Tree Matching for Inference-Time Adaptation of Tree Prediction Models | Highlight: We present CTreeOT, a convergent, differentiable algorithm for matching two trees when each tree is conditioned on some input. |
Harshit Varma; Abhijeet Awasthi; Sunita Sarawagi; |

197 | Uncertainty Estimation for Molecules: Desiderata and Methods | Highlight: By our analysis, we conclude that none of the previous works satisfies all criteria. To fill this gap, we propose Localized Neural Kernel (LNK), a Gaussian Process (GP)-based extension to existing GNNs satisfying the desiderata. |
Tom Wollschläger; Nicholas Gao; Bertrand Charpentier; Mohamed Amine Ketata; Stephan Günnemann; |

198 | Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs | Highlight: This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta’s cost function. |
Steinar Laenen; Bogdan Adrian Manghiuc; He Sun; |

199 | Posterior Sampling for Deep Reinforcement Learning | Highlight: Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. |
Remo Sasso; Michelangelo Conserva; Paulo Rauber; |

200 | Beam Tree Recursive Cells | Highlight: We propose Beam Tree Recursive Cell (BT-Cell) – a backpropagation-friendly framework to extend Recursive Neural Networks (RvNNs) with beam search for latent structure induction. |
Jishnu Ray Chowdhury; Cornelia Caragea; |

201 | Revisiting Sampling for Combinatorial Optimization | Highlight: In this paper, we revisit the idea of using sampling for combinatorial optimization, motivated by the significant recent advances of gradient-based discrete MCMC and new techniques for parallel neighborhood exploration on accelerators. |
Haoran Sun; Katayoon Goshvadi; Azade Nova; Dale Schuurmans; Hanjun Dai; |

202 | Memory-Based Meta-Learning on Non-Stationary Distributions | Highlight: The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. |
Tim Genewein; Gregoire Deletang; Anian Ruoss; Li Kevin Wenliang; Elliot Catt; Vincent Dutordoir; Jordi Grau-Moya; Laurent Orseau; Marcus Hutter; Joel Veness; |

203 | Improved Online Learning Algorithms for CTR Prediction in Ad Auctions | Highlight: In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner in a pay-per-click manner. |
Zhe Feng; Christopher Liaw; Zixin Zhou; |

204 | User-level Private Stochastic Convex Optimization with Optimal Rates | Highlight: We study the problem of differentially private (DP) stochastic convex optimization (SCO) under the notion of user-level differential privacy. |
Raef Bassily; Ziteng Sun; |

205 | The Value of Out-of-Distribution Data | Highlight: We show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the amount of OOD data. |
Ashwin De Silva; Rahul Ramesh; Carey Priebe; Pratik Chaudhari; Joshua T Vogelstein; |

206 | SMURF-THP: Score Matching-based UnceRtainty QuantiFication for Transformer Hawkes Process | Highlight: Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence interval for the predicted event’s arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. |
Zichong Li; Yanbo Xu; Simiao Zuo; Haoming Jiang; Chao Zhang; Tuo Zhao; Hongyuan Zha; |

207 | Linear Causal Disentanglement Via Interventions | Highlight: In this paper, we study observed variables that are a linear transformation of a linear latent causal model. |
Chandler Squires; Anna Seigal; Salil S Bhate; Caroline Uhler; |

208 | Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning | Highlight: Hence, our study motivates a modification to the standard attention mechanism, where attention weights are scaled using soft masks generated by a convolutional network. |
Mattia Atzeni; Mrinmaya Sachan; Andreas Loukas; |

209 | Counterfactual Identifiability of Bijective Causal Models. Highlight: We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the literature. |
Arash Nasr-Esfahany; Mohammad Alizadeh; Devavrat Shah; |

210 | Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems. Highlight: In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. |
Ainesh Bakshi; Allen Liu; Ankur Moitra; morris yau; |

211 | Exploring The Benefits of Training Expert Language Models Over Instruction Tuning. Highlight: In this work, we report surprising findings that show an expert LM trained on just a single task can outperform an MT LM trained with 300+ different tasks on 11 different unseen datasets and on 13 datasets of the BIG-bench benchmark by an average of 3.20% and 1.29%, respectively. |
Joel Jang; Seungone Kim; Seonghyeon Ye; Doyoung Kim; Lajanugen Logeswaran; Moontae Lee; Kyungjae Lee; Minjoon Seo; |

212 | $\mathrm{E}(n)$ Equivariant Message Passing Simplicial Networks. Highlight: This paper presents $\mathrm{E}(n)$ Equivariant Message Passing Simplicial Networks (EMPSNs), a novel approach to learning on geometric graphs and point clouds that is equivariant to rotations, translations, and reflections. |
Floor Eijkelboom; Rob Hesselink; Erik J Bekkers; |

213 | Generating Private Synthetic Data with Genetic Algorithms. Highlight: However, even when possible, these approaches impose a fundamental limitation in which modifications to the minimization problem become additional sources of error. Therefore, we propose Private-GSD, a private genetic algorithm based on *zeroth*-order optimization heuristics that do not require modifying the original objective; thus, it avoids the aforementioned limitations of first-order optimization. |
Terrance Liu; Jingwu Tang; Giuseppe Vietri; Steven Wu; |

214 | Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single. Highlight: We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. |
Paul Vicol; |

215 | Modeling Temporal Data As Continuous Functions with Stochastic Process Diffusion. Highlight: To this end, we define suitable noise sources and introduce novel denoising and score-matching models. |
Marin Biloš; Kashif Rasul; Anderson Schneider; Yuriy Nevmyvaka; Stephan Günnemann; |

216 | Actor-Critic Alignment for Offline-to-Online Reinforcement Learning. Highlight: However, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions. |
Zishun Yu; Xinhua Zhang; |

217 | Flexible Phase Dynamics for Bio-Plausible Contrastive Learning. Highlight: These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. |
Ezekiel Williams; Colin Bredenberg; Guillaume Lajoie; |

218 | Optimal No-Regret Learning for One-Sided Lipschitz Functions. Highlight: Inspired by applications in pricing and contract design, we study the maximization of one-sided Lipschitz functions, which only provide the (weaker) guarantee that they do not grow too quickly in one direction. |
Paul Duetting; Guru Guruganesh; Jon Schneider; Joshua Ruizhi Wang; |

219 | Understanding The Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits. Highlight: However, it is still unclear what factors make this distillation work well. In this paper, we theoretically and empirically discover that the performance of a PC can exceed that of its teacher model. |
Xuejie Liu; Anji Liu; Guy Van den Broeck; Yitao Liang; |

220 | Neural FIM for Learning Fisher Information Metrics from Point Cloud Data. Highlight: To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data, allowing for a continuous manifold model for the data. |
Oluwadamilola Fasina; Guillaume Huguet; Alexander Tong; Yanlei Zhang; Guy Wolf; Maximilian Nickel; Ian Adelstein; Smita Krishnaswamy; |

221 | In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation. Highlight: As a solution, we introduce NINCO, a novel test OOD dataset in which each sample is checked to be ID-free, and whose fine-grained range of OOD classes allows for a detailed analysis of an OOD detector’s strengths and failure modes, particularly when paired with a number of synthetic “OOD unit-tests”. We provide code and data at https://github.com/j-cb/NINCO. |
Julian Bitterwolf; Maximilian Müller; Matthias Hein; |

222 | Variational Mixture of HyperGenerators for Learning Distributions Over Functions. Highlight: In this work, we propose a novel deep generative model, named VaMoH. |
Batuhan Koyuncu; Pablo Sanchez Martin; Ignacio Peis; Pablo M. Olmos; Isabel Valera; |

223 | Robust Subtask Learning for Compositional Generalization. Highlight: Typically, a high-level task is decomposed into a sequence of subtasks and a separate policy is trained to perform each subtask. In this paper, we focus on the problem of training subtask policies in a way that they can be used to perform any task; here, a task is given by a sequence of subtasks. |
Kishor Jothimurugan; Steve Hsu; Osbert Bastani; Rajeev Alur; |

224 | Strategic Classification with Unknown User Manipulations. Highlight: We propose a novel batch-learning setting in which we use unlabeled data from previous rounds to estimate the manipulation structure. |
Tosca Lechner; Ruth Urner; Shai Ben-David; |

225 | LSDS++: Dual Sampling for Accelerated K-means++. Highlight: In this paper, we propose a new variant named LSDS++, which improves the sampling efficiency of LocalSearch++ via a strategy called dual sampling. |
Chenglin Fan; Ping Li; Xiaoyun Li; |

226 | Data-OOB: Out-of-bag Estimate As A Simple and Efficient Data Value. Highlight: As a result, it has been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. |
Yongchan Kwon; James Zou; |

227 | From Adaptive Query Release to Machine Unlearning. Highlight: We formalize the problem of machine unlearning as the design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. |
Enayat Ullah; Raman Arora; |

228 | Label Differential Privacy and Private Training Data Release. Highlight: Our goal is to enable learning of an accurate predictive model while protecting the privacy of each user’s label. |
Robert Istvan Busa-Fekete; Andres Munoz medina; Umar Syed; Sergei Vassilvitskii; |

229 | Learning Dense Correspondences Between Photos and Sketches. Highlight: What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions. First, we introduce a new sketch-photo correspondence benchmark, PSC6k, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. |
Xuanchen Lu; Xiaolong Wang; Judith E Fan; |

230 | Conformalization of Sparse Generalized Linear Models. Highlight: In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. |
Etash Kumar Guha; Eugene Ndiaye; Xiaoming Huo; |

231 | Iterative Approximate Cross-Validation. Highlight: In this paper, we propose a new paradigm to efficiently approximate CV when the ERM problem is solved via an iterative first-order algorithm, without running until convergence. |
Yuetian Luo; Zhimei Ren; Rina Barber; |

232 | DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design. Highlight: In this paper, inspired by the convention in pharmaceutical practice, we decompose the ligand molecule into two parts, namely arms and scaffold, and propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold. |
Jiaqi Guan; Xiangxin Zhou; Yuwei Yang; Yu Bao; Jian Peng; Jianzhu Ma; Qiang Liu; Liang Wang; Quanquan Gu; |

233 | Correcting Discount-factor Mismatch in On-policy Policy Gradient Methods. Highlight: We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators. |
Fengdi Che; Gautham Vasan; A. Rupam Mahmood; |

234 | Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed! Highlight: In this manuscript, *we use internal cost heuristics of adaptive differential equation solvers at stochastic time-points to guide the training towards learning a dynamical system that is easier to integrate*. |
Avik Pal; Alan Edelman; Christopher Vincent Rackauckas; |

235 | SinDDM: A Single Image Denoising Diffusion Model. Highlight: Here, we introduce a framework for training a DDM on a single image. |
Vladimir Kulikov; Shahar Yadin; Matan Kleiner; Tomer Michaeli; |

236 | DRew: Dynamically Rewired Message Passing with Delay. Highlight: In this paper we propose a framework, applicable to any MPNN architecture, that performs a layer-dependent rewiring to ensure gradual densification of the graph. |
Benjamin Gutteridge; Xiaowen Dong; Michael M. Bronstein; Francesco Di Giovanni; |

237 | Masked Trajectory Models for Prediction, Representation, and Control. Highlight: We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. |
Philipp Wu; Arjun Majumdar; Kevin Stone; Yixin Lin; Igor Mordatch; Pieter Abbeel; Aravind Rajeswaran; |

238 | Predictive Flows for Faster Ford-Fulkerson. Highlight: Recent work has shown that leveraging learned predictions can improve the running time of algorithms for bipartite matching and similar combinatorial problems. In this work, we build on this idea to improve the performance of the widely used Ford-Fulkerson algorithm for computing maximum flows by seeding Ford-Fulkerson with predicted flows. |
Sami Davies; Benjamin Moseley; Sergei Vassilvitskii; Yuyan Wang; |

239 | GLOBE-CE: A Translation Based Approach for Global Counterfactual Explanations. Highlight: Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to propose Global & Efficient Counterfactual Explanations (GLOBE-CE), a flexible framework that tackles the reliability and scalability issues associated with the current state of the art, particularly on higher dimensional datasets and in the presence of continuous features. |
Dan Ley; Saumitra Mishra; Daniele Magazzeni; |

240 | In Search of Insights, Not Magic Bullets: Towards Demystification of The Model Selection Dilemma in Heterogeneous Treatment Effect Estimation. Highlight: While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global “winner”, we therefore empirically investigate success and failure modes of different selection criteria. |
Alicia Curth; Mihaela van der Schaar; |

241 | Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise. Highlight: In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE. |
Zhenghao Lin; Yeyun Gong; yelong shen; Tong Wu; Zhihao Fan; Chen Lin; Nan Duan; Weizhu Chen; |

242 | Multisample Flow Matching: Straightening Flows with Minibatch Couplings. Highlight: We propose Multisample Flow Matching, a more general framework that uses non-trivial couplings between data and noise samples while satisfying the correct marginal constraints. |
Aram-Alexandre Pooladian; Heli Ben-Hamu; Carles Domingo-Enrich; Brandon Amos; Yaron Lipman; Ricky T. Q. Chen; |

243 | Active Causal Structure Learning with Advice. Highlight: We introduce the problem of active causal structure learning with advice. |
Davin Choo; Themistoklis Gouleakis; Arnab Bhattacharyya; |

244 | ReDi: Efficient Learning-Free Diffusion Inference Via Trajectory Retrieval. Highlight: To accelerate the inference, we propose ReDi, a simple yet learning-free Retrieval-based Diffusion sampling framework. |
Kexun Zhang; Xianjun Yang; William Yang Wang; Lei Li; |

245 | Looped Transformers As Programmable Computers. Highlight: We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. |
Angeliki Giannou; Shashank Rajput; Jy-yong Sohn; Kangwook Lee; Jason D. Lee; Dimitris Papailiopoulos; |

246 | Alternately Optimized Graph Neural Networks. Highlight: In this work, we propose a new optimization framework for semi-supervised learning on graphs from a multi-view learning perspective. |
Haoyu Han; Xiaorui Liu; Haitao Mao; MohamadAli Torkamani; Feng Shi; Victor Lee; Jiliang Tang; |

247 | Graph Inductive Biases in Transformers Without Message Passing. Highlight: On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT), a new Graph Transformer that incorporates graph inductive biases without using message passing. |
Liheng Ma; Chen Lin; Derek Lim; Adriana Romero-Soriano; Puneet K. Dokania; Mark Coates; Philip Torr; Ser-Nam Lim; |

248 | Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning. Highlight: We consider block coordinate descent methods of Gauss-Seidel type with proximal regularization (BCD-PR), a classical approach to minimizing general nonconvex objectives under constraints that has a wide range of practical applications. |
Dohyun Kwon; Hanbaek Lyu; |

249 | Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks. Highlight: An intriguing hypothesis is that traveling waves serve to structure neural representations both in space and time, thereby acting as an inductive bias towards natural data. In this work, we investigate this hypothesis by introducing the Neural Wave Machine (NWM), a locally coupled oscillatory recurrent neural network capable of exhibiting traveling waves in its hidden state. |
T. Anderson Keller; Max Welling; |

250 | Principled Acceleration of Iterative Numerical Methods Using Machine Learning. Highlight: However, a systematic study of these approaches and how they differ from meta-learning is lacking. In this paper, we propose a framework to analyze such learning-based acceleration approaches, where one can immediately identify a departure from classical meta-learning. |
Sohei Arisaka; Qianxiao Li; |

251 | Efficient and Equivariant Graph Networks for Predicting Quantum Hamiltonian. Highlight: In this work, we propose an SE(3)-equivariant network, named QHNet, that achieves efficiency and equivariance. |
Haiyang Yu; Zhao Xu; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji; |

252 | Measuring The Impact of Programming Language Distribution. Highlight: Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. |
Gabriel Orlanski; Kefan Xiao; Xavier Garcia; Jeffrey Hui; Joshua Howland; Jonathan Malmaud; Jacob Austin; Rishabh Singh; Michele Catasta; |

253 | LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation. Highlight: Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memory and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. |
Yixiao Li; Yifan Yu; Qingru Zhang; Chen Liang; Pengcheng He; Weizhu Chen; Tuo Zhao; |

254 | Node Embedding from Neural Hamiltonian Orbits in Graph Neural Networks. Highlight: In this paper, we model the embedding update of a node feature as a Hamiltonian orbit over time. |
Qiyu Kang; Kai Zhao; Yang Song; Sijie Wang; Wee Peng Tay; |

255 | Multi-Task Differential Privacy Under Distribution Skew. Highlight: We give a systematic analysis of the problem by studying how to optimally allocate a user’s privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that in the presence of distribution skew, this gives a quantifiable improvement in excess empirical risk. |
Walid Krichene; Prateek Jain; Shuang Song; Mukund Sundararajan; Abhradeep Guha Thakurta; Li Zhang; |

256 | A/B Testing in Network Data with Covariate-Adaptive Randomization. Highlight: In this paper, we propose a new adaptive procedure to balance both the network and the covariates. |
Jialu Wang; Ping Li; Feifang Hu; |

257 | Towards Bridging The Gaps Between The Right to Explanation and The Right to Be Forgotten. Highlight: Intuitively, enforcing the right to be forgotten may trigger model updates which in turn invalidate previously provided explanations, thus violating the right to explanation. In this work, we investigate the technical implications arising due to the interference between the two aforementioned regulatory principles, and propose the first algorithmic framework to resolve the tension between them. |
Satyapriya Krishna; Jiaqi Ma; Himabindu Lakkaraju; |

258 | General Covariance Data Augmentation for Neural PDE Solvers. Highlight: To alleviate the problem, we propose a computationally cheap augmentation strategy based on general covariance and simple random coordinate transformations. |
Fanaskov Vladimir; Tianchi Yu; Alexander Rudikov; Ivan Oseledets; |

259 | Deep Anomaly Detection Under Labeling Budget Constraints. Highlight: In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. |
Aodong Li; Chen Qiu; Marius Kloft; Padhraic Smyth; Stephan Mandt; Maja Rudolph; |

260 | On Kinetic Optimal Probability Paths for Generative Models. Highlight: In this paper we investigate the space of Gaussian probability paths, which includes diffusion paths as an instance, and look for an optimal member in some useful sense. |
Neta Shaul; Ricky T. Q. Chen; Maximilian Nickel; Matthew Le; Yaron Lipman; |

261 | Off-Policy Average Reward Actor-Critic with Deterministic Policy Search. Highlight: In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. |
Naman Saxena; Subhojyoti Khastagir; Shishir N Y; Shalabh Bhatnagar; |

262 | Coarse-to-Fine: A Hierarchical Diffusion Model for Molecule Generation in 3D. Highlight: Fragment-based molecule generation is a promising strategy; however, it is nontrivial to adapt it for 3D non-autoregressive generation because of the combinatorial optimization problems involved. In this paper, we utilize a coarse-to-fine strategy to tackle this problem, in which a Hierarchical Diffusion-based model (i.e., HierDiff) is proposed to preserve the validity of local segments without relying on autoregressive modeling. |
Bo Qiang; Yuxuan Song; Minkai Xu; Jingjing Gong; Bowen Gao; Hao Zhou; Wei-Ying Ma; Yanyan Lan; |

263 | PASTA: Pessimistic Assortment Optimization. Highlight: To this end, based on the principle of pessimism, we propose a novel algorithm called Pessimistic ASsortment opTimizAtion (PASTA for short), which can correctly identify the optimal assortment by only requiring the offline data to cover the optimal assortment under general settings. |
Juncheng Dong; Weibin Mo; Zhengling Qi; Cong Shi; Ethan X Fang; Vahid Tarokh; |

264 | Near-optimal Conservative Exploration in Reinforcement Learning Under Episode-wise Constraints. Highlight: This paper investigates conservative exploration in reinforcement learning, where the performance of the learning agent is guaranteed to be above a certain threshold throughout the learning process. |
Donghao Li; Ruiquan Huang; Cong Shen; Jing Yang; |

265 | Double-Weighting for Covariate Shift Adaptation. Highlight: However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples. |
Jose Ignacio Segovia; Santiago Mazuelas; Anqi Liu; |

266 | Near-Optimal Algorithms for Private Online Optimization in The Realizable Regime. Highlight: We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds. |
Hilal Asi; Vitaly Feldman; Tomer Koren; Kunal Talwar; |

267 | Leveraging Proxy of Training Data for Test-Time Adaptation. Highlight: To this end, we propose two lightweight yet informative proxies of the training data and a TTA method fully exploiting them. |
Juwon Kang; Nayeong Kim; Donghyeon Kwon; Jungseul Ok; Suha Kwak; |

268 | The Monge Gap: A Regularizer to Learn All Transport Maps. Highlight: More generally, we question the relevance of using Brenier’s result, which only applies to densities, to constrain the architecture of candidate maps fitted on samples. Motivated by these limitations, we propose a radically different approach to estimating OT maps: given a cost $c$ and a reference measure $\rho$, we introduce a regularizer, the Monge gap $\mathcal{M}^c_{\rho}(T)$ of a map $T$. |
Théo Uscidda; marco cuturi; |

269 | Dirichlet Diffusion Score Model for Biological Sequence Generation. Highlight: To develop generative SDE models for discrete data such as biological sequences, we introduce a diffusion process defined in the probability simplex space whose stationary distribution is the Dirichlet distribution. |
Pavel Avdeyev; Chenlai Shi; Yuhao Tan; Kseniia Dudnyk; Jian Zhou; |

270 | Compositional Score Modeling for Simulation-Based Inference. Highlight: We introduce a new method based on conditional score modeling that enjoys the benefits of both approaches. |
Tomas Geffner; George Papamakarios; Andriy Mnih; |

271 | Local Vertex Colouring Graph Neural Networks. Highlight: In this study, we investigate the expressivity of GNNs from the perspective of graph search. |
Shouheng Li; Dongwoo Kim; Qing Wang; |

272 | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning. Highlight: We study the design decisions of publicly available instruction tuning methods by reproducing and breaking down the development of Flan 2022 (Chung et al., 2022). |
Shayne Longpre; Le Hou; Tu Vu; Albert Webson; Hyung Won Chung; Yi Tay; Denny Zhou; Quoc V Le; Barret Zoph; Jason Wei; Adam Roberts; |

273 | Diffusion Models for Black-Box Optimization. Highlight: We propose Denoising Diffusion Optimization Models (DDOM), a new inverse approach for offline black-box optimization based on diffusion models. |
Siddarth Krishnamoorthy; Satvik Mehul Mashkaria; Aditya Grover; |

274 | NUNO: A General Framework for Learning Parametric PDEs with Non-Uniform Data. Highlight: However, when faced with real-world physical data, which are often highly non-uniformly distributed, it is challenging to use mesh-based techniques such as the FFT. To address this, we introduce the Non-Uniform Neural Operator (NUNO), a comprehensive framework designed for efficient operator learning with non-uniform data. |
Songming Liu; Zhongkai Hao; Chengyang Ying; Hang Su; Ze Cheng; Jun Zhu; |

275 | Unit Scaling: Out-of-the-Box Low-Precision Training. Highlight: We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats. |
Charlie Blake; Douglas Orr; Carlo Luschi; |

276 | Polarity Is All You Need to Learn and Transfer Faster. Highlight: Here, we investigate the role of weight polarity: development processes initialize NIs with advantageous polarity configurations; as NIs grow and learn, synapse magnitudes update, yet polarities are largely kept unchanged. |
Qingyang Wang; Michael Alan Powell; Eric W Bridgeford; Ali Geisa; Joshua T Vogelstein; |

277 | Training-Free Neural Active Learning with Initialization-Robustness Guarantees. Highlight: To this end, we introduce our expected variance with Gaussian processes (EV-GP) criterion for neural active learning, which is theoretically guaranteed to select data points that lead to trained NNs with both (a) good predictive performance and (b) initialization robustness. |
Apivich Hemachandra; Zhongxiang Dai; Jasraj Singh; See-Kiong Ng; Bryan Kian Hsiang Low; |

278 | Generative Decoding of Visual Stimuli. Highlight: Motivated by this fact, we introduce a novel neural network architecture for the problem of neural decoding. |
Eleni Miliotou; Panagiotis Kyriakis; Jason D Hinman; Andrei Irimia; Paul Bogdan; |

279 | Forget Unlearning: Towards True Data-Deletion in Machine Learning. Highlight: To address these, we propose a sound deletion guarantee and show that ensuring the privacy of existing records is necessary for the privacy of deleted records. |
Rishav Chourasia; Neil Shah; |

280 | Learning Rate Schedules in The Presence of Distribution Shift. Highlight: For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and give upper and lower bounds for the regret that only differ by constants. |
Matthew Fahrbach; Adel Javanmard; Vahab Mirrokni; Pratik Worah; |

281 | Extending Conformal Prediction to Hidden Markov Models with Exact Validity Via De Finetti’s Theorem for Markov Chains. Highlight: We generalize conformal prediction to the Hidden Markov Model (HMM) framework where the assumption of exchangeability is not valid. |
Buddhika Nettasinghe; Samrat Chatterjee; Ramakrishna Tipireddy; Mahantesh M Halappanavar; |

282 | Gradient Descent Converges Linearly for Logistic Regression on Separable Data. Highlight: We show that running gradient descent with a variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*)+\epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. |
Kyriakos Axiotis; Maxim Sviridenko; |

283 | Approximate Stein Classes for Truncated Density Estimation. Highlight: In this paper, we propose approximate Stein classes, which in turn lead to a relaxed Stein identity for truncated density estimation. |
Daniel James Williams; Song Liu; |

284 | Pareto Manifold Learning: Tackling Multiple Tasks Via Ensembles of Single-task Models. Highlight: In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose *Pareto Manifold Learning*, an ensembling method in weight space. |
Nikolaos Dimitriadis; Pascal Frossard; François Fleuret; |

285 | Mitigating Propagation Failures in Physics-informed Neural Networks Using Retain-Resample-Release (R3) Sampling. Highlight: In this paper, we provide a novel perspective on failure modes of PINNs by hypothesizing that training PINNs relies on successful propagation of the solution from initial and/or boundary condition points to interior points. |
Arka Daw; Jie Bu; Sifan Wang; Paris Perdikaris; Anuj Karpatne; |

286 | DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning. Highlight: By appropriately leveraging inter-task relationships, we propose a novel CL method, named DualHSIC, to boost the performance of existing rehearsal-based methods in a simple yet effective way. |
Zifeng Wang; Zheng Zhan; Yifan Gong; Yucai Shao; Stratis Ioannidis; Yanzhi Wang; Jennifer Dy; |

287 | Certifying Ensembles: A General Certification Theory with S-Lipschitzness. Highlight: In this work, we generalise Lipschitz continuity by introducing S-Lipschitz classifiers, which we use to analyse the theoretical robustness of ensembles. |
Aleksandar Petrov; Francisco Eiras; Amartya Sanyal; Philip Torr; Adel Bibi; |

288 | Regions of Reliability in The Evaluation of Multivariate Probabilistic Forecasts. Highlight: In this paper, we provide the first systematic finite-sample study of proper scoring rules for time series forecasting evaluation. |
Étienne Marcotte; Valentina Zantedeschi; Alexandre Drouin; Nicolas Chapados; |

289 | AbODE: Ab Initio Antibody Design Using Conjoined ODEs. Highlight: However, this co-design of the amino acid sequence and the 3D structure subsumes and accentuates some central challenges from multiple tasks, including protein folding (sequence to structure), inverse folding (structure to sequence), and docking (binding). We strive to surmount these challenges with a new generative model, AbODE, that extends graph PDEs to accommodate both contextual information and external interactions. |
Yogesh Verma; Markus Heinonen; Vikas Garg; |

290 | TGRL: An Algorithm for Teacher Guided Reinforcement Learning. Highlight: We introduce Teacher Guided Reinforcement Learning (TGRL), a principled approach to dynamically balance following the teacher’s guidance and leveraging RL. |
Idan Shenfeld; Zhang-Wei Hong; Aviv Tamar; Pulkit Agrawal; |

291 | Learning-augmented Private Algorithms for Multiple Quantile Release. Highlight: When applying differential privacy to sensitive data, we can often improve performance using external information such as other sensitive data, public data, or human priors. We propose to use the learning-augmented algorithms (or algorithms with predictions) framework, previously applied largely to improve time complexity or competitive ratios, as a powerful way of designing and analyzing privacy-preserving methods that can take advantage of such external information to improve utility. |
Mikhail Khodak; Kareem Amin; Travis Dick; Sergei Vassilvitskii; |

292 | Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization Under Concept Shift. Highlight: To exploit data features of label-sparse samples more efficiently, we propose an adaptively weighted online optimization algorithm, AdaWAC, to incorporate data augmentation consistency regularization in sample reweighting. |
Yijun Dong; Yuege Xie; Rachel Ward; |

293 | Spatial Implicit Neural Representations for Global-Scale Species Mapping. Highlight: In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously. |
Elijah Cole; Grant Van Horn; Christian Lange; Alexander Shepard; Patrick Leary; Pietro Perona; Scott Loarie; Oisin Mac Aodha; |

294 | Adaptive Coordination in Social Embodied Rearrangement. Highlight: Prior ZSC approaches struggle to generalize in our complex and visually rich setting, and on further analysis, we find that they fail to generate diverse coordination behaviors at training time. To counter this, we propose Behavior Diversity Play (BDP), a novel ZSC approach that encourages diversity through a discriminability objective. |
Andrew Szot; Unnat Jain; Dhruv Batra; Zsolt Kira; Ruta Desai; Akshara Rai; |

295 | OMS-DPM: Optimizing The Model Schedule for Diffusion Probabilistic Models. Highlight: In this paper, we reveal an overlooked dimension, the model schedule, for optimizing the trade-off between generation quality and speed. |
Enshu Liu; Xuefei Ning; Zinan Lin; Huazhong Yang; Yu Wang; |

296 | Prefer to Classify: Improving Text Classifiers Via Auxiliary Preference Learning. Highlight: To this end, we propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences. |
Jaehyung Kim; Jinwoo Shin; Dongyeop Kang; |

297 | On Preemption and Learning in Stochastic Scheduling. Highlight: We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution. |
Nadav Merlis; Hugo Richard; Flore Sentenac; Corentin Odic; Mathieu Molina; Vianney Perchet; |

298 | A Modern Look at The Relationship Between Sharpness and Generalization. Highlight: But does it really capture generalization in modern practical settings? We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. |
Maksym Andriushchenko; Francesco Croce; Maximilian Müller; Matthias Hein; Nicolas Flammarion; |

299 | Linear CNNs Discover The Statistical Structure of The Dataset Using Only The Most Dominant Frequencies. Highlight: We here present a stepping stone towards a deeper understanding of convolutional neural networks (CNNs) in the form of a theory of learning in linear CNNs. |
Hannah Pinson; Joeri Lenaerts; Vincent Ginis; |

300 | Properties of The Mallows Model Depending on The Number of Alternatives: A Warning for An Experimentalist. Highlight: We empirically and theoretically analyze how the properties of rankings sampled from the Mallows model change when increasing the number of alternatives. |
Niclas Boehmer; Piotr Faliszewski; Sonja Kraiczy; |

301 | Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds Under Minimal Smoothness Assumptions. Highlight: We give an improved theoretical analysis of score-based generative modeling. |
Hongrui Chen; Holden Lee; Jianfeng Lu; |

302 | Extending Kernel PCA Through Dualization: Sparsity, Robustness and Fast Algorithms. Highlight: The goal of this paper is to revisit Kernel Principal Component Analysis (KPCA) through dualization of a difference of convex functions. |
Francesco Tonin; Alex Lambert; Panagiotis Patrinos; Johan Suykens; |

303 | Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds. Highlight: We establish a novel generic algorithm that achieves *no-horizon dependence* in terms of sample complexity for both Markov Decision Processes (MDP) and Games, via reduction to a well-conditioned *auxiliary Markovian environment*, in which only “important” state-action pairs are preserved. |
Shengshi Li; Lin Yang; |

304 | Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems. Highlight: We investigate a primal-dual (PD) method for the saddle point problem (SPP) that uses a linear approximation of the primal function instead of the standard proximal step, resulting in a linearized PD (LPD) method. |
Mohammad Khalafi; Digvijay Boob; |

305 | Are Random Decompositions All We Need in High Dimensional Bayesian Optimisation? Highlight: We find that data-driven learners of decompositions can be easily misled towards local decompositions that do not hold globally across the search space. Then, we formally show that a random tree-based decomposition sampler exhibits favourable theoretical guarantees that effectively trade off maximal information gain and functional mismatch between the actual black-box and its surrogate as provided by the decomposition. |
Juliusz Krzysztof Ziomek; Haitham Bou Ammar; |

306 | Individually Fair Learning with One-Sided Feedback. Highlight: We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances. |
Yahav Bechavod; Aaron Roth; |

307 | User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems. Highlight: In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. |
Marc Anton Finzi; Anudhyan Boral; Andrew Gordon Wilson; Fei Sha; Leonardo Zepeda-Nunez; |

308 | Exphormer: Sparse Transformers for Graphs. Highlight: In this paper, we introduce Exphormer, a framework for building powerful and scalable graph transformers. |
Hamed Shirzad; Ameya Velingker; Balaji Venkatachalam; Danica J. Sutherland; Ali Kemal Sinop; |

309 | Supervised Metric Learning to Rank for Retrieval Via Contextual Similarity Optimization. Highlight: Many metric learning loss functions focus on learning a correct ranking of training samples, but strongly overfit semantically inconsistent labels and require a large amount of data. To address these shortcomings, we propose a new metric learning method, called contextual loss, which optimizes contextual similarity in addition to cosine similarity. |
Christopher Liao; Theodoros Tsiligkaridis; Brian Kulis; |

310 | Optimal Stochastic Non-smooth Non-convex Optimization Through Online-to-Non-convex Conversion. Highlight: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. |
Ashok Cutkosky; Harsh Mehta; Francesco Orabona; |

311 | Federated Heavy Hitter Recovery Under Linear Sketching. Highlight: We propose efficient algorithms based on local subsampling and invertible Bloom look-up tables (IBLTs). |
Adria Gascon; Peter Kairouz; Ziteng Sun; Ananda Theertha Suresh; |

312 | Coupled Variational Autoencoder. Highlight: We propose the Coupled Variational Auto-Encoder (C-VAE), which formulates the VAE problem as one of Optimal Transport (OT) between the prior and data distributions. |
Xiaoran Hao; Patrick Shafto; |

313 | Conditionally Strongly Log-Concave Generative Models. Highlight: There is a growing gap between the impressive results of deep image generative models and classical algorithms that offer theoretical guarantees. The former suffer from mode collapse or memorization issues, limiting their application to scientific data. The latter require restrictive assumptions such as log-concavity to escape the curse of dimensionality. We partially bridge this gap by introducing conditionally strongly log-concave (CSLC) models, which factorize the data distribution into a product of conditional probability distributions that are strongly log-concave. |
Florentin Guth; Etienne Lempereur; Joan Bruna; Stéphane Mallat; |

314 | BNN-DP: Robustness Certification of Bayesian Neural Networks Via Dynamic Programming. Highlight: In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs). |
Steven Adams; Andrea Patane; Morteza Lahijanian; Luca Laurenti; |

315 | Nonlinear Causal Discovery with Latent Confounders. Highlight: In this paper, we propose a nonlinear causal model involving hidden confounders. |
David Kaltenpoth; Jilles Vreeken; |

316 | Toward Efficient Gradient-Based Value Estimation. Highlight: To resolve the adverse effect of poor conditioning of MSBE on gradient-based methods, we propose a low-complexity, batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. |
Arsalan Sharifnassab; Richard S. Sutton; |

317 | RACE: Improve Multi-Agent Reinforcement Learning with Representation Asymmetry and Collaborative Evolution. Highlight: This paper introduces a hybrid framework, Representation Asymmetry and Collaboration Evolution (RACE), which combines EA and MARL for efficient collaboration. |
Pengyi Li; Jianye Hao; Hongyao Tang; Yan Zheng; Xian Fu; |

318 | Revisiting Data-Free Knowledge Distillation with Poisoned Teachers. Highlight: In this work, we make the first effort to uncover the security risk of data-free KD w.r.t. untrusted pre-trained models. |
Junyuan Hong; Yi Zeng; Shuyang Yu; Lingjuan Lyu; Ruoxi Jia; Jiayu Zhou; |

319 | Fast As CHITA: Neural Network Pruning with Combinatorial Optimization. Highlight: In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. |
Riade Benbaki; Wenyu Chen; Xiang Meng; Hussein Hazimeh; Natalia Ponomareva; Zhe Zhao; Rahul Mazumder; |

320 | Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup. Highlight: In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain some of this success from a feature learning perspective. |
Muthu Chidambaram; Xiang Wang; Chenwei Wu; Rong Ge; |

321 | Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons. Highlight: Inspired by lifted neural networks and compartmental neuron models, we propose a simple energy-based compartmental neuron model, termed dual propagation, in which each neuron is a dyad with two intrinsic states. |
Rasmus Høier; D. Staudt; Christopher Zach; |

322 | Identifying Useful Learnwares for Heterogeneous Label Spaces. Highlight: In this paper, we make an attempt to improve the effectiveness of RKME specification for heterogeneous label spaces, where the learnware market does not contain a model that has the same label space as the user’s task, by considering a class-specific model specification explicitly, along with a class-wise learnware identification method. |
Lan-Zhe Guo; Zhi Zhou; Yu-Feng Li; Zhi-Hua Zhou; |

323 | Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains. Highlight: In this work, we propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate computation variational inference (CVI). |
Matthew Dowling; Yuan Zhao; Il Memming Park; |

324 | Fair Neighbor Embedding. Highlight: We introduce a framework of fair neighbor embedding, the Fair Neighbor Retrieval Visualizer, which formulates fair nonlinear dimensionality reduction as an information retrieval task whose performance and fairness are quantified by information retrieval criteria. |
Jaakko Peltonen; Wen Xu; Timo Nummenmaa; Jyrki Nummenmaa; |

325 | Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces. Highlight: Here, we develop a theoretical formulation for arbitrary discrete-state Markov processes in the forward diffusion process using exact (as opposed to variational) analysis. |
Javier E. Santos; Zachary R Fox; Nicholas Lubbers; Yen Ting Lin; |

326 | Few-Sample Feature Selection Via Feature Manifold Learning. Highlight: In this paper, we present a new method for few-sample supervised feature selection (FS). |
David Cohen; Tal Shnitzer; Yuval Kluger; Ronen Talmon; |

327 | Generalized Reductions: Making Any Hierarchical Clustering Fair and Balanced with Low Cost. Highlight: Our work vastly improves the previous $O(n^{5/6}\,\mathrm{poly}\log(n))$ fair approximation for cost to a near-polylogarithmic $O(n^{\delta}\,\mathrm{poly}\log(n))$ fair approximation for any constant $\delta\in(0,1)$. |
Marina Knittel; Max Springer; John P Dickerson; MohammadTaghi Hajiaghayi; |

328 | Distribution Free Prediction Sets for Node Classification. Highlight: We show through experiments on standard benchmark datasets using popular GNN models that our approach provides tighter and better calibrated prediction sets than a naive application of conformal prediction. |
Jase Clarkson; |

329 | Bandit Online Linear Optimization with Hints and Queries. Highlight: We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. |
Aditya Bhaskara; Ashok Cutkosky; Ravi Kumar; Manish Purohit; |

330 | Federated Online and Bandit Convex Optimization. Highlight: We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications. |
Kumar Kshitij Patel; Lingxiao Wang; Aadirupa Saha; Nathan Srebro; |

331 | Test-time Adaptation with Slot-Centric Models. Highlight: Recent slot-centric generative models attempt to decompose scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised slot-centric scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives. |
Mihir Prabhudesai; Anirudh Goyal; Sujoy Paul; Sjoerd van Steenkiste; Mehdi S. M. Sajjadi; Gaurav Aggarwal; Thomas Kipf; Deepak Pathak; Katerina Fragkiadaki; |

332 | Representation-Driven Reinforcement Learning. Highlight: We present a representation-driven framework for reinforcement learning. |
Ofir Nabati; Guy Tennenholtz; Shie Mannor; |

333 | Improving L1-Certified Robustness Via Randomized Smoothing By Leveraging Box Constraints. Highlight: However, current techniques are not able to utilize the fact that any adversarial example has to lie in the image space, that is, $[0,1]^d$; otherwise, one can trivially detect it. To address this suboptimality, we derive new certification formulae which lead to significant improvements in the certified $\ell_1$-robustness without the need of adapting the classifiers or changing the smoothing distributions. |
Vaclav Voracek; Matthias Hein; |

334 | LIV: Language-Image Representations and Rewards for Robotic Control. Highlight: We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. |
Yecheng Jason Ma; Vikash Kumar; Amy Zhang; Osbert Bastani; Dinesh Jayaraman; |

335 | MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation. Highlight: In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. |
Omer Bar-Tal; Lior Yariv; Yaron Lipman; Tali Dekel; |

336 | On The Relationship Between Explanation and Prediction: A Causal View. Highlight: More specifically, we study the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., on hyperparameters and inputs used to generate saliency-based Es or Ys. |
Amir-Hossein Karimi; Krikamol Muandet; Simon Kornblith; Bernhard Schölkopf; Been Kim; |

337 | RGE: A Repulsive Graph Rectification for Node Classification Via Influence. Highlight: In the same vein, we observe that edge groups connecting to the same train node exhibit significant differences in their influences; hence, no matter how negative each is, removing them at once may have a rather negative effect as a group. Based on this motivation, we propose a new edge-removing strategy, Repulsive edge Group Elimination (RGE), that preferentially removes edges with no interference in groups. |
Jaeyun Song; SungYub Kim; Eunho Yang; |

338 | Provable Multi-instance Deep AUC Maximization with Stochastic Pooling. Highlight: We address a neglected yet non-negligible computational challenge of MIL in the context of DAM, i.e., bag size is too large to be loaded into GPU memory for backpropagation, which is required by the standard pooling methods of MIL. To tackle this challenge, we propose variance-reduced stochastic pooling methods in the spirit of stochastic optimization by formulating the loss function over the pooled prediction as a multi-level compositional function. |
Dixian Zhu; Bokun Wang; Zhi Chen; Yaxing Wang; Milan Sonka; Xiaodong Wu; Tianbao Yang; |

339 | Generative Pretraining for Black-Box Optimization. Highlight: We propose BONET, a generative framework for pretraining a novel model-based optimizer using offline datasets. |
Satvik Mehul Mashkaria; Siddarth Krishnamoorthy; Aditya Grover; |

340 | Representer Point Selection for Explaining Regularized High-dimensional Models. Highlight: We introduce a novel class of sample-based explanations we term *high-dimensional representers*, that can be used to explain the predictions of a regularized high-dimensional model in terms of importance weights for each of the training samples. |
Che-Ping Tsai; Jiong Zhang; Hsiang-Fu Yu; Eli Chien; Cho-Jui Hsieh; Pradeep Kumar Ravikumar; |

341 | Efficient Displacement Convex Optimization with Particle Gradient Descent. Highlight: Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures. This paper considers particle gradient descent with a finite number of particles and establishes its theoretical guarantees to optimize functions that are *displacement convex* in measures. |
Hadi Daneshmand; Jason D. Lee; Chi Jin; |

342 | Generalized-Smooth Nonconvex Optimization Is As Efficient As Smooth Nonconvex Optimization. Highlight: In this paper, we propose a notion of $\alpha$-symmetric generalized-smoothness that substantially extends the existing notions and covers many important functions such as high-order polynomials and exponential functions. |
Ziyi Chen; Yi Zhou; Yingbin Liang; Zhaosong Lu; |

343 | Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs. Highlight: In this work, we propose several improved techniques for maximum likelihood estimation for diffusion ODEs, including both training and evaluation perspectives. |
Kaiwen Zheng; Cheng Lu; Jianfei Chen; Jun Zhu; |

344 | Compressing Tabular Data Via Latent Variable Estimation. Highlight: Data used for analytics and machine learning often take the form of tables with categorical entries. We introduce a family of lossless compression algorithms for such data that proceed in four steps: (i) Estimate latent variables associated to rows and columns; (ii) Partition the table in blocks according to the row/column latents; (iii) Apply a sequential (e.g. Lempel-Ziv) coder to each of the blocks; (iv) Append a compressed encoding of the latents. |
Andrea Montanari; Eric Weiner; |

345 | A Conditional Normalizing Flow for Accelerated Multi-Coil MR Imaging. Highlight: We instead focus on sampling from the posterior distribution, which provides more comprehensive information for downstream inference tasks. |
Jeffrey Wen; Rizwan Ahmad; Philip Schniter; |

346 | Learnability and Algorithm for Continual Learning. Highlight: This paper studies the challenging continual learning (CL) setting of Class Incremental Learning (CIL). |
Gyuhak Kim; Changnan Xiao; Tatsuya Konishi; Bing Liu; |

347 | Online Local Differential Private Quantile Inference Via Self-normalization. Highlight: Based on binary inquiries, we developed an algorithm to estimate population quantiles under Local Differential Privacy (LDP). |
Yi Liu; Qirui Hu; Lei Ding; Linglong Kong; |

348 | Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems. Highlight: In this work, we provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure. |
Atsushi Nitanda; Kazusato Oko; Denny Wu; Nobuhito Takenouchi; Taiji Suzuki; |

349 | Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames. Highlight: In this paper, we present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames. |
Ondrej Biza; Sjoerd van Steenkiste; Mehdi S. M. Sajjadi; Gamaleldin Fathy Elsayed; Aravindh Mahendran; Thomas Kipf; |

350 | On Provable Copyright Protection for Generative ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: There is a growing concern that learned conditional generative models may output samples that are substantially similar to some copyrighted data $C$ that was in their training set. We give a formal definition of near access-freeness (NAF) and prove bounds on the probability that a model satisfying this definition outputs a sample similar to $C$, even if $C$ is included in its training set. |
Nikhil Vyas; Sham M. Kakade; Boaz Barak; |

351 | Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We show that multilingual compression and linguistic fairness are compatible with differential privacy, but that differential privacy is at odds with training data influence sparsity, an objective for transparency. |
Phillip Rust; Anders Søgaard; |

352 | Sequential Strategic ScreeningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In other words, we introduce the combination of strategic classification with screening processes. |
Lee Cohen; Saeed Sharifi-Malvajerdi; Kevin Stangl; Ali Vakilian; Juba Ziani; |

353 | Investigating The Role of Model-Based Learning in Exploration and TransferRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we investigate transfer learning in the context of model-based agents. |
Jacob C Walker; Eszter Vértes; Yazhe Li; Gabriel Dulac-Arnold; Ankesh Anand; Theophane Weber; Jessica B Hamrick; |

354 | Unveiling The Latent Space Geometry of Push-Forward Generative ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Building on recent developments in geometric measure theory, we prove a sufficient condition for optimality in the case where the dimension of the latent space is larger than the number of modes. |
Thibaut Issenhuth; Ugo Tanielian; Jeremie Mary; David Picard; |

355 | Identification of The Adversary from A Single Adversarial ExampleRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, instead of enhancing the robustness, we take the investigator’s perspective and propose a new framework to trace the first compromised model copy in a forensic investigation manner. |
Minhao Cheng; Rui Min; Haochen Sun; Pin-Yu Chen; |

356 | K-SHAP: Policy Clustering Algorithm for Anonymous Multi-Agent State-Action PairsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. |
Andrea Coletta; Svitlana Vyetrenko; Tucker Balch; |

357 | The Acquisition of Physical Knowledge in Generative Neural NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We outline an approach that allows us to examine two distinct hypotheses of human development — stochastic optimization and complexity increase. |
Luca M. Schulze Buschoff; Eric Schulz; Marcel Binz; |

358 | Towards Understanding Generalization of Macro-AUC in Multi-label LearningRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: To establish it, technically, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest. |
Guoqiang Wu; Chongxuan Li; Yilong Yin; |

359 | The Persistent Laplacian for Data Science: Evaluating Higher-Order Persistent Spectral Representations of DataRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Here, we provide the first investigation into the efficacy of the persistent Laplacian as an embedding of data for downstream classification and regression tasks. |
Thomas Davies; Zhengchao Wan; Ruben Sanchez-Garcia; |

360 | DUET: 2D Structured and Approximately Equivariant RepresentationsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2d representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. |
Xavier Suau; Federico Danieli; T. Anderson Keller; Arno Blaas; Chen Huang; Jason Ramapuram; Dan Busbridge; Luca Zappella; |

361 | Forward-Backward Gaussian Variational Inference Via JKO in The Bures-Wasserstein SpaceRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI. |
Michael Ziyang Diao; Krishna Balasubramanian; Sinho Chewi; Adil Salim; |

362 | Optimal Convergence Rates for Agnostic Nyström Kernel LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: However, a unified analysis of Nyström approximation is lacking, and the asymptotic minimax optimality of Nyström methods usually requires a strict condition, assuming that the target regression lies exactly in the hypothesis space. In this paper, to tackle these problems, we provide a refined generalization analysis for Nyström approximation in the agnostic setting, where the target regression may be out of the hypothesis space. |
Jian Li; Yong Liu; Weiping Wang; |

363 | SeMAIL: Eliminating Distractors in Visual Imitation Via Separated ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Following the convention of MBIL research, existing algorithms are easily deceived by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose a new algorithm – named Separated Model-based Adversarial Imitation Learning (SeMAIL) – that decouples the environment dynamics into two parts by task-relevant dependency (determined by agent actions) and trains them separately. |
Shenghua Wan; Yucen Wang; Minghao Shao; Ruying Chen; De-Chuan Zhan; |

364 | Combinatorial Neural BanditsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: The score of an arm is an unknown function of the arm’s feature. Approximating this unknown score function with deep neural networks, we propose algorithms: Combinatorial Neural UCB ($\texttt{CN-UCB}$) and Combinatorial Neural Thompson Sampling ($\texttt{CN-TS}$). |
Taehyun Hwang; Kyuwook Chai; Min-hwan Oh; |

365 | Variational Autoencoding Neural OperatorsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Here we present Variational Autoencoding Neural Operators (VANO), a general strategy for making a large class of operator learning architectures act as variational autoencoders. |
Jacob H Seidman; Georgios Kissas; George J. Pappas; Paris Perdikaris; |

366 | Who Needs to Know? Minimal Knowledge for Optimal CoordinationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We show that there is a well-defined dichotomy between strategically relevant and irrelevant information. |
Niklas Lauffer; Ameesh Shah; Micah Carroll; Michael D Dennis; Stuart Russell; |

367 | Efficient Parametric Approximations of Neural Network Function Space DistanceRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. |
Nikita Dhawan; Sicong Huang; Juhan Bae; Roger Baker Grosse; |

368 | Predicting Ordinary Differential Equations with TransformersRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We develop a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory. |
Sören Becker; Michal Klein; Alexander Neitz; Giambattista Parascandolo; Niki Kilbertus; |

369 | A Unifying Framework to The Analysis of Interaction Methods Using Synergy FunctionsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This work presents a unifying framework for game-theory-inspired attribution and $k^\text{th}$-order interaction methods. |
Daniel Lundstrom; Meisam Razaviyayn; |

370 | Stein Variational Goal Generation for Adaptive Exploration in Multi-Goal Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this context, a curriculum over goals helps agents learn by adapting training tasks to their current capabilities. In this work, we propose Stein Variational Goal Generation (SVGG), which samples goals of intermediate difficulty for the agent, by leveraging a learned predictive model of its goal reaching capabilities. |
Nicolas Castanet; Olivier Sigaud; Sylvain Lamprier; |

371 | Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein LossRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. |
Pierre Bréchet; Katerina Papagiannouli; Jing An; Guido Montufar; |

372 | Shortest Edit Path Crossover: A Theory-driven Solution to The Permutation Problem in Evolutionary Neural Architecture SearchRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: This paper presents the first theoretical analysis of the behaviors of mutation, crossover and RL in black-box NAS, and proposes a new crossover operator based on the shortest edit path (SEP) in graph space. |
Xin Qiu; Risto Miikkulainen; |

373 | Fisher Information Embedding for Node and Graph LearningRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this work, we propose a novel attention-based node embedding framework for graphs. |
Dexiong Chen; Paolo Pellizzoni; Karsten Borgwardt; |

374 | Efficient Latency-Aware CNN Depth Compression Via Two-Stage Dynamic ProgrammingRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. |
Jinuk Kim; Yeonwoo Jeong; Deokjae Lee; Hyun Oh Song; |
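The merging step in this highlight rests on a classical identity: two convolutions with no nonlinearity between them (the identity functions the subset selection introduces) compose into a single convolution whose kernel is the convolution of the two kernels. A minimal 1-D NumPy check of that identity, standing in for the paper's 2-D CNN layers:

```python
import numpy as np

# Two consecutive convolutions with an identity (no activation) in between
rng = np.random.default_rng(0)
x = rng.standard_normal(32)                       # input signal
k1, k2 = rng.standard_normal(3), rng.standard_normal(5)

two_stage = np.convolve(np.convolve(x, k1), k2)   # deep: conv, then conv
merged = np.convolve(k1, k2)                      # pre-merge the two kernels
one_stage = np.convolve(x, merged)                # shallow equivalent conv

assert np.allclose(two_stage, one_stage)          # identical outputs
```

The merged network does one pass with a slightly larger kernel instead of two passes, which is where the inference-latency saving comes from; choosing *which* activations to replace so this merging pays off is the paper's optimization problem.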

375 | Git-Theta: A Git Extension for Collaborative Development of Machine Learning ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we introduce Git-Theta’s design and features and include an example use-case of Git-Theta where a pre-trained model is continually adapted and modified. |
Nikhil Kandpal; Brian Lester; Mohammed Muqeeth; Anisha Mascarenhas; Monty Evans; Vishal Baskaran; Tenghao Huang; Haokun Liu; Colin Raffel; |

376 | Finite-Sample Analysis of Learning High-Dimensional Single ReLU NeuronRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron [Kakade et al. 2011], and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. |
Jingfeng Wu; Difan Zou; Zixiang Chen; Vladimir Braverman; Quanquan Gu; Sham M. Kakade; |

377 | Explaining Reinforcement Learning with Shapley ValuesRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. |
Daniel Beechey; Thomas M. S. Smith; Özgür Şimşek; |

378 | Naive Imputation Implicitly Regularizes High-dimensional Linear ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we study the impact of imputation in a high-dimensional linear model with MCAR missing data. |
Alexis Ayme; Claire Boyer; Aymeric Dieuleveut; Erwan Scornet; |

379 | Coin Sampling: Gradient-Based Bayesian Inference Without Learning RatesRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we introduce a suite of new particle-based methods for scalable Bayesian inference based on coin betting, which are entirely learning-rate free. |
Louis Sharrock; Christopher Nemeth; |

380 | Distribution Free Domain GeneralizationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a Distribution Free Domain Generalization (DFDG) procedure for classification by conducting standardization to avoid the dominance of a few domains in the training process. |
Peifeng Tong; Wu Su; He Li; Jialin Ding; Haoxiang Zhan; Song Xi Chen; |

381 | Simple Hardware-Efficient Long Convolutions for Sequence ModelingRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence. |
Daniel Y Fu; Elliot L Epstein; Eric Nguyen; Armin W Thomas; Michael Zhang; Tri Dao; Atri Rudra; Christopher Re; |

382 | Neural Signature Kernels As Infinite-width-depth-limits of Controlled ResNetsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Motivated by the paradigm of reservoir computing, we consider randomly initialized controlled ResNets defined as Euler-discretizations of neural controlled differential equations (Neural CDEs), a unified architecture which encompasses both RNNs and ResNets. |
Nicola Muca Cirone; Maud Lemercier; Cristopher Salvi; |

383 | Feature Programming for Multivariate Time Series PredictionRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework. |
Alex Daniel Reneau; Jerry Yao-Chieh Hu; Ammar Gilani; Han Liu; |

384 | Theory on Forgetting and Generalization of Continual LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In particular, there is a lack of understanding on what factors are important and how they affect catastrophic forgetting and generalization performance. To fill this gap, our theoretical analysis, under overparameterized linear models, provides the first-known explicit form of the expected forgetting and generalization error for a general CL setup with an arbitrary number of tasks. |
Sen Lin; Peizhong Ju; Yingbin Liang; Ness Shroff; |

385 | PAL: Program-aided Language ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. |
Luyu Gao; Aman Madaan; Shuyan Zhou; Uri Alon; Pengfei Liu; Yiming Yang; Jamie Callan; Graham Neubig; |
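The core mechanic of PAL, offloading the solution step to a Python interpreter, can be sketched in a few lines. This is a bare-bones illustration, not the authors' code: `run_pal_program` and `answer_var` are hypothetical names, and the "generated" program is hard-coded here, whereas in PAL it would be emitted by an LLM prompted with the natural-language problem.

```python
def run_pal_program(program: str, answer_var: str = "answer"):
    """Execute LLM-generated reasoning steps and read off the result.

    Toy sketch: in PAL the `program` string comes from a language model;
    here it is hard-coded for illustration.
    """
    namespace: dict = {}
    exec(program, namespace)   # offload the solution step to the interpreter
    return namespace[answer_var]

# A program an LLM might emit for: "Roger has 5 balls and buys 2 cans
# of 3 balls each. How many balls does he have now?"
generated = """
balls = 5
balls += 2 * 3
answer = balls
"""
print(run_pal_program(generated))  # -> 11
```

The arithmetic is done by the interpreter, not the model, so the final answer is exact even when the model's own arithmetic would be unreliable. (A production system would sandbox the `exec` call; running untrusted generated code directly is unsafe.)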

386 | Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute The LeastRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Nevertheless, quantifying the value of examples for SSL has remained an open question. In this work, we address this problem for the first time, by proving that examples that contribute the most to contrastive SSL are those that have the most similar augmentations to other examples, in expectation. |
Siddharth Joshi; Baharan Mirzasoleiman; |

387 | A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment MisalignmentRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: However, despite its success, the model’s capability can be compromised under conditions of environment misalignment. In this paper, we investigate two challenging conditions for environment misalignment: Cross-Domain and Cross-Context by proposing four datasets that are designed for these challenges: SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split. |
Hanchen Xie; Jiageng Zhu; Mahyar Khayatkhoei; Jiazhi Li; Mohamed E. Hussein; Wael AbdAlmageed; |

388 | Learning Lightweight Object Detectors Via Multi-Teacher Progressive DistillationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose a simple yet surprisingly effective sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student. |
Shengcao Cao; Mengtian Li; James Hays; Deva Ramanan; Yu-Xiong Wang; Liangyan Gui; |

389 | COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models. |
Jinqi Xiao; Miao Yin; Yu Gong; Xiao Zang; Jian Ren; Bo Yuan; |

390 | Semi-Supervised Offline Reinforcement Learning with Action-Free TrajectoriesRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories. |
Qinqing Zheng; Mikael Henaff; Brandon Amos; Aditya Grover; |
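The meta-pipeline in this highlight, fit an inverse dynamics model on the labelled trajectories, then proxy-label the action-free ones, can be sketched minimally. This is an assumption-laden toy: `proxy_label` is a hypothetical name, and the inverse model is a 1-nearest-neighbour lookup over (state, next-state) pairs rather than the learned model an actual offline RL pipeline would use.

```python
import numpy as np

def proxy_label(labelled, unlabelled_states):
    """Toy sketch of the meta-pipeline: fit an inverse dynamics model
    a = f(s, s') on labelled data, then proxy-label action-free transitions.
    Here f is 1-nearest-neighbour over (s, s') pairs, not a neural net.
    """
    states, next_states, actions = labelled
    keys = np.hstack([states, next_states])               # labelled (s, s')
    queries = np.hstack([unlabelled_states[:-1],          # action-free (s, s')
                         unlabelled_states[1:]])
    # copy the action of the closest labelled transition
    dists = np.linalg.norm(queries[:, None, :] - keys[None, :, :], axis=-1)
    return actions[dists.argmin(axis=1)]
```

Once every unlabelled transition carries a proxy action, both trajectory sets can be fed to any standard offline RL algorithm, which is what makes the pipeline "meta-algorithmic".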

391 | Emergence of Adaptive Circadian Rhythms in Deep Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we study the emergence of circadian-like rhythms in deep reinforcement learning agents. |
Aqeel Labash; Florian Stelzer; Daniel Majoral; Raul Vicente; |

392 | Robust Situational Reinforcement Learning in Face of Context DisturbancesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: However, existing methods on robust RL aim at learning robust policies against the deviations of the entire system dynamics. To tackle this problem, this paper proposes the framework of robust situational Markov decision process (RS-MDP) which captures the possible deviations of context transitions explicitly. |
Jinpeng Zhang; Yufeng Zheng; Chuheng Zhang; Li Zhao; Lei Song; Yuan Zhou; Jiang Bian; |

393 | QAS-Bench: Rethinking Quantum Architecture Search and A BenchmarkRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, beyond a particular domain, we formulate the QAS problem into two basic (and relatively ideal) tasks: i) arbitrary quantum circuit (QC) regeneration given a target QC; ii) approximating an arbitrary unitary (oracle). Based on these two tasks, we generate a public QAS benchmark including 900 random QCs and 400 random unitary matrices, which is still missing in the literature. |
Xudong Lu; Kaisen Pan; Ge Yan; Jiaming Shan; Wenjie Wu; Junchi Yan; |

394 | Half-Hop: A Graph Upsampling Approach for Slowing Down Message PassingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. |
Mehdi Azabou; Venkataramana Ganesh; Shantanu Thakoor; Chi-Heng Lin; Lakshmi Sathidevi; Ran Liu; Michal Valko; Petar Veličković; Eva L Dyer; |

395 | Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept RehearsalRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We introduce Neuro-Symbolic Continual Learning, where a model has to solve a sequence of neuro-symbolic tasks, that is, it has to map sub-symbolic inputs to high-level concepts and compute predictions by reasoning consistently with prior knowledge. |
Emanuele Marconato; Gianpaolo Bontempo; Elisa Ficarra; Simone Calderara; Andrea Passerini; Stefano Teso; |

396 | Group Equivariant Fourier Neural Operators for Partial Differential EquationsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform. |
Jacob Helwig; Xuan Zhang; Cong Fu; Jerry Kurtin; Stephan Wojtowytsch; Shuiwang Ji; |

397 | Evaluating Unsupervised Denoising Requires Unsupervised MetricsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we propose two novel metrics: the unsupervised mean squared error (MSE) and the unsupervised peak signal-to-noise ratio (PSNR), which are computed using only noisy data. |
Adria Marcos Morales; Matan Leibovich; Sreyas Mohan; Joshua Lawrence Vincent; Piyush Haluai; Mai Tan; Peter Crozier; Carlos Fernandez-Granda; |

398 | Rethinking Weak Supervision in Helping Contrastive LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Despite the empirical evidence showing that semi-supervised labels improve the representations of contrastive learning, it remains unknown if noisy supervised information can be directly used in training instead of after manual denoising. Therefore, to explore the mechanical differences between semi-supervised and noisy-labeled information in helping contrastive learning, we establish a unified theoretical framework of contrastive learning under weak supervision. |
Jingyi Cui; Weiran Huang; Yifei Wang; Yisen Wang; |

399 | Near-Optimal Quantum Coreset Construction Algorithms for ClusteringRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3/2})$ query complexity. |
Yecheng Xue; Xiaoyu Chen; Tongyang Li; Shaofeng H.-C. Jiang; |

400 | On Heterogeneous Treatment Effects in Heterogeneous Causal GraphsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We establish the theoretical forms of HCEs and derive their properties at the individual level in both linear and nonlinear models. |
Richard A Watson; Hengrui Cai; Xinming An; Samuel McLean; Rui Song; |

401 | Bilevel Optimization with Coupled Decision-Dependent DistributionsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we investigate the inclusion of decision-dependent distributions in bilevel optimization. |
Songtao Lu; |

402 | Differentially Private Distributed Bayesian Linear Regression with MCMCRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We propose a novel Bayesian inference framework for distributed differentially private linear regression. |
Barış Alparslan; Sinan Yıldırım; Ilker Birbil; |

403 | Nearly Optimal Competitive Ratio for Online Allocation Problems with Two-sided Resource Constraints and Finite RequestsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we investigate the online allocation problem of maximizing the overall revenue subject to both lower and upper bound constraints. |
Qixin Zhang; Wenbing Ye; Zaiyi Chen; Haoyuan Hu; Enhong Chen; Yu Yang; |

404 | InGram: Inductive Knowledge Graph Embedding Via Relation GraphsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we propose an INductive knowledge GRAph eMbedding method, InGram, that can generate embeddings of new relations as well as new entities at inference time. |
Jaejun Lee; Chanyoung Chung; Joyce Jiyoung Whang; |

405 | Topological Point Cloud ClusteringRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. |
Vincent Peter Grande; Michael T Schaub; |

406 | Provably Efficient Offline Reinforcement Learning with Perturbed Data SourcesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself. |
Chengshuai Shi; Wei Xiong; Cong Shen; Jing Yang; |

407 | Trapdoor Normalization with Irreversible Ownership VerificationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This paper introduces a deep model watermark with an irreversible ownership verification scheme: Trapdoor Normalization (TdN), inspired by the trapdoor function in traditional cryptography. |
Hanwen Liu; Zhenyu Weng; Yuesheng Zhu; Yadong Mu; |

408 | Margin-based Sampling in High Dimensions: When Being Active Is Less Efficient Than Staying PassiveRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that PL outperforms margin-based AL even for noiseless data and when using the Bayes optimal decision boundary for sampling. |
Alexandru Tifrea; Jacob Clarysse; Fanny Yang; |

409 | Random Matrix Analysis to Balance Between Supervised and Unsupervised Learning Under The Low Density Separation AssumptionRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. |
Vasilii Feofanov; Malik Tiomoko; Aladin Virmaux; |

410 | On The Initialization of Graph Neural NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we analyze the variance of forward and backward propagation across GNN layers and show that the variance instability of GNN initializations comes from the combined effect of the activation function, hidden dimension, graph structure and message passing. |
Jiahang Li; Yakun Song; Xiang Song; David Wipf; |

411 | When Does Privileged Information Explain Away Label Noise?Related Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this study, we investigate the role played by different properties of the PI in explaining away label noise. |
Guillermo Ortiz-Jimenez; Mark Collier; Anant Nawalgaria; Alexander D’Amour; Jesse Berent; Rodolphe Jenatton; Effrosyni Kokiopoulou; |

412 | A Category-theoretical Meta-analysis of Definitions of DisentanglementRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose that the concepts of the cartesian and monoidal products should serve as the core of disentanglement. |
Yivan Zhang; Masashi Sugiyama; |

413 | Quantifying The Variability Collapse of Neural NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose a novel metric, named Variability Collapse Index (VCI), to quantify the variability collapse phenomenon in the NC paradigm. |
Jing Xu; Haoxiong Liu; |

414 | Global Optimality for Euclidean CCCP Under Riemannian ConvexityRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions. |
Melanie Weber; Suvrit Sra; |

415 | Reinforcement Learning Can Be More Efficient with Multiple RewardsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Instead, in this work, we study whether directly incorporating multiple alternate reward formulations of the same task in a single agent can lead to faster learning. |
Christoph Dann; Yishay Mansour; Mehryar Mohri; |

416 | Adversarial Learning of Distributional Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this study, we propose an adversarial learning framework for distributional reinforcement learning, which adopts the concept of influence measure from the statistics community. |
Yang Sui; Yukun Huang; Hongtu Zhu; Fan Zhou; |

417 | Robust Explanation for Free or At The Cost of FaithfulnessRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: However, explanation methods are shown as vulnerable to adversarial perturbations, implying security concerns in high-stakes domains. In this paper, we investigate when robust explanations are necessary and what they cost. |
Zeren Tan; Yang Tian; |

418 | Jump-Start Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks: a guide-policy, and an exploration-policy. |
Ikechukwu Uchendu; Ted Xiao; Yao Lu; Banghua Zhu; Mengyuan Yan; Joséphine Simon; Matthew Bennice; Chuyuan Fu; Cong Ma; Jiantao Jiao; Sergey Levine; Karol Hausman; |

419 | Recovery Bounds on Class-Based Optimal Transport: A Sum-of-Norms Regularization FrameworkRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: For this purpose, we propose a convex OT program with a sum-of-norms regularization term, which provably recovers the underlying class structure under geometric assumptions. |
Arman Rahbar; Ashkan Panahi; Morteza Haghir Chehreghani; Devdatt Dubhashi; Hamid Krim; |

420 | Multi-Task Off-Policy Learning from Bandit FeedbackRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: To solve the problem, we propose a hierarchical off-policy optimization algorithm HierOPO. |
Joey Hong; Branislav Kveton; Manzil Zaheer; Sumeet Katariya; Mohammad Ghavamzadeh; |

421 | Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks. Highlight: In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. |
Dominik Schnaus; Jongseok Lee; Daniel Cremers; Rudolph Triebel; |

422 | Towards Stable and Efficient Adversarial Training Against $l_1$ Bounded Adversarial Attacks. Highlight: We address the problem of stably and efficiently training a deep neural network robust to adversarial perturbations bounded by an $l_1$ norm. |
Yulun Jiang; Chen Liu; Zhichao Huang; Mathieu Salzmann; Sabine Süsstrunk; |

423 | Universal Morphology Control Via Contextual Modulation. Highlight: Existing methods utilize graph neural networks or transformers to handle heterogeneous state and action spaces across different morphologies, but pay little attention to the dependency of a robot’s control policy on its morphology context. In this paper, we propose a hierarchical architecture to better model this dependency via contextual modulation, which includes two key submodules: (1) Instead of enforcing hard parameter sharing across robots, we use hypernetworks to generate morphology-dependent control parameters; (2) We propose a fixed attention mechanism that solely depends on the morphology to modulate the interactions between different limbs in a robot. |
Zheng Xiong; Jacob Beck; Shimon Whiteson; |

424 | CocktailSGD: Fine-tuning Foundation Models Over 500Mbps Networks. Highlight: In this paper, we propose CocktailSGD, a novel communication-efficient training framework that combines three distinct compression techniques (random sparsification, top-K sparsification, and quantization) to achieve much greater compression than each individual technique alone. |
Jue WANG; Yucheng Lu; Binhang Yuan; Beidi Chen; Percy Liang; Christopher De Sa; Christopher Re; Ce Zhang; |
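The three compression techniques named in the highlight above compose naturally. Below is a minimal numpy sketch of chaining random sparsification, top-K selection, and uniform quantization on a gradient vector; the function name, parameters, and ordering are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def cocktail_compress(g, sample_frac=0.5, k=4, bits=4, seed=0):
    """Illustrative composition of the three compressors:
    random sparsification -> top-K -> uniform quantization."""
    rng = np.random.default_rng(seed)
    # 1) random sparsification: keep a random subset of coordinates
    idx = rng.choice(g.size, size=max(1, int(sample_frac * g.size)), replace=False)
    sub = g[idx]
    # 2) top-K: keep the K largest-magnitude survivors
    top = idx[np.argsort(np.abs(sub))[-k:]]
    vals = g[top]
    # 3) uniform quantization of the kept values to `bits` bits
    scale = np.abs(vals).max() / (2 ** (bits - 1) - 1)
    q = np.round(vals / scale).astype(np.int8)
    out = np.zeros_like(g)
    out[top] = q * scale  # dequantized sparse gradient
    return out

g = np.linspace(-1.0, 1.0, 16)
c = cocktail_compress(g)
nnz = int(np.count_nonzero(c))
```

Only the `k` quantized values and their indices would need to be transmitted, which is the source of the compression.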

425 | Is Overfitting Necessary for Implicit Video Representation? Highlight: Here, we propose a new paradigm in efficient INR for videos based on the idea of the strong lottery ticket (SLT) hypothesis (Zhou et al., 2019), which demonstrates the possibility of finding an accurate subnetwork mask, called a supermask, for a randomly initialized classification network without weight training. |
Hee Min Choi; Hyoa Kang; Dokwan Oh; |

426 | Learning The Right Layers: A Data-Driven Layer-Aggregation Strategy for Semi-Supervised Learning on Multilayer Graphs. Highlight: In this work, we assume a semi-supervised learning setting, where the class of a small percentage of nodes is initially provided, and we propose a parameter-free Laplacian-regularized model that learns an optimal nonlinear combination of the different layers from the available input labels. |
Sara Venturini; Andrea Cristofari; Francesco Rinaldi; Francesco Tudisco; |

427 | Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts. Highlight: We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $\mathcal{O}\big(K^2(\ln T)\sqrt{T}\big)$, where $K$ is the number of experts. |
Dirk van der Hoeven; Ciara Pike-Burke; Hao Qiu; Nicolò Cesa-Bianchi; |

428 | Provable Benefit of Mixup for Finding Optimal Decision Boundaries. Highlight: We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. |
Junsoo Oh; Chulhee Yun; |
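For readers unfamiliar with the augmentation studied here: Mixup convexly combines pairs of examples and their labels with a Beta-distributed coefficient. A minimal sketch of generic Mixup follows (this is the standard augmentation, not the paper's theoretical analysis setup):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mixup: convex-combine a pair of examples and their labels,
    with mixing weight lambda ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng(0)
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

# mix a positive and a negative example
x, y, lam = mixup(np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0]), 0.0)
x_sum = float(x.sum())
```

Here the mixed input is `[lam, 1 - lam]` and the mixed label equals `lam`, so the soft label directly reflects the mixing weight.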

429 | Covariate Balancing Using The Integral Probability Metric for Causal Inference. Highlight: In this paper, we consider using the integral probability metric (IPM), which is a metric between two probability measures, for covariate balancing. |
Insung Kong; Yuha Park; Joonhyuk Jung; Kwonsang Lee; Yongdai Kim; |

430 | Fair and Accurate Decision Making Through Group-Aware Learning. Highlight: In some cases, these AI systems can be unfair by exhibiting bias or discrimination against certain social groups, which can have severe consequences in real life. Inspired by one of the most well-known human learning skills called grouping, we address this issue by proposing a novel machine learning (ML) framework where the ML model learns to group a diverse set of problems into distinct subgroups to solve each subgroup using its specific sub-model. |
Ramtin Hosseini; Li Zhang; Bhanu Garg; Pengtao Xie; |

431 | Robust and Scalable Bayesian Online Changepoint Detection. Highlight: This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. |
Matias Altamirano; Francois-Xavier Briol; Jeremias Knoblauch; |

432 | Uncertainty Estimation By Fisher Information-based Evidential Deep Learning. Highlight: However, for samples with high data uncertainty that are nevertheless annotated with one-hot labels, the evidence-learning process for those mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). |
Danruo DENG; Guangyong Chen; Yang YU; Furui Liu; Pheng-Ann Heng; |

433 | Network Effects in Performative Prediction Games. Highlight: This paper studies multi-agent performative prediction (Multi-PP) games over multiplex networks. |
Xiaolu Wang; Chung-Yiu Yau; Hoi To Wai; |

434 | MyoDex: A Generalizable Prior for Dexterous Manipulation. Highlight: In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task. |
Vittorio Caggiano; Sudeep Dasari; Vikash Kumar; |

435 | The Case for 4-bit Precision: K-bit Inference Scaling Laws. Highlight: For example, a 30B 8-bit model and a 60B 4-bit model have the same number of bits but may have very different zero-shot accuracies. In this work, we study this trade-off by developing inference scaling laws of zero-shot performance in Large Language Models (LLMs) to determine the bit-precision and model size that maximizes zero-shot performance. |
Tim Dettmers; Luke Zettlemoyer; |
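The bit-budget equivalence in the highlight above (30B parameters at 8 bits versus 60B parameters at 4 bits) is simple arithmetic:

```python
def model_bits(params_billion, bits_per_param):
    """Total bits needed to store a model's weights."""
    return params_billion * 1e9 * bits_per_param

b30_8 = model_bits(30, 8)  # 30B params at 8 bits each
b60_4 = model_bits(60, 4)  # 60B params at 4 bits each
```

Both come out to 240 billion bits, which is why the comparison isolates the precision/size trade-off at a fixed memory budget.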

436 | Explore and Exploit The Diverse Knowledge in Model Zoo for Domain Generalization. Highlight: This paper argues that the knowledge contained in weaker models is valuable and presents a method for leveraging the diversity within the model zoo to improve out-of-distribution generalization capabilities. |
Yimeng Chen; Tianyang Hu; Fengwei Zhou; Zhenguo Li; Zhi-Ming Ma; |

437 | Dividing and Conquering A BlackBox to A Mixture of Interpretable Models: Route, Interpret, Repeat. Highlight: This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. |
Shantanu Ghosh; Ke Yu; Forough Arabshahi; kayhan Batmanghelich; |

438 | Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics. Highlight: As a result, optimization becomes challenging due to chaotic and non-smooth loss landscapes. To tackle this issue, we propose a novel approach called Adaptive Barrier Smoothing (ABS), which introduces a class of softened complementarity systems that correspond to barrier-smoothed objectives. |
Shenao Zhang; Wanxin Jin; Zhaoran Wang; |

439 | Online Restless Bandits with Unobserved States. Highlight: In this paper, we propose TSEETC, a learning algorithm based on Thompson Sampling with Episodic Explore-Then-Commit. |
Bowen Jiang; Bo Jiang; Jian Li; TAO LIN; Xinbing Wang; Chenghu Zhou; |

440 | Analyzing Diffusion As Serial Reproduction. Highlight: By identifying a correspondence between diffusion models and a well-known paradigm in cognitive science known as serial reproduction, whereby human agents iteratively observe and reproduce stimuli from memory, we show how the aforementioned properties of diffusion models can be explained as a natural consequence of this correspondence. |
Raja Marjieh; Ilia Sucholutsky; Thomas A Langlois; Nori Jacoby; Thomas L. Griffiths; |

441 | Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits. Highlight: Motivated by concerns about making online decisions that incur an undue amount of risk at each time step, in this paper we formulate the probably anytime-safe stochastic combinatorial semi-bandits problem. |
Yunlong Hou; Vincent Tan; Zixin Zhong; |

442 | Trompt: Towards A Better Deep Neural Network for Tabular Data. Highlight: In this paper, we propose Trompt (which stands for Tabular Prompt), a novel architecture inspired by prompt learning of language models. |
Kuan-Yu Chen; Ping-Han Chiang; Hsin-Rung Chou; Ting-Wei Chen; Tien-Hao Chang; |

443 | A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models. Highlight: Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. |
James Urquhart Allingham; Jie Ren; Michael W Dusenberry; Xiuye Gu; Yin Cui; Dustin Tran; Jeremiah Zhe Liu; Balaji Lakshminarayanan; |
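Prompt ensembling in text-image models is commonly implemented by averaging the normalized text embeddings of several prompts for each class. A minimal sketch with explicit (here hand-picked) weights; the paper's automatic weighting scheme is not reproduced:

```python
import numpy as np

def ensemble_class_embedding(prompt_embs, weights=None):
    """Weighted average of L2-normalized prompt embeddings,
    re-normalized into a single class embedding."""
    E = np.asarray(prompt_embs, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # normalize each prompt
    w = np.ones(len(E)) / len(E) if weights is None else np.asarray(weights, float)
    w = w / w.sum()                                   # weights form a convex combination
    v = (w[:, None] * E).sum(axis=0)
    return v / np.linalg.norm(v)                      # unit-norm class embedding

# two toy prompt embeddings for one class, equally weighted
v = ensemble_class_embedding([[1.0, 0.0], [1.0, 1.0]], weights=[0.5, 0.5])
norm_v = float(np.linalg.norm(v))
```

At inference time, zero-shot classification scores an image embedding against each class's ensembled embedding by cosine similarity.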

444 | Multi-Objective GFlowNets. Highlight: We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions, based on GFlowNets. |
Moksh Jain; Sharath Chandra Raparthy; Alex Hernández-García; Jarrid Rector-Brooks; Yoshua Bengio; Santiago Miret; Emmanuel Bengio; |

445 | Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation. Highlight: We empirically find that random convex combinations of the learned queries are still good for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. |
Yiming Cui; Linjie Yang; Haichao Yu; |

446 | Cramming: Training A Language Model on A Single GPU in One Day. Highlight: We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. |
Jonas Geiping; Tom Goldstein; |

447 | On Computing Optimal Tree Ensembles. Highlight: Recent algorithmic advances allow computing decision trees that are optimal for various measures such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area. |
Christian Komusiewicz; Pascal Kunz; Frank Sommer; Manuel Sorge; |

448 | Kernel Logistic Regression Approximation of An Understandable ReLU Neural Network. Highlight: This paper proposes an understandable neural network whose score function is modeled as an additive sum of univariate spline functions. |
Marie Guyomard; Susana Barbosa; Lionel Fillatre; |

449 | On Coresets for Clustering in Small Dimensional Euclidean Spaces. Highlight: We consider the problem of constructing small coresets for $k$-Median in Euclidean spaces. |
Lingxiao Huang; Ruiyuan Huang; Zengfeng Huang; Xuan Wu; |

450 | A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems. Highlight: We introduce a new GT-based Risk-Averse Equilibrium (RAE) that always produces a solution that minimises the potential variance in reward while accounting for the strategies of other agents. |
Oliver Slumbers; David Henry Mguni; Stefano B Blumberg; Stephen Marcus McAleer; Yaodong Yang; Jun Wang; |

451 | Learning Compiler Pass Orders Using Coreset and Normalized Value Prediction. Highlight: In this work, instead of predicting passes sequentially, we directly learn a policy on the pass sequence space, which outperforms the default -Oz flag by an average of 4.5% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets. |
Youwei Liang; Kevin Stone; Ali Shameli; Chris Cummins; Mostafa Elhoushi; Jiadong Guo; Benoit Steiner; Xiaomeng Yang; Pengtao Xie; Hugh James Leather; Yuandong Tian; |

452 | SGD with Large Step Sizes Learns Sparse Features. Highlight: We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) may lead the iterates to jump from one side of a valley to the other, causing *loss stabilization*, and (ii) this stabilization induces a hidden stochastic dynamics that *biases it implicitly* toward simple predictors. |
Maksym Andriushchenko; Aditya Vardhan Varre; Loucas Pillaud-Vivien; Nicolas Flammarion; |

453 | Optimally-weighted Estimators of The Maximum Mean Discrepancy for Likelihood-Free Inference. Highlight: In this paper, we propose a novel estimator for the MMD with significantly improved sample complexity. |
Ayush Bharti; Masha Naslidnyk; Oscar Key; Samuel Kaski; Francois-Xavier Briol; |

454 | Input Uncertainty Propagation Through Trained Neural Networks. Highlight: In this work, we address the problem of input uncertainty propagation through trained neural networks. |
Paul Monchot; Loic Coquelin; Sébastien Julien Petit; Sébastien Marmin; Erwan Le Pennec; Nicolas Fischer; |

455 | GeCoNeRF: Few-shot Neural Radiance Fields Via Geometric Consistency. Highlight: We present a novel framework to regularize Neural Radiance Field (NeRF) in a few-shot setting with a geometry-aware consistency regularization. |
Min-Seop Kwak; Jiuhn Song; Seungryong Kim; |

456 | Learning Prescriptive ReLU Networks. Highlight: We propose a piecewise linear neural network model that can balance strong prescriptive performance and interpretability, which we refer to as the prescriptive ReLU network, or P-ReLU. |
Wei Sun; Asterios Tsiourvas; |

457 | Achieving Linear Speedup in Non-IID Federated Bilevel Learning. Highlight: However, several important properties in federated learning, such as partial client participation and linear speedup for convergence (i.e., the convergence rate and complexity are improved linearly with respect to the number of sampled clients) in the presence of non-i.i.d. datasets, still remain open. In this paper, we fill these gaps by proposing a new federated bilevel algorithm named FedMBO with a novel client sampling scheme in the federated hypergradient estimation. |
Minhui Huang; Dewei Zhang; Kaiyi Ji; |

458 | Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning. Highlight: Yet, the alignment between LLMs’ knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. |
Thomas Carta; Clément ROMAC; Thomas Wolf; sylvain lamprier; Olivier Sigaud; Pierre-Yves Oudeyer; |

459 | In Search for A Generalizable Method for Source Free Domain Adaptation. Highlight: In this work, we apply existing SFDA techniques to a challenging set of naturally-occurring distribution shifts in bioacoustics, which are very different from the ones commonly studied in computer vision. |
Malik Boudiaf; tom denton; Bart van Merrienboer; Vincent Dumoulin; Eleni Triantafillou; |

460 | Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction. Highlight: We propose a systematic approach to compute optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per element. |
Georgii Sergeevich Novikov; Daniel Bershatsky; Julia Gusak; Alex Shonenkov; Denis Valerievich Dimitrov; Ivan Oseledets; |

461 | Learning Controllable Degradation for Real-World Super-Resolution Via Constrained Flows. Highlight: To remedy the issue, we propose to generate realistic SR datasets for unseen degradation levels by exploring the latent space of real LR images and thereby producing more diverse yet realistic LR images with complex real-world artifacts. |
Seobin Park; Dongjin Kim; Sungyong Baik; Tae Hyun Kim; |

462 | Synthetic Data, Real Errors: How (Not) to Publish and Use Synthetic Data. Highlight: In this work, we explore how the generative process affects the downstream ML task. |
Boris van Breugel; Zhaozhi Qian; Mihaela van der Schaar; |

463 | Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds Through Algorithmic Stability. Highlight: The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms. |
Anass Aghbalou; Guillaume Staerman; |

464 | Deep Perturbation Learning: Enhancing The Network Performance Via Image Perturbations. Highlight: Unlike existing works, in this paper we introduce Deep Perturbation Learning (DPL), a novel framework offering new insights into understanding image perturbations, which enhances the performance of networks rather than decreasing it. |
Zifan Song; Xiao Gong; Guosheng Hu; Cai Rong Zhao; |

465 | A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback. Highlight: We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$. |
Guanyu Nie; Yididiya Y Nadew; Yanhui Zhu; Vaneet Aggarwal; Christopher John Quinn; |

466 | Do Machine Learning Models Learn Statistical Rules Inferred from Data? Highlight: We thereby seek to infer statistical rules from the data and quantify the extent to which a model has learned them. We propose a framework SQRL that integrates logic-based methods with statistical inference to derive these rules from a model’s training data without supervision. |
Aaditya Naik; Yinjun Wu; Mayur Naik; Eric Wong; |

467 | Thompson Sampling with Less Exploration Is Fast and Optimal. Highlight: We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. |
Tianyuan Jin; XIANGLIN YANG; Xiaokui Xiao; Pan Xu; |
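A rough illustration of Thompson Sampling with reduced exploration on Bernoulli bandits: with probability $\epsilon$ the agent samples from the Beta posteriors (explore), otherwise it acts greedily on the posterior means (exploit). The exact sampling rule of $\epsilon$-TS is defined in the paper; this sketch is a hypothetical simplification.

```python
import numpy as np

def eps_ts_bernoulli(means, horizon=2000, eps=0.1, seed=0):
    """Epsilon-exploring Thompson Sampling sketch for Bernoulli bandits.
    Returns the average reward over the horizon."""
    rng = np.random.default_rng(seed)
    k = len(means)
    a, b = np.ones(k), np.ones(k)  # Beta(1,1) posterior per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = int(np.argmax(rng.beta(a, b)))   # explore: posterior sample
        else:
            arm = int(np.argmax(a / (a + b)))      # exploit: posterior mean
        r = float(rng.random() < means[arm])        # Bernoulli reward
        a[arm] += r
        b[arm] += 1.0 - r
        total += r
    return total / horizon

avg = eps_ts_bernoulli([0.2, 0.5, 0.8])
```

With the best arm paying 0.8, the average reward over a 2000-step run should land well above the uniform-play baseline of 0.5.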

468 | Generalized Disparate Impact for Configurable Fairness Solutions in ML. Highlight: We make two contributions in the field of AI fairness over continuous protected attributes. |
Luca Giuliani; Eleonora Misino; Michele Lombardi; |

469 | On The Effectiveness of Offline RL for Dialogue Response Generation. Highlight: This motivates the use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives. |
Paloma Sodhi; Felix Wu; Ethan R. Elenberg; Kilian Q Weinberger; Ryan McDonald; |

470 | Computational Asymmetries in Robust Classification. Highlight: First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $\Sigma^2_P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not affected by this asymmetry, by introducing a proof-of-concept approach named Counter-Attack (CA). Indeed, CA displays a reversed asymmetry: running the defense is $\mathit{NP}$-hard, while attacking it is $\Sigma_2^P$-hard. |
Samuele Marro; Michele Lombardi; |

471 | Nearly-tight Bounds for Deep Kernel Learning. Highlight: In this paper, we prove novel and nearly-tight generalization bounds based on the uniform covering number and the Rademacher chaos complexity for deep (multiple) kernel machines. |
Yifan Zhang; Min-Ling Zhang; |

472 | DoCoFL: Downlink Compression for Cross-Device Federated Learning. Highlight: Accordingly, we propose DoCoFL, a new framework for downlink compression in the cross-device setting. |
Ron Dorfman; Shay Vargaftik; Yaniv Ben-Itzhak; Kfir Yehuda Levy; |

473 | KDEformer: Accelerating Transformers Via Kernel Density Estimation. Highlight: Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds. |
Amir Zandieh; Insu Han; Majid Daliri; Amin Karbasi; |

474 | Probabilistic Attention-to-Influence Neural Models for Event Sequences. Highlight: While neural sequence models are able to capture complex and potentially long-range historical dependencies, they often lack the interpretability of simpler models for event sequence dynamics. We provide a novel neural framework in such a setting, a probabilistic attention-to-influence neural model, which not only captures complex instance-wise interactions between events but also learns influencers for each event type of interest. |
Xiao Shou; Debarun Bhattacharjya; Tian Gao; Dharmashankar Subramanian; Oktie Hassanzadeh; Kristin Bennett; |

475 | Conformal Prediction Sets for Graph Neural Networks. Highlight: We propose a conformal procedure to equip GNNs with prediction sets that come with distribution-free guarantees: the output set contains the true label with arbitrarily high probability. By leveraging network homophily, we construct sets with comparable or better efficiency (average size) and a significantly improved singleton hit ratio (correct sets of size one). |
Soroush H. Zargarbashi; Simone Antonelli; Aleksandar Bojchevski; |
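The machinery this highlight builds on, generic split conformal prediction, can be sketched as follows: calibrate a nonconformity-score threshold on held-out labeled data so that prediction sets contain the true label with probability at least $1-\alpha$. This is the standard procedure, not the graph-specific construction of the paper.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction on class probabilities.
    Score = 1 - probability assigned to the true class."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity scores
    k = int(np.ceil((n + 1) * (1 - alpha)))              # finite-sample quantile index
    q = np.sort(scores)[min(k, n) - 1]                   # calibrated threshold
    # prediction set: all classes whose score is within the threshold
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# toy calibration data: 5 examples, 2 classes
cal_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])
cal_labels = np.array([0, 0, 0, 0, 1])
sets = conformal_sets(cal_probs, cal_labels,
                      np.array([[0.95, 0.05], [0.3, 0.7]]), alpha=0.2)
set0, set1 = sets[0].tolist(), sets[1].tolist()
```

A confident test example yields a singleton set; the "singleton hit ratio" in the highlight counts how often such size-one sets contain the correct label.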

476 | Flash: Concept Drift Adaptation in Federated Learning. Highlight: In this work, we propose a novel adaptive optimizer called Flash that simultaneously addresses both statistical heterogeneity and concept drift issues. |
Kunjal Panchal; Sunav Choudhary; Subrata Mitra; Koyel Mukherjee; Somdeb Sarkhel; Saayan Mitra; Hui Guan; |

477 | Input Perturbation Reduces Exposure Bias in Diffusion Models. Highlight: In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. |
Mang Ning; Enver Sangineto; Angelo Porrello; Simone Calderara; Rita Cucchiara; |

478 | Efficient Training of Language Models Using Few-Shot Learning. Highlight: In this paper, we study an efficient approach to train language models using few-shot learners. |
Sashank J. Reddi; Sobhan Miryoosefi; Stefani Karp; Shankar Krishnan; Satyen Kale; Seungyeon Kim; Sanjiv Kumar; |

479 | NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition. Highlight: This paper proposes a general acceleration methodology called NeuralStagger by spatially and temporally decomposing the original learning tasks into several coarser-resolution subtasks. |
Xinquan Huang; Wenlei Shi; Qi Meng; Yue Wang; Xiaotian Gao; Jia Zhang; Tie-Yan Liu; |

480 | Out-of-Distribution Generalization of Federated Learning Via Implicit Invariant Relationships. Highlight: However, invariant relationships are often learned explicitly from data, representations, and distributions, which violates the federated principles of privacy preservation and limited communication. In this paper, we propose FedIIR, which implicitly learns invariant relationships from parameters for out-of-distribution generalization, adhering to the above principles. |
Yaming Guo; Kai Guo; Xiaofeng Cao; Tieru Wu; Yi Chang; |

481 | Differentially Private Sharpness-Aware Training. Highlight: In this paper, we investigate sharpness, a key factor in achieving better generalization, in private learning. |
Jinseong Park; Hoki Kim; Yujin Choi; Jaewook Lee; |

482 | Monotonic Location Attention for Length Generalization. Highlight: We introduce novel variants of location attention building on top of Dubois et al. (2020) to address the new diagnostic tasks. |
Jishnu Ray Chowdhury; Cornelia Caragea; |

483 | Effective Structured Prompting By Meta-Learning and Representative Verbalizer. Highlight: Combining meta-learning of the prompt pool with RepVerb, we propose MetaPrompter for effective structured prompting. |
Weisen Jiang; Yu Zhang; James Kwok; |

484 | End-to-end Differentiable Clustering with Associative Memories. Highlight: We uncover a novel connection between the AM dynamics and the inherent discrete assignment necessary in clustering to propose a novel unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AM, dubbed ClAM. |
Bishwajit Saha; Dmitry Krotov; Mohammed J Zaki; Parikshit Ram; |

485 | Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression Under Gaussian Marginals. Highlight: We study the task of agnostically learning halfspaces under the Gaussian distribution. |
Ilias Diakonikolas; Daniel Kane; Lisheng Ren; |

486 | PaLM-E: An Embodied Multimodal Language Model. Highlight: We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. |
Danny Driess; Fei Xia; Mehdi S. M. Sajjadi; Corey Lynch; Aakanksha Chowdhery; brian ichter; Ayzaan Wahid; Jonathan Tompson; quan vuong; Tianhe Yu; Wenlong Huang; Yevgen Chebotar; Pierre Sermanet; Daniel Duckworth; Sergey Levine; Vincent Vanhoucke; Karol Hausman; Marc Toussaint; Klaus Greff; Andy Zeng; Igor Mordatch; Pete Florence; |

487 | Fighting Fire with Fire: Contrastive Debiasing Without Bias-free Data Via Generative Bias-transformation. Highlight: In this paper, we propose Contrastive Debiasing via Generative Bias-transformation (CDvG), which is capable of operating without explicitly exploiting bias labels and bias-free samples. |
Yeonsung Jung; Hajin Shim; June Yong Yang; Eunho Yang; |

488 | Image Generation with Shortest Path Diffusion. Highlight: In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state. |
Ayan Das; Stathi Fotiadis; Anil Batra; Farhang Nabiei; FengTing Liao; Sattar Vakili; Da-shan Shiu; Alberto Bernacchia; |

489 | Deterministic Equivalent and Error Universality of Deep Random Features LearningRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: This manuscript considers the problem of learning a random Gaussian network function using a fully connected network with frozen intermediate layers and trainable readout layer. |
Dominik Schröder; Hugo Cui; Daniil Dmitriev; Bruno Loureiro; |

490 | DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian InferenceRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we study Metropolis-Hastings (MH), one of the most fundamental MCMC methods, for large-scale Bayesian inference under differential privacy. |
Wanrong Zhang; Ruqi Zhang; |

491 | A Fast, Well-Founded Approximation to The Empirical Neural Tangent KernelRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call sum of logits, converges to the true eNTK at initialization. |
Mohamad Amin Mohamadi; Wonho Bae; Danica J. Sutherland; |

492 | DiscoBAX – Discovery of Optimal Intervention Sets in Genomic Experiment DesignRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose DiscoBAX – a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of diverse mechanisms during a genomic experiment campaign. |
Clare Lyle; Arash Mehrjou; Pascal Notin; Andrew Jesson; Stefan Bauer; Yarin Gal; Patrick Schwab; |

493 | On The Within-Group Fairness of Screening ClassifiersRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we argue that screening policies that use calibrated classifiers may suffer from an understudied type of within-group unfairness—they may unfairly treat qualified members within demographic groups of interest. |
Nastaran Okati; Stratis Tsirtsis; Manuel Gomez Rodriguez; |

494 | UPop: Unified and Progressive Pruning for Compressing Vision-Language TransformersRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: This paper proposes the Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) unifiedly searching multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressively searching and retraining the subnet, which maintains convergence between the search and retrain to attain higher compression ratios. |
Dachuan Shi; Chaofan Tao; Ying Jin; Zhendong Yang; Chun Yuan; Jiaqi Wang; |

495 | Learning Representations Without Compositional AssumptionsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: However, this assumption is not always valid for real-world tabular datasets with complex dependencies between feature sets, resulting in localized information that is harder to learn. To overcome this limitation, we propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges. |
Tennison Liu; Jeroen Berrevoets; Zhaozhi Qian; Mihaela van der Schaar; |

496 | Optimization for Amortized Inverse ProblemsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose an efficient amortized optimization scheme for inverse problems with a deep generative prior. |
Tianci Liu; Tong Yang; Quan Zhang; Qi Lei; |

497 | Topologically Faithful Image Segmentation Via Induced Matching of Persistence BarcodesRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We introduce the concept of induced matchings from persistent homology to achieve a spatially correct matching between persistence barcodes in a segmentation setting. |
Nico Daniel Stucki; Johannes C. Paetzold; Suprosanna Shit; bjoern menze; Ulrich Bauer; |

498 | Fast Algorithms for Distributed K-Clustering with OutliersRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we study the $k$-clustering problems with outliers in distributed setting. |
Junyu Huang; Qilong Feng; Ziyun Huang; Jinhui Xu; Jianxin Wang; |

499 | GNOT: A General Neural Operator Transformer for Operator LearningRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: However, there are several challenges for learning operators in practical applications like the irregular mesh, multiple input functions, and complexity of the PDEs’ solution. To address these challenges, we propose a general neural operator transformer (GNOT), a scalable and effective transformer-based framework for learning operators. |
Zhongkai Hao; Zhengyi Wang; Hang Su; Chengyang Ying; Yinpeng Dong; Songming Liu; Ze Cheng; Jian Song; Jun Zhu; |

500 | A Kernelized Stein Discrepancy for Biological SequencesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper we propose the “KSD-B”, a novel divergence measure for distributions over biological sequences that is based on the kernelized Stein discrepancy (KSD). |
Alan Nawzad Amin; Eli N Weinstein; Debora Susan Marks; |

501 | Minimax Estimation of Discontinuous Optimal Transport Maps: The Semi-discrete CaseRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We consider the problem of estimating the optimal transport map between two probability distributions, $P$ and $Q$ in $\mathbb{R}^d$, on the basis of i.i.d. samples. |
Aram-Alexandre Pooladian; Vincent Divol; Jonathan Niles-Weed; |

502 | Best Arm Identification in Multi-Agent Multi-Armed BanditsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. |
Filippo Vannella; Alexandre Proutiere; Jaeseong Jeong; |

503 | Streaming Submodular Maximization with Differential PrivacyRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we study the problem of privately maximizing a submodular function in the streaming setting. |
Anamay Chaturvedi; Huy Nguyen; Thy Dinh Nguyen; |

504 | What Do CNNs Learn in The First Layer and Why? A Linear Systems PerspectiveRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: It has previously been reported that the representation that is learned in the first layer of deep Convolutional Neural Networks (CNNs) is highly consistent across initializations and architectures. In this work, we quantify this consistency by considering the first layer as a filter bank and measuring its energy distribution. |
Rhea Chowers; Yair Weiss; |

505 | Retrosynthetic Planning with Dual Value NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between the planning phase and updating phase. |
Guoqing Liu; Di Xue; Shufang Xie; Yingce Xia; Austin Tripp; Krzysztof Maziarz; Marwin Segler; Tao Qin; Zongzhang Zhang; Tie-Yan Liu; |

506 | Paging with Succinct PredictionsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We study learning-augmented paging from the new perspective of requiring the least possible amount of predicted information. |
Antonios Antoniadis; Joan Boyar; Marek Elias; Lene M. Favrholdt; Ruben Hoeksma; Kim S. Larsen; Adam Polak; Bertrand Simon; |

507 | Superhuman FairnessRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We instead re-cast fair machine learning as an imitation learning task by introducing superhuman fairness, which seeks to simultaneously outperform human decisions on multiple predictive performance and fairness measures. |
Omid Memarrast; Linh Vu; Brian D Ziebart; |

508 | Constrained Phi-EquilibriaRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we introduce and computationally characterize constrained Phi-equilibria—a more general notion than constrained CEs—in normal-form games. |
Martino Bernasconi; Matteo Castiglioni; Alberto Marchesi; Francesco Trovò; Nicola Gatti; |

509 | Expectation-Complete Graph Representations with HomomorphismsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. |
Pascal Welke; Maximilian Thiessen; Fabian Jogl; Thomas Gärtner; |

510 | Masked Bayesian Neural Networks : Theoretical Guarantee and Its Posterior InferenceRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we propose a new node-sparse BNN model which has good theoretical properties and is computationally feasible. |
Insung Kong; Dongyoon Yang; Jongjin Lee; Ilsang Ohn; GYUSEUNG BAEK; Yongdai Kim; |

511 | Discover-Then-Rank Unlabeled Support Vectors in The Dual Space for Multi-Class Active LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose to approach active learning (AL) from a novel perspective of discovering and then ranking potential support vectors by leveraging the key properties of the dual space of a sparse kernel max-margin predictor. |
Dayou Yu; Weishi Shi; Qi Yu; |

512 | Communication-Constrained Bandits Under Additive Gaussian NoiseRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. |
Prathamesh Mayekar; Jonathan Scarlett; Vincent Tan; |

513 | Collaborative Causal Inference with Fair IncentivesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This paper presents a reward scheme designed using the unique statistical properties that are required by causal inference to guarantee certain desirable incentive criteria (e.g., fairness, benefit) for the parties based on their contributions. To achieve this, we propose a data valuation function to value parties’ data for CCI based on the distributional closeness of its resulting treatment effect estimate to that utilizing the aggregated data from all parties. |
Rui Qiao; Xinyi Xu; Bryan Kian Hsiang Low; |

514 | Performative Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment. |
Debmalya Mandal; Stelios Triantafyllou; Goran Radanovic; |

515 | Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional InputRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs. |
Shokichi Takakura; Taiji Suzuki; |

516 | GuardHFL: Privacy Guardian for Heterogeneous Federated LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We put forth GuardHFL, the first-of-its-kind efficient and privacy-preserving HFL framework. |
Hanxiao Chen; Meng Hao; Hongwei Li; Kangjie Chen; Guowen Xu; Tianwei Zhang; Xilin Zhang; |

517 | Overcoming Simplicity Bias in Deep Networks Using A Feature SieveRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose a direct, interventional method for addressing simplicity bias in DNNs, which we call the *feature sieve*. |
Rishabh Tiwari; Pradeep Shenoy; |

518 | Exploring The Limits of Model-Targeted Indiscriminate Data Poisoning AttacksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoning attacks towards target parameters (i.e., model-targeted attacks). |
Yiwei Lu; Gautam Kamath; Yaoliang Yu; |

519 | The Regret of Exploration and The Control of Bad Episodes in Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: The first contribution of this paper is the introduction of a new performance measure of a RL algorithm that is more discriminating than the regret, that we call the *regret of exploration* that measures the asymptotic cost of exploration. The second contribution is a new *performance test* (PT) to end episodes in RL optimistic algorithms. |
Victor Boone; Bruno Gaujal; |

520 | The Wisdom of Hindsight Makes Language Models Better Instruction FollowersRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner. |
Tianjun Zhang; Fangchen Liu; Justin Wong; Pieter Abbeel; Joseph E. Gonzalez; |

521 | STEP: Learning N:M Structured Sparsity Masks from Scratch with PreconditionRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We conjecture that learning N:M masks with Adam should take the critical regime of variance estimation into account. In light of this, we propose STEP, an Adam-aware recipe that learns N:M masks with two phases: first, STEP calculates a reliable variance estimate (*precondition phase*) and subsequently, the variance remains fixed and is used as a precondition to learn N:M masks (*mask-learning phase*). |
Yucheng Lu; Shivani Agrawal; Suvinay Subramanian; Oleg Rybakov; Christopher De Sa; Amir Yazdanbakhsh; |

522 | Speeding Up Bellman Ford Via Minimum Violation PermutationsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Its running time is governed by the order the algorithm examines vertices for iterative updates on the value of their shortest path. In this work we study this problem through the lens of ‘Algorithms with predictions,’ and show how to leverage auxiliary information from similar instances to improve the running time. |
Silvio Lattanzi; Ola Svensson; Sergei Vassilvitskii; |

523 | Reflected Diffusion ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: To incorporate data constraints in a principled manner, we present Reflected Diffusion Models, which instead reverse a reflected stochastic differential equation evolving on the support of the data. |
Aaron Lou; Stefano Ermon; |

524 | Compositional Exemplars for In-context LearningRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this work, we systematically formulate in-context example selection as a subset selection problem, and optimize it in an end-to-end fashion. |
Jiacheng Ye; Zhiyong Wu; Jiangtao Feng; Tao Yu; Lingpeng Kong; |

525 | How Much Does Initialization Affect Generalization?Related Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we show on the contrary that, independently of architecture, SGD can itself be the cause of poor generalization if one does not ensure good initialization. |
Sameera Ramasinghe; Lachlan Ewen MacDonald; Moshiur Farazi; Hemanth Saratchandran; Simon Lucey; |

526 | Context Consistency Regularization for Label Sparsity in Time SeriesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Recently, consistency regularization techniques have been used to generate artificial labels from unlabeled augmented instances. To fully exploit the sequential characteristic of time series in consistency regularization, we propose a novel method of data augmentation called *context-attached augmentation*, which adds preceding and succeeding instances to a target instance to form its augmented instance. |
Yooju Shin; Susik Yoon; Hwanjun Song; Dongmin Park; Byunghyun Kim; Jae-Gil Lee; Byung Suk Lee; |

527 | Constrained Monotonic Neural NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Unfortunately, this construction does not work with popular non-saturated activation functions as it can only approximate convex functions. We show this shortcoming can be fixed by constructing two additional activation functions from a typical unsaturated monotonic activation function and employing each of them on the part of neurons. |
Davor Runje; Sharath M Shankaranarayana; |

528 | Attributing Image Generative Models Using Latent FingerprintsRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: This paper investigates the use of latent semantic dimensions as fingerprints, from where we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff. |
Guangyu Nie; Changhoon Kim; Yezhou Yang; Yi Ren; |

529 | Principled Offline RL in The Presence of Rich Exogenous InformationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: For example, a robot navigating in busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information and introduce new offline RL benchmarks that offer the ability to study this problem. |
Riashat Islam; Manan Tomar; Alex Lamb; Yonathan Efroni; Hongyu Zang; Aniket Rajiv Didolkar; Dipendra Misra; Xin Li; Harm van Seijen; Remi Tachet des Combes; John Langford; |

530 | Robust Non-Linear Feedback Coding Via Power-Constrained Deep LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise. |
Junghoon Kim; Taejoon Kim; David Love; Christopher Brinton; |

531 | Discrete Continuous Optimization Framework for Simultaneous Clustering and Training in Mixture ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We study a new framework of learning mixture models via automatic clustering called PRESTO, wherein we optimize a joint objective function on the model parameters and the partitioning, with each model tailored to perform well on its specific cluster. |
Parth Vipul Sangani; Arjun Shashank Kashettiwar; Pritish Chakraborty; Bhuvan Reddy Gangula; Durga S; Ganesh Ramakrishnan; Rishabh K Iyer; Abir De; |

532 | Generating Language Corrections for Teaching Physical Control TasksRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We design and build CORGI, a model trained to generate language corrections for physical control tasks, such as learning to ride a bike. |
Megha Srivastava; Noah Goodman; Dorsa Sadigh; |

533 | Revisiting The Linear-Programming Framework for Offline RL with General Function ApproximationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we revisit the LP framework for offline RL, and provide a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. |
Asuman E. Ozdaglar; Sarath Pattathil; Jiawei Zhang; Kaiqing Zhang; |

534 | Sampling-Based Accuracy Testing of Posterior Estimators for General InferenceRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this paper, we introduce Tests of Accuracy with Random Points (TARP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators. |
Pablo Lemos; Adam Coogan; Yashar Hezaveh; Laurence Perreault-Levasseur; |

535 | Monotonicity and Double Descent in Uncertainty Estimation with Gaussian ProcessesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: However, we prove that by tuning hyperparameters to maximize marginal likelihood (the empirical Bayes procedure), performance, as measured by the marginal likelihood, *improves monotonically* with the input dimension. On the other hand, cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent. |
Liam Hodgkinson; Chris van der Heide; Fred Roosta; Michael W. Mahoney; |

536 | Statistical Foundations of Prior-Data Fitted NetworksRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior. |
Thomas Nagler; |

537 | QASA: Advanced Question Answering on Scientific ArticlesRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Based on our intensive think-aloud study that revealed the three types of questions: surface, testing, and deep questions, we first propose the QASA benchmark that consists of 1798 novel question answering pairs that require full-stack reasoning on scientific articles in AI and ML fields. Then we propose the QASA approach that tackles the full-stack reasoning with large language models via associative selection, evidential rationale-generation, and systematic composition. |
Yoonjoo Lee; Kyungjae Lee; Sunghyun Park; Dasol Hwang; Jaehyeon Kim; Hong-in Lee; Moontae Lee; |

538 | Anti-Exploration By Random Network DistillationRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. |
Alexander Nikulin; Vladislav Kurenkov; Denis Tarasov; Sergey Kolesnikov; |

539 | Truncating Trajectories in Monte Carlo Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Indeed, the rewards taken in early simulation steps weigh exponentially more than future rewards. Taking a cue from this intuition, in this paper, we design an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths, i.e., *truncated*. |
Riccardo Poiani; Alberto Maria Metelli; Marcello Restelli; |

540 | Fast, Differentiable and Sparse Top-k: A Convex Analysis PerspectiveRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose new differentiable and sparse top-$k$ operators. |
Michael Eli Sander; Joan Puigcerver; Josip Djolonga; Gabriel Peyré; Mathieu Blondel; |

541 | Certified Robust Neural Networks: Generalization and Corruption ResistanceRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: Surprisingly, overfitting is a major concern in adversarial training despite being mostly absent in standard training. We provide here theoretical evidence for this peculiar “robust overfitting” phenomenon. |
Mohammed Amine Bennouna; Ryan Lucas; Bart Van Parys; |

542 | NNSplitter: An Active Defense Solution for DNN Model Via Automated Weight ObfuscationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we propose an active model IP protection scheme, namely NNSplitter, which actively protects the model by splitting it into two parts: the obfuscated model that performs poorly due to weight obfuscation, and the model secrets consisting of the indexes and original values of the obfuscated weights, which can only be accessed by authorized users with the support of the trusted execution environment. |
Tong Zhou; Yukui Luo; Shaolei Ren; Xiaolin Xu; |

543 | FP-Diffusion: Improving Score-based Diffusion Models By Enforcing The Underlying Score Fokker-Planck EquationRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). |
Chieh-Hsin Lai; Yuhta Takida; Naoki Murata; Toshimitsu Uesaka; Yuki Mitsufuji; Stefano Ermon; |

544 | Data Poisoning Attacks Against Multimodal EncodersRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: In contrast to previous work, only poisoning visual modality, in this work, we take the first step to studying poisoning attacks against multimodal models in both visual and linguistic modalities. |
Ziqing Yang; Xinlei He; Zheng Li; Michael Backes; Mathias Humbert; Pascal Berrang; Yang Zhang; |

545 | Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data Via AmalgamationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: This work proposes a compositional SDR that can handle zeros naturally while incorporating the nonlinear nature and spurious negative correlations among components rigorously. |
Junyoung Park; Jeongyoun Ahn; Cheolwoo Park; |

546 | Team Belief DAG: Generalizing The Sequence Form to Team Games for Fast Computation of Correlated Team Max-Min Equilibria Via Regret MinimizationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we provide new complexity results on the computation of optimal strategies for teams, and propose a new representation, coined *team belief DAG (TB-DAG)*, that describes team strategies as a convex set. |
Brian Hu Zhang; Gabriele Farina; Tuomas Sandholm; |

547 | A Theory of Representation Learning Gives A Deep Generalisation of Kernel MethodsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: However, standard theoretical approaches (formally NNGPs) involving infinite width limits eliminate representation learning. We therefore develop a new infinite width limit, the Bayesian representation learning limit, that exhibits representation learning mirroring that in finite-width models, yet at the same time, retains some of the simplicity of standard infinite-width limits. |
Adam X. Yang; Maxime Robeyns; Edward Milsom; Ben Anson; Nandi Schoots; Laurence Aitchison; |

548 | MANSA: Learning Fast and Slow in Multi-Agent SystemsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we introduce a novel plug & play IL framework named Multi-Agent Network Selection Algorithm (MANSA) which selectively employs CL only at states that require coordination. |
David Henry Mguni; Haojun Chen; Taher Jafferjee; Jianhong Wang; Longfei Yue; Xidong Feng; Stephen Marcus McAleer; Feifei Tong; Jun Wang; Yaodong Yang; |

549 | Causal Discovery with Latent Confounders Based on Higher-Order CumulantsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In light of the power of the closed-form solution to OICA corresponding to the One-Latent-Component structure, we formulate a way to estimate the mixing matrix using the higher-order cumulants, and further propose the testable One-Latent-Component condition to identify the latent variables and determine causal orders. |
Ruichu Cai; Zhiyi Huang; Wei Chen; Zhifeng Hao; Kun Zhang; |

550 | Unsupervised Skill Discovery for Learning Shared Structures Across Changing EnvironmentsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we introduce a new unsupervised skill discovery algorithm that discovers a set of skills that can represent shared structures across changing environments. |
Sang-Hyun Lee; Seung-Woo Seo; |

551 | Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language ModelsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We introduce Synthetic prompting, a method that leverages a few handcrafted examples to prompt the model to generate more examples by itself, and selects effective demonstrations to elicit better reasoning. |
Zhihong Shao; Yeyun Gong; yelong shen; Minlie Huang; Nan Duan; Weizhu Chen; |

552 | Contextual Conservative Interleaving BanditsRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we investigate the contextual conservative interleaving bandit problem, which has a performance constraint that requires the chosen actions to be not much worse than given baseline actions in each round. |
Kei Takemura; |

553 | Multi-Objective Population Based TrainingRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: PBT is a single-objective algorithm, but many real-world hyperparameter optimization problems involve two or more conflicting objectives. In this work, we therefore introduce a multi-objective version of PBT, MO-PBT. |
Arkadiy Dushatskiy; Alexander Chebykin; Tanja Alderliesten; Peter Bosman; |

554 | Active Learning Based Structural InferenceRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we propose a novel framework, Active Learning based Structural Inference (ALaSI), to infer the existence of directed connections from observed agents’ states over a time period in a dynamical system. |
Aoran Wang; Jun Pang; |

555 | Multi-Fidelity Covariance Estimation in The Log-Euclidean GeometryRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold. |
Aimee Maurais; Terrence Alsup; Benjamin Peherstorfer; Youssef Marzouk; |

556 | MolDiff: Addressing The Atom-Bond Inconsistency Problem in 3D Molecule Diffusion GenerationRelated Papers Related Patents Related Grants Related Venues Related Experts Related Code ViewHighlight: We define this problem as the atom-bond inconsistency problem and claim it is the main reason for current approaches to generating unrealistic 3D molecules. To overcome this problem, we propose a new diffusion model called MolDiff which can generate atoms and bonds simultaneously while still maintaining their consistency by explicitly modeling the dependence between their relationships. |
Xingang Peng; Jiaqi Guan; Qiang Liu; Jianzhu Ma; |

557 | Deep Temporal Sets with Evidential Reinforced Attentions for Unique Behavioral Pattern DiscoveryRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Real-life applications, such as digital behavioral biomarker identification, often require the discovery of complex spatiotemporal patterns in multimodal data, which is largely under-explored. To fill this gap, we propose a novel model that integrates uniquely designed Deep Temporal Sets (DTS) with Evidential Reinforced Attentions (ERA). |
Dingrong Wang; Deep Shankar Pandey; Krishna Prasad Neupane; Zhiwei Yu; Ervine Zheng; Zhi Zheng; Qi Yu; |

558 | Vector Quantized Wasserstein Auto-EncoderRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this work, we study learning deep discrete representations from the generative viewpoint. |
Long Tung Vuong; Trung Le; He Zhao; Chuanxia Zheng; Mehrtash Harandi; Jianfei Cai; Dinh Phung; |

559 | Simple Embodied Language Learning As A Byproduct of Meta-Reinforcement LearningRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: Specifically, we design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ in different buildings (i.e., tasks). |
Evan Zheran Liu; Sahaana Suri; Tong Mu; Allan Zhou; Chelsea Finn; |

560 | Spred: Solving L1 Penalty with SGDRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We propose to minimize a generic differentiable objective with $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. |
Liu Ziyin; Zihao Wang; |

561 | Text-To-4D Dynamic Scene GenerationRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. |
Uriel Singer; Shelly Sheynin; Adam Polyak; Oron Ashual; Iurii Makarov; Filippos Kokkinos; Naman Goyal; Andrea Vedaldi; Devi Parikh; Justin Johnson; Yaniv Taigman; |

562 | Effective and Efficient Structural Inference with Reservoir ComputingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighlight: In this paper, we present an effective and efficient structural inference approach by integrating a Reservoir Computing (RC) network into a Variational Auto-encoder-based (VAE-based) structural inference framework. |
Aoran Wang; Tsz Pan Tong; Jun Pang; |

563 | Fractional Denoising for 3D Molecular Pre-trainingRelated Papers Related Patents Related Grants Related Venues Related Experts ViewHighli |